
Introduction: Engineering problems and computational methods

• 20 cm x 30 cm rectangular plate
• Initially at temperature f(x,y). Then, insulated at x=0, zero
temperature at y=0, and convective heat transfer at x=20 and
at y=30
• Find temperature variation with time at the center

Experimental: Advantages: No equations, Arbitrary Shape


Issues: Size, Material, Time, f(x,y)
Analytical:
    ∂T/∂t = α (∂²T/∂x² + ∂²T/∂y²)
    T(x, y, t) = Σ_{i=1}^{∞} Σ_{j=1}^{∞} c_ij exp[−α(β_i² + γ_j²)t] cos(β_i x) sin(γ_j y)
Adv: Fast, Adaptable, Accurate; Issues: Ideal conditions
Introduction: Engineering problems and computational methods
Numerical Method: Reduce to algebraic equations
E.g.,

(Grid stencil: node (i, j) with neighbours (i−1, j), (i+1, j), (i, j−1), (i, j+1))

    ∂T/∂t ≈ (T_{i,j}^{t+Δt} − T_{i,j}^{t}) / Δt

    ∂²T/∂x² ≈ [ (T_{i+1,j} − T_{i,j})/Δx − (T_{i,j} − T_{i−1,j})/Δx ] / Δx = (T_{i+1,j} − 2T_{i,j} + T_{i−1,j}) / Δx²
Introduction: Engineering problems and computational methods

Results in a set of linear (or nonlinear, if α is a function of
temperature) equations
Adv: Arbitrary shape, initial and boundary conditions; usually fast.
Issues: Convergence, Accuracy, Efficiency
Error Analysis: Round-off and Truncation errors
Backward and Forward error analysis
All computational results have errors: Computers have finite
storage length and computations cannot be carried out
infinitely (We do not consider Model error and Data error)

Round-off Error: Due to finite storage length. E.g., with 8
significant digits: √2 is stored as 1.4142136 (actually as
0.14142136×10¹), an error of about 3×10⁻⁶ percent.

Truncation Error: Due to finite steps in computation. E.g., exp(2)
is computed using the infinite series, truncated after a finite
number of terms. True value is 7.389056098930650. 1 term – 1;
2 terms – 3; 3 – 5; 4 – 6.333; 5 – 7; 6 – 7.356; …; 15 – 7.3890561.
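A minimal sketch (Python assumed as the computing environment) of summing the truncated Maclaurin series for exp(2):

import math

def exp_series(x, n_terms):
    # Sum the first n_terms of the Maclaurin series for exp(x).
    return sum(x**k / math.factorial(k) for k in range(n_terms))

true_value = math.exp(2.0)                 # 7.389056098930650
for n in (1, 2, 3, 4, 5, 10, 15):
    approx = exp_series(2.0, n)
    print(n, approx, true_value - approx)  # truncation error shrinks as n grows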
Error Analysis: Round-off and Truncation errors
Backward and Forward error analysis
Round-off Error: Considering decimal storage, real numbers are
stored as mantissa (m) × 10^p with 0.1 ≤ m < 1. If the mantissa is
rounded off at the t-th digit, maximum error = 0.5×10⁻ᵗ.
Maximum relative error = 5×10⁻ᵗ, called the round-off unit, u.

Definitions:
True Error (e) = True Value − Approximate Value
Approximate Error (ε) = Current Approximation − Previous Approximation
(e.g., the iteration x ← x/2 + 1/x for √2: x = 1, 1.5, 1.4166667, 1.4142157, 1.4142136)
True Relative Error (er) = (True Value − Approximate Value) / True Value
Approximate Relative Error (εr) = (Current Approximation − Previous
Approximation) / Current Approximation
Error Propagation

y is a function of n independent variables, x1,x2,…,xn, i.e. vector x


y=f(x)
Due to error in the variables (∆xi = xi − ~xi ), there is error in y
~
y = f (~
x)
~ ~ ∂f n
Error: ∆y = y − y = f ( x ) − f ( x ) = ∑ ∆xi
i =1 ∂xi ~
xi

(Neglecting higher order terms in the Taylor’s series)


Error Propagation
    g = 2h/t²
Height measurement accurate to a mm, time to 1/100th of a second.
Error in g, if h = 100.000 m and t = 4.52 s?
    ∂g/∂h = 2/t²;   ∂g/∂t = −4h/t³
    Δg ≈ (2/t̃²) Δh − (4h̃/t̃³) Δt = 0.0978933 Δh − 4.33156 Δt
Maximum value |Δg| ≈ 0.0978933(0.001) + 4.33156(0.01) ≈ 0.0434 m/s²
(The relative error is more meaningful, roughly 0.4%.)
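A short sketch (Python assumed) of this first-order error propagation for g = 2h/t²:

h, t = 100.000, 4.52          # measured values
dh, dt = 1e-3, 1e-2           # measurement accuracies: 1 mm and 0.01 s

g = 2 * h / t**2
dg_dh = 2 / t**2              # partial derivative w.r.t. h
dg_dt = -4 * h / t**3         # partial derivative w.r.t. t

max_dg = abs(dg_dh) * dh + abs(dg_dt) * dt   # worst-case first-order error
print(g, max_dg, max_dg / g)  # ~9.79, ~0.0434, ~0.44 %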
Error Propagation and Condition Number of a problem
Condition number: Ratio of the relative change in the function,
f(x), to the relative change in the variable for a small
perturbation ∆x in the variable x.
    C_P = | [f(x + Δx) − f(x)] / f(x) | / | Δx / x |    or, as Δx → 0,    C_P = | x f′(x) / f(x) |

Well-Conditioned (<1) and Ill-conditioned (>1)


E.g.: Hyperbola xy − y + 1 = 0. Measure x, compute y = f(x) = 1/(1 − x).
C_P = |x/(1 − x)|: ill-conditioned for x > 0.5 (x = 0.8, y = 5; x = 0.808, y = 5.2083).
Similarly, y = √(1 + x) − 1 for small x:
C_P = x / [2(1 + x − √(1 + x))], close to but less than 1 (x = 0.01, y = 0.004988; x = 0.0101, y = 0.005037).
Error Analysis: Forward and Backward error analysis

Forward Error: requires the True Value (T.V.).
(Recall that True Error (e) = True Value − Approximate Value)
E.g.: √2 = 1.414213562373095. With 4-digit accuracy (u = 5×10⁻⁴),
approx. value = 1.414. True error, e = 2.134×10⁻⁴. True relative
error = 0.015%.
Backward Error: determines the change necessary in the input
variable (or data) to explain the final error in the output (or
result) obtained through errors introduced at various stages of
calculation.
E.g.: 1.414² = 1.999396. Error = 6.04×10⁻⁴, relative error = 0.03%.
y = √(1 + x) − 1 with x = 0.001616 gives y = √1.002 − 1 =
0.0009995 (with 4-digit rounding of 1 + x), which is exact for x = 0.002
(error in x = 0.000384, i.e. 24%).
Condition Number of an algorithm
• Condition number: Exact input data, x, with corresponding
  function value y. Algorithm A operating on x produces y_A on a
  machine with precision unit u. Let x_A be the input data which would
  produce y_A if exact computations were done. The condition number
  of the algorithm is
      C_A = | (x − x_A) / x | / u
Well-Conditioned (<1) and Ill-conditioned (>1)
y = √(1 + x) − 1 for x = 0.001616: we get x_A = 0.002.
    C_A = | (0.001616 − 0.002) / 0.001616 | / (5×10⁻⁴) = 475
Rewriting y = √(1 + x) − 1 = x / (√(1 + x) + 1):  x = 0.001616 gives y = 0.0008076,
x_A = 0.00161585, C_A = 0.18
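A small sketch (Python assumed; rounding the stored value of 1 + x to 4 significant digits plays the role of the finite word length u = 5×10⁻⁴) contrasting the two algebraically equivalent formulations:

import math

def round_sig(v, digits=4):
    # Round v to 'digits' significant figures (mimics storing 1+x in a short word).
    return round(v, digits - 1 - int(math.floor(math.log10(abs(v))))) if v else 0.0

x = 0.001616
x_stored = round_sig(1 + x)            # 1.001616 is stored as 1.002

y1 = math.sqrt(x_stored) - 1           # sqrt(1+x) - 1   -> ~0.0009995 (cancellation)
y2 = x / (math.sqrt(x_stored) + 1)     # x/(sqrt(1+x)+1) -> ~0.0008076
y_exact = math.sqrt(1 + x) - 1         # ~0.0008077

print(y1, y2, y_exact)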
Error Analysis: Vector Norms
What if the computed result is a vector?

[A]{x} = {b}
    [A]{x} = {b}
Error vector:               {ε} = {x} − {x̃}
Or, relative error vector:  {ε_r} = { (x_i − x̃_i) / x_i }
Magnitude: In terms of the “Norm” of the vector
Error Analysis: Vector Norms
Some properties of the vector norms are (α is a scalar):
• ‖x‖ = 0 only if x is a null vector; otherwise ‖x‖ > 0
• ‖αx‖ = |α| ‖x‖
• ‖x₁ + x₂‖ ≤ ‖x₁‖ + ‖x₂‖

Magnitude: The Lp norm of an n-dimensional vector {x} = (x₁, x₂, …, xₙ)ᵀ is given by
    ‖x‖_p = ( |x₁|^p + |x₂|^p + |x₃|^p + … + |xₙ|^p )^{1/p},   p ≥ 1
Common Vector Norms
• p = 2, Euclidean norm (length):   ‖x‖₂ = √(x₁² + x₂² + … + xₙ²)
• p = 1 (total distance):           ‖x‖₁ = Σ_{i=1}^{n} |x_i|
• p = ∞, maximum norm:              ‖x‖_∞ = max_i |x_i|
Roots of Nonlinear Equations
Roots of Equations

Not trivial to determine the roots!
Roots of Non-Linear Equations:
f(x) = 0
f may be a function belonging to any class: algebraic, trigonometric, hyperbolic,
polynomials, logarithmic, exponential, etc.

Methods can be broadly classified into five types:
 Graphical method
 Bracketing methods: Bisection, Regula-Falsi
 Open methods: Fixed point, Newton-Raphson, Secant, Muller
 Special methods for polynomials: Bairstow’s
 Hybrid methods: Brent’s

Background assumed (MTH 101): intermediate value theorem; nested interval


theorem; Cauchy sequence and convergence; Taylor’s and Maclaurin’s series; etc.

Graphical Method
Involves plotting the f(x) curve and finding the
solution at the intersection of f(x) with the x-axis.

Bracketing Methods
 Intermediate value theorem: Let f be a continuous fn on [a, b]
and let f(a) < s < f(b), then there exists at least one x such that
a < x < b and f(x) = s.
 Bracketing methods are application of this theorem with s = 0
 Nested interval theorem: For each n, let In = [an, bn] be a
sequence of (non-empty) bounded intervals of real numbers
such that I_{n+1} ⊆ I_n and (b_n − a_n) → 0 as n → ∞;
then ∩_n I_n contains only one point.
 This guarantees the convergence of the bracketing methods to the root.
 In bracketing methods, a sequence of nested interval is generated
such that each interval follows the intermediate value theorem with s
= 0. Then the method converges to the root by the one point specified
by the nested interval theorem. Methods only differ in ways to
generate the nested intervals.
Intermediate Value Theorem

Mean Value Theorem
If f(x) is defined and continuous on the interval [a,b] and
differentiable on (a,b), then there is at least one number c in the
interval (a,b) (that is, a < c < b) such that
    f′(c) = [f(b) − f(a)] / (b − a)

A special case of the Mean Value Theorem is Rolle's
Theorem: when f(a) = f(b), there is a c in (a,b) with f′(c) = 0.
Bisection Method
 Principle: Choose an initial interval based on intermediate value
theorem and halve the interval at each iteration step to generate
the nested intervals.
 Initialize: Choose a0 and b0 such that, f(a0)f(b0) < 0. This is done
by trial and error.
 Iteration step k:
 Compute mid-point mk+1 = (ak + bk)/2 and functional value f(mk+1)
 If f(mk+1) = 0, mk+1 is the root. (It’s your lucky day!)
 If f(ak)f(mk+1) < 0: ak+1 = ak and bk+1 = mk+1; else, ak+1 = mk+1 and
bk+1 = bk
 After n iterations: size of the interval dn = (bn − an) = 2⁻ⁿ (b0 − a0);
stop if dn ≤ ε
 Estimate the root (x = α say!) as: α = mn+1 ± 2⁻⁽ⁿ⁺¹⁾ (b0 − a0) (see the sketch below)
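A minimal bisection sketch (Python assumed), applied to the cubic used in the later root-finding examples:

def bisection(f, a, b, eps=1e-6, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "initial interval must bracket a root"
    for _ in range(max_iter):
        m = 0.5 * (a + b)                 # mid-point m_{k+1}
        fm = f(m)
        if fm == 0.0 or (b - a) <= eps:   # lucky hit, or interval small enough
            break
        if fa * fm < 0:                   # root lies in [a, m]
            b = m
        else:                             # root lies in [m, b]
            a, fa = m, fm
    return 0.5 * (a + b)

f = lambda x: x**3 - 1.2502*x**2 - 1.56249999*x + 1.9534375
print(bisection(f, -2.0, -1.0))           # converges to about -1.25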
Bisection Method

Regula-Falsi or Method of False Position
 Principle: In place of the mid point, the function is assumed to be
linear within the interval and the root of the linear function is chosen.
 Initialize: Choose a0 and b0 such that, f(a0)f(b0) < 0. This is done by
trial and error.
 Iteration step k:
 A straight line passing through the two points (ak, f(ak)) and (bk, f(bk)) is
given by:
    y = f(ak) + [ (f(bk) − f(ak)) / (bk − ak) ] (x − ak)
 Root of this equation at f(x) = 0 is:
    mk+1 = ak − f(ak)(bk − ak) / (f(bk) − f(ak)) = [ ak f(bk) − bk f(ak) ] / (f(bk) − f(ak))
 If f(mk+1) = 0, mk+1 is the root. (It's your lucky day!)
 If f(ak)f(mk+1) < 0: ak+1 = ak and bk+1 = mk+1; else, ak+1 = mk+1 and bk+1 = bk
 After n iterations: size of the interval dn = (bn − an); stop if dn ≤ ε
 Estimate the root (x = α say!) as: α ≈ mn+1
Regula-Falsi or Method of False Position

(Figure: y = f(x) with the chord through (ak, f(ak)) and (bk, f(bk)) crossing the x-axis at mk+1.)
Example Problem

 True solution = 0.5671
 Set up a scheme as follows:

Itr.   xl        f(xl)    xu        f(xu)    xk        f(xk)    er(%)
0      0                  2                  0.6981             186
1      0                  0.6981             0.5818             20
2      0                  0.5818             0.5687             2.2
Example Problem

Find th…, x is in m.

A. Solve by Bisection Method
 Initial guess: xL = 0 m; xU = 2R = 0.11 m (Functions.xlsx)
 11 iterations to converge to x = 0.062 m
B. Solve by Regula Falsi Method
 Initial guess: xL = 0 m; xU = 2R = 0.11 m (Functions.xlsx)
 4 iterations to converge to x = 0.062 m
Bracketing Methods:

 Advantages
 Convergence to a root is guaranteed (may not get all
the roots, though!)
 Simple to program
 Computation of the derivative not needed
 Disadvantages
 Slow convergence
 For more than one root, may find only one solution
by this approach.
Open Methods: Fixed Point
 Problem: f(x) = 0, find a root x = α such that f(α) = 0
 Re-arrange the function: f(x) = 0 to x = g(x)
 Iteration: xk+1 = g(xk)

 Stopping criteria: |xk+1 − xk| / |xk+1| ≤ ε
 Convergence: after n iterations,
   At the root: α = g(α), so α − xn+1 = g(α) − g(xn)
 Mean Value Theorem: for some ξ between xn and α,
   (α − xn+1) = g′(ξ)(α − xn), i.e. en+1 = g′(ξ) en
 Condition for convergence: |g′| < 1 near the root
 As n → ∞, en+1/en → g′(α) = constant
Open Methods: Fixed Point

(Figure: y = x and y = g(x) curves; the iterates x0, x1, x2, x3 approach the root α in the convergent case and move away from it in the divergent case.)
Example Problem

 True solution = 0.5671
 Set up a scheme as follows:

Iteration   x        g(x)     er (%)
1           0        1        100
2           1        0.3678   172
3           0.3678   0.6922   47
4           0.6922   0.5004   38
5           0.5004   0.6064   17
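A short fixed-point sketch (Python assumed). The tabulated values are consistent with the rearrangement g(x) = e⁻ˣ, so that is used here as an assumption:

import math

def fixed_point(g, x0, eps=1e-4, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)                             # x_{k+1} = g(x_k)
        if abs(x_new - x) <= eps * abs(x_new):   # relative-change stopping test
            return x_new
        x = x_new
    return x

print(fixed_point(lambda x: math.exp(-x), 0.0))  # tends to ~0.5671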
Example Problem

Find th…, x is in m.

A. Solve by Fixed Point Method
 Rearrange:
 Initial guesses: x = 0.001, 0.01, 0.003 m DO NOT CONVERGE (Functions.xlsx)
 Initial guess: x = 0.08 m: it takes 5 iterations to converge to x = 0.062 m
Truncation Error:
- error committed when a limiting process is truncated before one
has reached the limiting value
- e.g., approximating a derivative by a finite difference, f′(x) ≈ [f(x + Δx) − f(x)] / Δx
- Taylor series expansion of a function:
      f(x) = f(x0) + f′(x0)(x − x0) + f″(x0)(x − x0)²/2! + … + f⁽ⁿ⁾(x0)(x − x0)ⁿ/n! + Rn

Taylor polynomial approximation:

If the remainder Rn is omitted, the right side of the equation is the Taylor
polynomial approximation to f(x).

Zeroth order approximation:  f(x) ≈ f(x0)
First order approximation:   f(x) ≈ f(x0) + f′(x0)(x − x0)
Second order approximation:  f(x) ≈ f(x0) + f′(x0)(x − x0) + f″(x0)(x − x0)²/2!


Taylor's Theorem:

If a function f can be differentiated (n+1) times in an interval I
containing x0, then, for each x in I, there exists a ξ between x and x0
such that
    f(x) = f(x0) + f′(x0)(x − x0) + … + f⁽ⁿ⁾(x0)(x − x0)ⁿ/n! + f⁽ⁿ⁺¹⁾(ξ)(x − x0)ⁿ⁺¹/(n+1)!

(The remainder term follows from the Theorem of Mean for Integrals.)

Roots of Nonlinear Equations: Order of Convergence
• Denote by x(i), the estimate of the root at the ith iteration
• If an iterative sequence x(0), x(1), x(2), … converges to the root ξ
  (which means the true error at iteration i is e(i) = ξ − x(i)), and if
      lim_{i→∞} |e(i+1)| / |e(i)|^p = C
  then p is the order of convergence and C the asymptotic error
  constant. The convergence is called linear if p = 1, quadratic if
  p = 2, and cubic if p = 3 (p does not have to be an integer).
• For the bisection method, using the approximate error as a proxy for the true
  error (as we approach the root, these tend to be the same),
      |e(i+1)| / |e(i)| = 1/2
  implying linear convergence and an error constant of 0.5.
Roots of Nonlinear Equations: Order of Convergence

• For linearly convergent methods: C must be less than 1
• For superlinearly convergent methods (p > 1): not necessary
• Convergent methods: the approximate error at any iteration is
  representative of the true error at the previous iteration:
      x(i+1) − x(i) ≈ ξ − x(i) = e(i)
• For the bisection method, the approximate error at the nth
  iteration is
      (b0 − a0) / 2ⁿ
  which means the number of iterations for a desired accuracy is
  known a priori.
Open Methods: Newton-Raphson
 Problem: f(x) = 0, find a root x = α such that f(α) = 0
 Principle: Approximate the function as a straight line
having same slope as the original function at the point of
iteration.

(Figure: tangent to y = f(x) at x0 meets the x-axis at x1; the tangent at x1 gives x2, and so on.)
Open Methods: Newton-Raphson
 Problem: f(x) = 0, find a root x = α such that f(α) = 0
 Iteration Step 0: Taylor's Theorem at x = x0:
    f(x) = f(x0) + f′(x0)(x − x0) + f″(ξ)(x − x0)²/2!, for some ξ between x and x0
 Iteration Step k:
    f(x) = f(xk) + f′(xk)(x − xk) + f″(ξ)(x − xk)²/2!, for some ξ between x and xk
 Assumptions: Neglect 2nd and higher order terms and assume that the root is
arrived at in the (k+1)th iteration, i.e., f(xk+1) = 0
 Iteration Formula:  xk+1 = xk − f(xk) / f′(xk)
 Start with x = x0 (Guess!)   Stopping criteria: |xk+1 − xk| / |xk+1| ≤ ε
Open Methods: Newton-Raphson
 Convergence: Taylor's Theorem about xn, evaluated at the root ξ:
    0 = f(ξ) = f(xn) + f′(xn)(ξ − xn) + f″(ζ)(ξ − xn)²/2!, for some ζ between ξ and xn
 Re-arrange, dividing by f′(xn):
    ξ − [xn − f(xn)/f′(xn)] = −[f″(ζ)/(2 f′(xn))] (ξ − xn)²
    or  ξ − xn+1 = −[f″(ζ)/(2 f′(xn))] (ξ − xn)²
    or  en+1 = −[f″(ζ)/(2 f′(xn))] en²
 As n → ∞, |en+1| / |en|² → |f″(ξ) / (2 f′(ξ))| = constant (quadratic convergence)
Open Methods: Newton-Raphson

Advantages:
  Faster convergence (quadratic)

Disadvantages:
  Need to calculate the derivative
  Newton-Raphson method may get stuck!

(Figure: tangent at x0 on y = f(x) giving x1, illustrating one Newton step.)
Open Methods: Secant
 Principle: Use a difference approximation for the slope
or derivative in the Newton-Raphson method. This is
equivalent to approximating the tangent with a secant.
 Problem: f(x) = 0, find a root x = α such that f(α) = 0

(Figure: secant through (x0, f(x0)) and (x1, f(x1)) meets the x-axis at x2; the next secant uses x1 and x2 to give x3.)
Open Methods: Secant
 Problem: f(x) = 0, find a root x = α such that f(α) = 0

 Initialize: choose two points x0 and x1 and evaluate f(x0) and f(x1)
 Approximation: f′(xk) ≈ [f(xk) − f(xk−1)] / (xk − xk−1); replace this in the Newton-
Raphson formula
 Iteration Formula:  xk+1 = xk − f(xk)(xk − xk−1) / [f(xk) − f(xk−1)]
 Stopping criteria: |xk+1 − xk| / |xk+1| ≤ ε
Taylor’s Series: Truncation Error
    f(x0 + h) = f(x0) + h f′(x0) + h²/2! f″(x0) + … + hᵐ/m! f⁽ᵐ⁾(x0) + hᵐ⁺¹/(m+1)! f⁽ᵐ⁺¹⁾(ζ),
    ζ ∈ (x0, x0 + h)

• E.g.: Compute e^0.1 (T.V. = 1.105170918075648)
• Use x0 = 0, h = 0.1.
  – First-order approx. (m=1): 1 + h = 1.1, Error = 0.00517
    • From Taylor's series, residual = 0.005 e^ζ (range 0.005 to 0.0055)
  – Second-order approx. (m=2): 1 + h + h²/2 = 1.105, Error = 0.000171
    • From Taylor's series, residual = 0.000167 e^ζ (range 0.000167 to 0.000184)
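A small sketch (Python assumed) reproducing the e^0.1 comparison:

import math

x0, h = 0.0, 0.1
true_value = math.exp(x0 + h)                      # 1.105170918075648

first_order = 1 + h                                # m = 1
second_order = 1 + h + h**2 / 2                    # m = 2

print(true_value - first_order)                    # ~0.00517
print(true_value - second_order)                   # ~0.000171
print(h**2 / 2, h**3 / 6)                          # leading residual factors: 0.005 and ~0.000167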
Linear Interpolation: Error Analysis
• Linear interpolation between two points, xl and xu:
      f(x) = f(xu) + [(fu − fl)/(xu − xl)] (x − xu) + f″(ζ)/2 (x − xu)(x − xl);   x, ζ ∈ (xl, xu)
• If we assume the function to be uniformly concave/convex,
  one end of the interval, xu (or xl), remains fixed, say, x(0), and
      x(i+1) = x(0) − f0 (x(0) − x(i)) / (f0 − fi)
Linear Interpolation: Error Analysis
From the interpolation equation, applied at x = ξ:
    f(ξ) = 0 = f0 + (ξ − x(0)) (f0 − fi)/(x(0) − x(i)) + (ξ − x(0))(ξ − x(i)) f″(ζ1)/2;   ζ1 ∈ (x(i), x(0))

    ⇒ f0 (x(0) − x(i)) / (f0 − fi) = −e(0) − e(0) e(i) f″(ζ1) / (2 f′(ζ2));   ζ1, ζ2 ∈ (x(i), x(0))

And, from the iteration equation,
    ξ − x(i+1) = ξ − x(0) + f0 (x(0) − x(i)) / (f0 − fi)   ⇒   e(i+1) = −e(0) e(i) f″(ζ1) / (2 f′(ζ2))

In the limit, the iterations converge to the root, and
    lim_{i→∞} e(i+1)/e(i) = e(0) f″(ζ3) / (2 f′(ζ4));   ζ3, ζ4 ∈ (ξ, x(0))
Previous Lecture: Nonlinear Equations

• Linear Interpolation method: Error Analysis


• Modified False position Method
• Fixed Point Method
• Newton Method

Today:
• Newton Method: Comments and Example
• Secant Method
• Muller Method
• Bairstow Method

(Figure: plot of f(x) versus x over 0 ≤ x ≤ 2, with f(x) ranging from about −200 to 1000.)
Secant Method: Algorithm
(Figure: secant through (x(i−1), fi−1) and (x(i), fi) meeting the x-axis at x(i+1), near the root ξ.)

    x(i+1) = x(i) + (−fi) (x(i) − x(i−1)) / (fi − fi−1)
Difference from False Position method?
Not necessarily bracketing the root !
Secant Method: Iterations

Generally, x(i−1) is discarded when x(i+1) is computed. Sometimes,
the point among i−1, i, i+1 with the largest function magnitude
is discarded instead.
Secant Method: Example

Find the root near −1, starting with x(−1) = −2, x(0) = −1

    f(x) = x³ − 1.25020000 x² − 1.56249999 x + 1.95343750 = 0

Iteration scheme:  x(i+1) = x(i) + (−fi) (x(i) − x(i−1)) / (fi − fi−1)

(Figure: plot of the cubic f(x) for −2 ≤ x ≤ 2.)

i     x(i)        f           x(i+1)      εa (%)
−1    −2          −7.92236
0     −1          1.265737    −1.13776    12.10787
1     −1.13776    0.639987    −1.27865    11.01884
2     −1.27865    −0.18321    −1.24729    2.513998
3     −1.24729    0.016878    −1.24994    0.211612
4     −1.24994    0.000382    −1.25       0.004896
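A compact secant sketch (Python assumed) that reproduces the table above:

def secant(f, x_prev, x, eps=1e-4, max_iter=50):
    f_prev, fx = f(x_prev), f(x)
    for _ in range(max_iter):
        x_new = x + (-fx) * (x - x_prev) / (fx - f_prev)   # secant update
        if abs((x_new - x) / x_new) <= eps:                # approximate relative error
            return x_new
        x_prev, f_prev = x, fx
        x, fx = x_new, f(x_new)
    return x

f = lambda x: x**3 - 1.2502*x**2 - 1.56249999*x + 1.9534375
print(secant(f, -2.0, -1.0))   # -1.25, as in the table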
Secant Method: Error Analysis
• Similar to linear interpolation, applied at the root, ξ:
      0 = fi + (ξ − x(i)) (fi − fi−1)/(x(i) − x(i−1)) + (ξ − x(i))(ξ − x(i−1)) f″(ζ)/2;   ζ ∈ (x(i−1), x(i), ξ)
• Iteration:
      x(i+1) = x(i) + (−fi) (x(i) − x(i−1)) / (fi − fi−1)
• Error:
      ξ − x(i+1) = ξ − x(i) + fi (x(i) − x(i−1)) / (fi − fi−1)
      ⇒ e(i+1) = −e(i) e(i−1) f″(ζ1) / (2 f′(ζ2));   ζ1 ∈ (x(i−1), x(i), ξ);  ζ2 ∈ (x(i−1), x(i))
• Recall:
      lim_{i→∞} |e(i+1)| / |e(i)|^p = C   ⇒   |e(i+1)| = C |e(i)|^p   and   |e(i−1)| = (|e(i)| / C)^{1/p}
Secant Method: Error Analysis
• As the iterations approach the root:
      C |e(i)|^p = |e(i)| (|e(i)|/C)^{1/p} |f″(ξ)/(2 f′(ξ))|
      ⇒ C^{1+1/p} |e(i)|^{p − 1 − 1/p} = |f″(ξ)/(2 f′(ξ))|
• Therefore, p − 1 − 1/p = 0  ⇒  p = 1.618 (Golden Ratio), and
      C = |f″(ξ)/(2 f′(ξ))|^{0.618}

• Better than bisection and false position


• Not as good as Newton
• Can we improve the order by using three points and
approximating the function by a quadratic instead of linear?
Muller Method: Algorithm
(Figure: parabola through (x(i−2), fi−2), (x(i−1), fi−1), (x(i), fi) crossing the x-axis at x(i+1), near the root ξ.)

    x(i+1) = x(i) + Δx(i)

Δx(i) is obtained by interpolating a quadratic function through the
three points (i, i−1, i−2) and finding its intersection with the x-axis.
Muller Method: Algorithm
• It can be shown that
      Δx(i) = [−b ± √(b² − 4ac)] / (2a) = −2c / [b ± √(b² − 4ac)]
• Where
      a = [ (fi − fi−1)/(x(i) − x(i−1)) − (fi−1 − fi−2)/(x(i−1) − x(i−2)) ] / (x(i) − x(i−2))
      b = (fi − fi−1)/(x(i) − x(i−1)) + a (x(i) − x(i−1))
      c = fi
• Two roots for Δx; choose the one of smaller magnitude (i.e., the +/− sign is taken as sign(b))
Muller Method: Example

Find the root near −1, starting with x(−2) = −2, x(−1) = −1, x(0) = 0

    f(x) = x³ − 1.25020000 x² − 1.56249999 x + 1.95343750 = 0

Iteration scheme:  x(i+1) = x(i) − 2c / [b ± √(b² − 4ac)]

(Figure: plot of the cubic f(x) for −2 ≤ x ≤ 2.)

i     x(i)      f        a        b        c        Δx       Δx2      x(i+1)   εa (%)
−2    −2        −7.922
−1    −1        1.2657
0     0         1.9534   −4.25    −3.562   1.9534   0.3779   −1.216   0.3779
1     0.3779    1.2383   −1.872   −2.6     1.2383   0.375    −1.764   0.753    49.808
2     0.753     0.495    −0.119   −2.027   0.495    0.2408   −17.23   0.9938   24.233
3     0.9938    0.1474   0.8745   −1.233   0.1474   0.1319   1.2778   1.1257   11.718
4     1.1257    0.0368   1.6223   −0.625   0.0368   0.0725   0.3126   1.1982   6.05
5     1.1982    0.0066   2.0675   −0.266   0.0066   0.0335   0.0953   1.2317   2.7176
6     1.2317    0.0008   2.3054   −0.095   0.0008   0.0131   0.028    1.2447   1.0488
7     1.2447    7E-05    2.4244   −0.027   7E-05    0.0042   0.0071   1.2489   0.3342
Muller Method: Error Analysis
• Quadratic interpolation, applied at the root, ξ:
      0 = fi + (ξ − x(i)) (fi − fi−1)/(x(i) − x(i−1))
          + (ξ − x(i))(ξ − x(i−1)) [ (fi − fi−1)/(x(i) − x(i−1)) − (fi−1 − fi−2)/(x(i−1) − x(i−2)) ] / (x(i) − x(i−2))
          + (ξ − x(i))(ξ − x(i−1))(ξ − x(i−2)) f‴(ζ)/6;   ζ ∈ (x(i−2), x(i−1), x(i), ξ)
• From the iteration scheme, it can be shown (as before) that
      e(i+1) = −e(i) e(i−1) e(i−2) f‴(ξ) / (6 f′(ξ))
• Order of convergence: p³ − p² − p − 1 = 0  ⇒  p = 1.839, and
      C = |f‴(ξ) / (6 f′(ξ))|^{0.4196}
• Better than Secant
Roots of polynomials: General
• Polynomial equations are very common in Eigenvalue
problems and approximations of functions
• Any of the methods discussed so far should work
• After finding one root, we may deflate the polynomial and
find other roots successively
• If some roots are complex, we may run into problems. This
may happen even with polynomials which have all real
coefficients.
• We will look at polynomials with real coefficients only
• The complex roots will occur in conjugate pairs, implying that
a quadratic factor with real coefficients will be present
(x − [a + ib])(x − [a − ib]) = x 2 − 2ax + a 2 + b 2
• Bairstow Method
Roots of polynomials: Bairstow Method
• Find a quadratic factor of the polynomial f(x) as x² − α1 x − α0
• Find the two roots (real or complex conjugates) as
      r1,2 = 0.5 [ α1 ± √(α1² + 4α0) ]
• Algorithm: Express the given function as f(x) = Σ_{j=0}^{n} cj x^j
• Perform a synthetic division by the quadratic factor:
      (cn x^n + cn−1 x^{n−1} + cn−2 x^{n−2} + … + c1 x + c0) ÷ (x² − α1 x − α0)
  gives the quotient cn x^{n−2} + (cn−1 + α1 cn) x^{n−3} + …, each quotient term being
  multiplied by (x² − α1 x − α0) and subtracted from the remaining dividend.
Bairstow Method: Algorithm
• For writing a recursive algorithm, we express the quotient as a
polynomial of degree n−2 and the remainder as linear
      f(x) = Σ_{j=0}^{n} cj x^j = (x² − α1 x − α0) Σ_{j=0}^{n−2} d_{j+2} x^j + d1 (x − α1) + d0

• Equating the coefficients of different powers of x, we get
      dn = cn
      dn−1 = cn−1 + α1 dn
      dj = cj + α1 d_{j+1} + α0 d_{j+2}   for j = n−2 to 0

• The target is to choose α0 and α1 in such a way that d0 and d1


become zero
• Iterative solution using Newton method
Linear Interpolation: Error Analysis

• Linearly convergent
• Asymptotic error constant is not constant (was ½ for bisection),
but depends on the nature of the function
      C = e(0) f″(ζ3) / (2 f′(ζ4))

For the function used in the example,


    f(x) = x³ − 1.2500000 x² − 1.5625250 x + 1.9530938

The iterations show one end fixed at −2, the root is −1.25; e(0)=0.75.
|f”/2f’| varies from 0.5 to 0.8 over (−2, −1.25), indicating C
between 0.35 and 0.6.
Newton Method: Example

• Find the root near −1, starting with x(0) = −1

    f(x) = x³ − 1.25020000 x² − 1.56249999 x + 1.95343750 = 0

Iteration scheme:  x(i+1) = x(i) − f(x(i)) / f′(x(i)),   with  f′(x) = 3x² − 2.50040000 x − 1.56249999

(Figure: plot of the cubic f(x) for −2 ≤ x ≤ 2.)

Iteration, i   x(i)        f           f′          x(i+1)      εa (%)
0              −1          1.265737    3.9379      −1.32142    24.32409
1              −1.32142    −0.47231    6.980078    −1.25376    5.397022
2              −1.25376    −0.02357    6.288132    −1.25001    0.299805
3              −1.25001    −7E-05      6.250613    −1.25       0.0009

• To find the root near 1 (note the linear convergence, which is due to a double root):

i   x(i)        f           f′          x(i+1)      εa (%)
0   1           0.140738    −1.0629     1.132409    11.69268
1   1.132409    0.032999    −0.54693    1.192745    5.058566
2   1.192745    0.008036    −0.27692    1.221763    2.375112
3   1.221763    0.001985    −0.13928    1.236013    1.152908
4   1.236013    0.000493    −0.06984    1.243076    0.56821
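A short Newton-Raphson sketch (Python assumed) for the same cubic:

def newton(f, fprime, x, eps=1e-6, max_iter=50):
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)            # Newton update
        if abs((x_new - x) / x_new) <= eps:
            return x_new
        x = x_new
    return x

f  = lambda x: x**3 - 1.2502*x**2 - 1.56249999*x + 1.9534375
fp = lambda x: 3*x**2 - 2.5004*x - 1.56249999

print(newton(f, fp, -1.0))   # quadratic convergence to -1.25
print(newton(f, fp,  1.0))   # slow (linear) convergence toward the near-double root at ~1.25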
Newton Method: Multiple Roots
• Instead of f, find the zeroes of u(x) = f(x)/f′(x):
      x(i+1) = x(i) − u(x(i)) / u′(x(i)) = x(i) − fi fi′ / [ (fi′)² − fi fi″ ]

    f″(x) = 6x − 2.50040000

i   x(i)        f           f′          f″          x(i+1)      εa (%)
0   1.000000    0.140738    −1.062900   3.499600    1.234750    19.011928
1   1.234750    0.000585    −0.076048   4.908098    1.250052    1.224103
2   1.250052    0.000000    −0.000242   4.999910    1.250034    0.001403
3   1.250034    0.000000    −0.000329   4.999805    1.250029    0.000371
Previous Lecture: Nonlinear Equations

• Newton Method: Single and multiple roots


• Secant Method
• Muller Method
• Bairstow Method

Today:
• Bairstow Method: Algorithm and Example
• Linear Simultaneous Equations: Introduction
• Matrix Norms
• Condition Number
• Eigenvalue
Bairstow Method: Algorithm
• d0 and d1 are functions of α0 and α1. The values for α0 and α1
at the (i+1)th iteration are obtained from

      α0(i+1) = α0(i) + Δα0(i);   α1(i+1) = α1(i) + Δα1(i)

• And then choosing the increments to make the residuals zero:
      d0(i+1) = 0 = d0(i) + (∂d0/∂α0)(i) Δα0(i) + (∂d0/∂α1)(i) Δα1(i)
      d1(i+1) = 0 = d1(i) + (∂d1/∂α0)(i) Δα0(i) + (∂d1/∂α1)(i) Δα1(i)
Bairstow Method: Algorithm
• The partial derivatives can be written as
      ∂dn/∂α0 = 0
      ∂dn−1/∂α0 = 0
      ∂dj/∂α0 = d_{j+2} + α0 ∂d_{j+2}/∂α0 + α1 ∂d_{j+1}/∂α0   for j = n−2 to 0
  and
      ∂dn/∂α1 = 0
      ∂dn−1/∂α1 = dn
      ∂dj/∂α1 = d_{j+1} + α0 ∂d_{j+2}/∂α1 + α1 ∂d_{j+1}/∂α1   for j = n−2 to 0
Bairstow Method: Algorithm
• These may be combined in a single recursive equation by defining
      δj = ∂d_{j−1}/∂α0 = ∂dj/∂α1
  to obtain
      δn−1 = dn
      δn−2 = dn−1 + α1 δn−1
      δj = d_{j+1} + α1 δ_{j+1} + α0 δ_{j+2}   for j = n−3 to 0

• The new estimates of α0 and α1 are obtained by solving
      δ1(i) Δα0(i) + δ0(i) Δα1(i) = −d0(i)
      δ2(i) Δα0(i) + δ1(i) Δα1(i) = −d1(i)
• Repeat till convergence.


Bairstow Method: Example
Recall:
    dn = cn;  dn−1 = cn−1 + α1 dn;  dj = cj + α1 d_{j+1} + α0 d_{j+2},  j = n−2 to 0
    δn−1 = dn;  δn−2 = dn−1 + α1 δn−1;  δj = d_{j+1} + α1 δ_{j+1} + α0 δ_{j+2},  j = n−3 to 0
    δ1(i) Δα0(i) + δ0(i) Δα1(i) = −d0(i);   δ2(i) Δα0(i) + δ1(i) Δα1(i) = −d1(i)

Solve:  x⁵ − 5.05 x⁴ + 12.2 x³ − 16.48 x² + 12.5644 x − 4.28442 = 0
j Iteration 1 Iteration 2 Iteration 3 Iteration 4 Iteration 5 Iteration 6
α0= -1 α0= -1.174 α0= -1.582 α0= -1.986 α0= -2.018 α0= -2.02
α1=1 α1=1.467 α1=1.936 α1=2.196 α1=2.198 α1=2.2
dj δj dj δj dj δj dj δj dj δj dj δj
0 1.130 -2.096 0.484 -0.283 0.204 0.154 0.010 -0.363 0.001 -0.475 0.000
1 0.134 0.870 0.202 0.864 0.138 0.603 0.016 0.294 0.001 0.206 0.000
2 -5.280 3.100 -3.809 1.491 -2.669 0.728 -2.145 0.516 -2.123 0.460 -2.121
3 7.150 -3.050 5.770 -2.116 4.589 -1.177 3.947 -0.658 3.913 -0.653 3.910
4 -4.050 1.000 -3.583 1.000 -3.114 1.000 -2.854 1.000 -2.852 1.000 -2.850
5 1.000 1.000 1.000 1.000 1.000 1.000
∆α0=-0.174 ∆α0=-0.407 ∆α0=-0.404 ∆α0=-0.032 ∆α0=-0.002
∆α1=0.467 ∆α1=0.470 ∆α1=0.260 ∆α1=0.002 ∆α1=0.002

The two roots of the quadratic factor are r1,2 = 0.5 [ α1 ± √(α1² + 4α0) ] = 1.1 ± 0.9 i
System of Linear Equations: Introduction
• Example:
• N machines make N types of strings, requiring tms
time on machine m to produce unit length of string s
• If total times on each machine, Ti ( i=1,2,..N) are
given, find length of each type of string, lj ( j=1,2,…N)
      [ t11  t12  …  t1N ] [ l1 ]   [ T1 ]
      [ t21  t22  …  t2N ] [ l2 ] = [ T2 ]
      [  .    .   …   .  ] [ .  ]   [ .  ]
      [ tN1  tN2  …  tNN ] [ lN ]   [ TN ]

Commonly used notation : [A]{x} = {b}


System of Linear Equations: Introduction
• Is a solution possible? Take a 2-dimensional example:
      a11 x1 + a12 x2 = b1
      a21 x1 + a22 x2 = b2
• Solution exists for a11=1, a12=1, a21=2, a22=3, b1=3, b2=8
• No solution for a11=1, a12=1, a21=2, a22=2, b1=3, b2=8
• Infinite solutions for a11=1, a12=1, a21=2, a22=2, b1=4, b2=8
(Figures: the three cases plotted as pairs of straight lines in the x1–x2 plane: intersecting, parallel, and coincident lines.)
System of Linear Equations: Introduction
• We assume that a solution exists, and [A] is an n x n
non-singular matrix
• How sensitive is the solution to small changes in [A]
and/or {b}? (idea of a condition number)
• Small change in {b} : Already discussed Vector Norm
• Small change in [A] : A Matrix Norm is needed
    Vector Norm                              Matrix Norm
    ‖x‖ ≥ 0 (0 only for null vector)         ‖A‖ ≥ 0 (0 only for null matrix)
    ‖αx‖ = |α| ‖x‖                           ‖αA‖ = |α| ‖A‖
    ‖x1 + x2‖ ≤ ‖x1‖ + ‖x2‖                  ‖A + B‖ ≤ ‖A‖ + ‖B‖
                                             ‖AB‖ ≤ ‖A‖ ‖B‖
    For a consistent (compatible) norm:      ‖Ax‖ ≤ ‖A‖ ‖x‖
Matrix Norm
• Earlier we had defined Lp norm of a vector
      ‖x‖_p = ( |x1|^p + |x2|^p + |x3|^p + … + |xn|^p )^{1/p},   p ≥ 1
which could be Euclidean distance (p=2), total
distance (p=1), largest “axis distance” (p=∞) etc.
• For a matrix, we could define a norm based on what
happens when the matrix is multiplied with a unit
vector (known as the induced or subordinate norm).
Some other norms are element-wise norms, e.g.
square-root of sum of squares of all elements
(Frobenius norm).
• The norm will be large if the multiplication leads to a
large magnitude vector.
• We define the p-norm of a matrix as
      ‖A‖_p = max_{‖x‖_p = 1} ‖Ax‖_p = max_{x ≠ 0} ‖Ax‖_p / ‖x‖_p
Matrix Norm – The 1-norm

• The 1-norm of the matrix is written as  ‖A‖_1 = max_{‖x‖_1 = 1} ‖Ax‖_1
• We could write Ax as a weighted sum of the columns of A:
      Ax = x1 (column 1) + x2 (column 2) + … + xn (column n)
• For ‖x‖_1 = |x1| + |x2| + … + |xn| = 1, what is the maximum L1 norm of Ax?
• If we think of it as a weighted sum of column L1 norms, the
  maximum will occur when the |x| corresponding to the
  column with maximum L1 norm is 1 and all others are 0.
• Also known as the column-sum norm:
      ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|
Matrix Norm – The ∞-norm

• The ∞-norm of the matrix is written as  ‖A‖_∞ = max_{‖x‖_∞ = 1} ‖Ax‖_∞
• Write
      Ax = { a11 x1 + a12 x2 + … + a1n xn,
             a21 x1 + a22 x2 + … + a2n xn,
             …,
             an1 x1 + an2 x2 + … + ann xn }ᵀ
• For ‖x‖_∞ = max(|x1|, |x2|, …, |xn|) = 1, what is the maximum L∞ norm of Ax?
• This will occur when all xj are 1, with appropriate signs such that
  the row-sum is of maximum magnitude.
• Also known as the row-sum norm:
      ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|
Matrix Norm – The 2-norm

• The 2-norm of the matrix is written as  ‖A‖_2 = max_{‖x‖_2 = 1} ‖Ax‖_2
• If we transform a unit vector, {x}, to another vector, {b}, by
  multiplying with matrix [A], what is the maximum "length" of {b}?
• This may be posed as a constrained optimization problem:
  Maximize {x}ᵀ[A]ᵀ[A]{x} subject to {x}ᵀ{x} = 1
• Use of the Lagrange multiplier method results in
      ∇_x [ xᵀAᵀAx − λ(xᵀx − 1) ] = 0   ⇒   AᵀAx = λx
• Which leads to  ‖A‖_2 = √(xᵀAᵀAx) = √λ, with λ the largest eigenvalue of AᵀA
• Also known as the spectral norm
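A quick check (NumPy assumed) of the three induced norms for the 2×2 matrix used in the condition-number example below:

import numpy as np

A = np.array([[1.0, 1.0],
              [0.99, 1.01]])

print(np.linalg.norm(A, 1))        # column-sum norm: 2.01
print(np.linalg.norm(A, np.inf))   # row-sum norm: 2.0
print(np.linalg.norm(A, 2))        # spectral norm: ~2.0
print(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))  # sqrt of largest eigenvalue of A^T A, same value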
System of Linear Equations: Condition Number
• We now come back to the question: how sensitive is
the solution to small changes in [A] and/or {b}?
• Look at the worst-case scenario: upper bound of error
• Effect of change in {b}:
      A(x + δx) = b + δb  ⇒  δx = A⁻¹ δb  ⇒  ‖δx‖ ≤ ‖A⁻¹‖ ‖δb‖

      ‖δx‖/‖x‖ ≤ ‖A⁻¹‖ ‖δb‖ / ‖x‖ = ‖A⁻¹‖ ‖δb‖ ‖b‖ / (‖x‖ ‖b‖) ≤ ‖A‖ ‖A⁻¹‖ ‖δb‖/‖b‖
      (using ‖b‖ ≤ ‖A‖ ‖x‖)

• Effect of change in [A]:
      (A + δA)(x + δx) = b  ⇒  δx = −A⁻¹ δA (x + δx)  ⇒  ‖δx‖ ≤ ‖A⁻¹‖ ‖δA‖ ‖x + δx‖

      ‖δx‖/‖x + δx‖ ≤ ‖A⁻¹‖ ‖δA‖ = ‖A‖ ‖A⁻¹‖ ‖δA‖/‖A‖

• The Condition Number of matrix A, C(A), is, therefore, ‖A‖ ‖A⁻¹‖
Condition Number: Example
    x1 + x2 = 2.00
    0.99 x1 + 1.01 x2 = 2.00

    A = [ 1     1    ]       AᵀA = [ 1.9801  1.9999 ]
        [ 0.99  1.01 ]             [ 1.9999  2.0201 ]

    A⁻¹ = 50 [ 1.01  −1 ]    (A⁻¹)ᵀA⁻¹ = 2500 [ 2.0002  −2 ]
             [ −0.99  1 ]                     [ −2       2 ]

• The matrix norms are (how to find the 2-norm is described later):
    ‖A‖_1 = 2.01      ‖A‖_∞ = 2          ‖A‖_2 = 2
    ‖A⁻¹‖_1 = 100     ‖A⁻¹‖_∞ = 100.5    ‖A⁻¹‖_2 = 100
Condition Number: Example
    x1 + x2 = 2.00
    0.99 x1 + 1.01 x2 = 2.00        Solution: 1, 1

    x1 + x2 = 1.98
    0.99 x1 + 1.01 x2 = 2.02        Solution: −1.01, 2.99

    x1 + x2 = 2.00
    1.00 x1 + 1.01 x2 = 2.00        Solution: 2, 0
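A short check (NumPy assumed) of the sensitivity that this condition number (about 200 in the 1-, 2- and ∞-norms) predicts:

import numpy as np

A = np.array([[1.0, 1.0], [0.99, 1.01]])
b = np.array([2.00, 2.00])

print(np.linalg.cond(A, 1), np.linalg.cond(A, 2), np.linalg.cond(A, np.inf))
# ~201, ~200, ~201: small relative changes in b can be amplified ~200x in x

print(np.linalg.solve(A, b))                         # [1, 1]
print(np.linalg.solve(A, np.array([1.98, 2.02])))    # [-1.01, 2.99]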
Previous Lecture: Nonlinear Equations

• Bairstow Method: Algorithm and Example (divide the polynomial by a quadratic factor)
• Linear Simultaneous Equations: Introduction (whether a solution exists? if yes, how sensitive is it?)
• Matrix Norms (how to find the "size" of a matrix?)

Today:
• Matrix Norms
• Eigenvalue
• Condition Number
• Direct and Iterative Methods
Matrix Norms: Review

    ‖A‖ ≥ 0 (0 only for null matrix)      ‖A‖_p = max_{‖x‖_p = 1} ‖Ax‖_p = max_{x ≠ 0} ‖Ax‖_p / ‖x‖_p
    ‖αA‖ = |α| ‖A‖
    ‖A + B‖ ≤ ‖A‖ + ‖B‖                   ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|
    ‖AB‖ ≤ ‖A‖ ‖B‖
    ‖Ax‖ ≤ ‖A‖ ‖x‖                        ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|
Matrix Norm – The 2-norm

• The 2-norm of the matrix is written as  ‖A‖_2 = max_{‖x‖_2 = 1} ‖Ax‖_2
• If we transform a unit vector, {x}, to another vector, {b}, by
  multiplying with matrix [A], what is the maximum "length" of {b}?
• We could view [A] as operating on vector {x} to generate another vector {b}
• It will, in general, lead to a rotation as well as stretching
  (or shortening) of the vector {x}
• E.g., consider
      [A] = [ 1.25  0.75 ]
            [ 0.75  1.25 ]

(Figure: the unit circle of vectors {x} and its image after multiplication by [A].)
Eigenvalue

• If Ax = λx, there is no rotation. λ is called an Eigenvalue of [A]
  and {x} is the corresponding Eigenvector
• For symmetric matrices, the maximum stretching and/or shortening
  will occur along its Eigenvectors, all of which are mutually orthogonal
• For others, it will occur along some other direction (we will see
  later that it is an eigenvector of AᵀA).
• How to find these will be considered after we look at the methods
  of solving the linear system.

• E.g., for the non-symmetric matrix
      [A] = [ 1.25  0.75 ]                [A]ᵀ[A] = [ 1.5725  1.0625 ]
            [ 0.1   1.25 ]                          [ 1.0625  2.125  ]

(Figures: the unit circle transformed by this [A], and by [A]ᵀ[A].)
Matrix Norm – The 2-norm

• The 2-norm of the matrix is written as  ‖A‖_2 = max_{‖x‖_2 = 1} ‖Ax‖_2
• If we transform a unit vector, {x}, to another vector, {b}, by
  multiplying with matrix [A], what is the maximum "length" of {b}?
• This may be posed as a constrained optimization problem:
  Maximize {x}ᵀ[A]ᵀ[A]{x} subject to {x}ᵀ{x} = 1
• Use the Lagrange multiplier method:
  To maximize f(x1, x2) subject to g(x1, x2) = 0, a necessary condition is
      ∇[ f(x1, x2) − λ g(x1, x2) ] = 0
  For example, finding the maximum value of x1 + x2 on the unit circle:
      ∇[ x1 + x2 − λ(x1² + x2² − 1) ] = 0  ⇒  1 − 2λx1 = 0;  1 − 2λx2 = 0  ⇒  x1 = x2 = 1/√2
• Applied here:
      ∇_x [ xᵀAᵀAx − λ(xᵀx − 1) ] = 0   ⇒   AᵀAx = λx
• Which leads to  ‖A‖_2 = √(xᵀAᵀAx) = √λ
• Also known as the spectral norm
Methods of Solution of Systems of Linear Equations
• Given Ax = b , how to find {x} for known A and b?
• Easiest: when A is a diagonal matrix
• A little more difficult, but still easy, if A is a triangular
matrix (either upper triangular or lower triangular)
• Otherwise, we may perform some operations on the
system to reduce A to one of these forms, and then
solve the system directly (Direct methods). These
will arrive at the solution in a finite number of steps.
• The other option is to start with a guess value of the
solution and then use an iterative scheme to improve
these (Iterative methods). Number of steps depends
on the convergence properties of the system and the
desired accuracy. May not even Converge!
Direct Methods: Gauss Jordan Method
• Reduce A to a diagonal matrix (generally Identity matrix)
• Take the first equation and divide it by a11 to make the
diagonal element unity. Then, express x1 as a function of
  the other (n−1) variables:
      x1 = (b1 − a12 x2 − a13 x3 − … − a1n xn) / a11
• Using this, eliminate x1 from all other equations. This will
  change the coefficients of these equations. E.g., the second equation:
      a21 (b1 − a12 x2 − … − a1n xn)/a11 + a22 x2 + a23 x3 + … + a2n xn = b2
      ⇒ (a22 − a21 a12/a11) x2 + … + (a2n − a21 a1n/a11) xn = b2 − a21 b1/a11
Direct Methods: Gauss Jordan Method
• Written compactly: R′1 = R1/a11;  R′i = Ri − ai1 × R′1,  i = 2 to n
• After this step, the first column of [A] is the same as that
  of an n x n Identity matrix [I]
• Now, take the second equation and divide it by a′22 to
  make the diagonal element unity. Then, express x2 as a
  function of the other (n−2) variables:
      x2 = (b′2 − a′23 x3 − a′24 x4 − … − a′2n xn) / a′22
• Using this, eliminate x2 from all other equations,
  including the first one. E.g.
      b″3 = b′3 − a′32 b′2/a′22;  a″33 = a′33 − a′32 a′23/a′22;  …;  a″3n = a′3n − a′32 a′2n/a′22
• Compactly: R″2 = R′2/a′22;  R″i = R′i − a′i2 × R″2,  i = 1, 3 to n
Direct Methods: Gauss Jordan Method
• After this step, first two columns are same as [I]. Also
note that the row-modifications in this step need to be
done only for columns 3 to n.
• Similarly, for the third equation, computations needed
for only columns 4 to n.
• Repeat till the last equation and the modified {b} vector
is the solution!
• Note that any Pivot element aii should not become 0.
Otherwise, the order of the equations needs to be
changed.
• Changing the order of equations to bring a non-zero
element at the pivot position is called pivoting.
Gauss Jordan Method: Example

[ 2  4  1 ]{x} = {13}     [ 1  2  0.5 ]{x} = {6.5}     [ 1  2  0.5 ]{x} = {6.5}
[ 1  2  3 ]      {14}  →  [ 1  2  3   ]      {14}   →  [ 0  0  2.5 ]      {7.5}
[ 3  1  5 ]      {20}     [ 3  1  5   ]      {20}      [ 0 −5  3.5 ]      {0.5}

Pivoting (interchange rows 2 and 3):

[ 1  2  0.5 ]{x} = {6.5}  [ 1  2  0.5 ]{x} = { 6.5}    [ 1  0  1.9 ]{x} = { 6.7}
[ 0 −5  3.5 ]      {0.5} →[ 0  1 −0.7 ]      {−0.1} →  [ 0  1 −0.7 ]      {−0.1}
[ 0  0  2.5 ]      {7.5}  [ 0  0  2.5 ]      { 7.5}    [ 0  0  2.5 ]      { 7.5}

[ 1  0  1.9 ]{x} = { 6.7} [ 1  0  0 ]{x} = {1}
[ 0  1 −0.7 ]      {−0.1}→[ 0  1  0 ]      {2}
[ 0  0  1   ]      {  3 } [ 0  0  1 ]      {3}
Direct Methods: Gauss Elimination Method
• Reduce A to upper-triangular matrix (forward elimination)
• Solve the triangular system (back substitution)
• Using the first equation, express x1 as a function of the other (n−1) variables:
      x1 = (b1 − a12 x2 − a13 x3 − … − a1n xn) / a11
• Eliminate x1 from all other equations. This will change the
  coefficients of these equations. E.g., the second equation:
      a21 (b1 − a12 x2 − … − a1n xn)/a11 + a22 x2 + a23 x3 + … + a2n xn = b2
      ⇒ (a22 − (a21/a11) a12) x2 + … + (a2n − (a21/a11) a1n) xn = b2 − (a21/a11) b1
Gauss Elimination Method
• Written compactly: R′i = Ri − (ai1/a11) × R1,  i = 2 to n
• After this step, the first column of [A] has a11 on the
  diagonal and zeroes everywhere else
• Now, take the second equation and express x2 as a function
  of the other (n−2) variables:
      x2 = (b′2 − a′23 x3 − a′24 x4 − … − a′2n xn) / a′22
• Using this, eliminate x2 from all other equations,
  excluding the first one. E.g.
      b″3 = b′3 − (a′32/a′22) b′2;  a″33 = a′33 − (a′32/a′22) a′23;  …;  a″3n = a′3n − (a′32/a′22) a′2n
• Compactly: R″i = R′i − (a′i2/a′22) × R′2,  i = 3 to n
Gauss Elimination Method
• After this step, first two columns have all below-diagonal
elements as zero.
• Repeat till the last equation to obtain an upper triangular
matrix and a modified {b} vector.
• The last equation can now be used to obtain xn directly.
• The second-last equation has only two “unknowns,” xn-1
and xn, and xn is already computed. Solve for xn-1.
• Repeat, going backwards to the first equation, to obtain
the complete solution.
• The total flops are
      Σ_{i=1}^{n} [ i + 2i(i − 1) ] ≈ Σ_{i=1}^{n} 2i² ≈ (2/3) n³
• More efficient than Gauss Jordan by about 50%
Gauss Elimination Method: Example

Forward Elimination:

[ 2  4  1 ]{x} = {13}     [ 2  4  1   ]{x} = {13 }     [ 2  4  1   ]{x} = {13 }
[ 1  2  3 ]      {14}  →  [ 0  0  2.5 ]      {7.5}  →  [ 0 −5  3.5 ]      {0.5}
[ 3  1  5 ]      {20}     [ 0 −5  3.5 ]      {0.5}     [ 0  0  2.5 ]      {7.5}
                          (make the pivot non-zero: interchange rows 2 and 3)

Back substitution:

    x3 = 7.5/2.5 = 3;   x2 = (0.5 − 3.5×3)/(−5) = 2;   x1 = (13 − 4×2 − 1×3)/2 = 1
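A compact sketch (Python with NumPy assumed) of Gauss elimination with partial pivoting and back substitution:

import numpy as np

def gauss_solve(A, b):
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))      # partial pivoting: largest magnitude in column k
        if p != k:
            A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]                # multiplier
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):               # back substitution
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[2, 4, 1], [1, 2, 3], [3, 1, 5]])
b = np.array([13, 14, 20])
print(gauss_solve(A, b))                          # [1, 2, 3]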
Gauss Jordan Method: Computational Effort
• The method looks simple, and is easily programmable, then
why should we not use it?
• Consider the number of floating point operations (flops)
• First step requires n division, and, for each of the other n−1
rows, n−1 multiplications, and n−1 subtractions
• Second step requires n−1 division, and, for each of the
other n−1 rows, n−2 multiplications, and n−2 subtractions
• The total flops are, then,
      Σ_{i=1}^{n} [ i + (n − 1)(2i) ] = Σ_{i=1}^{n} (2n − 1) i ≈ n³

• There is a more efficient method, Gauss Elimination, which


requires 2/3 times the computational effort.
Previous Lecture: Linear Equations

• Matrix Norms (how to find the "size" of a matrix?)
• Eigenvalue and Eigenvector (vectors which do not rotate on multiplying by [A])
• Condition Number (how sensitive is {x} to changes in [A] or {b}?)
• Direct and Iterative Methods (Gauss Jordan and Gauss elimination)

Today:
• Reducing round-off errors: Pivoting, scaling, equilibration
• LU decomposition
• Cholesky decomposition
• Banded matrix: Thomas algorithm
Gauss Elimination Method: Pivoting
• First Step: R’i=Ri−(ai1/a11)xR1 i=2 to n
• Second step: R”i=R’i−(a’i2/a’22)xR’2 i= 3 to n; and so on..
• The multiplying factor at each step is equal to the
corresponding element of the row divided by the pivot
element
• If we want round-off errors to be attenuated, this factor
should be small in magnitude.
• Since order of equations is immaterial, we can
interchange rows without affecting the solution
• Partial Pivoting: Scan the elements below the diagonal in
that particular column, find the largest magnitude, and
interchange rows so that the pivot element becomes the
largest.
Pivoting
• Complete Pivoting: Scan the elements in the entire
submatrix, find the largest magnitude, and interchange
rows and columns, so that the pivot element becomes
the largest.

Partial Complete
Pivoting
• Parial Pivoting does not need any additional bookkeeping.
Complete pivoting, due to an exchange of columns, needs
to keep track of this exchange.
• If we exchange columns k and l: after the solution, the
values of xk and xl have to be interchanged.
Examples: Partial Pivoting:

[ 2  4  1 ]{x} = {13}    [ 3  1  5 ]{x} = {20}    [ 3  1     5   ]{x} = { 20 }
[ 1  2  3 ]      {14}  → [ 1  2  3 ]      {14}  → [ 0  5/3   4/3 ]      {22/3}
[ 3  1  5 ]      {20}    [ 2  4  1 ]      {13}    [ 0  10/3 −7/3 ]      {−1/3}

[ 3  1     5   ]{x} = { 20 }    [ 3  1     5   ]{x} = { 20 }    Soln:
[ 0  10/3 −7/3 ]      {−1/3} →  [ 0  10/3 −7/3 ]      {−1/3}    1, 2, 3
[ 0  5/3   4/3 ]      {22/3}    [ 0  0     5/2 ]      {15/2}
Example: Complete (or Full) Pivoting

[ 2  4  1 ]{x1} = {13}    [ 3  1  5 ]{x1} = {20}    [ 5  1  3 ]{x3} = {20}
[ 1  2  3 ]{x2}   {14}  → [ 1  2  3 ]{x2}   {14}  → [ 3  2  1 ]{x2}   {14}
[ 3  1  5 ]{x3}   {20}    [ 2  4  1 ]{x3}   {13}    [ 1  4  2 ]{x1}   {13}

• Note the interchange of the variables x1 and x3. While computing, we
  deal with only [A] and {b}. So we make a note of this exchange and do
  it after the solution is obtained.

[ 5  1    3   ]       {20}    [ 5  1    3   ]       {20}    [ 5  1    3      ]       {  20   }
[ 0  1.4 −0.8 ]{x} =  { 2} →  [ 0  3.8  1.4 ]{x} =  { 9} →  [ 0  3.8  1.4    ]{x} =  {   9   }
[ 0  3.8  1.4 ]       { 9}    [ 0  1.4 −0.8 ]       { 2}    [ 0  0   −25/19  ]       {−25/19 }

Solution: 3, 2, 1.  Interchange: 1, 2, 3
Other considerations
• Round-off errors are typically large if the elements of the [A]
matrix are of very different magnitudes.
• Scaling of variable and equilibration of equation are used to
make the coefficient matrix elements roughly of the same order
• For example, 0.02 4 1  x1  13  may be because x1 is
 0.01 2 3  x  = 14  in cm and others in m
  2   
0.03 1 5  x3  20

• Scaling of x1 to x’1=x1/100 will result in 2 4 1  x1  13 
1 2 3  x2  = 14 ;
3 1 5  x3  20

• Similarly,  2 4 may be because the


1   x1   13 
 1 2 3   x  =  14 ;
 the third equation is in hr
 2   
1 / 20 1 / 60 1 / 12  x3  1 / 3
and others are in minutes
• Equilibrating by multiplying the third equation by 60.
Gauss Elimination Method, LU decomposition
• GE reduces the matrix [A] to upper triangular form
• If there are several RHS vectors, an “augmented” [A]
matrix could be used to obtain the solution for all of
these through the same algorithm
• Sometimes, an RHS vector depends on the solution for
the previous RHS vector.
• The LU decomposition method is then more efficient,
since solution of two triangular systems involves much
less computational effort
[ A] = [ L][U ] ⇒ [ L]{ y} = {b} where, { y} = [U ]{x}
• The effort in decomposition is needed only once (The
assumption, of course, is that [A] is not changing!)
LU decomposition: Alternative ways
    [ a11 a12 …  a1n ]   [ l11  0   …  0  ] [ u11 u12 …  u1n ]
    [ a21 a22 …  a2n ] = [ l21 l22  …  0  ] [ 0   u22 …  u2n ]
    [  .   .  …   .  ]   [  .   .   …  .  ] [ .    .  …   .  ]
    [ an1 an2 …  ann ]   [ ln1 ln2  … lnn ] [ 0    0  …  unn ]

• n² equations, n² + n "unknowns". The way the n "floating"
  values are fixed distinguishes the different algorithms:
• Doolittle: All diagonal elements of [L] are 1
• Crout: All diagonal elements of [U] are 1
• Cholesky (some call it Choleski): Works only for
  symmetric positive definite [A]: n(n+1)/2 eqns. [U] = [L]ᵀ
LU decomposition: Doolittle
    [A] = [L][U] with all diagonal elements of [L] equal to 1:

    [ a11 … a1n ]   [ 1    0  …  0 ] [ u11 u12 …  u1n ]   [ u11      u12            …  u1n                    ]
    [ a21 … a2n ] = [ l21  1  …  0 ] [ 0   u22 …  u2n ] = [ l21 u11  l21 u12 + u22  …  l21 u1n + u2n           ]
    [  .  …   . ]   [  .   .  …  . ] [ .    .  …   .  ]   [  .        .             …   .                      ]
    [ an1 … ann ]   [ ln1 ln2 …  1 ] [ 0    0  …  unn ]   [ ln1 u11  ln1 u12+ln2 u22 … ln1 u1n+ln2 u2n+…+unn    ]

• n² equations, n² "unknowns": (n² − n)/2 in [L] and (n² + n)/2 in [U]
• Sequential computations: u11 = a11, u12 = a12, …, u1n = a1n;
  l21 = a21/u11, l31 = a31/u11, …, ln1 = an1/u11;  u22 = a22 − l21·u12,
  u23 = a23 − l21·u13, …, u2n = a2n − l21·u1n;  l32 = (a32 − l31·u12)/u22, …
LU decomposition: Crout
    [A] = [L][U] with all diagonal elements of [U] equal to 1:

    [ a11 … a1n ]   [ l11  0  …  0  ] [ 1 u12 …  u1n ]   [ l11      l11 u12        …  l11 u1n                 ]
    [ a21 … a2n ] = [ l21 l22 …  0  ] [ 0  1  …  u2n ] = [ l21      l21 u12 + l22  …  l21 u1n + l22 u2n        ]
    [  .  …   . ]   [  .   .  …  .  ] [ .  .  …   .  ]   [  .        .             …   .                       ]
    [ an1 … ann ]   [ ln1 ln2 … lnn ] [ 0  0  …   1  ]   [ ln1      ln1 u12 + ln2  …  ln1 u1n + ln2 u2n+…+lnn   ]

• n² equations, n² "unknowns": (n² + n)/2 in [L] and (n² − n)/2 in [U]
• Sequential computations: l11 = a11, l21 = a21, …, ln1 = an1;
  u12 = a12/l11, u13 = a13/l11, …, u1n = a1n/l11;  l22 = a22 − l21·u12,
  l32 = a32 − l31·u12, …, ln2 = an2 − ln1·u12;  u23 = (a23 − l21·u13)/l22, …
LU decomposition: Cholesky
    For symmetric [A] (aij = aji), positive definite, with [U] = [L]ᵀ:

    [ a11 … a1n ]   [ l11  0  …  0  ] [ l11 l21 …  ln1 ]   [ l11²     l11 l21        …  l11 ln1                ]
    [ a21 … a2n ] = [ l21 l22 …  0  ] [ 0   l22 …  ln2 ] = [ l11 l21  l21² + l22²    …  l21 ln1 + l22 ln2       ]
    [  .  …   . ]   [  .   .  …  .  ] [ .    .  …   .  ]   [  .        .             …   .                      ]
    [ an1 … ann ]   [ ln1 ln2 … lnn ] [ 0    0  …  lnn ]   [ l11 ln1  l21 ln1+l22 ln2 … ln1²+ln2²+…+lnn²         ]

• n(n+1)/2 equations, n(n+1)/2 "unknowns" in [L]
• Sequential computations: l11 = √a11, l21 = a21/l11, …, ln1 = an1/l11;
  l22 = √(a22 − l21²), l32 = (a32 − l31·l21)/l22, …, ln2 = (an2 − ln1·l21)/l22; …
LU decomposition: Examples
 9 3 − 2  x1  10
 3 6 1   x  = 10
  2   
− 2 1 9   x3   8 

Doolittle:
 9 3 − 2  1 0 0 u11 u12 u13   1 0 0 9 3 − 2 
 3 6 1  = l
   21 1 0  0 u22 u23  =  1 / 3 1 0 0 5 5 / 3
− 2 1 9  l31 l32 1  0 0 u33  − 2 / 9 1 / 3 1 0 0 8 

 1 0 0  y1  10 9 3 − 2   x1   10 
 1/ 3   y  = 10 0 5 5 / 3  x  = 20 / 3
 1 0  2      2   
− 2 / 9 1 / 3 1  y3   8  0 0 8   x3   8 
LU decomposition: Examples
 9 3 − 2  x1  10
 3 6 1   x  = 10
  2   
− 2 1 9   x3   8 

Crout:
 9 3 − 2 l11 0 0  1 u12 u13   9 0 0 1 1 / 3 − 2 / 9
 3 6 1  = l
   21 l22 0  0 1 u23  =  3 5 0 0 1 1 / 3 
− 2 1 9  l31 l32 l33  0 0 1  − 2 5 / 3 8 0 0 1 

9 0 0  y1  10 1 1 / 3 − 2 / 9  x1  10 / 9
3   y  = 10 0 1 1 / 3 x  =  4 / 3 
 5 0  2      2   
− 2 5 / 3 8  y3   8  0 0 1   x3   1 
LU decomposition: Examples
 9 3 − 2  x1  10
 3 6 1   x  = 10
  2   
− 2 1 9   x3   8 

Cholesky:
 9 3 − 2 l11 0 0  l11 l21 l31   3 0 0  3 1 − 2 / 3
 3 6 1  = l
   21 l22 0   0 l22 l32  =  1 5 0  0 5 5 / 3
− 2 1 9  l31 l32 l33   0 0 l33  − 2 / 3 5 /3 8  0 0 8 

 3 0 0   y1  10 3 1 − 2 / 3  x1   10 / 3 
0    
 1 5
   
0   y2  = 10  5 5 / 3  x2  = 4 5 / 3

− 2 / 3 5 /3 8   y3   8  0 0 8   x3   8 
Banded and Sparse matrices
Several of the matrices in engineering applications are sparse, i.e.,
have very few non-zero elements. For example, solution of
differential equations:

(Grid stencil: node (i, j) with neighbours (i−1, j), (i+1, j), (i, j−1), (i, j+1))

    ∂T/∂t ≈ (T_{i,j}^{t+Δt} − T_{i,j}^{t}) / Δt;
    ∂²T/∂x² ≈ [ (T_{i+1,j} − T_{i,j})/Δx − (T_{i,j} − T_{i−1,j})/Δx ] / Δx = (T_{i+1,j} − 2T_{i,j} + T_{i−1,j}) / Δx²
If the non-zero elements occur within a “narrow” band around the
diagonal, it is called banded matrix. The fact that most of the
elements are zero can be used to save computational effort.
Banded Matrix
× × . 0 0
× × . . 0

. . . . .
 
0 . . × ×
0 0 . × ×

• Band width: if the main diagonal, the adjacent r row elements,
  and the adjacent c column elements may be non-zero, the
  band-width is r + c + 1. In the GE method, since we know that
  elements beyond the band are already zero, we do not
  need to perform the computations to make them zero.
• Simplest case: Tridiagonal, r=c=1 (e.g., in 1-d finite diff.)
• Thomas Algorithm
Thomas Algorithm
    [ d1  u1                       ]   [ x1 ]   [ b1 ]
    [ l2  d2  u2           0       ]   [ x2 ]   [ b2 ]
    [     l3  d3  u3               ]   [ .  ] = [ .  ]
    [         .   .   .            ]   [ .  ]   [ .  ]
    [  0          ln−1 dn−1  un−1  ]   [ .  ]   [ .  ]
    [                   ln    dn   ]   [ xn ]   [ bn ]

• Apply Gauss elimination. Forward: R′i = Ri − (li/αi−1) × R′i−1
• Use α to represent the modified diagonal term and β to
  represent the modified RHS
• The algorithm is:
      α1 = d1;   αi = di − (li/αi−1) ui−1;   β1 = b1;   βi = bi − (li/αi−1) βi−1
• And the back-substitution:
      xn = βn/αn;   xi = (βi − ui xi+1)/αi
Thomas Algorithm: Example

    [  2 −1  0  0 ] [ x1 ]   [ 0 ]
    [ −1  2 −1  0 ] [ x2 ] = [ 0 ]
    [  0 −1  2 −1 ] [ x3 ]   [ 1 ]
    [  0  0 −1  1 ] [ x4 ]   [ 0 ]

    α1 = d1;  αi = di − (li/αi−1) ui−1;  β1 = b1;  βi = bi − (li/αi−1) βi−1;  xn = βn/αn;  xi = (βi − ui xi+1)/αi

    α1 = 2;  α2 = 2 − (−1)(−1)/2 = 3/2;  α3 = 2 − (−1)(−1)/(3/2) = 4/3;  α4 = 1 − (−1)(−1)/(4/3) = 1/4
    β1 = 0;  β2 = 0 − (−1)(0)/2 = 0;  β3 = 1 − (−1)(0)/(3/2) = 1;  β4 = 0 − (−1)(1)/(4/3) = 3/4

    x4 = (3/4)/(1/4) = 3;  x3 = (1 − (−1)(3))/(4/3) = 3;  x2 = (0 − (−1)(3))/(3/2) = 2;  x1 = (0 − (−1)(2))/2 = 1
Thomas Algorithm (for Tridiagonal)

• No need to store n² + n elements!
• Store only 4n elements in the form of four vectors l, d, u and b
• The ith equation is:  li xi−1 + di xi + ui xi+1 = bi
• Notice: l1 = un = 0
Thomas Algorithm
• Initialize two new vectors α and β as α1 = d1 and β1 = b1
• Take the first two equations and eliminate x1:
      α1 x1 + u1 x2 = β1
      l2 x1 + d2 x2 + u2 x3 = b2
• Resulting equation is:  α2 x2 + u2 x3 = β2,  where
      α2 = d2 − (l2/α1) u1,   β2 = b2 − (l2/α1) β1
• Similarly, we can eliminate x2, x3 ……
Thomas Algorithm
• At the ith step:
      αi−1 xi−1 + ui−1 xi = βi−1
      li xi−1 + di xi + ui xi+1 = bi
• Eliminate xi−1 to obtain:  αi xi + ui xi+1 = βi,  where
      αi = di − (li/αi−1) ui−1,   βi = bi − (li/αi−1) βi−1
• The last two equations are:
      αn−1 xn−1 + un−1 xn = βn−1
      ln xn−1 + dn xn = bn
• Eliminate xn−1 to obtain:  αn xn = βn,  where
      αn = dn − (ln/αn−1) un−1,   βn = bn − (ln/αn−1) βn−1
Thomas Algorithm
• Given: four vectors l, d, u and b
• Generate: two vectors α and β as
      α1 = d1 and β1 = b1
      αi = di − (li/αi−1) ui−1,   βi = bi − (li/αi−1) βi−1,   i = 2, 3, ….. n
• Solution:
      xn = βn/αn;   xi = (βi − ui xi+1)/αi,   i = n−1, …… 3, 2, 1
• FP operations: 8(n−1) + 3(n−1) + 1 = 11n − 10
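A direct sketch (Python assumed) of the Thomas algorithm, reproducing the example above:

def thomas(l, d, u, b):
    # l[0] and u[-1] are unused (l1 = un = 0); all inputs are lists of length n.
    n = len(d)
    alpha, beta = [0.0] * n, [0.0] * n
    alpha[0], beta[0] = d[0], b[0]
    for i in range(1, n):                         # forward sweep
        m = l[i] / alpha[i-1]
        alpha[i] = d[i] - m * u[i-1]
        beta[i] = b[i] - m * beta[i-1]
    x = [0.0] * n
    x[-1] = beta[-1] / alpha[-1]                  # back substitution
    for i in range(n - 2, -1, -1):
        x[i] = (beta[i] - u[i] * x[i+1]) / alpha[i]
    return x

l = [0, -1, -1, -1]
d = [2, 2, 2, 1]
u = [-1, -1, -1, 0]
b = [0, 0, 1, 0]
print(thomas(l, d, u, b))    # [1, 2, 3, 3]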
Iterative Methods
a11x1 + a12x2 + a13x3 • • • • • • • • + a1nxn = b1
a21x1 + a22x2 + a23x3 • • • • • • • • + a2nxn = b2
a31x1 + a32x2 + a33x3 • • • • • • • • + a3nxn = b3
• • • • •
ai1x1 + ai2x2 + ai3x3 • • • • • • • • + ainxn = bi
• • • • •
an1x1 + an2x2 + an3x3 • • • • • • • • + annxn = bn

• Assume (initialize) a solution vector x


• Compute a new solution vector xnew
• Iterate until ║ x- xnew ║∞ ≤ ε
• We will learn two methods: Jacobi and Gauss Seidel
Jacobi and Gauss Seidel
• Jacobi: for the iteration index k (k = 0 for the initial guess)
      xi(k+1) = [ bi − Σ_{j=1, j≠i}^{n} aij xj(k) ] / aii,   i = 1, 2, ⋅⋅⋅⋅ n

• Gauss Seidel: for the iteration index k (k = 0 for the initial guess)
      xi(k+1) = [ bi − Σ_{j=1}^{i−1} aij xj(k+1) − Σ_{j=i+1}^{n} aij xj(k) ] / aii,   i = 1, 2, ⋅⋅⋅⋅ n
Stopping Criteria

• Generate the error vector (e) at each iteration as
      ei(k+1) = [ xi(k+1) − xi(k) ] / xi(k+1),   i = 1, 2, ⋅⋅⋅⋅ n
• Stop when: ‖e‖∞ ≤ ε

Let us see an example.
Iterative Methods (Example)

Solve the following system of equations using the Jacobi and Gauss
Seidel methods, with zero initial guess for all the variables and
error less than 0.01%. Compare the number of iterations required
for solution using the two methods:

    x1 + 2 x2 − x4 = 1
    x2 + 2 x3 = 1.5
    −x3 + 2 x4 = 1.5
    x1 + 2 x3 − x4 = 2

Jacobi:
    x1(k+1) = [1 − 2 x2(k) + x4(k)] / 1          x2(k+1) = [1.5 − 2 x3(k)] / 1
    x3(k+1) = [1.5 − 2 x4(k)] / (−1)             x4(k+1) = [2 − x1(k) − 2 x3(k)] / (−1)

Gauss Seidel:
    x1(k+1) = [1 − 2 x2(k) + x4(k)] / 1          x2(k+1) = [1.5 − 2 x3(k)] / 1
    x3(k+1) = [1.5 − 2 x4(k)] / (−1)             x4(k+1) = [2 − x1(k+1) − 2 x3(k+1)] / (−1)
Iterative methods (Example)
If you iterate, both the methods will diverge
        Jacobi                                    Gauss Seidel
Iter  x1    x2    x3    x4    ‖e‖(%)      Iter  x1    x2    x3    x4    ‖e‖(%)
0 0 0 0 0 0 0 0 0 0
1 1 1.5 -1.5 -2 1 1 1.5 -1.5 -4
2 -4 4.5 -5.5 -4 90.9 2 -6 4.5 -9.5 -27 85.2
3 -12 12.5 -9.5 -17 76.5 3 -35 20.5 -55.5 -148 81.8
4 -41 20.5 -35.5 -33 70.7 4 -188 112.5 -297.5 -785 81.1
5 -73 72.5 -67.5 -114 71.1 5 -1009 596.5 -1571.5 -4154 81.1
6 -258 136.5 -229.5 -210 71.7 6 -5346 3144.5 -8309.5 -21967 81.1
7 -482 460.5 -421.5 -719 70.8 7 -28255 16620.5 -43935.5 -116128 81.1
8 -1639 844.5 -1439.5 -1327 70.6 8 -149368 87872.5 -232258 -613885 81.1
9 -3015 2880.5 -2655.5 -4520 70.6 9 -789629 464516.5 -1227772 -3245174 81.1
10 -10280 5312.5 -9041.5 -8328 70.7 10 -4174206 2455545 -6490350 -1.7E+07 81.1
11 -18952 18084.5 -16657.5 -28365 70.6 11 -2.2E+07 12980701 -3.4E+07 -9.1E+07 81.1
12 -64533 33316.5 -56731.5 -52269 70.6 12 -1.2E+08 68619633 -1.8E+08 -4.8E+08 81.1

Is the problem ill conditioned? The answer is NO!


Iterative methods (Example)

Original Problem:
    x1 + 2 x2 − x4 = 1               A = [ 1  2  0 −1 ]      b = [ 1  ]
    x2 + 2 x3 = 1.5                      [ 0  1  2  0 ]          [ 1.5]
    −x3 + 2 x4 = 1.5                     [ 0  0 −1  2 ]          [ 1.5]
    x1 + 2 x3 − x4 = 2                   [ 1  0  2 −1 ]          [ 2  ]

Pivoting: columns 2 to 1, 3 to 2, 4 to 3 and 1 to 4.
This is equivalent to a change of variables:
    x1 (new) = x2 (original)         After Pivoting:
    x2 (new) = x3 (original)         A = [ 2  0 −1  1 ]      b = [ 1  ]
    x3 (new) = x4 (original)             [ 1  2  0  0 ]          [ 1.5]
    x4 (new) = x1 (original)             [ 0 −1  2  0 ]          [ 1.5]
                                         [ 0  2 −1  1 ]          [ 2  ]
Iterative Methods (Example)
New Iteration Equations after A= 2 0 -1 1 b= 1
pivoting (variable identifiers 1 2 0 0 1.5
in the subscript are for the
new renamed variables):
0 -1 2 0 1.5
0 2 -1 1 2

Jacobi:
  $x_1^{(k+1)} = \dfrac{1 + x_3^{(k)} - x_4^{(k)}}{2}$;   $x_2^{(k+1)} = \dfrac{1.5 - x_1^{(k)}}{2}$;   $x_3^{(k+1)} = \dfrac{1.5 + x_2^{(k)}}{2}$;   $x_4^{(k+1)} = \dfrac{2 - 2x_2^{(k)} + x_3^{(k)}}{1}$

Gauss Seidel:
  $x_1^{(k+1)} = \dfrac{1 + x_3^{(k)} - x_4^{(k)}}{2}$;   $x_2^{(k+1)} = \dfrac{1.5 - x_1^{(k+1)}}{2}$;   $x_3^{(k+1)} = \dfrac{1.5 + x_2^{(k+1)}}{2}$;   $x_4^{(k+1)} = \dfrac{2 - 2x_2^{(k+1)} + x_3^{(k+1)}}{1}$
Solution: Jacobi
Iter x1 x2 x3 x4 ||e|| 32 0.148 0.656 1.089 1.721 3.529 65 0.168 0.667 1.083 1.751 0.174
0 0.000 0.000 0.000 0.000 33 0.184 0.676 1.078 1.777 3.131 66 0.166 0.666 1.084 1.749 0.159
1 0.500 0.750 0.750 2.000 34 0.151 0.658 1.088 1.726 2.940 67 0.167 0.667 1.083 1.751 0.145
2 -0.125 0.500 1.125 1.250 60.000 35 0.181 0.675 1.079 1.772 2.612 68 0.166 0.666 1.084 1.749 0.133
3 0.438 0.813 1.000 2.125 41.176 36 0.153 0.659 1.087 1.730 2.449 69 0.167 0.667 1.083 1.751 0.121
4 -0.063 0.531 1.156 1.375 54.545 37 0.179 0.673 1.080 1.768 2.186 70 0.166 0.666 1.084 1.749 0.111
5 0.391 0.781 1.016 2.094 34.328 38 0.156 0.661 1.087 1.733 2.035 71 0.167 0.667 1.083 1.751 0.101
6 -0.039 0.555 1.141 1.453 44.086 39 0.177 0.672 1.080 1.765 1.827 72 0.166 0.666 1.083 1.749 0.093
7 0.344 0.770 1.027 2.031 28.462 40 0.157 0.662 1.086 1.736 1.697 73 0.167 0.667 1.083 1.751 0.084
8 -0.002 0.578 1.135 1.488 36.483 41 0.175 0.671 1.081 1.763 1.525 74 0.166 0.666 1.083 1.749 0.077
9 0.323 0.751 1.039 1.979 24.778 42 0.159 0.662 1.086 1.738 1.413 75 0.167 0.667 1.083 1.751 0.070
10 0.030 0.588 1.125 1.537 28.717 43 0.174 0.671 1.081 1.761 1.275 76 0.166 0.666 1.083 1.749 0.064
11 0.294 0.735 1.044 1.949 21.123 44 0.160 0.663 1.085 1.740 1.177 77 0.167 0.667 1.083 1.750 0.059
12 0.048 0.603 1.117 1.574 23.771 45 0.173 0.670 1.082 1.759 1.064 78 0.166 0.667 1.083 1.750 0.054
13 0.271 0.726 1.051 1.912 17.637 46 0.161 0.664 1.085 1.742 0.982 79 0.167 0.667 1.083 1.750 0.049
14 0.070 0.614 1.113 1.599 19.537 47 0.172 0.669 1.082 1.757 0.888 80 0.166 0.667 1.083 1.750 0.045
15 0.257 0.715 1.057 1.885 15.143 48 0.162 0.664 1.085 1.743 0.818 81 0.167 0.667 1.083 1.750 0.041
16 0.086 0.622 1.108 1.627 15.827 49 0.171 0.669 1.082 1.756 0.742 82 0.166 0.667 1.083 1.750 0.037
17 0.240 0.707 1.061 1.864 12.734 50 0.163 0.665 1.084 1.744 0.682 83 0.167 0.667 1.083 1.750 0.034
18 0.098 0.630 1.103 1.647 13.200 51 0.170 0.669 1.082 1.755 0.619 84 0.166 0.667 1.083 1.750 0.031
19 0.228 0.701 1.065 1.844 10.664 52 0.164 0.665 1.084 1.745 0.569 85 0.167 0.667 1.083 1.750 0.028
20 0.111 0.636 1.100 1.663 10.858 53 0.169 0.668 1.082 1.754 0.516 86 0.167 0.667 1.083 1.750 0.026
21 0.219 0.695 1.068 1.829 9.054 54 0.164 0.665 1.084 1.746 0.474 87 0.167 0.667 1.083 1.750 0.024
22 0.120 0.641 1.097 1.679 8.940 55 0.169 0.668 1.083 1.754 0.431 88 0.167 0.667 1.083 1.750 0.022
23 0.209 0.690 1.070 1.816 7.568 56 0.165 0.665 1.084 1.747 0.395 89 0.167 0.667 1.083 1.750 0.020
24 0.127 0.645 1.095 1.690 7.458 57 0.169 0.668 1.083 1.753 0.360 90 0.167 0.667 1.083 1.750 0.018
25 0.203 0.686 1.073 1.804 6.344 58 0.165 0.666 1.084 1.747 0.330 91 0.167 0.667 1.083 1.750 0.017
26 0.134 0.649 1.093 1.700 6.157 59 0.168 0.668 1.083 1.753 0.300 92 0.167 0.667 1.083 1.750 0.015
27 0.197 0.683 1.074 1.796 5.344 60 0.165 0.666 1.084 1.748 0.275 93 0.167 0.667 1.083 1.750 0.014
28 0.139 0.652 1.091 1.708 5.110 61 0.168 0.667 1.083 1.752 0.250 94 0.167 0.667 1.083 1.750 0.013
29 0.192 0.680 1.076 1.788 4.458 62 0.165 0.666 1.084 1.748 0.229 95 0.167 0.667 1.083 1.750 0.011
30 0.144 0.654 1.090 1.715 4.260 63 0.168 0.667 1.083 1.752 0.209 96 0.167 0.667 1.083 1.750 0.010
31 0.188 0.678 1.077 1.782 3.736 64 0.166 0.666 1.084 1.748 0.191 97 0.167 0.667 1.083 1.750 0.010

Number of iterations required to achieve a relative error of < 0.01% = 97


Solution: Gauss Seidel
Iter   x1      x2      x3      x4      ||e|| (%)
0      0.000   0.000   0.000   0.000
1      0.500   0.500   1.000   2.000
2      0.000   0.750   1.125   1.625   30.769
3      0.250   0.625   1.063   1.813   13.793
4      0.125   0.688   1.094   1.719    7.273
5      0.188   0.656   1.078   1.766    3.540
6      0.156   0.672   1.086   1.742    1.794
7      0.172   0.664   1.082   1.754    0.891
8      0.164   0.668   1.084   1.748    0.447
9      0.168   0.666   1.083   1.751    0.223
10     0.166   0.667   1.083   1.750    0.112
11     0.167   0.667   1.083   1.750    0.056
12     0.167   0.667   1.083   1.750    0.028
13     0.167   0.667   1.083   1.750    0.014
14     0.167   0.667   1.083   1.750    0.007

Number of iterations required to achieve a relative error of < 0.01% = 14

So, what makes the methods diverge? When do we need pivoting, scaling, or equilibration for the iterative methods? Let us analyze the convergence criteria.
Iterative Methods

$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ a_{31} & a_{32} & \cdots & a_{3n}\\ \vdots & & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_n \end{bmatrix} =
\begin{bmatrix} b_1\\ b_2\\ b_3\\ \vdots\\ b_n \end{bmatrix}$

How do the iteration schemes look in matrix form?
Iterative Methods
A = L + D + U, with L strictly lower triangular, D diagonal, and U strictly upper triangular:

$\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} 0 & & & \\ a_{21} & 0 & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & 0 \end{bmatrix}}_{L} +
\underbrace{\begin{bmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{bmatrix}}_{D} +
\underbrace{\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n}\\ & 0 & \cdots & a_{2n}\\ & & \ddots & \vdots\\ & & & 0 \end{bmatrix}}_{U}$
Iterative Methods
• A= L+ D + U
• Ax = b translates to (L + D + U)x = b
• Jacobi: for an iteration counter k
Dx ( k +1) = −(U + L) x ( k ) + b
x ( k +1) = − D −1 (U + L) x ( k ) + D −1b

• Gauss Seidel: for an iteration counter k


( L + D) x ( k +1) = −Ux ( k ) + b
x ( k +1) = −( L + D) −1Ux ( k ) + ( L + D) −1 b
Iterative Methods: Convergence
• All iterative methods: x ( k +1) = Sx ( k ) + c
• Jacobi: S = − D −1 (U + L) c = D −1b

• Gauss Seidel: S = −( L + D) −1U c = ( L + D) −1 b

• For true solution vector (x): x = Sx + c


• True error: e(k) = x - x(k)
• e(k+1) = Se(k) or e(k) = Ske(0)
• Methods will converge if: $\lim_{k\to\infty} e^{(k)} = 0$, i.e., $\lim_{k\to\infty} S^k = 0$
Iterative Methods: Convergence
• For the solution to exist, the matrix should have full
rank (= n)
• The iteration matrix S will have n eigenvalues $\{\lambda_j\}_{j=1}^{n}$ and n independent eigenvectors $\{v_j\}_{j=1}^{n}$ that form a basis for the n-dimensional vector space
• Initial error vector: $e^{(0)} = \sum_{j=1}^{n} C_j v_j$
• From the definition of eigenvalues: $e^{(k)} = \sum_{j=1}^{n} C_j \lambda_j^k v_j$
• Necessary condition: ρ(S) < 1
• Sufficient condition: $\lVert S \rVert < 1$, because $\rho(A) \le \lVert A \rVert$
Iterative Methods: Convergence

Using the definition of S and using row-sum norm for


matrix S, we obtain the following as the sufficient
condition for convergence for both Jacobi and Gauss
Seidel:
$|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|, \qquad i = 1, 2, \dots, n$

If the original matrix is diagonally dominant, it will


always converge!
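A quick check of both conditions, assuming NumPy: strict diagonal dominance (the sufficient condition) and the spectral radius of the Jacobi iteration matrix S (the necessary condition), applied to the two orderings of the earlier 4×4 example.

```python
import numpy as np

def is_diagonally_dominant(A):
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d
    return bool(np.all(d > off))

def spectral_radius_jacobi(A):
    D = np.diag(np.diag(A))
    LU = A - D                        # L + U
    S = -np.linalg.solve(D, LU)       # S = -D^{-1}(L + U)
    return max(abs(np.linalg.eigvals(S)))

# Original (diverging) ordering vs. the pivoted ordering of the example
A_orig = np.array([[1., 2, 0, -1], [0, 1, 2, 0], [0, 0, -1, 2], [1, 0, 2, -1]])
A_piv = np.array([[2., 0, -1, 1], [1, 2, 0, 0], [0, -1, 2, 0], [0, 2, -1, 1]])
for A in (A_orig, A_piv):
    print(is_diagonally_dominant(A), spectral_radius_jacobi(A))
```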
Previous Lecture: Linear Equations

• Iterative methods: Jacobi and Gauss-Seidel

• Convergence criterion: diagonal dominance

Today:
• Eigenvalues
• Next Topic: System of nonlinear equations
Eigenvalues and Eigenvectors
• The system [A]{x}={b}: [A] operating on vector {x} to transform it
to another vector, {b}.
• It will, in general, lead to a change in “direction” as well as the
“length” of the vector {x}. Example, for a unit vector {x}:
$[A] = \begin{bmatrix} 1.25 & 0.75\\ 0.75 & 1.25 \end{bmatrix}, \qquad
\{x\} = \begin{bmatrix} \cos\theta\\ \sin\theta \end{bmatrix}, \qquad
\{b\} = [A]\{x\} = \begin{bmatrix} 1.25\cos\theta + 0.75\sin\theta\\ 0.75\cos\theta + 1.25\sin\theta \end{bmatrix}$

(Figure: the unit vector {x} and the transformed vector {b} = [A]{x}.)

• For a particular vector, {x}, if [A]{x}=λ{x}, there is no rotation.


• λ is called an Eigenvalue of [A] and {x} is the corresponding
Eigenvector (it will only give a direction of the eigenvector, the
magnitude is arbitrary). λ denotes the change in “length” of {x}
Eigenvalues: Some properties
• Ax=λx => A2x=A Ax=A λx = λ2x. Eigenvalue of Ak will be λk
• Ax=λx => A−1x= (1/λ) x. Eigenvalue of A−1 will be 1/λ
• Since (A- λI)x=0, I being the n x n identity matrix,
determinant of (A- λI) must be zero
• Det (A- λI) is a polynomial of degree n: “Characteristic
Polynomial” of A (German word Eigen means inherent,
characteristic). The n eigenvalues may be obtained by using
methods described earlier, e.g., Bairstow
• Symmetric matrices: the maximum length of Ax is along one of the eigenvectors [the 2-norm of A is equal to its eigenvalue of maximum magnitude, also known as the spectral radius of A, ρ(A), since the set of eigenvalues is called the spectrum of the matrix; ρ(A) ≤ ‖A‖ for all consistent norms]
• On the other hand, the 2-norm of A⁻¹ will be 1/|λ|min
Eigenvalues: Some properties
• For symmetric matrices, therefore, if we use the 2-norm, the condition number of A, C(A) = ‖A‖·‖A⁻¹‖, is equal to the ratio |λ|max/|λ|min (indicating that it is always ≥ 1; the lower the better!)
• For a general matrix, the condition number is equal to the square root of the ratio λmax/λmin, where the eigenvalues are those of the symmetric matrix AᵀA.
• It means that finding the largest and smallest eigenvalues
of a matrix has great practical significance.
• We first look at the methods of finding these and then look
at the methods for finding ALL eigenvalues.
Eigenvalues: Finding the largest
• If we assume that A has n independent eigenvectors, xi,
i=1,…,n; any vector, say z(0), may be written as z(0) = c1 x1 + c2
x2 +…+ cn xn
• Then, Az(0) = c1 λ1x1 + c2 λ2x2 +…+ cn λnxn
• And, Akz(0) = c1 λ1k x1 + c2 λ2k x2 +…+ cn λnk xn
• We now assume that A has a single dominant eigenvalue,
say, λ1 (|λ1| > |λ2|≥ |λ3|≥ … |λn-1|≥ |λn|)
• As k becomes large, the resulting vector tends to the
dominant eigenvector. However, its length tends to become
infinite (|λ1|>1) or zero (|λ1|<1). Therefore, we normalize
the length at each step to make it of unit norm (typically L2,
but L1 or L∞ may be used, L∞ being the most convenient)
Largest Eigenvalue : Power method
• The algorithm is written as:
 Choose an arbitrary unit vector z(0)
 Multiply z(0) by A and normalize to get z(1)
 Repeat till $z^{(i+1)}$ and $z^{(i)}$ are the same  ($z^{(i+1)} = A z^{(i)} / \lVert A z^{(i)} \rVert$)
 $z^{(i)}$ is the eigenvector and the normalization factor is the corresponding eigenvalue
 2 1  1
• Example: . Choose starting vector as  
1 2  0
 

• Iterations: 2; 1 2; 1 5; 1 5; 1 14; 1 14;


1  5 1  5 4  41 4 41 13 365 13

1  41 1  41 1 122 1 122 1 365


 ;  ;  ;  ;..;  
365 40 3281 40 3281 121 29525 121 265721 364
Power Method
1
• Converging towards the Eigenvector  
1
265721
• The eigenvalue is approximately =3
29525

• If we use the L∞ norm:
$\begin{bmatrix}2\\1\end{bmatrix};\;
\begin{bmatrix}1\\0.5\end{bmatrix};\;
\begin{bmatrix}2.5\\3\end{bmatrix};\;
\begin{bmatrix}5/6\\1\end{bmatrix};\;
\begin{bmatrix}8/3\\17/6\end{bmatrix};\;
\begin{bmatrix}16/17\\1\end{bmatrix};\;
\begin{bmatrix}49/17\\50/17\end{bmatrix};\;
\begin{bmatrix}148/17\\149/17\end{bmatrix};\;
\begin{bmatrix}148/149\\1\end{bmatrix};\;
\begin{bmatrix}445/149\\446/149\end{bmatrix};\;
\begin{bmatrix}445/446\\1\end{bmatrix}$
• Again, the same eigenvector, with eigenvalue = 446/149, nearly 3
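A compact Python sketch of the power method as described above (names are mine; L2 normalization is used, as in the first set of iterations).

```python
import numpy as np

def power_method(A, z0, tol=1e-10, max_iter=1000):
    """Largest-magnitude eigenvalue by repeated multiplication and normalization."""
    z = z0 / np.linalg.norm(z0)
    for _ in range(max_iter):
        w = A @ z
        lam = np.linalg.norm(w)          # normalization factor -> |lambda|
        z_new = w / lam
        if np.linalg.norm(z_new - z) < tol:
            return lam, z_new
        z = z_new
    return lam, z

A = np.array([[2., 1.], [1., 2.]])
lam, v = power_method(A, np.array([1., 0.]))
print(lam, v)    # approaches 3 and a multiple of [1, 1]
```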
Smallest Eigenvalue – Inverse power method
• Inverse of A has eigenvalues which are reciprocal of those
of A. Power method to get the largest eigenvalue of A−1 will
give us the smallest eigenvalue of A, provided it is unique.
 2 1   2 / 3 − 1 / 3 1
• Example: Inverse of  is
 
; starting vector 
 1 2 
 − 1 / 3 2 / 3  0
• Iterations:
 2 / 3   1   5 / 6   1   14 / 15   41 / 45 
 ;  ;  ;  ;  ;  
− 1 / 3 − 1 / 2 − 2 / 3 − 4 / 5 − 13 / 15 − 40 / 45
1
• Converging towards −1 with eigenvalue 1. Smallest
 
eigenvalue for A is, therefore, 1/1=1.
• Since computing the inverse is time-consuming, it is more efficient to write $z^{(i+1)} = A^{-1}z^{(i)} / \lVert A^{-1}z^{(i)} \rVert$ as $A\,\tilde{z}^{(i+1)} = z^{(i)}$ followed by normalization. The system is efficiently solved using LU decomposition!
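A sketch of the inverse power method that reuses a single LU factorization, assuming SciPy is available; names are my own.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power(A, z0, tol=1e-10, max_iter=1000):
    """Smallest-magnitude eigenvalue of A: power method on A^{-1},
    implemented by solving A z_new = z with a reusable LU factorization."""
    lu_piv = lu_factor(A)            # factor once, reuse every iteration
    z = z0 / np.linalg.norm(z0)
    for _ in range(max_iter):
        w = lu_solve(lu_piv, z)      # w = A^{-1} z
        mu = np.linalg.norm(w)       # -> largest |eigenvalue| of A^{-1}
        z_new = w / mu
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return 1.0 / mu, z_new           # smallest |eigenvalue| of A

A = np.array([[2., 1.], [1., 2.]])
print(inverse_power(A, np.array([1., 0.])))   # eigenvalue 1, eigenvector ~ [1, -1]
```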
Eigenvalue “closest to θ“ – Inverse power with shift
• The eigenvalues of A−θI are (λ−θ). Applying inverse power
method to this matrix gives us the smallest eigenvalue of
A−θI, which implies that by adding θ to it, we get the
eigenvalue of A which is closest to θ.
• Example: Find the eigenvalue of 2 1 closest to 2.5
1 2 
 
− 1 / 2 1  2 / 3 4 / 3 1
• A−θI =   and the inverse is   . Use  
 1 − 1 / 2 4 / 3 2 / 3 0

• Iterations: 2 / 3; 1 / 2; 5 / 3;  1 ; 26 / 15; 13 / 14;  41 / 21


       
 4 / 3 1
  4 / 3 4 / 5  28 / 15  1  40 / 21
• Converges to (1,1), eigenvalue is about 2. Smallest
eigenvalue of A−θI is ½=0.5. Closest to 2.5 is, therefore, 3.
Eigenvalues: Finding All
• Directly from the Characteristic Equation: How to efficiently
obtain the Characteristic Polynomial – Fadeev-Leverrier
• Using similarity transformation: Reduce to diagonal or
triangular form – QR decomposition
• The characteristic equation, det(A − λI) = 0, may be written as
  $(-1)^n\left(\lambda^n - a_{n-1}\lambda^{n-1} - a_{n-2}\lambda^{n-2} - \dots - a_1\lambda - a_0\right) = 0$
• It may be seen that $a_{n-1} = \sum_{i=1,\dots,n} a_{ii}$, i.e., trace(A)
• Faddeev-Le Verrier came up with an algorithm for obtaining all the coefficients of the polynomial.
Characteristic Polynomial: Fadeev-Leverrier method
• Set An-1=A and, as seen, an-1=trace(An-1)
• For i=n-2, n-3, …,1,0:
• Ai=A(Ai+1-ai+1 I); ai=trace(Ai)/(n-i)
• Solve using any of the methods discussed earlier to get all
the eigenvalues.
3 4 1 3 4 1 − 4 − 2 − 1 
• Example: 3 5 1 ;A2=3 5 1 a2=9;A1= − 1 − 6 0  a1=-7
 
2 2 1 2 2 1 − 4 2 − 4
1 0 0
0 1 0 
A0=   a0=1 => Characteristic polynomial is
0 0 1
(-1) (λ - 9 λ +7 λ-1): Roots 8.16, 0.66, 0.19
3 2
Finding All Eigenvalues : Similarity Transform
• If A=S-1BS, the eigenvalues of B will be same as those of A
• S-1BS x = λx => By= λy, where y=Sx => det (B-λI)=0
• A and B are said to be similar
• If we could perform the similarity transformation in such a
way that B is diagonal or triangular, the diagonal elements
will give us the eigenvalues!
• One of the methods is the QR method, in which Q is an
orthogonal matrix and R is upper triangular. Since Q is
orthogonal, its inverse is equal to its transpose.
• An iterative method is followed to achieve the
transformation.
Finding All Eigenvalues : QR method
• The iterative sequence is written as:
• A0=A
• For k=0,1,2,….till convergence
• Ak=QkRk : Perform a QR decomposition of A
• Ak+1=QkT Ak Qk = Rk Qk
System of nonlinear equations
• Example:  $x_1^2 + x_2^2 = 2^2 \;\Rightarrow\; f_1(x_1,x_2) = x_1^2 + x_2^2 - 4 = 0$
            $x_1^2 - x_2^4 = 1 \;\Rightarrow\; f_2(x_1,x_2) = x_1^2 - x_2^4 - 1 = 0$
• Plot the functions
• Bracketing does not work
• Solution: $x_1 \approx \pm 1.64$, $x_2 \approx \pm 1.14$
(Figure: the curves f1 = 0 and f2 = 0 in the x1–x2 plane, intersecting at the four solutions.)
System of nonlinear equations: Fixed Point
• Given n equations,
f1 ( x1 , x2 ,..., xn ) = 0; f 2 ( x1 , x2 ,..., xn ) = 0;...; f n ( x1 , x2 ,..., xn ) = 0

• Similar to the single equation method, we write:
  $x_1 = \phi_1(x_1, x_2, \dots, x_n)$
  $x_2 = \phi_2(x_1, x_2, \dots, x_n)$
  ⋮
  $x_n = \phi_n(x_1, x_2, \dots, x_n)$
• Iterations: $\{x^{(i+1)}\} = \{\phi(x^{(i)})\}$, in which {x} and {φ} are vectors
Fixed Point Method: Example
$x^x + y^y = 11.72; \qquad x^y + y^x = 6.71$

$x = \left(6.71 - y^x\right)^{1/y} \;\Rightarrow\; \phi_1(x,y) = \left(6.71 - y^x\right)^{1/y}$
$y = \left(11.72 - x^x\right)^{1/y} \;\Rightarrow\; \phi_2(x,y) = \left(11.72 - x^x\right)^{1/y}$

• Iterations:
  $x^{(i+1)} = \left(6.71 - y^x\right)^{1/y}\Big|_{(x^{(i)},\,y^{(i)})}$
  $y^{(i+1)} = \left(11.72 - x^x\right)^{1/y}\Big|_{(x^{(i+1)},\,y^{(i)})}$

• We could use a Jacobi style scheme, but Seidel is


preferred – update values as these are being computed
• Computations, starting with x=y=2
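A minimal Python sketch of this Seidel-style sweep (variable names are mine); it reproduces the iterates tabulated below.

```python
def phi1(x, y):                # x = (6.71 - y**x)**(1/y)
    return (6.71 - y**x) ** (1.0 / y)

def phi2(x, y):                # y = (11.72 - x**x)**(1/y)
    return (11.72 - x**x) ** (1.0 / y)

x, y = 2.0, 2.0
for i in range(1, 32):
    x_new = phi1(x, y)         # uses the old (x, y)
    y_new = phi2(x_new, y)     # Seidel style: uses the freshly updated x
    err = max(abs((x_new - x) / x_new), abs((y_new - y) / y_new)) * 100
    x, y = x_new, y_new
print(x, y, err)               # tends toward x = 1.5, y = 2.5
```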
i x y Φ1(x,y) Φ2(x,y) E1a(%) E2a(%) max(E1a,E2a)

0 2 2 1.646208 3.073785

1 1.646208 3.073785 0.716856 2.177338 -21.4913 34.93364 34.933643

2 0.716856 2.177338 2.087117 2.456267 -129.643 -41.1717 129.6428016

3 2.087117 2.456267 0.503584 2.655634 65.65331 11.35583 65.65330821

4 0.503584 2.655634 1.843432 2.251655 -314.453 7.507304 314.452503

5 1.843432 2.251655 1.432151 2.786309 72.68224 -17.9414 72.68223984

10 1.739954 2.445766 1.319329 2.592682 34.87487 -0.94789 34.87487135

15 1.605152 2.520952 1.391191 2.506223 12.05857 3.04661 12.05857347

20 1.521199 2.525046 1.46397 2.486363 1.01676 2.296381 2.296380772

25 1.492373 2.512254 1.497092 2.489679 -1.89271 0.927234 1.892713526

31 1.505144 2.49584 1.49943 2.504048 0.953095 -0.29328 0.953095181


Fixed Point Method: Convergence
$e^{(i+1)} = \xi - x^{(i+1)} = \phi(\xi) - \phi(x^{(i)})$
• ξ is also a vector

$e_1^{(i+1)} = \frac{\partial \phi_1}{\partial x_1}\, e_1^{(i)} + \frac{\partial \phi_1}{\partial x_2}\, e_2^{(i)} + \dots + \frac{\partial \phi_1}{\partial x_n}\, e_n^{(i)}$

• If  $\left|\frac{\partial \phi_j}{\partial x_1}\right| + \left|\frac{\partial \phi_j}{\partial x_2}\right| + \dots + \left|\frac{\partial \phi_j}{\partial x_n}\right| < 1 \quad \forall\, j$ from 1 to n,
  convergence is guaranteed
System of nonlinear equations: Newton method
• Given n equations,
f1 ( x1 , x2 ,..., xn ) = 0; f 2 ( x1 , x2 ,..., xn ) = 0;...; f n ( x1 , x2 ,..., xn ) = 0
• Similar to the single equation method, we write:
  $f_1\!\left(x^{(k+1)}\right) \approx f_1\!\left(x^{(k)}\right) + \sum_{j=1}^{n} \left.\frac{\partial f_1}{\partial x_j}\right|_{x^{(k)}} \left(x_j^{(k+1)} - x_j^{(k)}\right) = 0$

• Iterations: $J\!\left(x^{(k)}\right)\Delta x^{(k+1)} = -f\!\left(x^{(k)}\right)$
  J is called the Jacobian matrix, given by $J_{i,j} = \dfrac{\partial f_i(x)}{\partial x_j}$
  $x^{(k+1)} = x^{(k)} + \Delta x^{(k+1)} = x^{(k)} - J^{-1}\!\left(x^{(k)}\right) f\!\left(x^{(k)}\right)$
  (if J is easily invertible; else solve the linear system $J\,\Delta x = -f$)


Newton Method: Example
$x^x + y^y = 11.72; \qquad x^y + y^x = 6.71$
$f_1(x,y) = x^x + y^y - 11.72 = 0; \qquad f_2(x,y) = x^y + y^x - 6.71 = 0$
$f_{1,x}' = x^x(1+\ln x); \quad f_{1,y}' = y^y(1+\ln y); \quad f_{2,x}' = y\,x^{y-1} + y^x \ln y; \quad f_{2,y}' = x^y \ln x + x\,y^{x-1}$

• Iterations:
  $\begin{bmatrix} x^x(1+\ln x) & y^y(1+\ln y)\\ y\,x^{y-1} + y^x\ln y & x^y\ln x + x\,y^{x-1} \end{bmatrix}_{(x^{(k)},\,y^{(k)})}
   \begin{bmatrix} \Delta x\\ \Delta y \end{bmatrix} =
   -\begin{bmatrix} x^x + y^y - 11.72\\ x^y + y^x - 6.71 \end{bmatrix}_{(x^{(k)},\,y^{(k)})}$

• Computations, starting with x=1, y=2


• (Will not converge, if we start with 2,2. Fixed-point would
not have converged if we started with 1,2!)
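A short NumPy sketch of Newton's method for this example, solving J Δx = −f at each step rather than inverting J (names are mine).

```python
import numpy as np

def F(v):
    x, y = v
    return np.array([x**x + y**y - 11.72,
                     x**y + y**x - 6.71])

def J(v):
    x, y = v
    return np.array([[x**x * (1 + np.log(x)), y**y * (1 + np.log(y))],
                     [y * x**(y - 1) + y**x * np.log(y),
                      x**y * np.log(x) + x * y**(x - 1)]])

v = np.array([1.0, 2.0])                 # the starting point used above
for k in range(6):
    dv = np.linalg.solve(J(v), -F(v))    # solve J dx = -f
    v = v + dv
print(v)                                  # converges to about (1.5, 2.5)
```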
i x1 x2 f1(x1,x2) f2(x1,x2) f1x1` f1x2` f2x1` f2x2` J(x1,x2) J-1(x1,x2) ∆x

0 1.0000 2.0000 -6.7200 -3.7100 1.0000 6.7726 3.3863 1.0000 1.0000 6.7726 -0.0456 0.3088 0.8392

3.3863 1.0000 0.1544 -0.0456 0.8683

1 1.8392 2.8683 11.8882 5.9762 4.9354 42.1865 16.2721 7.9513 4.9354 42.1865 -0.0123 0.0652 -0.2435

16.2721 7.9513 0.0251 -0.0076 -0.2533

2 1.5957 2.6150 2.7388 1.3201 3.0928 24.2235 10.0186 4.4150 3.0928 24.2235 -0.0193 0.1058 -0.0868

10.0186 4.4150 0.0437 -0.0135 -0.1020

3 1.5089 2.5130 0.2726 0.1180 2.6254 19.4694 8.3839 3.5681 2.6254 19.4694 -0.0232 0.1265 -0.0086

8.3839 3.5681 0.0545 -0.0171 -0.0128

4 1.5002 2.5002 0.0036 0.0012 2.5832 18.9449 8.2181 3.4910 2.5832 18.9449 -0.0238 0.1292 -0.0001

8.2181 3.4910 0.0560 -0.0176 -0.0002


Previous Lecture: Eigenvalues
• Significance of eigenvalues
• Finding largest eigenvalue: Power method
• Smallest eigenvalue: Inverse Power method
• Eigenvalue closest to θ: Inverse Power method with shift

Today:
• Finding All eigenvalues
• Eigenvectors and multiplicity
Finding All Eigenvalues
• Directly from the Characteristic Equation: How to
efficiently obtain the Characteristic Polynomial –
Faddeev-Le Verrier
• Using similarity transformation: Reduce to diagonal
or triangular form – QR decomposition
• The characteristic equation, det(A − λI) = 0, may be written as
  $(-1)^n\left(\lambda^n - a_{n-1}\lambda^{n-1} - a_{n-2}\lambda^{n-2} - \dots - a_1\lambda - a_0\right) = 0$
• It may be seen that $a_{n-1} = \sum_{i=1,\dots,n} a_{ii}$, i.e., trace(A)
• and $a_0 = (-1)^{n+1}\det(A)$
Finding All Eigenvalues
• Faddeev-Le Verrier came up with an algorithm for obtaining all the coefficients of the polynomial.
• Set An-1=A and, as seen, an-1=trace(An-1)
• For i=n-2, n-3, …,1,0:
Ai=A(Ai+1-ai+1 I); ai=trace(Ai)/(n-i)
• Solve using any of the methods discussed earlier
to get all the eigenvalues.
• Another side-benefit is that we get the inverse of
A as A−1=[ A1-a1I ]/a0
Faddeev-Le Verrier method: Example
• Ai=A(Ai+1-ai+1 I); ai=trace(Ai)/(n-i) A−1=[ A1-a1I ]/a0
3 4 1 3 4 1
3 5 1 3 5 1
• Example: A=   A2=   a2=9;
2 2 1 2 2 1
− 4 − 2 − 1  1 0 0

A1=  − 1 − 6 0  a =-7; A = 0 1 0  a0=1
 1 0
 
− 4 2 − 4 0 0 1
• Characteristic polynomial is
(-1) (λ3- 9 λ2+7 λ-1); Roots: 8.16, 0.66, 0.19
 3 − 2 − 1
• Inverse is  −1 1 0
 
− 4 2 3 
Finding All Eigenvalues : Similarity Transform

• If A=S-1BS, the eigenvalues of B will be same as


those of A
• S-1BS x = λx => By= λy, where y=Sx => det (B-λI)=0
• A and B are said to be similar
(Note: Eigenvectors are NOT same –> x for A and
Sx for B)
• If we could perform the similarity transformation
in such a way that B is diagonal or triangular, the
diagonal elements will give us the eigenvalues!
Finding All Eigenvalues : QR Method
• One of the methods is the QR method, in which
Q is an orthogonal matrix and R is upper
triangular. Since Q is orthogonal, its inverse is
equal to its transpose.
• An iterative method is followed to achieve the
transformation. Assumption: A has n linearly
independent eigenvectors.
• Orthogonal matrix is generated by using the
Gram-Schmidt orthogonalization technique
Orthogonalization: Gram-Schmidt Method
• If there are n linearly independent vectors, say,
xi, we can generate a set of n orthonormal
vectors, say, yi.
• Take the first orthogonal unit vector, y1, in the
direction of any one of these, say, x1.
• To generate the second unit vector, y2, which is
orthogonal to y1, project x2 on y1 and take

  $y_2 = \dfrac{x_2 - \left(x_2^{T} y_1\right) y_1}{\left\lVert x_2 - \left(x_2^{T} y_1\right) y_1 \right\rVert}$

• Example: take $x_1 = \begin{bmatrix} 3\\ 1 \end{bmatrix}$, $x_2 = \begin{bmatrix} 1\\ 3 \end{bmatrix}$

  $y_1 = \dfrac{1}{\sqrt{10}}\begin{bmatrix} 3\\ 1 \end{bmatrix}$, $\quad x_2^{T} y_1 = \dfrac{6}{\sqrt{10}}$,
  $\quad x_2 - \left(x_2^{T} y_1\right) y_1 = \begin{bmatrix} -4/5\\ 12/5 \end{bmatrix}$,
  $\quad y_2 = \dfrac{1}{\sqrt{10}}\begin{bmatrix} -1\\ 3 \end{bmatrix}$

(Figure: the vectors x1, x2 and the orthonormal pair y1, y2.)
Orthogonalization: Gram-Schmidt Method
• It is easy to show that y2 is orthogonal to y1:

$\left(x_2 - \left(x_2^{T} y_1\right) y_1\right)^{T} y_1 = x_2^{T} y_1 - \left(x_2^{T} y_1\right) y_1^{T} y_1 = 0$

• Similar philosophy is used to generate the other


vectors of the orthogonal set.
• The orthogonality may be proved by showing
that if the yi are orthogonal up to i=k, yk+1 is
orthogonal to all yi from i=1 to k. And we have
already shown it to be true for k=2 (in fact, k=1
will work!)
Orthogonalization: Gram-Schmidt Method
• To generate the third unit vector, y3, which is
orthogonal to both y1 and y2,

  $y_3 = \dfrac{x_3 - \left(x_3^{T} y_1\right) y_1 - \left(x_3^{T} y_2\right) y_2}{\left\lVert x_3 - \left(x_3^{T} y_1\right) y_1 - \left(x_3^{T} y_2\right) y_2 \right\rVert}$

• The general equation is
  $y_{k+1} = \dfrac{x_{k+1} - \sum_{i=1}^{k} \left(x_{k+1}^{T} y_i\right) y_i}{\left\lVert x_{k+1} - \sum_{i=1}^{k} \left(x_{k+1}^{T} y_i\right) y_i \right\rVert}$
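A minimal NumPy sketch of classical Gram-Schmidt as described above (names are mine); applied to the two vectors of the example it returns y1 = [3,1]/√10 and y2 = [−1,3]/√10.

```python
import numpy as np

def gram_schmidt(X):
    """Columns of X -> orthonormal columns of Y (classical Gram-Schmidt)."""
    n = X.shape[1]
    Y = np.zeros_like(X, dtype=float)
    for k in range(n):
        v = X[:, k].astype(float)
        for i in range(k):                         # subtract projections on earlier y_i
            v = v - (X[:, k] @ Y[:, i]) * Y[:, i]
        Y[:, k] = v / np.linalg.norm(v)
    return Y

X = np.array([[3., 1.], [1., 3.]])                 # the two vectors of the example
print(gram_schmidt(X))
```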
Finding All Eigenvalues : QR method
• We generate an orthogonal matrix Q
• We know that if A=S-1BS, the eigenvalues of B
will be same as those of A
• Also, if Q is orthogonal, its transpose is its
inverse
• If A=QTBQ for some Q and B, and if B is
diagonal or triangular, we get the eigenvalues
• For example, since a symmetric matrix has
orthogonal eigenvectors, we could construct
Q by using the eigenvectors as its columns
Finding All Eigenvalues : QR method

Q = [{x1} {x2 } . . {xn }]


AQ = [λ1{x1} λ2 {x2 } . . λn {xn }] = QD
where D is a diagonal matrix with eigenvalues
on the diagonal
Consequently, A=QDQT
Since we don’t know the eigenvectors, how to
construct Q to obtain the triangular form of
B? (It is easier to achieve a triangular B than a
diagonal B!)
Finding All Eigenvalues : QR method
• An iterative technique is used
• The iterative sequence is written as:
• A0=A
• For k=0,1,2,….till convergence
Ak=QkRk: Perform a QR decomposition of A
Ak+1=QkT Ak Qk = Rk Qk

y1 , y2 , y3 are vectors of Q after Gram-Schmidt


r11 = y1T x1, r12 = y1T x2, r13 = y1T x3
Finding All Eigenvalues : QR method
• The QR decomposition of A is obtained by using the
Gram-Schmidt orthogonalization with columns of A as
the x vectors. A=QR => R=QTA. When would R be
upper triangular?
2nd col of Q is orthogonal to 1st col of A: r21=0
3rd col of Q is orthogonal to 1st and 2nd columns of
A: r31=r32=0. And so on.
• If we take the first col of Q as a unit vector along the
first column vector of A, Q will be orthogonal
Take the first column of Q as a unit vector in the
same direction as the first column of A:
{q}1={a}1/|{a}1|
Follow the orthogonalization procedure described
earlier, using subsequent columns of A
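A small NumPy sketch of the unshifted QR iteration (names are mine; np.linalg.qr is used in place of the hand Gram-Schmidt decomposition described above).

```python
import numpy as np

def qr_eigenvalues(A, iters=50):
    """Plain (unshifted) QR iteration: A_{k+1} = R_k Q_k.
    The iterates tend to a (nearly) triangular matrix whose diagonal
    holds the eigenvalues."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)     # QR decomposition of the current iterate
        Ak = R @ Q                  # equivalent to Q^T A_k Q
    return np.diag(Ak)

A = np.array([[2., 1, 1], [1, 2, 1], [1, 1, 2]])
print(qr_eigenvalues(A))            # approaches 4, 1, 1 as in the example below
```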
QR method: Example
• Ak=QkRk ; Rk=QkT Ak; Ak+1 = Rk Qk
y1 , y2 , y3 are vectors of Q after Gram-Schmidt
r11 = y1T x1, r12 = y1T x2, r13 = y1T x3
2 1 1 0.81650 - 0.49237 - 0.30151 2.44949 2.04124 2.04124
A = 1 2 1; Q = 0.40825 0.86164 - 0.30151; R =  0 1.35401 0.61546
1 1 2 0.40825 0.12309 0.90453   0 0 1.20605 

3.6667 0.8040 0.4924 0.9685 - 0.2155 - 0.1249 3.7859 1.0619 0.6503


A = RQ = 0.8040 1.2424 0.1485; Q = 0.2124 0.9765 - 0.0377 ; R =  0 1.0414 0.0497
0.4924 0.1485 1.0909   0.1301 0.0099 0.9915   0 0 1.0145 
 3.977 0.2276 0.1319
A = RQ = 0.2276 1.0174 0.0101
0.1319 0.0101 1.0058 
• After 3 iterations A is nearly diagonal
Q 0.9999 -0.0143 -0.0083
0.0143 0.9999 -0.0002
0.0083 0.0000 1.0000

R 3.9991 0.0717 0.0414


0.0000 1.0002 0.0002
0.0000 0.0000 1.0001

A 3.9999 0.0144 0.0083


0.0144 1.0001 0.0000
0.0083 0.0000 1.0000

• Eigenvalues of 4,1, and 1


Finding Eigenvectors
• Once the eigenvalues are obtained, use
[A-λI]{x}={0}
to solve for {x}
• Note that a unique solution does not exist
• It will only give the direction of the vector
• Example: 3 1
A= 
1 3 Eigenvalues: 2 and 4
 For 2, x1+x2=0 => Eigenvector is {1,-1}T
 For 4, -x1+x2=0 => Eigenvector is {1,1}T
Finding Eigenvectors: Multiple eigenvalues
• What if an eigenvalue is repeated? Known as the
algebraic multiplicity. E.g., in the QR method,
eigenvalue “1” had an algebraic multiplicity of 2.
2 1 1 

A = 1 2 1 
1 1 2
• For this value, we get x1+x2+x3=0
• Eigenvectors may be taken as {1,-1,0}T and
{1,0,-1}T . Two linearly independent eigenvectors
for the same eigenvalue: called geometric
multiplicity.
Finding Eigenvectors: Multiple eigenvalues
• Another example: 2 1 
A= 
0 2
• Eigenvalue “2” has algebraic multiplicity of 2
• For this value, we get x2=0
• A single eigenvector {1,0}T : geometric
multiplicity is 1.
• Geometric multiplicity is always ≤ Alg. Mult
• A defective matrix has GM<AM for some λ
Finding eigenvalues for given eigenvectors
• Straightforward: Ax=λx => xTAx= λxTx

• Therefore: λ= xTAx/(xTx)

• Known as Rayleigh’s quotient


Previous Lecture: Eigenvalues
• Finding all eigenvalues: Faddeev-Le Verrier, QR
• Finding eigenvectors

Today:
• Eigenvectors and multiplicity
• Iterative methods for linear equations: Convergence
• System of nonlinear equations
Finding Eigenvectors
• Once the eigenvalues are obtained, use
[A-λI]{x}={0}
to solve for {x}
• Note that a unique solution does not exist
• It will only give the direction of the vector
• Example: 3 1
A= 
1 3 Eigenvalues: 2 and 4
 For 2, x1+x2=0 => Eigenvector is {1,-1}T
 For 4, -x1+x2=0 => Eigenvector is {1,1}T
Finding Eigenvectors: Multiple eigenvalues
• What if an eigenvalue is repeated? Known as the
algebraic multiplicity. E.g., in the QR method,
eigenvalue “1” had an algebraic multiplicity of 2.
2 1 1 

A = 1 2 1  Eigenvalues : 4, 1, 1
1 1 2
• For this value, we get x1+x2+x3=0
• Eigenvectors may be taken as {1,-1,0}T and
{1,0,-1}T . Two linearly independent eigenvectors
for the same eigenvalue: called geometric
multiplicity of “2”.
Finding Eigenvectors: Multiple eigenvalues
• Another example: 2 1 
A= 
0 2
• Eigenvalue “2” has algebraic multiplicity of 2
• For this value, we get x2=0
• A single eigenvector {1,0}T : geometric
multiplicity is 1.
• Geometric multiplicity is always ≤ Alg. Mult
• A defective matrix has GM < AM for some λ (called a defective eigenvalue): it will not have n linearly independent eigenvectors.
Finding eigenvalues for given eigenvectors
• Straightforward: All components are multiplied
by the factor λ. Ratio of the L1, L2, or L∞ norm of
Ax and x could be used.

• Ax=λx => xTAx= λxTx

• Therefore: λ= xTAx/(xTx)

• Known as Rayleigh’s quotient


Iterative methods for linear equations

• What are the conditions of convergence


for the iterative methods?

• Rate of convergence? Can we make


them converge faster?
Iterative Methods
A = L + D + U, with L strictly lower triangular, D diagonal, and U strictly upper triangular:

$\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} 0 & & & \\ a_{21} & 0 & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & 0 \end{bmatrix}}_{L} +
\underbrace{\begin{bmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{bmatrix}}_{D} +
\underbrace{\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n}\\ & 0 & \cdots & a_{2n}\\ & & \ddots & \vdots\\ & & & 0 \end{bmatrix}}_{U}$
Iterative Methods
• A= L+ D + U
• Ax = b translates to (L + D + U)x = b
• Jacobi: for an iteration counter k
Dx ( k +1) = −(U + L) x ( k ) + b
x ( k +1) = − D −1 (U + L) x ( k ) + D −1b

• Gauss Seidel: for an iteration counter k


( L + D) x ( k +1) = −Ux ( k ) + b
x ( k +1) = −( L + D) −1Ux ( k ) + ( L + D) −1 b
Iterative Methods: Convergence
• All iterative methods: x ( k +1) = Sx ( k ) + c
• Jacobi: S = − D −1 (U + L) c = D −1b

• Gauss Seidel: S = −( L + D) −1U c = ( L + D) −1 b

• For true solution vector (ξ): ξ = S ξ + c


• True error: e(k) = ξ − x(k)
• e(k+1) = Se(k) or e(k) = Ske(0)
• Methods will converge if: $\lim_{k\to\infty} e^{(k)} = 0$; i.e., $\lim_{k\to\infty} S^k = 0$
Iterative Methods: Convergence
• For the solution to exist, the matrix should have full
rank (= n)
• The iteration matrix S will have n eigenvalues $\{\lambda_j\}_{j=1}^{n}$ and n independent eigenvectors $\{v_j\}_{j=1}^{n}$
• Initial error vector: $e^{(0)} = \sum_{j=1}^{n} C_j v_j$
• From the definition of eigenvalues: $e^{(k)} = \sum_{j=1}^{n} C_j \lambda_j^k v_j$
• Necessary condition: ρ(S) < 1
• Sufficient condition: $\lVert S \rVert < 1$, because $\rho(A) \le \lVert A \rVert$. Why?
  $Ax = \lambda x \;\Rightarrow\; |\lambda|\,\lVert x \rVert = \lVert Ax \rVert \le \lVert A \rVert\,\lVert x \rVert \;\Rightarrow\; |\lambda| \le \lVert A \rVert$
Jacobi Convergence
$S = -D^{-1}(L+U)$, with entries $s_{ij} = -\dfrac{a_{ij}}{a_{ii}}$ for $i \neq j$ and $s_{ii} = 0$.

If we use the infinity (row-sum) norm:
$\lVert S \rVert_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |s_{ij}| = \max_{1 \le i \le n} \sum_{j=1,\, j \neq i}^{n} \left| \dfrac{a_{ij}}{a_{ii}} \right|$

so $\lVert S \rVert_\infty < 1$ whenever $|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|$, $i = 1, 2, \dots, n$.
Iterative Methods: Convergence

Using the definition of S and using row-sum norm for


matrix S, we obtain the following as the sufficient
condition for convergence for Jacobi
$|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|, \qquad i = 1, 2, \dots, n$

If the original matrix is diagonally dominant, Jacobi


method will always converge!
(Gauss Seidel convergence is a little harder to prove, but
diagonal dominance is sufficient for that also)
Rate of Convergence
For large k:
$\dfrac{\lVert e^{(k)} \rVert}{\lVert e^{(0)} \rVert} \cong [\rho(S)]^k$

Rate of Convergence
The number of iterations k required to decrease the initial error by a factor of 10⁻ᵐ is then given by:
$[\rho(S)]^k = 10^{-m}$, i.e., $k = \dfrac{m}{-\log_{10}\rho(S)}$
Improving Convergence
Denoting $\rho(S) = |\lambda|_{\max}$:
$e^{(k+1)} \cong \lambda_{\max}\, e^{(k)} \quad\text{or}\quad e^{(k+1)} - e^{(k)} \cong \lambda_{\max}\left(e^{(k)} - e^{(k-1)}\right)$

For any iterative method: $x^{(k+1)} = x^{(k)} + d^{(k)}$


Improving Convergence
Recall Gauss Seidel:
$x_i^{(k+1)} = \dfrac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)}}{a_{ii}}, \qquad i = 1, 2, \dots, n$

Rewrite the Gauss-Seidel value as an update added to the previous value, and scale that update by a relaxation factor ω (the standard successive relaxation form):
$x_i^{(k+1)} = (1-\omega)\,x_i^{(k)} + \omega\,\dfrac{b_i - \sum_{j<i} a_{ij} x_j^{(k+1)} - \sum_{j>i} a_{ij} x_j^{(k)}}{a_{ii}}$

Successive Over/Under Relaxation:
  0 < ω < 1 : Under relaxation
  ω = 1 : Gauss Seidel
  1 < ω < 2 : Over relaxation
Non-convergent if ω is outside the range (0, 2).
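A minimal Python sketch of the relaxed sweep (names, the choice of ω and the stopping test are mine).

```python
import numpy as np

def sor(A, b, x0, omega=1.5, tol=1e-8, max_iter=10_000):
    """Successive over-relaxation: blend the Gauss-Seidel update with the
    previous value, x_i <- (1-omega)*x_i + omega*(Gauss-Seidel value)."""
    x = x0.astype(float).copy()
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(len(b)):
            s = A[i, :] @ x - A[i, i] * x[i]
            x_gs = (b[i] - s) / A[i, i]
            x[i] = (1 - omega) * x[i] + omega * x_gs
        if np.max(np.abs(x - x_old)) <= tol:
            return x, k
    return x, max_iter
```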


System of nonlinear equations
• Example:  $x_1^2 + x_2^2 = 2^2 \;\Rightarrow\; f_1(x_1,x_2) = x_1^2 + x_2^2 - 4 = 0$
            $x_1^2 - x_2^4 = 1 \;\Rightarrow\; f_2(x_1,x_2) = x_1^2 - x_2^4 - 1 = 0$
• Plot the functions
• Bracketing does not work
• Solution: $x_1 \approx \pm 1.64$, $x_2 \approx \pm 1.14$
(Figure: the curves f1 = 0 and f2 = 0 in the x1–x2 plane, intersecting at the four solutions.)
System of nonlinear equations: Fixed Point
• Given n equations,
f1 ( x1 , x2 ,..., xn ) = 0; f 2 ( x1 , x2 ,..., xn ) = 0;...; f n ( x1 , x2 ,..., xn ) = 0

• Similar to the single equation method, we write:
  $x_1 = \phi_1(x_1, x_2, \dots, x_n)$
  $x_2 = \phi_2(x_1, x_2, \dots, x_n)$
  ⋮
  $x_n = \phi_n(x_1, x_2, \dots, x_n)$
• Iterations: $\{x^{(i+1)}\} = \{\phi(x^{(i)})\}$, in which {x} and {φ} are vectors
Fixed Point Method: Example
$x^x + y^y = 11.72; \qquad x^y + y^x = 6.71$

$x = \left(6.71 - y^x\right)^{1/y} \;\Rightarrow\; \phi_1(x,y) = \left(6.71 - y^x\right)^{1/y}$
$y = \left(11.72 - x^x\right)^{1/y} \;\Rightarrow\; \phi_2(x,y) = \left(11.72 - x^x\right)^{1/y}$

• Iterations:
  $x^{(i+1)} = \left(6.71 - y^x\right)^{1/y}\Big|_{(x^{(i)},\,y^{(i)})}$
  $y^{(i+1)} = \left(11.72 - x^x\right)^{1/y}\Big|_{(x^{(i+1)},\,y^{(i)})}$

• We could use a Jacobi style scheme, but Seidel is


preferred – update values as these are being computed
• Computations, starting with x=y=2
i x y Φ1(x,y) Φ2(x,y) E1a(%) E2a(%) max(E1a,E2a)

0 2 2 1.646208 3.073785

1 1.646208 3.073785 0.716856 2.177338 -21.4913 34.93364 34.933643

2 0.716856 2.177338 2.087117 2.456267 -129.643 -41.1717 129.6428016

3 2.087117 2.456267 0.503584 2.655634 65.65331 11.35583 65.65330821

4 0.503584 2.655634 1.843432 2.251655 -314.453 7.507304 314.452503

5 1.843432 2.251655 1.432151 2.786309 72.68224 -17.9414 72.68223984

10 1.739954 2.445766 1.319329 2.592682 34.87487 -0.94789 34.87487135

15 1.605152 2.520952 1.391191 2.506223 12.05857 3.04661 12.05857347

20 1.521199 2.525046 1.46397 2.486363 1.01676 2.296381 2.296380772

25 1.492373 2.512254 1.497092 2.489679 -1.89271 0.927234 1.892713526

31 1.505144 2.49584 1.49943 2.504048 0.953095 -0.29328 0.953095181

$\phi_1(x,y) = \left(6.71 - y^x\right)^{1/y}; \qquad \phi_2(x,y) = \left(11.72 - x^x\right)^{1/y}$
Fixed Point Method: Convergence
$e^{(i+1)} = \xi - x^{(i+1)} = \phi(\xi) - \phi(x^{(i)})$
• ξ is also a vector

$e_1^{(i+1)} = \frac{\partial \phi_1}{\partial x_1}\, e_1^{(i)} + \frac{\partial \phi_1}{\partial x_2}\, e_2^{(i)} + \dots + \frac{\partial \phi_1}{\partial x_n}\, e_n^{(i)}$

• If  $\left|\frac{\partial \phi_j}{\partial x_1}\right| + \left|\frac{\partial \phi_j}{\partial x_2}\right| + \dots + \left|\frac{\partial \phi_j}{\partial x_n}\right| < 1 \quad \forall\, j$ from 1 to n,
  convergence is guaranteed
System of nonlinear equations: Newton method
• Given n equations,
f1 ( x1 , x2 ,..., xn ) = 0; f 2 ( x1 , x2 ,..., xn ) = 0;...; f n ( x1 , x2 ,..., xn ) = 0
• Similar to the single equation method, we write:
  $f_1\!\left(x^{(k+1)}\right) \approx f_1\!\left(x^{(k)}\right) + \sum_{j=1}^{n} \left.\frac{\partial f_1}{\partial x_j}\right|_{x^{(k)}} \left(x_j^{(k+1)} - x_j^{(k)}\right) = 0$

• Iterations: $J\!\left(x^{(k)}\right)\Delta x^{(k+1)} = -f\!\left(x^{(k)}\right)$
  J is called the Jacobian matrix, given by $J_{i,j} = \dfrac{\partial f_i(x)}{\partial x_j}$
  $x^{(k+1)} = x^{(k)} + \Delta x^{(k+1)} = x^{(k)} - J^{-1}\!\left(x^{(k)}\right) f\!\left(x^{(k)}\right)$
  (if J is easily invertible; else solve the linear system $J\,\Delta x = -f$)


Newton Method: Example
$x^x + y^y = 11.72; \qquad x^y + y^x = 6.71$
$f_1(x,y) = x^x + y^y - 11.72 = 0; \qquad f_2(x,y) = x^y + y^x - 6.71 = 0$
$f_{1,x}' = x^x(1+\ln x); \quad f_{1,y}' = y^y(1+\ln y); \quad f_{2,x}' = y\,x^{y-1} + y^x \ln y; \quad f_{2,y}' = x^y \ln x + x\,y^{x-1}$

• Iterations:
  $\begin{bmatrix} x^x(1+\ln x) & y^y(1+\ln y)\\ y\,x^{y-1} + y^x\ln y & x^y\ln x + x\,y^{x-1} \end{bmatrix}_{(x^{(k)},\,y^{(k)})}
   \begin{bmatrix} \Delta x\\ \Delta y \end{bmatrix} =
   -\begin{bmatrix} x^x + y^y - 11.72\\ x^y + y^x - 6.71 \end{bmatrix}_{(x^{(k)},\,y^{(k)})}$

• Computations, starting with x=1, y=2


• (Will not converge, if we start with 2,2. Fixed-point would
not have converged if we started with 1,2!)
$\begin{bmatrix} x^x(1+\ln x) & y^y(1+\ln y)\\ y\,x^{y-1} + y^x\ln y & x^y\ln x + x\,y^{x-1} \end{bmatrix}_{(x^{(k)},\,y^{(k)})}
 \begin{bmatrix} \Delta x\\ \Delta y \end{bmatrix} =
 -\begin{bmatrix} x^x + y^y - 11.72\\ x^y + y^x - 6.71 \end{bmatrix}_{(x^{(k)},\,y^{(k)})}$

i x1 x2 f1(x1,x2) f2(x1,x2) f1x1` f1x2` f2x1` f2x2` J(x1,x2) J-1(x1,x2) ∆x

0 1.0000 2.0000 -6.7200 -3.7100 1.0000 6.7726 3.3863 1.0000 1.0000 6.7726 -0.0456 0.3088 0.8392

3.3863 1.0000 0.1544 -0.0456 0.8683

1 1.8392 2.8683 11.8882 5.9762 4.9354 42.1865 16.2721 7.9513 4.9354 42.1865 -0.0123 0.0652 -0.2435

16.2721 7.9513 0.0251 -0.0076 -0.2533

2 1.5957 2.6150 2.7388 1.3201 3.0928 24.2235 10.0186 4.4150 3.0928 24.2235 -0.0193 0.1058 -0.0868

10.0186 4.4150 0.0437 -0.0135 -0.1020

3 1.5089 2.5130 0.2726 0.1180 2.6254 19.4694 8.3839 3.5681 2.6254 19.4694 -0.0232 0.1265 -0.0086

8.3839 3.5681 0.0545 -0.0171 -0.0128

4 1.5002 2.5002 0.0036 0.0012 2.5832 18.9449 8.2181 3.4910 2.5832 18.9449 -0.0238 0.1292 -0.0001

8.2181 3.4910 0.0560 -0.0176 -0.0002


Let $y = Sx$, i.e., $(D+L)\,y = -Ux$. Then, for each row j,
$a_{jj}\, y_j + \sum_{k<j} a_{jk}\, y_k = -\sum_{k>j} a_{jk}\, x_k$

Apply this to the row j for which $|y_j|$ is maximum ($= \lVert y \rVert_\infty$):
$\left| a_{jj} y_j + \sum_{k<j} a_{jk} y_k \right| \ge |a_{jj} y_j| - \sum_{k<j} |a_{jk}|\,|y_k| \ge |a_{jj}|\,|y_j| - \sum_{k<j} |a_{jk}|\,|y_j|$

while the right-hand side satisfies
$\left| \sum_{k>j} a_{jk}\, x_k \right| \le \sum_{k>j} |a_{jk}|\,\lVert x \rVert_\infty$

Therefore
$\left( |a_{jj}| - \sum_{k<j} |a_{jk}| \right)\lVert y \rVert_\infty \le \sum_{k>j} |a_{jk}|\,\lVert x \rVert_\infty$
$\Rightarrow\; \lVert S \rVert_\infty \le \dfrac{\sum_{k>j} |a_{jk}|}{|a_{jj}| - \sum_{k<j} |a_{jk}|}$

When A is diagonally dominant this ratio is less than 1, so the Gauss-Seidel iteration converges.
Previous Lecture: Eigenvalues, Nonlinear system
• Eigenvectors and multiplicity
• Iterative methods for linear equations: Convergence
• System of nonlinear equations

Today:
• Approximation of functions and data
$x^x + y^y = 11.72; \qquad x^y + y^x = 6.71$

$\begin{bmatrix} x^x(1+\ln x) & y^y(1+\ln y)\\ y\,x^{y-1} + y^x\ln y & x^y\ln x + x\,y^{x-1} \end{bmatrix}_{(x^{(k)},\,y^{(k)})}
 \begin{bmatrix} \Delta x\\ \Delta y \end{bmatrix} =
 -\begin{bmatrix} x^x + y^y - 11.72\\ x^y + y^x - 6.71 \end{bmatrix}_{(x^{(k)},\,y^{(k)})}$

i x1 x2 f1(x1,x2) f2(x1,x2) f1x1` f1x2` f2x1` f2x2` J(x1,x2) J-1(x1,x2) ∆x

0 1.0000 2.0000 -6.7200 -3.7100 1.0000 6.7726 3.3863 1.0000 1.0000 6.7726 -0.0456 0.3088 0.8392

3.3863 1.0000 0.1544 -0.0456 0.8683

1 1.8392 2.8683 11.8882 5.9762 4.9354 42.1865 16.2721 7.9513 4.9354 42.1865 -0.0123 0.0652 -0.2435

16.2721 7.9513 0.0251 -0.0076 -0.2533

2 1.5957 2.6150 2.7388 1.3201 3.0928 24.2235 10.0186 4.4150 3.0928 24.2235 -0.0193 0.1058 -0.0868

10.0186 4.4150 0.0437 -0.0135 -0.1020

3 1.5089 2.5130 0.2726 0.1180 2.6254 19.4694 8.3839 3.5681 2.6254 19.4694 -0.0232 0.1265 -0.0086

8.3839 3.5681 0.0545 -0.0171 -0.0128

4 1.5002 2.5002 0.0036 0.0012 2.5832 18.9449 8.2181 3.4910 2.5832 18.9449 -0.0238 0.1292 -0.0001

8.2181 3.4910 0.0560 -0.0176 -0.0002


Need for approximation
• Function given
Approximate by a simpler function
o For example, to integrate it
• Unknown function: only values at some points
Approximate by a function
o Passing through all data points (Interpolation)
o Capturing the general data trend (Regression)
Estimate the derivative or the integral
o Derivative estimation, e.g., velocity from distance
versus time data (Numerical Differentiation)
o Integral estimation, e.g., area under a curve from y
versus x data (Numerical Integration)
Approximation of functions
• Not very common, but simpler than “data” case
• Generally polynomials are used as approximating
functions (or, if periodic, sine/cosine)
• First question: What should be the degree of the
approximating polynomial?
Depends on the desired accuracy and required
computational effort
• Second question: How do we quantify the
“accuracy” or “error”?
• And finally: How do we obtain the “best”
polynomial, i.e., the one with minimum error?
Approximation of functions
• Easiest method: Use Taylor’s series.
Approximate f(x) over the interval (a,b) using
an mth degree polynomial, fm(x)
$f(x) = f(x_0) + (x-x_0) f'(x_0) + \frac{(x-x_0)^2}{2!} f''(x_0) + \dots + \frac{(x-x_0)^m}{m!} f^{[m]}(x_0) + R_m$

x0 is some point in (a,b); the midpoint may be best.
Rm is the remainder, given by
$R_m = \int_{x_0}^{x} \frac{(x-\chi)^m}{m!}\, f^{[m+1]}(\chi)\, d\chi = \frac{(x-x_0)^{m+1}}{(m+1)!}\, f^{[m+1]}(\zeta), \qquad \zeta \in (x_0, x)$
Taylor’s Series: Example
• Approximate f(x)=1+x+x2 over the interval (0,2)
using a linear function, f1(x)
Choose x0=1
Taylor’s series:
$f(x) = f(1) + (x-1) f'(1) + \frac{(x-1)^2}{2!} f''(\zeta), \qquad \zeta \in (1, x)$
The linear approximation is
$f_1(x) = 3 + 3(x-1) = 3x$
And, since the second derivative is constant (= 2), the error at any x is $(x-1)^2$.
(Figure: f(x) = 1 + x + x² and the linear approximation 3x over (0, 2).)
Taylor's series is not a very good fit! Other methods are needed.
Least Squares
• We treat the residual as an error term, Rm=f(x)-
fm(x), and then minimize its “magnitude”
Rm is a function of x.
Magnitude may be taken as the integral over
the domain (a,b)
To accommodate negative error, we square it
• The problem reduces to:
  Minimize $\int_a^b \left( f(x) - f_m(x) \right)^2 dx$
  (Hence the name "Least Squares")
Least Squares: Formulation
• We could write fm(x) in the conventional form as
$f_m(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_m x^m = \sum_{j=0}^{m} c_j x^j$
• However, alternative forms may also be used:
$f_m(x) = \sum_{j=0}^{m} c_j (x-x_0)^j; \qquad f_m(x) = \sum_{j=0}^{m} c_j\, p_j; \qquad f_m(x) = \sum_{j=0}^{m} c_j\, p_{m,j}$

where, x0 is a suitable point [e.g., (a+b)/2], pj is a


polynomial of degree j, and pm,j is a polynomial
of degree m. The aim is to obtain the c’s.
Least Squares: Formulation
• Examples, using a 2nd degree polynomial:
$f_2(x) = c_0 + c_1 x + c_2 x^2$
$f_2(x) = c_0 + c_1 (x-1) + c_2 (x-1)^2$
$f_2(x) = c_0 + c_1 (1+x) + c_2 \left(1 + x + x^2\right)$
$f_2(x) = c_0 \left(1 + x + x^2\right) + c_1 \left(1 + 2x + 3x^2\right) + c_2 \left(1 + 4x + 9x^2\right)$
Least Squares: Formulation
• We use a general form:
  $f_m(x) = \sum_{j=0}^{m} c_j \phi_j(x)$
• φ's are known functions (here, polynomials) and the coefficients are chosen in such a way that the error
  $\int_a^b \left( f(x) - \sum_{j=0}^{m} c_j \phi_j(x) \right)^2 dx$   is minimized.
Least Squares: Formulation
• Using the stationary point theorem, we take the derivative of the error w.r.t. each of the c's and equate it to zero, to get a set of m+1 linear equations:
  $\int_a^b 2\left( f(x) - \sum_{j=0}^{m} c_j \phi_j(x) \right)\left(-\phi_i(x)\right) dx = 0 \quad \text{for } i = 0, 1, 2, \dots, m$
• For a clearer presentation, we drop the (x) from the expressions and write
  $\int_a^b \left( f - \sum_{j=0}^{m} c_j \phi_j \right) \phi_i\, dx = 0 \quad \text{for } i = 0, 1, 2, \dots, m$
Least Squares: Inner product
• Analogous to vectors, for functions:

                  VECTORS                  FUNCTIONS
  Magnitude       Norm: L1, L2, L∞         $\frac{1}{b-a}\int_a^b |f|\,dx$;  $\left(\frac{1}{b-a}\int_a^b f^2\,dx\right)^{1/2}$;  $\max_{(a,b)} |f|$
  Inner product   Dot product: x·y         $\langle f, g \rangle = \int_a^b f\,g\,dx$
  Orthogonality   x·y = 0                  $\langle f, g \rangle = 0$
Least Squares: Normal Equations
• Using the inner product notation:
  $\sum_{j=0}^{m} c_j \langle \phi_j, \phi_i \rangle = \langle f, \phi_i \rangle \quad \text{for } i = 0, 1, 2, \dots, m$
• Or, concisely, $[A]\{c\} = \{b\}$: called the "Normal Equations", in which
  $a_{ij} = \langle \phi_i, \phi_j \rangle; \quad b_i = \langle \phi_i, f \rangle; \quad i, j = 0, 1, 2, \dots, m$
Normal Equations: Example
• Approximate f(x)=1+x+x2 over the interval (0,2)
using a linear function, f1(x)
• Choose the linear function as f1(x)=c0+c1x
=> φ0 ( x) = 1; φ1 ( x) = x
  $a_{00} = \langle \phi_0, \phi_0 \rangle = \int_0^2 1\, dx = 2$
  $a_{01} = a_{10} = \langle \phi_0, \phi_1 \rangle = \int_0^2 x\, dx = 2$
  $a_{11} = \langle \phi_1, \phi_1 \rangle = \int_0^2 x^2\, dx = \frac{8}{3}$
Normal Equations: Example
• and
  $b_0 = \langle \phi_0, f \rangle = \int_0^2 \left(1 + x + x^2\right) dx = \frac{20}{3}$
  $b_1 = \langle \phi_1, f \rangle = \int_0^2 x\left(1 + x + x^2\right) dx = \frac{26}{3}$
• The normal equations are
  $\begin{bmatrix} 2 & 2\\ 2 & 8/3 \end{bmatrix}\begin{bmatrix} c_0\\ c_1 \end{bmatrix} = \begin{bmatrix} 20/3\\ 26/3 \end{bmatrix}$,   Solution: $c_0 = \frac{1}{3},\; c_1 = 3$
• Therefore, $f_1(x) = \frac{1}{3} + 3x$
(Figure: f(x) = 1 + x + x² and the least-squares line 1/3 + 3x over (0, 2).)
Much better than Taylor’s series in overall sense (NOT near 1 !)
Normal Equations: Diagonal form
• If the matrix A becomes diagonal, c is easily
computed. Since we are free to choose the form
of the basis functions (φ’s), use “Orthogonal
polynomials”
b
• Recall: a = φ , φ = φ ( x)φ ( x) dx
ij i j ∫
a
i j

• Using the same example, over the domain (0,2)


• Choose φ0=1 and φ1 , a linear function orthogonal
to it, = d0 +d1 x. 2
• Then: ∫ (d 0 + d1 x )dx = 0 ⇒ d1 = −d 0
0
Normal Equations: Diagonal form
• d0 is arbitrary, let us use 1 => φ1 =1−x
• The normal equations are:
  $\begin{bmatrix} 2 & 0\\ 0 & 2/3 \end{bmatrix}\begin{bmatrix} c_0\\ c_1 \end{bmatrix} = \begin{bmatrix} 20/3\\ -2 \end{bmatrix}$,   Solution: $c_0 = \frac{10}{3},\; c_1 = -3$
• Therefore, $f_1(x) = \frac{10}{3} - 3(1-x) = \frac{1}{3} + 3x$


• Same as before, but much easier to compute
• However, needs effort in finding the φ’s, which
depend on the range, i.e., a and b.
Orthogonal polynomials: Legendre
• If we standardize the domain, the orthogonal
polynomials need to be computed only once

• Recall that there was an arbitrary constant

• If the standard domain of (−1,1) is chosen and


the arbitrary constant is chosen to make φ =1 at
x=1, we get the Legendre Polynomials, Pn(x).

• If the problem specifies the domain (a,b) for the variable x*, the transformation
  $x = \dfrac{x^* - \frac{b+a}{2}}{\frac{b-a}{2}}$
  is used to standardize it.
Legendre polynomials
• P0(x)=1.

• P1(x) should be a linear function orthogonal to


P0(x) and should be equal to 1 at x=1

• Assume P1(x) = d0 +d1 x


1

• Orthogonality: ∫ (d
−1
0 + d1 x )dx = 0 ⇒ d 0 = 0

• Value at x=1 equal to 1 => P1(x) = x


Legendre polynomials
• Similarly, assume P2(x) = d0 + d1 x + d2 x2

• Orthogonality with P0 gives d0 + d2 /3 =0;


with P1 gives d1=0; and value at x=1 gives d0
+ d2 =1

• P2(x) = (−1+3x2)/2

• Similarly: P3(x) = (−3x+5x3)/2;


P4(x) = (3−30x2+35x4)/8 …
Legendre polynomials
• Recursive formula:
  $P_n(x) = \dfrac{2n-1}{n}\, x\, P_{n-1}(x) - \dfrac{n-1}{n}\, P_{n-2}(x)$
• Orthogonality:
  $\langle P_i(x), P_j(x) \rangle = \int_{-1}^{1} P_i(x)\, P_j(x)\, dx = \begin{cases} 0 & i \neq j\\ \dfrac{2}{2i+1} & i = j \end{cases}$
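A small NumPy sketch that generates the coefficient arrays of P0…Pn from the recurrence above (names are mine).

```python
import numpy as np

def legendre_coeffs(n):
    """Coefficient arrays (lowest power first) of P_0..P_n from
    P_n(x) = ((2n-1) x P_{n-1}(x) - (n-1) P_{n-2}(x)) / n."""
    P = [np.array([1.0]), np.array([0.0, 1.0])]        # P0 = 1, P1 = x
    for k in range(2, n + 1):
        xPk1 = np.concatenate(([0.0], P[k - 1]))        # multiply P_{k-1} by x
        Pk2 = np.concatenate((P[k - 2], [0.0, 0.0]))    # pad P_{k-2} to same length
        P.append(((2 * k - 1) * xPk1 - (k - 1) * Pk2) / k)
    return P[: n + 1]

for p in legendre_coeffs(4):
    print(p)    # 1; x; (-1+3x^2)/2; (-3x+5x^3)/2; (3-30x^2+35x^4)/8
```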
Legendre polynomials: Example
Legendre polynomials: General Case
• For a general case, degree m:
  $A = \begin{bmatrix} 2 & & & &\\ & 2/3 & & &\\ & & 2/5 & &\\ & & & \ddots &\\ & & & & 2/(2m+1) \end{bmatrix}, \qquad
   b = \begin{bmatrix} \langle 1, f(x) \rangle\\ \langle x, f(x) \rangle\\ \langle (3x^2-1)/2, f(x) \rangle\\ \vdots\\ \langle P_m(x), f(x) \rangle \end{bmatrix}$
Orthogonal polynomials: Tchebycheff
• Legendre: more error near the end points.
• In this case, the error is 2/3 near 0 & 2, and 1/3 at 1.
• If we shift the line up by 1/6, it will make both errors equal (1/2).
(Figure: f(x) and the least-squares line over (0, 2).)
• Tchebycheff: Instead of minimizing the square of the error, we use weighted errors such that the weights are larger near the end points.
Orthogonal polynomials: Tchebycheff
• Tchebycheff (or Chebyshev): Instead of minimizing the square of the error, use weighted errors such that the weights are larger near the end points.
• The weight is taken as $1/\sqrt{1-x^2}$
• Approximating polynomial: $f_m(x) = \sum_{j=0}^{m} c_j T_j(x)$
• T's are known as Tchebycheff polynomials, and the coefficients are chosen such that the weighted error
  $\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}} \left( f(x) - \sum_{j=0}^{m} c_j T_j(x) \right)^2 dx$   is minimum.
Tchebycheff polynomials
• Tchebycheff polynomials are orthogonal over (−1,1) w.r.t. the weight $1/\sqrt{1-x^2}$ and have a maximum magnitude of unity.
• Therefore, T0(x) = 1, and assuming T1(x) = d0 + d1 x,
  $\int_{-1}^{1} \frac{d_0 + d_1 x}{\sqrt{1-x^2}}\, dx = 0 \;\Rightarrow\; d_0 = 0 \;\Rightarrow\; T_1(x) = x$
• Similarly, for T2(x):
  $\int_{-1}^{1} \frac{d_0 + d_1 x + d_2 x^2}{\sqrt{1-x^2}}\, dx = 0; \qquad \int_{-1}^{1} \frac{x\left(d_0 + d_1 x + d_2 x^2\right)}{\sqrt{1-x^2}}\, dx = 0$
  $\Rightarrow\; d_0 + \frac{d_2}{2} = 0;\; d_1 = 0 \;\Rightarrow\; T_2(x) = -1 + 2x^2$
(Figure: Taylor's series and least-squares approximations of f(x) = 1 + x + x² over (0, 2).)
Orthogonal polynomials: Tchebycheff
• Legendre: more error near the end points.
• In this case, the error is 2/3 near 0 & 2, and 1/3 at 1.
• If we shift the line up by 1/6, it will make both errors equal (1/2).
(Figure: f(x) and the least-squares line over (0, 2).)
• Tchebycheff: If we could assign some "suitable" weights to the error, the "maximum" error could be minimized.
Orthogonal polynomials: Tchebycheff
• Tchebycheff (or Chebyshev): Instead of minimizing the square of the error, use weighted errors such that the weights are larger near the end points.
• The weight is taken as $1/\sqrt{1-x^2}$
• Approximating polynomial: $f_m(x) = \sum_{j=0}^{m} c_j T_j(x)$
• T's are known as Tchebycheff polynomials, and the coefficients are chosen such that the weighted error
  $\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}} \left( f(x) - \sum_{j=0}^{m} c_j T_j(x) \right)^2 dx$   is minimum.
Tchebycheff polynomials
• Tchebycheff polynomials are orthogonal over (−1,1) w.r.t. the weight $1/\sqrt{1-x^2}$ and have a maximum magnitude of unity.
• Therefore, T0(x) = 1, and assuming T1(x) = d0 + d1 x,
  $\int_{-1}^{1} \frac{d_0 + d_1 x}{\sqrt{1-x^2}}\, dx = 0 \;\Rightarrow\; d_0 = 0 \;\Rightarrow\; T_1(x) = x$
• Similarly, for T2(x):
  $\int_{-1}^{1} \frac{d_0 + d_1 x + d_2 x^2}{\sqrt{1-x^2}}\, dx = 0; \qquad \int_{-1}^{1} \frac{x\left(d_0 + d_1 x + d_2 x^2\right)}{\sqrt{1-x^2}}\, dx = 0$
  $\Rightarrow\; d_0 + \frac{d_2}{2} = 0;\; d_1 = 0 \;\Rightarrow\; T_2(x) = -1 + 2x^2$
Tchebycheff polynomials
• General form:
  $T_n(x) = \cos\left(n \cos^{-1} x\right)$
• Recursive formula:
  $T_n(x) = 2x\, T_{n-1}(x) - T_{n-2}(x)$
• Orthogonality:
  $\langle T_i(x), T_j(x) \rangle = \int_{-1}^{1} \frac{T_i(x)\, T_j(x)}{\sqrt{1-x^2}}\, dx = \begin{cases} 0 & i \neq j\\ \pi & i = j = 0\\ \pi/2 & i = j \neq 0 \end{cases}$
P0(x) = 1,  P1(x) = x,  P2(x) = (−1+3x²)/2,  P3(x) = (−3x+5x³)/2,  P4(x) = (3−30x²+35x⁴)/8
(Figure: the Legendre polynomials P0 … P4, plotted as Pn(x) over −1 ≤ x ≤ 1.)
T0(x) = 1,  T1(x) = x,  T2(x) = −1+2x²,  T3(x) = −3x+4x³,  T4(x) = 1−8x²+8x⁴
(Figure: the Tchebycheff polynomials T0 … T4, plotted as Tn(x) over −1 ≤ x ≤ 1.)
Tchebycheff polynomials: Example
• Fit straight line to 1+x*+x*2 over (0,2)
• Normalize: x=x*−1
• f(x)=3+3x+x2
π 0  7π / 2
A=  ;b =  
 0 π / 2 3π / 2 
• Solution: f1(x*)= 7/2 + 3 (x*−1) = ½ + 3 x*
• The maximum error is reduced: known as the
Minimax approximation
7

4
f(x)

0
0 0.5 1 1.5 2

x
Least Squares Method : Recap
• Obtain the best mth degree polynomial fit to the
function f(x) over the interval (a,b)
 e.g., best straight line, m=1, which fits 1+x+x2 ,
i.e., f(x), over the interval (0,2)
b

• Formulation: Minimize ∫ ( f ( x) − f m ( x) ) dx
2

a
2
 e.g., Minimize ∫ (1 + x + x − f1 ( x) ) dx
2 2

0
Least Squares Method : Recap
• General form of approximating polynomial:
m
f m ( x ) = ∑ c jφ j ( x )
j =0
 e.g., f1(x)=c0+c1x
2
b
 m 
• Minimization of ∑
∫a 
 f ( x ) −
j =0
c j φ j ( x ) 


dx

∫ (1 + x + x − (c )
 e.g., 2
+ c1 x ) dx
2
0
0
Least Squares Method : Recap
• Stationary Point theorem
b
 m 
 f − ∑
∫a  j =0 φi dx = 0 for i = 0,1,2,...,m
c j φ j

 e.g.,
b

∫ (1 + x + x )
2
− (c0 + c1 x) dx = 0 w.r.t. c0
a
b

∫ (1 + x + x )
2
− (c0 + c1 x) xdx = 0 w.r.t. c1
a
Least Squares Method : Recap
• Inner product: b
f , g = ∫ f .g dx
a

∑ c φ ,φ
j =0
j j i = f , φi for i = 0,1,2 ,...,m

 e.g., c0 + c1 x,1 = 1 + x + x ,1 2

2
c0 + c1 x, x = 1 + x + x , x
Least Squares Method : Recap
• Normal Equations: [ A]{c} = {b}

aij = φi , φ j ; bi = φi , f ; i, j = 0,1,2,..., m
 e.g.,

2 2
 2 
 ∫ 1.1dx ∫0 1.xdx  c0   ∫0 1.(1 + x + x )dx 
2

0   =  
2 2 2
   c1   
 ∫ x.1dx ∫0 ∫
2
x. xdx  x.(1 + x + x ) dx
0  
0 
Least Squares Method : Recap
2 2  c0  20 / 3

2 8 / 3 c  =  
   1  26 / 3
1
Solution : c0 = ; c1 = 3
3
f1(x)=1/3 + 3 x
Least Squares Method : Recap
• Using orthogonal polynomials φ0 =1 and φ1 =1−x

2 0  c0  20 / 3
0 2 / 3  c  =  − 2 
  1   
10
Solution : c0 = ; c1 = −3
3
f1(x)=10/3 − 3 (1−x) = 1/3 + 3 x
Least Squares Method : Recap
• Using Legendre polynomials φ0 =1 and φ1 = x
with the domain changed to (−1,1) using
*
x − b +2 a *
x= = x −1
b−a
2

2 0  c0  20 / 3 10

0 2 / 3 c  =   ⇒ c0 = 3 ; c1 = 3
  1   2 
f1(x)=10/3 + 3 x = 1/3 + 3 x*
Least Squares Method : Recap
• Using Tchebycheff polynomials φ0 =1 and φ1 = x
with the domain changed to (−1,1) using weight
of 1/√(1−x2)

π 0  7π / 2 7
A=  ;b =   ⇒ c0 = 2 ; c1 = 3
 0 π / 2 3π / 2 

f1(x)=7/2 + 3 x = 1/2 + 3 x*
Approximation of Data
• Data denoted by (xk , f (xk )) k = 0,1,2,..., n
• n+1 data points
• Approximating polynomial: fm(x)
• If m=n, unique polynomial passing through
all the data points - Interpolation
• If m<n, best-fit polynomial capturing the
trend of data – Regression : depends on the
definition of “best-fit”
• If m>n, non-unique polynomial passing
through all the points
(Figure: sample data points f(x) versus x, 0 ≤ x ≤ 3.)
Interpolation
• There is a unique nth degree polynomial
passing through the n+1 data points
(xk , f (xk )) k = 0,1,2,..., n
• Represent it as  $f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$
• As discussed before, the basis functions may be taken in several different forms.
• Conventional form, $\phi_j(x) = x^j$, i.e.,  $f_n(x) = c_0 + c_1 x + \dots + c_n x^n$
Interpolation
• The polynomial must pass through all the n+1
data points: the coefficients are given by
  $\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ 1 & x_2 & x_2^2 & \cdots & x_2^n\\ \vdots & & & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix}
   \begin{bmatrix} c_0\\ c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix} =
   \begin{bmatrix} f(x_0)\\ f(x_1)\\ f(x_2)\\ \vdots\\ f(x_n) \end{bmatrix}
   \;\Rightarrow\; Ac = b$
• Solve by any of the linear equation methods
• The A matrix is called Vandermonde matrix
• Unique solution if all x’s are distinct
• Ill- conditioned for large n: Not recommended
Interpolation: Example
• Find the interpolating polynomial for the
given data points: (0,1), (1,3), (2,7)
• 3 data points => n=2
• Second degree interpolating polynomial
• x0 = 0, f(x0) = 1;  x1 = 1, f(x1) = 3;  x2 = 2, f(x2) = 7
  $\begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} c_0\\ c_1\\ c_2 \end{bmatrix} = \begin{bmatrix} 1\\ 3\\ 7 \end{bmatrix} \;\Rightarrow\; c_0 = 1,\; c_1 = 1,\; c_2 = 1$
• Interpolating polynomial is 1 + x + x²


Interpolation: Lagrange polynomials
  $f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$
• Each of the basis functions, φj, is an nth-degree polynomial whose value is 1 at x = xj and zero at all other data points; it is denoted by Lj(x).
• For example, using the same data points as before (x = 0, 1, 2):
(Figure: the three Lagrange polynomials Lj(x) for x = 0, 1, 2.)
Interpolation: Lagrange polynomials
  $f_n(x) = \sum_{j=0}^{n} c_j L_j(x)$  with  $L_i(x_j) = \begin{cases} 0 & i \neq j\\ 1 & i = j \end{cases}$
• What will be the value of the coefficient, ci ?
• SAME AS f(xi) !
• How to obtain the Li ?
  $L_i(x) = \prod_{j=0,\, j \neq i}^{n} \dfrac{x - x_j}{x_i - x_j}$
Lagrange Polynomial: Example
• Find the interpolating polynomial for the
given data points: (0,1), (1,3), (2,7)
• Second degree interpolating polynomial
  $L_0 = \dfrac{(x-1)(x-2)}{(0-1)(0-2)} = \dfrac{2 - 3x + x^2}{2}$
  $L_1 = \dfrac{(x-0)(x-2)}{(1-0)(1-2)} = 2x - x^2$
  $L_2 = \dfrac{(x-0)(x-1)}{(2-0)(2-1)} = \dfrac{-x + x^2}{2}$
• Interpolating polynomial is  $1 \times L_0 + 3 \times L_1 + 7 \times L_2 = 1 + x + x^2$
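A short Python sketch of Lagrange interpolation as defined above (names are mine), using the data of this example.

```python
def lagrange_value(xs, fs, x):
    """Evaluate the interpolating polynomial at x using the Lagrange basis:
    L_i(x) = prod_{j != i} (x - x_j)/(x_i - x_j), f_n(x) = sum f_i L_i(x)."""
    total = 0.0
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        Li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (x - xj) / (xi - xj)
        total += fi * Li
    return total

# Data of the example: (0,1), (1,3), (2,7); the interpolant is 1 + x + x^2
print(lagrange_value([0, 1, 2], [1, 3, 7], 1.5))   # 1 + 1.5 + 1.5**2 = 4.75
```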
Lagrange Polynomial: Example
• Useful when the grid points are fixed but
function values may be changing
• For example, estimating the temperature at a
point using the measured temperatures at a
few nearby points
• The value of the Lagrange polynomials at the
desired point need to be calculated only once
• Then, we just need to multiply these values
with the corresponding temperatures.
• What if a new measurement is added?
Interpolation: Newton’s divided difference
$$f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$$
• The basis function, φi, is an ith-degree polynomial which is zero at all “previous” points (with φ0 = 1) and, for i > 0,
$$\phi_i(x) = \prod_{j=0}^{i-1} (x - x_j)$$
• For example, using the same data (x = 0, 1, 2):
[Figure: the basis functions φi(x) plotted on 0 ≤ x ≤ 2]
Newton’s divided difference
$$f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$$
$$\phi_0(x) = 1;\quad \phi_1(x) = x - x_0;\quad \phi_2(x) = (x - x_0)(x - x_1);\quad \phi_3(x) = (x - x_0)(x - x_1)(x - x_2)$$
• Applying the equality of function value and the polynomial value at x = x0: c0 = f(x0).
• At x = x1:
$$f(x_1) = c_0 + c_1(x_1 - x_0) \;\Rightarrow\; c_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$
Newton’s divided difference
• At x = x2:
$$f(x_2) = c_0 + c_1(x_2 - x_0) + c_2(x_2 - x_0)(x_2 - x_1)$$
$$\Rightarrow\; c_2 = \frac{f(x_2) - f(x_0) - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)} = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$
Newton’s divided difference
• The divided difference notation:
$$f[x_j, x_i] = \frac{f(x_j) - f(x_i)}{x_j - x_i} = f[x_i, x_j]$$
$$f[x_k, x_j, x_i] = \frac{f[x_k, x_j] - f[x_j, x_i]}{x_k - x_i} = f[x_i, x_j, x_k] = \ldots$$
$$f[x_n, x_{n-1}, \ldots, x_2, x_1, x_0] = \frac{f[x_n, x_{n-1}, \ldots, x_2, x_1] - f[x_{n-1}, \ldots, x_2, x_1, x_0]}{x_n - x_0}$$
• First divided difference, second, …, nth
Newton’s divided difference
• The ith coefficient is then given by the ith
divided difference:
c0 = f ( x0 ); c1 = f [x1 , x0 ]; c2 = f [x2 , x1 , x0 ];...
cn = f [xn , xn −1 ,..., x2 , x1 , x0 ]
• If hand-computed: easier in a tabular form
x    f(x)    f[xi+1, xi]    f[xi+2, xi+1, xi]
0    1
              2
1    3                       1
              4
2    7

c0 = 1; c1 = 2; c2 = 1
f2(x) = 1 + 2(x − 0) + 1(x − 0)(x − 1) = 1 + x + x²
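The tabular computation translates into a short routine that builds the divided-difference table column by column and evaluates the Newton form by nested multiplication; this is an illustrative sketch (function names are arbitrary), checked against the example above:

```python
import numpy as np

def newton_coeffs(x, f):
    """Top edge of the divided-difference table gives c0, c1, ..., cn."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(f, dtype=float).copy()
    n = len(x)
    for order in range(1, n):
        # overwrite in place: c[i] becomes the order-th divided difference ending at x[i]
        for i in range(n - 1, order - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - order])
    return c

def newton_eval(x_nodes, c, x):
    """Evaluate c0 + c1(x-x0) + c2(x-x0)(x-x1) + ... by nested multiplication."""
    result = c[-1]
    for i in range(len(c) - 2, -1, -1):
        result = result * (x - x_nodes[i]) + c[i]
    return result

c = newton_coeffs([0, 1, 2], [1, 3, 7])
print(c)                               # [1. 2. 1.]  ->  1 + 2(x-0) + 1(x-0)(x-1)
print(newton_eval([0, 1, 2], c, 2.5))  # 1 + 2.5 + 2.5^2 = 9.75
```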
Spline Interpolation
• Using piece-wise polynomial interpolation
• Given (xk , f (xk )) k = 0,1,2,..., n
Previous Lecture: Approximation of functions

• Error, magnitude, inner product


• Least squares, normal equations
• Orthogonal polynomials: Legendre

Today:
• Orthogonal polynomials: Tchebycheff
• Interpolation
[Figure: sample data points (x, f(x)) for 0 ≤ x ≤ 3]
Lagrange polynomials
$$f_n(x) = \sum_{j=0}^{n} c_j L_j(x), \qquad L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$
[Figure: the Lagrange polynomials Lj(x) plotted on 0 ≤ x ≤ 2]
Lagrange Polynomial
• Useful when the grid points are fixed but
function values may be changing (estimating
the temperature at a point using the
measured temperatures at nearby points)
• The values of the Lagrange polynomials at the desired point need to be calculated only once
• Then, we just need to multiply these values
with the corresponding temperatures
• What if a new measurement is added?
• The polynomials will need to be recomputed
Interpolation: Newton’s divided difference
$$f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$$
• The basis function, φi, is an ith-degree polynomial which is zero at all “previous” points. φ0 = 1 and, for i > 0,
$$\phi_i(x) = \prod_{j=0}^{i-1} (x - x_j)$$
• For example, using the same data (x = 0, 1, 2):
[Figure: the basis functions φi(x) plotted on 0 ≤ x ≤ 2]
Newton’s divided difference
$$f_n(x) = \sum_{j=0}^{n} c_j \phi_j(x)$$
$$\phi_0(x) = 1;\quad \phi_1(x) = x - x_0;\quad \phi_2(x) = (x - x_0)(x - x_1);\quad \phi_3(x) = (x - x_0)(x - x_1)(x - x_2)$$
• Applying the equality of function value and the polynomial value at x = x0: c0 = f(x0).
• At x = x1:
$$f(x_1) = c_0 + c_1(x_1 - x_0) \;\Rightarrow\; c_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$
Newton’s divided difference
• At x = x2:
$$f(x_2) = c_0 + c_1(x_2 - x_0) + c_2(x_2 - x_0)(x_2 - x_1)$$
$$\Rightarrow\; c_2 = \frac{f(x_2) - f(x_0) - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)} = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$
Newton’s divided difference
• The divided difference notation:
$$f[x_j, x_i] = \frac{f(x_j) - f(x_i)}{x_j - x_i} = f[x_i, x_j]$$
$$f[x_k, x_j, x_i] = \frac{f[x_k, x_j] - f[x_j, x_i]}{x_k - x_i} = f[x_i, x_j, x_k] = \ldots$$
$$f[x_n, x_{n-1}, \ldots, x_2, x_1, x_0] = \frac{f[x_n, x_{n-1}, \ldots, x_2, x_1] - f[x_{n-1}, \ldots, x_2, x_1, x_0]}{x_n - x_0}$$
• First divided difference, second, …, nth
Newton’s divided difference
• The ith coefficient is then given by the ith
divided difference:
c0 = f ( x0 ); c1 = f [x1 , x0 ]; c2 = f [x2 , x1 , x0 ];...
cn = f [xn , xn −1 ,..., x2 , x1 , x0 ]
• If hand-computed: easier in a tabular form
x    f(x)    f[x1, x0]    f[x2, x1, x0]
0    1
              2
1    3                     1
              4
2    7

c0 = 1; c1 = 2; c2 = 1
f2(x) = 1 + 2(x − 0) + 1(x − 0)(x − 1) = 1 + x + x²
Newton’s divided difference: Error
• The remainder may be written as:
Rn ( x) = f ( x) − f n ( x) = φn +1 ( x) f [x, xn , xn −1 ,..., x2 , x1 , x0 ]
where
φn +1 (x ) = (x − x0 )(x − x1 )(x − x2 )...(x − xn )

• Since f(x) is not known, we may approximate


it by using another point (xn+1,f(xn+1)) as
Rn ( x) ≅ f n +1 ( x) − f n ( x) = φn +1 ( x) f [xn +1 , xn , xn −1 ,..., x2 , x1 , x0 ]
Interpolation: Runge phenomenon
[Figure: interpolating polynomials with n = 10 and n = 20 equispaced points on −1 ≤ x ≤ 1, illustrating the Runge phenomenon]
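A small numerical demonstration of this behaviour; the function behind the plots is not named on the slide, so the classic Runge example f(x) = 1/(1 + 25x²) on (−1, 1) is assumed here:

```python
import numpy as np

# Assumed test function (classic Runge example); equispaced interpolation nodes.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)

for n in (10, 20):
    xk = np.linspace(-1.0, 1.0, n + 1)      # n+1 equispaced nodes
    c = np.polyfit(xk, f(xk), n)            # degree-n interpolating polynomial
    xx = np.linspace(-1.0, 1.0, 1001)
    err = np.max(np.abs(np.polyval(c, xx) - f(xx)))
    print(f"n = {n:2d}, max |f - f_n| on [-1, 1] = {err:.3f}")
# The maximum error grows with n near the ends of the interval instead of shrinking.
```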
Spline Interpolation
• Using piece-wise polynomial interpolation
• Given (xk , f (xk )) k = 0,1,2,..., n
• Interpolate using “different” polynomials
between smaller segments
• Easiest: Linear between each successive pair
• Problem: First and higher derivatives would be discontinuous
[Figure: piecewise-linear interpolant of sample data on −1 ≤ x ≤ 1]
Spline Interpolation
• Most common: Cubic spline
• Given (xk , f (xk )) k = 0,1,2,..., n
• Interpolate using the cubic splines
$$S_i(x) = c_{0,i} + c_{1,i}(x - x_i) + c_{2,i}(x - x_i)^2 + c_{3,i}(x - x_i)^3$$
between the points xi and xi+1 (i = 0, 1, 2, …, n−1)
• 4 “unknown” coefficients. Two obvious
conditions are: Si(xi)=f(xi) and Si(xi+1)=f(xi+1)
• 2 “degrees of freedom” in each “segment”
[Figure: data points f(x) with interior nodes xi, xi+1, segment i carrying the spline Si(x), and corner (end) nodes x0, xn]
Spline Interpolation
• Total n segments => 2n d.o.f
• Equality of first and second derivative at
interior nodes : 2(n-1) constraints
• Need 2 more constraints (discussed later)!
• How to obtain the coefficients?
• The second derivative of the cubic spline is
linear within a segment. Write it as
$$S_i''(x) = \frac{1}{x_{i+1} - x_i}\left[(x_{i+1} - x)\,S_i''(x_i) + (x - x_i)\,S_i''(x_{i+1})\right]$$
Spline Interpolation
• Integrate it twice:
$$S_i(x) = \frac{(x_{i+1} - x)^3 S_i''(x_i) + (x - x_i)^3 S_i''(x_{i+1})}{6(x_{i+1} - x_i)} + C_1 x + C_2$$
and equating the function values at the nodes:
$$f(x_i) = \frac{(x_{i+1} - x_i)^2 S_i''(x_i)}{6} + C_1 x_i + C_2$$
$$f(x_{i+1}) = \frac{(x_{i+1} - x_i)^2 S_i''(x_{i+1})}{6} + C_1 x_{i+1} + C_2$$
Spline Interpolation
• Resulting in
$$S_i(x) = \frac{(x_{i+1} - x)^3 S_i''(x_i) + (x - x_i)^3 S_i''(x_{i+1})}{6(x_{i+1} - x_i)} + \left[\frac{f(x_i)}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)S_i''(x_i)}{6}\right](x_{i+1} - x) + \left[\frac{f(x_{i+1})}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)S_i''(x_{i+1})}{6}\right](x - x_i)$$
• How to find the nodal values of S′′ ?
Spline Interpolation
• Continuity of first derivatives: S′i(xi) = S′i−1(xi)
$$S_i'(x_i) = -\frac{(x_{i+1} - x_i)S_i''(x_i)}{3} - \frac{(x_{i+1} - x_i)S_i''(x_{i+1})}{6} + \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i}$$
$$S_{i-1}'(x_i) = \frac{(x_i - x_{i-1})S_{i-1}''(x_i)}{3} + \frac{(x_i - x_{i-1})S_{i-1}''(x_{i-1})}{6} + \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}$$
• Second derivative is also continuous
• We get a tridiagonal system
$$(x_i - x_{i-1})S_{i-1}'' + 2(x_{i+1} - x_{i-1})S_i'' + (x_{i+1} - x_i)S_{i+1}'' = 6\,\frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} - 6\,\frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}$$
Spline Interpolation
• What are the 2 more required constraints?
 Clamped: The function is clamped at each corner node, forcing both ends to have some known fixed slope, say s0 and sn. This implies S′(x0) = s0 and S′(xn) = sn
 Natural: Curvature at the corner nodes is zero, i.e., S′′(x0) = S′′(xn) = 0
 Not-a-knot: The first and last interior nodes have C3 continuity, i.e., these do not act as knots: S0(x) ≡ S1(x) and Sn−2(x) ≡ Sn−1(x)
 For periodic functions, S′(x0) = S′(xn) and S′′(x0) = S′′(xn)

Spline Interpolation: Example
• From the following data, estimate f(2.6)
x 0 1 2 3 4
f(x) 1 0.5 0.2 0.1 0.05882
• The tridiagonal equations are (using a natural spline, S′′(x0) = S′′(x4) = 0):
$$\begin{bmatrix} 4 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 4 \end{bmatrix}\begin{bmatrix} S_1'' \\ S_2'' \\ S_3'' \end{bmatrix} = \begin{bmatrix} 1.2 \\ 1.2 \\ 0.35294 \end{bmatrix}$$
• Solution: S1′′ = 0.2420; S2′′ = 0.2319; S3′′ = 0.03025
Spline Interpolation: Example
• The desired spline (between x = 2 and x = 3) is
$$S_2(x) = \frac{(x_3 - x)^3 S_2'' + (x - x_2)^3 S_3''}{6(x_3 - x_2)} + \left[\frac{f(x_2)}{x_3 - x_2} - \frac{(x_3 - x_2)S_2''}{6}\right](x_3 - x) + \left[\frac{f(x_3)}{x_3 - x_2} - \frac{(x_3 - x_2)S_3''}{6}\right](x - x_2)$$
• Putting in the values:
$$S_2(x) = \frac{(3 - x)^3\,0.2319 + (x - 2)^3\,0.03025}{6} + \left[0.2 - \frac{0.2319}{6}\right](3 - x) + \left[0.1 - \frac{0.03025}{6}\right](x - 2)$$
• At x = 2.6, f(x) = 0.1251


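For reference, a sketch that reproduces the natural-spline computation above numerically; the data, the natural end conditions and the evaluation at x = 2.6 are from the example, everything else is illustrative:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f = np.array([1.0, 0.5, 0.2, 0.1, 0.05882])

n = len(x) - 1                  # number of segments
S = np.zeros(n + 1)             # S[i] = S''(x_i); natural spline: S[0] = S[n] = 0

# Tridiagonal system for the interior second derivatives
A = np.zeros((n - 1, n - 1))
b = np.zeros(n - 1)
for row, i in enumerate(range(1, n)):
    hL, hR = x[i] - x[i - 1], x[i + 1] - x[i]
    if row > 0:
        A[row, row - 1] = hL
    A[row, row] = 2.0 * (x[i + 1] - x[i - 1])
    if row < n - 2:
        A[row, row + 1] = hR
    b[row] = 6.0 * ((f[i + 1] - f[i]) / hR - (f[i] - f[i - 1]) / hL)

S[1:n] = np.linalg.solve(A, b)
print(S[1:n])                   # approx [0.2420, 0.2319, 0.03025]

# Evaluate the spline on segment i = 2 (between x = 2 and x = 3) at x = 2.6
i, xv = 2, 2.6
h = x[i + 1] - x[i]
val = ((x[i + 1] - xv) ** 3 * S[i] + (xv - x[i]) ** 3 * S[i + 1]) / (6 * h) \
      + (f[i] / h - h * S[i] / 6) * (x[i + 1] - xv) \
      + (f[i + 1] / h - h * S[i + 1] / 6) * (xv - x[i])
print(val)                      # approx 0.1251
```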
[Figure: sample data points (x, f(x)) for 0 ≤ x ≤ 3]
If the data, f(x), may have uncertainty, we do not want the approximating function
to pass through ALL data points. Regression minimizes the “error”.
Regression
• Given (xk , f (xk )) k = 0,1,2,..., n
• Fit an approximating function such that it is
“closest” to the data points
• Mostly polynomial, of degree m (m<n)
• Sometimes trigonometric functions
• As before, assume the approximation as
m
f m ( x ) = ∑ c jφ j ( x )
j =0
Regression: Least Squares
• Minimize the sum of squares of the difference
between the function and the data:
$$\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j\phi_j(x_k)\right]^2$$
• Results in m+1 linear equations (that is why the term Linear Regression): [A]{c} = {b}, called the Normal Equations, with
$$a_{ij} = \sum_{k=0}^{n}\phi_i(x_k)\phi_j(x_k) \quad\text{and}\quad b_i = \sum_{k=0}^{n}\phi_i(x_k)f(x_k)$$
Regression: Least Squares
• For example, using conventional form, φj=xj,
$$\begin{bmatrix}
\sum 1 & \sum x_k & \sum x_k^2 & \cdots & \sum x_k^m \\
\sum x_k & \sum x_k^2 & \sum x_k^3 & \cdots & \sum x_k^{m+1} \\
\sum x_k^2 & \sum x_k^3 & \sum x_k^4 & \cdots & \sum x_k^{m+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_k^m & \sum x_k^{m+1} & \sum x_k^{m+2} & \cdots & \sum x_k^{2m}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}
=
\begin{bmatrix} \sum f(x_k) \\ \sum x_k f(x_k) \\ \sum x_k^2 f(x_k) \\ \vdots \\ \sum x_k^m f(x_k) \end{bmatrix}$$
(all sums run over k = 0, 1, …, n; the first entry is Σ1 = n + 1)
Least Squares: Example
• From the following data (n=4), estimate f(2.6), using
regression with a quadratic polynomial (m=2):
x 0 1 2 3 4
f(x) 1 0.5 0.2 0.1 0.05882
 4 4 4
  4 
 ∑1 ∑x ∑  ∑
2
k x
k  f ( x k ) 
 k4=0 k =0 k =0
 c0   4 k = 0
  5 10 30  c0  1.85882 
4 4
3        
 x
∑ ∑x ∑ ∑  100   c1  = 1.43528 
2
x k  1c =  x f ( x )
k  ⇒ 10 30
k k
 k

 c2   4 
k =0 k =0 k =0 k =0
 4 4 4
 x 2 f ( x ) 30 100 354 c2  3.14112
∑ xk2
 k =0
∑x
k =0
3
k ∑
k =0
x 4
k 


∑
 k =0
k k 

• Solution: 0.9879, -0.5476, 0.07983


• f(2.6)= 0.1039
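A sketch of the same computation in NumPy (illustrative; the data and the evaluation point 2.6 are from the example):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f = np.array([1.0, 0.5, 0.2, 0.1, 0.05882])
m = 2                                    # degree of the fitting polynomial

# Assemble the normal equations: a_ij = sum x^(i+j), b_i = sum x^i f
A = np.array([[np.sum(x ** (i + j)) for j in range(m + 1)] for i in range(m + 1)])
b = np.array([np.sum(x ** i * f) for i in range(m + 1)])
c = np.linalg.solve(A, b)
print(A)                                 # [[5,10,30],[10,30,100],[30,100,354]]
print(b)                                 # [1.85882, 1.43528, 3.14112]
print(c)                                 # approx [0.9879, -0.5476, 0.07983]
print(c @ 2.6 ** np.arange(m + 1))       # f(2.6) approx 0.1039
```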
[Figure: the quadratic least-squares fit plotted with the data points for 0 ≤ x ≤ 4]
[Figure: sample data points (x, f(x)) for 0 ≤ x ≤ 3]
Lagrange polynomials
$$f_n(x) = \sum_{j=0}^{n} c_j L_j(x), \qquad L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$
[Figure: the Lagrange polynomials Lj(x) plotted on 0 ≤ x ≤ 2]
Newton’s divided difference
[Figure: the basis functions φi(x) plotted on 0 ≤ x ≤ 2]
$$\phi_i(x) = \prod_{j=0}^{i-1}(x - x_j)$$
$$f[x_j, x_i] = \frac{f(x_j) - f(x_i)}{x_j - x_i} = f[x_i, x_j], \qquad f[x_k, x_j, x_i] = \frac{f[x_k, x_j] - f[x_j, x_i]}{x_k - x_i} = f[x_i, x_j, x_k] = \ldots$$
$$c_0 = f(x_0);\; c_1 = f[x_1, x_0];\; c_2 = f[x_2, x_1, x_0];\;\ldots;\; c_n = f[x_n, x_{n-1}, \ldots, x_2, x_1, x_0]$$
Runge phenomenon
[Figure: interpolating polynomials with n = 10 and n = 20 equispaced points on −1 ≤ x ≤ 1, illustrating the Runge phenomenon]
Spline Interpolation
(xk, f(xk))  k = 0, 1, 2, …, n
[Figure: data points f(x) with interior nodes xi, xi+1, segment i carrying the spline Si(x), and corner (end) nodes x0, xn]
Linear Spline
[Figure: piecewise-linear interpolant of sample data on −1 ≤ x ≤ 1]
Cubic Spline
$$S_i(x) = \frac{(x_{i+1} - x)^3 S_i''(x_i) + (x - x_i)^3 S_i''(x_{i+1})}{6(x_{i+1} - x_i)} + \left[\frac{f(x_i)}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)S_i''(x_i)}{6}\right](x_{i+1} - x) + \left[\frac{f(x_{i+1})}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)S_i''(x_{i+1})}{6}\right](x - x_i)$$
$$(x_i - x_{i-1})S_{i-1}'' + 2(x_{i+1} - x_{i-1})S_i'' + (x_{i+1} - x_i)S_{i+1}'' = 6\,\frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} - 6\,\frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}$$
[Figure: sample data points (x, f(x)) for 0 ≤ x ≤ 3]
If the data, f(x), may have uncertainty, we do not want the approximating function
to pass through ALL data points. Regression minimizes the “error”.
Regression
• Given (xk , f (xk )) k = 0,1,2,..., n
• Fit an approximating function such that it is
“closest” to the data points
• Mostly polynomial, of degree m (m<n)
• Sometimes trigonometric functions
• As before, assume the approximation as
m
f m ( x ) = ∑ c jφ j ( x )
j =0
Regression: Least Squares
• Minimize the sum of squares of the difference
between the function and the data:
$$\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j\phi_j(x_k)\right]^2$$
• Results in m+1 linear equations (that is why the term Linear Regression): [A]{c} = {b}, called the Normal Equations, with
$$a_{ij} = \sum_{k=0}^{n}\phi_i(x_k)\phi_j(x_k) \quad\text{and}\quad b_i = \sum_{k=0}^{n}\phi_i(x_k)f(x_k)$$
Regression: Least Squares
• For example, using conventional form, φj=xj,
$$\begin{bmatrix}
\sum 1 & \sum x_k & \sum x_k^2 & \cdots & \sum x_k^m \\
\sum x_k & \sum x_k^2 & \sum x_k^3 & \cdots & \sum x_k^{m+1} \\
\sum x_k^2 & \sum x_k^3 & \sum x_k^4 & \cdots & \sum x_k^{m+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_k^m & \sum x_k^{m+1} & \sum x_k^{m+2} & \cdots & \sum x_k^{2m}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}
=
\begin{bmatrix} \sum f(x_k) \\ \sum x_k f(x_k) \\ \sum x_k^2 f(x_k) \\ \vdots \\ \sum x_k^m f(x_k) \end{bmatrix}$$
(all sums run over k = 0, 1, …, n; the first entry is Σ1 = n + 1)
Least Squares: Example
• From the following data (n=4), estimate f(2.6), using
regression with a quadratic polynomial (m=2):
x 0 1 2 3 4
f(x) 1 0.5 0.2 0.1 0.05882
 4 4 4
  4 
 ∑1 ∑x ∑  ∑
2
k x
k  f ( x k ) 
 k4=0 k =0 k =0
 c0   4 k = 0
  5 10 30  c0  1.85882 
4 4
3        
 x
∑ ∑x ∑ ∑  100   c1  = 1.43528 
2
x k  1c =  x f ( x )
k  ⇒ 10 30
k k
 k

 c2   4 
k =0 k =0 k =0 k =0
 4 4 4
 x 2 f ( x ) 30 100 354 c2  3.14112
∑ xk2
 k =0
∑x
k =0
3
k ∑
k =0
x 4
k 


∑
 k =0
k k 

• Solution: 0.9879, -0.5476, 0.07983


• f(2.6)= 0.1039
[Figure: the quadratic least-squares fit plotted with the data points for 0 ≤ x ≤ 4]
Least Squares: Orthogonal polynomials
• Equidistant points xk ; k = 0,1,2,..., n
• Minimize
$$\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j\phi_j(x_k)\right]^2 \;\Rightarrow\; [A]\{c\} = \{b\}, \quad a_{ij} = \sum_{k=0}^{n}\phi_i(x_k)\phi_j(x_k), \quad b_i = \sum_{k=0}^{n}\phi_i(x_k)f(x_k)$$
• Choose orthonormal basis functions: known as Gram’s polynomials, or discrete Tchebycheff polynomials -- denote them by Gi(x).
• Normalize the data range from −1 to 1, which implies xi = −1 + 2i/n
Least Squares: Orthogonal polynomials
• Gi(x) is a polynomial of degree i.
$$\sum_{k=0}^{n} G_0(x_k)G_0(x_k) = 1 \;\Rightarrow\; G_0(x) = \frac{1}{\sqrt{n+1}}$$
• Assume G1(x) = d0 + d1 x
$$\sum_{k=0}^{n}\frac{1}{\sqrt{n+1}}\,(d_0 + d_1 x_k) = 0 \;\Rightarrow\; d_0 = 0 \quad\text{since}\quad \sum_{k=0}^{n} x_k = 0$$
$$\sum_{k=0}^{n}(d_0 + d_1 x_k)^2 = 1 \;\Rightarrow\; d_1 = \frac{1}{\sqrt{\sum_{k=0}^{n} x_k^2}} = \frac{1}{\sqrt{\sum_{k=0}^{n}\left(-1 + \frac{2k}{n}\right)^2}}$$
Gram polynomials
• Therefore:
$$d_1 = \frac{1}{\sqrt{\sum_{k=0}^{n}\left(1 + \frac{4k^2}{n^2} - \frac{4k}{n}\right)}} = \sqrt{\frac{3n}{(n+1)(n+2)}}$$
• Recursive relation:
$$G_{i+1}(x) = \alpha_i x\,G_i(x) - \frac{\alpha_i}{\alpha_{i-1}}\,G_{i-1}(x) \quad\text{for } i = 1, 2, \ldots, n-1$$
$$G_0(x) = \frac{1}{\sqrt{n+1}};\quad G_1(x) = \sqrt{\frac{3n}{(n+1)(n+2)}}\,x;\quad \alpha_i = \frac{n}{i+1}\sqrt{\frac{(2i+1)(2i+3)}{(n-i)(n+i+2)}}$$
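A small sketch of this recursion (illustrative function name) that builds the Gram polynomial values at the n+1 equispaced points and checks their discrete orthonormality:

```python
import numpy as np

def gram_polys(n, m):
    """Values of the Gram polynomials G_0..G_m at the n+1 equispaced points
    x_k = -1 + 2k/n, built with the recursion on the slide."""
    x = -1.0 + 2.0 * np.arange(n + 1) / n
    G = np.zeros((m + 1, n + 1))
    G[0] = 1.0 / np.sqrt(n + 1)
    if m >= 1:
        G[1] = np.sqrt(3.0 * n / ((n + 1) * (n + 2))) * x

    def alpha(i):
        return (n / (i + 1.0)) * np.sqrt((2 * i + 1.0) * (2 * i + 3.0)
                                         / ((n - i) * (n + i + 2.0)))

    for i in range(1, m):
        G[i + 1] = alpha(i) * x * G[i] - (alpha(i) / alpha(i - 1)) * G[i - 1]
    return x, G

x, G = gram_polys(n=4, m=2)
print(np.round(G @ G.T, 12))   # identity matrix: the G_i are orthonormal over the points
```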
Gram polynomials: Example
• From the following data (n=4), estimate f(2.6), using
regression with a quadratic polynomial (m=2):
t 0 1 2 3 4
f(t) 1 0.5 0.2 0.1 0.05882
• Normalize: x=t/2-1
• For n = 4, we get
$$G_0(x) = \frac{1}{\sqrt{5}};\quad G_1(x) = \sqrt{\frac{2}{5}}\,x;\quad G_2(x) = \sqrt{\frac{2}{7}}\,(2x^2 - 1)$$
• Normal Equations:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum_{k=0}^{4} G_0(x_k)f(x_k) \\ \sum_{k=0}^{4} G_1(x_k)f(x_k) \\ \sum_{k=0}^{4} G_2(x_k)f(x_k) \end{bmatrix} = \begin{bmatrix} 0.831290 \\ -0.721746 \\ 0.298702 \end{bmatrix}$$
Gram polynomials: Example
$$f_2(x) = \frac{0.8313}{\sqrt{5}} - 0.7217\sqrt{\frac{2}{5}}\,x + 0.2987\sqrt{\frac{2}{7}}\,(2x^2 - 1)$$
• f(t = 2.6) = f(x = 0.3) = 0.1039
• Same as before
• How to estimate the closeness? Coefficient of determination
[Figure: the quadratic fit plotted with the data points for 0 ≤ t ≤ 4]
Regression: Coefficient of determination
• The “inherent spread” of the data may be represented by its deviation from the mean as
$$S_t = \sum_{k=0}^{n}\left(f(x_k) - \bar{f}\,\right)^2$$
• $\bar{f}$ is the arithmetic mean of the function values:
$$\bar{f} = \frac{\sum_{k=0}^{n} f(x_k)}{n+1}$$
• St is the sum of squares of the total deviations
Coefficient of determination
• Define a sum of residual deviation, from the fitted mth-degree polynomial, as
$$S_r = \sum_{k=0}^{n}\left(f(x_k) - f_m(x_k)\right)^2$$
• Obviously, Sr should be as small as possible and, in the worst case, will be equal to St
• The coefficient of determination is defined as
$$r^2 = \frac{S_t - S_r}{S_t}$$
with its value ranging from 0 to 1.
Coefficient of determination
• A value of 0 for r2 indicates that a constant
value, equal to the mean, is the best-fit
• A value of 1 for r2 indicates that the best-fit
passes through ALL data points
• r is called the correlation coefficient
• r2<0.3 is considered a poor fit, >0.8 is
considered good
• The difference, St-Sr, may be thought of as the
variability in the data explained by the
regression.
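A short sketch computing r² for the quadratic fit of the earlier example (np.polyfit is used here purely as a stand-in for the normal-equation solution):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f = np.array([1.0, 0.5, 0.2, 0.1, 0.05882])

c = np.polyfit(x, f, 2)                # quadratic least-squares fit (highest power first)
fm = np.polyval(c, x)

St = np.sum((f - f.mean()) ** 2)       # total deviation about the mean
Sr = np.sum((f - fm) ** 2)             # residual deviation about the fit
r2 = (St - Sr) / St
print(round(r2, 4))                    # close to 1: the quadratic explains most of the spread
```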
Multiple Regression
• For a function of 2 (or more) variables, (xk, yk, f(xk, yk)), k = 0, 1, 2, …, n
• Minimize
$$\sum_{i=0}^{n}\left[f(x_i, y_i) - \sum_{k=0}^{m_2}\sum_{j=0}^{m_1} c_{j,k}\,x_i^j y_i^k\right]^2$$
• Same as before: a set of (m1+1)×(m2+1) linear equations
Multiple Regression

• For example, with a linear fit:
$$f_{11}(x, y) = c_{0,0} + c_{1,0}\,x + c_{0,1}\,y + c_{1,1}\,xy$$
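An illustrative sketch of this bilinear fit; the data below are synthetic, generated only to exercise the normal equations, and are not from the slides:

```python
import numpy as np

# Synthetic data roughly following f = 1 + 2x - 0.5y + 0.25xy plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 20)
y = rng.uniform(0, 1, 20)
f = 1.0 + 2.0 * x - 0.5 * y + 0.25 * x * y + 0.01 * rng.standard_normal(20)

# Columns are the basis functions 1, x, y, xy evaluated at the data points
Phi = np.column_stack([np.ones_like(x), x, y, x * y])
A = Phi.T @ Phi                 # normal-equation matrix
b = Phi.T @ f
c = np.linalg.solve(A, b)
print(np.round(c, 2))           # approx [1.0, 2.0, -0.5, 0.25]
```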
Nonlinear Regression
• Not all relationships between x and f can be expressed in linear form
• E.g.,
$$f(x) = c_0 e^{c_1 x} \quad\text{or}\quad f(x) = \frac{c_0}{1 + c_1 e^{c_2 x}}$$
• The first one can be linearized: ln f(x) = ln c0 + c1 x
• But not the second one – nonlinear regression:
$$\text{Minimize } \sum_{k=0}^{n}\left(f(x_k) - f_m(x_k, c_0, c_1, \ldots, c_m)\right)^2$$
Nonlinear Regression

• The normal equations are nonlinear


• May be solved using Newton method
• Start with an initial guess for the coefficients
• Use Taylor’s series to form equations A∆c=b
• A and b comprise the derivatives of f wrt c
• Jacobian matrix is defined as before
• Residual r is f-fm
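A sketch of one common realization of this iteration (a Gauss-Newton step, where the correction solves JᵀJ Δc = Jᵀr), applied to the first model f(x) = c0 e^{c1 x}; the data are illustrative, and the starting guess comes from the linearized (log) fit mentioned above:

```python
import numpy as np

# Illustrative data roughly following f(x) = 2 e^{0.5 x}
x = np.linspace(0.0, 2.0, 9)
f = 2.0 * np.exp(0.5 * x) + 0.01 * np.array([1, -2, 0, 1, -1, 2, 0, -1, 1])

# Initial guess from the linearized form ln f = ln c0 + c1 x
p = np.polyfit(x, np.log(f), 1)               # [c1, ln c0]
c = np.array([np.exp(p[1]), p[0]])

for _ in range(20):
    fm = c[0] * np.exp(c[1] * x)              # current model values
    r = f - fm                                # residual
    J = np.column_stack([np.exp(c[1] * x),    # d fm / d c0
                         c[0] * x * np.exp(c[1] * x)])   # d fm / d c1
    dc = np.linalg.solve(J.T @ J, J.T @ r)    # linearized normal equations
    c = c + dc
    if np.linalg.norm(dc) < 1e-10:
        break
print(np.round(c, 3))                         # approx [2.0, 0.5]
```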
Mid-Sem Exam
• On 21st (September): L7, L12, L13, L14, L15, L16 and L17 (OROS)
• Syllabus – Up to the last lecture (Introduction, Error
Analysis, Roots of Nonlinear Equations, Solution of System
of Linear Equations, Eigen values and Eigen vectors, Roots
of system of non-linear equations, Approximation of
functions, Interpolation, Regression)
• Closed book/notes, No formula sheet, No
programmable calculator
• I will be available next week in my office for
discussion. If you want to meet at any other time,
send me a mail and we will fix a time
Regression: Least Squares
• Minimize the error
$$E = \sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j\phi_j(x_k)\right]^2$$
• Derivative w.r.t. ci (i from 0 to m):
$$\frac{\partial E}{\partial c_i} = 0 \;\Rightarrow\; 2\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j\phi_j(x_k)\right]\left(-\phi_i(x_k)\right) = 0$$
• Normal Equations: [A]{c} = {b} with
$$a_{ij} = \sum_{k=0}^{n}\phi_i(x_k)\phi_j(x_k) \quad\text{and}\quad b_i = \sum_{k=0}^{n}\phi_i(x_k)f(x_k)$$
Regression: Least Squares
• Conventional form, φj = x^j:
$$f_m(x) = c_0 + c_1 x + c_2 x^2 + \ldots + c_m x^m$$
$$E = \sum_{k=0}^{n}\left(f(x_k) - (c_0 + c_1 x_k + c_2 x_k^2 + \ldots + c_m x_k^m)\right)^2 = \sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j x_k^j\right]^2$$
$$\frac{\partial E}{\partial c_0} = 0 \;\Rightarrow\; 2\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j x_k^j\right](-1) = 0$$
$$\frac{\partial E}{\partial c_1} = 0 \;\Rightarrow\; 2\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j x_k^j\right](-x_k) = 0$$
…
$$\frac{\partial E}{\partial c_m} = 0 \;\Rightarrow\; 2\sum_{k=0}^{n}\left[f(x_k) - \sum_{j=0}^{m} c_j x_k^j\right](-x_k^m) = 0$$
Regression: Least Squares

$$\begin{bmatrix}
\sum 1 & \sum x_k & \sum x_k^2 & \cdots & \sum x_k^m \\
\sum x_k & \sum x_k^2 & \sum x_k^3 & \cdots & \sum x_k^{m+1} \\
\sum x_k^2 & \sum x_k^3 & \sum x_k^4 & \cdots & \sum x_k^{m+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_k^m & \sum x_k^{m+1} & \sum x_k^{m+2} & \cdots & \sum x_k^{2m}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}
=
\begin{bmatrix} \sum f(x_k) \\ \sum x_k f(x_k) \\ \sum x_k^2 f(x_k) \\ \vdots \\ \sum x_k^m f(x_k) \end{bmatrix}$$
(all sums run over k = 0, 1, …, n; the first entry is Σ1 = n + 1)
Gram polynomials
• Normalize: x = -1 to 1
• For example, if n=4, m=2:
$$f_2(x) = c_0 G_0(x) + c_1 G_1(x) + c_2 G_2(x)$$
$$G_0(x) = \frac{1}{\sqrt{5}};\quad G_1(x) = \sqrt{\frac{2}{5}}\,x;\quad G_2(x) = \sqrt{\frac{2}{7}}\,(2x^2 - 1)$$
• Normal Equations:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum_{k=0}^{4} G_0(x_k)f(x_k) \\ \sum_{k=0}^{4} G_1(x_k)f(x_k) \\ \sum_{k=0}^{4} G_2(x_k)f(x_k) \end{bmatrix}$$
Goodness of fit

• Coefficient of determination:
$$r^2 = \frac{S_t - S_r}{S_t}$$
• r is called the correlation coefficient
• r² < 0.3: poor fit; r² > 0.8: good fit
Multiple Regression
• For a function of 2 (or more) variables, (xk, yk, f(xk, yk)), k = 0, 1, 2, …, n
• Minimize
$$\sum_{i=0}^{n}\left[f(x_i, y_i) - \sum_{k=0}^{m_2}\sum_{j=0}^{m_1} c_{j,k}\,x_i^j y_i^k\right]^2$$
Nonlinear Regression
$$\text{Minimize } \sum_{k=0}^{n}\left(f(x_k) - f_m(x_k, c_0, c_1, \ldots, c_m)\right)^2$$
• The normal equations are nonlinear: Ac = b with A = A(c)
• May be solved using Newton method
Numerical Differentiation and Integration
• Given data (xk, f(xk)), k = 0, 1, 2, …, n
• Estimate the derivatives: e.g., from measured distances, estimate the velocity/acceleration, i.e., given a set of (t, x) values, find dx/dt, d²x/dt²  →  Numerical Differentiation
• Estimate the integral: e.g., from measured flow velocities in a pipe, estimate the discharge, i.e., given a set of (r, v) values, find $\int_0^R 2\pi r v\,dr$  →  Numerical Integration
Numerical Differentiation
• Estimate the derivatives of a function from
given data
(xk , f (xk )) k = 0,1,2,..., n
• Start with the first derivative
• Simplest: The difference of the function values
at two consecutive points divided by the
difference in the x values
• Finite Difference: The analytical derivative has
zero ∆x, but we use a finite value
• What if we want more accurate estimates?
First Derivative
• For simplicity, let us use fi for f(xi)
• Assume that the x’s are arranged in increasing
order (xn>xn-1>…>x0).
• For estimating the first derivative at xi:
– Forward difference: $f_i' = \dfrac{f_{i+1} - f_i}{x_{i+1} - x_i}$
– Backward difference: $f_i' = \dfrac{f_i - f_{i-1}}{x_i - x_{i-1}}$
– Central difference: $f_i' = \dfrac{f_{i+1} - f_{i-1}}{x_{i+1} - x_{i-1}}$
First Derivative
• Most of the times, the function is “measured”
at equal intervals
• Assume that xn−xn-1= xn-1−xn-2 = … =x1 − x0 = h
• Then, the first derivative at xi:
– Forward difference: $f_i' = \dfrac{f_{i+1} - f_i}{h}$
– Backward difference: $f_i' = \dfrac{f_i - f_{i-1}}{h}$
– Central difference: $f_i' = \dfrac{f_{i+1} - f_{i-1}}{2h}$
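A quick illustration (not from the slides) comparing the three estimates for a known function:

```python
import numpy as np

# Estimate f'(x) for f(x) = exp(x) at x = 1 with step h = 0.1
f, x, h = np.exp, 1.0, 0.1
forward  = (f(x + h) - f(x)) / h
backward = (f(x) - f(x - h)) / h
central  = (f(x + h) - f(x - h)) / (2 * h)
exact    = np.exp(1.0)
print(forward - exact, backward - exact, central - exact)
# forward/backward errors are O(h); the central error is much smaller, O(h^2)
```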
First Derivative: Error Analysis
• What is the error in these approximations?
• As an example, if the exact function is a
straight line, the estimate would have no error
• For forward difference, use Taylor’s series:
$$f_{i+1} = f_i + h f'(x_i) + \frac{h^2}{2}f''(x_i) + \ldots + \frac{h^m}{m!}f^{[m]}(x_i) + \frac{h^{m+1}}{(m+1)!}f^{[m+1]}(\zeta_f), \qquad \zeta_f \in (x_i, x_{i+1})$$
ζf is a point in the forward interval (xi, xi+1)
• We use f′(xi) to denote the exact value of the derivative at xi (the estimate is fi′)
First Derivative: Error Analysis
• Truncating at the linear term:
$$f_{i+1} = f_i + h f'(x_i) + \frac{h^2}{2}f''(\zeta_f)$$
• Which implies that
$$f'(x_i) = \frac{f_{i+1} - f_i}{h} - \frac{h}{2}f''(\zeta_f)$$
• The error in the forward difference approximation is, therefore,
$$f'(x_i) - f_i' = -\frac{h}{2}f''(\zeta_f)$$
First Derivative: Error Analysis
• Since the error is proportional to h, the
method is called O(h) accurate.
• Similarly, the error in backward difference is
obtained by expansion of fi-1, as
$$f'(x_i) = \frac{f_i - f_{i-1}}{h} + \frac{h}{2}f''(\zeta_b), \qquad f'(x_i) - f_i' = \frac{h}{2}f''(\zeta_b)$$
ζb is a point in the backward interval, (xi-1,xi)
First Derivative: Error Analysis
• The error in central difference is obtained by
expansion of both fi+1 and fi-1, as
$$f_{i+1} = f_i + h f'(x_i) + \frac{h^2}{2}f''(x_i) + \frac{h^3}{6}f'''(\zeta_f)$$
$$f_{i-1} = f_i - h f'(x_i) + \frac{h^2}{2}f''(x_i) - \frac{h^3}{6}f'''(\zeta_b)$$
• Using the intermediate value theorem:
$$f'(x_i) = \frac{f_{i+1} - f_{i-1}}{2h} - \frac{h^2}{6}f'''(\zeta_c), \qquad f'(x_i) - f_i' = -\frac{h^2}{6}f'''(\zeta_c)$$
ζc is a point in the interval, (xi-1,xi+1)
First Derivative: Error Analysis
• Clearly, the method is O(h2) accurate.
• If we reduce the step size by half, the error in
the estimation by forward or backward
difference should reduce by half, but that in
central difference should reduce to one-fourth
• The presence of the derivatives in the error
expression complicates this simple deduction!
• Note that the error in the forward/backward
differences has the second derivative, while
that in central difference has the third.
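A small numerical check of these orders of accuracy (the function and point are illustrative):

```python
import numpy as np

# Halve h repeatedly: the forward-difference error roughly halves (O(h)),
# the central-difference error roughly quarters (O(h^2)).
f, x, exact = np.sin, 0.5, np.cos(0.5)
for h in (0.2, 0.1, 0.05, 0.025):
    e_fwd = abs((f(x + h) - f(x)) / h - exact)
    e_cen = abs((f(x + h) - f(x - h)) / (2 * h) - exact)
    print(f"h = {h:6.3f}   forward error = {e_fwd:.2e}   central error = {e_cen:.2e}")
```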
First Derivative: Error Analysis
• The forward/backward differences are exact
for a linear function, central difference is exact
for a quadratic function
• How to get more accurate forward difference?
• In addition to i and i+1, use i+2 also
First Derivative: Error Analysis

• Use a quadratic interpolating polynomial and


find its slope at i

• Or, combine two lower order estimates

• Or, use Taylor’s series expansion, which will


provide an error estimate also
First Derivative: Quadratic interpolation
• Using Newton’s divided difference:
$$f_2(x) = f_i + \frac{f_{i+1} - f_i}{h}(x - x_i) + \frac{\dfrac{f_{i+2} - f_{i+1}}{h} - \dfrac{f_{i+1} - f_i}{h}}{2h}(x - x_i)(x - x_{i+1})$$
• Derivative at xi:
$$\frac{f_{i+1} - f_i}{h} + \frac{\dfrac{f_{i+2} - f_{i+1}}{h} - \dfrac{f_{i+1} - f_i}{h}}{2h}(-h) = \frac{-3f_i + 4f_{i+1} - f_{i+2}}{2h}$$
Combine two estimates: Richardson Extrapolation
• O(h) accurate: $f'(x_i) = \dfrac{f_{i+1} - f_i}{h} + O(h)$
• Write
$$f'(x_i) = \frac{f_{i+1} - f_i}{h} + E + O(h^2)$$
• and
$$f'(x_i) = \frac{f_{i+2} - f_i}{2h} + 2E + O(h^2)$$
• Eliminate E:
$$2f'(x_i) - f'(x_i) = 2\,\frac{f_{i+1} - f_i}{h} - \frac{f_{i+2} - f_i}{2h} \;\Rightarrow\; f'(x_i) = \frac{-3f_i + 4f_{i+1} - f_{i+2}}{2h} + O(h^2)$$
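A short sketch verifying that the extrapolated estimate coincides with the three-point formula (function and step are illustrative):

```python
import numpy as np

# Combine the O(h) forward estimates with steps h and 2h to get the O(h^2) result
f, x, h = np.exp, 1.0, 0.1
d_h  = (f(x + h) - f(x)) / h              # O(h), step h
d_2h = (f(x + 2 * h) - f(x)) / (2 * h)    # O(h), step 2h
d_rich = 2 * d_h - d_2h                   # eliminates the leading error term E
d_direct = (-3 * f(x) + 4 * f(x + h) - f(x + 2 * h)) / (2 * h)
print(d_rich, d_direct, np.exp(1.0))      # the two O(h^2) estimates coincide
```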
First Derivative: Taylor’s series
$$f_{i+1} = f_i + h f'(x_i) + \frac{h^2}{2}f''(x_i) + \frac{h^3}{6}f'''(x_i) + \frac{h^4}{4!}f''''(\zeta_{f1})$$
$$f_{i+2} = f_i + 2h f'(x_i) + \frac{4h^2}{2}f''(x_i) + \frac{8h^3}{6}f'''(x_i) + \frac{16h^4}{4!}f''''(\zeta_{f2})$$
ζf1 ∈ (xi, xi+1) and ζf2 ∈ (xi, xi+2)
• Eliminate the 2nd derivative:
$$4f_{i+1} - f_{i+2} = 3f_i + 2h f'(x_i) - \frac{4h^3}{6}f'''(x_i) + O(h^4)$$
$$f'(x_i) = \frac{-3f_i + 4f_{i+1} - f_{i+2}}{2h} + \frac{h^2}{3}f'''(x_i) + O(h^3)$$
Taylor’s series
• O(h²) accurate:
$$f_i' = \frac{-3f_i + 4f_{i+1} - f_{i+2}}{2h}, \qquad \text{Error}: \frac{h^2}{3}f'''(x_i) + O(h^3)$$
• General Method:
$$f_i' = \frac{1}{h}\left(c_i f_i + c_{i+1}f_{i+1} + c_{i+2}f_{i+2}\right) = \frac{c_i + c_{i+1} + c_{i+2}}{h}f_i + (c_{i+1} + 2c_{i+2})f'(x_i) + \frac{h}{2}(c_{i+1} + 4c_{i+2})f''(x_i) + \frac{h^2}{6}(c_{i+1} + 8c_{i+2})f'''(x_i) + \ldots$$
• Equate coefficients:
$$c_i + c_{i+1} + c_{i+2} = 0;\quad c_{i+1} + 2c_{i+2} = 1;\quad c_{i+1} + 4c_{i+2} = 0 \;\Rightarrow\; c_i = -\tfrac{3}{2},\; c_{i+1} = 2,\; c_{i+2} = -\tfrac{1}{2}$$
Backward difference
• Similarly, for backward difference, O(h²) accurate:
$$f_i' = \frac{1}{h}\left(c_i f_i + c_{i-1}f_{i-1} + c_{i-2}f_{i-2}\right) = \frac{c_i + c_{i-1} + c_{i-2}}{h}f_i - (c_{i-1} + 2c_{i-2})f'(x_i) + \frac{h}{2}(c_{i-1} + 4c_{i-2})f''(x_i) - \frac{h^2}{6}(c_{i-1} + 8c_{i-2})f'''(x_i) + \ldots$$
$$c_i + c_{i-1} + c_{i-2} = 0;\quad c_{i-1} + 2c_{i-2} = -1;\quad c_{i-1} + 4c_{i-2} = 0$$
$$\Rightarrow\; f_i' = \frac{3f_i - 4f_{i-1} + f_{i-2}}{2h}, \qquad \text{Error}: \frac{h^2}{3}f'''(x_i) + O(h^3)$$
Central Difference
• And, for central difference, O(h⁴) accurate:
$$f_i' = \frac{1}{h}\left(c_{i-2}f_{i-2} + c_{i-1}f_{i-1} + c_i f_i + c_{i+1}f_{i+1} + c_{i+2}f_{i+2}\right)$$
$$= \frac{c_{i-2} + c_{i-1} + c_i + c_{i+1} + c_{i+2}}{h}f_i + \left(-2c_{i-2} - c_{i-1} + c_{i+1} + 2c_{i+2}\right)f'(x_i) + \frac{h}{2}\left(4c_{i-2} + c_{i-1} + c_{i+1} + 4c_{i+2}\right)f''(x_i) + \frac{h^2}{6}\left(-8c_{i-2} - c_{i-1} + c_{i+1} + 8c_{i+2}\right)f'''(x_i) + \frac{h^3}{24}\left(16c_{i-2} + c_{i-1} + c_{i+1} + 16c_{i+2}\right)f''''(x_i) + \ldots$$
$$c_{i-2} + c_{i-1} + c_i + c_{i+1} + c_{i+2} = 0;\quad -2c_{i-2} - c_{i-1} + c_{i+1} + 2c_{i+2} = 1;\quad 4c_{i-2} + c_{i-1} + c_{i+1} + 4c_{i+2} = 0;\quad -8c_{i-2} - c_{i-1} + c_{i+1} + 8c_{i+2} = 0;\quad 16c_{i-2} + c_{i-1} + c_{i+1} + 16c_{i+2} = 0$$
$$\Rightarrow\; f_i' = \frac{f_{i-2} - 8f_{i-1} + 0f_i + 8f_{i+1} - f_{i+2}}{12h}, \qquad \text{Error}: \frac{h^4}{30}f^{[5]}(x_i) + O(h^6)$$
General formulation
• In general, for the nth derivative
$$f_i^{[n]} = \frac{1}{h^n}\sum_{j=-n_b}^{n_f} c_{i+j}\,f_{i+j}$$
where nb is the number of backward grid points, and nf the number of forward grid points.
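As an illustration of this general formulation, the weights can be obtained by solving the Taylor-series matching conditions, exactly as done by hand above. This is a sketch (the function name is arbitrary); it reproduces the forward O(h²) and central O(h⁴) first-derivative weights derived earlier:

```python
import numpy as np
from math import factorial

def fd_coeffs(deriv, nb, nf):
    """Weights c_j, j = -nb..nf, such that
    f^(deriv)(x_i) ~ (1/h^deriv) * sum_j c_j f_{i+j},
    obtained by matching Taylor-series terms."""
    offsets = np.arange(-nb, nf + 1)
    npts = len(offsets)
    # Row p enforces sum_j c_j * j^p / p! = 1 if p == deriv else 0
    A = np.array([[off ** p / factorial(p) for off in offsets]
                  for p in range(npts)], dtype=float)
    b = np.zeros(npts)
    b[deriv] = 1.0
    return np.linalg.solve(A, b)

print(fd_coeffs(1, nb=0, nf=2))   # [-1.5, 2.0, -0.5]              (one-sided, O(h^2))
print(fd_coeffs(1, nb=2, nf=2))   # [1/12, -8/12, 0, 8/12, -1/12]  (central, O(h^4))
```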
Numerical Differentiation: Uneven spacing
• What if the given data is not equally spaced
(xk , f (xk )) k = 0,1,2,..., n
• Forward and backward difference formulae for the first derivative will still be valid:
$$f_i' = \frac{f_{i+1} - f_i}{x_{i+1} - x_i}, \qquad f_i' = \frac{f_i - f_{i-1}}{x_i - x_{i-1}}$$
• Central difference?
$$f_i' = \frac{f_{i+1} - f_{i-1}}{x_{i+1} - x_{i-1}}$$
• We may use it but the error will NOT be O(h²)
Uneven spacing: Taylor’s series
$$f_{i-1} = f_i - h_i f'(x_i) + \frac{h_i^2}{2}f''(x_i) - \frac{h_i^3}{6}f'''(x_i) + \frac{h_i^4}{4!}f''''(\zeta_1)$$
$$f_{i+1} = f_i + h_{i+1} f'(x_i) + \frac{h_{i+1}^2}{2}f''(x_i) + \frac{h_{i+1}^3}{6}f'''(x_i) + \frac{h_{i+1}^4}{4!}f''''(\zeta_2)$$
• Eliminate the 2nd derivative