• 20 cm x 30 cm rectangular plate
• Initially at temperature f(x,y). Then, insulated at x=0, zero
temperature at y=0, and convective heat transfer at x=20 and
at y=30
• Find temperature variation with time at the center
Finite-difference stencil: node (i, j) with neighbours (i−1, j), (i+1, j), (i, j−1), (i, j+1); superscripts t and t+∆t denote time levels.

∂T/∂t = (T_{i,j}^{t+∆t} − T_{i,j}^{t})/∆t

∂²T/∂x² = [(T_{i+1,j} − T_{i,j})/∆x − (T_{i,j} − T_{i−1,j})/∆x]/∆x = (T_{i+1,j} − 2T_{i,j} + T_{i−1,j})/∆x²
Introduction: Engineering problems and computational methods
Definitions:
True Error (e) = True Value − Approximate Value
Approximate Error (ε) = Current Approximation − Previous Approximation
(e.g., iterating x = x/2 + 1/x from x = 1 gives 1.5, 1.4166667, 1.4142157, 1.4142136, converging to √2)
True Relative Error (er) = (True Value − Approximate Value) / True Value
Approximate Relative Error (εr) = (Current Approximation − Previous Approximation) / Current Approximation
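A small sketch of these definitions in code, using the x = x/2 + 1/x iteration above as the running example:

```python
import math

true_value = math.sqrt(2)          # the iteration converges to sqrt(2)
x_prev = 1.0
for k in range(1, 5):
    x_curr = x_prev / 2 + 1 / x_prev
    e_r = (true_value - x_curr) / true_value    # true relative error
    eps_r = (x_curr - x_prev) / x_curr          # approximate relative error
    print(f"iter {k}: x = {x_curr:.7f}, er = {e_r:.2e}, epsr = {eps_r:.2e}")
    x_prev = x_curr
```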
Error Propagation
[A]{x} = {b}
Error vector: {ε} = {x} − {x̃}
Norm properties:
• ‖αx‖ = |α| ‖x‖
• ‖x1 + x2‖ ≤ ‖x1‖ + ‖x2‖
p-norm: ‖x‖p = (|x1|^p + |x2|^p + |x3|^p + ... + |xn|^p)^(1/p), p ≥ 1
Common Vector Norms
• p = 2, Euclidean norm (length): ‖x‖2 = √(x1² + x2² + ... + xn²)
• p = 1 (total distance): ‖x‖1 = Σ_{i=1}^{n} |xi|
Graphical Method
Involves plotting f(x) curve and finding the
solution at the intersection of f(x) with x-axis.
Bracketing Methods
Intermediate value theorem: Let f be a continuous fn on [a, b]
and let f(a) < s < f(b), then there exists at least one x such that
a < x < b and f(x) = s.
Bracketing methods are application of this theorem with s = 0
Nested interval theorem: For each n, let In = [an, bn] be a
sequence of (non-empty) bounded intervals of real numbers
such that In+1 ⊆ In and lim(n→∞)(bn − an) = 0,
then ∩n In contains only one point.
This guarantees the convergence of the bracketing methods to the root.
In bracketing methods, a sequence of nested interval is generated
such that each interval follows the intermediate value theorem with s
= 0. Then the method converges to the root by the one point specified
by the nested interval theorem. Methods only differ in ways to
generate the nested intervals.
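A minimal bisection sketch, the simplest way to generate such nested intervals; the test function and bracket below are illustrative:

```python
def bisect(f, a, b, tol=1e-8, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f(a) and f(b) must have opposite signs"
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0 or 0.5 * (b - a) < tol:
            return m
        if fa * fm < 0:           # root lies in the left half
            b, fb = m, fm
        else:                     # root lies in the right half
            a, fa = m, fm
    return 0.5 * (a + b)

root = bisect(lambda x: x**3 - x - 2, 1.0, 2.0)   # root near 1.5214
```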
Intermediate Value Theorem
(Figure: illustration of the theorem.)
Mean Value Theorem
If f(x) is defined and continuous on the interval [a,b] and
differentiable on (a,b), then there is at least one number c in the
interval (a,b) (that is a < c < b) such that
f′(c) = (f(b) − f(a))/(b − a)
Regula-Falsi or Method of False Position
Principle: In place of the mid point, the function is assumed to be
linear within the interval and the root of the linear function is chosen.
Initialize: Choose a0 and b0 such that, f(a0)f(b0) < 0. This is done by
trial and error.
Iteration step k:
A straight line passing through two points (ak, f(ak)) and (bk, f(bk)) is
given by:
y = f(ak) + [(f(bk) − f(ak))/(bk − ak)] (x − ak)
Setting y = 0 gives the next estimate,
mk+1 = bk − f(bk)(bk − ak)/(f(bk) − f(ak))
(Figure: the chord from (ak, f(ak)) to (bk, f(bk)) on y = f(x), crossing the x-axis at mk+1.)
Example Problem
Find the root of f(x) = e^(−x) − x. True solution = 0.5671.
Set up a scheme as follows:

Itr.  xl   xu      xk      er(%)
0     0    2       0.6981  186
1     0    0.6981  0.5818  20
2     0    0.5818  0.5687  2.2
Example Problem
Find th
, x is in m.
Stopping criteria: stop when the approximate relative error |εr| falls below a prescribed tolerance.
As i → ∞, e^(i+1)/e^(i) → constant (linear convergence).
Open Methods: Fixed Point
(Figures: cobweb diagrams of the iteration x_{k+1} = g(x_k) with the curves y = x and y = g(x); successive iterates x0, x1, x2, x3 approaching the root α.)
Example Problem
Solve x = g(x) = e^(−x) by fixed-point iteration. True solution = 0.5671.
Set up a scheme as follows:

Iterations  x       g(x)    er (%)
1           0       1       100
2           1       0.3678  172
3           0.3678  0.6922  47
4           0.6922  0.5004  38
5           0.5004  0.6064  17
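A minimal fixed-point sketch for this example, stopping on the approximate relative error:

```python
import math

def fixed_point(g, x0, tol=1e-4, max_iter=50):
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        eps_r = abs((x_new - x) / x_new)
        print(f"{k:2d}  x = {x_new:.6f}  eps_r = {100 * eps_r:.1f}%")
        if eps_r < tol:
            return x_new
        x = x_new
    return x

root = fixed_point(lambda x: math.exp(-x), 0.0)   # converges to 0.56714
```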
Taylor polynomial approximation: f(x) ≈ f(x^(i)) + f′(x^(i))(x − x^(i)) + ...
(Figure: tangent-line construction on y = f(x), giving successive iterates x0, x1, x2.)
Open Methods: Newton-Raphson
Problem: f(x) = 0, find a root x = α such that f(α) = 0
Iteration Step 0: Taylor’s Theorem at x = x0:
f(x) = f(x0) + f′(x0)(x − x0) + f″(ξ)(x − x0)²/2!, for some ξ between x and x0
Iteration Step k:
f(x) = f(xk) + f′(xk)(x − xk) + f″(ξ)(x − xk)²/2!, for some ξ between x and xk
Assumptions: Neglect 2nd and higher order terms and assume that the root is
arrived at in the (k+1)th iteration, i.e., f(xk+1) = 0
Iteration Formula: xk+1 = xk − f(xk)/f′(xk)
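A minimal Newton-Raphson sketch implementing the iteration formula; f, fp, and the starting point are illustrative (this is the same root problem as the fixed-point example):

```python
import math

def newton(f, fp, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fp(x)       # f(x_k) / f'(x_k)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: math.exp(-x) - x,
              lambda x: -math.exp(-x) - 1, 0.0)   # root = 0.567143
```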
Open Methods: Newton-Raphson
Convergence: Taylor’s Theorem at the root α, with e^(k) = α − xk:
0 = f(α) = f(xk) + f′(xk)(α − xk) + f″(ξ)(α − xk)²/2, for some ξ between α and xk
Re-arranging, and using xk+1 = xk − f(xk)/f′(xk):
α − xk+1 = −[f″(ξ)/(2f′(xk))](α − xk)²
or e^(k+1) = −[f″(ξ)/(2f′(xk))](e^(k))²
As k → ∞, e^(k+1)/(e^(k))² → −f″(α)/(2f′(α)) = constant (quadratic convergence).
Open Methods: Newton-Raphson
Advantages:
• Faster convergence (quadratic)
Disadvantages:
• Need to calculate the derivative
• Newton-Raphson method may get stuck!
(Figure: tangent constructions at x0 and x1 on y = f(x).)
Open Methods: Secant
Principle: Use a difference approximation for the slope
or derivative in the Newton-Raphson method. This is
equivalent to approximating the tangent with a secant.
Problem: f(x) = 0, find a root x = α such that f(α) = 0
(Figure: secants through successive points (x0, f(x0)), (x1, f(x1)), (x2, f(x2)) on y = f(x), giving x3.)
Open Methods: Secant
Problem: f(x) = 0, find a root x = α such that f(α) = 0
Initialize: choose two points x0 and x1 and evaluate f(x0) and f(x1)
Iteration Formula: xk+1 = xk − f(xk)(xk − xk−1)/(f(xk) − f(xk−1))
Stopping criteria: stop when the approximate relative error falls below a prescribed tolerance.
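A minimal secant sketch: the Newton derivative is replaced by the finite difference (f_k − f_{k−1})/(x_k − x_{k−1}); the two starting points are illustrative:

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant update
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1
```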
Linear interpolation of f between xl and xu, with remainder:

f(x) = f(xu) + [(fu − fl)/(xu − xl)](x − xu) + [f″(ζ)/2](x − xu)(x − xl);  x, ζ ∈ (xl, xu)

• If we assume the function to be uniformly concave/convex,
one end of the interval, xu (or xl), remains fixed, say, x^(0), and

x^(i+1) = x^(0) − f0 (x^(0) − x^(i))/(f0 − fi)

Linear Interpolation: Error Analysis
From the interpolation equation, applied at x = ξ (the root):

f(ξ) = 0 = f0 + (ξ − x^(0))(f0 − fi)/(x^(0) − x^(i)) + (ξ − x^(0))(ξ − x^(i)) f″(ζ1)/2;  ζ1 ∈ (x^(i), x^(0))

⇒ f0 (x^(0) − x^(i))/(f0 − fi) = −e^(0) − e^(0) e^(i) f″(ζ1)/(2f′(ζ2));  ζ1, ζ2 ∈ (x^(i), x^(0))

(with e^(0) = ξ − x^(0) and e^(i) = ξ − x^(i))
And, from the iteration equation,

ξ − x^(i+1) = ξ − x^(0) + f0 (x^(0) − x^(i))/(f0 − fi)
⇒ e^(i+1) = −e^(0) e^(i) f″(ζ1)/(2f′(ζ2))
Today:
• Newton Method: Comments and Example
• Secant Method
• Muller Method
• Bairstow Method
(Figure: plot of f(x) for 0 ≤ x ≤ 2.)
Secant Method: Algorithm
(Figure: secant through (xi−1, fi−1) and (xi, fi) crossing the x-axis at xi+1; root at ξ.)

x^(i+1) = x^(i) + (−fi)(x^(i) − x^(i−1))/(fi − fi−1)

Difference from False Position method?
Not necessarily bracketing the root!
Secant Method: Iterations
Iteration scheme: x^(i+1) = x^(i) + (−fi)(x^(i) − x^(i−1))/(fi − fi−1)
(Figure: f(x) plotted over (−2, 2).)

i   x(i)      f(x(i))    x(i+1)    εa (%)
    −2        −7.92236
0   −1        1.265737   −1.13776  12.10787
1   −1.13776  0.639987   −1.27865  11.01884
2   −1.27865  −0.18321   −1.24729  2.513998
3   −1.24729  0.016878   −1.24994  0.211612
4   −1.24994  0.000382   −1.25     0.004896
Secant Method: Error Analysis
• Similar to Linear interpolation, applied at the root, ξ:
0 = fi + (ξ − x^(i))(fi − fi−1)/(x^(i) − x^(i−1)) + (ξ − x^(i))(ξ − x^(i−1)) f″(ζ)/2;  ζ ∈ (x^(i−1), x^(i), ξ)
• Iteration: x^(i+1) = x^(i) + (−fi)(x^(i) − x^(i−1))/(fi − fi−1)
• Error: ξ − x^(i+1) = ξ − x^(i) + fi (x^(i) − x^(i−1))/(fi − fi−1)
⇒ e^(i+1) = −e^(i) e^(i−1) f″(ζ1)/(2f′(ζ2));  ζ1 ∈ (x^(i−1), x^(i), ξ); ζ2 ∈ (x^(i−1), x^(i))
• Recall: e^(i+1) = C |e^(i)|^p, so that |e^(i−1)| = (|e^(i)|/C)^(1/p) and
C |e^(i)|^p = |e^(i)| (|e^(i)|/C)^(1/p) |f″(ξ)/(2f′(ξ))| ⇒ C^(1+1/p) |e^(i)|^(p−1−1/p) = |f″(ξ)/(2f′(ξ))|
• Therefore, p − 1 − 1/p = 0 ⇒ p = 1.618 (Golden Ratio)
and C = |f″(ξ)/(2f′(ξ))|^0.618
Muller Method: Example
(Figure: parabola fitted through (xi−2, fi−2), (xi−1, fi−1), (xi, fi).)
x^(i+1) = x^(i) + ∆x
Find the root near −1, starting with x^(−2) = −2, x^(−1) = −1, x^(0) = 0
Iteration scheme: x^(i+1) = x^(i) − 2c/(b ± √(b² − 4ac))
(∆x and ∆x2 below are the two roots of the fitted quadratic; the smaller-magnitude one is taken.)

i  x(i)    f        a       b       c       ∆x      ∆x2     x(i+1)  εa (%)
   −2      −7.922
   −1      1.2657
0  0       1.9534   −4.25   −3.562  1.9534  0.3779  −1.216  0.3779
1  0.3779  1.2383   −1.872  −2.6    1.2383  0.375   −1.764  0.753   49.808
2  0.753   0.495    −0.119  −2.027  0.495   0.2408  −17.23  0.9938  24.233
3  0.9938  0.1474   0.8745  −1.233  0.1474  0.1319  1.2778  1.1257  11.718
4  1.1257  0.0368   1.6223  −0.625  0.0368  0.0725  0.3126  1.1982  6.05
5  1.1982  0.0066   2.0675  −0.266  0.0066  0.0335  0.0953  1.2317  2.7176
6  1.2317  0.0008   2.3054  −0.095  0.0008  0.0131  0.028   1.2447  1.0488
7  1.2447  7E-05    2.4244  −0.027  7E-05   0.0042  0.0071  1.2489  0.3342
Muller Method: Error Analysis
• Quadratic interpolation, applied at the root, ξ:
0 = fi + (ξ − x^(i))(fi − fi−1)/(x^(i) − x^(i−1))
  + (ξ − x^(i))(ξ − x^(i−1)) [ (fi − fi−1)/(x^(i) − x^(i−1)) − (fi−1 − fi−2)/(x^(i−1) − x^(i−2)) ] / (x^(i) − x^(i−2))
  + (ξ − x^(i))(ξ − x^(i−1))(ξ − x^(i−2)) f‴(ζ)/6;  ζ ∈ (x^(i−2), x^(i−1), x^(i), ξ)
• From the Iteration scheme, it can be shown (as before) that
e^(i+1) = −e^(i) e^(i−1) e^(i−2) f‴(ξ)/(6f′(ξ))
• Order of convergence: p³ − p² − p − 1 = 0 ⇒ p = 1.839
and C = |f‴(ξ)/(6f′(ξ))|^0.4196
• Better than Secant
Roots of polynomials: General
• Polynomial equations are very common in Eigenvalue
problems and approximations of functions
• Any of the methods discussed so far should work
• After finding one root, we may deflate the polynomial and
find other roots successively
• If some roots are complex, we may run into problems. This
may happen even with polynomials which have all real
coefficients.
• We will look at polynomials with real coefficients only
• The complex roots will occur in conjugate pairs, implying that
a quadratic factor with real coefficients will be present
(x − [a + ib])(x − [a − ib]) = x 2 − 2ax + a 2 + b 2
• Bairstow Method
False Position: Convergence
• Linearly convergent
• Asymptotic error constant is not constant (it was ½ for bisection),
but depends on the nature of the function: C = e^(0) f″(ζ3)/(2f′(ζ4))
• The iterations show one end fixed at −2; the root is −1.25, so e^(0) = 0.75.
|f″/2f′| varies from 0.5 to 0.8 over (−2, −1.25), indicating C between 0.35 and 0.6.
Newton Method: Example
Iteration scheme: x^(i+1) = x^(i) − f(x^(i))/f′(x^(i))
(Figure: f(x) plotted over (−2, 2).)

i  x(i)      f         f′         f″        x(i+1)    εa (%)
0  1.000000  0.140738  −1.062900  3.499600  1.234750  19.011928
1  1.234750  0.000585  −0.076048  4.908098  1.250052  1.224103
2  1.250052  0.000000  −0.000242  4.999910  1.250034  0.001403
3  1.250034  0.000000  −0.000329  4.999805  1.250029  0.000371
Previous Lecture: Nonlinear Equations
Today:
• Bairstow Method: Algorithm and Example
• Linear Simultaneous Equations: Introduction
• Matrix Norms
• Condition Number
• Eigenvalue
(Figure: iterates xi+1, xi, xi−1, xi−2 and root ξ on a plot of f(x).)
Roots of polynomials: Bairstow Method
• Find a quadratic factor of the polynomial f(x) as x² − α1x − α0
• Find the two roots (real or complex conjugates) as
r1,2 = 0.5(α1 ± √(α1² + 4α0))
• Algorithm: Express the given function as f(x) = Σ_{j=0}^{n} cj x^j
Dividing f(x) by the quadratic factor leaves remainder coefficients d0 and d1; both must vanish for an exact factor, so a Newton step is applied to (α0, α1):
d0^(i+1) = 0 = d0^(i) + (∂d0/∂α0)^(i) ∆α0 + (∂d0/∂α1)^(i) ∆α1
d1^(i+1) = 0 = d1^(i) + (∂d1/∂α0)^(i) ∆α0 + (∂d1/∂α1)^(i) ∆α1
Bairstow Method: Algorithm
• The partial derivatives could be written as
∂dn/∂α0 = 0
∂dn−1/∂α0 = 0
∂dj/∂α0 = dj+2 + α0 ∂dj+2/∂α0 + α1 ∂dj+1/∂α0, for j = n−2 to 0
and
∂dn/∂α1 = 0
∂dn−1/∂α1 = dn
∂dj/∂α1 = dj+1 + α0 ∂dj+2/∂α1 + α1 ∂dj+1/∂α1, for j = n−2 to 0
Bairstow Method: Algorithm
• These may be combined in a single recursive equation by defining
δj = ∂dj−1/∂α0 = ∂dj/∂α1
to obtain
δn−1 = dn
δn−2 = dn−1 + α1 δn−1
δj = dj+1 + α1 δj+1 + α0 δj+2, for j = n−3 to 0
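A hedged sketch of one Bairstow update built from the two recursions above. The d recursion (synthetic division of f by the quadratic factor) is assumed to mirror the δ recursion, and the index bookkeeping should be checked against a worked example:

```python
def bairstow_step(c, a0, a1):
    """One Newton update of the factor x^2 - a1*x - a0 of f(x) = sum c[j] x^j."""
    n = len(c) - 1
    d = [0.0] * (n + 1)
    d[n] = c[n]
    d[n - 1] = c[n - 1] + a1 * d[n]
    for j in range(n - 2, -1, -1):                 # synthetic division
        d[j] = c[j] + a1 * d[j + 1] + a0 * d[j + 2]
    delta = [0.0] * (n + 1)
    delta[n - 1] = d[n]
    delta[n - 2] = d[n - 1] + a1 * delta[n - 1]
    for j in range(n - 3, -1, -1):                 # derivative recursion
        delta[j] = d[j + 1] + a1 * delta[j + 1] + a0 * delta[j + 2]
    # Newton system (using delta_j = dd_{j-1}/da0 = dd_j/da1):
    #   d0 + delta1*da0 + delta0*da1 = 0
    #   d1 + delta2*da0 + delta1*da1 = 0
    det = delta[1] ** 2 - delta[0] * delta[2]
    da0 = (-d[0] * delta[1] + d[1] * delta[0]) / det
    da1 = (-d[1] * delta[1] + d[0] * delta[2]) / det
    return a0 + da0, a1 + da1
```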
Previous Lecture: Nonlinear Equations
Today:
• Matrix Norms
• Eigenvalue
• Condition Number
• Direct and Iterative Methods
Matrix Norms: Review
Properties:
• ‖A‖ ≥ 0 (0 only for the null matrix)
• ‖αA‖ = |α| ‖A‖
• ‖A + B‖ ≤ ‖A‖ + ‖B‖
• ‖AB‖ ≤ ‖A‖ ‖B‖
• ‖Ax‖ ≤ ‖A‖ ‖x‖
Induced norm: ‖A‖p = max_{‖x‖p = 1} ‖Ax‖p = max_{x ≠ 0} ‖Ax‖p/‖x‖p
Column-sum norm: ‖A‖1 = max_{1≤j≤n} Σ_{i=1}^{n} |aij|
Row-sum norm: ‖A‖∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|
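These norms are easy to check numerically; a small sketch with NumPy, using the example matrix discussed below:

```python
import numpy as np

A = np.array([[1.25, 0.75],
              [0.75, 1.25]])
norm_1 = np.linalg.norm(A, 1)         # max column sum
norm_inf = np.linalg.norm(A, np.inf)  # max row sum
norm_2 = np.linalg.norm(A, 2)         # spectral norm (see the 2-norm slides)
```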
Matrix Norm – The 2-norm
• The 2-norm of the matrix is written as ‖A‖2 = max_{‖x‖2 = 1} ‖Ax‖2
• If we transform a unit vector, {x}, to another vector,
{b}, by multiplying with matrix [A], what is maximum
“length” of {b}?
• We could view [A] as operating on vector {x} to
generate another vector {b}
• It will, in general, lead to a rotation as well as
stretching (or shortening) of the vector {x}
• E.g., consider
[A] = [1.25 0.75; 0.75 1.25]
(Figure: the unit circle mapped by [A] into an ellipse, plotted over (−2, 2).)
Eigenvalue
• If Ax = λx , there is no rotation. λ is called an
Eigenvalue of [A] and {x} is the corresponding
Eigenvector
• For symmetric matrices, the maximum stretching
and/or shortening will occur along its Eigenvectors,
all of which are mutually orthogonal
• For others, it will occur along some other direction
(we will see later that it is eigenvector of ATA).
• How to find these will be considered after we look at
the methods of solving the linear system.
[A] = [1.25 0.75; 0.1 1.25]
(Figure: the unit circle mapped by this non-symmetric [A], plotted over (−2, 2).)
[A]ᵀ[A] = [1.5725 1.0625; 1.0625 2.125]
(Figure: the corresponding ellipse for [A]ᵀ[A], plotted over (−3, 3).)
Matrix Norm – The 2-norm
• The 2-norm of the matrix is written as ‖A‖2 = max_{‖x‖2 = 1} ‖Ax‖2
• If we transform a unit vector, {x}, to another vector, {b}, by multiplying with matrix [A], what is the maximum "length" of {b}?
• This may be posed as a constrained optimization problem: Maximize {x}ᵀ[A]ᵀ[A]{x} subject to {x}ᵀ{x} = 1
• Use the Lagrange multiplier method. To maximize f(x1, x2) subject to g(x1, x2) = 0, a necessary condition is ∇[f(x1, x2) − λg(x1, x2)] = 0. Here,
∇[xᵀAᵀAx − λ(xᵀx − 1)] = 0 ⇒ AᵀAx = λx
(For example, finding the maximum value of x1 + x2 on the unit circle:
∂/∂x1 [x1 + x2 − λ(x1² + x2² − 1)] = 0 ⇒ 1 − 2λx1 = 0, and similarly 1 − 2λx2 = 0 ⇒ x1 = x2 = 1/√2.)
• Which leads to ‖A‖2² = xᵀAᵀAx = λ, i.e., ‖A‖2 = √(largest eigenvalue of AᵀA)
• Also known as the spectral norm
System of Linear Equations: Condition Number
• We now come back to the question: how sensitive is the solution to small changes in [A] and/or {b}?
• Look at the worst-case scenario: upper bound of error
• Effect of change in {b}:
A(x + δx) = (b + δb) ⇒ δx = A⁻¹δb ⇒ ‖δx‖ ≤ ‖A⁻¹‖·‖δb‖
‖δx‖/‖x‖ ≤ ‖A⁻¹‖·‖δb‖/‖x‖ = ‖A⁻¹‖·(‖b‖/‖x‖)·‖δb‖/‖b‖ ≤ ‖A‖·‖A⁻¹‖·‖δb‖/‖b‖
(using ‖b‖ = ‖Ax‖ ≤ ‖A‖·‖x‖); the product ‖A‖·‖A⁻¹‖ is the condition number.
• The matrix norms for the example below are (how to find the 2-norm is described later):
‖A‖1 = 2.01, ‖A‖∞ = 2, ‖A‖2 = 2
‖A⁻¹‖1 = 100, ‖A⁻¹‖∞ = 100.5, ‖A⁻¹‖2 = 100

Condition Number: Example
x1 + x2 = 2.00; 0.99x1 + 1.01x2 = 2.00 → Solution: 1, 1
x1 + x2 = 1.98; 0.99x1 + 1.01x2 = 2.02 → Solution: −1.01, 2.99
x1 + x2 = 2.00; 1.00x1 + 1.01x2 = 2.00 → Solution: 2, 0
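A quick numerical check of this sensitivity with NumPy; the right-hand sides are the first two cases above:

```python
import numpy as np

A = np.array([[1.00, 1.00],
              [0.99, 1.01]])
print(np.linalg.cond(A, np.inf))          # ||A||_inf * ||A^-1||_inf = 2 * 100.5 = 201
print(np.linalg.solve(A, [2.00, 2.00]))   # [1, 1]
print(np.linalg.solve(A, [1.98, 2.02]))   # [-1.01, 2.99]: a huge change
```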
Methods of Solution of Systems of Linear Equations
• Given Ax = b , how to find {x} for known A and b?
• Easiest: when A is a diagonal matrix
• A little more difficult, but still easy, if A is a triangular
matrix (either upper triangular or lower triangular)
• Otherwise, we may perform some operations on the
system to reduce A to one of these forms, and then
solve the system directly (Direct methods). These
will arrive at the solution in a finite number of steps.
• The other option is to start with a guess value of the
solution and then use an iterative scheme to improve
these (Iterative methods). The number of steps depends
on the convergence properties of the system and the
desired accuracy; the iteration may not even converge!
Direct Methods: Gauss Jordan Method
• Reduce A to a diagonal matrix (generally Identity matrix)
• Take the first equation and divide it by a11 to make the
diagonal element unity. Then, express x1 as a function of
other (n-1) variables:
x1 = (b1 − a12x2 − a13x3 − ... − a1nxn)/a11
• Using this, eliminate x1 from all other equations. This will
change the coefficients of these equations. E.g., the
second equation:
a21(b1 − a12x2 − a13x3 − ... − a1nxn)/a11 + a22x2 + a23x3 + ... + a2nxn = b2
(a22 − a21 a12/a11)x2 + ... + (a2n − a21 a1n/a11)xn = b2 − a21 b1/a11
Direct Methods: Gauss Jordan Method
• Written compactly: R′1 = R1/a11; R′i = Ri − ai1 × R′1, i = 2 to n
• After this step, the first column of [A] is the same as that
of an n × n Identity matrix [I]
• Now, take the second equation and divide it by a′22 to
make the diagonal element unity. Then, express x2 as a
function of the other (n−2) variables:
x2 = (b′2 − a′23x3 − a′24x4 − ... − a′2nxn)/a′22
• Using this, eliminate x2 from all other equations,
including the first one. E.g.
b″3 = b′3 − a′32 b′2/a′22;  a″33 = a′33 − a′32 a′23/a′22;  ...;  a″3n = a′3n − a′32 a′2n/a′22
• Compactly: R″2 = R′2/a′22; R″i = R′i − a′i2 × R″2, i = 1, 3 to n
Direct Methods: Gauss Jordan Method
• After this step, first two columns are same as [I]. Also
note that the row-modifications in this step need to be
done only for columns 3 to n.
• Similarly, for the third equation, computations needed
for only columns 4 to n.
• Repeat till the last equation and the modified {b} vector
is the solution!
• Note that any Pivot element aii should not become 0.
Otherwise, the order of the equations needs to be
changed.
• Changing the order of equations to bring a non-zero
element at the pivot position is called pivoting.
Gauss Jordan Method: Example
2 4 1 x1 13 1 2 0.5 x1 6.5 1 2 0.5 x1 6.5
1 2 3 x = 14 ; 1 2 3 x = 14 ; 0 0 2.5 x = 7.5
2 2 2
3 1 5 x3 20 3 1 5 x3 20 0 − 5 3.5 x3 0.5
1 0 1.9 x1 6.7 1 0 0 x1 1
0 1 − 0.7 x = − 0.1 0 1 0 x = 2
2 2
0 0 1 x3 3 0 0 1 x3 3
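A compact sketch of the whole procedure, with partial pivoting so the zero pivot above is handled automatically (NumPy):

```python
import numpy as np

def gauss_jordan(A, b):
    n = len(b)
    M = np.hstack([A.astype(float), np.array(b, float).reshape(-1, 1)])
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))   # row interchange: non-zero (largest) pivot
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                       # normalize pivot row
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]        # eliminate column k in all other rows
    return M[:, -1]

x = gauss_jordan(np.array([[2, 4, 1], [1, 2, 3], [3, 1, 5]]), [13, 14, 20])
# x = [1, 2, 3]
```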
Direct Methods: Gauss Elimination Method
• Reduce A to upper-triangular matrix (forward elimination)
• Solve the triangular system (back substitution)
• Using the first equation express x1 as a function of the other (n−1) variables:
x1 = (b1 − a12x2 − a13x3 − ... − a1nxn)/a11
• Eliminate x1 from all other equations. This will change the
coefficients of these equations. E.g., the second equation:
a21(b1 − a12x2 − a13x3 − ... − a1nxn)/a11 + a22x2 + a23x3 + ... + a2nxn = b2
(a22 − (a21/a11)a12)x2 + ... + (a2n − (a21/a11)a1n)xn = b2 − (a21/a11)b1
Gauss Elimination Method
• Written compactly: R′i = Ri − (ai1/a11) × R1, i = 2 to n
• After this step, the first column of [A] has a11 on the
diagonal and zeroes everywhere else
• Now, take the second equation, express x2 as a function
of the other (n−2) variables:
x2 = (b′2 − a′23x3 − a′24x4 − ... − a′2nxn)/a′22
• Using this, eliminate x2 from all other equations,
excluding the first one. E.g.
b″3 = b′3 − (a′32/a′22)b′2;  a″33 = a′33 − (a′32/a′22)a′23;  ...;  a″3n = a′3n − (a′32/a′22)a′2n
• Compactly: R″i = R′i − (a′i2/a′22) × R′2, i = 3 to n
Gauss Elimination Method
• After this step, first two columns have all below-diagonal
elements as zero.
• Repeat till the last equation to obtain an upper triangular
matrix and a modified {b} vector.
• The last equation can now be used to obtain xn directly.
• The second-last equation has only two “unknowns,” xn-1
and xn, and xn is already computed. Solve for xn-1.
• Repeat, going backwards to the first equation, to obtain
the complete solution.
• The total flops are Σ_{i=1}^{n} [i + 2i(i − 1)] ≈ Σ_{i=1}^{n} 2i² ≈ (2/3)n³
• For the earlier example, forward elimination (with a row interchange to make the pivot non-zero) gives

2 4  1   | 13
0 −5 3.5 | 0.5
0 0  2.5 | 7.5

and back substitution yields x3 = 3, x2 = 2, x1 = 1.
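A short sketch of the back-substitution loop for this triangular system:

```python
import numpy as np

U = np.array([[2, 4, 1], [0, -5, 3.5], [0, 0, 2.5]])
c = np.array([13, 0.5, 7.5])
n = len(c)
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]   # already-known unknowns
# x = [1, 2, 3]
```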
Today:
• Reducing round-off errors: Pivoting, scaling, equilibration
• LU decomposition
• Cholesky decomposition
• Banded matrix: Thomas algorithm
Gauss Elimination Method: Pivoting
• First Step: R’i=Ri−(ai1/a11)xR1 i=2 to n
• Second step: R”i=R’i−(a’i2/a’22)xR’2 i= 3 to n; and so on..
• The multiplying factor at each step is equal to the
corresponding element of the row divided by the pivot
element
• If we want round-off errors to be attenuated, this factor
should be small in magnitude.
• Since order of equations is immaterial, we can
interchange rows without affecting the solution
• Partial Pivoting: Scan the elements below the diagonal in
that particular column, find the largest magnitude, and
interchange rows so that the pivot element becomes the
largest.
Pivoting
• Complete Pivoting: Scan the elements in the entire
submatrix, find the largest magnitude, and interchange
rows and columns, so that the pivot element becomes
the largest.
(Figure: search regions for partial vs. complete pivoting.)
Pivoting
• Partial Pivoting does not need any additional bookkeeping.
Complete pivoting, due to an exchange of columns, needs
to keep track of this exchange.
• If we exchange columns k and l: after the solution, the
values of xk and xl have to be interchanged.
Examples: Partial Pivoting:

2 4 1 | 13      3 1 5 | 20      3  1    5    | 20
1 2 3 | 14  →   1 2 3 | 14  →   0  5/3  4/3  | 22/3
3 1 5 | 20      2 4 1 | 13      0  10/3 −7/3 | −1/3

3  1    5    | 20        3  1    5    | 20
0  10/3 −7/3 | −1/3  →   0  10/3 −7/3 | −1/3     Soln: 1, 2, 3
0  5/3  4/3  | 22/3      0  0    5/2  | 15/2
Example: Complete (or Full) Pivoting

2 4 1 | 13      3 1 5 | 20      5 1 3 | 20   (unknowns reordered: x3, x2, x1)
1 2 3 | 14  →   1 2 3 | 14  →   3 2 1 | 14
3 1 5 | 20      2 4 1 | 13      1 4 2 | 13

• Note the interchange of the variables x1 and x3. While computing, we
deal with only [A] and {b}. So we make a note of this exchange and do
it after the solution is obtained.

5 1   3    | 20      5 1   3    | 20      5 1   3      | 20
0 1.4 −0.8 | 2   →   0 3.8 1.4  | 9   →   0 3.8 1.4    | 9
0 3.8 1.4  | 9       0 1.4 −0.8 | 2       0 0   −25/19 | −25/19

Solution: 3, 2, 1. Interchange: 1, 2, 3
Other considerations
• Round-off errors are typically large if the elements of the [A]
matrix are of very different magnitudes.
• Scaling of variable and equilibration of equation are used to
make the coefficient matrix elements roughly of the same order
• For example,
0.02 4 1 | 13
0.01 2 3 | 14
0.03 1 5 | 20
may be because x1 is in cm and the others in m.
• Scaling of x1 to x′1 = x1/100 will result in
2 4 1 | 13
1 2 3 | 14
3 1 5 | 20
LU decomposition: Examples

Solve [9 3 −2; 3 6 1; −2 1 9]{x} = {10; 10; 8}

Doolittle (unit diagonal in L):
[9 3 −2; 3 6 1; −2 1 9] = [1 0 0; l21 1 0; l31 l32 1][u11 u12 u13; 0 u22 u23; 0 0 u33]
= [1 0 0; 1/3 1 0; −2/9 1/3 1][9 3 −2; 0 5 5/3; 0 0 8]
Forward substitution L{y} = {b}: y = (10, 20/3, 8); back substitution U{x} = {y}: x = (1, 1, 1)

Crout (unit diagonal in U):
[9 3 −2; 3 6 1; −2 1 9] = [l11 0 0; l21 l22 0; l31 l32 l33][1 u12 u13; 0 1 u23; 0 0 1]
= [9 0 0; 3 5 0; −2 5/3 8][1 1/3 −2/9; 0 1 1/3; 0 0 1]
L{y} = {b}: y = (10/9, 4/3, 1); U{x} = {y}: x = (1, 1, 1)

Cholesky (A = L Lᵀ):
[9 3 −2; 3 6 1; −2 1 9] = [l11 0 0; l21 l22 0; l31 l32 l33][l11 l21 l31; 0 l22 l32; 0 0 l33]
= [3 0 0; 1 √5 0; −2/3 √5/3 √8][3 1 −2/3; 0 √5 √5/3; 0 0 √8]
L{y} = {b}: y = (10/3, 4√5/3, √8); Lᵀ{x} = {y}: x = (1, 1, 1)
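A quick cross-check of these factor-and-solve steps with SciPy (lu_factor pivots internally, so its factors may differ from the hand computation; cholesky reproduces the L above):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, cholesky

A = np.array([[9., 3., -2.], [3., 6., 1.], [-2., 1., 9.]])
b = np.array([10., 10., 8.])
x = lu_solve(lu_factor(A), b)    # [1, 1, 1]
L = cholesky(A, lower=True)      # [[3,0,0],[1,sqrt(5),0],[-2/3,sqrt(5)/3,sqrt(8)]]
```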
Banded and Sparse matrices
Several of the matrices in engineering applications are sparse, i.e.,
have very few non-zero elements. For example, solution of
differential equations:
(Finite-difference stencil: node (i, j) with neighbours (i−1, j), (i+1, j), (i, j−1), (i, j+1); time levels t and t+∆t.)
∂T/∂t = (T_{i,j}^{t+∆t} − T_{i,j}^{t})/∆t;
∂²T/∂x² = [(T_{i+1,j} − T_{i,j})/∆x − (T_{i,j} − T_{i−1,j})/∆x]/∆x = (T_{i+1,j} − 2T_{i,j} + T_{i−1,j})/∆x²
If the non-zero elements occur within a “narrow” band around the
diagonal, it is called banded matrix. The fact that most of the
elements are zero can be used to save computational effort.
Banded Matrix
× × ⋅ 0 0
× × ⋅ ⋅ 0
⋅ ⋅ ⋅ ⋅ ⋅
0 ⋅ ⋅ × ×
0 0 ⋅ × ×
Thomas Algorithm
• Given: four vectors l, d, u and b (sub-diagonal, diagonal, super-diagonal, right-hand side)
• Generate: two vectors α and β as
α1 = d1 and β1 = b1
αi = di − (li/αi−1)ui−1;  βi = bi − (li/αi−1)βi−1,  i = 2, 3, ..... n
• Solution: xn = βn/αn;  xi = (βi − ui xi+1)/αi,  i = n−1, ...... 3, 2, 1
• FP operations: 8(n-1) + 3(n-1) + 1 = 11n - 10
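A direct transcription of these recursions (l[0] and u[n−1] are unused placeholders):

```python
def thomas(l, d, u, b):
    n = len(d)
    alpha = [d[0]] + [0.0] * (n - 1)
    beta = [b[0]] + [0.0] * (n - 1)
    for i in range(1, n):
        m = l[i] / alpha[i - 1]
        alpha[i] = d[i] - m * u[i - 1]
        beta[i] = b[i] - m * beta[i - 1]
    x = [0.0] * n
    x[-1] = beta[-1] / alpha[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        x[i] = (beta[i] - u[i] * x[i + 1]) / alpha[i]
    return x

# The example system solved below:
x = thomas([0, -1, -1, -1], [2, 2, 2, 1], [-1, -1, -1, 0], [0, 0, 1, 0])
# x = [1, 2, 3, 3]
```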
Thomas Algorithm: Example

2 −1  0  0 | x1   0
−1 2 −1  0 | x2 = 0
0 −1  2 −1 | x3   1
0  0 −1  1 | x4   0

α1 = d1; αi = di − (li/αi−1)ui−1;  β1 = b1; βi = bi − (li/αi−1)βi−1;  xn = βn/αn; xi = (βi − ui xi+1)/αi

α1 = 2;  α2 = 2 − (−1/2)(−1) = 3/2;  α3 = 2 − (−1/(3/2))(−1) = 4/3;  α4 = 1 − (−1/(4/3))(−1) = 1/4
β1 = 0;  β2 = 0 − (−1/2)(0) = 0;  β3 = 1 − (−1/(3/2))(0) = 1;  β4 = 0 − (−1/(4/3))(1) = 3/4
Back substitution: x4 = 3, x3 = 3, x2 = 2, x1 = 1
Stopping Criteria
• Stop when: ‖e‖∞ ≤ ε

Iterative Methods
• A = L + D + U, where D is the diagonal part of A and L, U are its strictly lower and upper triangular parts:
D = [a11 0 ⋅ ⋅ 0; 0 a22 0 ⋅ 0; ⋅; 0 0 ⋅ ⋅ ann],  U = [0 a12 a13 ⋅ a1n; 0 0 a23 ⋅ a2n; ⋅; 0 0 ⋅ ⋅ 0]
• Ax = b translates to (L + D + U)x = b
• Jacobi: for an iteration counter k
Dx^(k+1) = −(U + L)x^(k) + b
x^(k+1) = −D⁻¹(U + L)x^(k) + D⁻¹b
or, component-wise,
xi^(k+1) = [bi − Σ_{j=1}^{i−1} aij xj^(k) − Σ_{j=i+1}^{n} aij xj^(k)]/aii,  i = 1, 2, ⋅⋅⋅⋅ n
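A minimal Jacobi sketch in matrix form; the stopping test uses the infinity norm, as above:

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    D = np.diag(A)                       # diagonal entries
    R = A - np.diagflat(D)               # L + U
    x = np.zeros_like(b, float) if x0 is None else np.array(x0, float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D          # x = D^-1 (b - (L+U) x)
        if np.max(np.abs(x_new - x)) <= tol:
            return x_new
        x = x_new
    return x
```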
• Expanding the initial error in the eigenvectors vj of the iteration matrix: e^(0) = Σ_{j=1}^{n} Cj vj
• From the definition of eigenvalues: e^(k) = Σ_{j=1}^{n} Cj λj^k vj
Today:
• Eigenvalues
• Next Topic: System of nonlinear equations
Eigenvalues and Eigenvectors
• The system [A]{x}={b}: [A] operating on vector {x} to transform it
to another vector, {b}.
• It will, in general, lead to a change in "direction" as well as the
"length" of the vector {x}. Example, for a unit vector {x} = {cos θ; sin θ}:
[A] = [1.25 0.75; 0.75 1.25]
(Figures: the unit circle and its image under [A]; additional iteration-history plots.)
Largest Eigenvalue : Power method
• The algorithm is written as:
Choose an arbitrary unit vector z^(0)
Multiply z^(0) by A and normalize to get z^(1)
Repeat till z^(i+1) and z^(i) are the same:  z^(i+1) = Az^(i)/‖Az^(i)‖
z^(i) is the eigenvector and the normalization factor is the
corresponding eigenvalue
• Example: A = [2 1; 1 2]. Choose starting vector as {1; 0}
• If we use the L∞ norm: the normalization factors are 2, 2.5, 2.8, 2.9286, ...,
converging to the largest eigenvalue 3, with eigenvector {1, 1}ᵀ
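A power-method sketch matching the algorithm above, with L∞ normalization:

```python
import numpy as np

def power_method(A, z0, tol=1e-10, max_iter=200):
    z = np.array(z0, float)
    lam = 0.0
    for _ in range(max_iter):
        w = A @ z
        lam_new = np.max(np.abs(w))   # normalization factor -> eigenvalue
        z_new = w / lam_new
        if np.max(np.abs(z_new - z)) < tol:
            return lam_new, z_new
        lam, z = lam_new, z_new
    return lam, z

lam, v = power_method(np.array([[2., 1.], [1., 2.]]), [1., 0.])
# lam -> 3, v -> [1, 1]
```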
• Bracketing does not extend to systems of equations; open methods are used.
(Figure: the curves f1(x1, x2) = 0 and f2(x1, x2) = 0 in the (x1, x2) plane, x1, x2 ∈ (−3, 3); the solution is at their intersection.)
System of nonlinear equations: Fixed Point
• Given n equations,
f1(x1, x2, ..., xn) = 0; f2(x1, x2, ..., xn) = 0; ...; fn(x1, x2, ..., xn) = 0
rewrite each as xi = φi(x1, ..., xn). For the example system x^y + y^x = 6.71, x^x + y^y = 11.72:
x = (6.71 − y^x)^(1/y) ⇒ φ1(x, y) = (6.71 − y^x)^(1/y)
y = (11.72 − x^x)^(1/y) ⇒ φ2(x, y) = (11.72 − x^x)^(1/y)
• Iterations:
x^(i+1) = (6.71 − y^x)^(1/y), evaluated at (x^(i), y^(i))
y^(i+1) = (11.72 − x^x)^(1/y), evaluated at (x^(i+1), y^(i))
• Starting from (x, y) = (2, 2), the first iteration gives x^(1) = 1.646208, y^(1) = 3.073785
Fixed Point Method: Convergence
e^(i+1) = ξ − x^(i+1) = φ(ξ) − φ(x^(i))
• ξ is also a vector; expanding φ1 about the root,
e1^(i+1) = (∂φ1/∂x1) e1^(i) + (∂φ1/∂x2) e2^(i) + ... + (∂φ1/∂xn) en^(i)
and similarly for the other components.
• If |∂φj/∂x1| + |∂φj/∂x2| + ... + |∂φj/∂xn| < 1 ∀ j from 1 to n,
convergence is guaranteed
System of nonlinear equations: Newton method
• Given n equations,
f1(x1, x2, ..., xn) = 0; f2(x1, x2, ..., xn) = 0; ...; fn(x1, x2, ..., xn) = 0
• Similar to the single equation method, we write:
f1(x^(k+1)) ≈ f1(x^(k)) + Σ_{j=1}^{n} (∂f1/∂xj)|_{x^(k)} (xj^(k+1) − xj^(k)) = 0
• Iterations: J(x^(k)) ∆x^(k+1) = −f(x^(k))
J is called the Jacobian matrix, given by Ji,j = ∂fi(x)/∂xj
x^(k+1) = x^(k) + ∆x^(k+1) = x^(k) − J⁻¹(x^(k)) f(x^(k))
• Iterations for the example system (f1 = x^x + y^y − 11.72, f2 = x^y + y^x − 6.71):

[ x^x(1 + ln x)          y^y(1 + ln y)        ] {∆x}     { x^x + y^y − 11.72 }
[ y·x^(y−1) + y^x ln y   x^y ln x + x·y^(x−1) ] {∆y} = − { x^y + y^x − 6.71  }

(Jacobian and f evaluated at (x^(k), y^(k)).)

k  x       y       f1        f2       J11     J12      J21      J22     ∆x
0  1.0000  2.0000  −6.7200   −3.7100  1.0000  6.7726   3.3863   1.0000  0.8392
1  1.8392  2.8683  11.8882   5.9762   4.9354  42.1865  16.2721  7.9513  −0.2435
2  1.5957  2.6150  2.7388    1.3201   3.0928  24.2235  10.0186  4.4150  −0.0868
3  1.5089  2.5130  0.2726    0.1180   2.6254  19.4694  8.3839   3.5681  −0.0086
4  1.5002  2.5002  0.0036    0.0012   2.5832  18.9449  8.2181   3.4910  −0.0001
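A sketch of these iterations for the example system; the Jacobian entries are the ones displayed above, and the solution is near (1.5, 2.5):

```python
import numpy as np

def F(v):
    x, y = v
    return np.array([x**x + y**y - 11.72, x**y + y**x - 6.71])

def J(v):
    x, y = v
    return np.array([
        [x**x * (1 + np.log(x)),           y**y * (1 + np.log(y))],
        [y * x**(y - 1) + y**x * np.log(y), x**y * np.log(x) + x * y**(x - 1)],
    ])

v = np.array([1.0, 2.0])
for _ in range(10):
    dv = np.linalg.solve(J(v), -F(v))   # solve J Δ = -f
    v += dv
    if np.max(np.abs(dv)) < 1e-10:
        break
# v -> [1.5, 2.5]
```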
Today:
• Finding All eigenvalues
• Eigenvectors and multiplicity
Finding All Eigenvalues
• Directly from the Characteristic Equation: How to
efficiently obtain the Characteristic Polynomial –
Faddeev-Le Verrier
• Using similarity transformation: Reduce to diagonal
or triangular form – QR decomposition
• The characteristic equation, det(A − λI) = 0, may be
written as
(−1)^n (λ^n − an−1 λ^(n−1) − an−2 λ^(n−2) − ... − a1 λ − a0) = 0
• It may be seen that an−1 = Σ_{i=1,...,n} aii, i.e., trace(A)
• and a0 = (−1)^(n+1) det(A)
Finding All Eigenvalues
• Faddeev-Le Verrier came up with an algorithm for
obtaining all the coefficients of the polynomial.
• Set An-1=A and, as seen, an-1=trace(An-1)
• For i=n-2, n-3, …,1,0:
Ai=A(Ai+1-ai+1 I); ai=trace(Ai)/(n-i)
• Solve using any of the methods discussed earlier
to get all the eigenvalues.
• Another side-benefit is that we get the inverse of
A as A−1=[ A1-a1I ]/a0
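A direct transcription of this recursion; the dictionary a[i] mirrors the slide's a_i, and A1 is remembered for the inverse:

```python
import numpy as np

def faddeev_leverrier(A):
    """Coefficients of det(A - lam*I) = (-1)^n (lam^n - a_{n-1} lam^{n-1} - ... - a_0),
    plus the inverse via A^-1 = (A_1 - a_1 I)/a_0."""
    n = A.shape[0]
    a = {}
    Ai = A.astype(float).copy()          # A_{n-1} = A
    a[n - 1] = np.trace(Ai)
    A1 = Ai
    for i in range(n - 2, -1, -1):
        if i == 0:
            A1 = Ai                      # remember A_1 before the last step
        Ai = A @ (Ai - a[i + 1] * np.eye(n))
        a[i] = np.trace(Ai) / (n - i)
    A_inv = (A1 - a[1] * np.eye(n)) / a[0]
    return a, A_inv

a, A_inv = faddeev_leverrier(np.array([[3., 4., 1.], [3., 5., 1.], [2., 2., 1.]]))
# a = {2: 9, 1: -7, 0: 1}, matching the example below
```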
Faddeev-Le Verrier method: Example
• Ai = A(Ai+1 − ai+1 I); ai = trace(Ai)/(n−i);  A⁻¹ = [A1 − a1I]/a0
• Example: A = A2 = [3 4 1; 3 5 1; 2 2 1], a2 = 9;
A1 = [−4 −2 −1; −1 −6 0; −4 2 −4], a1 = −7;  A0 = [1 0 0; 0 1 0; 0 0 1], a0 = 1
• Characteristic polynomial is
(−1)(λ³ − 9λ² + 7λ − 1); Roots: 8.16, 0.66, 0.19
• Inverse is [3 −2 −1; −1 1 0; −4 2 3]
Finding All Eigenvalues : Similarity Transform
Given two vectors x1 and x2, construct orthonormal y1 and y2:
y1 = x1/‖x1‖;  y2 = (x2 − (x2ᵀy1)y1)/‖x2 − (x2ᵀy1)y1‖
Take x1 = {3; 1}, x2 = {1; 3}:
y1 = (1/√10){3; 1}
x2ᵀy1 = 6/√10
x2 − (x2ᵀy1)y1 = {−4/5; 12/5}
y2 = (1/√10){−1; 3}
(Figure: x1, x2 and the orthonormal pair y1, y2.)
Orthogonalization: Gram-Schmidt Method
• It is easy to show that y2 is orthogonal to y1:
(x2 − (x2ᵀy1)y1)ᵀ y1 = x2ᵀy1 − (x2ᵀy1)(y1ᵀy1) = 0
• Similarly,
y3 = (x3 − (x3ᵀy1)y1 − (x3ᵀy2)y2)/‖x3 − (x3ᵀy1)y1 − (x3ᵀy2)y2‖
• The general equation is
yk+1 = [xk+1 − Σ_{i=1}^{k} (xk+1ᵀ yi) yi] / ‖xk+1 − Σ_{i=1}^{k} (xk+1ᵀ yi) yi‖
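A direct transcription of the Gram-Schmidt recursion, applied to the example vectors:

```python
import numpy as np

def gram_schmidt(X):
    """Columns of X -> orthonormal columns of Y."""
    Y = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        v = X[:, k].astype(float)
        for i in range(k):
            v -= (X[:, k] @ Y[:, i]) * Y[:, i]   # remove projections on earlier y's
        Y[:, k] = v / np.linalg.norm(v)
    return Y

X = np.array([[3., 1.],
              [1., 3.]])        # columns x1 = [3, 1], x2 = [1, 3]
Y = gram_schmidt(X)             # y1 = [3, 1]/sqrt(10), y2 = [-1, 3]/sqrt(10)
```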
Finding All Eigenvalues : QR method
• We generate an orthogonal matrix Q
• We know that if A=S-1BS, the eigenvalues of B
will be same as those of A
• Also, if Q is orthogonal, its transpose is its
inverse
• If A=QTBQ for some Q and B, and if B is
diagonal or triangular, we get the eigenvalues
• For example, since a symmetric matrix has
orthogonal eigenvectors, we could construct
Q by using the eigenvectors as its columns
Today:
• Eigenvectors and multiplicity
• Iterative methods for linear equations: Convergence
• System of nonlinear equations
Finding Eigenvectors
• Once the eigenvalues are obtained, use
[A-λI]{x}={0}
to solve for {x}
• Note that a unique solution does not exist
• It will only give the direction of the vector
• Example: A = [3 1; 1 3]; Eigenvalues: 2 and 4
For 2, x1 + x2 = 0 ⇒ Eigenvector is {1, −1}ᵀ
For 4, −x1 + x2 = 0 ⇒ Eigenvector is {1, 1}ᵀ
Finding Eigenvectors: Multiple eigenvalues
• What if an eigenvalue is repeated? Known as the
algebraic multiplicity. E.g., in the QR method,
eigenvalue “1” had an algebraic multiplicity of 2.
A = [2 1 1; 1 2 1; 1 1 2]  Eigenvalues: 4, 1, 1
• For this value, we get x1+x2+x3=0
• Eigenvectors may be taken as {1,-1,0}T and
{1,0,-1}T . Two linearly independent eigenvectors
for the same eigenvalue: called geometric
multiplicity of “2”.
Finding Eigenvectors: Multiple eigenvalues
• Another example: A = [2 1; 0 2]
• Eigenvalue "2" has algebraic multiplicity of 2
• For this value, we get x2 = 0
• A single eigenvector {1, 0}ᵀ: geometric multiplicity is 1.
• Geometric multiplicity is always ≤ Algebraic multiplicity
• A defective matrix has GM < AM for some λ
(called a defective eigenvalue): it will not have n
linearly independent eigenvectors.
Finding eigenvalues for given eigenvectors
• Straightforward: All components are multiplied
by the factor λ. Ratio of the L1, L2, or L∞ norm of
Ax and x could be used.
• Therefore: λ= xTAx/(xTx)
Iterative Methods
• A = L + D + U
• Ax = b translates to (L + D + U)x = b
• Jacobi: for an iteration counter k
Dx^(k+1) = −(U + L)x^(k) + b,  or  x^(k+1) = −D⁻¹(U + L)x^(k) + D⁻¹b
Improving Convergence
Denoting ρ(S) = |λ|max,
e^(k+1) ≅ λmax e^(k)  or  e^(k+1) − e^(k) ≅ λmax (e^(k) − e^(k−1))
Gauss-Seidel: xi^(k+1) = [bi − Σ_{j=1}^{i−1} aij xj^(k+1) − Σ_{j=i+1}^{n} aij xj^(k)]/aii,  i = 1, 2, ⋅⋅⋅⋅ n
Rewrite as: Successive Over/Under Relaxation,
xi^(k+1) = (1 − ω) xi^(k) + ω [bi − Σ_{j<i} aij xj^(k+1) − Σ_{j>i} aij xj^(k)]/aii
(ω > 1: over-relaxation; ω < 1: under-relaxation)
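A minimal SOR sketch following the component form above; omega = 1 reduces to Gauss-Seidel (tolerance and iteration cap are illustrative):

```python
import numpy as np

def sor(A, b, omega=1.0, x0=None, tol=1e-10, max_iter=500):
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - s) / A[i, i]
        if np.max(np.abs(x - x_old)) <= tol:
            break
    return x
```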
Convergence bound for Gauss-Seidel: from row j of the iteration, with y the new error and x the old,
|ajj yj + Σ_{k<j} ajk yk| ≥ |ajj yj| − Σ_{k<j} |ajk|·|yk| ≥ |ajj|·|yj| − Σ_{k<j} |ajk|·‖y‖∞
and
Σ_{k>j} |ajk|·|xk| ≤ Σ_{k>j} |ajk|·‖x‖∞
so that
(|ajj| − Σ_{k<j} |ajk|)·‖y‖∞ ≤ Σ_{k>j} |ajk|·‖x‖∞
⇒ ‖S‖∞ ≤ max_j [ Σ_{k>j} |ajk| / (|ajj| − Σ_{k<j} |ajk|) ]
(< 1 for a strictly diagonally dominant matrix, guaranteeing convergence)
Previous Lecture: Eigenvalues, Nonlinear system
• Eigenvectors and multiplicity
• Iterative methods for linear equations: Convergence
• System of nonlinear equations
Today:
• Approximation of functions and data
(Figure: f(x) and its Taylor-series approximation over (0, 2).)
Taylor’s series is not a very good fit! Other methods are needed.
Least Squares
• We treat the residual as an error term, Rm = f(x) − fm(x), and then minimize its "magnitude"
Rm is a function of x.
Magnitude may be taken as the integral over the domain (a,b)
To accommodate negative error, we square it
• The problem reduces to:
Minimize ∫_a^b (f(x) − fm(x))² dx
• The approximating polynomial may use any basis; e.g., for m = 2:
f2(x) = c0 + c1(x − 1) + c2(x − 1)²
f2(x) = c0 + c1(1 + x) + c2(1 + x + x²)
f2(x) = c0(1 + x + x²) + c1(1 + 2x + 3x²) + c2(1 + 4x + 9x²)
Least Squares: Formulation
• We use a general form:
fm(x) = Σ_{j=0}^{m} cj φj(x)
(an alternative "magnitude" is |f − fm|max over (a, b); the squared integral is used here)
• Defining the Inner Product ⟨f, g⟩ = ∫_a^b f·g dx, minimization gives
Σ_{j=0}^{m} cj ⟨φj, φi⟩ = ⟨f, φi⟩ for i = 0, 1, 2, ..., m
in which, for the Normal Equations [A]{c} = {b},
aij = ⟨φi, φj⟩;  bi = ⟨φi, f⟩;  i, j = 0, 1, 2, ..., m
Normal Equations: Example
• Approximate f(x) = 1 + x + x² over the interval (0,2)
using a linear function, f1(x)
• Choose the linear function as f1(x) = c0 + c1x
⇒ φ0(x) = 1; φ1(x) = x
a00 = ⟨φ0, φ0⟩ = ∫₀² 1 dx = 2
a01 = a10 = ⟨φ0, φ1⟩ = ∫₀² x dx = 2
a11 = ⟨φ1, φ1⟩ = ∫₀² x² dx = 8/3
• and
b0 = ⟨φ0, f⟩ = ∫₀² (1 + x + x²) dx = 20/3
b1 = ⟨φ1, f⟩ = ∫₀² x(1 + x + x²) dx = 26/3
• The normal equations are
[2 2; 2 8/3]{c0; c1} = {20/3; 26/3};  Solution: c0 = 1/3, c1 = 3
• Therefore, f1(x) = 1/3 + 3x
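A quick symbolic check of these integrals and the solve, with SymPy:

```python
import sympy as sp

x = sp.symbols('x')
f = 1 + x + x**2
phi = [sp.Integer(1), x]
A = sp.Matrix(2, 2, lambda i, j: sp.integrate(phi[i] * phi[j], (x, 0, 2)))
b = sp.Matrix(2, 1, lambda i, _: sp.integrate(phi[i] * f, (x, 0, 2)))
c = A.solve(b)      # Matrix([[1/3], [3]])  ->  f1(x) = 1/3 + 3x
```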
(Figure: f(x) and the least-squares line f1(x) = 1/3 + 3x over (0, 2).)
Much better than Taylor’s series in an overall sense (NOT near 1!)
Normal Equations: Diagonal form
• If the matrix A becomes diagonal, c is easily
computed. Since we are free to choose the form
of the basis functions (φ’s), use "Orthogonal
polynomials"
• Recall: aij = ⟨φi, φj⟩ = ∫_a^b φi(x)φj(x) dx
• Legendre polynomials, orthogonal over (−1, 1): with P0 = 1 and P1 = d0 + d1x,
∫_{−1}^{1} (d0 + d1x)dx = 0 ⇒ d0 = 0, so P1(x) = x
• P2(x) = (−1 + 3x²)/2
• Orthogonality:
⟨Pi(x), Pj(x)⟩ = ∫_{−1}^{1} Pi(x)Pj(x)dx = 0 (i ≠ j),  = 2/(2i + 1) (i = j)
Legendre polynomials: General Case
• For a general case, degree m, the normal equations are diagonal:
A = diag(2, 2/3, 2/5, ..., 2/(2m + 1))
b = {⟨1, f(x)⟩; ⟨x, f(x)⟩; ⟨(3x² − 1)/2, f(x)⟩; ⋮; ⟨Pm(x), f(x)⟩}
Orthogonal polynomials: Tchebycheff
• Legendre: more error over parts of the interval
(Figures: f(x) with its Taylor-series and least-squares approximations over (0, 2).)
• Tchebycheff polynomials are orthogonal with the weight 1/√(1 − x²):
For T1(x) = d0 + d1x: orthogonality with T0 ⇒ d0 = 0 ⇒ T1(x) = x
For T2(x) = d0 + d1x + d2x²: orthogonality ⇒ d0 + d2/2 = 0; d1 = 0 ⇒ T2(x) = −1 + 2x²
Tchebycheff polynomials
• General form
Tn(x) = cos(n cos⁻¹ x)
• Recursive Formula
Tn(x) = 2xTn−1(x) − Tn−2(x)
• Orthogonality
⟨Ti(x), Tj(x)⟩ = ∫_{−1}^{1} Ti(x)Tj(x)/√(1 − x²) dx = 0 (i ≠ j),  = π (i = j = 0),  = π/2 (i = j ≠ 0)
P0(x) = 1, P1(x) = x, P2(x) = (−1 + 3x²)/2, P3(x) = (−3x + 5x³)/2, P4(x) = (3 − 30x² + 35x⁴)/8
(Figure: Pn(x), n = 0 to 4, on (−1, 1).)
T0(x) = 1, T1(x) = x, T2(x) = −1 + 2x², T3(x) = −3x + 4x³, T4(x) = 1 − 8x² + 8x⁴
(Figure: Tn(x), n = 0 to 4, on (−1, 1).)
Tchebycheff polynomials: Example
• Fit a straight line to 1 + x* + x*² over (0, 2)
• Normalize: x = x* − 1
• f(x) = 3 + 3x + x²
A = [π 0; 0 π/2];  b = {7π/2; 3π/2}
• Solution: f1(x*) = 7/2 + 3(x* − 1) = 1/2 + 3x*
• The maximum error is reduced: known as the Minimax approximation
(Figure: comparison of the fits over (0, 2).)
Least Squares Method : Recap
• Obtain the best mth degree polynomial fit to the
function f(x) over the interval (a,b)
e.g., best straight line, m=1, which fits 1 + x + x²,
i.e., f(x), over the interval (0,2)
• Formulation: Minimize ∫_a^b (f(x) − fm(x))² dx
e.g., Minimize ∫₀² (1 + x + x² − f1(x))² dx
Least Squares Method : Recap
• General form of approximating polynomial:
fm(x) = Σ_{j=0}^{m} cj φj(x)
e.g., f1(x) = c0 + c1x
• Minimization of ∫_a^b [f(x) − Σ_{j=0}^{m} cj φj(x)]² dx
e.g., ∫₀² (1 + x + x² − (c0 + c1x))² dx
Least Squares Method : Recap
• Stationary Point theorem
∫_a^b [f − Σ_{j=0}^{m} cj φj] φi dx = 0 for i = 0, 1, 2, ..., m
e.g.,
∫_a^b (1 + x + x² − (c0 + c1x)) dx = 0  (w.r.t. c0)
∫_a^b (1 + x + x² − (c0 + c1x)) x dx = 0  (w.r.t. c1)
Least Squares Method : Recap
• Inner product: ⟨f, g⟩ = ∫_a^b f·g dx
Σ_{j=0}^{m} cj ⟨φj, φi⟩ = ⟨f, φi⟩ for i = 0, 1, 2, ..., m
e.g., ⟨c0 + c1x, 1⟩ = ⟨1 + x + x², 1⟩
⟨c0 + c1x, x⟩ = ⟨1 + x + x², x⟩
Least Squares Method : Recap
• Normal Equations: [A]{c} = {b}
aij = ⟨φi, φj⟩;  bi = ⟨φi, f⟩;  i, j = 0, 1, 2, ..., m
e.g.,
[∫₀² 1·1 dx  ∫₀² 1·x dx;  ∫₀² x·1 dx  ∫₀² x·x dx] {c0; c1} = {∫₀² 1·(1 + x + x²)dx; ∫₀² x·(1 + x + x²)dx}

Least Squares Method : Recap
[2 2; 2 8/3]{c0; c1} = {20/3; 26/3}
Solution: c0 = 1/3; c1 = 3
f1(x) = 1/3 + 3x
Least Squares Method : Recap
• Using orthogonal polynomials φ0 = 1 and φ1 = 1 − x
[2 0; 0 2/3]{c0; c1} = {20/3; −2}
Solution: c0 = 10/3; c1 = −3
f1(x) = 10/3 − 3(1 − x) = 1/3 + 3x
Least Squares Method : Recap
• Using Legendre polynomials φ0 = 1 and φ1 = x
with the domain changed to (−1,1) using
x = [x* − (b + a)/2]/[(b − a)/2] = x* − 1
[2 0; 0 2/3]{c0; c1} = {20/3; 2} ⇒ c0 = 10/3; c1 = 3
f1(x) = 10/3 + 3x = 1/3 + 3x*
Least Squares Method : Recap
• Using Tchebycheff polynomials φ0 = 1 and φ1 = x
with the domain changed to (−1,1), using weight 1/√(1 − x²)
A = [π 0; 0 π/2];  b = {7π/2; 3π/2} ⇒ c0 = 7/2; c1 = 3
f1(x) = 7/2 + 3x = 1/2 + 3x*
Approximation of Data
• Data denoted by (xk , f (xk )) k = 0,1,2,..., n
• n+1 data points
• Approximating polynomial: fm(x)
• If m=n, unique polynomial passing through
all the data points - Interpolation
• If m<n, best-fit polynomial capturing the
trend of data – Regression : depends on the
definition of “best-fit”
• If m>n, non-unique polynomial passing
through all the points
(Figure: the n+1 data points over (0, 3).)
Interpolation
• There is a unique nth degree polynomial
passing through the n+1 data points
(xk , f (xk )) k = 0,1,2,..., n
• Represent it as
fn(x) = Σ_{j=0}^{n} cj φj(x);  with φj(x) = x^j:  fn(x) = c0 + c1x + ... + cn x^n
Interpolation
• The polynomial must pass through all the n+1
data points: the coefficients are given by
1  x0  x0²  .  x0^n     c0     f(x0)
1  x1  x1²  .  x1^n     c1     f(x1)
1  x2  x2²  .  x2^n     c2  =  f(x2)     ⇒ Ac = b
.  .   .    .  .        .      .
1  xn  xn²  .  xn^n     cn     f(xn)
• Solve by any of the linear equation methods
• The A matrix is called Vandermonde matrix
• Unique solution if all x’s are distinct
• Ill- conditioned for large n: Not recommended
Interpolation: Example
• Find the interpolating polynomial for the
given data points: (0,1), (1,3), (2,7)
• 3 data points ⇒ n = 2
• Second degree interpolating polynomial
• x0 = 0, f(x0) = 1; x1 = 1, f(x1) = 3; x2 = 2, f(x2) = 7

1 0 0   c0     1
1 1 1   c1  =  3    ⇒ c0 = 1; c1 = 1, c2 = 1, i.e., f2(x) = 1 + x + x²
1 2 4   c2     7
Lagrange Polynomial: Example
• Find the interpolating polynomial for the
given data points: (0,1), (1,3), (2,7)
• Second degree interpolating polynomial
L0 = (x − 1)(x − 2)/[(0 − 1)(0 − 2)] = (2 − 3x + x²)/2
L1 = (x − 0)(x − 2)/[(1 − 0)(1 − 2)] = 2x − x²
L2 = (x − 0)(x − 1)/[(2 − 0)(2 − 1)] = (−x + x²)/2
• Interpolating polynomial is
1 × L0 + 3 × L1 + 7 × L2 = 1 + x + x²
Lagrange Polynomial: Example
• Useful when the grid points are fixed but
function values may be changing
• For example, estimating the temperature at a
point using the measured temperatures at a
few nearby points
• The value of the Lagrange polynomials at the
desired point need to be calculated only once
• Then, we just need to multiply these values
with the corresponding temperatures.
• What if a new measurement is added?
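A small sketch of Lagrange evaluation at a query point, for the data (0,1), (1,3), (2,7) used above:

```python
def lagrange_eval(xs, fs, xq):
    total = 0.0
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        L = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                L *= (xq - xj) / (xi - xj)   # Lagrange basis L_i at xq
        total += fi * L
    return total

print(lagrange_eval([0, 1, 2], [1, 3, 7], 1.5))   # 1 + 1.5 + 1.5**2 = 4.75
```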
Interpolation: Newton’s divided difference
fn(x) = Σ_{j=0}^{n} cj φj(x),  with  φi(x) = ∏_{j=0}^{i−1} (x − xj)
(Figure: the basis functions φi(x) over (0, 2).)
Newton’s divided difference
fn(x) = Σ_{j=0}^{n} cj φj(x)
φ0(x) = 1; φ1(x) = x − x0; φ2(x) = (x − x0)(x − x1);
φ3(x) = (x − x0)(x − x1)(x − x2)
• Applying the equality of function value and
the polynomial value at x = x0: c0 = f(x0).
• At x = x1:
f(x1) = c0 + c1(x1 − x0) ⇒ c1 = (f(x1) − f(x0))/(x1 − x0)
Newton’s divided difference
• At x = x2:
f(x2) = c0 + c1(x2 − x0) + c2(x2 − x0)(x2 − x1)
⇒ c2 = [f(x2) − f(x0) − (x2 − x0)(f(x1) − f(x0))/(x1 − x0)] / [(x2 − x0)(x2 − x1)]
= [ (f(x2) − f(x1))/(x2 − x1) − (f(x1) − f(x0))/(x1 − x0) ] / (x2 − x0)
Newton’s divided difference
• The divided difference notation:
f[xj, xi] = (f(xj) − f(xi))/(xj − xi) = f[xi, xj]
f[xk, xj, xi] = (f[xk, xj] − f[xj, xi])/(xk − xi) = f[xi, xj, xk] = ...
f[xn, xn−1, ..., x2, x1, x0] = (f[xn, xn−1, ..., x2, x1] − f[xn−1, ..., x2, x1, x0])/(xn − x0)
• Divided difference table for the data (0,1), (1,3), (2,7):

x   f   f[,]  f[,,]
0   1
        2
1   3         1
        4
2   7

c0 = 1; c1 = 2; c2 = 1
f2(x) = 1 + 2(x − 0) + 1(x − 0)(x − 1) = 1 + x + x²
Previous Lecture: Approximation of functions
Today:
• Orthogonal polynomials: Tchebycheff
• Interpolation
(Figure: the data points over (0, 3).)
Lagrange polynomials
fn(x) = Σ_{j=0}^{n} cj Lj(x),  Li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj)
(Figure: the Lj(x) for the data points over (0, 2).)
Lagrange Polynomial
• Useful when the grid points are fixed but
function values may be changing (estimating
the temperature at a point using the
measured temperatures at nearby points)
• The value of the Lagrange polynomials at the
desired point need to be calculated only once
• Then, we just need to multiply these values
with the corresponding temperatures
• What if a new measurement is added?
• The polynomials will need to be recomputed
Newton’s divided difference: Error
• The remainder may be written as:
Rn(x) = f(x) − fn(x) = φn+1(x) f[x, xn, xn−1, ..., x2, x1, x0]
where
φn+1(x) = (x − x0)(x − x1)(x − x2)...(x − xn)
(Figures: a high-degree interpolant with n = 20 equally spaced points, showing large error oscillations near the interval ends.)
Spline Interpolation
• Using piece-wise polynomial interpolation
• Given (xk, f(xk)) k = 0, 1, 2, ..., n
• Interpolate using "different" polynomials
between smaller segments
• Easiest: Linear between each successive pair
• Problem: First and higher derivatives
would be discontinuous
(Figure: a piecewise-linear interpolant of f(x) over (−1, 1).)
Spline Interpolation
• Most common: Cubic spline
• Given (xk, f(xk)) k = 0, 1, 2, ..., n
• Interpolate using the cubic splines Si(x), one per segment i between nodes xi and xi+1
(Diagram: corner (end) nodes x0 and xn, interior nodes in between; segment i spans xi to xi+1.)
Spline Interpolation
• Total n segments ⇒ 2n d.o.f. (4n cubic coefficients minus the 2n function-value conditions)
• Equality of first and second derivative at
interior nodes: 2(n−1) constraints
• Need 2 more constraints (discussed later)!
• How to obtain the coefficients?
• The second derivative of the cubic spline is
linear within a segment. Write it as
Si″(x) = [(xi+1 − x)Si″(xi) + (x − xi)Si″(xi+1)] / (xi+1 − xi)
Spline Interpolation
• Integrate it twice:
Si(x) = [(xi+1 − x)³ Si″(xi) + (x − xi)³ Si″(xi+1)] / [6(xi+1 − xi)] + C1x + C2
Applying Si(xi) = f(xi) and Si(xi+1) = f(xi+1):
f(xi) = (xi+1 − xi)² Si″(xi)/6 + C1xi + C2
f(xi+1) = (xi+1 − xi)² Si″(xi+1)/6 + C1xi+1 + C2
Spline Interpolation
• Resulting in
Si(x) = [(xi+1 − x)³ Si″(xi) + (x − xi)³ Si″(xi+1)] / [6(xi+1 − xi)]
+ [f(xi)/(xi+1 − xi) − (xi+1 − xi)Si″(xi)/6](xi+1 − x)
+ [f(xi+1)/(xi+1 − xi) − (xi+1 − xi)Si″(xi+1)/6](x − xi)
• Matching the first derivative at node xi from segment i−1:
S′i−1(xi) = (xi − xi−1)S″i−1(xi)/3 + (xi − xi−1)S″i−1(xi−1)/6 + (f(xi) − f(xi−1))/(xi − xi−1)
• Second derivative is also continuous
• We get a tridiagonal system
(xi − xi−1)S″i−1 + 2(xi+1 − xi−1)S″i + (xi+1 − xi)S″i+1
= 6(f(xi+1) − f(xi))/(xi+1 − xi) − 6(f(xi) − f(xi−1))/(xi − xi−1)
Spline Interpolation
• What are the 2 more required constraints?
Clamped: The function is clamped on each corner
node forcing both ends to have some known fixed
slope, say, s0 and sn. This implies S 0′ = s0 and S n′ = sn
Natural: Curvature at the corner nodes is zero, i.e.,
S 0′′ = S n′′ = 0
Not-a-knot: The first and last interior nodes have C3
continuity, i.e., these do not act as a knot, i.e.,
S 0 ( x ) ≡ S1 ( x ) and S n − 2 (x ) ≡ S n −1 (x )
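A quick check with SciPy's CubicSpline, using the "natural" end condition described above; the data here are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([1.0, 0.5, 0.2, 0.1])
cs = CubicSpline(x, f, bc_type='natural')   # S0'' = Sn'' = 0 at the corner nodes
print(cs(2.5), cs(2.5, 1))                  # spline value and first derivative
```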
• Putting values for the segment (x2, x3) = (2, 3), with f(x2) = 0.2, f(x3) = 0.1,
S2″(x2) = 0.2319, S2″(x3) = 0.03025:
S2(x) = [(3 − x)³(0.2319) + (x − 2)³(0.03025)]/6
+ [0.2 − 0.2319/6](3 − x) + [0.1 − 0.03025/6](x − 2)
(Figure: the resulting cubic spline through the data over (0, 3).)
If the data, f(x), may have uncertainty, we do not want the approximating function
to pass through ALL data points. Regression minimizes the “error”.
Regression
• Given (xk , f (xk )) k = 0,1,2,..., n
• Fit an approximating function such that it is
“closest” to the data points
• Mostly polynomial, of degree m (m<n)
• Sometimes trigonometric functions
• As before, assume the approximation as
m
f m ( x ) = ∑ c jφ j ( x )
j =0
Regression: Least Squares
• Minimize the sum of squares of the difference
between the function and the data:
Σ_{k=0}^{n} [f(xk) − Σ_{j=0}^{m} cj φj(xk)]²
• Results in m+1 linear equations (that is why
the term Linear Regression): [A]{c}={b}. Called
the Normal Equations.
aij = Σ_{k=0}^{n} φi(xk)φj(xk)  and  bi = Σ_{k=0}^{n} φi(xk)f(xk)
Regression: Least Squares
• For example, using the conventional form, φj = x^j (all sums over k = 0 to n):

Σ1     Σxk       Σxk²      ...  Σxk^m       c0     Σf(xk)
Σxk    Σxk²      Σxk³      ...  Σxk^(m+1)   c1     Σxk f(xk)
Σxk²   Σxk³      Σxk⁴      ...  Σxk^(m+2)   c2  =  Σxk² f(xk)
.      .         .         .    .           .      .
Σxk^m  Σxk^(m+1) Σxk^(m+2) ...  Σxk^(2m)    cm     Σxk^m f(xk)
Least Squares: Example
• From the following data (n=4), estimate f(2.6), using
regression with a quadratic polynomial (m=2):

x     0   1    2    3    4
f(x)  1   0.5  0.2  0.1  0.05882

• The normal equations:

5   10   30    c0     1.85882
10  30   100   c1  =  1.43528
30  100  354   c2     3.14112

(Figure: the data points and the fitted quadratic over (0, 4).)
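These normal equations, assembled and solved with NumPy (np.polyfit would give the same quadratic):

```python
import numpy as np

xk = np.array([0., 1., 2., 3., 4.])
fk = np.array([1., 0.5, 0.2, 0.1, 0.05882])
V = np.vander(xk, 3, increasing=True)      # columns 1, x, x^2
c = np.linalg.solve(V.T @ V, V.T @ fk)     # [c0, c1, c2]
f26 = c @ [1, 2.6, 2.6**2]                 # estimate of f(2.6)
```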
Least Squares: Orthogonal polynomials
• Equidistant points xk; k = 0, 1, 2, ..., n
• Minimize: Σ_{k=0}^{n} [f(xk) − Σ_{j=0}^{m} cj φj(xk)]² ⇒ [A]{c}={b}
aij = Σ_{k=0}^{n} φi(xk)φj(xk)  and  bi = Σ_{k=0}^{n} φi(xk)f(xk)
• Choose orthonormal basis functions: known
as Gram’s polynomials, or discrete
Tchebycheff polynomials, denoted Gi(x).
• Normalize the data range from −1 to 1.
• Implies that xi = −1 + 2i/n
Least Squares: Orthogonal polynomials
• Gi(x) is a polynomial of degree i.
Σ_{k=0}^{n} G0(xk)G0(xk) = 1 ⇒ G0(x) = 1/√(n+1)
• Assume G1(x) = d0 + d1x
Σ_{k=0}^{n} (d0 + d1xk)·(1/√(n+1)) = 0 ⇒ d0 = 0, since Σxk = 0
Σ_{k=0}^{n} (d1xk)² = 1 ⇒ d1 = 1/√(Σ_{k=0}^{n} xk²) = 1/√(Σ_{k=0}^{n} (−1 + 2k/n)²)
Gram polynomials
• Therefore:
d1 = 1/√(Σ_{k=0}^{n} (1 + 4k²/n² − 4k/n)) = √(3n/((n+1)(n+2)))
• Recursive relation:
Gi+1(x) = αi x Gi(x) − (αi/αi−1) Gi−1(x) for i = 1, 2, ..., n−1
G0(x) = 1/√(n+1);  G1(x) = x√(3n/((n+1)(n+2)));  αi = (n/(i+1))√((2i+1)(2i+3)/((n−i)(n+i+2)))
Gram polynomials: Example
• From the following data (n=4), estimate f(2.6), using
regression with a quadratic polynomial (m=2):

t     0   1    2    3    4
f(t)  1   0.5  0.2  0.1  0.05882

• Normalize: x = t/2 − 1
• For n = 4, we get G0(x) = 1/√5;  G1(x) = √(2/5)·x;  G2(x) = √(2/7)·(2x² − 1)
• Normal Equations:

1 0 0   c0     Σ G0(xk)f(xk)     0.831290
0 1 0   c1  =  Σ G1(xk)f(xk)  =  −0.721746
0 0 1   c2     Σ G2(xk)f(xk)     0.298702
Gram polynomials: Example
f2(x) = (0.8313/√5) − 0.7217·√(2/5)·x + 0.2987·√(2/7)·(2x² − 1)
• f(t = 2.6) = f(x = 0.3) = 0.1039
• Same as before
(Figure: data and the fitted quadratic.)
• How to estimate the goodness of fit? The coefficient of determination is based on:
St = Σ_{k=0}^{n} (f(xk) − f̄)²,  with  f̄ = Σ_{k=0}^{n} f(xk)/(n+1)
Sr = Σ_{k=0}^{n} (f(xk) − fm(xk))²
Derivation of the normal equations for data:
E = Σ_{k=0}^{n} [f(xk) − Σ_{j=0}^{m} cj φj(xk)]²
• Derivative w.r.t. ci (i from 0 to m):
∂E/∂ci = 0 ⇒ 2 Σ_{k=0}^{n} [f(xk) − Σ_{j=0}^{m} cj φj(xk)]·(−φi(xk)) = 0
...
∂E/∂cm = 0 ⇒ 2 Σ_{k=0}^{n} [f(xk) − Σ_{j=0}^{m} cj xk^j]·(−xk^m) = 0
• Coefficient of determination
r² = (St − Sr)/St
• r is called the correlation coefficient
• r² < 0.3: poor fit; r² > 0.8: good fit
Multiple Regression
• For a function of 2 (or more) variables
(xk, yk, f(xk, yk)) k = 0, 1, 2, ..., n
• Minimize Σ_{i=0}^{n} [f(xi, yi) − Σ_{k=0}^{m2} Σ_{j=0}^{m1} cj,k xi^j yi^k]²
Nonlinear Regression
• Minimize Σ_{k=0}^{n} (f(xk) − fm(xk, c0, c1, ..., cm))²
• The normal equations Ac = b are nonlinear:
A depends on the unknown coefficients, A = A(c)
• May be solved using the Newton method
Numerical Differentiation and Integration
• Given data (xk, f(xk)) k = 0, 1, 2, ..., n
• Numerical Differentiation: estimate the derivatives of the underlying function
• Numerical Integration: estimate the integral; e.g., from measured
flow velocities in a pipe of radius R, estimate the discharge
Numerical Differentiation
• Estimate the derivatives of a function from
given data
(xk , f (xk )) k = 0,1,2,..., n
• Start with the first derivative
• Simplest: The difference of the function values
at two consecutive points divided by the
difference in the x values
• Finite Difference: The analytical derivative has
zero ∆x, but we use a finite value
• What if we want more accurate estimates?
First Derivative
• For simplicity, let us use fi for f(xi)
• Assume that the x’s are arranged in increasing
order (xn > xn−1 > ... > x0).
• For estimating the first derivative at xi:
– Forward difference: fi′ = (fi+1 − fi)/(xi+1 − xi)
– Backward difference: fi′ = (fi − fi−1)/(xi − xi−1)
– Central difference: fi′ = (fi+1 − fi−1)/(xi+1 − xi−1)
First Derivative
• Most of the time, the function is "measured"
at equal intervals
• Assume that xn − xn−1 = xn−1 − xn−2 = ... = x1 − x0 = h
• Then, the first derivative at xi:
– Forward difference: fi′ = (fi+1 − fi)/h
– Backward difference: fi′ = (fi − fi−1)/h
– Central difference: fi′ = (fi+1 − fi−1)/(2h)
First Derivative: Error Analysis
• What is the error in these approximations?
• As an example, if the exact function is a
straight line, the estimate would have no error
• For forward difference, use Taylor’s series:
fi+1 = fi + h f′(xi) + (h²/2)f″(xi) + ... + (h^m/m!)f^[m](xi) + (h^(m+1)/(m+1)!)f^[m+1](ζf),
ζf ∈ (xi, xi+1)
• Derivative at xi, correcting the forward difference with an estimate of f″:
f′(xi) ≈ (fi+1 − fi)/h + (−h)·[(fi+2 − fi+1)/h − (fi+1 − fi)/h]/(2h) = (−3fi + 4fi+1 − fi+2)/(2h)
Combine two estimates: Richardson Extrapolation
• O(h) accurate: f′(xi) = (fi+1 − fi)/h + O(h)
• Write f′(xi) = (fi+1 − fi)/h + E + O(h²)
• and f′(xi) = (fi+2 − fi)/(2h) + 2E + O(h²)
• Eliminate E: 2f′(xi) − f′(xi) = 2(fi+1 − fi)/h − (fi+2 − fi)/(2h)
f′(xi) = (−3fi + 4fi+1 − fi+2)/(2h) + O(h²)
First Derivative: Taylor’s series
fi+1 = fi + h f′(xi) + (h²/2)f″(xi) + (h³/6)f‴(xi) + (h⁴/4!)f⁗(ζf1)
fi+2 = fi + 2h f′(xi) + (4h²/2)f″(xi) + (8h³/6)f‴(xi) + (16h⁴/4!)f⁗(ζf2)
ζf1 ∈ (xi, xi+1) and ζf2 ∈ (xi, xi+2)
Taylor’s series
• O(h²) accurate: fi′ = (−3fi + 4fi+1 − fi+2)/(2h)
Error: −(h²/3)f‴(xi) + O(h³)
• General Method:
fi′ = (1/h)(ci fi + ci+1 fi+1 + ci+2 fi+2)
= [(ci + ci+1 + ci+2)/h] fi + (ci+1 + 2ci+2)f′(xi) + (h/2)(ci+1 + 4ci+2)f″(xi)
+ (h²/6)(ci+1 + 8ci+2)f‴(xi) + ...
• Equate coefficients:
ci + ci+1 + ci+2 = 0
ci+1 + 2ci+2 = 1
ci+1 + 4ci+2 = 0
⇒ (ci, ci+1, ci+2) = (−3/2, 2, −1/2)
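The same coefficient-matching written as a 3×3 linear solve (the matrix rows are the three conditions above):

```python
import numpy as np

M = np.array([[1, 1, 1],    # c_i + c_{i+1} + c_{i+2} = 0
              [0, 1, 2],    # c_{i+1} + 2 c_{i+2} = 1
              [0, 1, 4]])   # c_{i+1} + 4 c_{i+2} = 0
c = np.linalg.solve(M, [0, 1, 0])   # [-1.5, 2.0, -0.5]
```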
Backward difference
• Similarly, for backward difference, O(h²) accurate:
fi′ = (1/h)(ci fi + ci−1 fi−1 + ci−2 fi−2)
= [(ci + ci−1 + ci−2)/h] fi − (ci−1 + 2ci−2)f′(xi) + (h/2)(ci−1 + 4ci−2)f″(xi)
− (h²/6)(ci−1 + 8ci−2)f‴(xi) + ...
• Equate coefficients:
ci + ci−1 + ci−2 = 0
ci−1 + 2ci−2 = −1
ci−1 + 4ci−2 = 0
⇒ fi′ = (3fi − 4fi−1 + fi−2)/(2h);  Error: (h²/3)f‴(xi) + O(h³)
Central Difference
• And, for central difference, O(h⁴) accurate:
fi′ = (1/h)(ci−2 fi−2 + ci−1 fi−1 + ci fi + ci+1 fi+1 + ci+2 fi+2)
= [(ci−2 + ci−1 + ci + ci+1 + ci+2)/h] fi + (−2ci−2 − ci−1 + ci+1 + 2ci+2)f′(xi)
+ (h/2)(4ci−2 + ci−1 + ci+1 + 4ci+2)f″(xi) + (h²/6)(−8ci−2 − ci−1 + ci+1 + 8ci+2)f‴(xi)
+ (h³/24)(16ci−2 + ci−1 + ci+1 + 16ci+2)f⁗(xi) + ...
• Equate coefficients:
ci−2 + ci−1 + ci + ci+1 + ci+2 = 0
−2ci−2 − ci−1 + ci+1 + 2ci+2 = 1
4ci−2 + ci−1 + ci+1 + 4ci+2 = 0
−8ci−2 − ci−1 + ci+1 + 8ci+2 = 0
16ci−2 + ci−1 + ci+1 + 16ci+2 = 0
⇒ fi′ = (fi−2 − 8fi−1 + 0·fi + 8fi+1 − fi+2)/(12h);  Error: (h⁴/30)f^[5](xi) + O(h⁶)
General formulation
• In general, for the nth derivative, using nb points backward and nf points forward:
fi^[n] = (1/h^n) Σ_{j=−nb}^{nf} ci+j fi+j
• Central difference for unequal spacing? fi′ = (fi+1 − fi−1)/(xi+1 − xi−1)