
CS450/ECE 491: Introduction to Scientific Computing

❑ Course webpage:
❑ https://relate.cs.illinois.edu/course/cs450-s22/

❑ Quizzes, homework, lecture notes, and links to recorded lectures
   will be found on the Relate page.

❑ Homework and quizzes will also be submitted on the Relate page.

❑ Exams (2 midterms and a final) will be at the CBTF.


CS450/ECE 491: Introduction to Scientific Computing
❑ Paul Fischer: 4320 Siebel
❑ fischerp@illinois.edu
❑ Off. Hours: 11:30-1 WF (online, for now)

❑ TAs:

Contact us by email if you


have any questions!
Text Book: Scientific Computing, M. Heath

❑ To the extent possible, notation and slides will be


drawn from the text

❑ The lectures will be designed to expand on and


complement the text.

❑ We will use a combination of the text slides plus


additional notes.

❑ Slides will be posted on the Relate page.

❑ Lectures will be recorded.


Scientific Computing

• What is scientific computing?


– Design and analysis of algorithms for numerically solving
mathematical problems in science and engineering
– Traditionally called numerical analysis
• Distinguishing features of scientific computing
– Deals with continuous quantities
– Considers effects of approximations
• Why scientific computing?
– Simulation of natural phenomena
– Virtual prototyping of engineering designs
Simulation Example: Convective Transport

    ∂u/∂t = −c ∂u/∂x      (+ initial conditions and boundary conditions)

(See Fig. 11.1 in text.)

❑ Examples:
   ❑ Ocean currents:
      ❑ Pollution
      ❑ Saline
      ❑ Thermal transport
   ❑ Atmosphere
      ❑ Climate
      ❑ Weather
   ❑ Industrial processes
      ❑ Combustion
      ❑ Automotive engines
      ❑ Gas turbines

❑ Problem Characteristics:
   ❑ Large (sparse) linear systems: n = 10^6 – 10^12

❑ Demands:
   ❑ Accurate approximations
   ❑ Fast (low-cost) algorithms
   ❑ Stable algorithms

< example: convect demo >
Simulation Example: Convective Transport

o Temperature distribution in hot + cold mixing at T-junction


o 100 million dofs: 20 hours runtime on 16384 cores
o Solve several 10^8 x 10^8 linear systems every second
Numerical Simulation

❑ Related course material


❑ Linear systems - chapter 2
❑ Eigenvalues / eigenvectors - chapter 4
❑ Interpolation - chapter 7
❑ Numerical integration/differentiation - chapter 8
❑ Initial value problems - chapter 9
❑ Boundary value problems - chapter 10
❑ Numerical PDEs (simplified) - chapter 11
Data Fitting / Optimization

❑ Examples
❑ Weather prediction (data assimilation)
❑ Image recognition
❑ Etc.

❑ Related course material


❑ Linear / Nonlinear Least Squares - chapter 3
❑ Singular Value Decomposition - chapter 3/4
❑ Nonlinear systems - chapter 5
❑ Optimization - chapter 6
❑ Interpolation - chapter 7
Main Topics / Take-Aways for Chapter 1 (1/2)

• Conditioning of a problem
• Condition number
• Stability of an algorithm
• Errors
– Relative / absolute error
– Total error = computational error + propagated-data error
– Truncation errors
– Rounding errors
• Floating point numbers: IEEE 64
• Floating point arithmetic
– Rounding errors
– Cancellation
Main Topics / Take-Aways for Chapter 1 (2/2)

• Floating point numbers: IEEE 64


• Floating point arithmetic
– Rounding errors
– The Standard Model: fl(a op b) = (a op b)(1 + ε)
– Commutativity and associativity
– Cancellation

Well-Posed Problems

Problem is well-posed if solution


exists
is unique
depends continuously on problem data

Otherwise, problem is ill-posed

Even if problem is well posed, solution may still be


sensitive to input data

Computational algorithm should not make sensitivity worse

Michael T. Heath Scientific Computing 4 / 46



General Strategy

Replace difficult problem by easier one having same or


closely related solution
infinite → finite
differential → algebraic
nonlinear → linear
complicated → simple

Solution obtained may only approximate that of original


problem

Michael T. Heath Scientific Computing 5 / 46



Sources of Approximation

Before computation
modeling
empirical measurements
previous computations

During computation
truncation or discretization
rounding

Accuracy of final result reflects all of these

Uncertainty in input may be amplified by problem → Conditioning!

Perturbations during computation may be amplified by algorithm → Stability!

Michael T. Heath Scientific Computing 6 / 46



Example: Approximations

Computing surface area of Earth using formula A = 4πr²
involves several approximations

    Earth is modeled as sphere, idealizing its true shape
    Value for radius is based on empirical measurements and previous computations
    Value for π requires truncating infinite process
    Values for input data and results of arithmetic operations are rounded in computer

Michael T. Heath Scientific Computing 7 / 46



Absolute Error and Relative Error

Absolute error:  (approximate value) − (true value)

Relative error:  (absolute error) / (true value)

Equivalently, approx value = (true value) × (1 + rel error)

True value usually unknown, so we estimate or bound


error rather than compute it exactly

Relative error often taken relative to approximate value,


rather than (unknown) true value

Michael T. Heath Scientific Computing 8 / 46



Data Error and Computational Error

Typical problem: compute value of function f : ℝ → ℝ for given argument

    x    = true value of input
    f(x) = desired result
    x̂    = approximate (inexact) input
    f̂    = approximate function actually computed

Total error:   f̂(x̂) − f(x)  =  [ f̂(x̂) − f(x̂) ]  +  [ f(x̂) − f(x) ]
                                computational error + propagated data error

Algorithm has no effect on propagated data error

Michael T. Heath Scientific Computing 9 / 46



Data Error and Computational Error

With the same setup, measuring errors in norms gives the bound

    ‖ f̂(x̂) − f(x) ‖  ≤  ‖ f̂(x̂) − f(x̂) ‖  +  ‖ f(x̂) − f(x) ‖
                         computational error   propagated data error

Algorithm has no effect on propagated data error

Michael T. Heath Scientific Computing 9 / 46



Truncation Error and Rounding Error


Truncation error : difference between true result (for actual
input) and result produced by given algorithm using exact
arithmetic
Due to approximations such as truncating infinite series or
terminating iterative sequence before convergence
Rounding error : difference between result produced by
given algorithm using exact arithmetic and result produced
by same algorithm using limited precision arithmetic
Due to inexact representation of real numbers and
arithmetic operations upon them
Computational error is sum of truncation error and
rounding error, but one of these usually dominates

< interactive example >

Michael T. Heath Scientific Computing 10 / 46


Truncation Error Example

❑ Recall Taylor series:

• If f^(k) exists (is bounded) on [x, x + h], k = 0, . . . , m, then there exists
  a ξ ∈ [x, x + h] such that

      f(x + h) = f(x) + h f'(x) + (h²/2) f''(x) + · · · + (h^m/m!) f^(m)(ξ).

• Take m = 2:

      [f(x + h) − f(x)] / h   =   f'(x)   +   (h/2) f''(ξ)
           computable             desired       truncation
                                   result          error

• Can use the Taylor series to generate approximations to f'(x), f''(x),
  etc., by evaluating f at x, x ± h, x ± 2h.

• We then solve for the desired derivative and consider the limit h → 0.

• Truncation error:  | (h/2) f''(ξ) | ≈ (h/2) | f''(x) |  as h → 0.

  – To be precise, (h/2) f''(ξ) = (h/2) f''(x) + O(h²).
    This simply says that, as we zoom in (h → 0), f(x) looks like a line.

Q: Suppose |f''(x)| ≈ 1.
   Can we take h = 10^−30 and expect

      | [f(x + h) − f(x)] / h  −  f'(x) |   ≤   10^−30 / 2   ?

A: Only if we can compute every term in the finite-difference formula
   (our algorithm) with sufficient accuracy.

< example: taylor_demo.m >


Taylor Series   (Very important for SciComp!)

• Taylor series are fundamental to numerical methods and analysis.

• Newton's method, optimization algorithms, and numerical solution of
  differential equations all rely on understanding the behavior of functions
  in the neighborhood of a specific point or set of points.

• In essence, numerical methods convert calculus from the continuous back
  to the discrete.

• (A way of avoiding calculus. :) )
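In the spirit of taylor_demo.m (a minimal Python sketch, not the course demo; the choice f(x) = sin(x) at x = 1 is only for illustration), one can verify numerically that the forward-difference error behaves like (h/2) f''(x) for moderate h:

```python
import numpy as np

f, df, d2f = np.sin, np.cos, lambda t: -np.sin(t)
x = 1.0

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    fd = (f(x + h) - f(x)) / h       # forward-difference approximation to f'(x)
    err = fd - df(x)                 # actual error (truncation-dominated at these h)
    est = 0.5 * h * d2f(x)           # leading-order estimate (h/2) f''(x)
    print(f"h = {h:.0e}   error = {err:+.3e}   (h/2) f''(x) = {est:+.3e}")
```

For these step sizes the two columns agree to a couple of digits; round-off only becomes visible for much smaller h, which is the subject of the following slides.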
Example: Finite Difference Approximation

[Figure: log-log plot of error versus step size h for the forward-difference approximation. The total error is the sum of a truncation error that grows with h and a rounding error that grows as h decreases; the minimum lies where the two balance.]
Michael T. Heath Scientific Computing 12 / 46

Example: Finite Difference Approximation

Error in finite difference approximation


    f'(x) ≈ [ f(x + h) − f(x) ] / h

exhibits tradeoff between rounding error and truncation error

Truncation error bounded by M h/2, where M bounds |f''(t)| for t near x

Rounding error bounded by 2ε/h, where error in function values bounded by ε

Total error minimized when h ≈ 2 √(ε/M)

Error increases for smaller h because of rounding error and increases for larger h because of truncation error
Michael T. Heath Scientific Computing 11 / 46
❑ Round-Off

The computed approximation to the derivative of a C² (twice differentiable) function f has an absolute error satisfying

    e_abs := | δf/δx − f' |  ≲  (h/2) |f''|  +  (2 ε_M / h) |f|,              (1)

where

    δf/δx := [ ⟨f(x + h)⟩ − ⟨f(x)⟩ ] / h                                       (2)

is the one-sided finite difference approximation to f'(x) and ⟨·⟩ indicates the truncated floating point approximation to its argument.

Round-off error:

    ⟨f(x + h)⟩ = f(x + h)(1 + ε),

where ε is a random variable with |ε| ≤ ε_M.

The error in (1) consists of two parts, the truncation error, TE := (h/2)|f''|, and the round-off error, RE := (2 ε_M / h)|f|. For relatively large h, TE dominates, while RE dominates for smaller h.

In the class notes and in the text we plotted e_abs, TE, and RE as a function of h for the case f(x) = sin(x) and x = 1. We found a minimal realizable error (≈ √ε_M) and the corresponding value of h.

Consider now the computed approximation to the second derivative, given by the central-difference formula,

    δ²f/δx² := [ ⟨f(x + h)⟩ − 2 ⟨f(x)⟩ + ⟨f(x − h)⟩ ] / h².                    (3)
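The first-derivative experiment described above is easy to reproduce (a minimal sketch with f(x) = sin(x) and x = 1, as in the notes; ε_M is taken from NumPy):

```python
import numpy as np

f, df = np.sin, np.cos
x = 1.0
eps_M = np.finfo(np.float64).eps      # ~2.2e-16

print("      h         e_abs        TE = (h/2)|f''|    RE = 2 eps_M |f| / h")
for k in range(1, 16):
    h = 10.0 ** (-k)
    e_abs = abs((f(x + h) - f(x)) / h - df(x))
    TE = 0.5 * h * abs(np.sin(x))     # |f''(x)| = |sin(x)|
    RE = 2.0 * eps_M * abs(f(x)) / h
    print(f"   1e-{k:02d}   {e_abs:.3e}     {TE:.3e}         {RE:.3e}")
```

The measured error follows TE for large h and RE for small h, with a minimum of roughly √ε_M near h ≈ 10^−8.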
Example: Finite Difference Approximation

[Figure: the same error-versus-h plot, annotated with the two contributions: truncation error ∼ (h/2) f''(x), which dominates for large h, and rounding error ∼ (2 ε_M / h) f(x), which dominates for small h; the total error is their sum.]
Michael T. Heath Scientific Computing 12 / 46
Round-Off Error
❑ In general, round-off error will prevent us from representing f(x)
and f(x+h) with sufficient accuracy to reach such a result.

❑ Round-off is a principal concern in scientific computing.


(Though once you’re aware of it, you generally know how to avoid
it as an issue.)

❑ Round-off results from having finite-precision arithmetic and finite-


precision storage in the computer. (e.g., how would you store 1/3
in a computer?)

❑ Most scientific computing is done either with 32-bit or 64-bit


arithmetic, with 64-bit being predominant. (IEEE 754-2008)

❑ Round-off error is effectively a source of noise (i.e. perturbation)


at every step of a calculation.
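As a tiny illustration of that last point (my example, not from the notes): 1/3 has no finite binary expansion, so the stored double is already a slightly perturbed input. Python's Fraction exposes the exact value that is actually stored.

```python
from fractions import Fraction

x = 1.0 / 3.0                              # what we type
stored = Fraction(x)                       # the exact rational value the computer keeps
print("stored value:", stored)
rel_err = float(abs(stored - Fraction(1, 3)) / Fraction(1, 3))
print("relative representation error:", rel_err)   # below eps_mach ~ 1.1e-16
```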
Round-Off Error
❑ We are of course all very familiar with round-off error.

❑ When we perform computation with pencil and paper, we


inevitably truncate infinite-decimal results, and propagate
these forward.


Forward and Backward Error

Suppose we want to compute y = f(x), where f : ℝ → ℝ, but obtain approximate value ŷ

Forward error:   Δy = ŷ − y

Backward error:  Δx = x̂ − x,  where f(x̂) = ŷ

Michael T. Heath Scientific Computing 13 / 46


Backward Error Analysis

• User wants:  y = f(x).

• Algorithm produces:  ŷ = f̂(x).

• Backward error analysis asks:

  – Is there an x̂ near x such that f(x̂) = ŷ?

• Why is this useful?

  – Maybe original data known only to tolerance ε (|x − x̂| ≤ ε).

  – In other words, we can't distinguish small errors induced by the
    algorithm from acceptably small errors in the input.

Example: Forward and Backward Error

As approximation to y = √2, ŷ = 1.4 has absolute forward error

    |Δy| = |ŷ − y| = |1.4 − 1.41421 . . . | ≈ 0.0142,

or relative forward error of about 1 percent

Since √1.96 = 1.4, absolute backward error is

    |Δx| = |x̂ − x| = |1.96 − 2| = 0.04,

or relative backward error of 2 percent

Michael T. Heath Scientific Computing 14 / 46



Backward Error Analysis

Idea: approximate solution is exact solution to modified


problem

How much must original problem change to give result


actually obtained?

How much data error in input would explain all error in


computed result?

Approximate solution is good if it is exact solution to nearby


problem

Backward error is often easier to estimate than forward


error

Michael T. Heath Scientific Computing 15 / 46



Example: Backward Error Analysis

Approximating cosine function f(x) = cos(x) by truncating Taylor series after two terms gives

    ŷ = f̂(x) = 1 − x²/2

Forward error is given by

    Δy = ŷ − y = f̂(x) − f(x) = 1 − x²/2 − cos(x)

To determine backward error, need value x̂ such that f(x̂) = f̂(x)

For cosine function, x̂ = arccos(f̂(x)) = arccos(ŷ)

Michael T. Heath Scientific Computing 16 / 46



Example, continued

For x = 1,

    y = f(1) = cos(1) ≈ 0.5403
    ŷ = f̂(1) = 1 − 1²/2 = 0.5
    x̂ = arccos(ŷ) = arccos(0.5) ≈ 1.0472

Forward error:   Δy = ŷ − y ≈ 0.5 − 0.5403 = −0.0403

Backward error:  Δx = x̂ − x ≈ 1.0472 − 1 = 0.0472

Relative forward error ~ 8%
Relative backward error ~ 5%

Michael T. Heath Scientific Computing 17 / 46


Backward Error Analysis for Finite Difference Example

• The algorithm produces ŷ = [f(x + h) − f(x)] / h; backward error analysis asks
  whether there is an x̂ near x at which this is the exact derivative, f'(x̂) = ŷ.

• By Mean Value Theorem, there exists an x̂ ∈ [x, x + h] such that

      f'(x̂) = [ f(x + h) − f(x) ] / h.

  (Assuming f is differentiable on [x, x + h].)

• Backward error for the finite difference approximation is bounded by |h|,
  assuming no round-off error.

Sensitivity and Conditioning

Problem is insensitive, or well-conditioned, if relative


change in input causes similar relative change in solution

Problem is sensitive, or ill-conditioned, if relative change in


solution can be much larger than that in input data

Condition number:

    cond = |relative change in solution| / |relative change in input data|

         = | [f(x̂) − f(x)] / f(x) |  /  | (x̂ − x) / x |   =   |Δy/y| / |Δx/x|

Problem is sensitive, or ill-conditioned, if cond ≫ 1

Michael T. Heath Scientific Computing 18 / 46


Note About Condition Number

❑ It’s tempting to say that a large condition number indicates that a


small change in the input implies a large change in the output.

❑ However, to be dimensionally correct, we need to be more precise.

❑ A large condition number indicates that a small relative change in
   the input implies a large relative change in the output.

Condition Number

Condition number is amplification factor relating relative forward error to relative backward error:

    relative forward error  =  cond × relative backward error

Condition number usually is not known exactly and may vary with input, so rough estimate or upper bound is used for cond, yielding

    relative forward error  ≲  cond × relative backward error

Michael T. Heath Scientific Computing 19 / 46



Example: Evaluating Function

Evaluating function f for approximate input x̂ = x + Δx instead of true input x gives

    Absolute forward error:   f(x + Δx) − f(x) ≈ f'(x) Δx

    Relative forward error:   [ f(x + Δx) − f(x) ] / f(x)  ≈  f'(x) Δx / f(x)

    Condition number:   cond ≈ | f'(x) Δx / f(x) | / | Δx / x |  =  | x f'(x) / f(x) |

Relative error in function value can be much larger or smaller than that in input, depending on particular f and x

Michael T. Heath Scientific Computing 20 / 46



Example: Sensitivity

Tangent function is sensitive for arguments near π/2

    tan(1.57079) ≈ 1.58058 × 10^5
    tan(1.57078) ≈ 6.12490 × 10^4

Relative change in output is a quarter million times greater than relative change in input

For x = 1.57079, cond ≈ 2.48275 × 10^5

Michael T. Heath Scientific Computing 21 / 46


Condition Number Examples

❑ Q: In our finite difference example, where did things go wrong?

Using the formula, cond = | x f'(x) / f(x) |, what is the condition number of the following?

    f(x) = a x

    f(x) = a / x

    f(x) = a + x
Condition Number Examples

    cond = | x f'(x) / f(x) |

For f(x) = ax,     f' = a,         cond = | x a / (a x) | = 1.

For f(x) = a/x,    f' = −a/x²,     cond = | x (a/x²) / (a/x) | = 1.

For f(x) = a + x,  f' = 1,         cond = | x · 1 / (a + x) | = |x| / |a + x|.

• The condition number for (a + x) is < 1 if a and x are of the same sign,
  but it is > 1 if they are of opposite sign, and potentially ≫ 1 if they are
  of opposite sign but close to the same magnitude.
  This ill-conditioning is often referred to as cancellation.

• Subtraction of two positive (or negative) values of nearly the same
  magnitude is ill-conditioned.
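A quick numerical check of the (a + x) case (the values below are only illustrative): as x approaches −a, the condition number |x| / |a + x| grows and the sum retains fewer and fewer correct digits.

```python
a = 1.0
for x in (-0.9, -0.999, -0.999999, -0.999999999999):
    cond = abs(x) / abs(a + x)                      # |x| / |a + x|
    print(f"x = {x:<16}   a + x = {a + x:.6e}   cond = {cond:.1e}")
```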
Condition Number Examples, continued

• Multiplication and division are benign.

• Addition of two positive (or negative) values is also OK.

• In our finite difference example, the culprit is the subtraction, more
  than the division by a small number.

Stability

Algorithm is stable if result produced is relatively


insensitive to perturbations during computation

Stability of algorithms is analogous to conditioning of


problems

From point of view of backward error analysis, algorithm is


stable if result produced is exact solution to nearby
problem

For stable algorithm, effect of computational error is no


worse than effect of small data error in input

Michael T. Heath Scientific Computing 22 / 46



Accuracy

Accuracy : closeness of computed solution to true solution


of problem

Stability alone does not guarantee accurate results

Accuracy depends on conditioning of problem as well as


stability of algorithm

Inaccuracy can result from applying stable algorithm to


ill-conditioned problem or unstable algorithm to
well-conditioned problem

Applying stable algorithm to well-conditioned problem


yields accurate solution

Michael T. Heath Scientific Computing 23 / 46


Examples of Potentially Unstable Algorithms

❑ Examples of potentially unstable algorithms include

❑ Gaussian elimination without pivoting

❑ Using the normal equations to solve linear least squares problems

❑ High-order polynomial interpolation with unstable bases (e.g.,


monomials or Lagrange polynomials on uniformly spaced nodes)
Unavoidable Source of Noise in the Input

❑ Numbers in the computer are represented in finite precision.

❑ Therefore, unless our set of input numbers, x, are perfectly
   representable in the given mantissa, we already have an error
   and our actual input is thus

      x̂ = x + Δx

❑ The next topic discusses the set of representable numbers.

❑ We’ll sometimes refer to this set of “floating point numbers” as F.

❑ We’ll primarily be concerned with two things –


❑ the relative precision,
❑ the maximum absolute value representable.
Floating-Point Numbers
• Floating-point number system is characterized by four integers:

      β        base or radix
      p        precision
      (L, U)   exponent range

• Number is represented as

      x = ± ( d0 + d1/β + d2/β² + · · · + d_{p−1}/β^{p−1} ) β^E

  – Here, we have p digits, d0, d1, . . . , d_{p−1}.
  – Each digit is in the range 0 ≤ di ≤ β − 1.
  – Representations are typically normalized, meaning that d0 ≠ 0.
  – The exponent is in the interval L ≤ E ≤ U.

• Modern computers use base β = 2 (i.e., binary representation).
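For the IEEE double-precision system that Python and NumPy use, these parameters can be read off directly (a small sketch, assuming NumPy is available; the exponent range is converted to the d0.d1d2... convention of these slides):

```python
import sys
import numpy as np

fi = np.finfo(np.float64)
print("beta (radix)     :", sys.float_info.radix)          # 2
print("p (precision)    :", sys.float_info.mant_dig)       # 53 significand bits
print("(L, U)           :", (sys.float_info.min_exp - 1,   # -1022
                             sys.float_info.max_exp - 1))  #  1023
print("spacing of 1.0   :", fi.eps)                        # 2**-52; unit roundoff is half this
print("UFL (min normal) :", fi.tiny)
print("OFL (max finite) :", fi.max)
```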


Floating-Point Number Examples

• Base 10:  x = ±1.2345678901234567 × 10^±9

• Base 2:   x = ±1.0101010101010101 × 2^±1001   (× 2^±9)

• Base 16:  x = ±1.23456789abcdef01 × 16^±9


Floating-Point Numbers, continued
• Portions of floating-point numbers are designated as
– exponent: E

– mantissa: d0 d1 · · · d_{p−1}

– fraction: d1 d2 · · · d_{p−1}

• Sign, exponent, and mantissa are stored in separate


fixed-width fields of each floating-point word.

Typical Floating-Point Systems

Parameters for typical floating-point systems


    system          β    p     L        U
    IEEE SP         2    24   −126      127
    IEEE DP         2    53   −1022     1023
    Cray            2    48   −16383    16384
    HP calculator   10   12   −499      499
    IBM mainframe   16    6   −64       63

Most modern computers use binary (β = 2) arithmetic

IEEE floating-point systems are now almost universal in


digital computers

Michael T. Heath Scientific Computing 26 / 46



Normalization

Floating-point system is normalized if leading digit d0 is


always nonzero unless number represented is zero

In normalized systems, mantissa m of nonzero floating-point number always satisfies 1 ≤ m < β
Reasons for normalization
representation of each number unique
no digits wasted on leading zeros
leading bit need not be stored (in binary system)

Michael T. Heath Scientific Computing 27 / 46


Normalized Mantissa Examples

❑ Decimal:

   ❑ 1.2345
   ❑ 9.814 × 10^−2

❑ Binary:

   ❑ 1.010 × 2^−3   (not normalized: .00101)
   ❑ 1.111 × 2^4
Binary Representation of π
• In 64-bit floating point,

    π64 ≈ 1.1001001000011111101101010100010001000010110100011 × 2^1

• In reality,

    π = 1.10010010000111111011010101000100010000101101000110000100011010 · · · × 2^1

• They will (potentially) differ in the 53rd bit...

    π − π64 = 0.00000000000000000000000000000000000000000000000000000100011010 · · · × 2^1    ← rounding error
Properties of Floating-Point Systems

Floating-point number system is finite and discrete

Total number of normalized floating-point numbers is

    2 (β − 1) β^(p−1) (U − L + 1) + 1   ≈ 2^w,  where w = number of bits in a word

Smallest positive normalized number:  UFL = β^L

Largest floating-point number:  OFL = β^(U+1) (1 − β^(−p))

Floating-point numbers equally spaced only between successive powers of β

Not all real numbers exactly representable; those that are are called machine numbers
Michael T. Heath Scientific Computing 28 / 46

Example: Floating-Point System

Tick marks indicate all 25 numbers in floating-point system


having β = 2, p = 3, L = −1, and U = 1

    OFL = (1.11)₂ × 2^1  = (3.5)₁₀
    UFL = (1.00)₂ × 2^−1 = (0.5)₁₀

At sufficiently high magnification, all normalized


floating-point systems look grainy and unequally spaced

< interactive example >


Example: numbers.m

Michael T. Heath Scientific Computing 29 / 46



Rounding Rules
If real number x is not exactly representable, then it is
approximated by “nearby” floating-point number fl(x)
This process is called rounding, and error introduced is
called rounding error
Two commonly used rounding rules
chop: truncate base-β expansion of x after (p − 1)st digit;
also called round toward zero
round to nearest : fl(x) is nearest floating-point number to
x, using floating-point number whose last stored digit is
even in case of tie; also called round to even
Round to nearest is most accurate, and is default rounding
rule in IEEE systems
< interactive example >
Michael T. Heath Scientific Computing 30 / 46

Machine Precision

Accuracy of floating-point system characterized by unit


roundoff (or machine precision or machine epsilon)
denoted by ✏mach
With rounding by chopping,   ε_mach = β^(1−p)

With rounding to nearest,    ε_mach = (1/2) β^(1−p)

Alternative definition is smallest number ε such that fl(1 + ε) > 1

Maximum relative error in representing real number x within range of floating-point system is given by

    | fl(x) − x | / |x|  ≤  ε_mach

Michael T. Heath Scientific Computing 31 / 46
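The "smallest ε such that fl(1 + ε) > 1" characterization can be probed with a three-line loop (a sketch; note it returns the spacing of 1.0, which for round-to-nearest is twice the unit roundoff):

```python
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:      # keep halving while the sum still registers
    eps /= 2.0
print(eps)                        # 2.220446049250313e-16 = 2**-52 for IEEE double
print(1.0 + eps > 1.0, 1.0 + eps / 2.0 > 1.0)   # True, False
```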


Rounded Numbers in Floating Point Representation

❑ The relationship on the preceding slide,

can be conveniently thought of as:

    fl(x) = x (1 + ε_x),     |ε_x| ≤ ε_mach

❑ The nice thing is the expression above has an equality, which is


easier to work with.

Machine Precision, continued

For toy system illustrated earlier


ε_mach = (0.01)₂ = (0.25)₁₀    with rounding by chopping
ε_mach = (0.001)₂ = (0.125)₁₀  with rounding to nearest

For IEEE floating-point systems

    ε_mach = 2^−24 ≈ 10^−7    in single precision
    ε_mach = 2^−53 ≈ 10^−16   in double precision

So IEEE single and double precision systems have about 7


and 16 decimal digits of precision, respectively

Michael T. Heath Scientific Computing 32 / 46


Advantage of Floating Point

❑ By sacrificing a few bits to the exponent, floating point allows us


to represent a Huge range of numbers….

❑ All numbers have same relative precision.


❑ The numbers are not uniformly spaced.
❑ Many more between 0 and 10 than between 10 and 100!
Relative Precision Example

Let's look at the highlighted entry from the preceding slide.

    x = 3141592653589793238462643383279502884197169399375105820974944.9230781...  =  π × 10^60
    x̂ = 3141592653589793000000000000000000000000000000000000000000000.0000000...  ≈  π × 10^60

    x − x̂ = 238462643383279502884197169399375105820974944.9230781... = 2.3846... × 10^44

          ≈ 0.7590501687441757 × 10^−16 × x
          < 1.110223024625157e−16 × x
          ≈ ε_mach × x.

• The difference between x := π × 10^60 and x̂ := fl(π × 10^60) is large:

      x − x̂ ≈ 2.4 × 10^44.

• The relative error, however, is

      (x − x̂) / x  ≈  2.4 × 10^44 / (π × 10^60)  ≈  0.8 × 10^−16  <  ε_mach

Machine Precision, continued

Though both are “small,” unit roundoff ε_mach should not be confused with underflow level UFL

Unit roundoff ε_mach is determined by number of digits in mantissa of floating-point system, whereas underflow level UFL is determined by number of digits in exponent field

In all practical floating-point systems,

    0 < UFL ≪ ε_mach ≪ OFL

Michael T. Heath Scientific Computing 33 / 46



Machine Precision, continued

    0 < UFL ≪ ε_mach ≪ OFL

• Despite the remarkably small magnitude of UFL, there is a
  (superfluous) mechanism to stuff more numbers between UFL and 0.

• They are called denormalized numbers (leading bit ≠ 1).

• Most (fast) floating-point hardware does not support them and
  operations with them are thousands of times slower than without.

• Practically, any denormalized number ≈ 0, so best not used.

Michael T. Heath Scientific Computing 33 / 46


Summary of Ranges for IEEE Double Precision

    p = 53        ε_mach = 2^−p = 2^−53   ≈ 10^−16
    L = −1022     UFL = 2^L = 2^−1022     ≈ 10^−308
    U = 1023      OFL ≈ 2^U = 2^1023      ≈ 10^308

Q: How many atoms in the Universe?

Q: How many positive fp64 numbers < 1?

Subnormals and Gradual Underflow


Normalization causes gap around zero in floating-point
system
If leading digits are allowed to be zero, but only when
exponent is at its minimum value, then gap is “filled in” by
additional subnormal or denormalized floating-point
numbers

Subnormals extend range of magnitudes representable,


but have less precision than normalized numbers, and unit
roundoff is no smaller
Augmented system exhibits gradual underflow
Michael T. Heath Scientific Computing 34 / 46
Denormalizing: normal(ized) and subnormal numbers

❑ With normalization, the smallest (positive) number you can represent is:

   ❑ UFL = 1.00000… × 2^L = 1. × 2^−1022 ≈ 10^−308

❑ With subnormal numbers you can represent:

   ❑ x = 0.00000…01 × 2^L = 1. × 2^(−1022−52) ≈ 10^−324

❑ Q: Would you want to denormalize??

❑ Cost: Often, subnormal arithmetic handled in software → slooooow (1000X).

   ❑ Number of atoms in universe: ~ 10^80
   ❑ Probably, UFL is small enough.

❑ Similarly, for IEEE DP, OFL ~ 10^308 >> number of atoms in universe.
   → Overflow will never be an issue (unless your solution goes unstable).

Exceptional Values

IEEE floating-point standard provides special values to


indicate two exceptional situations
Inf, which stands for “infinity,” results from dividing a finite
number by zero, such as 1/0
NaN, which stands for “not a number,” results from
undefined or indeterminate operations such as 0/0, 0 * Inf, or Inf/Inf

Inf and NaN are implemented in IEEE arithmetic through


special reserved values of exponent field

• Note: 0 is also a special number --- it is not normalized.

Michael T. Heath Scientific Computing 35 / 46



Floating-Point Arithmetic

Addition or subtraction : Shifting of mantissa to make


exponents match may cause loss of some digits of smaller
number, possibly all of them

Multiplication : Product of two p-digit mantissas contains up


to 2p digits, so result may not be representable

Division : Quotient of two p-digit mantissas may contain


more than p digits, such as nonterminating binary
expansion of 1/10

Result of floating-point arithmetic operation may differ from


result of corresponding real arithmetic operation on same
operands

Michael T. Heath Scientific Computing 36 / 46



Example: Floating-Point Arithmetic

Assume β = 10, p = 6

Let x = 1.92403 × 10^2,  y = 6.35782 × 10^−1

Floating-point addition gives x + y = 1.93039 × 10^2, assuming rounding to nearest

Last two digits of y do not affect result, and with even smaller exponent, y could have had no effect on result

Floating-point multiplication gives x * y = 1.22326 × 10^2, which discards half of digits of true product

Michael T. Heath Scientific Computing 37 / 46



Floating-Point Arithmetic, continued

Real result may also fail to be representable because its


exponent is beyond available range

Overflow is usually more serious than underflow because


there is no good approximation to arbitrarily large
magnitudes in floating-point system, whereas zero is often
reasonable approximation for arbitrarily small magnitudes

On many computer systems overflow is fatal, but an


underflow may be silently set to zero

Michael T. Heath Scientific Computing 38 / 46



Example: Summing Series


Infinite series

    ∑_{n=1}^{∞} 1/n

has finite sum in floating-point arithmetic even though real series is divergent

Possible explanations

    Partial sum eventually overflows
    1/n eventually underflows
    Partial sum ceases to change once 1/n becomes negligible relative to partial sum:

        1/n  <  ε_mach ∑_{k=1}^{n−1} 1/k

Q: How long would it take to realize failure?

< interactive example >

Michael T. Heath Scientific Computing 39 / 46
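In double precision the stagnation point is astronomically far away (on the order of 10^16 terms), so a quick way to watch it happen (an illustration of my own, not from the text) is to sum in single precision, where it occurs after roughly two million terms:

```python
import numpy as np

s = np.float32(0.0)
n = 0
while True:
    n += 1
    t = s + np.float32(1.0) / np.float32(n)
    if t == s:            # 1/n no longer changes the partial sum
        break
    s = t
print("partial sum stops changing at n =", n, "with value", s)   # takes a few seconds
```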
Standard Model for Floating Point Arithmetic

• Ideally, for x, y ∈ F,  x flop y = fl(x op y), with op = +, −, /, *.

• This standard is met by IEEE.

• Analysis is streamlined using the Standard Model:

      fl(x op y) = (x op y)(1 + δ),     |δ| ≤ ε_mach,

  which is more conveniently analyzed by backward error analysis.

• For example, with op = +,

      fl(x + y) = (x + y)(1 + δ) = x(1 + δ) + y(1 + δ).

• With this type of analysis, we can examine, say, floating-point multiplication:

      x(1 + δ_x) · y(1 + δ_y) = x · y (1 + δ_x + δ_y + δ_x δ_y) ≈ x · y (1 + δ_x + δ_y).
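The standard model is easy to check numerically (a sketch with arbitrarily chosen operands): compute x op y exactly with rationals and confirm that the relative error δ of the floating-point result stays below the unit roundoff 2^−53.

```python
from fractions import Fraction

u = 2.0 ** -53                      # unit roundoff for IEEE double, round to nearest
x, y = 0.1, 0.3                     # note: these are the already-rounded stored values
cases = [("x + y", x + y, Fraction(x) + Fraction(y)),
         ("x * y", x * y, Fraction(x) * Fraction(y)),
         ("x / y", x / y, Fraction(x) / Fraction(y))]
for name, fl_res, exact in cases:
    delta = float((Fraction(fl_res) - exact) / exact)
    print(f"{name}: delta = {delta:+.2e}   within bound: {abs(delta) <= u}")
```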



Floating-Point Arithmetic, continued

Ideally, x flop y = fl(x op y), i.e., floating-point arithmetic


operations produce correctly rounded results

Computers satisfying IEEE floating-point standard achieve


this ideal as long as x op y is within range of floating-point
system

But some familiar laws of real arithmetic are not


necessarily valid in floating-point system

Floating-point addition and multiplication are commutative


but not associative

Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) + ε = 1, but 1 + (ε + ε) > 1
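This non-associativity is reproducible in two lines (here ε is chosen as 0.9 × 2^−53, slightly smaller than ε_mach):

```python
eps = 0.9 * 2.0 ** -53
print((1.0 + eps) + eps)       # 1.0                : each addition rounds back to 1
print(1.0 + (eps + eps))       # 1.0000000000000002 : eps + eps is large enough to register
```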

Michael T. Heath Scientific Computing 40 / 46



Cancellation

Subtraction between two p-digit numbers having same sign


and similar magnitudes yields result with fewer than p
digits, so it is usually exactly representable

Reason is that leading digits of two numbers cancel (i.e.,


their difference is zero)

For example,

1.92403 × 10^2 − 1.92275 × 10^2 = 1.28000 × 10^−1

which is correct, and exactly representable, but has only


three significant digits

Michael T. Heath Scientific Computing 41 / 46


Cancellation Example

❑ Cancellation leads to promotion of garbage into “significant” digits

    x     = 1 . 0 1 1 0 0 1 0 1 b  b  g g g g     × 2^e
    y     = 1 . 0 1 1 0 0 1 0 1 b' b' g g g g     × 2^e
    x − y = 0 . 0 0 0 0 0 0 0 0 b'' b'' g g g g   × 2^e
          = b'' . b'' g g g g ? ? ? ? ? ? ? ? ?   × 2^(e−9)
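A concrete instance of this effect (example of my choosing): computing 1 − cos(x) for small x by direct subtraction cancels almost all the leading digits, while the algebraically identical form 2 sin²(x/2) does not.

```python
import math

x = 1.2e-5
direct = 1.0 - math.cos(x)             # cancellation: cos(x) is very close to 1
stable = 2.0 * math.sin(x / 2.0) ** 2  # same quantity, no cancellation
print("direct :", direct)
print("stable :", stable)
print("relative discrepancy:", abs(direct - stable) / stable)   # ~1e-6: many digits lost
```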

Cancellation, continued

Despite exactness of result, cancellation often implies


serious loss of information

Operands are often uncertain due to rounding or other


previous errors, so relative uncertainty in difference may be
large

Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) − (1 − ε) = 1 − 1 = 0 in floating-point arithmetic, which is correct for actual operands of final subtraction, but true result of overall computation, 2ε, has been completely lost

Subtraction itself is not at fault: it merely signals loss of


information that had already occurred
Michael T. Heath Scientific Computing 42 / 46

• Of the basic operations, + − * / , with arguments of the same sign,
  only subtraction has cond. number significantly different from unity.
  Division, multiplication, addition (same sign) are OK.
Michael T. Heath Scientific Computing 42 / 46

Cancellation, continued

Digits lost to cancellation are most significant, leading


digits, whereas digits lost in rounding are least significant,
trailing digits

Because of this effect, it is generally bad idea to compute


any small quantity as difference of large quantities, since
rounding error is likely to dominate result

For example, summing alternating series, such as

    e^x = 1 + x + x²/2! + x³/3! + · · ·

for x < 0, may give disastrous results due to catastrophic cancellation
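For instance (a small sketch; the choice x = −20 and 150 terms is mine), summing the series directly at x = −20 is disastrous, while summing at +20 and taking the reciprocal is fine:

```python
import math

def exp_series(x, terms=150):
    s, t = 0.0, 1.0
    for k in range(terms):
        s += t
        t *= x / (k + 1)        # next term x**(k+1) / (k+1)!
    return s

print("series at x = -20    :", exp_series(-20.0))        # catastrophic cancellation
print("1 / series at x = 20 :", 1.0 / exp_series(20.0))   # accurate
print("math.exp(-20)        :", math.exp(-20.0))          # reference: ~2.061e-09
```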

Michael T. Heath Scientific Computing 43 / 46



Example: Quadratic Formula


Two solutions of quadratic equation ax² + bx + c = 0 are given by

    x = ( −b ± √(b² − 4ac) ) / (2a)

Naive use of formula can suffer overflow, or underflow, or severe cancellation

Rescaling coefficients avoids overflow or harmful underflow

Cancellation between −b and square root can be avoided by computing one root using alternative formula

    x = 2c / ( −b ∓ √(b² − 4ac) )

Cancellation inside square root cannot be easily avoided without using higher precision

< interactive example >
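A minimal sketch with coefficients chosen to provoke the cancellation (b much larger than a and c): the naive formula loses most digits of the small root, while the alternative formula recovers it.

```python
import math

a, b, c = 1.0, 1.0e8, 1.0                 # roots near -1e8 and -1e-8
d = math.sqrt(b * b - 4.0 * a * c)

small_naive = (-b + d) / (2.0 * a)        # -b + d cancels severely
small_alt   = (2.0 * c) / (-b - d)        # same root via the alternative formula
print("naive :", small_naive)             # only a few correct digits
print("stable:", small_alt)               # ~ -1.0e-08
```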
Michael T. Heath Scientific Computing 45 / 46

Example: Standard Deviation


Mean and standard deviation of sequence xᵢ, i = 1, . . . , n, are given by

    x̄ = (1/n) ∑_{i=1}^{n} xᵢ      and      σ = [ (1/(n−1)) ∑_{i=1}^{n} (xᵢ − x̄)² ]^(1/2)

Mathematically equivalent formula

    σ = [ (1/(n−1)) ( ∑_{i=1}^{n} xᵢ² − n x̄² ) ]^(1/2)

avoids making two passes through data

Single cancellation at end of one-pass formula is more damaging numerically than all cancellations in two-pass formula combined
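The effect is easy to see numerically (data chosen to make it visible: mean ≈ 10^8, standard deviation ≈ 1; assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0e8 + rng.standard_normal(10_000)           # mean ~1e8, sigma ~1

n = x.size
xbar = x.sum() / n
two_pass = np.sqrt(np.sum((x - xbar) ** 2) / (n - 1))
one_pass_arg = (np.sum(x ** 2) - n * xbar ** 2) / (n - 1)   # cancels badly
print("two-pass sigma:", two_pass)                # ~1.0
print("one-pass arg  :", one_pass_arg)            # unreliable; can even come out negative
```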
Michael T. Heath Scientific Computing 46 / 46
Finite Differences and Truncation/Round-Off Error

• Taylor Series:¹

      δf/δx := [ f(x + h) − f(x) ] / h  =  f'(x)  +  (h/2) f''(ξ)
                      computable            desired     truncation
                                             result        error

• So, expect  e_abs := | δf/δx − df/dx | = O(h) → 0  as h → 0.

• Computed value of δf/δx, using standard model,

      ⟨δf/δx⟩ := [ ⟨f(x + h)⟩ − ⟨f(x)⟩ ] / h
               = [ f(x + h)(1 + ε₊) − f(x)(1 + ε₀) ] / h
               = [ f(x + h) − f(x) ] / h  +  [ f(x + h) ε₊ − f(x) ε₀ ] / h,

  – ε₊, ε₀ random variables of arbitrary sign with magnitude ≤ ε_M.
  – This computed expression ignores round-off in x, h, x + h and division.
  – Can show (by Taylor series expansions) that those errors are benign.

• Use f(x + h) = f(x) + h f'(ξ) to write the first round-off term as:²

      f(x + h) ε₊ / h  =  ( f(x) + h f'(ξ) ) ε₊ / h  =  f(x) ε₊ / h + f'(ξ) ε₊  ≈  f(x) ε₊ / h.

• Round-off error is thus

      [ f(x + h) ε₊ − f(x) ε₀ ] / h  =  f(x) (ε₊ − ε₀) / h + ε₊ f'(ξ)  ≈  f(x) (ε₊ − ε₀) / h  =  f(x) (ε_M / h) R,

  where R is a random variable with |R| < 2.

• So, computed approximation is

      ⟨δf/δx⟩  =  δf/δx + f(x) (ε₊ − ε₀) / h + O(ε₊)
               ≈  f'(x)  +  (h/2) f''(x)  +  R (ε_M / h) f(x).
                                TE                 RE

• Notice, crucially, that the units of each term on the right match.

• This is a good way to check your work.

¹ Assuming f, f', and f'' are bounded on [x, x + h].
² The unknown value of ξ is different from the one in the original Taylor series expansion.
