
Numerical Algorithms

Samir Moustafa

University of Vienna

October 20, 2022

Samir Moustafa (University of Vienna) Numerical Algorithms


Content

▶ Review
▶ Machine Epsilon
▶ Standard Model of Floating Point Arithmetic

▶ Forward- and Backward Error Analysis for Inner Product



Review: Machine Epsilon

▶ The accuracy of a floating point system is characterized by the machine epsilon εm
▶ With rounding by truncation: εm = β^(1−p)
▶ With rounding to nearest: εm = (1/2) β^(1−p)
* β: base, p: mantissa length
▶ The maximum relative error in representing x ∈ R within the range of a given floating point system is

|fl(x) − x| / |x| ≤ εm

▶ Equivalently,
fl(x) = x(1 + δ) , |δ| ≤ εm



Review: Machine Epsilon

An alternative definition:
▶ εm is the smallest number ϵ satisfying fl(1 + ϵ) > 1

where fl : R → M is the mapping into a given floating point system
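This alternative definition can be checked directly in double precision: a minimal sketch that halves a candidate ε until fl(1 + ε) no longer exceeds 1 (variable names are illustrative).

```python
import sys

# Find the smallest power of two eps such that fl(1 + eps) > 1,
# by halving until adding eps/2 no longer changes the sum.
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0

print(eps)                            # 2^(-52) for IEEE double precision
print(eps == sys.float_info.epsilon)  # True
```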



Review: IEEE Standard for Floating-Point Arithmetic
▶ Single Precision: 32 bit word
▶ 1 bit sign
▶ 8 bit exponent
▶ 23 bit mantissa ⇒ εm = 2^(−23) ≈ 1.2 · 10^(−7)
▶ Double Precision: 64 bit word
▶ 1 bit sign
▶ 11 bit exponent
▶ 52 bit mantissa ⇒ εm = 2^(−52) ≈ 2.2 · 10^(−16)
▶ Half Precision: 16 bit word
▶ 1 bit sign
▶ 5 bit exponent
▶ 10 bit mantissa ⇒ εm = 2^(−10) ≈ 9.8 · 10^(−4)
▶ bfloat16: 16 bit word (e. g., Intel AI accelerators, Intel FPGAs, Google Cloud TPUs, etc.)
▶ 1 bit sign, 8 bit exponent, 7 bit mantissa ⇒ εm = 2^(−7) ≈ 7.8 · 10^(−3)
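Assuming NumPy is available, the machine epsilons above can be read off programmatically (`np.finfo(dtype).eps` reports exactly the 2^(1−p) values listed):

```python
import numpy as np

# Machine epsilon for the IEEE formats listed above (eps = 2^(1-p))
for dtype, expected in [(np.float16, 2.0 ** -10),
                        (np.float32, 2.0 ** -23),
                        (np.float64, 2.0 ** -52)]:
    eps = np.finfo(dtype).eps
    print(dtype.__name__, eps)
    assert eps == expected
```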
Standard Model of Floating Point Arithmetic

▶ For performing rounding error analysis, some assumptions have to be made about the accuracy of basic floating point operations
▶ These are embodied in the following model:

fl(x op y) = (x op y)(1 + δ) , |δ| ≤ εm

with x, y ∈ M and op = +, −, ·, /

▶ The computed value of x op y is “as good as” the rounded exact answer
▶ This is consistent with the IEEE floating-point standard
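The model can be probed empirically in double precision by comparing each computed result with the exact rational value: a sketch using Python's `fractions` module, with εm = 2^(−52) as above.

```python
from fractions import Fraction
import random

EPS = Fraction(2) ** -52  # machine epsilon for double precision

random.seed(0)
for _ in range(1000):
    x, y = random.random(), random.random()
    for computed, exact in [(x + y, Fraction(x) + Fraction(y)),
                            (x * y, Fraction(x) * Fraction(y))]:
        # fl(x op y) = (x op y)(1 + delta)  =>  delta = fl / exact - 1
        delta = Fraction(computed) / exact - 1
        assert abs(delta) <= EPS
```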



Computing Inner Products

▶ Consider the standard inner product, s = x⊤ y with x, y ∈ Rn

▶ We want to derive error bounds for the computed value ŝ, using the model introduced above
▶ The order of evaluation matters for the analysis (and may also affect the computed result) → assume it to be from left to right

▶ Let si = x1 y1 + x2 y2 + . . . + xi yi be the i-th partial sum
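The left-to-right evaluation assumed in the analysis is just the obvious accumulation loop; `dot_lr` is an illustrative name.

```python
def dot_lr(x, y):
    """Inner product evaluated left to right: s_i = s_{i-1} + x_i * y_i."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi  # each * and += incurs its own rounding error
    return s

print(dot_lr([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```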



Computing Inner Products

▶ It follows that

ŝ1 = fl(x1 y1 ) = x1 y1 (1 + δ1 )
ŝ2 = fl(ŝ1 + x2 y2 ) = (ŝ1 + x2 y2 (1 + δ2 ))(1 + δ3 )
= (x1 y1 (1 + δ1 ) + x2 y2 (1 + δ2 ))(1 + δ3 )
= x1 y1 (1 + δ1 )(1 + δ3 ) + x2 y2 (1 + δ2 )(1 + δ3 )

with |δi | ≤ εm , i ∈ {1, 2, 3}


▶ We can drop the distinction between the δi and write

1 + δi ≡ 1 ± δ

with 0 ≤ δ ≤ εm



Computing Inner Products

▶ Therefore,

ŝ3 = fl(ŝ2 + x3 y3) = (ŝ2 + x3 y3 (1 ± δ))(1 ± δ)
= x1 y1 (1 ± δ)^3 + x2 y2 (1 ± δ)^3 + x3 y3 (1 ± δ)^2

▶ Overall, we have

ŝn = x1 y1 (1 ± δ)^n + x2 y2 (1 ± δ)^n + x3 y3 (1 ± δ)^(n−1) + . . . + xn yn (1 ± δ)^2

▶ How can we further simplify this?



Computing Inner Products

For further simplification, we can use the following lemma:

▶ Lemma: If |δi| ≤ εm and ρi = ±1 for i ∈ {1, . . . , n} and nεm < 1, then

∏_{i=1}^{n} (1 + δi)^(ρi) = 1 + θn ,

where

|θn| ≤ nεm / (1 − nεm) =: γn

▶ Note the assumptions!
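A quick numerical sanity check of the lemma (not a proof): draw random |δi| ≤ εm and signs ρi, and confirm the accumulated product stays within γn of 1. A deliberately coarse εm and the choice n = 50 are assumptions of the sketch, made so the quantities are visible; note n·εm < 1 holds as the lemma requires.

```python
import random

eps = 2.0 ** -10   # deliberately coarse "machine epsilon" (assumption)
n = 50             # must satisfy n * eps < 1; here n * eps ~ 0.049
gamma_n = n * eps / (1.0 - n * eps)

random.seed(1)
for _ in range(1000):
    prod = 1.0
    for _ in range(n):
        delta = random.uniform(-eps, eps)  # |delta_i| <= eps
        rho = random.choice((1, -1))       # rho_i = +-1
        prod *= (1.0 + delta) ** rho
    theta_n = prod - 1.0
    assert abs(theta_n) <= gamma_n
```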



Computing Inner Products

▶ Applying this lemma to

ŝn = x1 y1 (1 ± δ)^n + x2 y2 (1 ± δ)^n + x3 y3 (1 ± δ)^(n−1) + . . . + xn yn (1 ± δ)^2 ,

we obtain the following backward error result:

ŝn = x1 y1 (1 + θn ) + x2 y2 (1 + θn′ ) + x3 y3 (1 + θn−1 )


+ . . . + xn yn (1 + θ2 )

▶ The computed inner product is exact for a set of perturbed input data: (y1 (1 + θn), y2 (1 + θn′), . . . , yn (1 + θ2)) and (x1, x2, . . . , xn)

▶ Each relative perturbation is bounded by γn = nεm / (1 − nεm)

▶ Note: We could also perturb the xi and leave the yi untouched



Computing Inner Products

▶ More generally, for any order of evaluation, we have

fl(x⊤ y) = (x + ∆x)⊤ y = x⊤ (y + ∆y)

with |∆x| ≤ γn |x| , |∆y| ≤ γn |y|

where “|x|” denotes the vector with elements |xi | and inequalities
between vectors hold componentwise.
▶ This yields the absolute forward error bound

|x⊤y − fl(x⊤y)| ≤ γn ∑_{i=1}^{n} |xi yi| = γn |x|⊤|y|
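The absolute bound can be verified against the exact inner product, computed in rational arithmetic (a sketch in double precision, with γn as defined above; floats convert to `Fraction` exactly):

```python
from fractions import Fraction
import random

n = 1000
eps = 2.0 ** -52
gamma_n = n * eps / (1.0 - n * eps)

random.seed(2)
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Computed inner product, left to right in double precision
s_hat = 0.0
for xi, yi in zip(x, y):
    s_hat += xi * yi

# Exact inner product and exact |x|^T |y| via rational arithmetic
s_exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))
abs_xy = sum(abs(Fraction(xi) * Fraction(yi)) for xi, yi in zip(x, y))

# |x^T y - fl(x^T y)| <= gamma_n * |x|^T |y|
assert abs(Fraction(s_hat) - s_exact) <= Fraction(gamma_n) * abs_xy
```

In practice the observed error is far below the bound: γn grows linearly in n, while the error of a random sum typically grows much more slowly.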



Computing Inner Products

▶ We get the relative forward error bound

|x⊤y − fl(x⊤y)| / |x⊤y| ≤ γn |x|⊤|y| / |x⊤y|
▶ For y = x we have [Why? All products xi² are nonnegative, so |x|⊤|x| = x⊤x and the factor |x|⊤|y| / |x⊤y| equals 1]

|x⊤x − fl(x⊤x)| / |x⊤x| ≤ γn ,

so high relative accuracy is always obtained.


▶ In general, however, high accuracy is not guaranteed if
|x⊤ y| ≪ |x|⊤ |y|



Computing Inner Products

[Figure: log-scale plot (vertical axis from 10^(−15) to 10^(−3), horizontal axis from 0 to 10 000), presumably the relative error of the computed inner product versus the vector length n]



Computing Inner Products

Concluding remarks:

▶ Constants in the error bounds can be reduced by specific implementation strategies for computing the inner product

▶ Computation of the outer product xy⊤ is not backward stable
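One such implementation strategy, sketched here as an illustration (`pairwise_dot` is an assumed name): pairwise (recursive) evaluation, whose error constant grows roughly like log2(n) rather than the factor n appearing in γn for left-to-right evaluation.

```python
def pairwise_dot(x, y):
    """Inner product by recursive halving.

    The summation tree has depth ~log2(n), so each product passes
    through only ~log2(n) rounding errors instead of up to n.
    """
    n = len(x)
    if n == 1:
        return x[0] * y[0]
    m = n // 2
    return pairwise_dot(x[:m], y[:m]) + pairwise_dot(x[m:], y[m:])

print(pairwise_dot([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))  # 10.0
```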

