02 Precision

Summary of arithmetic operations
• Given a fixed number of digits of precision:
• Multiplication – preserves significant digits since the exponent and
mantissa are calculated separately
• Addition – significant figures in one operand may be chopped since
both operands must have the same (highest) exponent
• Least significant figures of one operand are assumed to be insignificant in
the result, which has the higher exponent
• Multiplication is guaranteed to preserve significant figures, where the last figure may or may not be rounded
• Addition may preserve fewer digits if significant figures are eliminated through addition and subtraction
• Loss of precision
• Catastrophic cancellation
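The points above can be seen in a toy three-digit decimal format. The sketch below uses Python's `decimal` module to emulate it; the operand values are illustrative assumptions and do not come from the slides.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Three significant decimal digits, mimicking a toy floating-point format.
ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)

# Multiplication: the exact product 5.6088 is rounded to three digits.
product = ctx.multiply(Decimal("1.23"), Decimal("4.56"))

# Addition: the exact sum 12304.56 must fit the larger exponent,
# so every digit of the smaller operand is chopped away.
total = ctx.add(Decimal("12300"), Decimal("4.56"))

# Subtraction of nearly equal values: the leading digits cancel
# and a single significant digit is left.
difference = ctx.subtract(Decimal("1.23"), Decimal("1.22"))

print(product)
print(total)
print(difference)
```

The subtraction is exact here, yet its result carries only one significant digit. Any earlier rounding error in the operands would dominate it, which is what makes cancellation catastrophic.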
Loss of precision example
• Evaluate the following using three digits of precision:
1) Evaluate each term
2) Round and chop
3) Sum terms
• What is the expected value (at three digits of precision)?
• What is the relative error?

Reformulation
• Reformulate the expression to reduce loss of precision:
1) Multiply by the conjugate
• Note the loss of precision in the numerator

2) Apply the trigonometric identity:
3) Re-evaluate
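The expression on this slide did not survive extraction. As an assumed stand-in that follows the same three steps, consider `1 - cos(x)` for small `x`. Multiplying by the conjugate `1 + cos(x)` gives `(1 - cos^2 x) / (1 + cos x)`, whose numerator still cancels. The identity `sin^2 x = 1 - cos^2 x` then removes the subtraction. The function names below are hypothetical.

```python
import math

def one_minus_cos_naive(x):
    # Subtracts two nearly equal numbers when x is small.
    return 1.0 - math.cos(x)

def one_minus_cos_stable(x):
    # (1 - cos x)(1 + cos x) / (1 + cos x) = sin^2 x / (1 + cos x)
    s = math.sin(x)
    return s * s / (1.0 + math.cos(x))

x = 1e-8
reference = x * x / 2.0  # leading Maclaurin term; the next term is about x**4 / 24

print(one_minus_cos_naive(x))
print(one_minus_cos_stable(x))
print(reference)
```

The stable form has its own weak spot near `x = pi`, where `1 + cos(x)` approaches zero, so a reformulation should be chosen for the range of inputs that matters.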
Reformulation Example
1) Identify possible loss of precision
2) Find a more stable formulation
Identify a potential loss of precision and reformulate:
The final operation can cause catastrophic cancellation when the input is large
Reformulation Example 2
• Identify a potential loss of precision and reformulate:
Loss of precision occurs for large (relative to )

• Solve by rationalizing the numerator
Test for
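The expression for this example was lost in extraction. A minimal sketch of rationalizing the numerator uses the assumed stand-in `sqrt(x + 1) - sqrt(x)`, which loses precision when `x` is large relative to 1. Multiplying and dividing by `sqrt(x + 1) + sqrt(x)` turns the difference into `1 / (sqrt(x + 1) + sqrt(x))`. The function names are hypothetical.

```python
import math

def sqrt_difference_naive(x):
    # Two nearly equal square roots are subtracted for large x.
    return math.sqrt(x + 1.0) - math.sqrt(x)

def sqrt_difference_stable(x):
    # Rationalized numerator: (x + 1) - x = 1, so no subtraction remains.
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))

x = 1e12
reference = 1.0 / (2.0 * math.sqrt(x))  # accurate to about 1/(4x) in relative terms

naive_rel_error = abs(sqrt_difference_naive(x) - reference) / reference
stable_rel_error = abs(sqrt_difference_stable(x) - reference) / reference
print(naive_rel_error, stable_rel_error)
```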
Trigonometric Identities
• Pythagorean theorem
$x^2 + y^2 = 1$ for a point $(x, y)$ on the unit circle
• Angle sum and difference identities

Reformulation Example 3
• Identify a potential loss of precision and reformulate:
Loss of precision can occur for small values of
simplify...
Test for and

Review of Taylor Series
• Replace functions with their Taylor approximations
• Importance in Numerical Methods
• Eliminate cancellation by replacing function with Taylor series
• Replace roundoff error with truncation error
• Bounded truncation error!
Review of Taylor Series
• Maclaurin Series
• Polynomial representation of a differentiable function centered around zero
• Creating an $n$th-order Maclaurin polynomial requires that the function be differentiable $n$ times
• What is ? What is ?
General Taylor Series
• Identical to a Maclaurin series shifted to the expansion point
• The Maclaurin series is the Taylor series where the expansion point is zero
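For reference, the standard form of the two polynomials is written out below. The slides' own notation did not survive extraction, so the symbols are assumptions: $f$ is the function, $a$ the expansion point, and $n$ the order.

```latex
P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k
\qquad \text{(Taylor polynomial about } a \text{)}

P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k
\qquad \text{(Maclaurin polynomial, } a = 0 \text{)}
```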
Taylor series approximations
• The order of the series is the order of the highest term
• Calculate the 2nd-order Maclaurin polynomial for:
• Calculate the 2nd-order Taylor polynomial at

Truncation error
• Compute the 2nd-order Maclaurin polynomial for:
• The error is on the order of the highest-order term not represented:
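• Error is on the order of the step size raised to one more than the order of the polynomial, where the step size is the distance from the expansion point
• Big-O notation
• Error behaves like the polynomial
• This can be thought of as an upper bound: the error is at most some constant times that power of the step size
• A smaller step size provides a smaller error
• If the step size is less than 1, a higher order provides a smaller error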
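The scaling of the truncation error can be checked numerically. The function for this slide was lost in extraction, so the sketch below assumes `exp(x)` as a stand-in and measures how the error of its 2nd-order Maclaurin polynomial changes when `x` is halved. The helper names are hypothetical.

```python
import math

def maclaurin_exp(x, order):
    """Maclaurin polynomial of exp(x) up to and including the x**order term."""
    return sum(x**k / math.factorial(k) for k in range(order + 1))

def truncation_error(x, order):
    """Absolute difference between exp(x) and its truncated series."""
    return abs(math.exp(x) - maclaurin_exp(x, order))

order = 2
ratio = truncation_error(0.1, order) / truncation_error(0.05, order)

# The first omitted term is x**3 / 3!, so halving x should shrink
# the error by roughly 2**(order + 1) = 8.
print(ratio)
```

The ratio is not exactly 8 because the omitted higher-order terms also contribute, but it approaches 8 as `x` shrinks.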
Remainder Term
• Taylor’s Theorem with a Remainder Term
where the derivative in the remainder is evaluated at some value between the expansion point and the evaluation point

• The term on the right is known as the Lagrange remainder
• Apply Taylor’s Theorem with a Remainder Term to our existing error term:
• We can bound the error to some multiple of the magnitude of the function – control truncation error
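The theorem's formula was lost in extraction. Its standard statement is given below with assumed symbols: $a$ is the expansion point, $n$ the order, and $\xi$ the unknown point at which the remainder's derivative is evaluated. The second line is the bound that follows when the $(n+1)$th derivative is bounded by $M$ on the interval.

```latex
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k
     + \frac{f^{(n+1)}(\xi)}{(n+1)!} (x - a)^{n+1},
\qquad \xi \text{ between } a \text{ and } x

|R_n(x)| \le \frac{M}{(n+1)!} \, |x - a|^{n+1},
\qquad M = \max_{t \text{ between } a \text{ and } x} \left| f^{(n+1)}(t) \right|
```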
Approximating special functions
• Calculate the Maclaurin series for
• How does the error of a 7th order approximation change if we halve ?
The maximum error is reduced by

• What is the maximum error for values of ?
Since , we can use the Lagrangian remainder to bound
at
Using Maclaurin Series
• Preserve precision in
• Loss of precision occurs when is small
• Can compute to arbitrary precision

• Turn roundoff error into truncation error
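The function on this slide was lost in extraction. A minimal sketch of the idea uses the assumed stand-in `exp(x) - 1`, which cancels badly for small `x`. Dropping the leading 1 from the Maclaurin series of `exp(x)` leaves `x + x^2/2! + x^3/3! + ...`, which involves no subtraction. The function name and the default number of terms are hypothetical choices.

```python
import math

def exp_minus_one_series(x, terms=4):
    """Sum x + x^2/2! + ... + x^terms/terms!, the series of exp(x) - 1."""
    total = 0.0
    term = 1.0
    for k in range(1, terms + 1):
        term *= x / k      # builds x**k / k! incrementally
        total += term
    return total

x = 1e-10
naive = math.exp(x) - 1.0          # roundoff error from cancellation
series = exp_minus_one_series(x)   # truncation error only
reference = math.expm1(x)          # library routine designed for this case

print(naive, series, reference)
```

More terms reduce the truncation error further, so for small `x` the accuracy is limited only by how many terms are summed.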
Using Maclaurin Series
• Compute
Mean Value Theorem
• If $f$ is a continuous function on the closed interval $[a, b]$ and possesses a derivative at each point of the open interval $(a, b)$, then $f(b) - f(a) = f'(\xi)(b - a)$ for some $\xi$ in $(a, b)$.
• Unless you have a better approximation, just use
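• Remove the catastrophic cancellation using the Mean Value Theorem for large $x$ and small $\epsilon$:
[equation garbled in extraction; surviving terms: $x^2 - \epsilon$, $x^2 + \epsilon$, $2\epsilon$, $\epsilon e^{(\cdot)}$]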
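The slide's own expression cannot be recovered, so the sketch below applies the Mean Value Theorem to an assumed stand-in, `log(x + eps) - log(x)` with large `x` and small `eps`. The theorem replaces the difference with `f'(xi) * eps`. The true `xi` is unknown, so using the midpoint of the interval is an assumption made here. The function names are hypothetical.

```python
import math

def log_difference_naive(x, eps):
    # Two nearly equal logarithms are subtracted.
    return math.log(x + eps) - math.log(x)

def log_difference_mvt(x, eps):
    # MVT: f(b) - f(a) = f'(xi) * (b - a), with f = log and f'(t) = 1 / t.
    # xi lies somewhere in (x, x + eps); the midpoint is an assumed choice.
    xi = x + eps / 2.0
    return eps / xi

x, eps = 1e8, 1e-4
reference = math.log1p(eps / x)  # log(1 + eps/x), accurate for tiny arguments

print(log_difference_naive(x, eps))
print(log_difference_mvt(x, eps))
print(reference)
```

Because the interval is so narrow, the derivative barely changes across it, and any point in the interval gives nearly the same result.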
Quadratic Equation
• Find the roots of a polynomial: $ax^2 + bx + c = 0$
• Quadratic Equation: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
• Two cases that can cause computational problems

• If $a = 0$, then the QE is undefined. The polynomial isn’t a quadratic, it’s linear.
• If $b^2 \gg 4ac$, catastrophic cancellation can occur in the numerator
• If $b > 0$, then $-b + \sqrt{b^2 - 4ac}$ can cause precision loss
• If $b < 0$, then $-b - \sqrt{b^2 - 4ac}$ can cause precision loss
• aka $\sqrt{b^2 - 4ac} \approx |b|$ – this root is small, and many significant digits are cancelled
Catastrophic Cancellation in QE
• Find the roots of the polynomial
IEEE 64-bit float:
IEEE 32-bit float:
• Relative error =
QE Reformulation
• When $b^2 \gg 4ac$, there is one small root
• Assume $b > 0$. We can reformulate the small root to make the equation stable.
QE Simplification
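• For $b > 0$, we have two roots given by:
• The large root, $x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$
• The small root, $x_2 = \frac{-2c}{b + \sqrt{b^2 - 4ac}}$
• Simplify the calculation – express $x_2$ in terms of $x_1$.
• Reformulate $x_1$: $b + \sqrt{b^2 - 4ac} = -2a x_1$
• Substitute into $x_2$: $x_2 = \frac{-2c}{-2a x_1} = \frac{c}{a x_1}$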
Standard QE Algorithm
• For $b > 0$:
$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$, $x_2 = \frac{-2c}{b + \sqrt{b^2 - 4ac}}$
• For $b < 0$:
$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$, $x_2 = \frac{-2c}{b - \sqrt{b^2 - 4ac}}$
• If
where
then
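The case split on the sign of $b$ can be sketched in Python as follows. The function names, the handling of $b = 0$, and the choice to reject complex roots are assumptions added for a self-contained example and are not taken from the slides.

```python
import math

def stable_quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 without subtracting near-equal terms."""
    if a == 0:
        raise ValueError("a == 0: the polynomial is linear, not quadratic")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("complex roots are outside the scope of this sketch")
    root = math.sqrt(disc)
    if b > 0:
        x1 = (-b - root) / (2.0 * a)   # -b and -root have the same sign
        x2 = (-2.0 * c) / (b + root)   # small root via the rationalized form
    elif b < 0:
        x1 = (-b + root) / (2.0 * a)
        x2 = (-2.0 * c) / (b - root)
    else:
        # b == 0: the roots are symmetric and no cancellation can occur.
        x1 = root / (2.0 * a)
        x2 = -x1
    return x1, x2

def naive_small_root(a, b, c):
    """Textbook formula for the root that cancels when b > 0."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

a, b, c = 1.0, 1e8, 1.0            # illustrative values with b**2 >> 4*a*c
x1, x2 = stable_quadratic_roots(a, b, c)
print(x1, x2, naive_small_root(a, b, c))
```

With these coefficients the small root is close to `-1e-8`. The stable form recovers it to nearly full precision, while the naive formula is off by a large fraction of the root itself.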
