You are on page 1of 6

Assignment 1: Solutions

Applied Math 205


October 17, 2007

1. (30 points) The length of the hypotenuse (h) of a right-angle triangle is given by
p
h = a2 + b 2 , (1)

where a and b are the sides of the triangle constituting the right angle. Write a MATLAB function
myhypot(a,b), which computes h given a and b using only double precision arithmetic. In your
solutions, draw a flow chart of the algorithm used. Notice that a straight-forward implementation of
(1) is susceptible to numerical overflow, i.e., if either |a| or |b| are close to the largest (or smallest)
representable machine number, a2 will be too big to be represented and the algorithm will fail even if
the expected result can in fact be represented. As a test in double precision, use the values a = 1×10300
and b = 2 × 10300 for which your function should yield accurate results. The problem of computing
the absolute value of a complex number is exactly the same.
A generalized version of this algorithm, computes the 2-norm of an n-dimensional vector, defined as
v
u n
uX
kxk2 = t x2i . (2)
i=0

In the same spirit as myhypot, write a MATLAB function mynorm(v) to compute the 2-norm of vector
represented as an array in the variable v.
Useful MATLAB built-in functions are max, min and sqrt.
Solution:
myhypot.m

function h = myhypot(a, b)
m = min(a,b);
n = max(a,b);
h = n.*sqrt(1 + (m/n).^2);

mynorm.m

function h = mynorm(v)
vmax = max(abs(v));
u = v./vmax;
h = vmax.*sqrt(sum(u.^2));

2. (40 points) Write a double precision MATLAB function myexp(x) to evaluate ex using the series
expansion

x x2 x3 X xn
e =1+x+ + + ··· = . (3)
2! 3! n=0
n!

1
Provide a flow chart for the algorithm you implemented in the function.
Plot the absolute and relative error (compared with the built-in function exp(x) for a range of x from
-100 to 100 on a log-log scale. Briefly explain the result.
The way established numerical libraries compute the exponential is not through a series expansion but
by using the identities:
ex+y = ex ey and eax = (ex )a . (4)
For example, a way of calculating e123 is
2
+2×101 +3×100
e123 = e1×10 = e100 × (e10 )2 × (e1 )3 , (5)

provided e100 , e10 and e are known. Develop and write a function myexp2(x) that evaluates the
exponential using decomposition of decimal numbers in the exponent similar to shown in (5). Your
function should also work for negative values of x.
The values of the exponentials of powers of 10 are provided on the course website through the file
expm1.dat..Since ex ≈ 1 for x ≪ 1, ex − 1 is tabulated in the file to provide better accuracy if required.
This table can be loaded in MATLAB by the command load ’expm1.dat’, or you can hardwire the
table in your function by copy-pasting. You may find the MATLAB built-in functions like rem, mod,
fix, round, ceil and floor useful for decomposing x in multiples of powers of 10. Also include the
results of this function in the previous plot.
Standard libraries use a binary version of this algorithm for exponential, trigonometric and hyperbolic
functions., which you may verify, does not require the use of the power-of (ˆ-operator in MATLAB)
function. An efficient implementation of the power-of function is instructive. How the table itself is
generated for the first time, is also an interesting question. However, we will not bother ourselves too
much with these details, even though you have enough background to answer these issues.
Solution:
myexp.m

function mysum = myexp(x)


% Does not work for vectors x
mysum = 1;
myterm = x;
n = 2;
while(mysum+myterm~=mysum)
mysum = mysum + myterm;
myterm = myterm.*x./n;
n = n+1;
end

myexp2.m

function y = myexp2(x)
if(x<0) % For negative values of x, compute 1/exp(-x)
y = 1./myexp2(-x);
elseif(x==0)
y = 1;
else
dat = [ ... % expm1.dat copied verbatim
9.999999999999999999805e-10 1.000000000500000136887e-09; % 1
1.000000000000000000001e-08 1.000000005000000105234e-08; % 2
9.999999999999999999846e-08 1.000000050000001591961e-07; % 3

2
Figure 1: Flowchart for series summation

3
80
10
Absolute in myexp
Absolute in myexp2
Relative in myexp
60
10 Relative in myexp2

40
10

20
10

Error
0
10

−20
10

−40
10

−60
10
−100 −80 −60 −40 −20 0 20 40 60 80 100
x

Figure 2: Absolute and relative errors in the various functions.

1.000000000000000000036e-06 1.000000500000166516403e-06; % 4
1.000000000000000000078e-05 1.000005000016666821257e-05; % 5
1.000000000000000000078e-04 1.000050001666708378648e-04; % 6
1.000000000000000000064e-03 1.000500166708341662908e-03; % 7
1.000000000000000000064e-02 1.005016708416805841508e-02; % 8
1.000000000000000000081e-01 1.051709180756476291752e-01; % 9
1.000000000000000000108e+00 1.718281828459045312840e+00; % 10
1.000000000000000000087e+01 2.202546579480671789497e+04; % 11
1.000000000000000000069e+02 2.688117141816135609425e+43; % 12
];
dat = double(dat);
n = floor(log10(x)); % x = quotient x 10^n + remain
divisor = dat(10 + n, 1); % divisor = 10^n
quotient = floor(x/divisor); %
remain = x - (quotient*divisor);
y = (1+dat(10 + n, 2))^quotient;
x = remain;
n = floor(log10(x));
while(remain>0 && n>-10) % repeat the same process with the
divisor = dat(10 + n, 1); % remainder and multiply the result
quotient = floor(x/divisor); % to the answer at each step
remain = x - (quotient*divisor);
y = y * (1+ dat(10 + n, 2))^quotient;
x = remain;
n = floor(log10(x));
end %while
end %if

Explanation: See figure 2. The absolute error using myexp.m is seen to be increasing exponentially with
|x|. But it is incorrect to conclude in general that the computation is not accurate. For positive values
of x, the absolute error may seem large, but in comparison to the value of theexp(x), it is accurate
to machine precision. However, for negative values of x the value of the exponential itself is small,
thus the relative error is also large. Hence we conclude that the series summation loses accuracy for

4
negative values of x. This loss of accuracy can be attributed to the alternation of signs in the series,
the large magnitudes of terms and the fact that these large terms are supposed to add up to a small
number.
The myexp2.m function is always accurate to machine precision. Of course that is so because we have
treated negative values of x separately.
3. (30 points) You are asked to derive a finite difference expression for the fourth derivative of a function
f (x) using uniformly spaced points a distance h apart from each other. The expression must be at
least second order accurate. What is the minimum number of points you must use in your stencil?
Derive such an expression (using the minimum number of points you estimated).
The fourth derivative of exp(x) at x = 1 is e = exp(1). Use your expression to compute the fourth
derivative in a MATLAB subroutine for values of h = 2−1 , 2−2 . . . 2−45 and calculate the error in your
computation. Plot this error on a log-log scale as a function of h. Explain as many features on this
graph as quantitatively as possible for you.
Solution: We expect there to be 5 point in a second-order accurate finite difference expression for
the fourth derivative. In general, for the fourth derivative on a non-uniform grid, we would require 6
points for second order accuracy. But on an uniform grid, the grid points can be chose symmetrically
about the point in question, thus automatically getting rid of odd powers of h in the error, where h is
the grid spacing. This is demonstrated in the derivation below.
Expand the neighbouring 4 points in a Taylor series about xk , the k th grid point as

h2 ′′ h3 ′′′ h4 iv h5 v h6 vi
fk+1 = fk + hfk′ + fk + fk + fk + fk + f + ··· , (6)
2 6 24 60 360 k
h2 h3 h4 h5 h6 vi
fk−1 = fk − hfk′ + fk′′ − fk′′′ + fkiv − fkv + f + ··· , (7)
2 6 24 60 360 k
4h ′′ 8h ′′′ 16h iv 32h v 64h6 vi
2 3 4 5
fk+2 = fk + 2hfk′ + f + f + f + f + f + ··· , (8)
2 k 6 k 24 k 60 k 360 k
4h2 ′′ 8h3 ′′′ 16h4 iv 32h5 v 64h6 vi
fk−2 = fk − 2hfk′ + f − f + f − f + f + ··· . (9)
2 k 6 k 24 k 60 k 360 k
The odd derivatives can be easily eliminated by adding fk+1 to fk−1 and fk+2 to fk−2 ,

h4 iv h6 vi
fk+1 + fk−1 − 2fk = h2 fk′′ + fk + f + ··· , (10)
12 180 k
4h4 iv 16h6 vi
fk+2 + fk−2 − 2fk = 4h2 fk′′ + f + f + ··· . (11)
3 k 45 k
And now, eliminating the second derivative gives
h6 vi
fk+2 − 4fk+1 + 6fk − 4fk−1 + fk−2 = h4 fkiv + f + ··· , (12)
3 k
which can be solved for the fourth derivative. The final expression is:
fk+2 − 4fk+1 + 6fk − 4fk−1 + fk−2 h2 vi
fkiv = − f + ··· . (13)
h4 3 k

The error in computing the derivative also has contribution from round off error in evaluation of the
various fk ’s. The round off error in evaluation of each fk is approximately ǫ|fk |. Using the further
approximation that fk+1 ≈ fk+2 ≈ fk , gives an estimate for round off error plus truncation error to be

ǫ|fk | h2
Error = E(h) = 16 4
+ |fkvi | + · · · ..., (14)
h 3

5
45
10
Computed
40
Analytical estimate
10

35
10

30
10

25
10

Absolute error
20
10

15
10

10
10

5
10

0
10

−5
10
−14 −12 −10 −8 −6 −4 −2 0
10 10 10 10 10 10 10 10
h

Figure 3: Error in the finite difference expression for the fourth derivative.

(especially notice the absolute signs since in the worst case, errors add up). We compare this expression
with the numerical computations in figure 3.
For values of h bigger than 10−2 , the error decreases with decreasing h. The rate of this decrease is
h2 , For h below 10−3 , the error to grows with decreasing h. The slope of that part of the curve is -4.
These are the two limits of E(h) derived. The minimum occurs at
 1/6

h=2 ≈ 0.005, (15)
2

and the error there is about 4 × 10−5 . This is about as accurate a finite difference approximation for
the fourth derivative can get. As the graph shows, the prediction matches with the computation quite
well.

You might also like