# Assignment 1: Solutions

Applied Math 205 October 17, 2007
1. (30 points) The length of the hypotenuse (h) of a right-angle triangle is given by h= a2 + b 2 , (1)

where a and b are the sides of the triangle constituting the right angle. Write a MATLAB function myhypot(a,b), which computes h given a and b using only double precision arithmetic. In your solutions, draw a ﬂow chart of the algorithm used. Notice that a straight-forward implementation of (1) is susceptible to numerical overﬂow, i.e., if either |a| or |b| are close to the largest (or smallest) representable machine number, a2 will be too big to be represented and the algorithm will fail even if the expected result can in fact be represented. As a test in double precision, use the values a = 1×10300 and b = 2 × 10300 for which your function should yield accurate results. The problem of computing the absolute value of a complex number is exactly the same. A generalized version of this algorithm, computes the 2-norm of an n-dimensional vector, deﬁned as
n

x

2

=
i=0

x2 . i

(2)

In the same spirit as myhypot, write a MATLAB function mynorm(v) to compute the 2-norm of vector represented as an array in the variable v. Useful MATLAB built-in functions are max, min and sqrt. Solution: myhypot.m function h = myhypot(a, b) m = min(a,b); n = max(a,b); h = n.*sqrt(1 + (m/n).^2); mynorm.m function h = mynorm(v) vmax = max(abs(v)); u = v./vmax; h = vmax.*sqrt(sum(u.^2)); 2. (40 points) Write a double precision MATLAB function myexp(x) to evaluate ex using the series expansion ∞ x2 xn x3 x e =1+x+ + + ··· = . (3) 2! 3! n! n=0 1

Provide a ﬂow chart for the algorithm you implemented in the function. Plot the absolute and relative error (compared with the built-in function exp(x) for a range of x from -100 to 100 on a log-log scale. Brieﬂy explain the result. The way established numerical libraries compute the exponential is not through a series expansion but by using the identities: ex+y = ex ey and eax = (ex )a . (4) For example, a way of calculating e123 is e123 = e1×10
2

+2×101 +3×100

= e100 × (e10 )2 × (e1 )3 ,

(5)

provided e100 , e10 and e are known. Develop and write a function myexp2(x) that evaluates the exponential using decomposition of decimal numbers in the exponent similar to shown in (5). Your function should also work for negative values of x. The values of the exponentials of powers of 10 are provided on the course website through the ﬁle expm1.dat..Since ex ≈ 1 for x ≪ 1, ex − 1 is tabulated in the ﬁle to provide better accuracy if required. This table can be loaded in MATLAB by the command load ’expm1.dat’, or you can hardwire the table in your function by copy-pasting. You may ﬁnd the MATLAB built-in functions like rem, mod, fix, round, ceil and floor useful for decomposing x in multiples of powers of 10. Also include the results of this function in the previous plot. Standard libraries use a binary version of this algorithm for exponential, trigonometric and hyperbolic functions., which you may verify, does not require the use of the power-of (ˆ-operator in MATLAB) function. An eﬃcient implementation of the power-of function is instructive. How the table itself is generated for the ﬁrst time, is also an interesting question. However, we will not bother ourselves too much with these details, even though you have enough background to answer these issues. Solution: myexp.m function mysum = myexp(x) % Does not work for vectors x mysum = 1; myterm = x; n = 2; while(mysum+myterm~=mysum) mysum = mysum + myterm; myterm = myterm.*x./n; n = n+1; end myexp2.m function y = myexp2(x) if(x<0) % For negative values of x, compute 1/exp(-x) y = 1./myexp2(-x); elseif(x==0) y = 1; else dat = [ ... % expm1.dat copied verbatim 9.999999999999999999805e-10 1.000000000500000136887e-09; % 1 1.000000000000000000001e-08 1.000000005000000105234e-08; % 2 9.999999999999999999846e-08 1.000000050000001591961e-07; % 3 2

Figure 1: Flowchart for series summation 3

10

80

10

60

Absolute in myexp Absolute in myexp2 Relative in myexp Relative in myexp2

10

40

10 Error 10

20

0

10

−20

10

−40

10 −100

−60

−80

−60

−40

−20

0 x

20

40

60

80

100

Figure 2: Absolute and relative errors in the various functions. 1.000000000000000000036e-06 1.000000000000000000078e-05 1.000000000000000000078e-04 1.000000000000000000064e-03 1.000000000000000000064e-02 1.000000000000000000081e-01 1.000000000000000000108e+00 1.000000000000000000087e+01 1.000000000000000000069e+02 1.000000500000166516403e-06; 1.000005000016666821257e-05; 1.000050001666708378648e-04; 1.000500166708341662908e-03; 1.005016708416805841508e-02; 1.051709180756476291752e-01; 1.718281828459045312840e+00; 2.202546579480671789497e+04; 2.688117141816135609425e+43; % % % % % % % % % 4 5 6 7 8 9 10 11 12

]; dat = double(dat); n = floor(log10(x)); % x = quotient x 10^n + remain divisor = dat(10 + n, 1); % divisor = 10^n quotient = floor(x/divisor); % remain = x - (quotient*divisor); y = (1+dat(10 + n, 2))^quotient; x = remain; n = floor(log10(x)); while(remain>0 && n>-10) % repeat the same process with the divisor = dat(10 + n, 1); % remainder and multiply the result quotient = floor(x/divisor); % to the answer at each step remain = x - (quotient*divisor); y = y * (1+ dat(10 + n, 2))^quotient; x = remain; n = floor(log10(x)); end %while end %if Explanation: See ﬁgure 2. The absolute error using myexp.m is seen to be increasing exponentially with |x|. But it is incorrect to conclude in general that the computation is not accurate. For positive values of x, the absolute error may seem large, but in comparison to the value of theexp(x), it is accurate to machine precision. However, for negative values of x the value of the exponential itself is small, thus the relative error is also large. Hence we conclude that the series summation loses accuracy for 4

negative values of x. This loss of accuracy can be attributed to the alternation of signs in the series, the large magnitudes of terms and the fact that these large terms are supposed to add up to a small number. The myexp2.m function is always accurate to machine precision. Of course that is so because we have treated negative values of x separately. 3. (30 points) You are asked to derive a ﬁnite diﬀerence expression for the fourth derivative of a function f (x) using uniformly spaced points a distance h apart from each other. The expression must be at least second order accurate. What is the minimum number of points you must use in your stencil? Derive such an expression (using the minimum number of points you estimated). The fourth derivative of exp(x) at x = 1 is e = exp(1). Use your expression to compute the fourth derivative in a MATLAB subroutine for values of h = 2−1 , 2−2 . . . 2−45 and calculate the error in your computation. Plot this error on a log-log scale as a function of h. Explain as many features on this graph as quantitatively as possible for you. Solution: We expect there to be 5 point in a second-order accurate ﬁnite diﬀerence expression for the fourth derivative. In general, for the fourth derivative on a non-uniform grid, we would require 6 points for second order accuracy. But on an uniform grid, the grid points can be chose symmetrically about the point in question, thus automatically getting rid of odd powers of h in the error, where h is the grid spacing. This is demonstrated in the derivation below. Expand the neighbouring 4 points in a Taylor series about xk , the k th grid point as
′ fk+1 = fk + hfk +

fk−1 fk+2 fk−2

h6 vi h2 ′′ h3 ′′′ h4 iv h5 v fk + fk + fk + fk + f + ··· , 2 6 24 60 360 k h6 vi h2 ′′ h3 ′′′ h4 iv h5 v ′ f + ··· , = fk − hfk + fk − fk + fk − fk + 2 6 24 60 360 k 3 4 5 2 4h ′′ 8h ′′′ 16h iv 32h v 64h6 vi ′ f + f + f + f + f + ··· , = fk + 2hfk + 2 k 6 k 24 k 60 k 360 k 4h2 ′′ 8h3 ′′′ 16h4 iv 32h5 v 64h6 vi ′ f − f + f − f + f + ··· . = fk − 2hfk + 2 k 6 k 24 k 60 k 360 k

(6) (7) (8) (9)

The odd derivatives can be easily eliminated by adding fk+1 to fk−1 and fk+2 to fk−2 ,
′′ fk+1 + fk−1 − 2fk = h2 fk +

h4 iv h6 vi fk + f + ··· , 12 180 k 4h4 iv 16h6 vi ′′ f + f + ··· . fk+2 + fk−2 − 2fk = 4h2 fk + 3 k 45 k

(10) (11)

And now, eliminating the second derivative gives
iv fk+2 − 4fk+1 + 6fk − 4fk−1 + fk−2 = h4 fk +

h6 vi f + ··· , 3 k

(12)

which can be solved for the fourth derivative. The ﬁnal expression is:
iv fk =

h2 vi fk+2 − 4fk+1 + 6fk − 4fk−1 + fk−2 − fk + · · · . h4 3

(13)

The error in computing the derivative also has contribution from round oﬀ error in evaluation of the various fk ’s. The round oﬀ error in evaluation of each fk is approximately ǫ|fk |. Using the further approximation that fk+1 ≈ fk+2 ≈ fk , gives an estimate for round oﬀ error plus truncation error to be Error = E(h) = 16
2 ǫ|fk | vi h + |fk | + · · · ..., 4 h 3

(14)

5

10

45

10

40

Computed Analytical estimate

10

35

10

30

Absolute error

10

25

10

20

10

15

10

10

10

5

10

0

10 −14 10

−5

10

−12

10

−10

10

−8

10 h

−6

10

−4

10

−2

10

0

Figure 3: Error in the ﬁnite diﬀerence expression for the fourth derivative. (especially notice the absolute signs since in the worst case, errors add up). We compare this expression with the numerical computations in ﬁgure 3. For values of h bigger than 10−2 , the error decreases with decreasing h. The rate of this decrease is h2 , For h below 10−3 , the error to grows with decreasing h. The slope of that part of the curve is -4. These are the two limits of E(h) derived. The minimum occurs at h=2 3ǫ 2
1/6

≈ 0.005,

(15)

and the error there is about 4 × 10−5 . This is about as accurate a ﬁnite diﬀerence approximation for the fourth derivative can get. As the graph shows, the prediction matches with the computation quite well.

6