Two Statistical Calculations
and
ClimateGate
Derek O’Connor
www.derekroconnor.net
1 Introduction
This note is prompted by reports of errors in the calculation of two simple statistics: the
mean of a vector x, x̄ = (1/n) Σᵢ xᵢ, and its variance, Var(x) = (1/n) Σᵢ (xᵢ − x̄)².
¹ Attamen errores non sunt Artis sed Artificum ("Yet the errors are not of the Art but of the Artificers"), from the 'Author's Preface to the Reader', Philosophiae Naturalis Principia Mathematica, First Edition, July 5, 1686.
The following information was taken from a paper [4], by Yun He & Chris Ding of the
NERSC-Lawrence Berkeley Labs who were doing a large-scale simulation of ocean circula-
tion. At each step of the simulation the following was done:
1. Sea Surface Heights are calculated at each point on a 64 ×120 latitude-longitude grid.
The Fortran code below does the summation part of these calculations.
sum = 0.0
do i = 1, 64           ! latitude index
  do j = 1, 120        ! longitude index
    sum = sum + ssh(i,j)
  end do
end do
The order of summation can be changed by interchanging the i and j indices and by revers-
ing their order.
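In IEEE floating point arithmetic, different summation orders generally give slightly different results. A small Python sketch (with made-up grid values, since the actual SSH data are not reproduced in this note) illustrates the effect:

```python
import random

random.seed(1)
# hypothetical stand-in for the 64 x 120 SSH grid: large values of mixed sign
ssh = [[random.uniform(-1.0e8, 1.0e8) for j in range(120)] for i in range(64)]

# i-outer (as in the Fortran above) versus j-outer summation
ij_sum = sum(ssh[i][j] for i in range(64) for j in range(120))
ji_sum = sum(ssh[i][j] for j in range(120) for i in range(64))

print(ij_sum, ji_sum)         # the two orders usually differ in the low digits
print(abs(ij_sum - ji_sum))
```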
Table 1 shows the results that He & Ding got with the Fortran code shown above, on a single
processor, using IEEE double precision (∼16 decimal digits). He & Ding point out that these
results are completely wrong — not one digit is correct. We will analyse this problem in
Section 3 and explain why He & Ding got such inaccurate results.
Microsoft's Excel spreadsheet has been in use for many years and has gone through many
versions. Many millions of people in business, government, and universities use some version of
Excel. Most users do not have the time or the ability to test the quality of Excel's calculations,
and when errors do occur most users do not recognise the result as erroneous. Here is an example
where Excel gets a wrong answer to a simple problem.
We wish to calculate the mean and standard deviation of the set of numbers
xᵢ = aᵢ + M,  with aᵢ = 1 for i = 1, 3, 5, 7, 9 and aᵢ = 2 for i = 2, 4, 6, 8, 10,
where M is a large constant. This contrived example is designed to reveal flaws in the
standard deviation calculation.
The exact values of the mean and variance are

    x̄ = (1/n) Σ xᵢ = (1/10) Σ (aᵢ + M) = (1/10)(15 + 10M) = M + 1.5

    Var(x) = (1/(n−1)) Σ (xᵢ − x̄)² = (1/9) Σ (aᵢ + M − M − 1.5)²
           = (1/9) Σ (aᵢ − 1.5)² = (1/9) Σ (±0.5)² = 2.5/9 = 0.2777…

    SDev(x) = √(0.2777…) = 0.5270462766947299 (rounded to 16 digits), and does not involve M.
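These exact values are easy to confirm with rational arithmetic, where no rounding can intrude (a quick Python check):

```python
from fractions import Fraction

M = Fraction(10)**15                 # any large constant; kept exact here
a = [1, 2] * 5                       # a_i = 1 for odd i, 2 for even i
x = [Fraction(ai) + M for ai in a]

n = len(x)
mean = sum(x) / n
var = sum((xi - mean)**2 for xi in x) / (n - 1)

print(mean - M)      # 3/2: the mean is M + 1.5
print(var)           # 5/18, i.e. 2.5/9 = 0.2777...
```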
Table 2 shows the results of Excel 2000's calculations with M = 10⁸, 10¹⁰, 10¹⁴, 10¹⁵. The
last line of Table 2 contains Excel 2000's values for the standard deviation. None of these
values is correct. This result is not new: for many years and versions the variance function
in Excel has been calculated by a bad algorithm which gives the bad results shown here.2
We will analyse this problem in Section 4 and explain why Excel 2000 and later versions get
such inaccurate results.
² Note that the sum in the last column is also wrong.
2 Preliminaries
We assume we are working in a floating point number system with base b and precision p.
The derived parameter ε_M = b¹⁻ᵖ is called machine epsilon: the distance between 1.0 and
the next higher floating point number. In such a system we can show that fl(1 + δ) = 1 if
δ < ½ε_M. That is, δ is insignificant compared to 1.
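The absorption rule fl(1 + δ) = 1 for δ < ½ε_M can be observed directly in IEEE double precision (a small Python sketch):

```python
import sys

eps = sys.float_info.epsilon     # machine epsilon for IEEE double: 2**-52
print(eps)                       # 2.220446049250313e-16

print(1.0 + eps / 4 == 1.0)      # True: a delta below eps/2 is absorbed
print(1.0 + eps == 1.0)          # False: a full eps is not absorbed
```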
— TO BE COMPLETED —
F(b, p, e_min, e_max) is the floating point number system with base b, precision p,
and exponent range [e_min, e_max]. F is a finite subset of the rationals Q.
6. Cancellation Error
Theorem 1. In any floating point system F(b, t, −, −), without guard digits,
the relative error in fl(x − y) can be as large as b − 1.
Note: this can be as large as 100% for b = 2 and 900% for b = 10.
IEEE Double Precision : F(b, p, emin , emax ) = F(2, 53, −1021, 1024)
Both the mean and variance calculations are essentially summation problems. Calculating
the sum of n numbers is probably the simplest and most widely-used calculation in comput-
ing, from the home spreadsheet user to scientists who use earth and cosmos simulators.
We wish to explain why the simple problem of summation can give rise to such wildly-
inaccurate results. Such results usually indicate that the problem, that is, the data (x1 , x2 , . . . , xn ),
is ill-conditioned with respect to summation. However, a bad result may indicate a bad al-
gorithm.
We view a solution of a problem as a mapping or transformation from
We need to be very careful when using these four words because of the confusion that sloppy
use can cause.
— TO BE COMPLETED —
The following is a synopsis of pages 89–91, Trefethen and Bau, Lecture 12 [2].
    κ(f(x)) = ‖J_f(x)‖ ‖x‖ / ‖f(x)‖.   (2)
The relative condition (number) κ( f (x)) measures the relative change in the
solution f (x) due to a perturbation δx. If κ( f (x)) is small then the problem f is
well-conditioned. If κ( f (x)) is large then the problem f (x) is ill-conditioned.
For a one-dimensional function the expression in (2) becomes
    κ(f(x)) = |f′(x)| |x| / |f(x)|.   (3)
Notes :
1. We use the notation κ( f (x)) rather than κ( f ) to remind us that the con-
dition number depends on f and x.
Errors in Algorithms
    e_f = ‖f(x) − f̂(x)‖ / ‖f(x)‖  is the relative error in f.   (6)
Assume we want to do a calculation f(x). We want to determine the effect of using a perturbed input x̂ instead. Let
x̂ = x(1 + eₓ) and e_f = (f(x) − f(x̂))/f(x).
— TO BE COMPLETED —
Example 2 (Condition vs Stability). Neumaier[6] has this example that nicely illustrates the differ-
ence between condition and stability. Calculate
    f(x) = √(x⁻¹ − 1) − √(x⁻¹ + 1),   0 < x < 1.   (7)

Stability. For x ≈ 0, √(x⁻¹ − 1) ≈ √(x⁻¹ + 1), and the calculation of f(x) suffers from massive cancellation. When x ≈ 1, f(x) ≈ 0 − √2 = −√2, and no cancellation occurs. Thus the calculation of
f(x) is unstable when x ≈ 0 and stable when x ≈ 1.
Condition. We have, using (3),

    f′(x) = 1/(2x²√(x⁻¹ + 1)) − 1/(2x²√(x⁻¹ − 1)),

and, after some simplification, we get

    κ(f(x)) = |x f′(x)/f(x)| = 1/(2√(1 − x²)).   (8)

Hence f(x) is ill-conditioned near x = 1 because lim_{x→1} κ(f(x)) = ∞, and well-conditioned near x = 0
because lim_{x→0} κ(f(x)) = 1/2.
Thus, the calculation of f (x) is stable but ill-conditioned near x = 1, and is unstable but well-
conditioned near x = 0.
This example shows clearly that condition and stability are two independent aspects of the
same problem.
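The instability near x ≈ 0, and its standard cure, can be seen in a short Python sketch. (The stable form below multiplies and divides by the conjugate so that the subtraction of nearly equal square roots disappears; this rewriting is mine, not Neumaier's.)

```python
import math

def f_naive(x):
    # direct evaluation of (7): subtracts nearly equal square roots for x ~ 0
    return math.sqrt(1/x - 1) - math.sqrt(1/x + 1)

def f_stable(x):
    # conjugate form: f(x) = -2 / (sqrt(1/x - 1) + sqrt(1/x + 1))
    return -2.0 / (math.sqrt(1/x - 1) + math.sqrt(1/x + 1))

x = 1.0e-14                  # the exact value of f here is about -1.0e-7
print(f_naive(x))            # may have lost many digits to cancellation
print(f_stable(x))           # correct to almost full precision
```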
The standard algorithm is based on the partial-sum recurrence si = si−1 + xi , with s0 = 0 and
i = 1, 2, . . . , n. The general sum algorithm is not well-known but is interesting because it
allows us to sum the elements of any set X where ‘+’ is defined, and to do this in any order.
The function Delete2(X) deletes and returns two elements xᵢ and xⱼ; which elements are chosen
depends on the Delete2 function. The two chosen elements are added and the result s is added
back into X. Thus the size of X decreases by 1 at each iteration of the while-loop, which
halts when the size of X reaches 1. On exit, the set X contains one element, the sum
of the elements in the initial set.
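The general sum algorithm described above is easy to sketch in Python (the smallest-magnitude-first policy below is just one possible choice of Delete2; the names are mine):

```python
def general_sum(values, delete2):
    # X is a multiset of partial sums: repeatedly delete two elements,
    # add them, and put the result back, until one element remains
    X = list(values)
    while len(X) > 1:
        a, b = delete2(X)
        X.append(a + b)
    return X[0]

def smallest_two(X):
    # one possible Delete2: remove the two elements of smallest magnitude
    X.sort(key=abs)
    return X.pop(0), X.pop(0)

print(general_sum([1.0, 2.0, 3.0, 4.0], smallest_two))   # 10.0
```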
In keeping with Trefethen & Bau's definition of a problem, we view the summation of n
numbers, s(x), as a mapping from Rⁿ to R. That is,

    s : Rⁿ → R,  where s(x₁, x₂, …, xₙ) = Σᵢ₌₁ⁿ xᵢ.   (9)
We have J_s(x) = [1, 1, …, 1] and so ‖J_s(x)‖₁ = n, ‖J_s(x)‖₂ = √n, and ‖J_s(x)‖∞ = 1. Also
‖s(x)‖₁ = ‖s(x)‖₂ = ‖s(x)‖∞ = |Σ xᵢ|. Using these values in (2) we get³

    κ₁(s(x)) = n Σ |xᵢ| / |Σ xᵢ|

    κ₂(s(x)) = √n (Σ |xᵢ|²)^(1/2) / |Σ xᵢ|          (10)

    κ∞(s(x)) = maxᵢ {|xᵢ|} / |Σ xᵢ|
³ We use the abbreviated form Σ xᵢ for Σᵢ₌₁ⁿ xᵢ in what follows.
We can see that these three condition numbers have the same denominator, |Σ xᵢ|, and if this
is small relative to the numerator the problem will be ill-conditioned. This can happen if x
has many positive and negative elements that cancel each other.

Rule of Thumb: Σ xᵢ is Ill-Conditioned if |Σ xᵢ| ≪ Σ |xᵢ|.

The inequality |Σ xᵢ| ≪ Σ |xᵢ| is called Massive Cancellation in Σ xᵢ.
Example 3 (Massive Cancellation). Consider the problem x = [1, M, −M, M, …, M, −M] ∈ Rⁿ⁺¹.
We have Σ xᵢ = 1 and Σ |xᵢ| = 1 + nM. Hence each of these condition numbers can be made as large as we please by choosing M large enough.
Note, however, that it is not the value of M alone that causes the problem. The real culprit is the fact that
the denominator Σ xᵢ = 1 is small due to massive cancellation. This occurs no matter what values
we have for M or n.
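A Python sketch of this example shows both the large condition number and its consequence (with five elements rather than n + 1, for brevity):

```python
x = [1.0, 1e16, -1e16, 1e16, -1e16]        # exact sum is 1

s = sum(x)                                  # left-to-right recursive summation
print(s)                                    # 0.0: the 1.0 is absorbed, then the M's cancel

# kappa_1 = n * sum|x_i| / |sum x_i|, using the exact sum 1 in the denominator
kappa1 = len(x) * sum(abs(v) for v in x) / 1.0
print(kappa1)                               # about 2e17
```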
We now demonstrate that the SSH problem suffers from massive cancellation and is thus
ill-conditioned.
The exact sum of the SSH problem may be calculated using such systems as Maple, Mathematica, Maxima, etc., and we get s_exact = 0.3579858392477036 (rounded to 16 digits). The
1-norm condition of the SSH problem, using the exact sum, is

    κ₁(s_exact) = n Σ |xᵢ| / |Σ xᵢ| = 7680 × (5.3025040611697 × 10¹⁶ / 0.3579858392477036) ≈ 10²¹.   (12)
Even if we use the inexact sum of Matlab we get κ1 (sM ) ≈ 1019 . This shows that the
SSH summation problem is highly ill-conditioned. Hence we may expect trouble with this
summation.
An upper bound on the relative forward error is

    e_r(s) = |s_calc − s_exact| / |s_exact| ≈ κ ε_M,   (13)

where ε_M ≈ 2.2 × 10⁻¹⁶ is machine epsilon for IEEE double precision. Thus, for the SSH
problem we have the upper bound e_r(s) ≈ 10²¹ × 2.2 × 10⁻¹⁶ ≈ 2 × 10⁵.
Hence, we can expect to get no digits accurate in this sum, and we should not be surprised
that in IEEE double precision, Matlab’s sum(x) gives sM = 34.41476821899414, which is
100 times larger than the correct answer.
Although we cannot hope to get a correct answer for this problem in Matlab, the simple
and fast Matlab calculation cond1 = n*sum(abs(x))/abs(sum(x)) can warn us of trouble
ahead.
Calculating the exact value for this problem requires at least 26 digits of precision. This
precision (and higher) can be attained in 16-digit arithmetic by compensated summation
and other methods. However, even if we have the exact answer, the fundamental difficulty
remains: the problem is ill-conditioned, i.e., a small change in the data will cause a huge
change in the result. No computational 'trick' can avoid this. Instead, we must ask the
question: why is this problem ill-conditioned? Is it an artifact of the program that generates
the data, or of the mathematical model of the ocean, or is it a feature of the ocean itself? We
cannot answer these questions here. He & Ding [4] do not mention ill-conditioning and give
the impression that an accurate (or exact) calculation of the sum solves their problem. It does
not solve the problem, as this 'maxim' of Nick Trefethen implies:
This raises the question: why do He & Ding use 16-digit numbers when they are comparing them
to satellite data which has much lower precision? Scientific modellers and programmers
would do well to study Trefethen's Maxims⁴ and to remember Newton's admonition.
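Compensated summation, mentioned above, carries along an estimate of the rounding error committed at each addition. Here is a sketch of the Kahan-Babuška-Neumaier variant (the transcription is mine):

```python
import math

def neumaier_sum(xs):
    # compensated summation: c accumulates the rounding error of
    # every addition and is added back at the end
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x     # low-order digits of x were lost in s + x
        else:
            c += (x - t) + s     # low-order digits of s were lost in s + x
        s = t
    return s + c

x = [1.0, 1e16, -1e16]           # exact sum is 1
print(sum(x))                    # 0.0: plain recursive summation loses the 1.0
print(neumaier_sum(x))           # 1.0
print(math.fsum(x))              # 1.0: Python's built-in accurate summation
```

Of course, as argued above, an accurate sum does not cure the ill-conditioning of the underlying problem.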
Chan, et al. [3] give the 2-norm condition number for this problem:

    κ₂(S(x)) = 2 ‖x‖₂ / √(S(x)) = 2 √(1 + s²(x)/(n S(x))).   (16)

Higham [5], pages 32 and 528, gives a component-wise condition number:

    κ_c(S(x)) = 2 Σ |xᵢ − x̄| |xᵢ| / S(x),  where x̄ = s(x)/n.   (17)
⁴ "Maxims about numerical mathematics, computers, science and life", L. N. Trefethen, SIAM News,
v. 31, no. 1 (1998), p. 4. Download here: http://www.comlab.ox.ac.uk/people/nick.trefethen/publication/PDF/1998_76.pdf
The calculation of S(x) using the definition in (15) is straightforward but requires two
passes over the data: one to calculate s(x) and one to calculate S(x).

We can get a one-pass algorithm by rearranging the expression for S(x) in (15) to give:

    S(x) = Σᵢ₌₁ⁿ (xᵢ − s(x)/n)² = Σᵢ₌₁ⁿ xᵢ² − (1/n) s²(x).   (18)
The algorithms for calculating the variance by these two formulas are shown below. The one-pass
algorithm is more elegant and 'efficient' than the two-pass algorithm but is numerically
unstable. Unfortunately, it is often 'trotted out' as a clever trick in elementary statistics
books. Worse still, Microsoft's programmers thought it was a clever trick and used it in
Excel until quite recently. First, let us see why the one-pass algorithm is bad and then we
will see how Excel fares.
alg TwoPassSSQ
    sumx = 0
    for i := 1 to n do
        sumx := sumx + x[i]
    endfor
    xbar := sumx/n
    sumsqd = 0
    for i := 1 to n do
        sumsqd := sumsqd + (x[i] - xbar)^2
    endfor
    return sumsqd
endalg TwoPassSSQ

alg OnePassSSQ
    sumx = 0
    sumsqx = 0
    for i := 1 to n do
        sumx := sumx + x[i]
        sumsqx := sumsqx + x[i]^2
    endfor
    sumsqd := (sumsqx - sumx^2/n)
    return sumsqd
endalg OnePassSSQ
The differences between these algorithms are obvious. What is not so obvious is this important
distinction: when properly implemented in floating point arithmetic, the Two-Pass algorithm
can never give a negative result, but the One-Pass algorithm can give a negative, hence meaningless,
result. This has plagued amateur programmers for many years, and it seems to have
plagued the unfortunate programmer named Harry in the Climategate Affair (see Section 5
below).
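Here is a direct Python transcription of the two algorithms (the function names are mine), applied to data of the same shape as the Excel example of Section 1:

```python
def two_pass_ssq(x):
    # sum of squared deviations from the mean, computed in two passes
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x)

def one_pass_ssq(x):
    # the 'shortcut' (18): sum x_i^2 - (sum x_i)^2 / n, in a single pass
    sumx = sumsqx = 0.0
    for xi in x:
        sumx += xi
        sumsqx += xi * xi
    return sumsqx - sumx ** 2 / len(x)

M = 1e8
x = [M + 1, M + 2] * 5          # exact sum of squared deviations is 2.5
print(two_pass_ssq(x))          # 2.5
print(one_pass_ssq(x))          # wrong: the squares x_i^2 ~ 1e16 are rounded
```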
We will use the vector x = [M, M + 1, M + 2] to examine the behaviour of these algorithms.
This simple contrived vector is chosen because it will allow us to see precisely where the
rounding errors occur and their effect on subsequent calculations.
Example 4 (Exact Arithmetic). We have x = [M, M + 1, M + 2]. Let S₁(x) be the sum-of-squares calculated
by the One-Pass algorithm, and S₂(x) the sum-of-squares calculated by the Two-Pass algorithm.
These give:

    s(x) = Σᵢ₌₁ⁿ xᵢ = 3M + 3 = 3(M + 1).   (19)

    S₁(x) = Σᵢ₌₁ⁿ xᵢ² − s²/n = [M² + (M + 1)² + (M + 2)²] − [3(M + 1)]²/3
          = (3M² + 6M + 5) − 3(M² + 2M + 1)   (20)
          = 5 − 3 = 2.

    S₂(x) = Σᵢ₌₁ⁿ (xᵢ − s/n)²
          = (M − M − 1)² + (M + 1 − M − 1)² + (M + 2 − M − 1)²   (21)
          = (−1)² + 0² + 1² = 2.

Thus both S₁(x) and S₂(x) give the same answer in exact arithmetic. Notice, however, that the
intermediate calculations, (20) for S₁(x) and (21) for S₂(x), are different.
The condition numbers for s(x) and S(x) with x = [M, M + 1, M + 2] are:

    κ₂(s(x)) = √n (Σ |xᵢ|²)^(1/2) / |Σ xᵢ| = √((6M² + 12M + 10)/(9M² + 18M + 9)) < 1, for all M > 1.   (22)

    κ₂(S(x)) = √(‖x‖₂² / S(x)) = √((3M² + 6M + 5)/2) ≈ M, for large M.   (23)
This problem illustrates an important point about the condition of a problem: the condition of the
summation problem, κ₂(s(x)) < 1, is perfect as shown in (22), whereas the sum-of-squares condition,
κ₂(S(x)) ≈ M, in (23), can be made as ill-conditioned as we please by choosing M large enough.
Thus it is not the data that is ill-conditioned, but the calculation being performed on the data.
Now, let us perform the calculations of the previous example using floating point arithmetic.
This is tedious but worth the effort because the result can be generalized. Before we begin
the analysis we must remember the following points when using floating point arithmetic:

1. Associativity may not hold and so we assume that all expressions are evaluated from
left to right, i.e., fl(a ◦ b ◦ c ◦ d) = fl(fl(fl(a ◦ b) ◦ c) ◦ d).

2. Unit Roundoff Error: u = ½ ε_M = ½ b¹⁻ᵖ = 2⁻⁵³ ≈ 10⁻¹⁶.

3. Relative Insignificance of δ: fl(x + δ) = fl(x) if δ < u·x ≈ x × 10⁻¹⁶.
Example 5 (Floating Point Arithmetic). We have x = [M, M + 1, M + 2] and we assume that these
numbers are representable in the floating point number system
Range Errors
— TO BE COMPLETED —
The Matlab function S1vS2(pows) shown below implements both algorithms for x = [M, M+1, M+2],
and the results are plotted for various values of M. Matlab uses IEEE double-precision arithmetic.
Note that this problem is so simple that each summation is performed in one line of
code without any loops over the data.
function S = S1vS2(pows)
  n = length(pows);
  S = zeros(n,2);
  for p = pows
    M = 2^p;
    x = [M M+1 M+2];
    s = x(1) + x(2) + x(3);
    S(p,1) = x(1)^2 + x(2)^2 + x(3)^2 - s^2/3;
    S(p,2) = (x(1)-s/3)^2 + (x(2)-s/3)^2 + (x(3)-s/3)^2;
  end
  % Plotting code omitted
Figure 1 shows how bad the one-pass algorithm is compared to the two-pass algorithm.
Recall that in exact arithmetic S(x) = 2 for all values of M. The one-pass algorithm gives
the correct result for M < 2²⁶, while the two-pass algorithm gives the correct result for
M < 2⁵³. In fact the one-pass algorithm gives the same results as the two-pass algorithm
using single precision, thus losing half the attainable precision.
[Figure 1: S(x) computed by the One-Pass algorithm (left) and the Two-Pass algorithm (right), plotted against p, where x = [2^p, 2^p+1, 2^p+2].]
We saw in Table 2 that Microsoft Excel 2000 gave completely wrong results for the variance
calculation. We explain here how Excel goes wrong by analysing a simpler problem which
is given in Table 3. The exact mean is M + 1 and the variance is 1, for each set. As we can
see, Excel 2000 gets the wrong result for the variance of the second set.
Assuming that Microsoft Excel 2000 uses IEEE double precision (∼16 digits), then using
the second variance formula, S₂² = ½[(3M² + 6M + 5) − (3M² + 6M + 3)], we get Ŝ₂² = fl(S₂²) = 1
for data set 1, because the constants 5 and 3 are not zero relative to M² = 10¹⁴. We get Ŝ₂² = fl(S₂²) = 0
for data set 2, because the constants 5 and 3 are zero relative to M² = 10¹⁶.
This shows that Microsoft Excel 2000 uses the bad, unstable, but faster one-pass method.
Microsoft was repeatedly told about this error but refused to fix it. They finally fixed it in
Excel 2003 and then charged people for an upgrade!
The spreadsheet Gnumeric is a free clone of Excel. Indeed the initial version was a per-
fect clone, repeating the errors of Microsoft Excel 2000. The Gnumeric-ers, to their credit,
quickly fixed it once they were told about the error.
Here is the latest news on Excel 2007 from The Inquirer5
A thread on Google Group microsoft.public.excel reveals that Excel 2007 loses its grip with
arithmetic that involves the number 65,535.
Several examples are shown, perhaps the simplest of which is the calculation ( 850 × 77.1 ),
which should produce 65,535 but instead returns 100,000.
There’s all sorts of speculation as to how this bug occurred, postulating floating-point and round-
ing errors and the like, but it seems much more likely that some Excel developer simply punted
at some point and the Vole’s stringent quality control (cough) never caught it.
Some might recall that mathematical errors have been discovered in Excel periodically in various
releases going back at least as far as Excel 5.
Microsoft people appear to have been involved in the discussion and confirmed the bug.
⁵ http://www.theinquirer.net/gb/inquirer/news/2007/09/25/math-bug-found-excel and follow the Google Groups link.
5 ClimateGate
This section examines some of the programming errors found in one Fortran program that
was in the Climategate files. Here is a concise description of how the Climategate files
became public:
“On November 17, 2009, someone posted to the Internet a vast archive of mate-
rials that had been hacked or leaked from the CRU. When packed, the materials
took up about 62 MB, and consist of more than 1,000 emails from prominent
members of the CRU and more than 3,000 documents that included everything
from raw data to annotated computer code to lengthy reports documenting the
frightfully disorganized state of the CRU’s vitally important data files.”
This quotation is referring to the Climate Research Unit (CRU), University of East Anglia
(UEA), which supplies much of the scientific knowledge and data to the UN’s Intergov-
ernmental Panel on Climate Change (IPCC). The compressed 62 MB file expands to about
150 MB and contains various emails, documents, and computer code in various languages
(Fortran and IDL mainly).
Many errors occur in scientific programs because the programmers (amateurs usually) have
an imperfect understanding of computer arithmetics.
Fortran has many arithmetics but we concentrate on just two: 32-bit signed integer and 32-bit
floating point arithmetic.
Integer Arithmetic: Most computers use 2s-complement integer arithmetic for all types
of integers. The range of the 32-bit signed integers is
[−2³¹, 2³¹ − 1] = [−2147483648, 2147483647].
An integer overflow occurs when a program calculates an integer value that is outside the
integer range. The reaction to this overflow depends on how the program was compiled.
The usual reaction is silent overflow and we get intmax+1 = intmin or intmin-1 = intmax,
as shown in Figure 2.
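Python's integers never overflow, but the silent 32-bit wraparound described above can be simulated by reducing modulo 2³² (a sketch; the helper name is mine):

```python
def wrap32(n):
    # reduce n to the 32-bit two's-complement range [-2**31, 2**31 - 1]
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

INTMAX, INTMIN = 2**31 - 1, -2**31
print(wrap32(INTMAX + 1))    # -2147483648: intmax + 1 = intmin
print(wrap32(INTMIN - 1))    # 2147483647:  intmin - 1 = intmax
```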
Floating Point Arithmetic: The range of the 32-bit reals is about ±10^(±38), with machine
precision ε_M = 2⁻²³ ≈ 10⁻⁷ (≈ 7 decimal digits of precision). Machine precision ε_M is the
distance between 1 and the next higher floating point number. Hence if |x| is the magnitude
of a floating point number, then the magnitude of the next higher number is |x|(1 + ε_M), and
the distance between these is ε_M |x|. There are two important consequences of these facts:
[Figure 2: 32-bit integer wraparound: adding 1 to 2³¹ − 1 wraps to −2³¹, and subtracting 1 from −2³¹ wraps to 2³¹ − 1.]
• No floating point number can exist between |x| and |x|(1 + ε_M). A number which falls
between these two must be rounded to one or the other.
• The distance between these two floating point numbers varies with the magnitude of
x. This means that large consecutive floating point numbers have large gaps between
them, and small numbers have small gaps.
Ian (Harry) Harris is or was a scientist-programmer at the CRU. The UEA website says that
Harris specialises in dendroclimatology, climate scenario development, data manipulation
and visualisation, programming.
Harris appears to have been given a program (anomdtb.f90) written by another person and
told to get it working.6 This is a particularly nasty job if the code has been badly written and
not properly commented. It is to Harris’s credit that he kept a fairly detailed journal or log
of the work he did on the program.
This is an extract from the file HARRY_READ_ME.txt (Harris’s journal) where he tries to figure
out how a squared variable becomes negative:
⁶ The comments at the top of anomdtb.f90 say the program was written by Tim Mitchell on 11.02.02.
The errors occur in the subroutine Anomalise which extends from line 286 to line 576 in
anomdtb.f90. The main computational task in this subroutine seems to be the calculation
of the sum and sum-of-squares of data arrays which are then used to calculate standard
deviations for the data.
The program Harris is sweating over has at least two serious errors and if you are not ad-
equately trained in computer arithmetic, programming (and Fortran in this case), you will
never find the cause of these errors.7
⁷ Worse still, you may not find the errors if the output is wrong but plausible. This is why rigorous testing of software is essential.
The program declares all variables as global. It uses the default Fortran types integer and
real. Any value of either type occupies 32 bits. Here are the statements that cause the first
error:
We can see that the variable being squared is an element of the integer array DataA, and that
these squared values are accumulated in the real variable OpTotSq.
Here is a skeleton version of the program that uses the DataA values given in the debug output
in Harris’s journal above. It shows how the negative OpTotSq values occur:
PROGRAM Climgate1
  implicit none
  integer, parameter      :: dim = 16
  integer, dimension(dim) :: DataA
  data DataA /93,172,950,797,293,83,860,222,452,561,49920,547,672,710,211,403/
  real    :: OpTotSq, ROpTotSq
  integer :: k

  print*, " k DataA(k) DataA(k)**2 OpTotSq ROpTotSq"
  print*, "---------------------------------------------------------------"
  OpTotSq  = 0.0
  ROpTotSq = 0.0
  do k = 1, dim
    OpTotSq  = OpTotSq  + DataA(k)**2
    ROpTotSq = ROpTotSq + real(DataA(k))**2
    print "(i5,i8,5x,i12,5x,2f15.2)", k, DataA(k), DataA(k)**2, OpTotSq, ROpTotSq
  end do
END
This program was compiled and run in the Release .NET mode with Silverfrost FTN95
Fortran compiler Version 5.4 for Windows.8 This mode allows silent integer overflow.
The output, which is identical to that of anomdtb.f90, shows that OpTotSq becomes nega-
tive because the result of DataA(k)**2 is negative due to integer overflow. Also shown is a
quick fix: real(DataA(k))**2 converts the integer DataA(k) to 32-bit floating point which
has the approximate range ±1038 and the subsequent squaring does not cause a floating point
overflow.
⁸ Harris seems to be using the Portland Group Fortran 90 compiler, pgf90, which is one of the best commercial compilers available.
The problem occurs because the array DataA is declared to be integer in Line 041. The
semantics of Fortran specify that (int op int) --> int. When the loop reaches k = 11
we get DataA(11) = 49920 and DataA(11)**2 should be 2492006400, but this is greater than
2147483647 by 344522753. This overflow causes the result to wrap around to -2147483648,
to which 344522753-1 is added, and we get -1802960896. Thus the square of a number has
become negative. This may seem to be crazy arithmetic, but it is standard in Fortran and
other languages.9 If programmers do not understand the arithmetics that the programming
language uses, then they will make and be baffled by such ‘errors’.
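The wraparound arithmetic described above can be checked directly (a Python sketch, using the same modular reduction that silent 32-bit overflow performs):

```python
sq = 49920**2                            # 2492006400: exceeds 2**31 - 1
excess = sq - (2**31 - 1)                # amount by which intmax is exceeded
wrapped = (sq + 2**31) % 2**32 - 2**31   # two's-complement wraparound of sq

print(sq, excess, wrapped)               # 2492006400 344522753 -1802960896
```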
Error No. 2.
Further examination of the code reveals the unstable One-Pass standard deviation algorithm.
Here is one example (of three), starting at line 382:
The variable OpTot accumulates the sum of DataA/Factor, Σ xᵢ/α, while OpTotSq accumulates
the sum-of-squares of DataA/Factor, Σ (xᵢ/α)². The variable OpEn counts the number
(n) of data elements accumulated, and the symbol α stands for the variable Factor. The last
statement is the Fortran version of the mathematical statement:

    Sdev(x) = α √( (n/(n − 1)) [ Σ(xᵢ/α)²/n − (Σ(xᵢ/α)/n)² ] ),   (27)
⁹ It may be possible to set a compiler switch that generates code to 'trap' an integer overflow at run-time. See 5.2 below.
which is the formula for the One-Pass algorithm (with the factor α). We have shown in Section 4
why the One-Pass algorithm is bad, but unfortunately, many scientist-programmers do
not know this because few have taken rigorous courses in programming and numerical methods.
Ironically, in this piece of code the programmer has fixed the integer overflow problem
by converting DataA to real. However, I suspect that this conversion was done because the
programmer was unsure about what happens in Fortran when an integer is divided by a real
(Factor).
Although the One-Pass algorithm may not cause errors for the data given here, it would have
been better to use the Two-Pass algorithm. Indeed, if the Two-Pass algorithm had been used,
Error No. 1 (integer overflow) would not have occurred. The One-Pass algorithm is thoroughly bad: it
is unstable and it is prone to overflow and underflow.

Perhaps the most insidious aspect of the One-Pass algorithm is this: it works most of the
time.
Most scientists and engineers write or use programs to calculate various things and to organise
their data into neat tables, lists, etc. Here is a warning that I have had on my Numerical
Algorithms website for many years¹⁰:
Writing high-quality mathematical software is a very demanding and difficult task which
is best left to experts.
A corollary to this is that there is a lot of junk software in use today because of the
inability of users to distinguish between good and bad software.
Scientists and engineers would be better off using the highly-regarded set of Fortran subroutines
called Lapack¹¹, or a numerical system such as Matlab, which incorporates most
of the standard numerical algorithms, written and tested by experts. Indeed, Matlab
does not rely on its own experts to write the low-level mathematical algorithms (called math
kernels), but uses those written by experts at Intel and AMD who know how to get the best
out of their own processors. These math kernels are supplied by the CPU manufacturers and
are tuned for each class of processor.
The program anomdtb.f90 was, obviously, written by amateurs, as we see below:
1. Virtually no comments: This is a messy program doing a messy job (processing badly-
organised data). Comments would have helped clean up some of this mess.
2. Global Variables, Subroutines called without arguments: All variables are declared in
the main program. This is a cardinal sin because it breaks the rule that use of global
variables should be minimized if not eliminated.
3. Pointers and dynamic array allocation. Why? : The arrays used in the program are not
very large and do not need to be reclaimed. Besides, do the amateurs who wrote this
program understand dangling pointers, memory leaks, garbage collection, etc.?
¹⁰ http://www.derekroconnor.net/NA/na2col.html
¹¹ Free, and also available in C.
5. Poor structure. Three inline standard deviation calculations are performed rather than
using a single function.
shows that he does not know the parameters of the floating point arithmetic he is using.
He also shows that he does not understand Fortran's integer arithmetic, which is the
source of the problem.
7. Ignorance of standard numerical algorithms and their limitations: Whoever wrote the
program did not know that the One-Pass algorithm for calculating the standard devia-
tion is unstable.
— TO BE COMPLETED —
Appendix
PROGRAM MachParms
  implicit none
  integer*4 :: k, u, v
  real*4    :: S
  real*8    :: D

  print*
  print*, "--- Silverfrost FTN95: Machine Arithmetic and Parameters ---"
  print*
  print*, "Largest Integer*4 = ", huge(u)
  print*, "Machine Epsilon S = ", epsilon(S)
  print*, "Precision S       = ", precision(S)
  print*, "Min Exponent S    = ", minexponent(S)
  print*, "Max Exponent S    = ", maxexponent(S)
  print*, "Largest S         = ", huge(S)
  print*, "Smallest S        = ", tiny(S)
  print*, "-------------------------------"
  print*, "Machine Epsilon D = ", epsilon(D)
  print*, "Precision D       = ", precision(D)
  print*, "Min Exponent D    = ", minexponent(D)
  print*, "Max Exponent D    = ", maxexponent(D)
  print*, "Largest D         = ", huge(D)
  print*, "Smallest D        = ", tiny(D)
  print*, "-------------------------------"
  print*

  u = -2147483643
  v =  2147483643
  do k = 1, 10
    u = u - 1
    v = v + 1
    print "(i3,i15,3x,b32.32,5x,i15,3x,b32.32)", k, u, u, v, v
  end do
END
These are two comments on integer overflow in Fortran, found on the WWW:
• Peter wrote:
I understand from previous postings that the Fortran standard does not require any check-
ing for integer overflow.
I notice the Intel Fortran compiler 9.1 has removed the compiler switch to check for inte-
ger overflow. How have other compiler manufacturers dealt with this?
On the PC, the Pentium has an overflow flag which is set by integer arithmetic, so overflow
checking is a single conditional jump instruction. So I am wondering if there is some
reason why this capability should be removed.
• Hewlett-Packard: Handling Integer Overflow
Trapping on integer overflow is disabled by default for Fortran and C; an integer overflow
does not generate a SIGFPE error. Detecting integer overflows requires not only that the
trap be enabled but also that the compiler insert special code in the executable file to check
for overflows.
To enable integer overflow checking for Fortran, use a !$HP$ CHECK_OVERFLOW INTEGER ON
directive (in HP Fortran/9000, use $CHECK_OVERFLOW INTEGER_4 or INTEGER_2) to obtain
the overflow checking code, and use an ON INTEGER OVERFLOW statement to handle the trap.
(The !$HP$ CHECK_OVERFLOW directive does not enable checking for operations in libraries.
Using the exponentiation operator involves a library call in HP Fortran, so it is not possi-
ble to enable integer overflow checking for exponentiation operations.) There is no way
to enable integer overflow checking in C. HP C provides no mechanism to insert over-
flow checking code into your executable, because the C language does not define integer
overflow as an error.
References
[1] Donald E. Knuth, The Art of Computer Programming: Seminumerical Algorithms, 2nd Edition, Vol. 2,
Addison-Wesley, 1981.
[2] Lloyd N. Trefethen and David Bau III, Numerical Linear Algebra, SIAM, 1997.
[3] T. F. Chan and G. H. Golub and R. J. LeVeque, Updating formulae and a pairwise algorithm for
computing sample variances, Technical Report STAN-CS-79-773, Stanford University, Dept. of
Computer Science, 1979.
[4] Yun He and Chris H.Q. Ding, “Using Accurate Arithmetics to Improve Numerical Reproducibility and
Stability in Parallel Applications”, Journal of Supercomputing 18 (2001), no. 3.
[5] Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd Edition, SIAM, Philadelphia,
2002.
[6] Arnold Neumaier, Introduction to Numerical Analysis, Cambridge University Press, 2001.