
Influence of Statistical Method Used on the Resulting Estimate of Normal Range

Allen H. Reed, Richard J. Henry, and William B. Mason

The choice of statistical method can greatly influence the calculated normal range apart from any biological or chemical considerations. Nonparametric normal range estimates are, for practical purposes, as accurate as estimates that assume the distribution of data to be gaussian or log-gaussian when the distribution assumed is true. When data are not distributed as assumed, nonparametric estimates are more accurate.

Additional Keyphrases haptoglobin data • k factor method • chi-square test • Kolmogorov-Smirnov test • nonparametric estimates • gaussian and log-gaussian distribution of data • outliers • no. samples for estimate

When the clinical chemist establishes a “normal range,” his intention is to make a statement about typical values in a population larger than those involved in his study. His statement is statistical in nature. Rather than deriving the normal range, he is deriving an estimate of the normal range, intended to apply to this larger population, and this estimate has a certain amount of uncertainty associated with it.

If the normal range calculation is viewed in this context the next logical step is to compare statistical methods of normal range estimation on the basis of how well they estimate the true normal range. We will show that the method used most commonly today is not the most appropriate.

It has become standard to calculate normal range estimates by assuming data to be described either by gaussian or log-gaussian curves.¹ Usually, choice is limited to these two frequency functions. No other option is considered unless there is strong evidence that the data are described by neither curve.

Recently the validity of the gaussian model has been the subject of considerable discussion (1-5). Elveback et al. (3), Mainland (4), and others disagree with the a priori assumption that most biological measurements are adequately described by gaussian or log-gaussian curves. These authors prefer the use of nonparametric² methods for estimating the normal range because they apply regardless of the underlying form of the statistical population from which data are obtained. If the distribution of the data is gaussian or log-gaussian, normal range estimates obtained by nonparametric methods will require more samples to obtain the same precision of estimation as those obtained by methods based on the assumption that the distribution is gaussian. However, if data are neither gaussian nor log-gaussian, and they are treated as if they are, results obtained can be severely biased.

Two nonparametric methods of normal range estimation are the method of PE's³ with associated nonparametric confidence intervals and the method of nonparametric TI's. The PE method is discussed in (5). Recently Brunden et al. (6) applied nonparametric TI methods to the estimation of normal ranges for various blood constituents in dogs. Here, we compare these two nonparametric estimation methods with gaussian and log-gaussian methods and with each other.

From Bio-Science Laboratories, 7600 Tyrone Ave., Van Nuys, Calif. 91405.
Received Sept. 8, 1970; accepted Feb. 1, 1971.
¹ Often called “normal” and “log-normal” curves. Log-normal and log-gaussian mean that logarithms of the data are distributed in a gaussian fashion.
² Parametric estimates are estimates derived from data assumed to be described by a specific frequency distribution, such as gaussian. Parameters characterizing the assumed distribution curve are first estimated and then used for estimating normal range endpoints. Nonparametric estimates do not involve any a priori assumption regarding frequency distribution.
³ Abbreviations used: PE, percentile estimate; TI, tolerance interval, an interval that includes a specified proportion, P, of the population with a specified probability, γ; NED, normal equivalent deviate; K-S, Kolmogorov-Smirnov (test).

CLINICAL CHEMISTRY, Vol. 17, No. 4, 1971 275

An Example

Serum samples were obtained from one hundred apparently healthy persons. The samples were analyzed for haptoglobin after electrophoresis on cellulose acetate (7). Table 1 lists in ascending order the values obtained; the histogram is unimodal and skewed to the right (Figure 1). This is typical of histograms encountered in much of biological data, including quantitative measurements of blood and urine constituents.

Table 1. One Hundred Values of Serum Haptoglobin, in mg/100 ml (as Hemoglobin-Binding Capacity), Listed in Increasing Orderᵃ

     14     59     82    100    128
     21     62     84    100    129
     21     65     85    101    135
     30     66     86    101    136
     32     67     87    101    141
     35     67     87    103    142
     36     69     88    105    147
     36     71     88    106    147
     40     72     89    108    150
     44     76     90    108    161
     47     76     90    108    162
     48     77     93    109    170
     48     77     94    113    174
     48     77     94    114    174
     50     77     95    114    176
     51     78     96    114    179
     52     79     96    116    181
     54     79     97    116    191
     58     80     98    119    199
     59     81     98    126    225

ᵃ Samples obtained from workers at Bio-Science Laboratories.

Fig. 1. Histogram of serum haptoglobin determinations, in mg/100 ml, for 100 healthy adults (horizontal axis: haptoglobin, 10 to 230 mg/100 ml)

Normal range estimates for these data obtained by eleven statistical estimation procedures are given in Table 2, part a. The first three estimates assume gaussian distribution, the second three assume log-gaussian distribution, and the remaining five make no specific assumptions regarding the distribution from which data are obtained. Statistical estimation procedures used in Table 2a are discussed below. Table 2a illustrates both the variety of estimates possible for a given distribution assumption when different methods of estimation are used, and the variation in estimated normal limits when different distributions are assumed.

The estimates of lines 7-11 (Table 2a) are valid regardless of the statistical distribution from which data are obtained. On the other hand, if one of the methods of lines 1-6 is used and the distribution assumed does not apply, estimates can be seriously biased.

To see how well nonparametric estimates perform for truly gaussian or truly log-gaussian samples, Table 2b was constructed from simulated data. The first pair of columns under “Normal range estimates” consists of estimates obtained from 100 gaussian random numbers drawn from the gaussian distribution with true 95% normal range set equal to line 3, Table 2a, i.e., 12-179. The second pair of columns under “Normal range estimates” consists of estimates obtained from 100 log-gaussian random numbers with true 95% normal range equated to line 6 of Table 2a, i.e., 31-231.

Note that the nonparametric estimates of the last two lines of Table 2b do almost as well as those for which distribution assumptions are correct, i.e., lines 1-3 of the gaussian random sample or lines 4-6 of the log-gaussian random sample. On the other hand, when an incorrect distribution is assumed, nonparametric estimates are much better. For the gaussian random sample, compare lines 10-11 with lines 5-6. For the log-gaussian random sample, compare lines 10-11 with lines 2-3.

The negative lower normal limits that occur in Table 2b merit comment. For the first column this occurs because the gaussian distribution is not restricted to positive numbers, and -12 and -5 were included in the random numbers that were generated. Clinical laboratory data, however, cannot be negative. Hence the nonparametric estimates cannot be negative with real data. For the last column the true distribution is log-gaussian, and negative lower limits occur with those estimation methods in which data are


assumed to be gaussian distributed. In the case of real data, negative estimates can occur with estimation methods that assume gaussian distribution (20).

Table 2. Influence of Statistical Method Used on the Results for Estimate of Normal Rangeᵃ

a. Haptoglobin data

Line  Distribution    Estimation method                                       Normal range
      assumption                                                              estimate
  1   Gaussian        95-90 tolerance interval (k factor)                        2    188
  2   Gaussian        95-50 tolerance interval (k factor)                       11    180
  3   Gaussian        Percentile                                                12    179
  4   Log-gaussian    95-90 tolerance interval (k factor)                       28    258
  5   Log-gaussian    95-50 tolerance interval (k factor)                       31    233
  6   Log-gaussian    Percentile                                                31    231
  7   Nonparametric   95-90 tolerance interval, r = 1, s = 1                    14    225
  8   Nonparametric   95-50 tolerance interval, r = 2, s = 3                    21    191
  9   Nonparametric   95-50 tolerance interval, r = 3, s = 2                    21    199
 10   Nonparametric   95-50 averaged tolerance intervals                        21    195
 11   Nonparametric   Percentile                                                21    195

b. Random sample data

                                                                  Normal range estimates
                                                              Gaussian          Log-gaussian
                                                              random sample     random sample
Line  Distribution    Estimation method                       (true 95% range:  (true 95% range:
      assumption                                              12-179)           31-231)
  1   Gaussian        95-90 tolerance interval (k factor)        5    188         -17    213
  2   Gaussian        95-50 tolerance interval (k factor)       13    180          -6    202
  3   Gaussian        Percentile                                14    179          -6    201
  4   Log-gaussian    95-90 tolerance interval (k factor)       13    346          29    257
  5   Log-gaussian    95-50 tolerance interval (k factor)       16    306          32    233
  6   Log-gaussian    Percentile                                16    303          32    231
  7   Nonparametric   95-90 tolerance interval, r = 1, s = 1   -12    211          24    339
  8   Nonparametric   95-50 tolerance interval, r = 2, s = 3    -5    183          26    245
  9   Nonparametric   95-50 tolerance interval, r = 3, s = 2    15    184          32    245
 10   Nonparametric   95-50 averaged tolerance intervals         5    184          29    245
 11   Nonparametric   Percentile                                 5    184          29    245

ᵃ Table 2a consists of normal range estimates obtained from the 100 haptoglobin samples of Table 1. The estimates of lines 1-3 assume gaussian distributed data. The estimates of lines 4-6 assume log-gaussian distributed data. The estimates of lines 7-11 are nonparametric. The first two columns under “Normal range estimates” are for simulated data consisting of 100 gaussian random numbers obtained from the gaussian probability distribution with true 95% normal range of 12-179 (cf. line 3, Table 2a). In the last two columns, estimates are obtained from 100 log-gaussian random numbers with true 95% normal range of 31-231 (cf. line 6, Table 2a).

Gaussian PE's and TI's: The k Factor Method

For a gaussian distribution the population mean and standard deviation, μ and σ, determine every point of the curve. In particular the true 2.5 and 97.5 percentiles of the gaussian distribution are μ - 1.96σ and μ + 1.96σ. PE's are obtained by replacing μ and σ by their estimates from the observed data, x̄ and s. If data are distributed according to the log-gaussian distribution, their logarithms have a gaussian distribution. These facts are the basis for the estimates of lines 3 and 6, Table 2. They are the familiar normal range estimates given as x̄ ± 2s or log transforms of this expression.
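As a rough illustration of the estimates in lines 3 and 6 of Table 2a, the sketch below computes x̄ ± 1.96s on the raw values and on their logarithms for the Table 1 haptoglobin data. The function names are hypothetical, and the use of the sample standard deviation with an n - 1 divisor is an assumption, since the text does not state which divisor was used.

```python
import math
import statistics

# Table 1: 100 serum haptoglobin values (mg/100 ml), in increasing order.
HAPTOGLOBIN = [
    14, 21, 21, 30, 32, 35, 36, 36, 40, 44,
    47, 48, 48, 48, 50, 51, 52, 54, 58, 59,
    59, 62, 65, 66, 67, 67, 69, 71, 72, 76,
    76, 77, 77, 77, 77, 78, 79, 79, 80, 81,
    82, 84, 85, 86, 87, 87, 88, 88, 89, 90,
    90, 93, 94, 94, 95, 96, 96, 97, 98, 98,
    100, 100, 101, 101, 101, 103, 105, 106, 108, 108,
    108, 109, 113, 114, 114, 114, 116, 116, 119, 126,
    128, 129, 135, 136, 141, 142, 147, 147, 150, 161,
    162, 170, 174, 174, 176, 179, 181, 191, 199, 225,
]


def gaussian_pe(values, z=1.96):
    """Percentile estimates (mean - z*s, mean + z*s), assuming gaussian data."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: n - 1 divisor
    return mean - z * s, mean + z * s


def log_gaussian_pe(values, z=1.96):
    """Same estimate computed on natural logs, then transformed back by antilogs."""
    lo, hi = gaussian_pe([math.log(v) for v in values], z)
    return math.exp(lo), math.exp(hi)


print("gaussian PE:    ", gaussian_pe(HAPTOGLOBIN))
print("log-gaussian PE:", log_gaussian_pe(HAPTOGLOBIN))
```

If Table 1 is transcribed correctly, the two results should fall close to the 12 to 179 and 31 to 231 entries of Table 2a; the choice of logarithm base does not affect the back-transformed limits.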



Next consider gaussian TI's. Thus, for P = 0.95 and γ = 0.90, a 95-90 TI is a pair of numbers L and U, such that 95% or more of the population values are greater than L and less than U, and this statement is true with probability 0.90 (i.e., 90%). A method for computing a TI when the data have a gaussian distribution was first presented by Wald and Wolfowitz (8). They determined a constant k, depending on the number of samples n, such that

$$\Pr\left\{ \int_{L}^{U} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right] dx \ge P \right\} = \gamma \qquad (1)$$

for L = x̄ - ks and U = x̄ + ks.

To apply the method to normal range estimation, assume a priori that the data are gaussian or log-gaussian distributed. If they are log-gaussian their logs are gaussian distributed and calculations are based on these logarithms. In that case the antilogs of L and U are estimated limits of the normal range. If the data are taken as gaussian distributed, the estimated normal limits are L and U. In the context of normal range estimation, Equation 1 may be paraphrased as follows: If we specify P = 0.95 and γ = 0.90, for example, and if n test values are randomly selected from the target population, there is a number k such that the probability that 95% or more test results in the population are between L = x̄ - ks and U = x̄ + ks is 90%. As before, x̄ and s are estimates of μ and σ for the data at hand. Tables of appropriate values of k may be found in reference 9 for n up to 100, for P = 0.95, and for γ = 0.75, 0.90, and 0.95. Gaussian TI's have come to be called the “method of k factors.”

Note that if P = 0.95, Equation 1 is satisfied as long as 95% or more of the gaussian population is included in the interval between L and U. There is no restriction on the width of the TI. In fact, the wider the computed TI the more likely it is that it will include at least 95% of the population values. We will now show that, in general, the width of the TI is directly related to the choice of γ. Consider a hypothetical population whose test values have a gaussian distribution with true mean μ = 100, and true standard deviation σ = 10. For this gaussian frequency distribution of test values, the middle 95% ranges from 80.4 to 119.6, the 2.5 and 97.5 percentile points, respectively. These are the true 95% normal limits. To see how γ influences normal range estimates, the “expected” or average values of L and U and their standard deviations were calculated for two different cases in which γ is 0.90 and 0.50, respectively.

In Figure 2a are graphed two standard deviation intervals about the expected value of U, i.e., E_U ± 2σ_U, for γ = 0.90 and for numbers of samples n = 30, 60, 100, and 1000. Figure 2b shows the corresponding graphs for E_L ± 2σ_L. Note that when n = 30, the expected value of U is E_U = 123.9, and that as n increases this expected value becomes closer to the true value, 119.6. In addition, as one would expect, the amount of variation in repeated calculations of L and U decreases for larger n. For example, when n = 30, a value of U as high as U = 131 is within the two sigma interval about E_U.

Fig. 2. Accuracy and precision of estimated normal limits by the k factor method with γ = 0.90. In this example, the true distribution is gaussian with μ = 100 and σ = 10. In a, the “expected” or average value, E_U, of the estimated upper limit, is shown as a solid point for selected values of n. Vertical lines correspond to ±2σ_U intervals about E_U. In b, corresponding data are shown for the lower limit (horizontal axis: sample size, n = 30, 60, 100, 1000)

Figures 3a and 3b are of the same form. The only difference is that γ = 0.5 instead of 0.9, i.e., a computed TI will encompass 95% or more of the population with 50% probability rather than 90% as in Figure 2. Note that the centers of the 2-sigma intervals tend to be closer to the correct values when γ = 0.5 than when γ = 0.9.

Previously, one of us (R. J. H.) has advocated that the k factor method with γ = 0.90 be used in normal range estimation (9). However, the above analysis shows that the k factor method with γ = 0.90 results in normal range estimates that are generally too wide unless the number of samples is extremely large (n on the order of 1000). If the normal range estimate is too wide, this weakens the usefulness of the test because its diagnostic power is decreased.
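The effect of γ on the width of a k factor interval can be sketched numerically. The text takes k from the tables of reference 9; the code below instead uses two approximations that are not part of the paper and are assumptions here: Howe's approximation for the two-sided tolerance factor, and the Wilson-Hilferty approximation for the chi-square quantile it needs. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the p-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def k_factor(n, coverage=0.95, gamma=0.90):
    """Approximate two-sided tolerance factor k (Howe's approximation)."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - gamma, df)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2)


# Hypothetical population from the text: mu = 100, sigma = 10,
# true 95% limits 80.4 and 119.6.
mu, sigma = 100.0, 10.0
for n in (30, 60, 100, 1000):
    for gamma in (0.90, 0.50):
        k = k_factor(n, 0.95, gamma)
        # Using sigma in place of the expected value of s is a simplification,
        # so these are only rough stand-ins for E_L and E_U.
        print(n, gamma, round(k, 3), round(mu - k * sigma, 1), round(mu + k * sigma, 1))
```

For every n the factor with γ = 0.90 exceeds the factor with γ = 0.50, and both shrink toward 1.96 as n grows, which is the pattern the text describes for Figures 2 and 3.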



The worth of the k factor method ultimately depends on the adequacy of the gaussian assumption for normal range estimation. In the next section, this question is examined more closely.

Fig. 3. Accuracy and precision of estimated normal limits by the k factor method with γ = 0.50. Details as in Figure 2

Statistical Tests of the Gaussian Assumption

A frequently used graphical method for assessing validity of the gaussian assumption is to plot ordered data points on probability paper (10). On probability paper the ordinate is scaled according to percentage points of the gaussian frequency function and the abscissa is rectilinear. A log-gaussian assumption can also be assessed in the same fashion by plotting logarithms of data on probability paper. If probability paper is not available, the NED plot (9) on ordinary rectilinear graph paper achieves the same purpose. For the NED plot, equally spaced points on the ordinate scale are renumbered according to a gaussian scale.

The value of gaussian and log-gaussian plots in testing the distribution of data is illustrated by reference to the haptoglobin values mentioned earlier. Figure 4 is a plot on probability paper of the one hundred haptoglobin values. If these data have a gaussian distribution, data points will fall on a straight line except for slight random fluctuations. Note that there is some curvature and moderate departure from a straight line. Figure 5 shows logarithms of the data plotted on probability paper. In the plot of Figure 5, the curvature is reversed.

Fig. 4. Plot of cumulative frequencies of haptoglobin values (Table 1) on probability paper (horizontal axis: haptoglobin, 0 to 220; vertical axis: cumulative probability, .01 to .99)

It would be difficult to make a decision between the gaussian or log-gaussian models on the basis of these curves. In the remainder of this section a more rigorous statistical analysis will be described in which it is concluded that neither model is appropriate.

Chi-square tests (10) were applied to the data and to logs of the data. For testing goodness of fit to the gaussian distribution the computed chi-square value was 16.7. This is significant at the 5% level (15.5) but not at 1% (20.1). This means: if the assumption is true that the data are distributed like a random sample from the gaussian probability distribution, then the probability of obtaining a chi-square value as large or larger than 16.7 is less than 0.05 but greater than 0.01. The chi-square test for log-gaussian distribution resulted in a computed chi-square⁴ of 13.2. This is not quite significant at the 5% level. However, the chi-square is known to give different results depending on how data are grouped.

⁴ Natural logarithms of the data were tested for gaussian distribution with class intervals grouped so that expected frequencies were at least 0.5 in the tails and at least in the remaining class intervals [cf. (J)].

An alternative test of goodness of fit that does not depend on the grouping of data and that is generally more powerful than the chi-square test, the Kolmogorov-Smirnov (K-S) test (11), was also applied. The K-S value for testing gaussian distribution, 0.098, is significant at the 5% level (0.089) but not at 1% (0.1031). The K-S value for testing log-gaussian distribution, 0.1029, is significant at 5% and is almost significant at the 1% level.
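The K-S comparison can be sketched for the Table 1 data as follows. This is only an illustration under stated assumptions: the gaussian curve is fitted with the sample mean and the n - 1 standard deviation, which the text does not specify, and the helper name is hypothetical. The critical values 0.089 and 0.1031 are taken from the text as given; for n = 100 they coincide with 0.886/√n and 1.031/√n, the large-sample values usually quoted for a K-S test with estimated parameters.

```python
import math
from statistics import NormalDist, mean, stdev

# Table 1: 100 serum haptoglobin values (mg/100 ml), in increasing order.
HAPTOGLOBIN = [
    14, 21, 21, 30, 32, 35, 36, 36, 40, 44,
    47, 48, 48, 48, 50, 51, 52, 54, 58, 59,
    59, 62, 65, 66, 67, 67, 69, 71, 72, 76,
    76, 77, 77, 77, 77, 78, 79, 79, 80, 81,
    82, 84, 85, 86, 87, 87, 88, 88, 89, 90,
    90, 93, 94, 94, 95, 96, 96, 97, 98, 98,
    100, 100, 101, 101, 101, 103, 105, 106, 108, 108,
    108, 109, 113, 114, 114, 114, 116, 116, 119, 126,
    128, 129, 135, 136, 141, 142, 147, 147, 150, 161,
    162, 170, 174, 174, 176, 179, 181, 191, 199, 225,
]


def ks_statistic(values):
    """Largest vertical gap between the empirical CDF and a fitted gaussian CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # assumption: parameters estimated from the sample
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Check the gap just after and just before the step at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


d_gauss = ks_statistic(HAPTOGLOBIN)
d_log = ks_statistic([math.log(v) for v in HAPTOGLOBIN])
print("K-S value, gaussian fit:    ", round(d_gauss, 4))
print("K-S value, log-gaussian fit:", round(d_log, 4))
print("5% and 1% critical values from the text: 0.089, 0.1031")
```

The computed values need not reproduce 0.098 and 0.1029 exactly, since the original fitting details are not given.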



Fig. 5. Plot of cumulative frequencies of logarithms of haptoglobin values (Table 1) on probability paper (horizontal axis: log haptoglobin, 1.1 to 2.4; vertical axis: cumulative probability, .01 to .99)

Thus, it is clear there is very little to support a conclusion that the data follow either a gaussian or log-gaussian distribution. In spite of this difficulty, the estimated normal range depends very much on the distribution selected, as previously shown in Table 2a.

Nonparametric Normal Range Estimation

A nonparametric solution to the TI problem was originally given by Wilks (13). Wilks' solution is based on the binomial probability distribution. For n individuals randomly selected from the population about which inferences are desired, a 95% TI is computed by ordering (ranking) the n test results x(1), ..., x(n) and then finding the highest rank r and the smallest rank n - s + 1 greater than r such that

$$\sum_{j=r}^{n-s+1} \binom{n}{j} (0.95)^{j} (0.05)^{n-j} \ge \gamma \qquad (2)$$

where γ is the probability of including 95% of the population within the TI. Test results for the individuals ranked rth and (n - s + 1)th, x(r) and x(n-s+1), are the endpoints of the normal range as estimated by the TI method.

Tables have been published in Somerville (14) tabulating m = r + s for values of γ = 0.5, 0.75, 0.9, 0.95, and 0.99, and for sample sizes of 50 to 1000. When the normal range to be estimated is the middle 95% of the population, r and s are usually chosen equal if m is an even number, or different by 1 if m is an odd number.

Example: To derive an interval having a 90% probability of covering 95% of the target population, i.e., a 95-90 TI, for the 100 haptoglobin values cited earlier, note that from Table 1 of reference 14, m = 2. Choose r = 1, and obtain x(1) = 14 as the lower limit of the estimated normal range. In order that r + s = m, s = 1 and the estimated upper limit of the normal range has rank 100 - s + 1 = 100. Since x(100) = 225, the upper limit is 225. An estimated normal range corresponding to the 95-90 TI is therefore 14 to 225.

If we are content to derive a 95-50 TI, in which γ = 0.50 is the probability of covering 95% of the population, then from Somerville's tables, m = 5. If r is set to 2 then s = 3 and the estimated normal range has lower limit x(2) = 21 and upper limit x(98) = 191.

Note in this case where m is 5, that we have arbitrarily chosen r = 2 and s = 3, so that r + s = m. If r = 3 and s = 2 is chosen, however, a second normal range estimate is obtained, namely 21 to 199. A third TI estimate is obtained by averaging the two TI estimates above. The estimated normal range by this method is 21 to 195 (cf. lines 8 to 10, Table 2).

Now, consider the method of percentile estimation. The PE method is also based on ranking the test results in order of magnitude. An estimate of the 2.5 percentile of the frequency distribution of the target population is the 2.5 percentile of the observed sample frequency distribution. The sample 2.5 percentile is the lth ordered sample value where l = 0.025(n + 1). The corresponding estimate of the 97.5 percentile is the lth largest sample. For most values of n, l is not a whole number and it is best to interpolate between the two ordered sample values whose ranks are nearest and on each side of l.

Example: For the haptoglobin data, n = 100, and therefore

l = 0.025(101) = 2.525

Since x(2) = 21 and x(3) = 21, interpolation is unnecessary for the estimated lower limit. To compute the upper limit, the second highest ranked value is x(99) = 199, and the third highest ranked value is x(98) = 191. By interpolation, the estimated upper limit is 194.5, which is rounded to 195.

The PE method gives single numbers as estimates of the population normal limits. It is also possible to derive a confidence interval for each of the normal limits. A confidence interval is a continuous interval that covers the true value of the normal range endpoint with specified probability, say 0.90. A 90% confidence interval for the lower endpoint of the normal range is obtained by choosing a and b such that

$$\sum_{j=a}^{b-1} \binom{n}{j} (0.025)^{j} (0.975)^{n-j} \ge 0.90 \qquad (3)$$

The ath and bth ranked test values (x(a), x(b)) then comprise a 90% confidence interval for the 2.5 percentile in the population. Therefore, the probability is less than 0.1 that x(b), the upper limit of the confidence interval, is less than the true 2.5 percentile, or x(a), the lower limit of the confidence interval, is greater than the true 2.5 percentile, i.e., the converse of Equation 3. Table 3 gives values of a and b for various values of n, and also relates the confidence interval for the upper limit to a and b.

Table 3. Nonparametric 90% Confidence Intervals for Normal Limitsᵃ

No. of samples, n        Rank
From      To            a     b
 120     131            1     7
 132     159            1     8
 160     187            1     9
 188     189            1    10
 190     216            2    10
 217     246            2    11
 247     251            2    12
 252     276            3    12
 277     307            3    13
 308     310            3    14
 311     338            4    14
 339     366            4    15
 367     369            5    15

ᵃ ath lowest sample value = lower limit of 90% confidence interval for 2.5 percentile in target population. bth lowest sample value = upper limit of 90% confidence interval for 2.5 percentile in target population. To obtain ranks corresponding to a 90% confidence interval for the 97.5 percentile, subtract the values given for a and b from n + 1.

Table 4. Nonparametric 70% Confidence Intervals for Normal Limitsᵃ

No. of samples, n        Rank
From      To            a     b
  75      82            1     4
  83     111            1     5
 112     134            1     6
 135     142            2     6
 143     173            2     7
 174     188            2     8
 189     206            3     8

ᵃ ath lowest sample value = lower limit of 70% confidence interval for 2.5 percentile in target population. bth lowest sample value = upper limit of 70% confidence interval for 2.5 percentile in target population. To obtain ranks corresponding to a 70% confidence interval for the 97.5 percentile, subtract the values given for a and b from n + 1.

Example: A study conducted in our laboratory involved 204 women aged 20-29 who were selected from eight geographical regions of the U.S. and considered to be free from disease by their examining physicians. For the group, the lowest 10 albumin values (g/100 ml) ranked in order, are x(1) = 3.5, x(2) = 3.6, x(3) = x(4) = x(5) = ... = x(9) = 3.7, x(10) = 3.8. The highest are x(195) = ... = x(198) = 5.0, x(199) = ... = x(202) = 5.1, x(203) = x(204) = 5.2. Estimated endpoints of the normal range are obtained from l = 0.025(205) = 5.125. Note that interpolation is easy because of the many identical values. Thus, an estimate of the normal range for albumin for American women aged 20-29 is 3.7 to 5.1. Also, from Table 3, a = 2 and b = 10; hence, a 90% confidence interval for the lower limit of the normal range is x(2) = 3.6 to x(10) = 3.8. A 90% confidence interval for the upper limit is x(195) = 5.0 to x(203) = 5.2.

Table 3 was constructed from Equation 3. In order to evaluate Equation 3 the binomial probability distribution was approximated by the poisson distribution in some cases and gaussian distribution with continuity correction in other cases (10). Results were checked by means of the incomplete beta function (15).
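The percentile estimate and the rank search behind Equation 3 can be sketched together. The code below is a minimal illustration, not the authors' procedure: it evaluates the binomial sum exactly rather than by the poisson and gaussian approximations mentioned above, and it picks a and b by an equal-tail rule (each tail at most half of one minus the confidence level), which is an assumption. Because of the exact evaluation, results may differ from Tables 3 and 4 for n close to a table boundary. The function names are hypothetical.

```python
import math

# Table 1: 100 serum haptoglobin values (mg/100 ml), in increasing order.
HAPTOGLOBIN = [
    14, 21, 21, 30, 32, 35, 36, 36, 40, 44,
    47, 48, 48, 48, 50, 51, 52, 54, 58, 59,
    59, 62, 65, 66, 67, 67, 69, 71, 72, 76,
    76, 77, 77, 77, 77, 78, 79, 79, 80, 81,
    82, 84, 85, 86, 87, 87, 88, 88, 89, 90,
    90, 93, 94, 94, 95, 96, 96, 97, 98, 98,
    100, 100, 101, 101, 101, 103, 105, 106, 108, 108,
    108, 109, 113, 114, 114, 114, 116, 116, 119, 126,
    128, 129, 135, 136, 141, 142, 147, 147, 150, 161,
    162, 170, 174, 174, 176, 179, 181, 191, 199, 225,
]


def percentile_estimates(values, p=0.025):
    """Nonparametric PEs at rank l = p(n + 1), with linear interpolation."""
    xs = sorted(values)
    n = len(xs)
    l = p * (n + 1)
    k = math.floor(l)   # whole part of the rank (1-based)
    frac = l - k        # fractional part used for interpolation
    if k < 1:
        raise ValueError("too few samples for this percentile")
    lower = xs[k - 1] + frac * (xs[k] - xs[k - 1])
    upper = xs[n - k] - frac * (xs[n - k] - xs[n - k - 1])
    return lower, upper


def binom_cdf(k, n, p):
    """P(X <= k) for X binomial(n, p), by direct summation."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def ci_ranks(n, conf=0.90, p=0.025):
    """Ranks (a, b) satisfying Equation 3 under an equal-tail rule (assumption)."""
    tail = (1.0 - conf) / 2.0
    if binom_cdf(0, n, p) > tail:
        return None     # no two-sided interval at this confidence
    a = 1
    while binom_cdf(a, n, p) <= tail:
        a += 1
    b = a + 1
    while binom_cdf(b - 1, n, p) < 1.0 - tail:
        b += 1
    return a, b


print(percentile_estimates(HAPTOGLOBIN))
print(ci_ranks(204))             # albumin example, n = 204
print(ci_ranks(100))             # below n = 120
print(ci_ranks(100, conf=0.70))  # 70% interval for the same n
```

For the haptoglobin data, straight linear interpolation at rank 2.525 from the top gives 199 - 0.525 × 8 = 194.8, which rounds to the same upper limit of 195. For n = 204 the rank search returns a = 2 and b = 10, in agreement with Table 3.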
If n < 120, it is not possible to obtain two-sided 90% confidence intervals for the 2.5 or 97.5 percentiles, i.e., we cannot set limits on both sides of the true 2.5 percentile or 97.5 percentile that hold with probability 0.90. One alternative is to reduce the confidence (i.e., the probability) with which the computed interval covers the desired population percentile. In Table 4 ranks are given for 70% confidence intervals, which can be calculated beginning with n = 75.

Comparison of Methods for Estimating Normal Range

If data are in fact gaussian or log-gaussian distributed we have shown by example that nonparametric estimates are practically indistinguishable from the best gaussian or log-gaussian estimates. We have also shown that when neither assumption is true, nonparametric estimates are much better. The inconclusiveness of preliminary tests for gaussian or log-gaussian distribution has been demonstrated. For these reasons we see little justification for computing normal range estimates by any method requiring gaussian or log-gaussian assumptions.

⁵ A view held by many statisticians is that for a sufficiently large number of samples no real data are really gaussian distributed [cf. (4)].

Consider next TI estimates. If the normal range is defined as the interval between the 2.5 and 97.5 percentiles of the target population then a TI is not necessarily an estimate of this interval. A continuous interval that includes any 95% of the population satisfies the mathematical requirement of a 95% tolerance interval. As a consequence, there may be an indeterminacy in the choice of endpoints of the estimated normal range. This was shown in the example of TI computations for



haptoglobin with γ = 0.5. In that case m = 5 and two equally reasonable choices for TI's give ranges of 21 to 191 and 21 to 199.

Alternatively, it is possible to average the two TI estimates. An example of this averaged estimate was given on line 10 of Table 2. This averaged TI agreed with the PE of line 11. It is not clear, however, that the averaged TI estimate any longer has the probability property of a TI, i.e., the property of including a specified percentage of the population between estimated limits with probability γ.

In addition, TI estimates will tend to be too wide if γ > 0.5. This was shown in Table 2 and Figure 2 applying the k factor method to data following a gaussian distribution and the same situation holds with nonparametric methods.

In contrast to the TI method, the PE method provides definitive estimates for endpoints of the normal range. Moreover, when associated confidence intervals are calculated for the endpoints, they define regions of uncertainty that are useful for determining precision of the estimates. For the albumin data cited previously, the 2.5 percentile, which is taken as the lower normal limit in the population, is estimated to be 3.7 with a 90% confidence interval of 3.6 to 3.8. This means, for example, that an albumin value of less than 3.6 g/100 ml has less than 5% probability of being within the true normal range. Likewise, a value that is greater than 3.8 g/100 ml has less than 5% probability of being outside the true normal range. Values in the interval 3.6 to 3.8 all have identical significance.

The Problem of Outliers

Unless the number of samples is extremely large, normal range estimation by nonparametric methods almost entirely depends on the one or two lowest and highest values. Thus, one or two persons who happen to be included in the sample but who are subclinically ill, and therefore do not properly belong to the population for which we wish to estimate the normal range, could have a considerable influence on the final normal range estimate. Also, there is always the danger that an extreme value is spurious for reasons of technical or clerical errors. On the other hand, it should be recognized that nonparametric methods based on the extremes of ranked data are quite sensitive to overzealous discarding of unrepresentative cases. It is difficult to choose a middle ground.

There is a large statistical literature on the problem of outliers (16-18). Although several writers have sought probabilistic criteria for rejection of outliers, their methods invariably assume the data to be a homogeneous random sample from a gaussian distribution. A rule that seems to us to be more oriented towards assessing homogeneity of sample rather than gaussian distribution is a modification of the r10 statistic proposed by Dixon (17). Our criterion is to reject the largest value, x(n), if the distance between it and x(n-1) is more than one third the range, i.e., reject x(n) if

$$r = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}} > \frac{1}{3}$$

This same criterion may be applied to determine if x(1) can reasonably be called an outlier. For this purpose the criterion is to reject x(1) if

$$\frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}} > \frac{1}{3}$$

Example: In the case of albumin values for women aged 20 to 29, the lowest value originally was x(1) = 2.5 and the second lowest was x(2) = 3.5. Calculation of r resulted in

$$r = \frac{3.5 - 2.5}{5.2 - 2.5} = \frac{1}{2.7}$$

and thus r > 1/3.

Subsequent examination of information available on this woman revealed that her A/G ratio was 0.7 (also low) and that she was being treated with cortisone. On the basis of this information, it was felt that this woman's test result does not belong to the population we wished to describe and x(1) = 2.5 was discarded as an outlier.

Note also for the same albumin data that the estimated normal range is unchanged if this outlier is excluded. Only the limits of the confidence interval are changed. This is typical and is due to the frequent occurrence of identical values.

Dixon computed critical values of the statistic r10, assuming a gaussian distribution, for sample sizes up to 30. If the data are gaussian distributed and the sample size is 30, our rule has a significance level approximately equal to 2.5%, i.e., the probability that r is greater than one-third is close to 0.025. If n > 30, the rule is conservative.

A word of caution is warranted with respect to outliers. It is characteristic of currently available nonparametric methods that extreme sample values play a much greater role in calculation of normal range estimates than do intermediate values. As a result nonparametric PE's are vulnerable to distortion if outliers are present. We have argued that nonparametric estimates are in general preferable to gaussian or log-gaussian estimates. Our demonstrations have assumed subjects were obtained from a homogeneous population and were characteristic of that population. What about the more realistic case where mixed populations are involved, where an individual's deviation from the intended population is not clear-cut but he may be in an early stage of disease? Clearly, the problem of outliers cannot be resolved by mathematical formula, such as the r criterion. At best, the r criterion is a rough screening device, which is no substitute for careful specification of a method

282 CLINICAL CHEMISTRY, Vol. 17, No. 4, 1971


for obtaining subjects from the population about which normal range
estimates are desired.

For guarding against outliers owing to technical error it is advisable to
obtain a sufficiently large volume of each specimen so that repeat
determinations can be made for the three highest and three lowest values.
This assumes the constituent being measured is known to be stable for the
period between analyses. If a repeat value is substantially different from
the original we would discard both values prior to calculating PE’s. The
purpose of repeating the analysis is to ensure that those values that most
influence normal range estimates truly reflect biological variation, and are
not due to gross error. Note that we are not suggesting that values be
averaged. This is statistically objectionable. Unless it is obvious that an
original value is technically incorrect it should be retained unchanged.

The effect of the outlier problem in normal range estimation is reduced
considerably if a sufficiently large number of samples is obtained. This is
discussed further in the next section.

Number of Samples for Normal Range Estimation

If n = 39, the 2.5 PE has rank l = 1, and normal limits are estimated by the
lowest and highest values obtained. In this case, a single outlier will
result in a distorted estimate of one of the normal limits. If n = 79,
l = 2 and as few as two outliers will distort the normal range estimate if
they are both low or both high.

It is possible to reduce vulnerability to outliers by obtaining a
sufficiently large number of samples that two-sided confidence intervals may
be calculated from the data. In that case, the experimenter has, in addition
to the single estimate of the 2.5 percentile, an interval derived from the
sample values that includes within it, with specified confidence level, the
true 2.5 percentile in the population. This was previously illustrated in
the example for albumin in women aged 20-29.

It can be seen from Table 3 that the least sample size which permits 90%
confidence intervals for the normal limits is n = 120. On this basis, we may
say that n = 120 is the smallest number of samples that should be used in
calculating normal range estimates. Even this size of n will sometimes be
found to be inadequate. When n = 120 a 90% confidence interval derived
according to Table 3 for the 2.5 percentile is the interval from X(1) to
X(7). It may happen that X(1) and X(7) are far apart, resulting in a
confidence interval that is quite large. This would occur if X(1) is an
outlying value, but suppose there is no basis for excluding it. Another
possibility is that biological variation in the chemical constituent being
measured is so large that n = 120 is not sufficient for precise estimation
of normal limits. In any case the confidence interval calculation has served
to alert us to the imprecision in the calculated estimates.

To obtain an indication of the precision of PE’s, consider limits of the
population percentage excluded by use of the 2.5 PE. Optimally, the 2.5 PE
will have 2.5% of the population to the left of its value. If so, the 2.5 PE
would exclude exactly 2.5% of the population from the estimated normal
range. However, the PE is obtained from a limited amount of data and it is
unlikely to exclude this exact percentage. For example, if n = 119, the rank
of the 2.5 PE is l = 0.025(120) = 3. When n = 119, Table 5 indicates that
X(3) may exclude as little as 0.7% or as much as 5.3% of the population.
Table 5 was constructed in such a way that the above statement is true with
probability 90%. Although these limits become narrower as n increases, they
are still quite wide when n = 400. Thus one should strive to obtain as many
values as possible.

Table 5. Limits of Population Percentage Excluded by Use of the 2.5
Percentile Estimate (Joint Probability Level = 0.90)°

                                 Percentage of population excluded
No. samples, n    Rank of PE     Min.       Max.
39                1              0.1        7.5
119               3              0.7        5.3
199               5              1.0        4.6
279               7              1.1        4.3
399               10             1.2        4.0
                                 2.5        2.5

° A percentile estimate (PE) of the 2.5 percentile will exclude from the
normal range (values to the left of the 2.5 PE are excluded) a percentage of
the population that is unlikely to be exactly 2.5%. This table shows the
minimum and maximum percentages that are likely to be excluded as a function
of the number of samples, n. Both limits hold simultaneously with 90%
probability. This table is similar to one Herrera (19) constructed for the
tenth percentile.

As a practical working rule we recommend that n = 120 be taken as the
smallest number of samples from which to calculate normal range estimates
and confidence intervals. But it should be recognized that sometimes this
number will be inadequate.

Summary and Recommendations

The normal range calculation has been treated as a statistical estimation
problem. We have argued that the a priori assumption that data are described
by either the gaussian or log-gaussian curve is unwarranted. An example has
been used to show that nonparametric estimates (which make no prior
assumptions) are, for practical purposes, as accurate as gaussian or
log-gaussian estimates if the assumptions happen to be true, whereas if
gaussian or log-gaussian assumptions are inappropriate, the corresponding
estimates can be biased.
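The sample-size arithmetic discussed under "Number of Samples for Normal
Range Estimation" can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the authors' procedure: the
function names are hypothetical, the rank is taken as l = 0.025(n + 1) as in
the n = 119 example, and the limits of Table 5 are approximated by
equal-tailed 5% and 95% points of the distribution of the population
fraction lying below X(l), which may differ from how the table was actually
constructed.

```python
import math


def pe_rank(n, pct=2.5):
    """Rank l = (pct/100) * (n + 1) of the percentile estimate."""
    return pct / 100.0 * (n + 1)


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def fraction_below_quantile(n, l, prob):
    """Solve P(U <= p) = prob for p, where U is the population fraction below X(l).

    U <= p exactly when at least l of the n samples fall below the
    p-quantile, so P(U <= p) = 1 - binom_cdf(l - 1, n, p). Bisection is
    used because this probability increases with p.
    """
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - binom_cdf(l - 1, n, mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def excluded_limits(n, l, level=0.90):
    """Assumed equal-tailed limits (in percent) on the fraction excluded by X(l)."""
    tail = (1 - level) / 2
    return (100 * fraction_below_quantile(n, l, tail),
            100 * fraction_below_quantile(n, l, 1 - tail))


def interval_coverage(n, r, s, q=0.025):
    """P(X(r) <= q-quantile <= X(s)): between r and s - 1 samples lie below it."""
    return binom_cdf(s - 1, n, q) - binom_cdf(r - 1, n, q)


if __name__ == "__main__":
    for n in (39, 119, 199, 279, 399):
        l = round(pe_rank(n))
        low, high = excluded_limits(n, l)
        print(n, l, round(low, 1), round(high, 1))
    print(round(interval_coverage(120, 1, 7), 3))
```

Under the equal-tail assumption the n = 39 row comes out near 0.1% and 7.4%,
close to but not identical with the tabulated 0.1 and 7.5, and the interval
from X(1) to X(7) at n = 120 covers the true 2.5 percentile with probability
of about 0.92, consistent with the stated 90% confidence level.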



We recommend that use of the k factor method for normal range estimation be
discontinued, since the k factor method is valid only for gaussian
distributions. We recommend that use of graphical methods (such as the NED
plot) for testing whether data follow a gaussian or log-gaussian
distribution be discontinued because they are not conclusive.

Of the nonparametric estimates discussed, we recommend the use of percentile
estimates together with nonparametric confidence intervals for the true
percentile. Confidence intervals provide a rigorous probability statement
about precision of estimated normal limits. Our recommendation for the
minimum number of samples in order to estimate a normal range with accuracy
is n = 120. This is the smallest number of sample values that permits 90%
confidence intervals for the endpoints of the normal range.

References

1. Letters to the editor: Normal ranges and Gaussian distributions. CLIN. CHEM. 16, 809 (1970).
2. Letters to the Journal: The “normal” range. J. Amer. Med. Ass. 212, 883 (1970).
3. Elveback, L. R., Guiller, C. L., and Keating, F. H., Health, normality and the ghost of Gauss. J. Amer. Med. Ass. 211, 69 (1970).
4. Mainland, D., Normal values in medicine. Ann. N.Y. Acad. Sci. 161, 327 (1969). (See also editorial, this issue, CLIN. CHEM.).
5. Elveback, L. R. and Taylor, W., Statistical methods of estimating percentiles. Ann. N.Y. Acad. Sci. 161, 538 (1969).
6. Brunden, M. N., Clark, J. J., and Sutter, M. L., A general method of determining normal ranges applied to blood values of dogs. Amer. J. Clin. Pathol. 53, 332 (1970).
7. Javid, J., and Horowitz, H. I., An improved technic for the quantitation of serum haptoglobin. Amer. J. Clin. Pathol. 34, 35 (1960).
8. Wald, A. and Wolfowitz, J., Tolerance limits for a normal distribution. Ann. Math. Statist. 17, 208 (1946).
9. Henry, R. J., Clinical Chemistry, Principles and Technics. Hoeber, New York, N.Y., 1964, p 364.
10. Hald, A., Statistical theory with engineering applications. Wiley, New York, N.Y., 1952, pp 676 and 688.
11. Lilliefors, H. W., On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Amer. Statist. Ass. 62, 399 (1967).
12. Cochran, W. G., The chi-square test of goodness of fit. Ann. Math. Statist. 23, 329 (1952).
13. Wilks, S. S., Statistical prediction with special reference to the problem of tolerance limits. Ann. Math. Statist. 13, 400 (1941).
14. Somerville, P. N., Tables for obtaining nonparametric tolerance limits. Ann. Math. Statist. 29, 399 (1958).
15. Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, 1, Cambridge University Press, New York, N.Y., 1966, p 150.
16. Anscombe, F. J., Rejection of outliers. Technometrics 2, 123 (1960).
17. Dixon, W. J., Processing data for outliers. Biometrics 9, 74 (1953).
18. Grubbs, F. E., Procedures for detecting outlying observation in samples. Technometrics 11, 1 (1969).
19. Herrera, L., The precision of percentiles in establishing normal limits in medicine. J. Lab. Clin. Med. 52, 34 (1958).
20. Henry, R. J., Improper statistics characterizing the normal range. Amer. J. Clin. Pathol. 34, 326 (1960).
