
Chemometrics and Intelligent Laboratory Systems 52 (2000) 61–73
www.elsevier.com/locate/chemometrics

Comparison of alternative measurement methods: determination of the minimal number of measurements required for the evaluation of the bias by means of interval hypothesis testing

S. Kuttatharmmakul, D.L. Massart, J. Smeyers-Verbeke *
ChemoAC, Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussel, Belgium
Received 20 March 1999; accepted 4 May 2000

Abstract

The classical approach of hypothesis testing for the detection of bias has a major disadvantage in that the risk of adopting a method that has an unacceptable bias is not well controlled. To control this risk, interval hypothesis testing was introduced in method validation [S. Kuttatharmmakul, D.L. Massart, J. Smeyers-Verbeke, Anal. Chim. Acta, 391 (1999) 203; C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander Heyden, P. Vankeerberghen, D.L. Massart, Anal. Chem., 67 (1995) 4491]. However, the application of interval hypothesis testing in the evaluation of bias can lead to the false rejection of a method, which, in reality, has an acceptable bias. To limit this risk (of false rejection), an appropriate number of measurements is required. Formulae to determine this sample size are proposed. The required number of measurements depends on the precision of the analytical method(s), the bias that analysts are prepared to accept with a high probability and the risk that one is willing to take of incorrectly rejecting a method that has an acceptable bias. An evaluation of the equations proposed was undertaken by means of simulations.
The reliability of the formulae proposed depends on the quality of the precision estimates used in the formulae. When the precision estimates applied correspond well with the true precision parameters, the sample size determined assures that the risk of incorrectly rejecting the alternative method that has an acceptable bias does not exceed the specified level. When the true precision is worse than the precision estimates used in the formulae, the probability that a method with an acceptable bias will be rejected is much higher than the specified level. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Method comparison; Alternative measurement method; Bias; Repeatability; Time-different intermediate precision; Interval hypothesis testing; Sample size

* Corresponding author. Tel.: +32-2-477-4737; fax: +32-2-477-4735. E-mail address: asmeyers@vub.vub.ac.be (J. Smeyers-Verbeke).
0169-7439/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0169-7439(00)00079-4

1. Introduction

In a laboratory, it often happens that a new analytical method (subsequently termed alternative method) has been developed with the advantages that it is cheaper, less time-consuming or simply easier to use than the existing method (subsequently termed reference method). Before the reference method can be replaced by the alternative method, it must be demonstrated that the results obtained with the alternative method are equivalent to the results of the reference method. A comparison of the performance (e.g., precision and bias) of both methods has then to
be performed. In Ref. [1], an approach for the intra-laboratory comparison of two methods has been proposed. It considers, among other techniques, interval hypothesis testing for the evaluation of the bias.

In a statistical test for the evaluation of bias, the usual practice is to test the null hypothesis that there is no bias, i.e., there is no difference between the means obtained by the two methods ($H_0$: $\mu_A = \mu_B$ or $\delta = \mu_A - \mu_B = 0$, where $\delta$ is the true difference between the two means, $\mu_A$ and $\mu_B$) against the alternative hypothesis that there is a bias, i.e., that the two means are different ($H_1$: $\mu_A \neq \mu_B$ or $\delta \neq 0$). However, the above hypothesis testing procedure (which is referred to as point hypothesis testing since the significance of a result is tested against a single fixed point, e.g., zero) might not be the most appropriate for the validation of the method because the risk of adopting a method that, in fact, has an unacceptable bias is not well controlled, i.e., the risk can be unacceptably high. To control this risk, the interval hypothesis testing was introduced in method validation [1,2].

In this paper, attention focuses on sample size determination for the evaluation of the bias from the comparison of alternative measurement methods by means of interval hypothesis testing. In addition, through the selection of an appropriate sample size, the risk of rejecting a method that has an acceptable bias is limited, i.e., it does not exceed a given allowable risk.

2. Interval hypothesis testing

Interval hypothesis testing has become the standard approach in bioequivalence studies [3–5] and has also been proposed for drug stability tests [6]. It has been introduced in method validation for the evaluation of the bias of an analytical method [1,2]. Although the interval hypothesis testing procedure has been explained in these articles, the principles and main advantages of this approach compared to the classical t-test are summarized below.

The classical way of comparing the means obtained by two methods A and B is to apply an independent t-test at the $\alpha = 5\%$ significance level. The statistical hypotheses are:

$$
\begin{aligned}
H_0 &: \mu_A = \mu_B, \text{ i.e., } \delta = \mu_A - \mu_B = 0 \quad \text{(there is no bias)} \\
H_1 &: \mu_A \neq \mu_B, \text{ i.e., } \delta = \mu_A - \mu_B \neq 0 \quad \text{(there is a bias)}
\end{aligned}
\tag{1}
$$

where $\delta$ is the true difference or bias between the population means $\mu_A$ and $\mu_B$.

However, it is improbable, from a chemical point of view, that the two methods to be compared will yield exactly the same results. In other words, some bias is always to be expected. The hypotheses to be tested must serve the objective of the statistical analysis. Indeed, the question should not be "Are the means the same?" but rather "Do the means not differ by more than an acceptable amount?" It should be observed that even if the difference between the methods is so small as to be irrelevant for the application, the classical t-test approach may detect it as statistically significant, if the sample size is large enough. On the other hand, an important difference between the means of the two methods (i.e., a large bias) can be declared to be nonsignificant by a classical t-test if the measurement precision is bad and/or the sample size is small [2]. This means that the risk of accepting the null hypothesis when that hypothesis is false (i.e., the $\beta$ error) is not well controlled. In terms of the problems addressed within this paper, this implies that the risk that a method that has an unacceptable bias will be adopted is unacceptably high.

Therefore, if one considers that it is really important not to conclude that there is no bias when, in fact, there is one, the null hypothesis described in Eq. (1) is not appropriate. It has been shown [2] that in order to limit the risk of erroneously accepting a method that is too highly biased, the statistical hypotheses have to be formulated in the following way:

$$
\begin{aligned}
H_0 &: \mu_A - \mu_B \le -\lambda \ \text{ or } \ \mu_A - \mu_B \ge \lambda \quad \text{(there is a (relevant) bias)} \\
H_1 &: -\lambda < \mu_A - \mu_B < \lambda \quad \text{(there is no (relevant) bias)}
\end{aligned}
\tag{2}
$$

where $\lambda$ is the acceptable bias.
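
The two weaknesses of the point hypothesis test described above can be made concrete with a small numerical sketch. It is not taken from the paper: the helper name `pooled_t_statistic`, the means, the standard deviations and the sample sizes are illustrative assumptions, and equal group sizes with a common standard deviation are assumed for simplicity. The critical values are the usual tabulated two-sided 5% values.

```python
import math


def pooled_t_statistic(mean_a, mean_b, s, n):
    """Two-sample t statistic for two groups of equal size n that share
    a common standard deviation s (hypothetical helper for illustration)."""
    return (mean_a - mean_b) / (s * math.sqrt(2.0 / n))


# Scenario 1 (assumed numbers): irrelevant bias of 0.1, good precision,
# very large sample size.
t_small_bias = pooled_t_statistic(100.1, 100.0, s=1.0, n=2000)

# Scenario 2 (assumed numbers): large bias of 3.0, poor precision,
# small sample size.
t_large_bias = pooled_t_statistic(103.0, 100.0, s=5.0, n=4)

T_CRIT_LARGE_DF = 1.96  # approximate two-sided 5% critical value, large df
T_CRIT_DF_6 = 2.447     # two-sided 5% critical value for df = 2 * 4 - 2 = 6

# The tiny bias is flagged as significant, the large one is not.
small_bias_significant = abs(t_small_bias) > T_CRIT_LARGE_DF
large_bias_significant = abs(t_large_bias) > T_CRIT_DF_6
print(small_bias_significant, large_bias_significant)
```

Under these assumed numbers the irrelevant bias is declared significant while the large bias is not. Neither outcome answers the practical question, which is why the hypotheses are reformulated as in Eq. (2).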
These hypotheses are called interval hypotheses. The acceptance of the null hypothesis ($H_0$) now leads to the conclusion that the bias is not acceptable while acceptance of the alternative hypothesis ($H_1$) leads to the conclusion that the bias is acceptable. Therefore, compared to the classical hypotheses specified in Eq. (1), the null and alternative hypotheses have been reversed. Consequently, the risk of erroneously accepting a method that is too highly biased (the $\beta$ error of point hypothesis testing) now becomes the $\alpha$ error of interval hypothesis testing and, thus, it is limited to $\alpha$% through the significance level $\alpha$ applied in the test.

To carry out the significance test, the interval hypotheses of Eq. (2) are decomposed into two one-sided hypotheses, which are then tested by means of a one-sided t-test at the significance level $\alpha$ [2,5]:

$$
\begin{aligned}
H_{0(1)} &: \mu_A - \mu_B \le -\lambda \\
H_{1(1)} &: \mu_A - \mu_B > -\lambda
\end{aligned}
\qquad \text{and:} \qquad
\begin{aligned}
H_{0(2)} &: \mu_A - \mu_B \ge \lambda \\
H_{1(2)} &: \mu_A - \mu_B < \lambda
\end{aligned}
\tag{3}
$$

Rejection of both null hypotheses, $H_{0(1)}$ and $H_{0(2)}$, leads to the conclusion that the bias of the alternative method is acceptable. $H_{0(1)}$ is rejected when the one-sided $(1 - \alpha)100\%$ lower confidence limit for $\mu_A - \mu_B$ (which also corresponds to the lower limit of the two-sided $(1 - 2\alpha)100\%$ confidence interval) is larger than the lower acceptance limit, $-\lambda$. $H_{0(2)}$ is rejected when the one-sided $(1 - \alpha)100\%$ upper confidence limit (= the upper limit of the two-sided $(1 - 2\alpha)100\%$ confidence interval) is smaller than the upper acceptance limit, $\lambda$. It follows that the bias is considered acceptable at the $\alpha = 5\%$ significance level if the two-sided $(1 - 2\alpha)100 = 90\%$ confidence interval for $\mu_A - \mu_B$ is completely contained in the acceptance interval $(-\lambda, \lambda)$.

As a consequence of reversing the null and alternative hypothesis, the risk one runs to conclude that the bias is acceptable when, in fact, it is not (i.e., the $\alpha$ error of the interval hypothesis test) is now controlled. However, there is no a priori knowledge about the risk one runs to conclude that the bias is not acceptable when, in fact, it is (i.e., the $\beta$ error of the interval hypothesis test). To control the $\beta$ error, an appropriate sample size has to be determined and used in the statistical test. In this paper, formulae to estimate the sample size required to control the probability $\beta$ of incorrectly rejecting an alternative method with an acceptable bias $\delta$ ($\delta < \lambda$) are proposed and evaluated by means of simulation. They are based on similar work performed for bioequivalence studies [7] in which only within-subject variance (comparable to the repeatability or within-day variance in the bias evaluation) is considered. Evaluation of the bias also requires the consideration of intermediate precision conditions, e.g., time-different intermediate precision condition [8]. This increases the complexity of the problem and is included in the approach proposed here.

3. Experimental setup for the evaluation of the bias of an analytical method by means of comparison with a reference method and symbols used

To understand how the formulae for the sample size determination are derived, it is important to describe the experimental setup for the evaluation of the bias of an analytical method by means of comparison with a reference method.

For the evaluation of the bias, repeated measurements on the same sample are performed with the reference method (method A), as well as with the alternative method (method B). Measurements are obtained under repeatability conditions if one operator performs $n_A$ measurements using method A on the same instrument within a short time interval (e.g., within one run or 1 day). A similar procedure is adopted for method B, i.e., $n_B$ measurements are performed by one operator on the same instrument within a short time interval.

If other possible sources of variation, such as the time effect, have to be taken into account, measurements are performed under time-different intermediate precision conditions. This means that for method A and method B, measurements are performed during $p_A$ and $p_B$ days, respectively. Moreover, each day $n_A$ replicate measurements, under repeatability conditions, are obtained with method A, and $n_B$
replicate measurements, under repeatability conditions, are obtained with method B. This experimental setup allows an evaluation of the time-different intermediate precision, as well as that of the repeatability for both methods [1]:

$s_{rA}^2$: repeatability of method A
$s_{rB}^2$: repeatability of method B
$s_{I(T)A}^2$: time-different intermediate precision of method A
$s_{I(T)B}^2$: time-different intermediate precision of method B

The evaluation of the bias is performed by a comparison of the means obtained by both methods with an interval hypothesis test. As explained earlier, this is done (for $\alpha = 0.05$) by comparing the 90% confidence interval for $\mu_A - \mu_B$ with the acceptance interval $(-\lambda, \lambda)$. If the measurements are performed under time-different intermediate precision conditions as specified earlier, the confidence interval for $\mu_A - \mu_B$ is obtained as follows:

$$
(\bar{\bar{y}}_A - \bar{\bar{y}}_B) - t_\alpha s_d \le \mu_A - \mu_B \le (\bar{\bar{y}}_A - \bar{\bar{y}}_B) + t_\alpha s_d,
\tag{4}
$$

where:
$\bar{\bar{y}}_A$: grand mean (= mean of the daily means) for method A $= \sum_{i=1}^{p_A} \bar{y}_{iA} / p_A$;
$\bar{y}_{iA}$: daily mean for the $i$th day (method A) $= \sum_{j=1}^{n_A} y_{ijA} / n_A$;
$y_{ijA}$: $j$th replicate measurement on the $i$th day (method A);
$\bar{\bar{y}}_B$: grand mean (= mean of the daily means) for method B, obtained in the same way as $\bar{\bar{y}}_A$ by replacing $\bar{y}_{iA}$, $y_{ijA}$, $p_A$ and $n_A$ with $\bar{y}_{iB}$, $y_{ijB}$, $p_B$ and $n_B$, respectively;
$t_\alpha$: the one-sided tabulated t-value at $\alpha = 0.05$ and the degrees of freedom (df) $= (p_A + p_B - 2)$;
$s_d$: the estimated standard deviation of the difference between the two means.

For measurements performed under time-different intermediate precision conditions as specified earlier, $s_d$ is given by:

$$
s_d = \sqrt{s_p^2 \left( \frac{1}{p_A} + \frac{1}{p_B} \right)}
\tag{5}
$$

with:

$$
s_p^2 = \frac{(p_A - 1)\, s_{\bar{y}A}^2 + (p_B - 1)\, s_{\bar{y}B}^2}{p_A + p_B - 2}
\tag{6}
$$

and $s_{\bar{y}A}^2$: the variance of the daily means of method A; $s_{\bar{y}B}^2$: the variance of the daily means of method B.

It can be shown [1,8] that the variance of the daily means for each method can be expressed as a function of the repeatability and the time-different intermediate precision as follows:

$$
s_{\bar{y}A}^2 = s_{I(T)A}^2 - \left( 1 - \frac{1}{n_A} \right) s_{rA}^2,
\tag{7}
$$

$$
s_{\bar{y}B}^2 = s_{I(T)B}^2 - \left( 1 - \frac{1}{n_B} \right) s_{rB}^2.
\tag{8}
$$

4. Formulae to estimate the minimum number of measurements required

The sample size required will depend, among other issues, on the risk $\beta$ the analyst is willing to run to conclude that the bias is not acceptable if, in reality, the bias is equal to a specified value $\delta$, less than the acceptable bias $\pm\lambda$. To explain how the formulae are derived, the evaluation of the bias under repeatability conditions as described earlier is first considered.

Fig. 1 represents the distribution of the estimated bias around the true bias $\delta = \mu_A - \mu_B$. If the true bias $\delta$ is zero, the probability $\beta$ of incorrectly concluding that the bias is not acceptable and, therefore, the probability that the alternative method will be rejected originates from the incorrect acceptance of $H_{0(1)}$: $\mu_A - \mu_B \le -\lambda$ or $H_{0(2)}$: $\mu_A - \mu_B \ge \lambda$, both with a probability equal to $\beta/2$ (see Fig. 1a). It follows from the figure that:

$$
\lambda = (t_{\beta/2} + t_\alpha)\, \sigma_d.
\tag{9}
$$

Since $\sigma_d$, the true standard deviation of the difference between the two means, is unknown, $s_d$ is used as an estimate:

$$
\lambda = (t_{\beta/2} + t_\alpha)\, s_d,
\tag{10}
$$

where $t_{\beta/2}$ and $t_\alpha$ are the one-sided tabulated values of the t-statistic for the significance level $\beta/2$ and $\alpha$, respectively, with $\nu = (n_A + n_B - 2)$; $s_d$ is the estimated standard deviation of the difference between the two means.

Fig. 1. The distribution of the estimated bias around the true bias, $\delta = \mu_A - \mu_B$. The curves at sides represent the null hypotheses of the interval hypothesis test as specified in Eq. (3), $\mu_A - \mu_B \le -\lambda$ or $\mu_A - \mu_B \ge \lambda$. The center curve represents alternative hypotheses: (a) $\mu_A - \mu_B = 0$; (b) $\mu_A - \mu_B = \Delta$ ($0 < \Delta < \lambda/2$); (c) $\mu_A - \mu_B = \lambda/2$. Alternative hypotheses where $\mu_A - \mu_B < 0$ are not shown. The black areas correspond to $\alpha$ (= 5%), the rejection regions for the null hypothesis $H_0$ (when $H_0$ is indeed true). Shading represents $\beta$, the rejection regions for the alternative hypothesis $H_1$ (when $H_1$ is indeed true).

For measurements performed under repeatability conditions, $s_d$ is obtained as follows:

$$
s_d = \sqrt{s_p^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}
\tag{11}
$$

with $s_p^2$ the pooled variance estimate for the repeatability variances of the two methods:

$$
s_p^2 = \frac{(n_A - 1)\, s_{rA}^2 + (n_B - 1)\, s_{rB}^2}{n_A + n_B - 2}.
\tag{12}
$$

$n_A$ and $n_B$ are the number of replicated measurements obtained under repeatability conditions for methods A (reference method) and B (alternative method), respectively.

Combining Eqs. (10)–(12) gives:

$$
\lambda = (t_{\beta/2} + t_\alpha) \sqrt{\frac{(n_A - 1)\, s_{rA}^2 + (n_B - 1)\, s_{rB}^2}{n_A + n_B - 2} \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}.
\tag{13}
$$

To solve the equation for the optimal value of $n_A$ and $n_B$, estimates of the repeatability variance for both methods are required and values for $\alpha$ and $\beta$, as well as for $\lambda$, have to be specified. Values of $n_A$ and $n_B$ are then substituted into the equation until the result obtained from the right-hand side is not larger than the acceptable bias $\lambda$.

If one intends to apply the same number of replicates for the two methods being compared, $n_A = n_B = n$, Eq. (13) can be rewritten as:

$$
\lambda = (t_{\beta/2} + t_\alpha) \sqrt{\frac{s_{rA}^2 + s_{rB}^2}{n}}.
\tag{14}
$$

The minimum number of replicated measurements $n$ required to control the risk $\beta$ of incorrectly rejecting an alternative method, which, in reality, is not biased (i.e., $\delta = 0$) can then be obtained by rearranging Eq. (14):

$$
n = (t_{\beta/2} + t_\alpha)^2\, \frac{s_{rA}^2 + s_{rB}^2}{\lambda^2}.
\tag{15}
$$

Eq. (10) is based on the t-distribution. Therefore, it assumes that the repeatability variance of both methods is equal ($\sigma_A^2 = \sigma_B^2$). If the repeatability for both methods cannot be assumed to be equal (i.e., $\sigma_A^2 \neq \sigma_B^2$), the t-test for the comparison of the means of two samples with unequal variances has to be applied (see, e.g., Ref. [10]). The required sample size can still be obtained from Eq. (15). The only modification is that the number of df associated with $t_{\beta/2}$ and $t_\alpha$ is obtained from the Satterthwaite approximation [11].

Very often, the evaluation of the bias and precision is performed simultaneously. Therefore, prior knowledge of the precision estimates of the alternative method B might not be available. Analogously to what ISO proposes [9], if $s_{rB}^2$ is unknown, $s_{rA}^2$ is used as a substitute. In this case, Eq. (15) becomes:

$$
n = 2\, (t_{\beta/2} + t_\alpha)^2\, \frac{s_{rA}^2}{\lambda^2}.
\tag{16}
$$

The reliability of the formulae of course depends on the quality of the variance estimates used.

The solution to Eq. (16) for a number of replicated measurements $n$ is iterative since the value of $(t_{\beta/2} + t_\alpha)$ depends on its associated df, $\nu$, which is also a function of $n$ (i.e., $\nu = 2n - 2$). An initial value of $n$, $n_{init}$, can be obtained from Eq. (16) by replacing $(t_{\beta/2} + t_\alpha)$ with the corresponding values from the standard normal distribution, $(z_{\beta/2} + z_\alpha)$. To find the minimum number of replicated measurements $n$, round the value of $n_{init}$ up to the next integer $n$ and calculate the right-hand side of Eq. (16) with the value of $(t_{\beta/2} + t_\alpha)$ where $\nu = 2n - 2$. If the result obtained is not larger than $n$, then $n$ is the minimum number of measurements required. If it is larger than $n$, then increase the value of $n$ by 1 and recalculate the right-hand side of Eq. (16) with the corresponding adjustment of $(t_{\beta/2} + t_\alpha)$. This is done in an iterative manner until the result obtained from the right-hand side of Eq. (16) is not larger than $n$. Then, $n$ is the minimum number of replicated measurements required. A worked example is given in Appendix A.

From Fig. 1, it follows that if the true bias $\delta$ approaches the acceptable bias $\lambda$, the $\beta$ error (shaded areas) is no longer evenly divided between the two tails of the distribution around the true bias $\delta$. The more $\delta$ approaches $\lambda$, the larger the overlap between the distribution around $\delta = \Delta$ and the distribution around the acceptable bias $\lambda$. Simultaneously, the overlap between the distribution around $\delta = \Delta$ and the distribution around $-\lambda$ decreases. The sum of both tails (shaded areas) equals $\beta$. Therefore, $k\beta$ is the largest fraction of $\beta$ that occurs in one tail of the distribution around the true bias $\delta$; $k$ is a constant that lies between 0.5 and 1.

Consequently, Eq. (10) can be generalized as (see Fig. 1b for illustration):

$$
\lambda - |\delta| = (t_{k\beta} + t_\alpha)\, s_d \qquad (0.5 \le k \le 1).
\tag{17}
$$

The value of $k$ depends on the bias $\delta$ of the alternative method that analysts want to accept with a high probability (= $1 - \beta$) and the specified probability $\beta$.

Moreover, if the evaluation of the bias is performed under time-different intermediate precision conditions as described earlier, the standard deviation of the difference between the two means, $s_d$, is obtained from Eq. (5) and, hence, Eq. (17) becomes:

$$
\lambda - |\delta| = (t_{k\beta} + t_\alpha) \left[ \frac{(p_A - 1) \left( s_{I(T)A}^2 - \left( 1 - \frac{1}{n_A} \right) s_{rA}^2 \right) + (p_B - 1) \left( s_{I(T)B}^2 - \left( 1 - \frac{1}{n_B} \right) s_{rB}^2 \right)}{p_A + p_B - 2} \left( \frac{1}{p_A} + \frac{1}{p_B} \right) \right]^{1/2}.
\tag{18}
$$

If (i) the number of days during which measurements will be performed, as well as the number of replicates per day, for both methods are planned to be equal (i.e., $p_A = p_B = p$ and $n_A = n_B = n$) and (ii) the
precision parameters of the alternative method are unknown but the precision estimates of the reference method are substituted, Eq. (18) can be rewritten as:

$$
\lambda - |\delta| = (t_{k\beta} + t_\alpha) \left[ \frac{2}{p} \left( s_{I(T)A}^2 - \left( 1 - \frac{1}{n_A} \right) s_{rA}^2 \right) \right]^{1/2}.
\tag{19}
$$

Rearrangement of Eq. (19) yields a general formula for the number of days during which measurements ($n$ per day) using both methods are required to be recorded to limit the probability $\beta$ of incorrectly concluding that the bias $\delta$ ($|\delta| < \lambda$) is not acceptable:

$$
p \ge 2\, (t_{k\beta} + t_\alpha)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{(\lambda - |\delta|)^2} \qquad (0.5 \le k \le 1),
\tag{20}
$$

where $t_{k\beta}$ and $t_\alpha$ are the one-sided tabulated t-values at the significance level $k\beta$ and $\alpha$, respectively; the associated number of df is $\nu = 2p - 2$.

Since the value of $k$ depends among other factors on the bias $\delta$ of the alternative method that one wants to accept with high probability, several situations are considered further, i.e., $|\delta| = 0$, $\lambda/4$, $\lambda/2$ and $\lambda/2 < |\delta| < \lambda$.

4.1. The alternative method is not biased (i.e., $\delta = \mu_A - \mu_B = 0$)

If the true bias $\delta = 0$, the probability $\beta$ of incorrectly rejecting the alternative method originates from the incorrect acceptance of $H_{0(1)}$: $\mu_A - \mu_B \le -\lambda$ or $H_{0(2)}$: $\mu_A - \mu_B \ge \lambda$, both with a probability equal to $\beta/2$ (see Fig. 1a). Therefore, $k$ is equal to 0.5 and Eq. (20) becomes:

$$
p \ge 2\, (t_{\beta/2} + t_\alpha)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{\lambda^2}.
\tag{21}
$$

The minimal number of days $p$ can be obtained by an iterative process similar to that described earlier for Eq. (16). However, in Eq. (21), the df associated with the t-values is $\nu = 2p - 2$ and the number of replicates per day $n$ is specified by the analyst.

With $\alpha = 0.05$ and $\beta = 0.2$, $(t_{\beta/2} + t_\alpha)$ varies between $1.29 + 1.66 = 2.95$ ($\nu = 100$) and $1.40 + 1.86 = 3.26$ ($\nu = 8$, i.e., $p = 5$ days). As an alternative to the iterative process for the determination of the number of days $p$, a constant value of 3.26 could be used as an approximation for the value of $(t_{\beta/2} + t_\alpha)$. This might result in the number of measurements being greater than that which is necessary. Moreover, if $n = 2$ (which is strongly recommended since this will lead to a balanced design in which the number of df associated with the repeatability is almost the same as the number of df associated with the between-day component), Eq. (21) simplifies to:

$$
p \ge 2\, (3.26)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{2} \right) s_{rA}^2}{\lambda^2},
\tag{22}
$$

$$
p \ge \frac{21}{\lambda^2} \left( s_{I(T)A}^2 - \frac{s_{rA}^2}{2} \right).
\tag{23}
$$

(Since the constant 3.26 is only an approximation of the value $(t_{\beta/2} + t_\alpha)$, the factor $2(3.26)^2$ (= 21.26) in Eq. (22) can be further simplified to 21 as shown in Eq. (23).)

This can be interpreted as follows. If with both methods (A and B) measurements are performed during $p$ days with 2 measurements per day, there is a probability of not more than 5% ($100\alpha$%) that an alternative method that, in fact, is too highly biased ($|\delta| > \lambda$) will be accepted and, at the same time, there is a probability of not more than 20% ($100\beta$%) that an alternative method, which, in reality, is not biased will be rejected.

4.2. The bias of the alternative method is half the acceptable bias ($|\delta| = \lambda/2$)

If the bias $|\delta| = \lambda/2$, the probability $\beta$ of incorrectly rejecting the alternative method originates from the incorrect acceptance of either $H_{0(1)}$: $\mu_A - \mu_B \le -\lambda$ (if $\delta < 0$) or $H_{0(2)}$: $\mu_A - \mu_B \ge \lambda$ (if $\delta > 0$) (see Fig. 1c). Therefore, $k$ is equal to 1 and Eq. (20) becomes:

$$
p \ge 8\, (t_\alpha + t_\beta)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{\lambda^2}.
\tag{24}
$$
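
The iterative determination of the number of days from Eq. (20), which covers Eq. (21) with $k = 0.5$ and Eq. (24) with $k = 1$, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `t_quantile` and `days_required` are hypothetical, the precision values in the usage lines are illustrative, and the tabulated t-values are replaced by a standard series approximation of the t quantile (an expansion in powers of 1/df around the normal quantile), which becomes inaccurate for very small df. Exact tables or a statistics library should be preferred in practice.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def t_quantile(upper_tail_prob, df):
    """Approximate one-sided t-value for a given upper-tail probability.

    Assumption: series expansion in 1/df around the normal quantile,
    used here only to keep the sketch free of third-party libraries.
    """
    z = _STD_NORMAL.inv_cdf(1.0 - upper_tail_prob)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384.0
    return z + g1 / df + g2 / df**2 + g3 / df**3


def days_required(lam, delta, s2_it, s2_r, n, alpha=0.05, beta=0.2, k=0.5):
    """Smallest number of days p that satisfies Eq. (20) with df = 2p - 2."""
    var_daily_mean = s2_it - (1.0 - 1.0 / n) * s2_r      # Eq. (7)
    scale = 2.0 * var_daily_mean / (lam - abs(delta)) ** 2
    # Initial value: t-values replaced by standard normal values.
    z_sum = _STD_NORMAL.inv_cdf(1.0 - k * beta) + _STD_NORMAL.inv_cdf(1.0 - alpha)
    p = max(2, math.ceil(scale * z_sum**2))
    while True:
        df = 2 * p - 2
        t_sum = t_quantile(k * beta, df) + t_quantile(alpha, df)
        if scale * t_sum**2 <= p:   # right-hand side not larger than p
            return p
        p += 1


# Illustrative inputs (assumed): lambda = 3, s_I(T)^2 = 4.5, s_r^2 = 2.25, n = 2.
p_unbiased = days_required(3.0, 0.0, s2_it=4.5, s2_r=2.25, n=2, k=0.5)
p_half_bias = days_required(3.0, 1.5, s2_it=4.5, s2_r=2.25, n=2, k=1.0)
print(p_unbiased, p_half_bias)
```

With these assumed inputs the sketch gives 8 days for an unbiased method and 20 days for a bias of half the acceptable bias. For comparison, the constant 3.26 of Eq. (22) gives $2(3.26)^2 \times 3.375 / 9 \approx 7.97$ for the same inputs, which also rounds up to 8 days.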
With $\alpha = 0.05$ and $\beta = 0.2$, $(t_\alpha + t_\beta)$ varies between $1.66 + 0.84 = 2.50$ ($\nu = 100$) and $1.86 + 0.89 = 2.75$ ($\nu = 8$, i.e., $p = 5$ days); therefore, a conservative (high) value of 2.75 can be used as an approximation. Moreover, if $n = 2$, Eq. (24) simplifies to:

$$
p \ge 8\, (2.75)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{2} \right) s_{rA}^2}{\lambda^2},
\tag{25}
$$

$$
p \ge \frac{60}{\lambda^2} \left( s_{I(T)A}^2 - \frac{s_{rA}^2}{2} \right).
\tag{26}
$$

(Since the constant 2.75 is only an approximation of the value $(t_\alpha + t_\beta)$, the factor $8(2.75)^2$ (= 60.5) in Eq. (25) can be further simplified to 60 as shown in Eq. (26).)

This can be interpreted as follows. If with both methods (A and B) measurements are performed during $p$ days with two measurements per day, there is a probability of not more than 5% ($100\alpha$%) that an alternative method that, in fact, is too highly biased ($|\delta| > \lambda$) will be accepted and, at the same time, there is a probability of not more than 20% ($100\beta$%) that an alternative method for which, in reality, the bias is half the acceptable bias will be rejected.

With $\alpha = 0.05$, the number of days required to have 80% (= $100(1 - \beta)$%) probability that an alternative method, which, in reality, has a bias equal to half the acceptable bias, will be accepted is about three times the number of days required to have the same probability that an alternative method that is not biased will be accepted.

Both Eqs. (24) and (26) only apply for $\beta \le 0.5$. When $\beta$ is larger than 0.5, the term $(t_\alpha + t_\beta)$ in Eq. (24) has to be replaced by $(t_\alpha - t_{1-k\beta})$. However, since it makes no sense to allow a probability larger than 50% to incorrectly reject an alternative method with an acceptable bias, this case will not be pursued further.

4.3. The alternative method has a bias $|\delta| = \lambda/4$

When the specified $\beta$ is not larger than 20%, the probability $\beta$ is mainly located in one tail, either in the left tail of the sampling distribution for $\delta = -\lambda/4$ or in the right tail of the distribution for $\delta = \lambda/4$. Therefore, $k$ can be considered equal to 1 and Eq. (20) becomes:

$$
p \ge 2\, (t_\beta + t_\alpha)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{\left( \lambda - \frac{\lambda}{4} \right)^2},
\tag{27}
$$

$$
p \ge 32\, (t_\beta + t_\alpha)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{9 \lambda^2}.
\tag{28}
$$

Analogous to the simplification of Eq. (24) to Eq. (26), if $n = 2$, Eq. (28) reduces to:

$$
p \ge \frac{c}{\lambda^2} \left( s_{I(T)A}^2 - \frac{s_{rA}^2}{2} \right), \quad \text{where } c = 27.
\tag{29}
$$

When the specified $\beta$ is larger than 20%, the probability $\beta$ is contributed to by the incorrect acceptance of both $H_{0(1)}$: $\mu_A - \mu_B \le -\lambda$ and $H_{0(2)}$: $\mu_A - \mu_B \ge \lambda$. Simulations (results not shown) indicate that for $k = 0.9$, reasonable approximations for the values of $\beta$ between 30% and 50% are attained. With $\alpha = 0.05$, the minimal number of days can be obtained from Eq. (29) with $c = 22$, 18 and 14 for $\beta = 30\%$, 40% and 50%, respectively.

4.4. The alternative method with the bias $\lambda/2 < |\delta| < \lambda$

For $\beta \le 50\%$, the number of measurements required can be obtained from:

$$
p \ge 2\, (z_\beta + z_\alpha)^2\, \frac{s_{I(T)A}^2 - \left( 1 - \frac{1}{n} \right) s_{rA}^2}{(\lambda - |\delta|)^2},
\tag{30}
$$

where $z_\alpha$ and $z_\beta$ are the one-sided tabulated z-values at the significance level $\alpha$ and $\beta$, respectively.

Analogous to Eq. (24), Eq. (30) is derived from Eq. (20) by setting $k$ to 1. The formula only applies for $\beta \le 0.5$.

The number of measurements required to accept, with a high probability $1 - \beta$, an alternative method, which, in reality, has a bias larger than half the acceptable bias will generally be large. Therefore, the one-sided tabulated t-value in Eq. (20) can be replaced by the corresponding z-value from the standard normal distribution as it appears in Eq. (30).

5. Formulae evaluation

The reliability of the proposed formulae (Eqs. (21), (23), (24), (26), (28)–(30)) in the estimation of the optimal sample size, required to control the probability $\beta$ of incorrectly concluding that the bias is not acceptable, was checked by means of computer simulations using Matlab 4.0 (The MathWorks, Natick, MA, USA) as the programming environment. Test results for the two analytical methods being compared, with a sample size as derived from the formulae, were generated (see Section 5.1). The $\beta$ error of incorrectly rejecting the method, which, in fact, has an acceptable bias was then determined (see Section 5.2). The reliability of the formulae in the estimation of the optimal sample size was judged from the agreement of the $\beta$ error obtained from the simulations and the $\beta$ specified in the formulae (see Section 5.3).

5.1. Data simulation

The parameters for the simulation are given in Table 1. Since the $\beta$ error depends on the precision of the methods to be compared and on the acceptable bias and the true bias, three case studies with different values of these parameters were considered.

Table 1
Parameters applied in the simulations

Case study 1 ($\lambda = 3$)
  Reference method:   $\sigma_{rA} = 1.50$, $\sigma_{DA} = 1.50$, $\sigma_{I(T)A} = 2.12$, $n_A = 2$, $\mu_A = 100$
  Alternative method: $\sigma_{rB} = 1.50$, $\sigma_{DB} = 1.50$, $\sigma_{I(T)B} = 2.12$, $n_B = 2$, $\mu_B = 100 + \delta$

Case study 2 ((2.1) $\lambda = 20$; (2.2) $\lambda = 12$)
  Reference method:   $\sigma_{rA} = 6$, $\sigma_{DA} = 10$, $\sigma_{I(T)A} = 11.66$, $n_A = 2$, $\mu_A = 100$
  Alternative method: $\sigma_{rB} = 6$, $\sigma_{DB} = 10$, $\sigma_{I(T)B} = 11.66$, $n_B = 2$, $\mu_B = 100 + \delta$

Case study 3 ($\lambda = 3$)
  Reference method:         $\sigma_{rA} = 1.50$, $\sigma_{DA} = 1.50$, $\sigma_{I(T)A} = 2.12$, $n_A = 2$, $\mu_A = 100$
  (3.1) Alternative method: $\sigma_{rB} = 2$, $\sigma_{DB} = 2$, $\sigma_{I(T)B} = 2.83$, $n_B = 2$, $\mu_B = 100 + \delta$
  (3.2) Alternative method: $\sigma_{rB} = 3$, $\sigma_{DB} = 3$, $\sigma_{I(T)B} = 4.24$, $n_B = 2$, $\mu_B = 100 + \delta$

$\mu$ = true value of the test result; $\lambda$ = acceptable bias; $\delta$ = true value of the bias (four values are considered, i.e., 0, $\lambda/4$, $\lambda/2$, $3\lambda/4$); $\sigma_r$ = repeatability standard deviation; $\sigma_D$ = between-day standard deviation; $\sigma_{I(T)}$ = time-different intermediate standard deviation ($\sigma_{I(T)}^2 = \sigma_r^2 + \sigma_D^2$).

Case study 1 represents the ideal situation where the measurement variances of both the reference method and the alternative method are equal and rather low (i.e., repeatability (% rsd) = 1.5, between-day precision (% rsd) = 1.5, time-different intermediate precision (% rsd) = 2.1). The acceptable bias $\lambda$ is about 1.5 times the time-different intermediate precision standard deviation ($\sigma_{I(T)}$). In every case study four values of the true bias $\delta$ are considered, i.e., 0, $\lambda/4$, $\lambda/2$, $3\lambda/4$.

Case study 2 is given as an example of less precise data (i.e., repeatability (% rsd) = 6, between-day precision (% rsd) = 10, time-different intermediate precision (% rsd) = 11.7). Two magnitudes of the acceptable bias ($\lambda$) are considered. In case study 2.1, $\lambda$ is about two times $\sigma_{I(T)}$ while in case study 2.2, $\lambda$ is of the same magnitude as $\sigma_{I(T)}$.

The reliability of the proposed formulae also depends on the quality of the precision estimates applied in the calculation. The influence on the $\beta$ error of precision estimates, which do not correspond with the true precision parameters, is investigated in case study 3. Case study 3.1 considers a small deviation of the precision estimates from the true values, i.e., the precision of the alternative method is considered to be slightly worse than that of the reference method. (Recall that in the formulae, the precision estimates of the reference method are considered to apply to the alternative method, see Section 4.) In case study 3.2, a larger deviation is investigated, i.e., the precision standard deviations of the alternative method are twice those of the reference method.

For each situation considered, 10,000 pairs of data sets were simulated (each pair is for the reference method and the alternative method). Each data set consists of $p$ times $n$ test results, where the number of days $p$ is determined from the proposed formulae
70 S. Kuttatharmmakul et al.r Chemometrics and Intelligent Laboratory Systems 52 (2000) 61–73

Žsee Table 2. and the number of replicates per day n ror is obtained as the percentage of data sets for which
equals 2. The results for each data set were generated the null hypothesis is accepted and, thus, this leads to
from Eq. Ž31. using the RANDN function in Matlab the conclusion that the alternative method is not ac-
4.0 w12x, i.e.: ceptable.
yi j s m q d i q e i j , Ž 31 . For case study 3, the results were also evaluated
by the Cochran test w13x. The Cochran test was only
where yi j is the test result related to the jth replicate applied for the pairs of data sets for which a different
of the ith day; i s 1, 2, . . . , p and j s 1, 2; m is the variance was detected by a two-sided F-test at the
true value of the test result, i.e., 100 and Ž100 q d . significance level a s 0.05.
for methods A and B, respectively, with d s true bias
of which four values are considered, i.e., 0, lr4, lr2
and 3 lr4; d i is the random day effect for the ith day; 5.3. Results and discussions
d i is assumed to have a normal distribution with mean
zero and variance s D2 , i.e., d i ; NŽ0, s D2 . and s IŽT.
2
Table 3 compares the b error obtained from the
2 2 2 2 2
s sr q s D with s D , sr , and s IŽT. the true values simulations with the b error specified for the sample
of the between-day variance component; the repeata- size determinations.
bility variance and the time-different intermediate For case studies 1 and 2, it follows from Tables 2
precision Žvariance., respectively; e i j is the random and 3 that the number of days p estimated from the
error under repeatability conditions for the ith day iterative Žor exact. formulae Ži.e., ŽEqs. Ž21., Ž24.,
and the jth replicate; e i j ; N Ž0, sr 2 .. Ž28., Ž30.. corresponds to the smallest number neces-
sary to ensure that the probability of incorrectly re-
5.2. Determination of b error jecting the alternative method with an acceptable bias
does not exceed the specified b . High imprecision of
The interval hypothesis testing procedure was per- the analytical methods, as well as a Žrelatively. small
formed as described earlier Žsee Eq. Ž2... The b er- acceptable bias l, Že.g., case study 2.2. does not im-

Table 2
The minimum number of days p (calculated by the proposed formulae) used in the data simulation for the case studies described in Table 1

Formulae      True bias      Specified           The minimum number of days p
(equations)   ($\delta$)     $\beta$ error (%)   Case 1, 3^a        Case 2.1           Case 2.2
                                                 ($\lambda = 3$)    ($\lambda = 20$)   ($\lambda = 12$)
21            0              20                  8                  6                  15
23^b          0              20                  8                  7                  18
28            $\lambda/4$    20                  10                 8                  19
              $\lambda/4$    30                  8                  7                  16
              $\lambda/4$    40                  7                  6                  13
              $\lambda/4$    50                  6                  5                  11
29^b          $\lambda/4$    20                  11                 8                  23
              $\lambda/4$    30                  9                  7                  19
              $\lambda/4$    40                  7                  6                  15
              $\lambda/4$    50                  6                  5                  12
24            $\lambda/2$    20                  20                 16                 42
26^b          $\lambda/2$    20                  23                 18                 50
30            $3\lambda/4$   50                  33                 26                 71

The specified $\alpha$ is 0.05.
^a For case studies 1, 3.1 and 3.2, the minimum number of days p determined by the formulae is the same, since the precision of the alternative method B is supposed not to be available before the method comparison experiment is performed and therefore only the precision parameters of the reference method A are used in the formulae. The precision parameters of the alternative method B (given in Table 1) are only used for the data simulation.
^b These are the approximate formulae.
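
As an illustration of how the data sets behind Table 2 might be generated, the following Python sketch draws test results according to Eq. (31), i.e. a true value plus a random day effect plus a repeatability error. It is not the authors' Matlab code: the function names `simulate_data_set` and `simulate_pair`, the use of Python's `random` module in place of RANDN, and the reading of the % rsd values of case study 1 as absolute standard deviations at a true value of 100 are assumptions made here.

```python
import random
import statistics


def simulate_data_set(mu, sigma_d, sigma_r, p, n=2, rng=random):
    """Return p lists (days) of n replicates, y_ij = mu + d_i + e_ij as in Eq. (31)."""
    data = []
    for _ in range(p):
        d_i = rng.gauss(0.0, sigma_d)  # random day effect, N(0, sigma_D^2)
        # each replicate adds an independent repeatability error, N(0, sigma_r^2)
        data.append([mu + d_i + rng.gauss(0.0, sigma_r) for _ in range(n)])
    return data


def simulate_pair(delta, sigma_d, sigma_r, p, n=2, rng=random):
    """One pair of data sets: method A has true value 100, method B has 100 + delta."""
    method_a = simulate_data_set(100.0, sigma_d, sigma_r, p, n, rng)
    method_b = simulate_data_set(100.0 + delta, sigma_d, sigma_r, p, n, rng)
    return method_a, method_b


# Case study 1 with true bias lambda/4 = 0.75 and p = 10 days (Table 2, Eq. (28), beta = 20%).
rng = random.Random(0)
a, b = simulate_pair(delta=0.75, sigma_d=1.5, sigma_r=1.5, p=10, rng=rng)
print(len(a), "days x", len(a[0]), "replicates per method")
print("mean A:", round(statistics.fmean(x for day in a for x in day), 2))
print("mean B:", round(statistics.fmean(x for day in b for x in day), 2))
```

Passing an explicit `random.Random` instance keeps a run reproducible. Case study 3 would call `simulate_data_set` with different standard deviations for method B.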

Table 3
$\beta$ error obtained from the simulations applying the sample sizes ($p - 1$ and $p$) as given in Table 2

Formulae      True bias      Specified           $\beta$ error (%) obtained from the simulations
(equations)   ($\delta$)     $\beta$ error (%)   Case 1          Case 2.1        Case 2.2        Case 3.1   Case 3.2
                                                 p     p-1       p     p-1       p     p-1       p          p
21            0              20                  15    22        19    31        19    22        33         74
23^a          0              20                  15    22        11    19        11    13        33         74
28            $\lambda/4$    20                  16    21        16    22        19    23        29         66
              $\lambda/4$    30                  26    33        22    31        27    30        42         76
              $\lambda/4$    40                  33    42        31    41        38    43        51         82
              $\lambda/4$    50                  42    54        41    56        48    54        61         87
29^a          $\lambda/4$    20                  13    16        16    22        14    15        25         58
              $\lambda/4$    30                  21    26        22    31        20    23        36         69
              $\lambda/4$    40                  33    42        31    41        30    34        51         82
              $\lambda/4$    50                  42    54        41    56        43    48        61         87
24            $\lambda/2$    20                  19    21        18    21        19    21        31         53
26^a          $\lambda/2$    20                  14    15        14    17        14    14        26         48
30            $3\lambda/4$   50                  50    51        50    51        50    51        61         73

The significance level $\alpha = 0.05$ is applied for all situations.
^a These are the approximate formulae.
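
The $\beta$ errors of the kind reported in Table 3 could be estimated along the following lines. The sketch assumes that the interval hypothesis test of Eq. (2) can be carried out as two one-sided t-tests on the day means at level $\alpha$, so that the alternative method is accepted only when the $100(1 - 2\alpha)$% confidence interval of the difference lies within $\pm\lambda$. This form of the test, the helper names and the small pure-Python routine for t quantiles are assumptions made here, not the authors' implementation, and fewer simulated pairs are used than the 10,000 of the paper.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_quantile(prob, df, steps=400):
    """Value t with P(T > t) = prob, by bisection on a Simpson-rule tail area."""
    def upper_tail(t):
        h = t / steps
        s = t_pdf(0.0, df) + t_pdf(t, df)
        s += sum((4 if k % 2 else 2) * t_pdf(k * h, df) for k in range(1, steps))
        return 0.5 - s * h / 3
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if upper_tail(mid) > prob else (lo, mid)
    return (lo + hi) / 2


def day_means(mu, sigma_d, sigma_r, p, n, rng):
    """Means of n replicates on each of p days, generated as in Eq. (31)."""
    means = []
    for _ in range(p):
        d_i = rng.gauss(0.0, sigma_d)
        means.append(statistics.fmean(mu + d_i + rng.gauss(0.0, sigma_r) for _ in range(n)))
    return means


def estimate_beta(delta, lam, sigma_d, sigma_r, p, n=2, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated pairs in which method B (true bias delta) is rejected."""
    rng = random.Random(seed)
    t_alpha = t_upper_quantile(alpha, 2 * p - 2)
    rejected = 0
    for _ in range(n_sim):
        a = day_means(100.0, sigma_d, sigma_r, p, n, rng)
        b = day_means(100.0 + delta, sigma_d, sigma_r, p, n, rng)
        diff = statistics.fmean(b) - statistics.fmean(a)
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        half_width = t_alpha * math.sqrt(2 * pooled_var / p)
        # accept B only if the whole confidence interval lies inside (-lam, lam)
        if not (-lam < diff - half_width and diff + half_width < lam):
            rejected += 1
    return rejected / n_sim


# Case study 1, unbiased alternative method, p = 8 days (Table 2, Eq. (21)).
beta = estimate_beta(delta=0.0, lam=3.0, sigma_d=1.5, sigma_r=1.5, p=8)
print(f"estimated beta error: {beta:.1%}")
```

Working on day means keeps the analysis simple because each day mean has variance $\sigma_D^2 + \sigma_r^2/n$. Whether this matches the variance estimate used in Eq. (2) is not verified here.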

pair the efficiency of the formulae in the estimation of the minimum number of days required for the bias evaluation provided that the precision estimates applied in the formulae correspond well with the true precision parameters. For all different values of the true bias considered ($\delta = 0$, $\lambda/4$, $\lambda/2$, $3\lambda/4$), the reliability of the iterative/exact formulae is good.

Comparison of the sample size p estimated from the approximate formulae (Eqs. (23), (26) and (29)) and the exact formulae (Eqs. (21), (24) and (28)) reveals (see Table 2) that there is a good agreement when p is not too large, i.e., p is smaller than about 23. The approximate formulae overestimate the required sample size somewhat, especially if p becomes large, i.e., p is larger than about 50. For example, in case study 2.2 with $\delta = \lambda/2$, the number of measurement days p obtained from Eq. (24) is estimated to be 42, whereas the approximate formula (Eq. (26)) estimates p to be 50. This is due to the fact that the approximate formulae include t-values with small df (i.e., small sample sizes p). The larger p estimated from the approximations resulted in $\beta$ errors, which are considerably smaller than specified (14% compared to 19% for the example mentioned). However, since it is very unlikely that measurements will be performed during more than 50 days, the approximate formulae can be considered useful in practice.

For case study 3, where the true variances are larger than the variance estimates applied in the formulae, it can be seen from Table 3 that the $\beta$ errors obtained from the simulations based on the sample size p determined by the formulae are much larger than the specified $\beta$. For example, with case study 3.1, where the precision of the alternative method, in fact, is only slightly worse than that of the reference method, the $\beta$ errors obtained from the simulations based on the sample size p determined by Eq. (21) (i.e., p = 8) and Eq. (24) (i.e., p = 20) are 33% and 31%, respectively. For case study 3.2, where the precision standard deviations of the alternative method are, in fact, twice those of the reference method, the $\beta$ errors corresponding to Eqs. (21) and (24) become 74% and 53%, respectively.

6. Conclusion

The article describes the determination of the minimum number of measurements (sample size determination) required to limit the $\beta$ error of the interval hypothesis testing in the bias evaluation by means of the comparison of alternative measurement methods. Compared to the classical point hypothesis testing approach, null and alternative hypotheses are reversed in the interval hypothesis testing procedure.

The $\beta$ error, therefore, corresponds to the risk of incorrectly rejecting an alternative method, which, in reality, has an acceptable bias. The proposed approach considers repeatability and time-different intermediate precision conditions but can easily be adapted to other precision conditions, e.g., operator-instrument time-different intermediate precision [1,8] or reproducibility conditions [9], by incorporating the appropriate precision measures in the calculation of the sample size. For example, if measurements are performed under reproducibility conditions that imply that different laboratories participate in the comparison as described in Section 8 of ISO-5725 [9], the proposed formulae (Eqs. (21), (23), (24), (26), (28)–(30)) can still be applied with the replacement of time-different intermediate precision variance ($\sigma_{I(T)}^2$) by the reproducibility variance. The required number of days p then becomes the required number of participating laboratories.

The sample size required is a function of the measurement precision, the significance level $\alpha$, the risk $\beta$ one is willing to take to reject a method, which, in fact, has an acceptable bias and the magnitude of the true bias $\delta$ ($|\delta| < \lambda$) that one wants to accept with a high probability $1 - \beta$. To simplify the calculations, the precision variances of the two analytical methods being compared are assumed here to be equal and the same number of measurements is foreseen for both analytical methods. The formulae can, however, be generalized to handle cases where different numbers of measurements are to be used for both analytical methods, if an estimate of the precision of both methods is available.

From the simulations, it can be concluded that the proposed formulae approximate the necessary sample size well. The reliability of the proposed formulae mainly depends on the quality of the precision estimates used. Simulations have shown that when the precision estimates correspond well with the true precision parameters, the sample size determined with the proposed formulae is efficient in limiting the risk $\beta$ (of incorrectly concluding that the bias is not acceptable when, in fact, it is) not to exceed the specified level. When the true precision is poorer than the precision estimates used in the formulae, the probability $\beta$ of rejecting a method, which, in fact, has an acceptable bias can be larger than expected. If from the measurements one notices that the precision of the analytical method(s) is worse than that used in the determination of the number of measurements, additional measurements can be performed. The number of additional measurements to be performed can be determined by the proposed formulae using the new precision estimates.

Acknowledgements

The authors thank the reviewer for the useful comments. This work has received financial support from the European Commission (Standards, Measurements and Testing Programme Contract SMT4-CT95-2031) and the Belgian government (The Prime Minister Services - Federal Office for Scientific, Technical and Cultural Affairs, Standardisation Programme Research Contract NO/03/003).

Appendix A

Example: The analyst wants to determine the number of measurements required for the bias evaluation of a newly developed analytical method for Penicillin Tablets. The experiment is performed under repeatability conditions. The results are expressed as a percentage of the labelled amount on the tablet (%). From previous experiments, an estimate of the repeatability standard deviation for the reference method is available: $s_{rA} = 1.5\%$. Since no prior knowledge of the precision of the alternative method is available, $s_{rB}^2$ is considered to be equal to $s_{rA}^2$. The acceptable bias $\lambda$ is considered to be 3%. The statistical test is performed at the significance level of $\alpha = 0.05$. If, in reality, the alternative method is not biased ($\delta = 0$), the analyst wants to limit the probability that the method will be rejected to 20%, i.e., $\beta = 0.2$.

(i) Calculate $n_{\mathrm{init}}$ by replacing $(t_{\beta/2} + t_\alpha)$ in Eq. (16) with $(z_{\beta/2} + z_\alpha)$:

$$n_{\mathrm{init}} = 2\,\frac{s_{rA}^2}{\lambda^2}\,(z_{\beta/2} + z_\alpha)^2 = 2\,\frac{1.5^2}{3^2}\,(z_{0.1} + z_{0.05})^2 = 2\,\frac{1.5^2}{3^2}\,(1.28 + 1.65)^2 = 4.29.$$

(ii) Round $n_{\mathrm{init}}$ up to the next integer, n. Thus, n = 5.

(iii) Calculate the right-hand side of Eq. (16) with the value of $(t_{\beta/2} + t_\alpha)$ at $\nu = 2n - 2$:

at $\nu = (2 \times 5) - 2 = 8$, $(t_{0.1} + t_{0.05}) = (1.40 + 1.86) = 3.26$.

Therefore:

$$2\,\frac{s_{rA}^2}{\lambda^2}\,(t_{\beta/2} + t_\alpha)^2 = 2\,\frac{1.5^2}{3^2}\,(3.26)^2 = 5.31.$$

(iv) Since 5.31 is larger than n = 5, increase the value of n to 5 + 1 = 6 and adjust the value of $(t_{\beta/2} + t_\alpha)$ accordingly:

at $\nu = (2 \times 6) - 2 = 10$, $(t_{0.1} + t_{0.05}) = (1.37 + 1.81) = 3.18$.

(v) Recalculate the right-hand side of Eq. (16):

$$2\,\frac{s_{rA}^2}{\lambda^2}\,(t_{\beta/2} + t_\alpha)^2 = 2\,\frac{1.5^2}{3^2}\,(3.18)^2 = 5.06.$$

Since 5.06 is not larger than n = 6, it can be concluded that the minimum number of replicated measurements required for each method is 6.

Therefore, if six measurements are performed with both methods (A and B), there is a probability of not more than 5% that a new method that, in fact, is too highly biased ($|\delta| > 3\%$) will be accepted and, at the same time, there is a probability of not more than 20% that a new method, which, in reality, is not biased will be rejected.

References

[1] S. Kuttatharmmakul, D.L. Massart, J. Smeyers-Verbeke, Anal. Chim. Acta 391 (1999) 203.
[2] C. Hartmann, J. Smeyers-Verbeke, W. Penninckx, Y. Vander Heyden, P. Vankeerberghen, D.L. Massart, Anal. Chem. 67 (1995) 4491.
[3] V.W. Steinijans, D. Hauschke, Int. J. Clin. Pharmacol., Ther. Toxicol. 30 (1990) S45.
[4] V.W. Steinijans, D. Hauschke, Clin. Res. Reg. Affairs 10 (1993) 203.
[5] D.J. Schuirmann, J. Pharmacokinet. Biopharm. 15 (1987) 657.
[6] U. Timm, M. Wall, D. Dell, J. Pharm. Sci. 74 (1985) 972.
[7] D.J. Schuirmann, Drug Inf. J. 24 (1990) 315.
[8] International Standard, Accuracy (Trueness and Precision) of Measurement Methods and Results, ISO 5725-3, Geneva, 1994.
[9] International Standard, Accuracy (Trueness and Precision) of Measurement Methods and Results, ISO 5725-6, Geneva, 1994.
[10] D.C. Montgomery, Design and Analysis of Experiments, 4th edn., Wiley, New York, 1997, pp. 41–45.
[11] F.E. Satterthwaite, Biomed. Bull. 2 (1946) 110.
[12] Matlab Reference Guide, The MathWorks, Natick, MA, USA, 1992, pp. 402–403.
[13] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997, p. 97.
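
The iterative calculation worked through in Appendix A (steps (i) to (v)) can be sketched in Python as follows. The sketch covers only the situation of that example, i.e. Eq. (16) as it is used there for a true bias of zero, under repeatability conditions and with equal variances for both methods. The function names and the pure-Python t-quantile routine are assumptions made here. Because the quantiles are not rounded to two decimals, intermediate values differ slightly from those in the appendix.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_quantile(prob, df, steps=400):
    """Value t with P(T > t) = prob, by bisection on a Simpson-rule tail area."""
    def upper_tail(t):
        h = t / steps
        s = t_pdf(0.0, df) + t_pdf(t, df)
        s += sum((4 if k % 2 else 2) * t_pdf(k * h, df) for k in range(1, steps))
        return 0.5 - s * h / 3
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if upper_tail(mid) > prob else (lo, mid)
    return (lo + hi) / 2


def minimum_n(s_r, lam, alpha=0.05, beta=0.20):
    """Smallest n with n >= 2 (s_r^2 / lam^2) (t_{beta/2} + t_alpha)^2 at df = 2n - 2."""
    ratio = 2 * s_r ** 2 / lam ** 2
    z = NormalDist()
    # step (i): starting value from normal quantiles instead of t quantiles
    n_init = ratio * (z.inv_cdf(1 - beta / 2) + z.inv_cdf(1 - alpha)) ** 2
    n = max(2, math.ceil(n_init))  # step (ii): round up to the next integer
    while True:
        df = 2 * n - 2
        # steps (iii) and (v): right-hand side of Eq. (16) at the current n
        rhs = ratio * (t_upper_quantile(beta / 2, df) + t_upper_quantile(alpha, df)) ** 2
        if rhs <= n:
            return n, n_init
        n += 1  # step (iv): increase n and repeat


n, n_init = minimum_n(s_r=1.5, lam=3.0)
print(f"n_init = {n_init:.2f}, minimum n per method = {n}")
```

For the values of the example ($s_{rA} = 1.5$, $\lambda = 3$, $\alpha = 0.05$, $\beta = 0.2$), the loop stops at the same n = 6 as the appendix.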
