International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to
Biometrics.
Alex Tytun
Office of Biostatistics and Research, New York City Department of Health, 125 Worth
Street, New York, New York 10013, U.S.A.
and
Hans K. Ury
Medical Methods Research Department, The Permanente Medical Group, 3700 Broadway,
Oakland, California 94611, U.S.A.
SUMMARY
A simple approximation is provided to the formula for the sample sizes needed to detect a
difference between two binomial probabilities with specified significance level and power. The
formula for equal sample sizes was derived by Casagrande, Pike and Smith (1978, Biometrics 34,
483-486) and can be easily generalized to the case of unequal sample sizes. It is shown that over
fairly wide ranges of parameter values and ratios of sample sizes, the percentage error which results
from using the approximation is no greater than 1%. The approximation is especially useful for the
inverse problem of estimating power when the sample sizes are given.
where $\bar{P} = \tfrac{1}{2}(P_1 + P_2)$ and $\bar{Q} = 1 - \bar{P}$. Now (1) is the formula for the sample size in each
group that would be derived by analyzing the classic critical ratio test without the
continuity correction (see, e.g., Fleiss, 1973). CPS showed that the corrected formula
for the sample size per group provides an excellent approximation to the sample size
obtained by an exact analysis of power (Bennett and Hsu, 1960; Haseman, 1978).
Suppose that considerations of relative cost or other factors make it desirable to select
samples of unequal size from the two populations. Let the required sample size from the
first population be denoted by $m$, and that from the second by $rm$ ($0 < r < \infty$), with $r$
specified in advance. The total sample size is, say, $N = (r + 1)m$.
As noted by H. K. Ury, in a technical report of the Permanente Medical Group,
Oakland, 1978, a simple modification of the CPS development leads to the value $m$ given by (3), with $m'$ given by (4),
as the approximate sample size from the first population, and $rm$ as that from the second,
which are required to assure a power of $1 - \beta$ against the alternative $P_1 < P_2$, where
$\bar{P} = (P_1 + rP_2)/(r + 1)$ and $\bar{Q} = 1 - \bar{P}$. Formula (3) agrees closely with one derived from
other principles by Ury. Note that (2) is a special case of (3), and (1) of (4), when $r = 1$.
The analysis that follows will therefore be for the general case of possibly unequal sample
sizes.
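The general-case sample sizes can be sketched in code. Equations (3) and (4) are not reproduced in this text, so the forms used below are assumptions inferred from the surrounding definitions: an uncorrected size $m' = [z_\alpha\sqrt{(r+1)\bar{P}\bar{Q}} + z_\beta\sqrt{rP_1Q_1 + P_2Q_2}]^2 / (r\delta^2)$ with $\delta = P_2 - P_1$, and a continuity-corrected size $m = (m'/4)[1 + \sqrt{1 + 2(r+1)/(rm'\delta)}]^2$. The function names `m_prime` and `m_corrected` are hypothetical, and a one-sided test is assumed.

```python
from math import sqrt
from statistics import NormalDist


def m_prime(p1, p2, r, alpha=0.05, power=0.80):
    """Uncorrected first-group sample size (assumed form of equation (4))."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # power = 1 - beta
    delta = p2 - p1                            # assumes p1 < p2
    p_bar = (p1 + r * p2) / (r + 1)
    q_bar = 1 - p_bar
    numerator = (z_alpha * sqrt((r + 1) * p_bar * q_bar)
                 + z_beta * sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (r * delta ** 2)


def m_corrected(p1, p2, r, alpha=0.05, power=0.80):
    """Continuity-corrected first-group size (assumed form of equation (3))."""
    mp = m_prime(p1, p2, r, alpha, power)
    delta = p2 - p1
    return (mp / 4) * (1 + sqrt(1 + 2 * (r + 1) / (r * mp * delta))) ** 2


# Illustrative values only: P1 = 0.15, P2 = 0.25, equal group sizes, 80% power.
print(m_prime(0.15, 0.25, r=1.0), m_corrected(0.15, 0.25, r=1.0))
```

The second group would then need $rm$ observations. In practice both values would be rounded up to whole numbers.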
To a remarkable degree of accuracy, $m$ is approximately equal to $m^*$, where
$m^* = m' + (r + 1)/(r\delta)$. (5)
Define $x = 2(r + 1)/(rm'\delta)$, so that the proportionate difference between $m$ and $m^*$ is, say, $R(x)$.
Note that $R'(x) > 0$ for all $x > 0$. Provided that $m'\delta \ge 4(r + 1)/r$, $x \le 0.50$ and $R(x) \le 0.01$. Thus,
for moderately large values of $m'$ (say, $m' \ge 120$), moderately large values of $\delta$ (say,
$\delta \ge 0.1$) and sample sizes that are not too disproportionate (say, $0.50 \le r \le 2$), the use of
the simpler expression in (5) results in a percentage error no greater than 1%.
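The size of the error can be checked numerically. The sketch below assumes that (3) has the form $m = (m'/4)[1 + \sqrt{1 + x}]^2$ and measures the proportionate difference as $(m^* - m)/m$, which under that assumption simplifies to $[(\sqrt{1+x} - 1)/(\sqrt{1+x} + 1)]^2$. Both the form of (3) and this choice of denominator are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def m_from_m_prime(m_prime, r, delta):
    """Assumed form of (3): corrected size computed from the uncorrected m'."""
    x = 2 * (r + 1) / (r * m_prime * delta)
    return (m_prime / 4) * (1 + sqrt(1 + x)) ** 2


def m_star(m_prime, r, delta):
    """Approximation (5): m* = m' + (r + 1)/(r * delta)."""
    return m_prime + (r + 1) / (r * delta)


def relative_error(m_prime, r, delta):
    """(m* - m)/m, one possible definition of the proportionate difference."""
    m = m_from_m_prime(m_prime, r, delta)
    return (m_star(m_prime, r, delta) - m) / m


# Boundary of the stated range: m' = 120, delta = 0.1, r = 0.5 gives x = 0.5.
worst = relative_error(120, 0.5, 0.1)
# A point inside the range, where x = 0.2.
typical = relative_error(200, 1.0, 0.1)
print(f"boundary: {worst:.4f}  interior: {typical:.4f}")
```

Under these assumptions the error at the boundary $x = 0.50$ is about 1%, and it shrinks as $m'$, $\delta$ or $r$ increases.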
When one is confronted with the inverse problem of estimating power for prespecified
sample sizes, (5) is far simpler to manipulate than (3). Suppose that a one-tailed test with
significance level $\alpha$ is to be performed, and suppose also that there is interest in detecting
a difference between $P_1$ and $P_2 > P_1$, that $N$ is the available total sample size and that
$m^* = N/(r + 1)$ is the size of the sample from the first population. Equations (4) and (5)
combine to yield
required, (6) may be used for a wider range of values of $r$ (e.g. $0.33 \le r \le 3$) than that in
which (3) and (5) agree well.
Suppose, for example, that $\alpha = 0.05$, that the probabilities $P_1 = 0.15$ and $P_2 = 0.25$ are
considered sufficiently different to warrant rejecting the hypothesis of no difference, and
that a total sample size of 360 is available. Then Table 1 gives the value of $z_\beta$ from (6) and
the corresponding approximate power for several values of $r$. Note the asymmetry in the
table: for example, sample sizes of 270 and 90 from the first and second populations
(corresponding to $r = 0.33$) yield an approximate power of 0.63, whereas sample sizes of
90 and 270 (corresponding to $r = 3$) yield an approximate power of 0.58. Other things
being equal, power is increased when relatively more observations are taken from the
population whose underlying probability is further from 0.50.
Table 1
Approximate powers for detecting a difference between $P_1 = 0.15$ and $P_2 = 0.25$ using a one-sided significance test with a total sample size of 360 and a significance level of 0.05

r†      z_β     Power
0.33    0.32    0.63
0.50    0.49    0.69
1       0.60    0.73
2       0.41    0.66
3       0.19    0.58

† r is the ratio of the sample size from the second population to that from the first.
The power values in Table 1 agree to two decimal places with those obtained by
inverting (4) and (3), which must be done by trial and error or iteratively. For more
extreme values of $r$, the discrepancy increases and may be unacceptably large.
If a two-tailed test with significance level $\alpha$ is performed, $z_\alpha$ must be replaced by $z_{\alpha/2}$
and $\delta$ must be redefined as $|P_2 - P_1|$.
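The power calculation can be sketched as follows. Equation (6) is not reproduced in this text, so the code assumes the form obtained by substituting $m' = m^* - (r+1)/(r\delta)$ into the assumed form of (4) and solving for $z_\beta$: $z_\beta = [\sqrt{r\delta^2 m^* - (r+1)\delta} - z_\alpha\sqrt{(r+1)\bar{P}\bar{Q}}] / \sqrt{rP_1Q_1 + P_2Q_2}$. The function name `approx_power` and its `two_sided` flag are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_total, r, alpha=0.05, two_sided=False):
    """Return (z_beta, power) from the assumed form of equation (6)."""
    delta = abs(p2 - p1)                      # |P2 - P1| covers the two-tailed case
    tail = alpha / 2 if two_sided else alpha  # z_alpha becomes z_{alpha/2} if two-sided
    z_alpha = NormalDist().inv_cdf(1 - tail)
    m_first = n_total / (r + 1)               # m* = N / (r + 1)
    p_bar = (p1 + r * p2) / (r + 1)
    q_bar = 1 - p_bar
    inside = r * delta ** 2 * m_first - (r + 1) * delta
    if inside <= 0:
        raise ValueError("sample size too small for the approximation")
    z_beta = ((sqrt(inside) - z_alpha * sqrt((r + 1) * p_bar * q_bar))
              / sqrt(r * p1 * (1 - p1) + p2 * (1 - p2)))
    return z_beta, NormalDist().cdf(z_beta)


# Settings of the worked example: P1 = 0.15, P2 = 0.25, N = 360, alpha = 0.05.
for r in (1 / 3, 0.5, 1, 2, 3):
    z_beta, power = approx_power(0.15, 0.25, n_total=360, r=r)
    print(f"r = {r:.2f}  z_beta = {z_beta:.2f}  power = {power:.2f}")
```

With these settings the computed $z_\beta$ values round to those in Table 1. The power can differ from the tabulated value by up to about 0.01, for instance when $z_\beta$ is rounded to two decimals before the normal probability is looked up.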
ACKNOWLEDGEMENT
This research was supported in part by a grant from the National Institute of Dental
Research.
RÉSUMÉ
A simple approximation is given to the formula needed to determine the sample
sizes for testing the difference between two binomial probabilities with fixed
significance level and power. The formula for equal sample sizes was obtained by
Casagrande, Pike and Smith (1978, Biometrics 34, 483-486) and is easily generalized to the case of
unequal sizes. It is shown that over a wide range of parameter values and ratios of
sample sizes, the percentage error from using the approximation does not exceed 1%.
The approximation is particularly useful for the inverse problem of estimating power
when the sample sizes are given.
REFERENCES
Bennett, B. M. and Hsu, P. (1960). On the power function of the exact test for the 2 x 2 contingency
table. Biometrika 47, 393-398.
Casagrande, J. T., Pike, M. C. and Smith, P. G. (1978). An improved approximate formula for
calculating sample sizes for comparing two binomial distributions. Biometrics 34, 483-486.
Fleiss, J. L. (1973). Statistical Methods for Rates and Proportions. New York: Wiley.
Haseman, J. K. (1978). Exact sample sizes for use with the Fisher-Irwin test for 2 x 2 tables.
Biometrics 34, 106-109.