A Simple Approximation for Calculating Sample Sizes for Comparing Independent Proportions

Author(s): Joseph L. Fleiss, Alex Tytun and Hans K. Ury


Source: Biometrics, Vol. 36, No. 2 (Jun., 1980), pp. 343-346
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2529990

BIOMETRICS 36, 343-346
June 1980

A Simple Approximation for Calculating Sample Sizes for Comparing Independent Proportions

Joseph L. Fleiss
Division of Biostatistics, Columbia University School of Public Health, 600 West 168
Street, New York, New York 10032, U.S.A.

Alex Tytun
Office of Biostatistics and Research, New York City Department of Health, 125 Worth
Street, New York, New York 10013, U.S.A.

and

Hans K. Ury
Medical Methods Research Department, The Permanente Medical Group, 3700 Broadway,
Oakland, California 94611, U.S.A.

SUMMARY
A simple approximation is provided to the formula for the sample sizes needed to detect a
difference between two binomial probabilities with specified significance level and power. The
formula for equal sample sizes was derived by Casagrande, Pike and Smith (1978, Biometrics 34,
483-486) and can be easily generalized to the case of unequal sample sizes. It is shown that over
fairly wide ranges of parameter values and ratios of sample sizes, the percentage error which results
from using the approximation is no greater than 1%. The approximation is especially useful for the
inverse problem of estimating power when the sample sizes are given.

Key words: Sample sizes; Fourfold tables; Power.

Suppose a study is being planned to compare two binomial probabilities in independent
samples of equal size so that, if $\alpha$ is the significance level for a one-tailed test and if
$P_1 < P_2$ are the two underlying probabilities, then $1 - \beta$ should be the power of the test.
Casagrande, Pike and Smith [CPS] (1978) presented an approximate formula for the
sample size in each of the two samples required to achieve the desired power.

Define $z_p$ to be the upper $100(1 - p)$ percentile of the standard normal distribution,
define $\delta = P_2 - P_1$ and define

$$n' = \frac{\{z_\alpha \sqrt{2\bar{P}\bar{Q}} + z_\beta \sqrt{P_1 Q_1 + P_2 Q_2}\}^2}{\delta^2} \tag{1}$$

where $\bar{P} = \tfrac{1}{2}(P_1 + P_2)$, $\bar{Q} = 1 - \bar{P}$ and $Q_i = 1 - P_i$. Now (1) is the formula
for the sample size in each group that would be derived by analyzing the classic critical ratio
test without the continuity correction (see, e.g., Fleiss, 1973). CPS showed that the corrected formula

$$n = \frac{n'}{4}\left\{1 + \sqrt{1 + \frac{4}{n'\delta}}\right\}^2 \tag{2}$$

for the sample size per group provides an excellent approximation to the sample size
obtained by an exact analysis of power (Bennett and Hsu, 1960; Haseman, 1978).

Suppose that considerations of relative cost or other factors make it desirable to select
samples of unequal size from the two populations. Let the required sample size from the
first population be denoted by $m$, and that from the second by $rm$ ($0 < r < \infty$), with $r$
specified in advance. The total sample size is, say, $N = (r + 1)m$.

As noted by H. K. Ury, in a technical report of the Permanente Medical Group,
Oakland, 1978, a simple modification of the CPS development leads to the value

$$m = \frac{m'}{4}\left[1 + \sqrt{1 + \frac{2(r + 1)}{r m' \delta}}\right]^2 \tag{3}$$

as the approximate sample size from the first population, and $rm$ as that from the second,
which are required to assure a power of $1 - \beta$ against the alternative $P_1 < P_2$, where

$$m' = \frac{[z_\alpha \sqrt{(r + 1)\bar{P}\bar{Q}} + z_\beta \sqrt{r P_1 Q_1 + P_2 Q_2}]^2}{r\delta^2}, \tag{4}$$

$\bar{P} = (P_1 + rP_2)/(r + 1)$ and $\bar{Q} = 1 - \bar{P}$. Formula (3) agrees closely with one derived from
other principles by Ury. Note that (2) is a special case of (3), and (1) of (4), when $r = 1$.
The analysis that follows will therefore be for the general case of possibly unequal sample
sizes.
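
Equations (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the normal percentiles are taken from `statistics.NormalDist` in the standard library, and the 80% power used in the usage lines is an arbitrary choice that does not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def upper_z(p: float) -> float:
    """Return z_p, the point with upper-tail area p under the standard normal."""
    return NormalDist().inv_cdf(1.0 - p)


def m_prime(p1: float, p2: float, alpha: float, beta: float, r: float = 1.0) -> float:
    """Uncorrected size of the first sample, equation (4); r = 1 gives equation (1)."""
    delta = p2 - p1
    p_bar = (p1 + r * p2) / (r + 1.0)
    q_bar = 1.0 - p_bar
    null_term = upper_z(alpha) * sqrt((r + 1.0) * p_bar * q_bar)
    alt_term = upper_z(beta) * sqrt(r * p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return (null_term + alt_term) ** 2 / (r * delta ** 2)


def m_corrected(p1: float, p2: float, alpha: float, beta: float, r: float = 1.0) -> float:
    """Continuity-corrected size of the first sample, equation (3); r = 1 gives (2)."""
    delta = p2 - p1
    mp = m_prime(p1, p2, alpha, beta, r)
    x = 2.0 * (r + 1.0) / (r * mp * delta)
    return mp / 4.0 * (1.0 + sqrt(1.0 + x)) ** 2


# Illustrative inputs (assumed): one-tailed alpha = 0.05, power 0.80, equal sample sizes.
mp = m_prime(0.15, 0.25, alpha=0.05, beta=0.20)
m = m_corrected(0.15, 0.25, alpha=0.05, beta=0.20)
print(f"m' = {mp:.1f}, m = {m:.1f}")
```

With these assumed inputs the uncorrected and corrected sizes come out at roughly 197 and 216 per group, which shows how much the continuity correction adds for a difference of 0.10.
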

To a remarkable degree of accuracy, $m$ is approximately equal to $m^*$, where

$$m^* = m' + \frac{r + 1}{r\delta}. \tag{5}$$

Define $x = 2(r + 1)/(r m' \delta)$, so that the proportionate difference between $m$ and $m^*$ is, say,

$$R(x) = \frac{m^* - m}{m} = \frac{2 + x - 2\sqrt{1 + x}}{2 + x + 2\sqrt{1 + x}}.$$

Note that

$$\lim_{x \to 0} R(x) = 0, \qquad \lim_{x \to \infty} R(x) = 1$$

and $R'(x) > 0$ for all $x > 0$. Provided that $m'\delta \ge 4(r + 1)/r$, $x \le 0.50$ and $R(x) \le 0.01$. Thus,
for moderately large values of $m'$ (say, $m' \ge 120$), moderately large values of $\delta$ (say,
$\delta \ge 0.1$) and sample sizes that are not too disproportionate (say, $0.50 \le r \le 2$), the use of
the simpler expression in (5) results in a percentage error no greater than 1%.
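
The error bound can be checked numerically. The sketch below compares (3) with (5) and evaluates $R(x)$ directly; the function names are hypothetical and the inputs are simply the boundary values quoted in the text ($m' = 120$, $\delta = 0.1$, $r = 0.5$).

```python
from math import sqrt


def m_exact(m_prime: float, delta: float, r: float = 1.0) -> float:
    """Continuity-corrected sample size of equation (3)."""
    x = 2.0 * (r + 1.0) / (r * m_prime * delta)
    return m_prime / 4.0 * (1.0 + sqrt(1.0 + x)) ** 2


def m_star(m_prime: float, delta: float, r: float = 1.0) -> float:
    """Simple approximation of equation (5): m* = m' + (r + 1) / (r * delta)."""
    return m_prime + (r + 1.0) / (r * delta)


def relative_error(x: float) -> float:
    """R(x) = (m* - m) / m, written as a function of x alone."""
    root = 2.0 * sqrt(1.0 + x)
    return (2.0 + x - root) / (2.0 + x + root)


# Boundary case from the text: m' = 120, delta = 0.1, r = 0.5, so that x = 0.5.
mp, delta, r = 120.0, 0.1, 0.5
x = 2.0 * (r + 1.0) / (r * mp * delta)
print(f"x = {x:.2f}, m = {m_exact(mp, delta, r):.1f}, "
      f"m* = {m_star(mp, delta, r):.1f}, R(x) = {relative_error(x):.4f}")
```

At this boundary the sketch gives $m$ of about 148.5 against $m^* = 150$, a relative difference of about 0.0102, that is, 1% to two decimal places. Because $R$ is increasing in $x$, larger $m'\delta$ gives a smaller error.
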

When one is confronted with the inverse problem of estimating power for prespecified
sample sizes, (5) is far simpler to manipulate than (3). Suppose that a one-tailed test with
significance level $\alpha$ is to be performed, and suppose also that there is interest in detecting
a difference between $P_1$ and $P_2 > P_1$, that $N$ is the available total sample size and that
$m^* = N/(r + 1)$ is the size of the sample from the first population. Equations (4) and (5)
combine to yield

$$z_\beta = \frac{\sqrt{r\delta^2 m^* - (r + 1)\delta} - z_\alpha \sqrt{(r + 1)\bar{P}\bar{Q}}}{\sqrt{r P_1 Q_1 + P_2 Q_2}} \tag{6}$$

as the approximate percentile corresponding to the actual power. Tables of the normal
curve will then provide the power itself. Since only rough estimates of power are usually
required, (6) may be used for a wider range of values of $r$ (e.g. $0.33 \le r \le 3$) than that in
which (3) and (5) agree well.

Suppose, for example, that $\alpha = 0.05$, that the probabilities $P_1 = 0.15$ and $P_2 = 0.25$ are
considered sufficiently different to warrant rejecting the hypothesis of no difference, and
that a total sample size of 360 is available, then Table 1 gives the value of $z_\beta$ from (6) and
the corresponding approximate power for several values of $r$. Note the asymmetry in the
table: for example, sample sizes of 270 and 90 from the first and second populations
(corresponding to $r = 0.33$) yield an approximate power of 0.63, whereas sample sizes of
90 and 270 (corresponding to $r = 3$) yield an approximate power of 0.58. Other things
being equal, power is increased when relatively more observations are taken from the
population whose underlying probability is further from 0.50.

Table 1
Approximate powers for detecting a difference between $P_1 = 0.15$ and $P_2 = 0.25$ using a
one-sided significance test with a total sample size of 360 and a significance level of 0.05

    r†      z_β     Power
    0.33    0.32    0.63
    0.50    0.49    0.69
    1       0.60    0.73
    2       0.41    0.66
    3       0.19    0.58

† r is the ratio of the sample size from the second population to that from the first.
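
Equation (6) is easy to evaluate in code. The following is a minimal sketch rather than the authors' own procedure: the function name and the guard against a negative radicand are assumptions, `statistics.NormalDist` stands in for the normal tables, and $r = 1/3$ is used for the row labelled 0.33 because the text gives the sample sizes there as 270 and 90.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(p1: float, p2: float, n_total: float, r: float,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Return (z_beta, power) from equation (6) for a one-tailed test with p1 < p2."""
    delta = p2 - p1
    m = n_total / (r + 1.0)                  # m*: size of the first sample
    p_bar = (p1 + r * p2) / (r + 1.0)
    q_bar = 1.0 - p_bar
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    radicand = r * delta ** 2 * m - (r + 1.0) * delta
    if radicand < 0.0:
        # Assumed guard: (6) is undefined when the total sample is this small.
        raise ValueError("sample size too small for equation (6)")
    numerator = sqrt(radicand) - z_alpha * sqrt((r + 1.0) * p_bar * q_bar)
    denominator = sqrt(r * p1 * (1.0 - p1) + p2 * (1.0 - p2))
    z_beta = numerator / denominator
    # z_beta has upper-tail area beta, so the power 1 - beta is the lower-tail area.
    return z_beta, STD_NORMAL.cdf(z_beta)


for r in (1 / 3, 0.5, 1.0, 2.0, 3.0):
    z_beta, power = approx_power(0.15, 0.25, 360, r)
    print(f"r = {r:4.2f}  z_beta = {z_beta:6.4f}  power = {power:6.4f}")
```

Run with the inputs of Table 1, the sketch gives values of $z_\beta$ that round to the tabulated ones, and powers within 0.01 of the tabulated ones. A small difference in the last digit of a power can arise if the table was read from $z_\beta$ already rounded to two decimals, which is a guess on my part rather than something the paper states.
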

The power values in Table 1 agree to two decimal places with those obtained by
inverting (4) and (3), which must be done by trial and error or iteratively. For more
extreme values of $r$, the discrepancy increases and may be unacceptably large.
If a two-tailed test with significance level $\alpha$ is performed, $z_\alpha$ must be replaced by $z_{\alpha/2}$
and $\delta$ must be redefined as $|P_2 - P_1|$.

ACKNOWLEDGEMENT
This research was supported in part by a grant from the National Institute of Dental
Research.

RÉSUMÉ
A simple approximation is provided to the formula needed to determine the sample sizes
for testing the difference between two binomial probabilities with fixed significance level
and power. The formula for equal sample sizes was obtained by Casagrande, Pike and Smith
(1978, Biometrics 34, 483-486) and generalizes easily to the case of unequal sizes. It is shown
that over a wide range of parameter values and sample size ratios, the percentage error from
using the approximation does not exceed 1%. The approximation is particularly useful for the
inverse problem of estimating power when the sample sizes are given.


REFERENCES
Bennett, B. M. and Hsu, P. (1960). On the power function of the exact test for the 2 x 2 contingency
table. Biometrika 47, 393-398.
Casagrande, J. T., Pike, M. C. and Smith, P. G. (1978). An improved approximate formula for
calculating sample sizes for comparing two binomial distributions. Biometrics 34, 483-486.
Fleiss, J. L. (1973). Statistical Methods for Rates and Proportions. New York: Wiley.
Haseman, J. K. (1978). Exact sample sizes for use with the Fisher-Irwin test for 2 x 2 tables.
Biometrics 34, 106-109.

Received May 1979; revised July 1979
