
Current Medicine Research and Practice 4 (2014) 87–92

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.elsevier.com/locate/cmrp

Research Methods

Approach to sample size calculation in medical research

Ajit Kumar, Shivani Dogra, Avneet Kaur, Manoj Modi, Anup Thakur, Satish Saluja*
Department of Neonatology, Sir Ganga Ram Hospital, New Delhi, India

abstract

Adequate sample size is of paramount importance in medical research. An inadequate number of subjects in a study may lead to inconclusive results and erroneous interpretation. This necessitates estimation of sample size for a research project. There are formulae for calculation of sample size for different study designs. A researcher can manually compute sample size using these formulae. Alternatively, several statistical software packages and online calculators can compute sample size for various research designs. Common types of study designs in medical research are estimation of a proportion or a mean in a defined population, and hypothesis testing for a difference in two qualitative (proportions) or two quantitative outcomes (means). For each type of study design, the investigator needs some a priori information such as precision, acceptable limits of α and β error, confidence level and effect size.

Keywords: Sample size; α error; β error; Power

Copyright © 2014, Sir Ganga Ram Hospital. Published by Reed Elsevier India Pvt. Ltd. All rights reserved.

* Corresponding author.
E-mail address: satishsaluja@gmail.com (S. Saluja).
http://dx.doi.org/10.1016/j.cmrp.2014.04.001
2352-0817/Copyright © 2014, Sir Ganga Ram Hospital. Published by Reed Elsevier India Pvt. Ltd. All rights reserved.

Before we start a research project, it is important to know the minimum number of subjects required to be enrolled to obtain a meaningful answer to our research question. Certain essential parameters are required to calculate sample size.[1] In this section, we shall discuss calculation of sample size using statistical formulae, software, web-based online calculators or downloadable smartphone applications. For the calculation of sample size, the following information is required:

1. Estimation studies
   a. If the outcomes are
      i. Categorical: expected prevalence or proportion of an event/disease in a population
      ii. Continuous: standard deviation of the parameter of interest
   b. Precision (d): absolute/relative, expressed as a fraction, e.g. a 10% precision shall be expressed as 0.1
   c. Confidence level: Usually a confidence level of 95% is acceptable. This also gives us the estimate of 'α error', which is 5% or 0.05 for a 95% confidence level.
   Power of a study or the β error is not applicable for calculation of sample size in this type of study design.
2. Comparison studies (hypothesis testing)
   a. Categorical outcomes
      i. Expected proportion of outcome/event rates in two groups (assumed from pilot observation or reported from other studies)
   b. Continuous outcomes
      i. Expected means and standard deviations of quantitative outcomes in two groups (assumed from pilot observation or reported from other studies)
   c. Type I (α) error: Usually an α error of 5% is acceptable. The corresponding z values for α error or for 95% confidence level are displayed in Table 1.
   d. β (beta) error is conventionally kept at 20%. The power of a study is 1−β. For a β error of 20% (0.2) the power of the study is 80% (1−0.2); its corresponding z value is 0.842 (Table 1).

Values of Z_{1−α} and Z_{1−β} corresponding to different levels of α error and power (1−β) are listed in Table 1. Z_{1−α} value is
different for one-sided or two-sided hypothesis testing. A two-sided hypothesis means that the result of an intervention could be in either direction, i.e. the outcome could be better or worse. In most situations, the two-sided value (Z_{1−α/2}) should be used. In a few situations, it may be known before the study that the result of an outcome is possible in only one direction. Such exceptional circumstances may be studies on nutritional supplementation vs placebo, where we expect that the outcome can only be better and not worse than placebo. In such instances, one-tailed statistical analysis may be considered and the corresponding Z_{1−α} value may be taken. This would require a smaller sample size for detection of the expected difference compared to a two-tailed analysis (Table 1).

Table 1 – Z values for various levels of α & β error and power.

α level    Z_{1−α/2} (two-sided test)    Z_{1−α} (one-sided test)
0.01       2.576                         2.326
0.05       1.960                         1.645
0.10       1.645                         1.282

β error    Z_{1−β}    1−β (power)
0.20       0.842      0.8 (80%)
0.10       1.282      0.9 (90%)
0.05       1.645      0.95 (95%)
0.01       2.326      0.99 (99%)

In the following sections, we will discuss sample size calculations for various study designs using statistical formulae.[2]

1. Estimation of population proportion (event rate) with specified absolute precision

A researcher wishes to know the prevalence of a disease or event of interest in a specified population, for example the prevalence of tuberculosis, anemia or vitamin D deficiency. What is the minimum number of subjects he should enroll to precisely estimate this prevalence?

To calculate the sample size for the above situation, the investigator should have some estimate of the prevalence of that outcome/disease. This information shall come from published literature, a pilot study or an assumption. In addition, a confidence level needs to be set, which is usually kept at 95% with a corresponding z value of 1.96. The researcher also needs to set a precision, i.e. the range on either side of the assumed prevalence. For example, if the researcher assumes an absolute precision of 10% for an expected prevalence of 30%, the range would be 20%–40%.

Required inputs for estimation of sample size (n) for prevalence are:

• Expected event rate/proportion (p)
• Confidence level, which is usually kept at 95%, corresponding z value = 1.96
• Desired absolute precision (d), generally 5–10%

The formula to calculate sample size for this scenario is as follows:

$$n = \frac{Z_{1-\alpha/2}^{2} \times p(1-p)}{d^{2}}$$

Example: An investigator wants to know the required sample size to study the prevalence of anemia in a population. For an expected prevalence of anemia of 20%, how many subjects need to be enrolled for a precision of 5% and a confidence level of 95%?

Here, estimated prevalence is 20%, p = 0.2.
Confidence level = 95%, i.e. Z_{1−α/2} = 1.96.
Absolute precision (d) = 5% = 0.05.

$$n = \frac{(1.96 \times 1.96) \times (0.2) \times (1 - 0.2)}{(0.05) \times (0.05)} = 245.86$$

Hence, we need to enroll 246 subjects to study the prevalence of anemia in the population with 95% confidence level and 5% precision.

Sometimes, the information about the actual event rate in the population may not be available. In such a situation, it is suggested to assume the prevalence to be 50% (p = 0.5). This will give the largest sample size estimation of prevalence in the population. In Table 2, we can see that the product of p and 1−p is maximum for p = 0.5.

Table 2 – Values of p(1−p) for various proportion levels.

p         0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
p(1−p)    0.09  0.16  0.21  0.24  0.25  0.24  0.21  0.16  0.09

Example: A medical officer seeks to estimate the proportion of children in the district receiving complete immunization during the 1st year. There is no information about the expected proportion of immunization coverage in the population. How many children must be studied for a desired precision of 10% and 95% confidence?

Since we do not know the expected proportion of children receiving complete immunization coverage, we will take 50% as the expected proportion. Sample size for this study is calculated as follows:

$$n = \frac{Z_{1-\alpha/2}^{2} \times p(1-p)}{d^{2}}$$

$$n = \frac{1.96 \times 1.96 \times 0.25}{0.10 \times 0.10} = 96.04$$

Thus 97 subjects will be the desired sample size.

2. Estimation of population proportion (event rate) with specified relative precision

Many times while evaluating the event rate in a population, the researcher might set the desired precision relative to the expected proportion. For example, the researcher may set precision as 10% of an expected proportion or event rate of 50%. Here 10% is a relative precision (ε), i.e. 10% of 50%, equivalent to a range of 45%–55%. Sample size calculation for this scenario is as follows:

$$n = \frac{Z_{1-\alpha/2}^{2} \times (1-p)}{\varepsilon^{2}\, p}$$

(Here p is expected event rate and ε is relative precision)
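The two estimation formulas above (absolute precision d and relative precision ε) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the z value is taken from the exact standard normal quantile (about 1.95996) rather than the rounded 1.96 used in the text, so unrounded results may differ in the second decimal place while the rounded-up sample sizes agree.

```python
import math
from statistics import NormalDist


def z_two_sided(confidence: float = 0.95) -> float:
    """Z_{1-alpha/2} for a given confidence level (about 1.96 for 95%)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def n_absolute_precision(p: float, d: float, confidence: float = 0.95) -> float:
    """n = Z^2 * p * (1 - p) / d^2, with d as absolute precision."""
    z = z_two_sided(confidence)
    return z ** 2 * p * (1 - p) / d ** 2


def n_relative_precision(p: float, eps: float, confidence: float = 0.95) -> float:
    """n = Z^2 * (1 - p) / (eps^2 * p), with eps as relative precision."""
    z = z_two_sided(confidence)
    return z ** 2 * (1 - p) / (eps ** 2 * p)


# Sample sizes are rounded up to the next whole subject.
print(math.ceil(n_absolute_precision(p=0.2, d=0.05)))    # anemia example: 246
print(math.ceil(n_absolute_precision(p=0.5, d=0.10)))    # immunization example: 97
print(math.ceil(n_relative_precision(p=0.5, eps=0.10)))  # relative precision example: 385
```

Rounding up with `math.ceil` mirrors the convention in the worked examples, where any fractional result is taken to the next whole subject.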
$$n = \frac{1.96 \times 1.96 \times (1 - 0.5)}{0.1 \times 0.1 \times 0.5}$$

n = 384.16

Thus a sample size of 385 subjects is required for this study.

3. Estimation of a population mean with specified precision

This method for sample size calculation is used in study designs where the parameter of interest is quantitative, for example hemoglobin levels in adolescent girls or cholesterol levels in a population. For this purpose the investigator needs to have an approximate estimate of the standard deviation of that parameter. This could be obtained from published literature or a pilot study. In addition we need to set the desired confidence level and precision. Confidence level is usually set at 95%, with a corresponding z value of 1.96. The precision indicates the range on either side of the estimated mean in which the true mean will fall. Required inputs and sample size calculation for estimation of a mean are as follows:

1. Expected standard deviation of parameter (σ)
2. Desired precision (d)
3. Confidence level (which is usually set at 95%)

$$n = \frac{Z_{1-\alpha/2}^{2} \times \sigma^{2}}{d^{2}}$$

Example: We want to estimate the systolic blood pressure (SBP) of women in a given population. Based on a study done in a similar population, it is found that SD = 10 mmHg. We would like to have an estimate within 5 mmHg on either side of the mean, with 95% confidence. How many women should be studied?

$$n = \frac{1.96 \times 1.96 \times 10 \times 10}{5 \times 5} = 15.37$$

Thus, 16 women need to be enrolled in the study to estimate mean systolic BP in this population.

4. Hypothesis testing for two population proportions

In this scenario, we compare categorical variables (event rates) in two groups, for example comparison of mortality rate with two different interventions, or complication rates or success rates with two types of surgical interventions. Required inputs for sample size calculation for this scenario are:

Anticipated event rate in group 1 = P1
Anticipated event rate in group 2 = P2
α error, which is usually fixed at 5% (Z_{1−α/2} value for α error 5% is 1.96).
Power, which is usually fixed at 80% (Z_{1−β} value for power 80% or β error 20% is 0.842).

Following is the formula for sample size calculation for this scenario:

$$n = \frac{\left[ Z_{1-\alpha/2}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^{2}}{(P_1 - P_2)^{2}}$$

Here P is the average of the two proportions:

$$P = \frac{P_1 + P_2}{2}$$

Example: Investigators of the “COIN trial” calculated sample size for a comparison of two interventions, CPAP or intubation, in preterm infants for an outcome of death/bronchopulmonary dysplasia.[3] They hypothesized that with use of CPAP the expected event rate would decrease to 20% from 30% in the group with standard care. Sample size for this difference with a two-tailed α error of 0.05 and a power of 80% is calculated as follows:

Given P1 = 0.3; P2 = 0.2
P = (0.3 + 0.2)/2 = 0.25
α = 0.05; β = 0.20

Therefore

$$n = \frac{\left[ 1.96\sqrt{2 \times 0.25(1-0.25)} + 0.842\sqrt{0.3(1-0.3) + 0.2(1-0.2)} \right]^{2}}{(0.3-0.2)^{2}}$$

n = 293.2

Hence 294 participants will be required in each group in this study.

5. Hypothesis testing for difference between two population means

In this scenario, the clinician compares a quantitative outcome between two groups. Examples of such situations are hemoglobin levels/ferritin levels with early or delayed cord clamping, or glycated hemoglobin (HbA1c) with 3 months of treatment with Metformin or placebo.

Required inputs for sample size calculation are:

• Standard deviation (σ1) of outcome in group 1
• Standard deviation (σ2) of outcome in group 2
• Estimated/expected difference between means of two groups (μ1−μ2)
• α error, which is usually fixed at 5% (Z_{1−α/2} value for α error 5% is 1.96)
• Power, which is usually fixed at 80% (Z_{1−β} value for power 80% or β error 20% is 0.842)

Following is the formula for sample size calculation for this scenario:

$$n = \frac{(\sigma_1^{2} + \sigma_2^{2}) \times (Z_{1-\alpha/2} + Z_{1-\beta})^{2}}{(\mu_1 - \mu_2)^{2}}$$

Example: Investigators designed a study to evaluate the effects of early initiation of short term pressure support NIV compared to traditional oxygen delivery via venturi mask in
obese patients during the post-anesthesia care unit (PACU) stay.[4] They calculated sample size to detect an absolute improvement of 10 mmHg in arterial oxygen partial pressure (e.g. 50 mmHg–60 mmHg) with an expected standard deviation of 10 in both groups with power 80% and α error 5%. Sample size calculation for this study is as follows:

$$n = \frac{(\sigma_1^{2} + \sigma_2^{2}) \times (Z_{1-\alpha/2} + Z_{1-\beta})^{2}}{(\mu_1 - \mu_2)^{2}}$$

$$n = \frac{\left[(10)^{2} + (10)^{2}\right] \times [1.96 + 0.842]^{2}}{(10)^{2}} = 15.7$$

$$n = \frac{2 \times (10)^{2} \times [1.96 + 0.842]^{2}}{(10)^{2}} = 15.7$$

Here an SD of 10 has been used for both groups and 2 × (SD)² has been used instead of (SD1)² + (SD2)². In this study we need to enroll 16 patients in each group.

The reader need not go through these calculations manually. These calculations can be done with ease by various methods, e.g.

• Excel calculator
• Online calculators
• Downloadable applications and software.

5.1. Excel calculator

We can create an Excel calculator by entering the above formula in an Excel file. Fig. 1 displays a snapshot of one such Excel file. This shows sample size calculation for hypothesis testing in two population proportions. The formula is displayed in the highlighted ellipse at the top row, with the value of sample size in the highlighted circle. In this example, to detect a difference of 20% (80%–60% or 0.8–0.6) with alpha 0.05 and power of 80%, 36 subjects would be required in each group.

Fig. 1 – Sample size calculator in Excel sheet.

5.2. Online calculators & software

Many downloadable software packages and online calculators are available on the internet, which can do sample size calculation for common study designs. Following is a list of a few readily available websites or free downloadable software:

STATA (www.stata.com)
Power & Precision (www.PowerAnalysis.com)
Power and sample size (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize)
Sealedenvelope.com (www.sealedenvelope.com)
http://www.stat.ubc.ca/wrollin/stats/ssize/

Sealedenvelope.com is a user-friendly website, which can be accessed on a computer or a smartphone to calculate sample size online. Fig. 2 shows a snapshot of sample size calculation using sealedenvelope.com. In this example, sample size has been calculated to detect a decrease in primary outcome from 30% in the control group to 20% in the experimental group for 80% power with 5% level of significance. This shows that a sample size of 291 subjects in each group is required for this study.

Fig. 2 – Sample size calculation using sealedenvelope.com.

There are many statistical software packages which can be downloaded to a computer for sample size estimation and other statistical calculations. One commonly used package is STATA. Step by step sample size calculation using STATA for 2 population proportions with alpha 0.05 and power 0.80 is displayed in Fig. 3. In the first window under 'Main', tick the type of populations based on the study design, and under 'Option' tick 'Compute sample size' and enter the confidence level and power of study. On clicking submit, a separate window opens up, as in the third frame, indicating the sample size. For a reduction in outcome from 30% to 20%, a sample size of 313 subjects is required in each group.

Fig. 3 – Sample size calculation using STATA.

6. Summary

Sample size estimation is an essential step in designing and conducting medical research. It ensures recruitment of an adequate number of subjects in the study. An inadvertently small sample size will not give an accurate assessment of the outcome of interest and is subject to high type II error. A larger sample size than required is a waste of resources and would be unethical. Sample size for various study designs can be calculated manually using statistical formulae, or with the help of statistical software or online calculators.

Conflicts of interest

All authors have none to declare.

references

1. Singh A, Soni A, Saluja S. Essentials of sample size calculation. GRJ. 2013;3:97–99.
2. Lawanga SK, Lemeshow S. Adequacy of Sample Size in Health Studies. World Health Organization; 1990.
3. Morley CJ, Davis PG, Doyle LW, Brion LP, Hascoet JM, Carlin JB, COIN Trial Investigators. Nasal CPAP or intubation at birth for very preterm infants. N Engl J Med. 2008;358:700–708.
4. Zoremba M, Kalmus G, Begemann D, et al. Short term non-invasive ventilation post-surgery improves arterial blood-gases in obese subjects compared to supplemental oxygen delivery – a randomized controlled trial. BMC Anesthesiol. 2011;11:10.
