
Sample size

By Mohamad Adam Bujang


Institute for Clinical Research, Ministry of Health Malaysia
&
Sarawak General Hospital

1
Expectations

Pre-requisite knowledge
• Intermediate statistics
• Statistical inference
• Distributions
• Statistical tests

Expectations
• Know the concept of sample size calculation and estimation
• Know how to determine sample size for common statistical tests (i.e. correlation, Cronbach’s alpha, and logistic regression)

2
Scope of presentation
• Approach in statistical analysis
• Descriptive & Inferential analysis
• Sample Size for Common Statistical Tests
• Pearson’s correlation test
• Cronbach’s alpha test
• Revision: What affects sample size?
• Alpha, Power of study and effect sizes
• Sample size estimation for multivariate analysis
• MLR and Logistic regression
• Tips for determination of sample size

3
Introduction: Why do we need to calculate or estimate sample size?
• Requirement for protocol submission
• Plan for budget
• To know when to start and when to stop
• To get a significant result

4
Introduction: Approach in the statistical analysis

[Diagram: the study objectives determine the statistical analysis. Population data call for descriptive analysis; sample data call for descriptive analysis & inferential analysis.]

5
Approach in statistical analysis

Descriptive analysis
• A quantitative process that involves describing the data to show the pattern, magnitude, trend and association of the variables.

Inferential analysis
• Inferential analysis is a way of making inferences about populations based on samples.
• Involves various statistical tests to analyze sample data and draw a conclusion about the population based on the concept of probability.
• The statistical test produces evidence (a p-value), which is used as an indicator to justify that statistics (results) derived from the sample can be inferred to the parameters (true values) of the population.

6
Introduction: Relation between population & sample

[Diagram: statistics from the sample (% of adults, mean age by gender, mean monthly income) are used for inference about the corresponding parameters of the population.]

7
Introduction: Summary
• How well can statistics derived from a sample be inferred to the targeted population?
• Two factors drive the accuracy of the statistics:
a. Sampling technique – to eliminate bias in selection
b. Sample size – to get a sufficient sample size for inference
• Therefore, researchers calculate or estimate sample size because:
a. They study or analyze sample data instead of population data
b. They want to get a significant result (p < 0.05) to justify inference

8
Introduction: Summary

[Diagram: the objective determines the statistical test (e.g. independent-samples t-test, Pearson chi-square, correlation, etc.), and the statistical test determines the sample size formula.]

9
Sample Size for Common
Statistical Tests
Correlation test & Cronbach’s alpha

10
Introduction: What affects sample size?
• Alpha (α)
• Power of the study (1 − β)
• Effect size
• Researchers are bound to make errors (Type I & Type II) when drawing a conclusion.
• In sample size calculation, Type I & Type II errors are usually fixed at 5% (α) and 20% (β) respectively.
[Figure: decision table for "fail to reject null" versus "reject null".]
11
Introduction: Effect size?
• Measures the magnitude of differences, association, correlation etc., depending on the statistical test.

• Cohen J. A power primer. Psychological Bulletin. 1992;112(1):155–159. doi: 10.1037/0033-2909.112.1.155.
12
Note:
A power of 80.0% corresponds to a type II error (β) of 20.0%.

Power of study = 1 – β = 1 – 0.2 = 0.8

Note:
An alpha (type I error) of 0.05 (5%) corresponds to a 95% confidence interval.

Confidence interval = 1 – α = 1 – 0.05 = 0.95

13
Relationship between effect size & sample size
• If the real effect size (ES) is large, ideally a large sample size is not needed to show that the large ES exists.
• In sample size calculation, the researcher needs to determine the ES of the study.
[Diagram: effect size (ES) versus sample size (n).]

14
Relation between effect size & sample size
• Set a low effect size: you may need a large sample size to prove that the result is statistically significant.
• Set a high effect size: although you are required to recruit only a small sample, you may not be able to achieve the desired effect size from the sample.
• Thus, researchers will normally set the smallest effect size that they can tolerate to convince themselves and the audience that the finding is clinically or scientifically significant (i.e. there is a difference, there is an association, there is a correlation, etc.).

15
Correlation test – Pearson’s Correlation Test
• A measure of the direction and strength of the linear correlation between two numerical variables.

16
Inferential analysis:

Conventional approach
• Step 1: State the null and alternative hypotheses.
• Step 2: Select an appropriate inferential statistical test.
• Step 3: Select the level of significance.
• Step 4: Determine the rejection region.
• Step 5: Perform the test.
• Step 6: Make a conclusion.

Common approach
• Step 1: Understand the objective
• Step 2: Determine the appropriate statistical test to answer the objective
• Step 3: Conduct statistical analysis
• Step 4: Interpretation
• Step 5: Make a conclusion
17
Inferential analysis: Scenario
• This study aims to determine the correlation between age and knowledge score (Step 1: Understand the objective). Since both variables are numerical and a parametric test is assumed, a Pearson’s correlation test was conducted (Step 2 & Step 3: Determine the appropriate statistical test to answer the objective & Conduct statistical analysis). The result was statistically significant (p=0.013) with a correlation coefficient of 0.30 (Step 4: Interpretation). Therefore, a correlation between age and knowledge score exists, but with low magnitude (Step 5: Make a conclusion).

18
Inferential analysis: Scenario (sample size)
• This study aims to determine the correlation between age and knowledge score. Both variables are observed in numerical form. What is the appropriate sample size for this study to detect a low or a moderate correlation between these two variables?
• Let alpha be fixed at 0.05 (5%) ≈ 95% confidence interval
• Let the power of the study be fixed at 80.0%
• Effect size? Say we estimate that the minimum value of the correlation coefficient is 0.30.

19
What information do we need for sample size
calculation?
• Type I error = alpha = 0.05
• Power of study = 1 – β = 80.0%
• Effect size for Pearson’s correlation test = r = correlation coefficient

• Using PASS software, the calculation requires us to determine:
a. The correlation coefficient (r) in the null hypothesis
b. The correlation coefficient (r) in the alternative hypothesis

20
Sample size calculation using PASS software:
Pearson’s correlation test


22
Sample size calculation: Pearson’s correlation test

ρ0    ρ1    n
0.0   0.1   782
0.0   0.2   193
0.0   0.3   84
0.0   0.4   46
0.0   0.5   29
0.0   0.6   19
0.0   0.7   13
0.0   0.8   9
0.0   0.9   6

Note:
• ρ0 is the value of the population correlation under the null hypothesis.
• ρ1 is the value of the population correlation under the alternative hypothesis.
• Sample size was calculated based on the formula by Guenther (1977), with alpha less than 0.05 and a minimum power of 80.0%.

• Bujang MA, Nurakmal B. Sample size guideline for correlation analysis. World Journal of Social Science Research. 2016;3(1):37–46. doi: 10.22158/wjssr.v3n1p37.
• Citations: 137 from 2016 until 8th Feb 2022
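The table can be roughly cross-checked in code. The sketch below is not the Guenther (1977) formula or the PASS routine; it assumes the common Fisher z-transformation approximation for a two-sided test, and the function name `pearson_sample_size` is hypothetical. Its results land within about one participant of the tabulated values, so treat it as a sanity check rather than a replacement for the table.

```python
import math
from statistics import NormalDist


def pearson_sample_size(r1, r0=0.0, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r1 against r0.

    Assumption of this sketch: two-sided test and the Fisher
    z-transformation approximation, not the exact method behind the table.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    # Effect size expressed on the Fisher z scale
    effect = abs(math.atanh(r1) - math.atanh(r0))
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


if __name__ == "__main__":
    for r1 in (0.1, 0.3, 0.5, 0.7):
        print(f"rho1 = {r1}: n is approximately {pearson_sample_size(r1)}")
```

The loop also illustrates the earlier point about effect size: the larger the correlation to be detected, the smaller the required sample.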
23
Sample size statement: Pearson’s correlation test
• This study aims to determine the magnitude of the correlation between age and knowledge score. The sample size calculation is based on the formula for Pearson’s correlation test. The minimum correlation coefficient to be detected in the study is 0.30. It is assumed that the correlation in the null hypothesis and in the alternative hypothesis is equal to zero and at least 0.3 respectively. Hence, the minimum required sample size is 84, based on an alpha of 0.05 and a minimum power of 80%. After allowing for a 20.0% dropout rate, this study needs to recruit 105 participants.

84 / 0.8 = 105
24
Statistical test: Cronbach’s alpha
• Aim: To determine the strength/magnitude of internal consistency or stability of a domain (a latent variable measured by a group of variables).
• Commonly used to measure the internal consistency of domains in questionnaire development and questionnaire validation studies.
• The statistic or coefficient usually ranges between zero and one.
• When variables a, b, and c have high internal consistency, it means that variables a, b, and c are:
- measuring the same thing
- consistent with each other
but a, b, and c are not necessarily a valid measure of a particular construct.
25
Example of Latent Variables
“Depression” is a latent variable. This parameter (depression) cannot be measured directly but instead requires other measures to explain it.

[Diagram: Psychological Symptoms, made up of three domains.]
• Depression (Items 3, 5, 10, 13, 16, 17 and 21)
• Stress (Items 1, 6, 8, 11, 12, 14 and 18)
• Anxiety (Items 2, 4, 7, 9, 15, 19 and 20)
26
Example: Job Satisfaction Questionnaire (JS-Q)
• TW1, TW2, TW3, TW4 and TW5 report excellent internal consistency, with a Cronbach’s alpha coefficient of 0.924. This group of items is suitable to represent a domain (in this case, Teamwork, TW).

• Study by Ahmad, N.F.D.; Ren Jye, A.K.; Zulkifli, Z.; Bujang, M.A. The development and validation of job satisfaction questionnaire for health workforce. Malays. J. Med. Sci. 2020, 27, 128–143.
27
Acceptable Cronbach’s alpha values
Guideline:
i. In general:
Cronbach’s alpha of more than 0.5 is acceptable
ii. Questionnaire development studies:
Aim for Cronbach’s alpha > 0.7
iii. Questionnaire validation studies:
Aim for Cronbach’s alpha > 0.5
iv. A domain needs further revision if it reports Cronbach’s alpha < 0.5
v. A domain does not exist when Cronbach’s alpha < 0.1

28
Inferential analysis: Scenario
• This study aims to determine the internal consistency of the four main domains of Questionnaire Z (Step 1: Understand the objective). All domains have 5 items each and thus, a Cronbach’s alpha test was conducted (Step 2 & Step 3: Determine the appropriate statistical test to answer the objective & Conduct statistical analysis). The results show that all four domains report a Cronbach’s alpha of more than 0.5 (Step 4: Interpretation). Therefore, the internal consistency of the Questionnaire Z domains is acceptable (Step 5: Make a conclusion).

29
What information do we need for sample size
calculation?
• Type I error = alpha = 0.05
• Power of study = 1 – β = 80.0%
• Effect size for Cronbach’s alpha is determined by the difference of
Cronbach’s alpha values in the hypothesis testing and the number of
items (or raters)

• Using PASS software, the calculation requires us to determine:
a. The coefficient of Cronbach’s alpha in the null hypothesis
b. The coefficient of Cronbach’s alpha in the alternative hypothesis
c. Number of items
30
Inferential analysis: Scenario (sample size)
• This study aims to determine the internal consistency of the Questionnaire Z domains. There are four main domains and each domain has 5 items. It was noted that the acceptable Cronbach’s alpha is more than 0.5. What is the appropriate sample size for this study to determine acceptable internal consistency?
• Let alpha be fixed at 0.05 (5%) ≈ 95% confidence interval
• Let the power of the study be fixed at 80.0%
• Effect size? Determined by:
i. Alpha value in the null hypothesis
ii. Alpha value in the alternative hypothesis
iii. Number of items

31
Sample size calculation using PASS software:
Cronbach’s alpha test

CA0   CA1   n
0.0   0.3   152
0.0   0.4   74
0.0   0.5   41
0.0   0.6   24
0.0   0.7   14
0.0   0.8   9
0.0   0.9   5

Note:
• CA0 is the value of the estimated Cronbach’s alpha in the null hypothesis.
• CA1 is the value of the estimated Cronbach’s alpha in the alternative hypothesis.
• Sample size was calculated based on formula by Bonnet & Douglas (2002) based on alpha less than 0.05 and minimum power of 80.0%

• Bujang MA, Omar ED, Baharum NA. A review on sample size determination for Cronbach’s alpha test: a simple guide for researchers. Malays J Med Sci. 2018;25(6):85–99. doi: 10.21315/mjms2018.25.6.9.
• Citations: 159 from 2018 until 8th Feb 2022
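The slide does not reproduce the formula behind this table, so the following is only an illustrative sketch. It assumes a widely used large-sample approximation, n = [2k/(k − 1)] (z_{α/2} + z_β)² / ln(δ)² + 2 with δ = (1 − CA0)/(1 − CA1) and k items, together with a two-sided test; the function name `cronbach_sample_size` is hypothetical. Because PASS uses its own routine, the numbers come out close to the table but not identical.

```python
import math
from statistics import NormalDist


def cronbach_sample_size(ca1, ca0=0.0, k=5, alpha=0.05, power=0.80):
    """Approximate n for testing Cronbach's alpha ca1 against ca0.

    Assumption of this sketch: large-sample approximation on the
    log(1 - alpha) scale with a two-sided test and k items per domain.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = (1 - ca0) / (1 - ca1)  # ratio that plays the role of the effect size
    n = (2 * k / (k - 1)) * (z_alpha + z_beta) ** 2 / math.log(delta) ** 2 + 2
    return math.ceil(n)


if __name__ == "__main__":
    for ca1 in (0.3, 0.5, 0.7, 0.9):
        print(f"CA1 = {ca1}, k = 5: n is approximately {cronbach_sample_size(ca1)}")
```

Changing `k` shows why the number of items has to be specified: with more items per domain, the same difference in alpha needs fewer participants.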

33
Sample size statement: Cronbach’s alpha test
• This study aims to determine the internal consistency of the domains of Questionnaire Z, which has 5 questions in each domain. The sample size calculation is based on the formula for the Cronbach’s alpha test. The minimum Cronbach’s alpha coefficient to be detected in the four domains is 0.50. It is assumed that the Cronbach’s alpha coefficient in the null hypothesis and in the alternative hypothesis is equal to zero and at least 0.5 respectively. Hence, the minimum required sample size is 41, based on an alpha of 0.05 and a minimum power of 80%. After allowing for a 20.0% dropout rate (41 / 0.8 = 51.25), this study needs to recruit 52 participants.

34
Sample Size for Multivariate
Analysis
Multiple Linear Regression (MLR) & Analysis of Covariance (ANCOVA)
Logistic Regression

35
Estimate sample size
• We usually estimate sample size for multivariate analysis.

Examples of multivariate analysis:
Logistic Regression
Factor analysis
General Linear Model (ANCOVA)
Multiple Regression
Survival analysis
Etc.

[Diagram: Factors → Outcome. Example objective: to determine to what extent the socio-demographic profile and perception among UPLB scientists are associated with the use of social media in research.]

Example of research questions:
• What are the factors associated with poor control of HbA1c?
• Could gender, education level and salary predict monthly expenses?

36
Multiple Linear Regression & General Linear Model (ANCOVA)
• A statistical technique that uses several explanatory variables (independent variables) to:
• predict the outcome of a response variable (in numerical form).
• study how the explanatory variables are associated with the response variable (in numerical form).

“Based on a cross-sectional study, a group of researchers aims to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with systolic blood pressure.”

How many participants should they recruit?

37
Multiple Linear Regression & General Linear
Model (ANCOVA)
Before sample size calculation:
• Understand the scenario
• Determine the appropriate sample size technique to answer the
objective
• MLR or General Linear Model ANCOVA!

38
Rule of thumb for Multiple Linear Regression

(1) Tabachnick, B.G. & Fidell, L.S. (2013). Using Multivariate Statistics (6th edition). Boston: Pearson Education

“N > 50 + 8m”

The sample size (N) should exceed 50 + 8 × (no. of predictors or risk factors).

Sample size statement:
“The aim of this study is to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with systolic blood pressure. According to Tabachnick & Fidell (2013), who provide a sample size guideline for Multiple Linear Regression, the sample size (N) should exceed 50 + 8 × (no. of independent variables). Since this study has 7 independent variables, it will need a minimum sample size of 50 + 8(7) = 106 (Tabachnick & Fidell, 2013).”
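The rule quoted above is simple enough to check in one line. The sketch below only encodes N > 50 + 8m as stated on the slide; the function name `mlr_rule_of_thumb` is hypothetical.

```python
def mlr_rule_of_thumb(num_predictors):
    """Return the threshold 50 + 8m that the sample size N should exceed."""
    return 50 + 8 * num_predictors


if __name__ == "__main__":
    # Seven independent variables, as in the systolic blood pressure example
    print(mlr_rule_of_thumb(7))  # prints 106
```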
39
Rule of thumb for MLR / ANCOVA
(2) Bujang MA, Sa’at N, Tg Abu Bakar Sidik TMI. Determination of minimum sample size requirement for multiple linear regression and analysis of covariance based on experimental and non-experimental studies. Epidemiology Biostatistics and Public Health. 2017;14(3):e12117–1. doi: 10.2427/1211.

• Based on validation (using various sample sizes & statistical analyses) of sample statistics against parameters, the ideal sample size is 300 subjects.

40
Rule of thumb: Multiple Linear Regression (MLR) and General Linear Model (ANCOVA) for observational study by Bujang et al. (2017)

[Figure: The relation between sample size and the difference in effect size (partial eta-squared) between parameters and statistics, with an annotation at a sample size of 300.]

41
Rule of thumb for MLR & ANCOVA for observational study by Bujang et al. (2017)

Sample size statement:

“The aim of this study is to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with systolic blood pressure. According to Bujang et al. (2017), who provide a sample size guideline for MLR and General Linear Model ANCOVA in observational studies, the sample size (n) should be at least 300 in order to derive statistics that closely approximate the parameters in the targeted population (Bujang et al., 2017).”

42
Logistic Regression
• A statistical technique that uses several explanatory variables (independent variables) to:
• predict the outcome of a response variable (in categorical form).
• study how the explanatory variables are associated with the response variable (in categorical form).

“Based on a cross-sectional study, a group of researchers aims to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with the status of systolic blood pressure (i.e. controlled & not controlled).”

How many participants should they recruit?

43
Rule of thumb for Logistic Regression based on Peduzzi et al. (1996)
• The EPV 10 rule of thumb depends on:
1. Prevalence of the outcome of interest (e.g. 30% with poor outcome)
2. Number of participants to be recruited (e.g. 300 participants).
• Based on (1) & (2), researchers can determine the number of independent variables to be tested in the final regression model.
44
Rule of thumb for Logistic Regression based on Peduzzi et al. (1996)

Columns:
a. Estimated prevalence of the least category from a binary outcome
b. Total sample size
c. Total estimated sample size for the least category from a binary outcome
d. EPV of 10 (c / 10)
e. Number of factors (IV) in the final logistic regression model
f. Sample size sufficient?

a     b     c     d     e     f
20%   100   20    2     4     No
20%   300   60    6     5     Yes
30%   100   30    3     4     No
30%   300   90    9     5     Yes
50%   100   50    5     4     Yes
50%   300   150   15    5     Yes

45
Rule of thumb for Logistic Regression based on Peduzzi et al. (1996)

Columns:
a. Number of factors planned in the logistic regression model
b. Total estimated sample size for the least category from a binary outcome (EPV 10)
c. Total sample size
d. Estimated prevalence of the least category from a binary outcome
e. Total estimated sample size for the least category from a binary outcome
f. Sample size sufficient?

a     b     c     d     e     f
4     40    100   20%   20    No
5     50    300   20%   60    Yes
4     40    100   30%   30    No
5     50    300   30%   90    Yes
4     40    100   50%   50    Yes
5     50    300   50%   150   Yes

Yes: e > b
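Both EPV tables follow the same arithmetic, which can be sketched as below. The function name `epv_check` and its return layout are hypothetical; prevalence is passed as a whole-number percentage so that the calculation stays in integers. The rows of the tables above serve as test cases.

```python
def epv_check(total_n, prevalence_pct, num_factors, epv=10):
    """Check an events-per-variable (EPV) rule of thumb.

    Returns (expected size of the least category, size required by the rule,
    maximum number of factors supported, whether the sample is sufficient).
    """
    expected = total_n * prevalence_pct // 100  # e.g. 300 * 20% = 60
    required = epv * num_factors                # e.g. 10 * 5 = 50
    max_factors = expected // epv               # e.g. 60 // 10 = 6
    return expected, required, max_factors, expected >= required


if __name__ == "__main__":
    for n, prev, factors in [(100, 20, 4), (300, 20, 5), (100, 50, 4)]:
        print(n, f"{prev}%", factors, epv_check(n, prev, factors))
```

The `epv` argument defaults to 10 but can be changed to explore stricter rules with the same function.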
46
Rule of thumb for Logistic Regression based on Peduzzi et al. (1996)
Sample size statement:
“The aim of this study is to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with poor control of HbA1c. Peduzzi et al. (1996), who provide a sample size guideline for logistic regression, suggest a minimum of 10 events per variable for the least frequent category of the outcome variable. Since this study is interested in 7 risk factors, it will need a minimum of 70 patients in the poor outcome category. This study plans to recruit at least 300 participants; since the prevalence of poor outcome is estimated at 50%, about 150 participants are expected in the poor outcome category, which exceeds the minimum required (Peduzzi et al., 1996).”

47
Cox regression
• The concept of EPV 10 was introduced for both logistic regression and Cox regression.
• Peduzzi & Concato proposed that a similar EPV 10 rule of thumb can also be used for Cox regression.

Reference:
• Peduzzi, P., Concato, J., Feinstein, A. R. and Holford, T. R. 1995. Importance of
events per independent variable in proportional hazards regression analysis: II.
Accuracy and precision of regression estimates. Journal of Clinical Epidemiology,
48: 1503–1510.

48
Criticism of EPV10 rule of thumb
The EPV 10 concept has received some criticism [van Smeden et al., 2016], and Austin and Steyerberg recommended an EPV of 20 instead of 10 [Austin & Steyerberg, 2017].

References:

• Maarten van Smeden, Joris A. H. de Groot, Karel G. M. Moons, Gary S. Collins, Douglas
G. Altman, Marinus J. C. Eijkemans and Johannes B. Reitsma. No rationale for 1 variable
per 10 events criterion for binary logistic regression analysis. BMC Medical Research
Methodology (2016) 16:163

• Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of
different strategies for estimating the out-of-sample validity of logistic regression models.
Statistical Methods in Medical Research 2017, Vol. 26(2) 796–808

49
Rule of thumb for Logistic Regression based on Bujang et al. (2018)
Based on:
• Rule of thumb of EPV 50
• A simple formula, n = 100 + 50i, where i refers to the number of independent variables in the final model.

Citations: 198 from 2018 until 8th Feb 2022
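As a quick illustration of the formula n = 100 + 50i, a minimal sketch follows; the function name `logistic_rule_of_thumb` is hypothetical, and the inputs are the numbers of independent variables used in the examples of this presentation.

```python
def logistic_rule_of_thumb(num_independent_vars):
    """Minimum sample size from n = 100 + 50i (i = variables in the final model)."""
    return 100 + 50 * num_independent_vars


if __name__ == "__main__":
    for i in (3, 4, 7):
        print(f"i = {i}: n = {logistic_rule_of_thumb(i)}")
```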

50
Rule of thumb for Logistic Regression based on Bujang et al. (2018)

Logistic regression based on enter method

Sample size statement:

“The aim of this study is to determine to what extent age, gender, ethnicity, education level, BMI, exercise and diet are associated with poor control of HbA1c. Bujang et al. (2018), who provide a sample size guideline for logistic regression, suggest using a simple formula, n = 100 + 50i, where i is the number of independent variables in the final model. Since this study is interested in 7 risk factors, it will need a minimum sample of 450 patients (Bujang et al., 2018).”

Logistic regression based on stepwise method

Sample size statement:

“The aim of this study is to determine the factors associated with poor control of HbA1c. Among 313 participants, 7 independent variables were tested and only 3 factors (age, diet and exercise) were statistically significant. Bujang et al. (2018), who provide a sample size guideline for logistic regression, suggest using a simple formula, n = 100 + 50i, where i is the number of independent variables in the final model. Since this study found 3 risk factors, it needs a minimum sample of 250 patients. Since this study recruited 313 participants, the sample size is sufficient (Bujang et al., 2018).”

51
Rule of thumb - If large data is available (>500)

As justification instead of sample size statement:

“The aim of this study is to determine the factors associated with poor control of HbA1c from a large cohort of data. A previous study found that when the sample size reaches at least 500, the statistics derived from the sample are likely to be close to the parameters of that particular population (Bujang et al., 2012). Since the data currently available in the cohort number 4371, the current sample size is sufficient to estimate the parameters in the population.”

53
Summary: Why do we need to calculate or estimate sample size?
• Requirement for protocol submission (requires a sample size statement)
• Plan for budget
• To know when to start and when to stop
• To get a significant result

To convince the audience that the research hypothesis constructed by the researchers is supported, so that a proper conclusion can be made

54
Tips for sample size
calculation or estimation

55
To understand the objective of a study
Understand:
1. the subject matter
2. the scenario
3. the significance of the study objective

Objective should be:
1. Measurable
2. Answerable using statistical analysis

56
To select the appropriate statistical analysis
• Need to be familiar with various statistical tests
• If a similar study has been conducted by others, read the methods section of the paper to identify the statistical test that was used

57
To calculate or estimate the minimum sample
size required by the study
• Calculation
• Manual
• Software

• Sample size tables from published articles


• Rule of thumb
• Multivariate analysis
• Regression models
• Exploratory Factor Analysis

58
Sample
size papers

59
To provide an additional allowance during subject recruitment to cater for a certain proportion of non-response

Causes
• Missing participants’ responses
• Spoilt or broken samples
• Missing values

How much?
• Usually 20% to 30%.
• If the researcher is expecting a high non-response rate in a self-administered survey, then he/she should provide an allowance for it by adding more than 30%, such as 40% to 50%.

Purpose
• To ensure the minimum sample size is achieved

Calculation
• Minimum sample size required: 150
• Add a non-response rate of 20%
• 150 / 0.8 = 187.5, rounded up to 188
• Say we add a non-response rate of 30%
• 150 / 0.7 = 214.3, rounded up to 215
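The non-response adjustment on this slide divides the minimum sample size by the expected response proportion and rounds up. A minimal sketch follows; the function name `inflate_for_non_response` is hypothetical, and the rate is passed as a whole-number percentage so that the rounding is done with exact integer arithmetic.

```python
def inflate_for_non_response(min_n, non_response_pct):
    """Number to recruit so that min_n remain after the given non-response rate.

    Equivalent to ceil(min_n / (1 - rate)), computed with integers.
    """
    responding_pct = 100 - non_response_pct
    # Ceiling division written with floor division on negated values
    return -(-min_n * 100 // responding_pct)


if __name__ == "__main__":
    print(inflate_for_non_response(150, 20))  # prints 188
    print(inflate_for_non_response(150, 30))  # prints 215
```

The same call reproduces the other allowances in this presentation, for example 84 participants at 20% gives 105 and 300 patients at 20% gives 375.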

60
To write a sample size statement
All the elements from Step 1 until Step 4
• determine the study objective,
• determine the appropriate statistical analysis,
• sample size estimation/calculation
• add non-response rate
should be fully stated in the sample size statement.

• Say a study aims to determine the association of factors with optimal HbA1c level, as determined by its cut-off point of < 6.5%, among patients with type 2 diabetes mellitus (T2DM). A previous study had already identified several significant factors, which were included as three to four variables in the final model, consisting of parameters selected from the demographic profile of patients and clinical parameters (cite the appropriate reference). How many T2DM patients should the study recruit in order to answer the study objective?

61
To write a sample size statement
• Step 1: To Understand the Objective of Study
• The study aims to determine a set of independent variables that show a significant association with optimal HbA1c level (as determined by its cut-off point of < 6.5%) among T2DM patients.
• Step 2: To Decide the Appropriate Statistical Analysis
• In this example, the outcome variable is in categorical and binary form, i.e. HbA1c level of < 6.5% versus ≥ 6.5%. On the other hand, there are about 3 to 4 independent variables, which can be expressed in both categorical and numerical form. Therefore, the appropriate statistical analysis is logistic regression.
62
To write a sample size statement
• Step 3: To Estimate or Calculate the Sample Size Required
• Since this study will require a multivariate regression analysis, thus it
is recommended to estimate sample size based on the general rule of
thumb. There are several general rules of thumb available for
estimating the sample size for multivariate logistic regression. Two
approaches are introduced here, namely: i) sample size estimation
based on concept of event per variable (EPV) and ii) sample size
estimation based on a simple formula.

63
To write a sample size statement
i) Sample size estimation based on the concept of EPV 50
• For EPV 50, the researcher will need to know the prevalence of the ‘good’ outcome category and the number of subjects in the ‘good’ outcome category to fit the rule of EPV 50.
• Say the prevalence of the ‘good’ outcome category is reported at 70% (cite the appropriate reference).
• Then, with a total of four independent variables, the minimum sample size required in the ‘good’ outcome category will be at least 200 subjects in order to fulfil the condition for EPV 50 (i.e. 200/4 = 50).
• On the other hand, by estimating the prevalence of the ‘good’ outcome at 70.0%, this study will therefore need to recruit at least 290 subjects in order to ensure that a minimum of 200 subjects will be obtained in the ‘good’ outcome category (70/100 x 290 = 203, and 203 > 200).

ii) Sample size estimation based on the formula n = 100 + 50i (where i represents the number of independent variables in the final model)
• When using this formula, the researcher will first need to set the total number of independent variables in the final model. As stated in the example, the total number of independent variables was estimated to be about three to four (cite the appropriate reference). Then, with a total of four independent variables, the minimum required sample size will be 300 patients (i.e. 100 + 50 (4) = 300).

64
To write a sample size statement
• Step 4: To Provide Additional Allowance for a Certain Proportion of Non-Response Rate
• In order to allow for a rough estimate of a 20.0% non-response rate, the allowance is applied to the total number of subjects to be recruited. The minimum sample size requirement is therefore 363 patients (i.e. 290/0.8 = 362.5, rounded up) when the sample size is estimated based on EPV 50, and 375 patients (i.e. 300/0.8) when it is estimated based on the formula n = 100 + 50i.

65
To write a sample size statement
• Step 5: To Write a Sample Size Statement
• Two approaches to estimating the sample size for logistic regression were introduced in Step 3. Say the researcher chooses to apply the formula n = 100 + 50i. The sample size statement will then be written as follows:
• “The main objective of this study is to determine the association of factors with optimal HbA1c level as determined by its cut-off point of < 6.5% among patients with type 2 diabetes mellitus (T2DM). The sample size estimation is derived from the general rule of thumb for logistic regression proposed by Bujang et al. (2018), which established a simple guideline for sample size determination in logistic regression. Bujang et al. (2018) suggested calculating the sample size based on the formula n = 100 + 50i. The estimated total number of independent variables was about three to four (cite the appropriate reference). Thus, with a total of four independent variables, the minimum required sample size will be 300 patients (i.e. 100 + 50 (4) = 300). By providing an additional allowance to cater for a possible dropout rate of 20%, this study will therefore need a sample size of at least 300/0.8 = 375 patients.”

66
A checklist to ease sample size calculation or estimation

67

68
End

Thank you

69
