Sample Size

Sample size calculation
Presented by: Dr Aamir Omair
Dr. Emad Masuadi, BSc, MSc, MPhil, Ph.D.

Assistant Professor of Biostatistics
Research Unit, College of Medicine,
King Saud bin Abdulaziz University for Health Sciences
Objectives
At the end of this session participants should be able to:
1. Explain the importance of sample size calculation in research
2. Identify the factors that determine the sample size
3. Identify the relevant software / websites for calculating the required sample size
Emad Masuadi
Importance of sample size calculation
• To ensure that estimates are obtained with the required accuracy / precision (margin of error)
• The size of the sample influences both the representativeness of the sample and the statistical analysis of the data
• Smaller samples are less likely to be representative
• Larger samples are more likely to detect a small difference between groups
Large sample size effect
[Figure: scatter plot of y (0% to 100%) against x (0 to 200)]
r is the correlation coefficient: r = -0.07, p-value = 0.04, n = 770
R² = 0.0054: only 0.5% of the variation in y is explained by x
With a large sample, even a negligible correlation can be statistically significant.
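To see how a correlation this weak can reach p = 0.04, the usual t statistic for testing a Pearson correlation, t = r√(n − 2)/√(1 − r²), can be computed from r and n alone. This is a minimal sketch that, as a simplifying assumption, approximates the t distribution by the standard normal (close for large n); the function name is illustrative, and r is recovered from the unrounded R² on the slide.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0.

    t = r * sqrt(n - 2) / sqrt(1 - r^2); the t distribution with n - 2 df
    is approximated by N(0, 1) (an assumption, accurate for large n).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Two-sided normal tail: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))

r_squared = 0.00544374403386072   # R² reported on the slide
r = -math.sqrt(r_squared)          # negative correlation, r ≈ -0.074

print(f"variance explained: {100 * r_squared:.2f}%")
print(f"n = 770: p ≈ {correlation_p_value(r, 770):.3f}")
print(f"n = 50:  p ≈ {correlation_p_value(r, 50):.3f}")
```

With the same weak correlation, n = 770 gives p ≈ 0.04 while n = 50 gives p far above 0.05: the significance reflects the sample size, not the strength of the association.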
Importance of sample size calculation
Inappropriate sample size can lead to:
• Wrong conclusions
• Poor quality research (Errors)
• Waste of resources (time, money and effort)
• Delay in meeting deadlines
What is Statistics?
• Statistics is a science that deals with collecting and
analyzing data, drawing conclusions, and making
decisions.
• There are two main areas of Statistics:
– Descriptive statistics:
provides tabular and graphical techniques and
numerical measures for describing data.
– Inferential statistics:
provides procedures for analyzing data and making decisions, using a sample (statistic) to draw inferences about the population (parameter).
Inferential Statistics
• Estimation: the process of approximating the population
characteristic by a value or range of values e.g.
– Estimate the population mean weight using the sample
mean weight
– Estimate the proportion of obese persons in the
population
• Hypothesis testing: the process of assessing the evidence
against a claim about the population characteristics e.g.
– Determine if the mean weight of males is greater than that
of females
– Determine if obese persons are more likely to be diabetic
as compared to normal weight persons
Factors that influence the sample size
Research question / objectives → variables in the research question and their type → statistical procedure
• Hypothesis testing
– Categorical outcome: significance level (5%), power of the test (80%), prevalence difference
– Numerical outcome: significance level (5%), power of the test (80%), mean (difference) and SD
• Estimation
– Categorical outcome: confidence level (95%), margin of error, prevalence
– Numerical outcome: confidence level (95%), margin of error, SD
Point and Interval Estimates
• A point estimate is a single number.
• A confidence interval provides additional information about variability.
[Figure: a number line showing the Lower Confidence Limit (LCL), the Point Estimate, and the Upper Confidence Limit (UCL); the distance from LCL to UCL is the width of the confidence interval]
Confidence Interval Estimate
• An interval gives a range of values:
– Takes into consideration variation in sample
statistics from sample to sample
– Based on observations from a single sample
– Gives information about closeness to the unknown population parameter
– Stated in terms of a level of confidence
• Never 100% sure
• The general formula for confidence intervals is:
$\text{Point Estimate} \pm (\text{Critical Value})(\text{Standard Error})$
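The general formula can be made concrete for a proportion, where the point estimate is p̂ = x/n and the standard error is √(p̂(1 − p̂)/n). This is a minimal sketch of the simple (Wald) interval; the counts used (40 obese out of 200 sampled adults) are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Wald interval: point estimate ± (critical value)(standard error)."""
    p_hat = successes / n                                  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # critical value z_{α/2}
    se = math.sqrt(p_hat * (1 - p_hat) / n)               # standard error
    margin = z * se                                        # margin of error E
    return p_hat - margin, p_hat, p_hat + margin

# Hypothetical data: 40 of 200 sampled adults are obese
lcl, point, ucl = proportion_ci(40, 200)
print(f"{point:.3f} (95% CI {lcl:.3f} to {ucl:.3f})")
```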

Margin of Error (E)
• Margin of Error (E): the amount added and subtracted
to the point estimate to form the confidence interval
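• Confidence interval for the proportion:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, so $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Confidence interval for the mean:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, so $E = t_{\alpha/2}\frac{s}{\sqrt{n}}$
• Factors that affect the margin of error:
– Data variation, σ (or s): E increases as σ increases
– Sample size, n: E decreases as n increases
– Level of confidence, 1 − α: E increases as 1 − α increases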
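The effects of sample size, confidence level and data variation on E can be checked numerically. A minimal sketch using the margin of error for a proportion, E = z_{α/2}√(p(1 − p)/n), where p(1 − p) plays the role of the data variation (it is the variance of a single yes/no observation); all input values are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """E = z_{α/2} * sqrt(p(1 - p) / n) for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Larger sample -> smaller E (E shrinks with sqrt(n): 4x the sample halves E)
for n in (100, 400, 1600):
    print("n =", n, "E =", round(margin_of_error(0.5, n), 4))

# Higher confidence level -> larger E
for conf in (0.90, 0.95, 0.99):
    print("confidence =", conf, "E =", round(margin_of_error(0.5, 400, conf), 4))

# More variation (p closer to 0.5) -> larger E
for p in (0.1, 0.3, 0.5):
    print("p =", p, "E =", round(margin_of_error(p, 400), 4))
```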
Common Levels of Confidence
• Commonly used confidence levels are 90%, 95%, and 99%
Confidence level | Confidence coefficient, 1 − α | z value, $z_{\alpha/2}$
80% | .80 | 1.28
90% | .90 | 1.645
95% | .95 | 1.96
98% | .98 | 2.33
99% | .99 | 2.576
99.8% | .998 | 3.09
99.9% | .999 | 3.29
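The z values for any confidence level can be computed directly as the (1 − α/2) quantile of the standard normal distribution. A minimal sketch using Python's standard-library `statistics.NormalDist`; the function name is illustrative. To two decimals, the exact values for 99.8% and 99.9% are 3.09 and 3.29.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{α/2}: the (1 - α/2) quantile of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Columns: confidence level, confidence coefficient, z value
for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {level:.3f}  {z_critical(level):.3f}")
```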
Estimation vs Hypothesis testing
• What is the prevalence of obesity among adult females in Saudi Arabia?
– Confidence interval (one group), categorical variable
• Is there a difference in the prevalence of obesity between males and females in Saudi Arabia?
– Hypothesis testing (two groups), categorical variable
• What is the mean BMI of adult females in Saudi Arabia?
– Confidence interval (one group), numerical variable
• Is there a difference in the mean BMI between males and females in Saudi Arabia?
– Hypothesis testing (two groups), numerical variable
Estimation / Confidence intervals
• When the primary outcome is categorical
Parameter | Value
Confidence level | 95%
Prevalence | From the literature review, a pilot study, or 50%
Margin of error (precision; half of the interval width) | Usually 5% or 10%
N (sample size) | ?
• When the primary outcome is numerical
Parameter | Value
Confidence level | 95%
Standard deviation | From the literature review, a pilot study, or (range / 4)
Margin of error (precision; half of the interval width) | Set by the investigator, in the units of the outcome
N (sample size) | ?
• Categorical variables:
– Expected Prevalence / Proportion
should be known
– If Not Known …
• Take 0.5 or 50% as the expected value
• This maximizes the variance, p(1 − p)
• So the resulting sample size is the largest (most conservative) one needed for the chosen margin of error
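A quick numerical check of why 50% is the safe default: the variance term p(1 − p) in the sample size formula is largest at p = 0.5, so any other true prevalence would need fewer subjects. A minimal sketch over a grid of p values.

```python
# p(1 - p) is the variance of a single yes/no observation
for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"p = {p:.1f}  p(1-p) = {p * (1 - p):.2f}")

# Search a fine grid: the maximum of p(1 - p) occurs at p = 0.5
best = max((i / 100 for i in range(101)), key=lambda p: p * (1 - p))
print(best)  # 0.5
```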
Sample size calculation (confidence interval)
• Categorical variables (Single sample):
$n = \frac{(z_{\alpha/2})^2\, p\, q}{E^2}$
• $z_{\alpha/2}$ = 1.96 for a two-sided 95% confidence level
• p is the expected population prevalence
• q = 1 − p
• E (margin of error) is half of the interval width
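The formula translates directly into code. A minimal sketch; the function name is illustrative, and the result is rounded up because a fractional subject is not possible and rounding down would give a margin of error slightly wider than E.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """n = z_{α/2}^2 * p * q / E^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 1.96 for 95%
    q = 1 - p
    return math.ceil(z ** 2 * p * q / margin ** 2)

# Unknown prevalence (p = 0.5), 95% confidence
print(sample_size_proportion(0.5, 0.05))  # 385
print(sample_size_proportion(0.5, 0.10))  # 97
```

Halving the margin of error roughly quadruples the required sample size, since E enters the formula squared.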
• Numerical variables:
– The standard deviation (SD) or the expected variance of the population should be known
– If Not Known …
• Estimate the maximum possible range for the outcome variable, e.g. 110 to 190 mmHg for systolic blood pressure
• So the range is 190 − 110 = 80
• Divide the ‘maximum range’ by 4 to get an estimated SD, e.g. 80 / 4 = 20 (for a roughly normal outcome, about 95% of values lie within ±2 SD of the mean, so the range covers about 4 SD)
[Figure: Normal distribution curve]
Sample size calculation (confidence interval)
• Numerical variables (Single sample):
$n = \left(\frac{t_{\alpha/2}\,\sigma}{E}\right)^2$
• $t_{\alpha/2}$ = critical value for a two-sided 95% confidence level (close to 1.96 for large samples)
• σ is the expected standard deviation
• E (margin of error) is half of the interval width
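A minimal sketch of the numerical case, including the range / 4 rule for guessing σ. As a simplifying assumption it uses the normal critical value z_{α/2} in place of t_{α/2}, because t depends on n itself; this slightly understates n for small samples. The margin of error of ±5 mmHg is an illustrative choice, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def sd_from_range(minimum: float, maximum: float) -> float:
    """Rough SD guess when no estimate is available: range / 4."""
    return (maximum - minimum) / 4

def sample_size_mean(sd: float, margin: float, confidence: float = 0.95) -> int:
    """n = (z_{α/2} * sigma / E)^2, rounded up.

    Assumption: z_{α/2} replaces t_{α/2}, a large-sample approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)

sigma = sd_from_range(110, 190)               # systolic BP example: 80 / 4
print(sigma)                                   # 20.0
print(sample_size_mean(sigma, margin=5))      # 62 (E = ±5 mmHg, illustrative)
```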
Hypothesis testing
• When the primary outcome is categorical
Parameter | Value
Significance level | 0.05
Prevalence difference | From the literature review or a pilot study
Power of the test | Usually 80%
N (sample size) | ?
• When the primary outcome is numerical
Parameter | Value
Significance level | 0.05
Mean difference and standard deviation | From the literature review or a pilot study
Power of the test | Usually 80%
N (sample size) | ?
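For comparing two equal-sized groups, the inputs in these tables plug into the common large-sample formulas n = 2(z_{α/2} + z_β)²σ²/Δ² per group for means, and n = (z_{α/2} + z_β)²[p₁(1 − p₁) + p₂(1 − p₂)]/(p₁ − p₂)² per group for proportions, where z_β is the normal quantile at the power 1 − β. This is a sketch of one textbook variant (unpooled variance, no continuity correction); the calculators listed in this document may use other variants and give somewhat different numbers. All inputs and function names are hypothetical.

```python
import math
from statistics import NormalDist

def _z(prob: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)

def n_per_group_means(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Two independent means, equal groups: n = 2 (z_{α/2} + z_β)^2 σ^2 / Δ^2."""
    z_sum = _z(1 - alpha / 2) + _z(power)
    return math.ceil(2 * z_sum ** 2 * sd ** 2 / delta ** 2)

def n_per_group_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Two independent proportions, equal groups (unpooled variance, no continuity correction)."""
    z_sum = _z(1 - alpha / 2) + _z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z_sum ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: detect a 10 mmHg mean difference when SD = 20 mmHg
print(n_per_group_means(delta=10, sd=20))       # 63 per group
# Hypothetical: prevalence of 30% vs 20%
print(n_per_group_proportions(0.30, 0.20))      # 291 per group
```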
Types of errors in hypothesis testing
Truth in the population
Decision | Null hypothesis is true | Null hypothesis is false
Fail to reject null hypothesis | Correct decision | Wrong decision: Type II error (β)
Reject null hypothesis | Wrong decision: Type I error (α) | Correct decision
Type I error: rejecting the null hypothesis when it is actually true
Type II error: failing to reject the null hypothesis when it is actually false
Types of errors
• The investigator chooses α before the study (common values are α = 0.05 or 0.01)
• The probability of committing a type I error (rejecting the null hypothesis when it is true) is called α (the level of significance)
• The probability of committing a type II error is called β
• Power is (1 − β)
• If β = 0.20, the power is 0.8: 20% is set as the maximum chance of missing an association if it exists (incorrectly finding no association)
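Power depends on the sample size, the size of the true difference, the variability, and α. A minimal sketch for a two-sided, two-sample z test comparing means with equal groups, using the approximation power ≈ Φ(Δ/(σ√(2/n)) − z_{α/2}), which ignores the negligible chance of rejecting in the wrong direction. The difference of 10 and SD of 20 are hypothetical values, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def power_two_means(n_per_group: int, delta: float, sd: float, alpha: float = 0.05) -> float:
    """Approximate power (1 - β) of a two-sided two-sample z test, equal groups."""
    std_norm = NormalDist()
    z_alpha = std_norm.inv_cdf(1 - alpha / 2)
    se_diff = sd * math.sqrt(2 / n_per_group)    # standard error of the difference in means
    return std_norm.cdf(delta / se_diff - z_alpha)

# Hypothetical: true difference 10, SD 20; power grows (β shrinks) as n grows
for n in (20, 40, 63, 100):
    power = power_two_means(n, delta=10, sd=20)
    print(f"n = {n:3d} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

In this setting about 63 subjects per group reach the conventional power of 0.8 (β = 0.20).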
Sample size calculators
• Websites:
– http://www.raosoft.com/samplesize.html
– http://www.stat.uiowa.edu/~rlenth/Power/
– http://www.openepi.com/SampleSize/SSPropor.htm
– http://www.powerandsamplesize.com/
• Software: PASS from http://www.ncss.com/
Thank You
Questions?