
SAMPLE SIZE DETERMINATION
BY
DR ZUBAIR K.O.
DEPT OF MEDICAL MICROBIOLOGY.NHA
MBBS(IL),SR II

OUTLINE
• Our take home…………….
• What is sample size?
• What is sample size determination?
• How large a sample do I need?
• What are the methods of determining it?
• What are the factors that affect it?
• Mind my language
• How do you determine it?
• How do you use it?
• A final word………………..

OUR TAKE HOME
At the end of this presentation, we should be able to:

 Understand the significance of sample size.
 Determine sample size.
 Understand the factors that may affect sample size.
 Use sample size in our research or study.

WHAT IS SAMPLE SIZE?
 Sample size is the number of units in the sub-population to be studied in order to make an
inference about a reference population (the broader population to
which the findings from a study are to be generalized).
 In a census, the sample size is equal to the population size.
However, in research, because of time and budget constraints, a
representative sample is normally used.
 The larger the sample size, the more accurate the findings from a
study.

 The availability of resources sets the upper limit of the sample
size, while the required accuracy sets the lower limit.
 Therefore, an optimum sample size is an essential
component of any research.

WHAT IS SAMPLE SIZE DETERMINATION?
 Sample size determination is the mathematical estimation of
the number of subjects/units to be included in a study.
 When a representative sample is taken from a population,
the findings can be generalized to the population.
 Optimum sample size determination is required for the
following reasons:
1. To allow for appropriate analysis
2. To provide the desired level of accuracy
3. To ensure the validity of significance tests.

HOW LARGE A SAMPLE DO I NEED?
 If the sample is too small:
1. Even a well-conducted study may fail to answer its research
question
2. It may fail to detect important effects or associations
3. It may estimate these effects or associations imprecisely

CONVERSELY
 If the sample size is too large:
1. The study will be difficult and costly
2. It may take too long to complete
3. There may not be enough available cases, e.g. for a rare disease
4. Accuracy may be lost

Hence, the optimum sample size must be determined before the
commencement of a study.

MIND MY LANGUAGE
 Random error
 Systematic error (bias)
 Precision (reliability)
 Accuracy (validity)
 Null hypothesis
 Alternative hypothesis
 Type I (α) error
 Type II (β) error
 Power (1 − β)
 Effect size
 Design effect

 Random error: error that occurs by chance. Sources are sample
variability, subject-to-subject differences and measurement errors.
It can be reduced by averaging, increasing the sample size or repeating the
experiment.
 Systematic error: deviations not due to chance alone. Several
factors, e.g. patient selection criteria, may contribute. It can be
reduced by good study design and careful conduct of the experiment.
 Precision: the degree to which a variable has the same value
when measured several times. It is a function of random error.
 Accuracy: the degree to which a variable actually represents the
true value. It is a function of systematic error.

 Null hypothesis: It states that there is no difference among
groups or no association between the predictor and the
outcome variable. This is the hypothesis that is tested.

 Alternative hypothesis: It contradicts the null hypothesis.
The alternative hypothesis cannot be tested directly; it is
accepted by exclusion if the test of significance rejects the
null hypothesis. There are two types: one-tailed (one-sided) and
two-tailed (two-sided).

 Type I (α) error: It occurs if an investigator rejects a null
hypothesis that is actually true in the population. The
probability of making an α error is called the level of
significance and is conventionally set at 0.05 (5%). It is specified as Zα
in sample size computation. Zα is the value from the standard
normal distribution corresponding to α. The required sample size is inversely
related to the type I error: the smaller the α, the larger the sample needed.
 Type II (β) error: It occurs if the investigator fails to reject a
null hypothesis that is actually false in the population. It is
specified in terms of Zβ in sample size computation. Zβ is the
value from the standard normal distribution corresponding to β.
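
As a rough illustration of how Zα and Zβ are read off the standard normal distribution, the sketch below uses Python's standard-library `statistics.NormalDist`. The function names and the 80% power value are assumptions chosen for illustration; they do not come from the slides.

```python
from statistics import NormalDist

def z_alpha(alpha: float, two_sided: bool = True) -> float:
    """Standard normal deviate for a given type I error rate (alpha)."""
    # For a two-sided test, alpha is split equally between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

def z_beta(power: float) -> float:
    """Standard normal deviate for a given power (1 - beta)."""
    return NormalDist().inv_cdf(power)

print(round(z_alpha(0.05), 2))  # two-sided alpha = 0.05
print(round(z_beta(0.80), 2))   # power = 80% (assumed example value)
```

For a two-sided α of 0.05 this gives approximately 1.96, the value used in the formulae later in this presentation.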

 Power (1 − β): This is the probability that the test will correctly
identify a significant difference, effect or association in the
sample should one exist in the population. Power increases with
sample size: the larger the sample size, the greater the power of
the study to detect a significant difference, effect or association.

 Effect size: a measure of the strength of the relationship between
two variables in a population. It is the magnitude of the effect
under the alternative hypothesis. The bigger the size of the effect
in the population, the easier it will be to find.

 Design effect: Geographic clustering is generally used to
make the study easier and cheaper to perform.
The effect on the sample size depends on the number of
clusters and the variance between and within clusters.
In practice, this is determined from previous studies and
is expressed as a constant called the ‘design effect’, often
between 1.0 and 2.0. The sample size for a simple random
sample is multiplied by the design effect to obtain the
sample size for the cluster sample.
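
The multiplication described above can be sketched as follows. The function name and the example values (a simple-random-sample size of 384 and a design effect of 1.5) are hypothetical, picked only to show the arithmetic.

```python
import math

def cluster_sample_size(n_srs: float, design_effect: float) -> int:
    """Inflate a simple-random-sample size by the design effect."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    # Round up, since a fraction of a subject cannot be recruited.
    return math.ceil(n_srs * design_effect)

# Hypothetical example: n = 384 under simple random sampling, design effect 1.5
print(cluster_sample_size(384, 1.5))
```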

 Odds ratio (OR) is a measure of effect size describing the
strength of association or non-independence between two
binary variables.

 Relative risk (RR) is the risk of an event (or of developing
a disease) relative to exposure. It is the ratio of the
probability of the event occurring in the exposed group
to that in the non-exposed group.
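
A minimal sketch of both measures, computed from a 2×2 table. The cell labels (a, b, c, d) and the counts are hypothetical and serve only to show how the two ratios differ.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a = exposed with event,     b = exposed without event
    c = non-exposed with event, d = non-exposed without event
    """
    return (a * d) / (b * c)

def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed group divided by risk in the non-exposed group."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical counts: 20/100 exposed and 10/100 non-exposed develop the disease
print(odds_ratio(20, 80, 10, 90))
print(relative_risk(20, 80, 10, 90))
```

With these counts the odds ratio (2.25) is somewhat larger than the relative risk (2.0); the two are close only when the event is rare.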

POWER ANALYSIS
 When the estimated sample size cannot be included in a
study, post-hoc power analysis should be carried out.
 The probability of correctly rejecting the null hypothesis is
equal to 1 − β, which is called power. The power of a test
is its probability of finding what we are looking for, given its size.
 Post-hoc power analysis is done after a study has been
carried out to help explain the results of a study that did
not find any significant effects.
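
As an illustration of a post-hoc power calculation, the sketch below uses the usual normal approximation for comparing two proportions, power ≈ Φ(d/SE − Zα/2) with SE = √(2pq/n). This approximation, the function name and the example values are assumptions for illustration only and are not taken from the slides; dedicated software should be used for a real analysis.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_proportions(n_per_group: int, p: float, d: float,
                                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test to detect a difference d
    between two proportions (common proportion p, n subjects per group)."""
    q = 1 - p
    se = sqrt(2 * p * q / n_per_group)            # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, e.g. about 1.96
    # The small contribution of the opposite tail is ignored in this approximation.
    return NormalDist().cdf(d / se - z_crit)

# Hypothetical study: 184 subjects per group, p = 0.4, true difference 0.10
print(round(approx_power_two_proportions(184, 0.4, 0.10), 2))
```

Under these assumptions the power comes out at about one half, so a study of that size would miss a true difference of 0.10 roughly half the time.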

AT WHAT STAGE CAN SAMPLE SIZE BE
ADDRESSED?

It can be addressed at two stages:

1. During the planning stage, while designing the study, by calculating
the optimum sample size required using an appropriate
approach and information on some parameters.
2. At the stage of interpreting the results, through post-hoc power
analysis.

APPROACH FOR ESTIMATING SAMPLE
SIZE/POWER ANALYSIS
 Approaches for estimating sample size and performing power
analysis depend primarily on:
1. The study design
2. The main outcome measure of the study

There are distinct approaches for calculating sample size for
different study designs and different outcome measures.

1. THE STUDY DESIGN
 There are many different approaches for calculating the sample
size for different study designs, such as case-control,
cohort, cross-sectional, clinical trial and
diagnostic test studies.
 Within each study design there can be further sub-designs, and
the sample size calculation will vary accordingly.
 Therefore, one must use the approach for computing the
sample size that is appropriate to the study design and its subtype.

2. PRIMARY OUTCOME MEASURE
 The primary outcome measure is usually reflected in the primary research
question of the study and also depends on the study design.
 For estimating risk in a case-control study, it will be the odds
ratio, while for a cohort study it will be the relative risk.
 For a case-control study, it could also be the difference in
means/proportions of exposure in cases and controls, or the
crude/adjusted odds ratio, etc.
 Hence, while calculating sample size, one of these
primary outcome measures has to be specified, because there is a
distinct approach for calculating the sample size for each.

STATISTICAL INFERENCE FROM THE STUDY RESULTS

In addition, there are different procedures for calculating
sample size for the two approaches to drawing statistical inference
from the study results, i.e.
1. Estimation (confidence interval approach)
2. Hypothesis testing (test of significance approach)
A researcher needs to select the appropriate procedure for
computing the sample size and subsequently use the corresponding
approach for drawing a statistical inference.

NB: Tests of significance: chi-squared, t-test, Z-test, F-test, P-value

ADDITIONAL PARAMETERS
Depending upon the approach chosen for calculating the sample
size, one also needs to specify some additional parameters, such
as:
 Hypothesis
 Precision
 Type I error
 Type II error
 Power
 Effect size
 Design effect

PROCEDURE FOR CALCULATING
SAMPLE SIZE.

There are four procedures that can be used for calculating sample
size:
1. Use of formulae
2. Ready-made tables
3. Nomograms
4. Computer software

USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION & POWER ANALYSIS
 There are many formulae for calculating sample size and
power in different situations for different study designs.
 The appropriate sample size for a population-based study is
determined largely by three factors:
1. The estimated prevalence of the variable of interest.
2. The desired level of confidence.
3. The acceptable margin of error.

 To calculate the minimum sample size required for accuracy in
estimating proportions, the following decisions must be taken:
1. Decide on a reasonable estimate of the key proportion (p) to be
measured in the study.
2. Decide on the degree of accuracy (d) that is desired in the study,
typically 1%-5% (0.01 to 0.05).
3. Decide on the confidence level (Z) you want to use, usually
95% (Z = 1.96).
4. Determine the size (N) of the population that the sample is
supposed to represent.
5. Decide on the minimum difference you expect to find statistically
significant.

 For a population >10,000:
$n = \frac{Z^2 pq}{d^2}$

n = desired sample size (when the population is >10,000)
Z = standard normal deviate, usually set at 1.96 (or approximately 2), which corresponds to the
95% confidence level
p = proportion in the target population estimated to have a particular
characteristic. If there is no reasonable estimate, use 50% (i.e. 0.5)
q = 1 − p (proportion in the target population not having the particular
characteristic)
d = degree of accuracy required, usually set at 0.05 (occasionally at 0.02)

 E.g. if the proportion of a target population with a certain
characteristic is 0.50, the Z statistic is 1.96 and we desire
accuracy at the 0.05 level, then the sample size is

$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16$
n ≈ 384.
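
The calculation above can be reproduced with a few lines of code. This is a minimal sketch; the function name is an assumption, and the default Z of 1.96 corresponds to the 95% confidence level used in the example.

```python
def sample_size_proportion(p: float, d: float, z: float = 1.96) -> float:
    """n = Z^2 * p * q / d^2, for a population larger than 10,000."""
    q = 1 - p
    return (z ** 2) * p * q / (d ** 2)

n = sample_size_proportion(p=0.5, d=0.05)
print(round(n, 2))  # unrounded value, about 384.16
print(round(n))     # rounded to the nearest whole subject, as in the example
```

Rounding up instead of to the nearest whole number (giving 385) is the more conservative choice when a minimum sample size is wanted.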

If the study population is <10,000:

$n_f = \frac{n}{1 + n/N}$

n_f = desired sample size when the study population is <10,000
n = desired sample size when the study population is >10,000
N = estimate of the population size

For example, if n were found to be 400 and the population size were estimated at 1,000,
then n_f would be calculated as follows:

$n_f = \frac{400}{1 + 400/1000} = \frac{400}{1.4} \approx 286$
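
The same correction, sketched in code with the values from the example (the function name is an assumption):

```python
def finite_population_sample_size(n: float, N: float) -> float:
    """n_f = n / (1 + n/N), for a study population smaller than 10,000."""
    return n / (1 + n / N)

nf = finite_population_sample_size(n=400, N=1000)
print(round(nf))  # 400 / 1.4, rounded to the nearest whole subject
```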

SAMPLE SIZE FORMULA FOR COMPARISON OF GROUPS

If we wish to test a difference (d) between two sub-samples regarding a
proportion and can assume an equal number of cases (n1 = n2 = n') in the two
sub-samples, the formula for n' is

$n' = \frac{2Z^2 pq}{d^2}$

E.g. suppose we want to compare an experimental group against a control group
with regard to women using contraception. If we expect p to be 40% (0.4) and wish
to conclude that an observed difference of 0.10 or more is significant at the
0.05 level, the sample size will be:
$n' = \frac{2(1.96)^2 (0.4)(0.6)}{(0.10)^2} \approx 184$
Thus, 184 experimental subjects and another 184 control subjects are required.
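
A minimal sketch of the two-group calculation, using the values from the contraception example (the function name is an assumption):

```python
def sample_size_two_groups(p: float, d: float, z: float = 1.96) -> float:
    """n' = 2 * Z^2 * p * q / d^2, the size of each of two equal groups."""
    q = 1 - p
    return 2 * (z ** 2) * p * q / (d ** 2)

n_each = sample_size_two_groups(p=0.4, d=0.10)
print(round(n_each))      # subjects per group
print(2 * round(n_each))  # total subjects across both groups
```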

USE OF READYMADE TABLE FOR SAMPLE
SIZE CALCULATION
 How large a sample of patients should be followed up if an
investigator wishes to estimate the incidence rate of a disease to
within 10% of its true value with 95% confidence?
 The table shows that for e = 0.10 and a confidence level of 95%, a
sample size of 385 would be needed.
 The table can also be used to find the sample size for other values of
relative precision and confidence level, e.g.
if the level of confidence is reduced to 90%, the sample size
would be 271.
 Such tables giving ready-made sample sizes are available for
different designs and situations.
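
The two table values quoted above are consistent with the formula n = (Z/e)², rounded up. The sketch below assumes that this is the formula behind the table, which is not reproduced here, so it should be read as a plausibility check and not as the table's documented method; the function name is also an assumption.

```python
import math
from statistics import NormalDist

def n_for_relative_precision(e: float, confidence: float) -> int:
    """Sample size to estimate an incidence rate to within relative
    precision e, assuming n = (Z / e)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / e) ** 2)

print(n_for_relative_precision(0.10, 0.95))  # compare with the table value 385
print(n_for_relative_precision(0.10, 0.90))  # compare with the table value 271
```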

USE OF NOMOGRAM FOR SAMPLE SIZE
CALCULATION

 To use a nomogram to calculate the sample size, one
needs to specify the study group (group 1) and the control
group (group 2). The assignment can be arbitrary or based on the study
design; the nomogram will work either way.
 The researcher should then decide the effect size that is
clinically important to detect. This should be expressed
in terms of the % change in the response rate compared with
that of the control group.

 E.g. if 40% of patients treated with standard therapy are
cured and one wants to know whether a new drug can cure
50%, one is looking for a 25% increase in cure rate:
(50% − 40%)/40% = 25%.

USE OF COMPUTER SOFTWARE FOR SAMPLE
SIZE CALCULATION & POWER ANALYSIS
The following software can be used for calculating sample size
and power:
 Epi Info
 nQuery
 Power and Precision
 Sample
 Stata
 SPSS

Epi Info for sample size determination
 In STATCALC:
 1 Select SAMPLE SIZE & POWER.
 2 Select POPULATION SURVEY.
 3 Enter the size of the population (e.g. 15 000).
 4 Enter the expected frequency (an estimate of the true
prevalence, e.g. 80% ± your minimum standard).
 5 Enter the worst acceptable result (e.g. 75%), i.e. the margin
of error is 5%.
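
For comparison, the same inputs can be run through the two formulae given earlier in this presentation (n = Z²pq/d², followed by the correction n_f = n/(1 + n/N)). This is a hand calculation under those formulae, with an assumed function name; it is not a reproduction of the software's internals, and the figure the program reports may differ slightly.

```python
def survey_sample_size(N: int, p: float, d: float, z: float = 1.96) -> float:
    """Apply n = Z^2 * p * q / d^2, then n_f = n / (1 + n/N)."""
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    return n / (1 + n / N)

# Population 15,000; expected frequency 80%; worst acceptable 75% (margin 5%)
print(round(survey_sample_size(N=15000, p=0.80, d=0.05)))
```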

How to use sample size formulae
Steps:
1st Formulate a research question.
2nd Select the appropriate study design, primary outcome measure and
level of statistical significance.
3rd Use the appropriate formula to calculate the sample size.

Finally
 Sample size determination is one of the most essential
components of every research study.
 The larger the sample size, the higher the degree of accuracy,
but this is limited by the availability of resources.
 It can be determined using formulae, ready-made tables,
nomograms or computer software.

STILL CONFUSED………………………..

Smart people don’t do it alone…………………

Call a statistician
•Sample selection
•Sample size determination
•Analysis of data
References
 Research methodology, 2004, M.O. Araoye; sample size
determination, page 117
 Research methodology, 2004,Zodpey SP ijvl.com
 Wikipedia, sample size determination
