
SAMPLING AND SAMPLE SIZE DETERMINATION

Dr. Anuji Gamage MBBS, MSc., MD (Community Medicine),
Grad. Cert. in Economics, MSc. (Health Economics and Policy) (Australia),
Senior Lecturer/Consultant Community Physician,
Dept. of Public Health and Family Medicine
Sir John Kotalawela Defence University
Ratmalana
OUTLINE

• Sampling
• Selecting cases
• Sample size determination
SAMPLING
WHAT IS SAMPLING ?

• A sample is a selected subset of the population
that is used to gather information about the
population

• Sampling is the process of selecting a number of
units from the population
CONCEPTS BEHIND SAMPLING

• Population
• Sampling criteria
• Sampling frame
Sampling:
a) Makes research of any type and size
manageable;
b) Significantly saves the costs of the research;
c) Results in more accurate research findings;
d) Provides an opportunity to process the
information in a more efficient way;
e) Accelerates the speed of primary data collection.

WHAT IS A POPULATION ?

• The term population (or universe) refers to the whole collection of
units or elements (persons, records, or events)
• A population is all the units/ people (or animals, plants, or anything
else) that researchers are interested in finding out about.
• Studying everyone in a population would take too long, cost too much
or be impossible.
• Instead of studying the whole population, researchers study smaller
samples of people from the population they are interested in.
• The sample chosen should be representative of the study population
WHAT IS A SAMPLE ?

• A sample is the group of people studied by the
researcher
• A good sample is very similar to the range
of people within the population
• The findings of the sample are generalised to the
population.
• A sample that is sufficiently similar to the
population in every way that might be important is
called a representative sample.
SAMPLING METHODS

• The approach the researcher uses to select the
sample from the population

• Probability
• Non-probability
PROBABILITY (RANDOM) SAMPLING
METHODS

• Every member of the population has a known chance of participating
in the study and this probability can be accurately determined.

• Simple random sampling
• Systematic sampling
• Stratified random sampling
• Cluster sampling
• Multistage sampling
SIMPLE RANDOM SAMPLING

• People are randomly chosen from a population
• Each person in the population has the same chance of being chosen
• A list of all members of the population is prepared. Each element is marked with
a specific number (say, from 1 to N).
• n items are chosen from a population of size N.
• This can be done with either random number tables or random
number generator software. The latter option is preferable, as the
selection of random samples can be aided by software such as Research
Randomizer and Stat Trek. In this way researcher bias can be minimized.
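The selection step described above can be sketched in a few lines of Python. This is only an illustration, assuming the population list has already been numbered 1 to N. The function name `simple_random_sample` and the example sizes are hypothetical, and the seed is used only to make the illustration reproducible.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct IDs from 1..population_size.

    Every ID has the same chance of being chosen (sampling without replacement).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Hypothetical example: choose n = 20 people from a numbered list of N = 500.
chosen = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(chosen)
```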
SYSTEMATIC SAMPLING

• In systematic sampling (also called systematic
random sampling) every kth member of the
population, counted from a randomly chosen starting
point, is selected to be included in the study.
SYSTEMATIC SAMPLING

• Label each member of the population with a
unique identification number (ID).
• Calculate the sampling interval k by dividing the
total number in the population (N) by the sample size (n):
k = N / n. The sampling fraction is the reverse, n / N.
• The first member of the sample has to be chosen in a random
manner from among the first k members.
• Additional members of the sample group are chosen
by recruiting every kth subject after that.
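The steps above can be sketched as follows. This is a minimal illustration, assuming the population is numbered 1 to N and that the sampling interval k is the population size divided by the sample size, rounded down. The function name `systematic_sample` and the example sizes are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start in the first interval, then every k-th ID after it."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    k = population_size // sample_size   # sampling interval (rounded down)
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random starting point between 1 and k
    return [start + i * k for i in range(sample_size)]


# Hypothetical example: N = 1000, n = 50, so every 20th ID is taken.
ids = systematic_sample(population_size=1000, sample_size=50, seed=2)
print(ids[:5])
```

Rounding k down keeps the last selected ID inside the list. When N is not an exact multiple of n, a few units at the end of the list can never be selected, which is a known limitation of this simple version.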
SAMPLING FRAME

• List of units from which sample is drawn
• Defines your population
• E.g., List of members of organization or
community
• Ideally you’d like to list all members of your
population as your sampling frame
• Randomly select your sample from that list
• Often impractical to list entire population
SAMPLING FRAMES FOR SURVEYS

• Limitations of the telephone book:
• Misses unlisted numbers
• Class bias:
• Poor people may not have a phone
• Less likely to have multiple phone lines
• Most studies use a technique such as Random
Digit Dialing as a surrogate for a sampling frame
STRATIFIED RANDOM SAMPLING

• Sometimes researchers are interested in
understanding more about the specific sub-groups
within populations, such as different ethnic groups
or age groups
• In stratified random sampling, researchers select
groups (or 'strata') and randomly choose
participants from within those groups
• This method ensures the sample contains enough
people from each group that the researchers are
interested in, which allows researchers to study
differences within and between those groups
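A minimal sketch of stratified random sampling, assuming each stratum already has its own list of unit IDs and that the same number of participants is wanted from every stratum. The age bands, stratum sizes and the function name `stratified_sample` are hypothetical and only for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Randomly choose n_per_stratum units from within each stratum.

    strata maps a stratum name to the list of unit IDs in that stratum.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        if n_per_stratum > len(units):
            raise ValueError(f"stratum {name!r} has fewer than {n_per_stratum} units")
        sample[name] = rng.sample(units, n_per_stratum)
    return sample


# Hypothetical strata of unequal size (age groups).
strata = {
    "18-39": list(range(1, 301)),
    "40-59": list(range(301, 501)),
    "60+": list(range(501, 561)),
}
picked = stratified_sample(strata, n_per_stratum=25, seed=7)
print({name: len(units) for name, units in picked.items()})
```

Taking the same number from each stratum guarantees enough people in the small "60+" group. Sampling in proportion to stratum size is another common choice and would need different per-stratum numbers.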
CLUSTER SAMPLING

• If a population is spread across a large geographical area,
like a large city or country, it might be easier to use cluster
sampling than to sample from the whole population
• The population is divided into areas called clusters, and
researchers randomly select which clusters to include in
the study
• Everyone in each selected cluster is asked to take part in the
research, so the sample represents the diversity of different
people within each area
• Cluster sampling is a quicker and easier way to get
a representative sample, but there is a higher chance of
error than with simple random sampling
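The cluster approach can be sketched as below: clusters are chosen at random and everyone in a chosen cluster is included. The village names, cluster sizes and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every unit inside them.

    clusters maps a cluster name to the list of units in that cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical population: 10 villages with 30 residents each.
clusters = {
    f"village_{i}": [f"v{i}_person_{j}" for j in range(1, 31)]
    for i in range(1, 11)
}
chosen_clusters, participants = cluster_sample(clusters, n_clusters=3, seed=3)
print(chosen_clusters, len(participants))
```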
MULTI-STAGE SAMPLING

• Multi-stage sampling (also known as multi-stage cluster sampling) is a more complex form
of cluster sampling which contains two or more stages in sample selection.
• In simple terms, in multi-stage sampling large clusters of population are divided into
smaller clusters in several stages in order to make primary data collection more
manageable.
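A sketch of a three-stage design, assuming a nested sampling frame of districts, villages within districts, and households within villages. The frame, the stage sizes and the function name `multistage_sample` are all hypothetical. Simple random selection is used at every stage, which is only one of several possible choices.

```python
import random


def multistage_sample(frame, n_primary, n_secondary, n_units, seed=None):
    """Stage 1: pick primary clusters. Stage 2: pick secondary clusters
    inside each. Stage 3: pick individual units inside each of those.

    frame has the shape {primary: {secondary: [units]}}.
    """
    rng = random.Random(seed)
    selected = []
    for primary in rng.sample(sorted(frame), n_primary):
        secondaries = frame[primary]
        for secondary in rng.sample(sorted(secondaries), n_secondary):
            selected.extend(rng.sample(secondaries[secondary], n_units))
    return selected


# Hypothetical frame: 5 districts, 6 villages each, 40 households per village.
frame = {
    f"district_{d}": {
        f"d{d}_village_{v}": [f"d{d}_v{v}_household_{h}" for h in range(1, 41)]
        for v in range(1, 7)
    }
    for d in range(1, 6)
}
households = multistage_sample(frame, n_primary=2, n_secondary=3, n_units=10, seed=11)
print(len(households))
```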
NONPROBABILITY (NONRANDOM)
SAMPLING

• Samples are selected in a non-random manner, so not every population
member has a known chance of participating in the study.

• Convenience (accidental) sampling
• Quota sampling
• Purposive sampling
• Snowball sampling
CONVENIENCE SAMPLING

• Convenience sampling is a type of nonprobability
sampling in which people are sampled simply
because they are "convenient" sources of data for
researchers
QUOTA SAMPLING

• Sample group members are selected on the basis
of specific criteria
• Divide the population into mutually exclusive sub-groups
• Selection from the sub-groups is based on non-random
sampling methods
• In quota sampling, researchers use non-random
sampling methods to gather data from one
stratum until the required quota fixed by the
researcher is fulfilled.
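A minimal sketch of quota filling, assuming respondents are taken in the order the researcher happens to meet them (a non-random order) and accepted only while their sub-group's quota is still open. The sub-groups, quota sizes and the function name `quota_sample` are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Accept people in the order encountered until every quota is filled.

    arrivals is an iterable of (person_id, subgroup) pairs in encounter order.
    quotas maps each subgroup to the number of people required from it.
    """
    remaining = dict(quotas)
    accepted = []
    for person_id, subgroup in arrivals:
        if remaining.get(subgroup, 0) > 0:
            accepted.append((person_id, subgroup))
            remaining[subgroup] -= 1
        if not any(remaining.values()):  # all quotas are filled
            break
    return accepted


# Hypothetical stream of 100 people; every third person is in the "male" sub-group.
arrivals = [(i, "male" if i % 3 == 0 else "female") for i in range(1, 101)]
sample = quota_sample(arrivals, {"female": 5, "male": 5})
print(sample)
```

No random selection takes place anywhere in this sketch, which is exactly what separates quota sampling from stratified random sampling.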
PURPOSIVE SAMPLING

• Purposive sampling (also known as judgment, selective
or subjective sampling) is a sampling technique in
which researcher relies on his or her own judgment
when choosing members of population to participate
in the study

• Elements selected for the sample are chosen by the
judgment of the researcher.
SNOWBALL SAMPLING

• Used when characteristics to be possessed by samples are rare and difficult to find.
• Involves primary data sources nominating other potential primary data sources to be used
in the research
• Establish a contact with one or two initial cases from the sampling frame. This stage is usually
the most difficult one.
• Request the initial cases to identify more cases
• Ask new cases to identify further cases (and so on)
• Stop when:
• a) Your pre-specified sample size has been completed;
• b) There are no further cases left;
• c) Pursuing further cases will make the project unmanageable due to the large size.
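The referral process and the first two stopping rules can be sketched as follows. This assumes the nominations are known in advance as a simple lookup table, which in a real study they are not, since they only emerge during fieldwork. The names and the function `snowball_sample` are hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Recruit the initial cases, then the people they nominate, and so on.

    Stops when target_size is reached or when no further cases are left.
    referrals maps a person to the list of people that person nominates.
    """
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        for nominee in referrals.get(person, []):
            if nominee not in seen:  # do not recruit the same person twice
                seen.add(nominee)
                queue.append(nominee)
    return recruited


# Hypothetical referral chains starting from one initial case, "A".
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, seeds=["A"], target_size=4))
# prints ['A', 'B', 'C', 'D']
```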
TASKS

• Advantages and Disadvantages
• Examples for each sampling technique
SAMPLE SIZE
HOW LARGE A SAMPLE DO I NEED ?

The answer depends on
• The aim of the study
• The nature of the study
• The scope of the study
• The expected result

All should be considered carefully


WHEN DO WE NEED SAMPLE SIZE
CALCULATIONS

• Sample size calculations are required for the vast
majority of quantitative studies.
• Sample size calculations are not required for
qualitative research
• Sample size calculations may not be required for
certain preliminary pilot studies
• Sample size calculations are not necessary for
costing studies
SAMPLE SIZE SHOULD BE

‘Big enough’ to
• Detect an effect of the expected magnitude
• Show that the effect is statistically significant
But not ‘too big’
• An unnecessarily large sample wastes resources
HOW TO CALCULATE THE SAMPLE SIZE?

• Use formulae and calculate
• Use appropriate software
• Winpepi, OpenEpi
• Use published tables
• Lwanga & Lemeshow
FACTORS THAT AFFECT THE SAMPLE SIZE
CALCULATION
• The variables of interest in your study, including the type of data*
• The desired power*
• The desired significance level*
• The effect size of clinical importance*
• The standard deviation of continuous outcome variables
• Whether analysis will involve one- or two-sided tests*
• Aspects of the design of your study: e.g. is your study ....
• a simple randomised controlled trial (RCT)
• a cluster randomised trial
• an observational study
• a prevalence study
• a study measuring sensitivity and specificity
• does your study have paired data?
• does your study include repeated measures?
• are groups of equal sizes?
• are the data hierarchical?
TYPE OF DATA

• Requires that we know the objectives of the study and our “outcomes”

• Outcomes – 3 groups
a) Two alternatives exist - case / non-case, dead / alive, vaccinated / not vaccinated
b) More than two mutually exclusive alternatives - blood group, religion
c) Continuous variables - height / weight / BP

• The statistical method for sample size determination will depend on the outcome
chosen
THE DESIRED POWER

• Power is the probability that the null hypothesis will be correctly rejected i.e.
rejected when there is indeed a real difference or association.
• It can also be thought of as "100 minus the percentage chance of missing a real
effect" - therefore the higher the power, the lower the chance of missing a real
effect.
• Power is typically set at 80%, 90% or 95%. Power should not be less than 80%.
• If it is very important that the study does not miss a real effect, then a power of
90% or more should be applied.
THE DESIRED SIGNIFICANCE LEVEL

• The significance level is a cut-off point for the p-value, below which the null
hypothesis will be rejected and it will be concluded that there is evidence of an
effect.
• The significance level is typically set at 5%
• If the observed p-value is smaller than 5% then there is only a small probability
that the study could have observed the data it did if there was truly no effect,
and so it would be concluded that there is evidence of a real effect.
ERRORS ASSOCIATED WITH
SIGNIFICANCE TESTS

• Rejecting a true null hypothesis
• Type I error (α error)
• Significance
• The probability of this error is the significance level (α)

• Failing to reject a false null hypothesis
• Type II error (β error)
• 1 – β is known as the power of a test
THE EFFECT SIZE OF CLINICAL
IMPORTANCE

• This is the smallest difference between the group means or proportions (or
odds ratio/relative risk closest to unity) which would be considered to be
clinically or biologically important.

• The sample size should be set so that if such a difference exists, then it is very
likely that a statistically significant result would be obtained.
WHETHER ANALYSIS WILL INVOLVE
ONE- OR TWO-SIDED TESTS

• In a two-sided test, the null hypothesis states there is no effect, and the
alternative hypothesis (often implied) is that a difference exists in either
direction.

• In a one-sided test the alternative hypothesis does specify a direction, for
example that an active treatment is better than a placebo, and the null
hypothesis then includes both no effect and placebo better than active
treatment.
IN ADDITION TO THE ABOVE

• An allowance for non-response is added
• Design effect

Basic formulae to calculate sample size are based on the assumption that simple
random sampling is being used.

A design effect (DEFF) is an adjustment made to the survey sample size to account for
a sampling method other than simple random sampling (e.g. cluster sampling, respondent-driven sampling).
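One common way to apply both adjustments is to multiply the simple-random-sampling size by the design effect and then divide by the expected response rate. The slides do not spell out this arithmetic, so the sketch below is an assumption about how the adjustment is applied. The function name `adjust_sample_size` and the example values (369, a design effect of 1.5, 15% non-response) are hypothetical.

```python
import math


def adjust_sample_size(n_srs, design_effect=1.0, non_response_rate=0.0):
    """Inflate a simple-random-sampling size for design effect and non-response.

    Assumed rule: n_adjusted = n_srs * DEFF / (1 - non_response_rate),
    rounded up to a whole person.
    """
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    if not 0 <= non_response_rate < 1:
        raise ValueError("non_response_rate must be in [0, 1)")
    return math.ceil(n_srs * design_effect / (1 - non_response_rate))


# Hypothetical example: 369 people under simple random sampling,
# a cluster design with DEFF = 1.5, and 15% expected non-response.
print(adjust_sample_size(369, design_effect=1.5, non_response_rate=0.15))
# prints 652
```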
SAMPLE SIZE CHECKLIST

• Descriptive study to estimate mean
• Estimate of the standard deviation
• Acceptable width of the (usually 95%) confidence interval

• Descriptive study to estimate proportion
• Anticipated value of the proportion to be estimated
• Acceptable width of the (usually 95%) confidence interval
SAMPLE SIZE CHECKLIST

• Analytical study to compare two means
• Estimate of the two standard deviations
• Expected difference between the means
• Power (80% or 90%)
• Significance (0.05 or 0.01)

• Analytical study to compare two proportions
• Estimate of the two proportions
• Or one proportion and the OR / RR
• Power (80% or 90%)
• Significance (0.05 or 0.01)
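The checklist items map directly onto the inputs of the usual normal-approximation formulae for two independent groups of equal size. The slides do not give these formulae, so the versions below are an assumption (the simple unpooled forms). Software such as OpenEpi may apply corrections and return slightly different numbers. The function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power, two_sided=True):
    """Sum of the normal deviates for the significance level and the power."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return z_alpha + z(power)


def n_per_group_two_means(sd1, sd2, difference, alpha=0.05, power=0.80, two_sided=True):
    """Assumed formula: n = (z_alpha + z_beta)^2 * (sd1^2 + sd2^2) / difference^2."""
    z = _z_total(alpha, power, two_sided)
    return math.ceil(z ** 2 * (sd1 ** 2 + sd2 ** 2) / difference ** 2)


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Assumed formula: n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2."""
    z = _z_total(alpha, power, two_sided)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical inputs: SD of 10 in both groups, expected difference of 5.
print(n_per_group_two_means(10, 10, 5))              # 80% power
print(n_per_group_two_means(10, 10, 5, power=0.90))  # 90% power needs more people
# Hypothetical inputs: 40% in one group versus 50% in the other.
print(n_per_group_two_proportions(0.40, 0.50))
```

Raising the power, lowering the significance level, or looking for a smaller difference all increase the required sample size, which is why each of them appears on the checklist.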
DESIGN EFFECT

• Basic formulae to calculate sample size are based on the assumption that
simple random sampling is being used.
• In general for a given sample size the standard error tends to be higher for
other sampling methods compared to simple random sampling.
• Sampling scheme should be taken into consideration during analysis
$X = \dfrac{Z_{\alpha/2}^{2}\; p\,(1-p)}{E^{2}}$

• X = Required sample size
• Z = Standard normal deviate for the chosen confidence level;
for a 95% confidence interval the value 1.96 is used
• p = Anticipated population proportion
• E = Desired margin of error
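The formula above can be evaluated as in this short sketch. The function name `n_for_proportion` is hypothetical, and the margin of error of 5 percentage points is an assumed value chosen for illustration, not one given in the slides. The Z value is taken from the normal distribution rather than typed in as 1.96; at this precision the answer is the same.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, margin_of_error, confidence=0.95):
    """Sample size to estimate a proportion: Z^2 * p * (1 - p) / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# Anticipated proportion of 40% with an assumed margin of error of 5%.
print(n_for_proportion(p=0.40, margin_of_error=0.05))
# prints 369
# With no prior estimate, p = 0.5 gives the largest (most cautious) sample size.
print(n_for_proportion(p=0.50, margin_of_error=0.05))
# prints 385
```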


CRITIQUE THE SAMPLE

• What is the study population?
• What was the sampling frame?
• Were the inclusion and exclusion criteria identified?
• What sampling methods were used?
• Was there rationale for the sampling method?
• What was the response rate?
• Was there a power analysis?
• Was the sample large enough?
• Were the characteristics of the sample described?
• Was the sample representative of the population they were studying?
• Who is the sample generalizable to?
TASK

• Calculate the sample size for a study to determine the prevalence of DM in Sri Lanka

• Knowledge of NCD among school students is 40%. Calculate the sample size
DESCRIPTIVE STUDY TO ESTIMATE MEAN

• Z = Standard normal deviate for the chosen confidence level;
for a 95% confidence interval the value 1.96 is used
• s = Standard deviation of the outcome variable
• In this example, the standard deviation of the luminal arterial area of the
umbilical cords of children born to preeclamptic mothers was used
• The value obtained from the literature is 401758 µm² (Shaima M.
et al 2016)
• E = Desired margin of error.
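The formula itself does not appear in the text of this slide. The sketch below assumes the standard expression for estimating a mean, n = Z² s² / E², which uses exactly the three quantities defined above. The function name `n_for_mean` is hypothetical, and the margin of error of 50000 µm² is an assumed value for illustration only, since the slide does not state one.

```python
import math
from statistics import NormalDist


def n_for_mean(sd, margin_of_error, confidence=0.95):
    """Sample size to estimate a mean, assuming n = (Z * s / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / margin_of_error) ** 2)


# s = 401758 µm² comes from the slide; E = 50000 µm² is an assumed value.
print(n_for_mean(sd=401758, margin_of_error=50000))
# prints 249
```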
CHECKLIST FOR THE RESEARCH

• What is your study population?
• The inclusion and exclusion criteria
• Availability of a sample frame
• What sampling methods will be used?
• What is the rationale for the sampling method?
• The sample size
• Who is the sample generalizable to?
• Discuss strengths and weaknesses of sampling method
