SAMPLING POPULATIONS
But usually
• not feasible due to cost, time, effort, accessibility of the whole population
• impossible to observe the entire population
Census can be unrealistic, impractical, tedious, lengthy, extremely difficult and costly.
10/17/19
Target population (Reference population)
Population to which study results may be extrapolated.
Study population (Source population)
Population from which the sample was taken.
External validity: generalizability of the study population. Common sense judgement is used to assess the representativeness of the study population.
Sampling frame (example columns: Farm, Address, No. pigs)
A representative sample has all the important characteristics of the population from which it was obtained, i.e., its estimate should be the same as, or as close as possible to, the true value in the target population.
Inference (Generalization)
The degree of representativeness of the sample depends on how accurate the sampling frame is and on obtaining the selected samples for the study.
Clearly defined individuals on which you will observe and make measurements; units from which statements regarding the source population (and possibly the target population) are made.
• The list of all sampling units in the population from which the sample will be drawn is called the sampling frame. There are two types.
• Essential for selecting the elements of the target population
• Provides information for locating and identifying the units
Units not in the sampling frame have zero probability of selection. In this case, we cannot generalize our study findings to the population of interest even if a probability sampling method was used to select the sampling units.
• No member of the study population should be missing from the frame
• No unit that is not part of the population should be on the frame
• No member should be duplicated on the frame
Estimates are reliable and allow calculation of confidence limits for the true value.
Sampling: the procedures for obtaining a sample.
Sampling avoids the hassle and expense of measuring every member of the population. It should be designed to select samples that will accurately describe or represent the characteristics of the total population from which they are selected.
SAMPLING METHODS
PROBABILITY
Each member has a known, non-zero probability of being selected. Same chance of being selected.
NON-PROBABILITY
Based on 'good reasoning' of investigator. Unequal chance of being selected.
NON-PROBABILITY SAMPLING
Carries a great risk of producing biased estimates.
• Snowball sampling (chain referral sampling)
• Convenience sampling: easy, quick or cheap to collect
• Quota sampling
• Self-selection
• Extreme
• Homogenous
• Typical case
• Critical case
BOX 2. When non-probability sampling methods may be useful
The following situations may justify a non-probability sampling method:
• Demonstrate freedom from disease
• Determine if a disease is present in a population and that it is not possible to [...] disease, or compute estimate
PROBABILITY SAMPLING
Each member has a known, non-zero probability of being selected. Same chance of being selected.
• Simple random sampling
• Systematic random sampling: sampling at specific interval
• Stratified random sampling: sampling strata
• Cluster sampling: sampling groups
• Multistage sampling: sampling in stages
Simple Random Sampling
The simplest probability sampling.
Sampling units are selected using random numbers or some other random selection technique (dice, cards) so that each member of the population has the same probability of being selected.
SRS requires information on population size and sampling frame.
SRS of n = 5 from N = 10; sampling fraction (n/N) = 0.5
Sampling frame based on cow numbers:
Cow no.     1  2  3  4  5  6  7  8  9  10
Selected    1  0  0  0  1  1  0  1  1  0
(If random() ≥ 0.5 then 1)
Samples consist of cattle nos. 1, 5, 6, 8, 9
EXAMPLE 1. Selecting samples by SRS using Table of Random Numbers
To select an SRS of 18 cattle from a herd of 900 cattle, 18 random numbers are generated between 1 and 900 using a Table of Random Numbers or computer-generated random numbers. A list of animals (sampling frame) is created and those corresponding to the chosen numbers are then selected. The procedure below outlines the selection of random numbers from a table:
1. Randomly select a starting point and the sampling direction. Suppose the random start was the 5th entry 04181, and you choose to move across rows from left to right.
2. Choose a three-digit number not greater than 900 each time until you obtain the desired sample size.
Hence, the cattle included in your study are those with ear tag nos.:
041, 817, 801, 278, 705, 861, 837, 154, 535, 066, 126, 020, 049, 085, 851, 087, 143 and 810.
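The selection in Example 1 can also be done with computer-generated random numbers. The following is a minimal Python sketch, assuming the sampling frame is simply the ear tag numbers 1 to 900; the function name `simple_random_sample` and the seed value are illustrative assumptions, not part of the original example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same probability of selection.
    The seed is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))

# Hypothetical sampling frame: ear tag numbers 1..900 (herd of 900 cattle)
herd = list(range(1, 901))

# SRS of 18 cattle, as in Example 1 (seed chosen arbitrarily)
selected = simple_random_sample(herd, 18, seed=2019)
print(selected)
```

Sampling without replacement guarantees that no animal is chosen twice, which mirrors skipping repeated numbers when reading a Table of Random Numbers.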
Sample size:
40% x n = 10 of Breed A
60% x n = 14 of Breed B
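The breed split above reads like an allocation of the total sample in proportion to each stratum's share. A minimal Python sketch of that arithmetic follows, assuming a total sample of n = 24 (a value inferred from the rounded counts, not stated in the source) and the hypothetical helper name `proportional_allocation`.

```python
def proportional_allocation(n, shares):
    """Split a total sample size n across strata in proportion to their shares.

    shares maps stratum name -> proportion of the population (summing to 1).
    Each stratum size is rounded to the nearest whole unit, so in general
    the rounded sizes may not add up exactly to n.
    """
    return {stratum: round(n * share) for stratum, share in shares.items()}

# Assumed total sample size; the 40% / 60% shares come from the slide
allocation = proportional_allocation(24, {"Breed A": 0.40, "Breed B": 0.60})
print(allocation)
```

With these inputs the sketch gives 10 animals of Breed A and 14 of Breed B, matching the figures on the slide.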
Multistage Sampling
Sampling is done randomly in two or more stages from within large groups; the sampling unit and the method of selection are usually different at each stage. At the first stage, sampling units may be the villages, selected using some method. For the second stage, the sampling frame could be the farms in the sample of the first stage units, with a sampling method that may be different from the one used for selecting the first stage units. This process could be carried to a third stage by selecting animals from the sample of farms.
Multistage sampling is particularly suitable in very large populations. This is often the case in which one wants to obtain information on farms from different villages, and these villages have to be selected from different geographic units.
EXAMPLE 2. Two-stage cluster sampling (figure). Source: FAO.
Two-stage sampling
Suppose you want to determine the seroprevalence of hemorrhagic septicemia in carabaos in municipality X. Because of logistical difficulties, you decided to conduct sampling in two stages. Based on prior calculation, you determined that you have to sample 3 barangays and 15 carabaos from each selected barangay. The sampling design is shown in the figure. Source: FAO

Advantages and disadvantages of the probability sampling methods

Simple random sampling (SRS)
Advantages:
• Easy to do
Disadvantages:
• Requires a complete, updated sampling frame of sampling units
• Can be unfeasible for large populations and costly

Systematic random sampling
Advantages:
• Easy to draw and a lot faster than using SRS
• No need for complete prior knowledge of the sampling frame before sampling
Disadvantages:
• Danger that the sampling frame may arrange subjects in a pattern

Stratified random sampling
Advantages:
• More precise estimates of the parameter are obtained compared to SRS
• Guarantees representation of all strata in the sample
• Obtains estimates of outcomes in each stratum
Disadvantages:
• Expensive and complex
• Requires advance knowledge of the status of sampling units in the stratum and the stratification factor

Cluster sampling
Advantages:
• Cheap, administratively efficient
• No need for a sampling frame of all units in the population
Disadvantages:
• Less precise estimate of the parameter is obtained
• Requires more sampling units than SRS

Multistage sampling
Advantages:
• A complete sampling frame is only needed in the first stage; sampling frames are only required for selected units in subsequent sampling stages
• Flexible and less costly than SRS
Disadvantages:
• Less precise than SRS
• Requires more sampling units than SRS
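The two-stage design for the carabao seroprevalence survey (3 barangays, then 15 carabaos from each selected barangay) can be sketched in Python as follows. This is only an illustration under stated assumptions: simple random sampling is used at both stages (the source does not say which selection method is used at each stage), and the barangay names, herd sizes, animal IDs and the function `two_stage_sample` are all hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Two-stage sampling sketch.

    Stage 1: randomly select n_clusters clusters (e.g. barangays).
    Stage 2: within each selected cluster, randomly select n_units units.
    A frame of individual units is only needed for the selected clusters.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        k = min(n_units, len(units))  # cannot sample more than the cluster holds
        sample[name] = sorted(rng.sample(units, k))
    return sample

# Hypothetical frame: 10 barangays, each with 40 carabaos
barangays = {
    f"Barangay {i}": [f"B{i}-C{j:03d}" for j in range(1, 41)]
    for i in range(1, 11)
}

design = two_stage_sample(barangays, n_clusters=3, n_units=15, seed=1)
for name, animals in design.items():
    print(name, len(animals))
```

The first-stage frame is only the list of barangays; carabao lists are consulted only for the three barangays that were drawn, which is the practical advantage of multistage sampling noted in the comparison above.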
BIAS IN SAMPLING
Bias is a tendency of an estimate to deviate in one direction from a true value. Bias in sampling results in distortion of study results. It can occur during selection of sampling units and in measurement of exposure or outcome. Examples are listed below:
Sampling bias can be due to:
• Incomplete sampling frame
• Improper sampling
• Incomplete coverage
• Studying volunteers only
• Missing cases of short duration
• Non-response
• Not representative of the population
Measurement bias occurs when:
• Respondents do not tell the truth
• Respondents do not always understand the questions
• Respondents forget
• Respondents give different answers to different interviewers
• Respondents may say what they think the interviewer wants to hear
DECIDING ON SUITABLE SAMPLE SIZE
• Statistician/Epidemiologist
• Enter formula in Excel
• Use EpiInfo (StatCalc module)

$$ n = \frac{Z_\alpha^2 \, P Q}{d^2}, \qquad Q = 1 - P $$

where d is the tolerable error and PQ = P(1 - P) is the variance ($\sigma^2$) of a proportion.
Box 1. Sample size for single proportion
• We want to estimate prevalence of Disease Y in a 400-cattle herd. No idea of expected prevalence in this population. We would like our estimate to be within +/- 5% at 95% confidence level. Thus:

$$ n = \frac{Z_\alpha^2 \, P Q}{d^2} = \frac{1.96^2 \times 0.50 \times (1 - 0.50)}{(0.05)^2} = 384 \text{ cattle} $$

$$ n_{(c)} = \frac{n}{1 + n/N} = \frac{384}{1 + 384/400} = 196 \text{ cattle} $$

Confidence level      95%     99%     99.9%
Significance level    5%      1%      0.1%
Z score               1.96    2.58    3.29

Table of sample size for estimation of proportion in a population at fixed levels of confidence and accuracy and varying expected proportion

              Level of confidence and desired absolute precision
Expected      90%                   95%                   99%
prevalence    10%    5%     1%      10%    5%     1%      10%    5%     1%
10%           24     97     2435    35     138    3457    60     239    5971
20%           43     173    4329    61     246    6147    106    425    10616
30%           57     227    5682    81     323    8067    139    557    13933
40%           65     260    6494    92     369    9220    159    637    15923
50%           68     271    6764    96     384    9604    166    663    16587
60%           65     260    6494    92     369    9220    159    637    15923
70%           57     227    5682    81     323    8067    139    557    13933
80%           43     173    4329    61     246    6147    106    425    10616
90%           24     97     2435    35     138    3457    60     239    5971
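The Box 1 calculation and the entries of the table can be reproduced with a few lines of Python. This is a minimal sketch, assuming results are rounded to the nearest whole animal as the slide does (many texts round up instead, which would give 385); the function names are illustrative.

```python
def sample_size_proportion(z, p, d):
    """n = Z^2 * P * Q / d^2, with Q = 1 - P.

    z: Z score for the chosen confidence level (1.96 for 95%)
    p: expected prevalence (use 0.50 when there is no prior idea)
    d: desired absolute precision (tolerable error)
    """
    return z ** 2 * p * (1 - p) / d ** 2

def finite_population_correction(n, population_size):
    """Corrected sample size n(c) = n / (1 + n / N)."""
    return n / (1 + n / population_size)

# Box 1: 400-cattle herd, P = 0.50, d = 0.05, 95% confidence
n = round(sample_size_proportion(1.96, 0.50, 0.05))
n_corrected = round(finite_population_correction(n, 400))
print(n, n_corrected)  # 384 196
```

Using P = 0.50 is the conservative choice because PQ is largest at that value, which is why the 50% row of the table holds the largest sample sizes.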
Applying the finite population correction
Reduce n if n > 10% of N, i.e., 384/400 = 96%. Thus,

$$ n_{(c)} = \frac{384}{1 + 0.96} = 196 $$

animals are required for SRS or stratified sample in a descriptive study.

Box 4. Sample size for detecting disease or confirming absence of disease in a finite population

$$ n = \left\{ 1 - (1 - P_1)^{1/D} \right\} \times \left\{ N - \frac{D}{2} \right\} + 1 $$

n = Required sample size
N = Population size
D = Estimated minimum number of diseased animals in the population (population size x minimum expected prevalence)
P1 = Confidence level (usually 95%)
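The Box 4 formula is easy to evaluate directly. The sketch below is only an illustration: the herd size of 400 and the 5% minimum expected prevalence are hypothetical inputs chosen for the example, and the function name `detection_sample_size` is an assumption.

```python
import math

def detection_sample_size(population_size, diseased, confidence=0.95):
    """Box 4: n = {1 - (1 - P1)^(1/D)} x {N - D/2} + 1.

    population_size: N
    diseased: D, estimated minimum number of diseased animals
              (population size x minimum expected prevalence)
    confidence: P1, usually 0.95
    """
    return (1 - (1 - confidence) ** (1 / diseased)) * (population_size - diseased / 2) + 1

# Hypothetical inputs: herd of 400, minimum expected prevalence of 5%
N = 400
D = round(N * 0.05)          # 20 diseased animals expected at minimum
n = detection_sample_size(N, D)
print(math.ceil(n))          # round up to whole animals
```

Rounding up is a conservative choice made here so that the stated confidence is not undershot; the source does not say how to round.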
Factors to consider in Sample Size Determination

$$ n = \frac{Z_\alpha^2 \, P Q}{d^2} \qquad n = \frac{Z_\alpha^2 \, \sigma^2}{d^2} \qquad n = \frac{Z_\alpha^2 \, 2PQ}{(P_1 - P_2)^2} $$

• Required precision (d)
• Expected variation of variable in the population (PQ or $\sigma^2$)
• Desired level of confidence ($Z_\alpha$)
• Design effect: often b/w 1 & 2
• Sampling method: larger for cluster sampling
• Multiple variables
• Nonstatistical concerns: cost, time, facilities and manpower
• More accurate but costly; can ...

Power: probability that the test will correctly identify a significant difference, effect or association.
Effect size: a measure of strength of relationship b/w 2 variables in a population. The bigger the size of the effect in the population, the easier it will be to find out.
Sample size is directly related to the power of the study. The larger the sample size, the greater the power of the study to detect a significant difference, effect or association.

Factor: Sample size
• Desired precision: need larger sample to get smaller error
• Nature of analysis: complex multivariate statistics need larger samples
• Number of variables
• Heterogeneity: need larger sample to study more diverse population
• Sampling design: smaller if stratified, larger if cluster
• Statistical power: how likely it is that a statistical difference of a given magnitude will be detected, under the assumption that it does exist