
10/17/19

SAMPLING POPULATIONS


CENSUS

Ideally, we require the study of each member in the population of interest.

But usually
• not feasible due to cost, time, effort, accessibility of the whole population
• impossible to observe the entire population

WHY SAMPLE?

Imagine taking blood samples from all cattle all over the country and testing sera for the presence of antibodies (Abs) against enzootic bovine leukosis (EBL).

Imagine a questionnaire on EBL risk factors administered to cattle raisers in the entire country!

A census can be unrealistic, impractical, tedious, lengthy, extremely difficult and costly.


We ordinarily carry out investigation on samples, whereby a subset of the population is observed for the purpose of describing, estimating or inferring information about the total population from which the sample was selected.

POPULATION and SAMPLES

We ordinarily carry out investigation on:
POPULATION: all members in the population of interest
SAMPLE: a subset of the population

SAMPLING goes from the population to the sample. INFERENCE (describing, estimating, generalization*) goes from the sample back to the population.

                               SAMPLE     CENSUS
Feasibility                    Yes        Not always
Speed for obtaining results    Fast       Slow
Cost, manpower                 Less       More
*                              More       Less
Accuracy                       Very precise and reliable estimate of population parameters     Less precise & reliable

Census: each member of the population of interest is observed.

Sampling: a segment of the population, called a sample, is observed for the purpose of describing, estimating or inferring information about the total population from which the sample was selected.

• Allows obtaining results quickly
• Less expensive: obtaining information from a sample is cheaper and quicker than a census.
• Results may be accurate due to efficient use of time and effort
• Probability sa… estimate… inferences to other popula…

If representative and well conducted, a sample provides reasonably accurate and reliable information about the population. It allows the calculation of confidence limits and the use of statistical tests to make inference about the parameter of interest and to compare characteristics of different subpopulations.

*Sampling allows relatively more time and effort than a census to be spent on possible errors, biases, problems, design flaws, etc., thus better accuracy.


Target Population (Reference population)
Population to which study results may be extrapolated

Study Population (Source population)
Population from which the sample was taken

Sampling frame
Farm                     Address                      No. pigs
1. Macabenta Farms       8 Ipil St., Bgy. Tanuan      40
2. Golden Pig Farms      14 Buhay St., Bgy Diwa       3
3. Country Pig Farm      1 Ubos St., Bgy Carne        15
4. Constancio            11 Maaso St., Bgy Awa        9
5. Bountry Pig Farm      4 Maitim, Bgy More           8
6. Green Harvest         2 Maasim St., Bgy Less       7
7. Organic Farm          6 None St., Bgy Nani         10

Study Sample
The units finally selected from the study population for inclusion in the study
Those who took part in the study
Clearly defined individuals which you will observe and make measurements on; units from which statements regarding the source population (and possibly the target population) are made.

Sampling runs from the target population down to the study sample. Inference (Generalization) runs from the study sample back to the target population.

Representative*
*has all the important characteristics of the population from which it was obtained, i.e., it should be the same or as close as possible to the true value in the target population.

The degree of representativeness of the sample will depend on how accurate the sampling frame is and on getting the selected samples for the study.

Target population to study population: External validity (Generalizability)
Common sense judgement to assess representativeness of study population

Study population to study sample: Internal validity
Are the study conclusions correct for the animals in the sample?

For inference about the reference population to be valid, i.e., for generalizations to be made:
1. Study population must be similar to the reference population
2. Sample must be similar to the study population, i.e., representative*
Desirable properties of sampling frame

Pig holdings in Sta. Maria, Bulacan
As of October 4, 2019

Farm                     Address                      No. pigs
1. Macabenta Farms       8 Ipil St., Bgy. Tanuan      40
2. Golden Pig Farms      14 Buhay St., Bgy Diwa       3
3. Country Pig Farm      1 Ubos St., Bgy Carne        15
4. Constancio Farm       11 Maaso St., Bgy Awa        9
5. Bountry Pig Farm      4 Maitim, Bgy More           8
6. Green Harvester       2 Maasim St., Bgy Less       7
7. Organic Farm          6 None St., Bgy Nani         10

• Complete and accurate
• Up-to-date
• Unique identification for physically locating a unit & for sample selection
• Having supplementary information for sample selection and estimation
• No member of the study population should be excluded from the frame
• No unit that is not part of the population should be on the frame
• No member should be duplicated on the frame

The frame chosen will impact on the selected study population. For instance, if the investigator is interested in determining the humoral antibody response to FMD of vaccinated pigs in Batangas and if samples were drawn from a list of members of a National Hog Raisers' Association in that province, the information obtained will not mirror the true level of antibody response in the pig population of Batangas. The sample will exclude non-member pig farms, both commercial and backyard enterprises.

In practice, it is not always easy to obtain a sampling frame. A sampling frame may be available but can be incomplete.

Units not in the sampling frame have zero probability of selection. In this case, we cannot generalize our study findings to the population of interest even if a probability sampling method was used to select the sampling units.

TYPES OF SAMPLING FRAME

• The list of all sampling units in the population from which the sample will be drawn is called the sampling frame. There are two types:

Type of frame        Examples
List frame           List of slaughterhouses, households, clinics, farms, animal producers, telephone directory; agriculture cooperatives; or notational list, e.g., buffaloes arriving in the abattoir during a certain day.
Area frame           A list of geographic areas/administrative units in a hierarchical arrangement, e.g., list of regions, provinces, municipalities, villages; satellite images and cover, aerial photographs
Multi-stage frame    A list of geographic areas/administrative units in a hierarchical arrangement, e.g., list of regions, provinces, municipalities, villages; satellite images and cover, aerial photographs

The sampling frame:
• Is essential for selecting the elements of the target population
• Provides information for locating and identifying the units
• Provides quantitative information for estimation of population parameters based on sample observations.

DESIRABLE CHARACTERISTICS OF A SAMPLE

Representative of study population
Mirrors (is similar to) the important characteristics of the population from which it is drawn to allow inference back to that population

Produces unbiased estimates
Estimates are accurate

Produces estimates of known precision
Estimates are reliable and allow calculation of confidence limits for the true value

The goal of sampling is for the sample to mirror the characteristics in the target population.

The procedures for obtaining a sample should be designed to select samples that will accurately describe or represent the characteristics of the total population from which they are selected.

Sampling avoids the hassle and expense of measuring every member of the population.

SAMPLING METHODS

PROBABILITY
Each member has a known, non-zero probability of being selected. Same chance of being selected.
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster sampling
• Multistage sampling

NON-PROBABILITY
Based on 'good reasoning' of investigator. Unequal chance of being selected.
• Snowball sampling
• Convenience sampling
• Quota sampling
• Purposive sampling
• Haphazard sampling
• Judgement sampling

PROBABILITY
• Representative of population
• Each unit has a known non-zero chance of being selected.
• Obtain estimates that are as close as possible to the true value of the target population
• Formal, impartial way of sampling
• Sampling error can be calculated

NON-PROBABILITY
• Unrepresentative of the population
• Each unit doesn't have an equal chance of being selected
• Sampling error remains unknown.

BOX 1. Probability vs. Non-probability sampling

Probability sampling is a formal, impartial way of sampling in which every unit in the population has a known, equal non-zero chance of being included in the sample, and that chance can be quantified. As such, representative samples are obtained, and results of the study can be inferred to the population from which the sample was derived.
• Provides prevalence estimates and CI (to show how confident we are that our estimates are correct).
• It is time consuming, expensive and logistically more demanding.

In non-probability sampling, every unit in a population does not have an equal chance of being selected. Sampling of units is left to the investigator. Therefore, it is not possible to estimate population parameters from sample statistics and to know how different the sample is from the population. Hence, the findings cannot be generalized from the sample to the target population.
• Easier to perform and cheaper
• Not representative, cannot provide prevalence estimates

In probability sampling, every animal in the population has a fixed and known probability of being selected to be part of the sample.

First, this method allows the investigator to generalize the results from the sample to the population, which is usually the major reason for doing a study. Second, it can tell the researcher the margin of error that could be expected from these estimates, that is, how far off the estimates can be. Most statistical tests are based on the assumption of some sort of random sampling.

When probability sampling is not used, the ability to generalize the results from the sample to the population is questionable.

NON-PROBABILITY SAMPLING
Carries a great risk of producing biased estimates

• Snowball sampling: chain referral sampling
• Convenience sampling: easy, quick or cheap to collect
• Quota sampling
• Purposive sampling: sampling for a specific purpose (typical case, critical case, extreme, homogenous)
• Haphazard sampling: no fixed purpose or reason, in an attempt to imitate random sampling
• Self-selection: sampling volunteers
• Judgement sampling: based on own existing knowledge or professional judgement

BOX 2. When non-probability sampling methods may be useful

The following situations may justify a non-probability sampling method:
• Demonstrate freedom from disease
• Determine if a disease is present in a population and that it is not possible to … disease, or compute estimate

Samples include those animals most likely to be infected so as to increase your likelihood of detecting disease.

In the above situations, statistical inference about the target population is NOT required, so that a representative sample is less important.

Non-probability sampling has been used for preliminary and exploratory research or where it is the only feasible sampling method.

PROBABILITY SAMPLING
Each member has a known, non-zero probability of being selected.
Same chance of being selected

Individual
• Simple random sampling
• Systematic random sampling: sampling at specific interval

Group
• Stratified random sampling: sampling strata
• Cluster sampling: sampling groups

Group & individual
• Multistage sampling: sampling in stages

Simple Random Sampling
The simplest probability sampling

Sampling units are selected using random numbers or some other random selection technique (dice, cards) so that each member of the population has the same probability of being selected.

SRS requires information on population size and sampling frame.

SRS of n = 5 from N = 10; sampling fraction (n/N) = 0.5

Sampling frame based on cow numbers
Cow no.     1  2  3  4  5  6  7  8  9  10
Selected    1  0  0  0  1  1  0  1  1  0
If random() ≥ 0.5 then 1
Samples consist of cattle nos. 1, 5, 6, 8, 9

EXAMPLE 1. Selecting samples by SRS using Table of Random Numbers

To select a SRS of 18 cattle from a herd of 900 cattle, 18 random numbers are generated between 1 and 900 using a Table of Random Numbers or computer-generated random numbers. A list of animals (sampling frame) is created and those corresponding to the chosen numbers are then selected. The procedure below outlines the selection of random numbers from a table:
1. Randomly select a starting point and the sampling direction. Suppose the random start was the 5th entry, 04181, and you choose to move across rows from left to right.
2. Choose a three-digit number not greater than 900 each time until you obtain the desired sample size.
Hence, the cattle included in your study are those with ear tag nos.:
041, 817, 801, 278, 705, 861, 837, 154, 535, 066, 126, 020, 049, 085, 851, 087, 143 and 810.
Box 1. Generating random numbers using Microsoft Excel

Open the program and type into the cell the following characters:
= RAND()

Press ENTER key and it will produce a random number in that cell.

Copy the formula throughout a selection of cells and it will produce random numbers between 0 and 1.

To obtain random numbers without the decimal and not more than the population size, for example, if you wanted random numbers from 1 to 900, you could enter the following characters:
= INT(900*RAND())+1

The INT eliminates the digits after the decimal, the 900* creates the range to be covered, and the +1 sets the lowest number in the range.

Systematic Random Sample

Random selection of sampling units at a pre-determined interval after randomly selecting the first member as the starting point. More practical than SRS, especially when a sampling frame is not available. Prior identification of animals is not necessary. For systematic random sampling, we need to know the approximate population size.

• Ensures sample evenly distributed over study population
• Problems
  – if group is very heterogeneous and sampling fraction large
  – if ordering related to sampling step

Systematic sampling w… inappropriate, as all stu… week (e.g., Tuesdays on… slaughter cattle from B…
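Systematic selection at a fixed interval with a random start can be sketched in Python as follows. This is a minimal illustration, assuming the interval is the population size divided by the sample size (rounded down) and that animals are counted in the order they are presented; the function name and the herd of 200 with a sample of 20 are illustrative.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Return (interval, random start, selected positions), positions counted from 1."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    interval = population_size // sample_size  # sampling interval N/n
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random start within the first interval
    positions = [start + i * interval for i in range(sample_size)]
    return interval, start, positions


# Illustrative numbers: 20 animals from a herd of 200
interval, start, positions = systematic_sample(200, 20, seed=7)
print(interval, start, positions)
```

Only the starting point is random; every later position follows from it, so the first animal must never be chosen deliberately.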

EXAMPLE 2. Selecting samples by systematic random sampling

Suppose you determined that 20 goats will be needed to estimate the prevalence and intensity of haemonchosis in a herd of 200 goats. You learned that goats do not have ear tags, hence, you decided to carry out systematic sampling by selecting the animals in a chute. The following steps can be followed:

1. Determine the sampling interval, population size/desired sample size (N/n): 200/20 = 10. Thus, every 10th animal is selected until the total sample is reached.
2. Decide the random starting point, selecting a number between 1 and 10. The numbers 1 to 10 are each written on a piece of paper, and a number is drawn. Suppose the random number "4" was picked.
3. Instruct the owner to take the animals in the herd, one at a time, through the chute. The 4th animal passing through the chute would be the first animal chosen, and then every 10th animal after it. Note that if the first animal is chosen deliberately, the technique becomes a form of non-probability sampling.
4. Starting from the 4th goat that passed through the chute, select every 10th animal (the 14th, 24th, and so on).

Stratified Random Sampling

Selection of sampling units from separate, non-overlapping subgroups (strata) according to some important characteristic associated with the outcome of interest by a probability sampling method until the required sample size is attained.

It requires not only a sampling frame of primary sampling units to select from, but also one for each stratum. When the number of units drawn from the strata is proportional to the stratum size, the method is called proportionate stratified random sampling. The stratum sample size uses the formula:

Stratum sample size = (stratum size x sample size) / population size

This strategy is used when the herd size is known in advance: large herds are more likely to be selected than small herds. In disproportionate stratified sampling, the numbers of units are drawn using different sampling fractions in the strata. A larger percentage is taken from those bigger in size than others. Stratified allocation is illustrated in the next slide.

With SRS, we may e… subgroup or anothe… us to analyze the ef… have equal number… maximize the powe… Conversely, we may… equivalent to the ge… variables, such as a… In stratified random… various levels or str… from the stratum in…
• Reduces variance…
• Equal (same n… stratum) alloca…
• Allows easy ac… interest


EXAMPLE 3. Stratified random sampling using proportionate allocation

Suppose you would like to estimate the seroprevalence of brucellosis in a goat herd and determined that a sample of 24 goats is needed. You read in a journal that the burden of the disease is influenced by breed. You decided to carry out stratified random sampling by (1) listing the goats belonging to each breed and (2) taking a simple random sample of animals from each breed. The sample size in each stratum is obtained by multiplying the total sample size by the proportion of the herd belonging to each breed.

Stratify by breed: 40% Breed A, 60% Breed B

Sample size (n = 24):
Breed A: 40% x n = 10
Breed B: 60% x n = 14

Cluster sampling

In cluster sampling, the population is divided into groups or clusters. Examples of clusters are pens, litters, farms, herds, flocks and ponds. A number of clusters are selected randomly, and then all units within the selected clusters are sampled.

After the clusters are established, a systematic, simple or stratified random sample of the clusters is drawn and the members of the chosen clusters are sampled. The selected clusters are used to represent the population.
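The stratum sample sizes in the stratified example above (24 goats; 40% Breed A, 60% Breed B) follow from stratum size x sample size / population size. A minimal Python sketch of that allocation is given below, assuming a hypothetical herd of 100 goats (40 Breed A, 60 Breed B) and a largest-remainder rule for rounding, which is an assumption not stated in the slides.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size.

    Fractional allocations are rounded with the largest-remainder rule so that
    the stratum sample sizes always add up to total_sample.
    """
    population = sum(stratum_sizes.values())
    allocation = {}
    remainders = []
    for name, size in stratum_sizes.items():
        quota, remainder = divmod(size * total_sample, population)
        allocation[name] = quota
        remainders.append((remainder, name))
    shortfall = total_sample - sum(allocation.values())
    for _, name in sorted(remainders, reverse=True)[:shortfall]:
        allocation[name] += 1
    return allocation


# Hypothetical herd composition matching the 40% / 60% split
herd = {"Breed A": 40, "Breed B": 60}
allocation = proportionate_allocation(herd, 24)
print(allocation)  # {'Breed A': 10, 'Breed B': 14}
```

A simple random sample of the allocated size would then be drawn within each breed.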


Multistage Sampling

Sampling is done randomly in two or more stages from within large groups; the sampling unit and the method of selection are usually different at each stage. At the first stage, the sampling units may be villages, selected using some method. For the second stage, the sampling frame could be the farms in the sample of first-stage units, with a sampling method that may be different from the one used for selecting the first-stage units. This process could be carried to a third stage by selecting animals from the sample of farms.

Multistage sampling is particularly suitable in very large populations. This is often the case when one wants to obtain information on farms from different villages, and these villages have to be selected from different geographic units.

EXAMPLE 2. Two-stage cluster sampling
[Figure: two-stage cluster sampling design]
Source: FAO.


Two-stage sampling

Suppose you want to determine the seroprevalence of hemorrhagic septicemia in carabaos in municipality X. Because of logistical difficulties, you decided to conduct sampling in two stages. Based on prior calculation, you determined that you have to sample 3 barangays and 15 carabaos from each selected barangay. The sampling design is shown on the right.
Source: FAO

Advantages and Disadvantages

Simple random sampling (SRS)
Advantages:
• Easy to do
Disadvantages:
• Requires a complete, updated sampling frame of sampling units
• Can be unfeasible for large populations and costly

Systematic random sampling
Advantages:
• Easy to draw and a lot faster than using SRS
• No need for a complete prior knowledge of sampling frame before sampling
Disadvantages:
• Danger that the sampling frame may arrange subjects in a pattern.

Stratified random sampling
Advantages:
• More precise estimates of the parameter are obtained compared to SRS
• Guarantees representation of all strata in the sample
• Obtains estimates of outcomes in each stratum
Disadvantages:
• Expensive and complex
• Requires advanced knowledge of the status of sampling units in the stratum and the stratification factor

Cluster sampling
Advantages:
• Cheap, administratively efficient
• No need of sampling frame of all units in the population
Disadvantages:
• Less precise estimate of the parameter is obtained
• Requires more sampling units than SRS

Multistage sampling
Advantages:
• A complete sampling frame is only needed in the first stage; sampling frames are only required for selected units in subsequent sampling stages.
• Flexible and less costly than SRS
Disadvantages:
• Less precise than SRS
• Requires more sampling units than SRS
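The two-stage design in the carabao example (3 barangays, then 15 carabaos from each selected barangay) can be sketched in Python as below. This is an illustration only: it assumes simple random selection at both stages, and the frame of 10 barangays with 40 carabaos each, the identifiers, and the function name are all hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select units in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # first-stage units
    sample = {}
    for cluster in chosen:
        units = clusters[cluster]
        k = min(n_units, len(units))  # small clusters are sampled completely
        sample[cluster] = sorted(rng.sample(units, k))
    return sample


# Hypothetical frame: 10 barangays, each listing 40 carabaos
municipality = {
    f"Barangay {b}": [f"B{b}-C{i}" for i in range(1, 41)] for b in range(1, 11)
}
sample = two_stage_sample(municipality, n_clusters=3, n_units=15, seed=1)
for barangay, carabaos in sample.items():
    print(barangay, len(carabaos))
```

Note that a list of carabaos is needed only for the three selected barangays, which is the practical advantage of multistage designs.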


BIAS IN SAMPLING
Bias is a tendency of an estimate to deviate in one direction from the true value. Bias in sampling results in distortion of study results. It can occur during selection of sampling units and in measurement of exposure or outcome. Examples are listed below:

Sampling bias can be due to:
• Incomplete sampling frame
• Improper sampling
• Incomplete coverage
• Studying volunteers only
• Missing cases of short duration
• Non-response
• Not representative of the population

Measurement bias occurs when:
• Respondents do not tell the truth
• Respondents do not always understand the questions.
• Respondents forget
• Respondents give different answers to different interviewers.
• Respondents may say what they think the interviewer wants to hear.

DECIDING ON SUITABLE SAMPLE SIZE

Having de… select ou… now have t…


Sample size estimation
Consider the following factors:

What are we trying to estimate?
– Means (e.g. the average number of cases per area)
– Proportions (e.g. the proportion of infected farms in an area)
– Presence of disease
– Relationship between Exposure and Disease (E & D)
– Effects or differences within a population

What sampling strategy are we using?
– Simple random
– Systematic random
– Stratified random
– Cluster
– Multistage

How many populations?
– One population
– Two or more populations

• Formulas
• Published tables
• Hand calculator
• Software packages
• Statistician/Epidemiologist

Examples of computer software
• WinEpiscope
• WinPEPI
• OpenEpi
• Epiinfo
• FreeCalc
• EpiTools

Enter formula in Excel
Use EpiInfo – Statcalc module

Sample size for estimating:

Single mean: $n = \frac{Z_\alpha^2 \sigma^2}{d^2}$
Single proportion: $n = \frac{Z_\alpha^2 PQ}{d^2}$, where Q = (1 - P)

Z = Multiplier for the confidence level
σ = Population standard deviation
P = Population proportion
d = Tolerable error

Multiplier    Confidence level
1.645         90%
1.96          95%
2.576         99%
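As a rough illustration of the single-mean formula, n = Z² σ² / d², the Python sketch below evaluates it with the multipliers listed above. The standard deviation of 10 and tolerable error of 2 are hypothetical values chosen only to show the arithmetic, and the function name is an assumption.

```python
# Multipliers (Z) for the confidence levels listed on the slide
Z_MULTIPLIER = {90: 1.645, 95: 1.96, 99: 2.576}


def sample_size_single_mean(sd, tolerable_error, confidence=95):
    """Unrounded sample size for estimating a single mean: Z^2 * sd^2 / d^2."""
    z = Z_MULTIPLIER[confidence]
    return z ** 2 * sd ** 2 / tolerable_error ** 2


# Hypothetical inputs: standard deviation 10, tolerable error 2, 95% confidence
n_mean = sample_size_single_mean(sd=10, tolerable_error=2)
print(n_mean)  # about 96.04, before rounding to a whole number of animals
```

Halving the tolerable error quadruples the required sample size, because d enters the formula squared.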


Box 1. Sample size for single proportion

• We want to estimate the prevalence of Disease Y in a 400-cattle herd. We have no idea of the expected prevalence in this population, so P = 0.50 is used. We would like our estimate to be within +/- 5% at the 95% confidence level. Thus:

$n = \frac{Z_\alpha^2 PQ}{d^2}$

Confidence level      95%     99%     99.9%
Significance level    5%      1%      0.1%
Z score               1.96    2.58    3.29

$n = \frac{1.96^2 \times 0.50 \times (1 - 0.50)}{(0.05)^2} = 384$ cattle

$n_{(c)} = \frac{384}{1 + 384/400} = 196$ cattle

Table of sample size for estimation of proportion in a population at fixed levels of confidence and accuracy and varying expected proportion

              Level of confidence
Expected      90%                      95%                      99%
prevalence    Desired absolute         Desired absolute         Desired absolute
              precision                precision                precision
              10%    5%     1%         10%    5%     1%         10%    5%     1%
10%           24     97     2435       35     138    3457       60     239    5971
20%           43     173    4329       61     246    6147       106    425    10616
30%           57     227    5682       81     323    8067       139    557    13933
40%           65     260    6494       92     369    9220       159    637    15923
50%           68     271    6764       96     384    9604       166    663    16587
60%           65     260    6494       92     369    9220       159    637    15923
70%           57     227    5682       81     323    8067       139    557    13933
80%           43     173    4329       61     246    6147       106    425    10616
90%           24     97     2435       35     138    3457       60     239    5971
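The single-proportion calculation in Box 1 can be checked with a short Python sketch. It assumes the formula n = Z² P(1 − P) / d² and rounds to the nearest whole animal, as the box does; the function name is hypothetical. With Z = 1.96 it reproduces the 95% entries of the table, while the 90% and 99% columns can differ by one animal depending on how many decimals of the multiplier are used.

```python
# Multipliers (Z) for common confidence levels
Z_MULTIPLIER = {90: 1.645, 95: 1.96, 99: 2.576}


def sample_size_proportion(expected_prevalence, precision, confidence=95):
    """Unrounded sample size for estimating a proportion: Z^2 * P * (1 - P) / d^2."""
    z = Z_MULTIPLIER[confidence]
    p = expected_prevalence
    return z ** 2 * p * (1 - p) / precision ** 2


# Box 1: unknown prevalence (P = 0.50), +/- 5% precision, 95% confidence
n_prop = sample_size_proportion(0.50, 0.05)
print(round(n_prop))  # 384
```

Using P = 0.50 when the prevalence is unknown gives the largest value of P(1 − P), and therefore the most conservative sample size.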


Box 2. Reducing the required sample size using finite population correction factor

When sampling a relatively small population, the required sample size can be adjusted using the finite population correction factor:

$n_{(c)} = \frac{n}{1 + f}$

where n = required sample size and the sampling fraction $f = \frac{n}{N}$ (sample size divided by population size N).

If known, indicate population size

The correction factor is applied only in descriptive studies using a simple or stratified random sample and when the sampling fraction (n/N) is greater than 10%.
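A minimal Python sketch of the finite population correction follows, assuming the corrected size is n / (1 + n/N); this form reproduces the 196 animals reported for the 400-cattle herd. The function name and the way the 10% rule is applied are illustrative assumptions.

```python
def finite_population_correction(n, population_size, threshold=0.10):
    """Return the corrected sample size n / (1 + n/N).

    The correction is applied only when the sampling fraction n/N exceeds
    the threshold (10% by default); otherwise n is returned unchanged.
    """
    sampling_fraction = n / population_size
    if sampling_fraction <= threshold:
        return n
    return n / (1 + sampling_fraction)


# 384 animals required before correction, herd of 400 cattle
n_corrected = finite_population_correction(384, 400)
print(round(n_corrected))  # 196
```

For a very large population the sampling fraction falls below 10% and the uncorrected sample size is kept.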


Applying the finite population correction

Reduce n if n > 10% of N; here n/N = 384/400 = 96%.
Thus,
$n_{(c)} = \frac{384}{1 + 0.96} = 196$ animals are required for SRS or stratified sample in a descriptive study

Box 4. Sample size for detecting disease or confirming absence of disease in a finite population

$n = \{1 - (1 - P_1)^{1/D}\} \times \{N - D/2\} + 1$

n = Required sample size
N = Population size
D = Estimated minimum number of diseased animals in the population (population size x minimum expected prevalence)
P1 = Confidence level (usually 95%)


How many animals are needed to detect at least one positive animal in a population of 250 at 95% level of confidence and 10% as expected proportion of positives?

EXAMPLE 4. Sample size calculation to detect disease

Suppose we have a leptospirosis-free herd of 250 pigs (N) and we want to be 95% (P1) sure that it is free from Leptospira pomona. Based on data from an endemic area, L. pomona normally affects 10% of the population (D). We want to know how many pigs are required for bacterial culture. Thus:

D = 0.10 x 250 = 25
$n = \{1 - (1 - P_1)^{1/D}\} \times \{N - D/2\} + 1$
$n = \{1 - (1 - 0.95)^{1/25}\} \times \{250 - 25/2\} + 1$
= 0.1129 x 238.5
= 27

We need to sample 27 pigs. If none of the 27 pigs tested had L. pomona, we can be 95% sure that L. pomona is not present in the herd.
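The detection formula can be evaluated in Python as sketched below. This assumes the formula is read exactly as printed, with the + 1 added after the product; under that reading the result for the example is about 27.8, slightly above the 27 pigs reported, because the worked arithmetic groups the + 1 with N − D/2 (giving 238.5). The function name is hypothetical.

```python
def sample_size_to_detect(population_size, expected_prevalence, confidence=0.95):
    """Evaluate n = {1 - (1 - P1)^(1/D)} x {N - D/2} + 1 without rounding.

    D is the estimated minimum number of diseased animals, taken as
    population size x minimum expected prevalence.
    """
    d = population_size * expected_prevalence
    detection_term = 1 - (1 - confidence) ** (1 / d)
    return detection_term * (population_size - d / 2) + 1


# Example 4: herd of 250 pigs, 10% expected prevalence, 95% confidence
n_detect = sample_size_to_detect(250, 0.10, 0.95)
print(round(n_detect, 2))  # about 27.82
```

The required number falls as the expected prevalence rises, since a positive animal is then easier to find.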


Factors to consider in Sample Size Determination

$n = \frac{Z_\alpha^2 PQ}{d^2}$        $n = \frac{Z_\alpha^2 \sigma^2}{d^2}$        $n = \frac{Z_\alpha^2 \, 2PQ}{(P_1 - P_2)^2}$

• Z: desired level of confidence
• PQ or σ²: expected variation of the variable in the population
• d: required precision

Power increases with sample size: the larger the sample size, the greater the power of the study to detect a significant difference, effect or association.

Factor                   Sample size
Desired precision        Need larger sample to get smaller error
Nature of analysis       Complex multivariate statistics need larger samples
Number of variables
Heterogeneity            Need larger sample to study more diverse population
Sampling design          Smaller if stratified, larger if cluster
Statistical power        How likely it is that a statistical difference of a given magnitude will be detected, under the assumption that it does exist
Nonstatistical concerns  Cost, time, facilities and manpower

Design effect: often between 1 and 2.
> accurate but costly; can…
Power: probability that the test will correctly identify a significant difference, effect or association.
Effect size: a measure of the strength of the relationship between 2 variables in a population. The bigger the size of the effect in the population, the easier it will be to find.
Sampling method: larger for cluster sampling
Multiple variables

If sample size is too small
• Even a well-conducted study may fail to answer its research question.
• May fail to detect important effects or associations
• May estimate this effect or association imprecisely

If sample size is too large
• Study will be difficult & costly
• Time constraint
• Loss of accuracy
