
Chapter 4

Sampling and sampling distribution

Prepared by
Hirbo Shore (MPH, Assist. Professor),
School of Public Health, CHMS-HU
January 2021

• Sample survey methodology is often used to obtain information
about a large aggregate or population by selecting and
measuring a sample from the population.

• Because characteristics vary among items in the
population, scientific sample designs are applied in the sample selection
process to reduce the risk of a distorted view of the population,
and to make inferences about the population based on the
information from the sample survey data.

• In order to make statistically valid inferences for the
population, the sample design must be incorporated in the data
analysis.

• Sampling enables us to estimate characteristics of a population

by directly observing a portion of the entire population.
• The main interest is how this information can be applied to the
entire population.

Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of
collecting the information.
• Reduced cost: Sampling reduces demands on resources such
as finance, personnel, and material.
• Greater accuracy: Sampling may lead to better accuracy in
collecting data.
• Sampling error: Precise allowance can be made for sampling
error.
• Greater speed: Data can be collected and summarized more
quickly.

Disadvantages of sampling:
• There is always a sampling error.

• Sampling may create a feeling of discrimination within the
population.

• Sampling may be inadvisable where every unit in the
population is legally required to have a record.

Sampling technique

• There are two different approaches to sampling in surveys:


– Nonprobability sampling

– Probability sampling

Probability sampling
• Gives each element a known non-zero chance of being
included in the sample.

• This method is closer to a true representation of the
population.

• It can be difficult to use because of the sample size and the cost
involved, but the generalizations that come from it are more
likely to be close to a true representation of the population.

• The following are the most common probability sampling
methods:
– Simple random sampling

– Systematic random sampling

– Sampling with probability proportional to size

– Stratified random sampling

– Cluster sampling

– Multi‐stage sampling

Simple random sampling

• Each member of a population has an equal chance of being

included in the sample.

• A list of serially numbered sampling units (1 to N) is prepared, and then a
random number table, computer-generated numbers, or the lottery
method is used to select n individuals out of N.

• Example
– Suppose your school has 500 students and you need to
conduct a short survey on the quality of the food served in
the cafeteria.
– You decide that a sample of 50 students should be
sufficient for your purposes.
– In order to get your sample, you assign a number from 1 to
500 to each student in your school.
– To select the sample, you use a table of randomly generated
numbers.

• Pick a starting point in the table (a row and column number)
and look at the random numbers that appear there. In this case,
since the data run into three digits, the random numbers would
need to contain three digits as well.

• Ignore 000 and all random numbers greater than 500 because they do not
correspond to any of the students in the school.

• The first 50 different numbers between 001 and 500 make up

your sample.
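
The cafeteria example can be sketched in Python. This is only an illustration: it assumes the standard-library `random` module stands in for the random number table, and the seed and variable names are arbitrary choices rather than part of the original example.

```python
import random

N = 500   # students numbered 1..N
n = 50    # desired sample size

rng = random.Random(2021)        # fixed seed only so the run is repeatable (assumption)
student_ids = range(1, N + 1)    # the sampling frame: 1, 2, ..., 500

# random.sample draws without replacement, so all 50 numbers are different,
# mirroring "the first 50 different numbers between 001 and 500".
sample = sorted(rng.sample(student_ids, n))

print(len(sample), sample[:5])
```

Every subset of 50 students has the same chance of being drawn, which is the defining property of simple random sampling.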
Systematic random sampling
• Also called interval sampling: there is a
gap, or interval, between each selected unit in the sample.

• Steps in systematic random sampling

1. Number the units on your frame from 1 to N (where N is
the total population size).
2. Determine the sampling interval (K) by dividing the
number of units in the population by the desired sample
size.
3. Select a number between one and K at random. This
number is called the random start and would be the first
number included in your sample.
4. Select every Kth unit after that first number.


• To select a sample of 100 from a population of 400, you would

need a sampling interval of 400 ÷ 100 = 4.

• Therefore, K = 4.

• You will need to select one unit out of every four units to end

up with a total of 100 units in your sample.

• Select a number between 1 and 4 from a table of random
numbers.

• If you choose 3, the third unit on your frame would be the first
unit included in your sample;

• The sample might consist of the following units to make up a

sample of 100: 3 (the random start), 7, 11, 15, 19...395, 399
(up to N, which is 400 in this case).

• With a systematic sample approach there are only four
possible samples that can be selected, corresponding to the
four possible random starts:
• 1, 5, 9, 13...393, 397

• 2, 6, 10, 14...394, 398

• 3, 7, 11, 15...395, 399

• 4, 8, 12, 16...396, 400

• Each member of the population belongs to only one of the four
samples and each sample has the same chance of being
selected.
• Each unit therefore has a one in four chance of being selected in the
sample, the same probability as under simple random sampling (SRS) of 100 units out of 400.
• The main difference is that with SRS, any combination of 100
units would have a chance of making up the sample, while
with systematic sampling, there are only four possible
samples.
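
The steps of systematic sampling can be sketched as follows. This is a minimal illustration that assumes N is an exact multiple of n; the function name `systematic_sample` and the seed are hypothetical choices, not part of the original material.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return unit numbers chosen by systematic sampling from a frame 1..N."""
    K = N // n                   # sampling interval (assumes N is a multiple of n)
    start = rng.randint(1, K)    # random start between 1 and K, inclusive
    return [start + i * K for i in range(n)]

# The four possible samples for N = 400, n = 100, one per random start.
all_samples = [[s + i * 4 for i in range(100)] for s in range(1, 5)]

sample = systematic_sample(400, 100, random.Random(3))
print(sample[:5], "...", sample[-2:])
```

Whatever the random start, the result is always one of the four lists in `all_samples`, and together those lists cover every unit from 1 to 400 exactly once.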

Sampling with probability proportional to size

• Commonly abbreviated as PPS sampling

• Probability sampling requires that each member of the survey
population have a chance of being included in the sample, but
it does not require that this chance be the same for everyone.

• If there is information available on the frame about the size of
each unit and if those units vary in size, this information can
be used in the sampling selection in order to increase the
efficiency.
• With this method, the bigger the size of the unit, the higher the
chance it has of being included in the sample.

• For this method to bring increased efficiency, the measure of

size needs to be accurate.

Steps in PPS
1. List all clusters with their population size.

2. Calculate the cumulative population size.

3. Calculate the sampling interval K by dividing the total
population size by the number of clusters to be sampled, say m.

4. Randomly choose a number between 1 and K, say j.

5. The kebeles/clusters whose cumulative range contains the jth,
(j+K)th, …, (j+(m-1)K)th unit will be included in the sample.

Cluster   Size   Cum.   Interval
1         1028   1028   1-1028
2          555   1583   1029-1583
3          390   1973   1584-1973
4         1309   3282   1974-3282
5          698   3980   3283-3980
6          907   4887   3981-4887
7          432   5319   4888-5319
8          897   6216   5320-6216
9          677   6893   6217-6893
10         501   7394   6894-7394
11        1094   8488   7395-8488
12         668   9156   8489-9156
Total     9156

• If the number of clusters needed is 5, the sampling interval is 9156/5 ≈ 1831.
• Draw a number between 1 and 1831, let's say 480.
• Start from the village (cluster) whose interval includes 480 and draw the remaining clusters by repeatedly adding the sampling interval: 480, 2311, 4142, 5973, 7804.
• Accordingly, clusters 1, 4, 6, 8 and 11 will be included.
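
The PPS selection can be checked with a short Python sketch. It uses the twelve cluster sizes from the table and the random start 480; the function name `pps_systematic` is hypothetical, and rounding the interval down to 1831 is an assumption taken from the worked example.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [1028, 555, 390, 1309, 698, 907, 432, 897, 677, 501, 1094, 668]
cumulative = list(accumulate(sizes))     # 1028, 1583, ..., 9156

def pps_systematic(cumulative, n_clusters, start):
    """Select clusters whose cumulative range contains start, start+K, ..."""
    K = cumulative[-1] // n_clusters     # sampling interval, rounded down as in the example
    points = [start + i * K for i in range(n_clusters)]
    # bisect_left finds the first cluster whose cumulative total is >= the point
    return [bisect_left(cumulative, p) + 1 for p in points]

selected = pps_systematic(cumulative, n_clusters=5, start=480)
print(selected)   # [1, 4, 6, 8, 11]
```

The selection points are 480, 2311, 4142, 5973 and 7804. Larger clusters span wider cumulative ranges, so they are more likely to contain a selection point.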

Stratified random sampling

• It is used when the population is known to be
heterogeneous with regard to some factors, and those factors are
used for stratification.

• The population is divided into homogeneous, mutually

exclusive groups called strata, and then independent samples
are selected from each stratum.


• Any of the sampling methods can be used to sample within

each stratum.

• A population can be stratified by any variable that is available

for all units on the sampling frame prior to sampling (e.g., age,
sex, province of residence, income, etc.).

Reasons for stratified sampling

• more efficient sampling strategy.

• ensures an adequate sample size for sub‐groups in the

population of interest

• Each stratum becomes an independent population and you will

need to decide the sample size for each stratum.
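
A minimal sketch of stratified sampling follows. It assumes proportional allocation as one possible way to decide the stratum sample sizes (the slides do not prescribe a rule) and simple random sampling within each stratum; the frame, the stratum names and `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n, rng):
    """Draw an independent simple random sample from each stratum.

    strata: dict mapping stratum name -> list of unit ids
    n: overall sample size, split proportionally to stratum size (an assumption)
    """
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)     # proportional allocation
        sample[name] = rng.sample(units, n_h)   # SRS within the stratum
    return sample

# Hypothetical frame: 300 female and 200 male students.
strata = {
    "female": [f"F{i}" for i in range(1, 301)],
    "male": [f"M{i}" for i in range(1, 201)],
}
sample = stratified_sample(strata, n=50, rng=random.Random(1))
print({name: len(units) for name, units in sample.items()})
```

With other stratum sizes the rounded allocations may not add up exactly to n, so a real design would adjust them or choose a different allocation rule.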

Cluster sampling
• Sometimes it is too expensive to spread a sample across the
population as a whole.

• Travel costs can become expensive if interviewers have to

survey people from one end of the country to the other.

• To reduce costs, researchers may choose a cluster sampling
technique.

• The clusters should be homogeneous, that is, similar to one another.

Steps in cluster sampling
• Cluster sampling divides the population into groups or
clusters.
• A number of clusters are selected randomly to represent the
total population, and then all units within selected clusters are
included in the sample.
• No units from non‐selected clusters are included in the
sample; they are represented by those from selected clusters.
• This differs from stratified sampling, where some units are
selected from each group.


• In a school based study, we assume students of the same

school are homogeneous.

• We can select randomly sections and include all students of the

selected sections only
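
The school example can be sketched in Python as one-stage cluster sampling. The section names, section sizes and the function `cluster_sample` are hypothetical and serve only as an illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

# Hypothetical school: 6 sections of unequal size.
sizes = {"A": 40, "B": 35, "C": 42, "D": 38, "E": 41, "F": 36}
sections = {s: [f"{s}{i}" for i in range(1, k + 1)] for s, k in sizes.items()}

chosen, students = cluster_sample(sections, n_clusters=2, rng=random.Random(7))
print(chosen, len(students))
```

Because whole sections are taken, the number of students in the sample depends on which sections happen to be drawn.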

• Cost reduction is one reason for using cluster sampling.

• It creates 'pockets' of sampled units instead of spreading the
sample over the whole territory, which reduces cost.

• Another reason is that sometimes a list of all units in the

population is not available, while a list of all clusters is either

available or easy to create.


• The main drawback is a loss of efficiency when compared

with simple random sampling.

• It is usually better to survey a large number of small clusters

instead of a small number of large clusters
• This is because neighboring units tend to be more alike,
resulting in a sample that does not represent the whole
spectrum of opinions or situations present in the overall
population.

• Another drawback to cluster sampling is that you do not have
total control over the final sample size.

• The final size may be larger or smaller than you expected.

Multi‐stage sampling

• Involves picking a sample from within each chosen cluster,

rather than including all units in the cluster.
• Requires at least two stages
– In the first stage, large groups or clusters are identified and
selected.
– These clusters contain more population units than are
needed for the final sample.
– In the second stage, population units are picked from within
the selected clusters for a final sample.
– If more than two stages are used, the process of choosing
population units within clusters continues until there is a
final sample.
• It has the benefit of a more concentrated sample, which reduces cost.

• However, the sample is not as concentrated as in cluster sampling,
and the required sample size is still larger than for a simple random
sample.

• A list of all of the units in the population is not needed. All you
need is a list of clusters and a list of the units in the selected
clusters.

• More information is needed in this type of sample than what is

required in cluster sampling.

• However, multi‐stage sampling still saves a great amount of

time and effort by not having to create a list of all of the units
in a population.
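
A two-stage version can be sketched as below. The village and household labels, the stage sizes and the function `two_stage_sample` are hypothetical; both stages use simple random sampling, which is an assumption made only for this sketch.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units within each chosen cluster."""
    stage1 = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in stage1:
        units = clusters[name]     # a unit list is needed only for chosen clusters
        k = min(n_per_cluster, len(units))
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame: 10 villages with 100 households each.
villages = {f"V{v}": [f"V{v}-H{h}" for h in range(1, 101)] for v in range(1, 11)}
sample = two_stage_sample(villages, n_clusters=3, n_per_cluster=20, rng=random.Random(11))
print({v: len(h) for v, h in sample.items()})
```

Only the three selected villages need a household list, which is where the saving in listing effort comes from.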

Non‐probability sampling
• These methods assume that characteristics are evenly distributed
within the population.
• Under that assumption, any sample would be representative and
the results would be accurate.
• Because elements are chosen arbitrarily, there is no way to estimate the
probability of any one element being included in the sample.

• No assurance is given that each item has a chance of being

included, making it impossible either to estimate sampling
variability or to identify possible bias

• Reliability cannot be measured in nonprobability sampling; the

only way to address data quality is to compare some of the
survey results with available information about the population

• There is no assurance that the estimates will meet an

acceptable level of error.

• Researchers are reluctant to use these methods because there is

no way to measure the precision of the resulting sample.


• Despite these drawbacks, non‐probability sampling methods

can be useful when descriptive comments about the sample
itself are desired.

• They are quick, inexpensive and convenient.

• There are also other circumstances, such as some research settings, when it
is unfeasible or impractical to conduct probability sampling.

Types of nonprobability sampling
1. Convenience or haphazard sampling

2. Volunteer sampling

3. Judgment sampling

4. Quota sampling

Convenience sampling

• Also referred to as haphazard or accidental sampling.

• It is not normally representative of the target population

because sample units are only selected if they can be accessed
easily and conveniently.
• Easy to use, but that advantage is greatly offset by the
presence of bias.
• Although useful applications of the technique are limited, it
can deliver accurate results when the population is
homogeneous.

• For example;
• a scientist could use this method to determine whether a
lake is polluted.

• Assuming that the lake water is well‐mixed, any sample

would yield similar information.

• A scientist could safely draw water anywhere on the
lake without fretting about whether or not the sample is
representative.

Volunteer sampling
• Occurs when people volunteer their services for the study.

• In psychological experiments or pharmaceutical trials (drug

testing), for example, it would be difficult and unethical to
enlist random participants from the general public. In these
instances, the sample is taken from a group of volunteers.

Judgment sampling
• A sample is taken based on certain judgments about the overall
population.

• The underlying assumption is that the investigator will select

units that are characteristic of the population

• The critical issue here is objectivity: how much can judgment

be relied upon to arrive at a typical sample?

• It is subject to the researcher's biases and is perhaps even more
biased than haphazard sampling.

• Since any preconceptions the researcher may have are

reflected in the sample, large biases can be introduced if these
preconceptions are inaccurate

• Often used in exploratory studies like pre‐testing of

questionnaires and focus groups

• Researchers also prefer to use this method in laboratory settings
where the choice of experimental subjects (e.g., animal,
human) reflects the investigator's pre‐existing beliefs about the
population.

• One advantage of judgment sampling is the reduced cost and

time involved in acquiring the sample.

Quota sampling
• This is one of the most common forms of non‐probability
sampling.

• Sampling is done until a specific number of units (quotas) for

various sub‐populations have been selected.

• Since there are no rules as to how these quotas are to be filled,

it is really a means for satisfying sample size objectives for
certain sub‐populations.

• It does not meet the basic requirement of randomness.

• Some units may have no chance of selection or the chance of
selection may be unknown.

• Therefore, the sample may be biased.

• It is generally less expensive than random sampling.


• Effective sampling method when information is urgently

required and can be carried out independent of existing
sampling frames.

• In many cases where the population has no suitable frame,

quota sampling may be the only appropriate sampling method.

Sampling Distribution
• The distribution of all possible values that can be assumed by
some statistic, computed from samples of the same size
randomly drawn from the same population

• Used to make statistical inference about the unknown
population parameters

• Let us consider a variable, X, whose population
comprises N elements denoted by (X1, . . ., XN). This
means that whenever we draw a sample of size n, the values
in the sample must be selected from these elements in the
population.
• Before a sample is drawn, we do not have any idea which
values from the population are being selected although it is
known that n values in the sample must be from N values in
the population

• In that case, for each value of the sample, a random variable
can be defined; these are denoted by (X1, . . ., Xn).

• The realization of the random sample can be represented as X1 =
x1, . . ., Xn = xn, where (x1, . . ., xn) are the observed values.

• A random sample implies that each random variable in a
random sample can take any of the N values from the
population.

• In other words, the n random variables in the random sample may
take $N \times \cdots \times N = N^n$ possible samples with replacement.
• With replacement, the same element can appear more than once in a sample.
To avoid this repetition in the same sample, we use the other
approach, named sampling without replacement.

• Sampling without replacement ensures that if one value is
drawn already from the population in a sample, it will be
dropped from the next selection.

• In that case, the number of possible samples is
$N \times (N-1) \times \cdots \times (N-n+1)$, where there is no replacement in a single
sample and the order of selection is considered.

• However, due to change in the order of selection in the sample,
which occurs in $n!$ ways, there are $n!$ samples with the same
elements if the order is considered in samples.

• Hence, to select samples without repetition within the sample
(without replacement) or between samples (disregarding
order), we have $\frac{N \times (N-1) \times \cdots \times (N-n+1)}{n!} = \binom{N}{n}$
samples without
replacement disregarding the order of selection in samples
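
The three counts can be checked numerically. The sketch below assumes Python 3.8 or later for `math.perm` and `math.comb`; the function name `count_samples` and the two (N, n) pairs are illustrative choices.

```python
from math import comb, factorial, perm

def count_samples(N, n):
    """Number of possible samples of size n from a population of size N."""
    with_replacement = N ** n              # N x N x ... x N
    ordered = perm(N, n)                   # N x (N-1) x ... x (N-n+1)
    unordered = ordered // factorial(n)    # divide out the n! orderings
    assert unordered == comb(N, n)         # same as the binomial coefficient
    return with_replacement, ordered, unordered

print(count_samples(4, 2))   # (16, 12, 6)
print(count_samples(5, 3))   # (125, 60, 10)
```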

• Example:
– Let us consider a population with four elements represented
by X1 =1, X2=3, X3 =2, and X4 =4.

– The population size is N = 4. We want to draw a sample of

size n = 2. Let us define the random sample as (X1, X2).

– Then we can select the samples by either with replacement

or without replacement.

• With replacement: The number of samples with replacement is
$4^2 = 16$.

• The samples are: (1, 1), (1, 3), (1, 2), (1, 4), (3, 1), (3, 3), (3,
2), (3, 4), (2, 1), (2, 3), (2, 2), (2, 4), (4, 1), (4, 3), (4, 2), (4, 4).

• Without replacement: The number of samples without

replacement in each sample is 4x3 =12 and the samples are (1,
3), (1, 2), (1, 4), (3, 1), (3, 2), (3, 4), (2, 1), (2, 3), (2, 4), (4, 1),
(4, 3), (4, 2).

• It is clearly seen from these possible samples that each sample
is repeated $2! = 2 \times 1$ times if we disregard the order of
selection, such as (1, 3) and (3, 1).

• Hence, the number of samples without replacement within
each sample (without replacement) and between samples
(disregarding order) is $\binom{4}{2} = 6$.

• If we consider without replacement and disregard order then
the samples are: (1, 3), (1, 2), (1, 4), (3, 2), (3, 4), and (2, 4)

• The important point here is that we select only one sample out
of all the possible samples using with or without replacement.

• Let us consider a hypothetical situation where all the possible

samples could be drawn of same sample size, n
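
For a population this small, all possible samples can actually be listed. The sketch below uses Python's `itertools` on the four-element example population with n = 2; the variable names are arbitrary.

```python
from itertools import combinations, permutations, product

population = [1, 3, 2, 4]   # X1, X2, X3, X4 from the example
n = 2

with_repl = list(product(population, repeat=n))   # order matters, repeats allowed
ordered = list(permutations(population, n))       # order matters, no repeats
unordered = list(combinations(population, n))     # order ignored, no repeats

print(len(with_repl), len(ordered), len(unordered))   # 16 12 6
print(unordered)
```

In practice only one of these samples is ever drawn; the full listing is what the sampling distribution of a statistic is built from.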

• In that situation, we could find the value of the statistic for all the
possible samples.
• As an example, let us consider the statistic, sample
mean.
• If there are $N^n$ possible samples with replacement then the
number of values of the statistic, mean, will be $N^n$ too.
• Similarly, disregarding order the number of samples without
replacement is $\binom{N}{n}$, so the number of values of the mean will be $\binom{N}{n}$ too.

• We can find the frequency distribution and probability
distribution of the mean based on these values.

• This probability distribution is known as the sampling
distribution of the mean.

• In other words, the probability distribution of a statistic is known
as the sampling distribution.

Sampling Distribution of the Sample Mean
• The sampling distribution of the sample mean is widely used
in statistics.

• Suppose that we have a population with mean μ and variance
$\sigma^2$ and let X1, X2, . . ., Xn be a random sample of size n
selected randomly from this population.
• We know that the sample mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• Example:
– Let us consider a population of size N = 5 and a sample of
size n = 3.
– The population elements are X1 = 30, X2 = 25, X3 = 40, X4
= 27, and X5 = 35.
– The random sample is (X1, X2, X3).
– The number of samples without replacement and
disregarding order is $\binom{5}{3} = 10$

• The population mean is $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{157}{5} = 31.40$

• The mean of the sample means is $\mu_{\bar{x}} = \frac{\sum_{j=1}^{10} \bar{x}_j}{10} = \frac{314}{10} = 31.40$

• The population mean of the variable X and the population
mean of means from the random samples of size n are exactly
the same.
• From the above example, we observe that in both the cases, the
mean is 31.40
$E(X) = \mu$
$E(\bar{x}) = \mu$
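
The equality of the two means can be verified by listing all 10 samples. This sketch uses exact fractions to avoid rounding; the variable names are arbitrary.

```python
from fractions import Fraction
from itertools import combinations

population = [30, 25, 40, 27, 35]
n = 3

mu = Fraction(sum(population), len(population))   # population mean
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(sample_means) / len(sample_means)

print(float(mu), len(sample_means), float(mean_of_means))   # 31.4 10 31.4
```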

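Characteristics of the Sampling
Distribution of $\bar{x}$
Mean and variance of $\bar{x}$: If X1, X2, . . ., Xn is a random sample of
size n from any distribution with mean µ and variance $\sigma^2$, then:
1. The mean of $\bar{x}$ is $\mu_{\bar{x}} = \mu$
2. The variance of $\bar{x}$ is $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
3. The standard deviation (standard error) of $\bar{x}$ is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$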
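
These properties can be checked exactly on a small population. The sketch assumes sampling with replacement, so the draws are independent; when sampling without replacement from a finite population, a finite population correction would be needed. The population values are reused from the earlier four-element example.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 2, 4]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

# All N**n equally likely samples drawn with replacement.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mu) ** 2 for m in means) / len(means)

print(mean_of_means == mu, var_of_means == sigma2 / n)   # True True
```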

Sampling from Normal Population
– If X1, X2, . . ., Xn is a random sample of size n from a
normal population with mean µ and variance $\sigma^2$, that is
Normal(µ, $\sigma^2$), then the sample mean has a normal
distribution with
– mean $= \mu$
– variance $= \frac{\sigma^2}{n}$
– Each random variable in the random sample satisfies the
assumption that $X \sim N(\mu, \sigma^2)$
• $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Central Limit Theorem

• Let X1, X2, . . ., Xn be a random sample of size n from a non-normal
population with mean µ and variance $\sigma^2$.

• If the sample size n is large or if $n \to \infty$, then the sample mean
has approximately a normal distribution with mean µ and
variance $\sigma^2/n$, that is
– $\bar{x} \approx \text{Normal}(\mu, \sigma^2/n)$
– $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$

• We can use this result when sampling from a non-normal
distribution with known variance $\sigma^2$ and with large sample size
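
A small simulation can illustrate the theorem. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1) as the non-normal example; the sample size, number of repetitions and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)
n, reps = 50, 5000   # sample size and number of repeated samples (arbitrary choices)

# Exponential population with rate 1: mean 1, variance 1, clearly not normal.
mu, sigma = 1.0, 1.0
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Standardize each sample mean: Z = (x-bar - mu) / (sigma / sqrt(n))
z = [(m - mu) / (sigma / n ** 0.5) for m in means]

within_196 = sum(abs(v) <= 1.96 for v in z) / reps
print(round(statistics.fmean(z), 3), round(statistics.pstdev(z), 3), round(within_196, 3))
```

If the normal approximation holds, the printed mean of Z should be near 0, its standard deviation near 1, and roughly 95% of the Z values should fall within ±1.96.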

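Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• For two independent random samples of sizes $n_1$ and $n_2$, the mean of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$

• The variance of $\bar{x}_1 - \bar{x}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

• The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$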

σ2 unknown, Normal population

• If X1, X2, . . ., Xn is a random sample of size n from a normal
distribution with mean μ and unknown variance $\sigma^2$, that is
Normal(μ, $\sigma^2$), then the statistic

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t distribution with n − 1 degrees of freedom (df), where s is the sample standard deviation
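
The t statistic can be computed as in the sketch below. The data values and the hypothesized mean `mu0` are hypothetical and used only for illustration; `t_statistic` is not a library function.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical measurements and hypothesized mean, for illustration only.
t, df = t_statistic([30, 25, 40, 27, 35], mu0=30)
print(round(t, 3), df)
```

The only difference from the Z statistic is that the unknown σ is replaced by the sample standard deviation s, which is why the t distribution is used instead of the normal.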

Distribution of the sample proportion

• Constructed experimentally in exactly the same manner as the

case of the arithmetic mean

• From the finite population, we would take all possible samples
of a given size and for each sample compute the sample
proportion, $\hat{p}$

• When the sample size is large, the distribution of sample
proportions is approximately normal by virtue of
the central limit theorem

• The mean $\mu_{\hat{p}}$, the average of all the possible sample
proportions, equals the true population proportion P

• The variance of the sample proportion is $\sigma_{\hat{p}}^2 = \frac{P(1-P)}{n}$
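
Both results can be checked by listing every possible sample from a tiny population. The sketch assumes sampling with replacement and a hypothetical population of five people coded 1 (has the characteristic) or 0 (does not).

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of 5 people; 1 = has the characteristic, 0 = does not.
population = [1, 0, 0, 1, 0]
n = 3
P = Fraction(sum(population), len(population))   # true proportion, 2/5

# Sample proportion for every equally likely sample drawn with replacement.
p_hats = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_p = sum(p_hats) / len(p_hats)
var_p = sum((p - mean_p) ** 2 for p in p_hats) / len(p_hats)

print(mean_p == P, var_p == P * (1 - P) / n)   # True True
```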

The end !


