
Chapter 4

Sampling and sampling distribution


Prepared by
Hirbo Shore (MPH, Assist. Professor),
School of Public Health, CHMS-HU
Contact: hamakiya@gmail.com
January 2021

Sampling
• Sample survey methodology is often used to obtain information
about a large aggregate or population by selecting and
measuring a sample from the population.

• Because characteristics vary among items in the population,
scientific sample designs are used in the selection process to
reduce the risk of a distorted view of the population and to
make inferences about the population based on the sample
survey data.
Sampling,…

• In order to make statistically valid inferences for the
population, the sample design must be incorporated in the data
analysis.

• Sampling enables us to estimate characteristics of a population
by directly observing a portion of the entire population.
• The main interest is how this information can be applied to the
entire population.

Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of
collecting the information.
• Reduced cost: Sampling reduces demands on resources such
as finance, personnel, and material.
• Greater accuracy: Because fewer units are measured, more
care can go into collecting the data, which may improve accuracy.
• Sampling error: With probability sampling, precise allowance
can be made for sampling error.
• Greater speed: Data can be collected and summarized more
quickly.

Disadvantages of sampling:
• There is always a sampling error.

• Sampling may create a feeling of discrimination within the
population.

• Sampling may be inadvisable where every unit in the
population is legally required to have a record.

Sampling technique

• There are two different approaches to sampling in survey
research:

– Nonprobability sampling

– Probability sampling

Probability sampling
• Gives each element a known non-zero chance of being
included in the sample.

• This method gives a closer representation of the
population.

• It can be difficult to use because of the sample size and the cost
involved, but the generalizations that come from it are more
likely to reflect the true population.
Probability sampling,…
• The following are the most common probability sampling
methods:
– Simple random sampling

– Systematic random sampling

– Sampling with probability proportional to size

– Stratified random sampling

– Cluster sampling

– Multi‐stage sampling

1. Simple random sampling

• Each member of the population has an equal chance of being
included in the sample.

• A list of sampling units numbered serially from 1 to N is
prepared, and then a random number table, computer-generated
random numbers, or a lottery method is used to select n
individuals out of N.

SRS,…
• Example
– Suppose your school has 500 students and you need to
conduct a short survey on the quality of the food served in
the cafeteria.
– You decide that a sample of 50 students should be
sufficient for your purposes.
– In order to get your sample, you assign a number from 1 to
500 to each student in your school.
– To select the sample, you use a table of randomly generated
numbers.

SRS,…

• Pick a starting point in the table (a row and column number)
and look at the random numbers that appear there. In this case,
since the student numbers run to three digits, the random
numbers would need to contain three digits as well.

• Ignore 000 and all random numbers above 500 because they do
not correspond to any of the students in the school.

• The first 50 different numbers between 001 and 500 make up
your sample.
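A computer-generated alternative to the random number table can be sketched in Python. This is a minimal illustration that assumes the 500 students are numbered 1 to 500. `random.sample` draws distinct numbers, so the "skip repeats and out-of-range numbers" steps are built in. The seed only makes a run reproducible, and `simple_random_sample` is an illustrative name.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    # Every subset of size sample_size is equally likely to be chosen.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

sample = simple_random_sample(500, 50, seed=2021)
print(len(sample), sample[:5])
```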
Systematic random sampling
• Also called interval sampling: there is a fixed gap, or
interval, between each selected unit in the sample.

• Steps in systematic random sampling

1. Number the units on your frame from 1 to N (where N is
the total population size).
2. Determine the sampling interval (K) by dividing the
number of units in the population by the desired sample
size.

Systematic random sampling,…

3. Select a number between one and K at random. This
number is called the random start and would be the first
number included in your sample.

4. Select every Kth unit after that first number.
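The four steps can be sketched as a short Python function. It assumes N is a multiple of n, so that K = N/n is a whole number, as in the example that follows. When it is not, how to round K is a further design choice that this sketch does not cover. The name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of unit numbers from a frame numbered 1..N."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                     # step 2: sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # step 3: random start in 1..K
    return list(range(start, N + 1, k))            # step 4: every Kth unit

sample = systematic_sample(400, 100, start=3)
print(sample[:4], sample[-2:], len(sample))  # [3, 7, 11, 15] [395, 399] 100
```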

Example

• To select a sample of 100 from a population of 400, you would
need a sampling interval of 400 ÷ 100 = 4.

• Therefore, K = 4.

• You will need to select one unit out of every four units to end
up with a total of 100 units in your sample.


• Select a number between 1 and 4 from a table of random
numbers.

Example,…

• If you choose 3, the third unit on your frame would be the first
unit included in your sample.

• The sample would then consist of the following units to make up a
sample of 100: 3 (the random start), 7, 11, 15, 19, ..., 395, 399
(up to N, which is 400 in this case).

Example,…
• With a systematic sample approach there are only four
possible samples that can be selected, corresponding to the
four possible random starts:
• 1, 5, 9, 13...393, 397

• 2, 6, 10, 14...394, 398

• 3, 7, 11, 15...395, 399

• 4, 8, 12, 16...396, 400

Example,…
• Each member of the population belongs to only one of the four
samples and each sample has the same chance of being
selected.
• Each unit has a one in four chance of being selected in the
sample.
• The main difference is that with SRS, any combination of 100
units would have a chance of making up the sample, while
with systematic sampling, there are only four possible
samples.

Sampling with probability proportional to size

• Abbreviated as PPS.
• Probability sampling requires that each member of the survey
population have a chance of being included in the sample, but
it does not require that this chance be the same for everyone.

• If there is information available on the frame about the size of
each unit and if those units vary in size, this information can
be used in the sample selection in order to increase
efficiency.
Probability proportional to size,…
• With this method, the bigger the size of the unit, the higher the
chance it has of being included in the sample.

• For this method to bring increased efficiency, the measure of
size needs to be accurate.

Steps in PPS
1. List all clusters with their population size.

2. Calculate the cumulative frequency.

3. Calculate the sampling interval K by dividing the total
population size by the number of clusters to be sampled, n.

4. Randomly choose a number j between 1 and K.

5. Kebeles/clusters whose cumulative-frequency range contains
the jth, (j+K)th, (j+2K)th, ..., (j+(n‐1)K)th unit will be included
in the sample.

Cluster   Size   Cum.   Interval
1         1028   1028   1-1028
2          555   1583   1029-1583
3          390   1973   1584-1973
4         1309   3282   1974-3282
5          698   3980   3283-3980
6          907   4887   3981-4887
7          432   5319   4888-5319
8          897   6216   5320-6216
9          677   6893   6217-6893
10         501   7394   6894-7394
11        1094   8488   7395-8488
12         668   9156   8489-9156
Total     9156

• If the number of clusters needed is 5, the sampling interval is
9156/5 ≈ 1831.
• Draw a number between 1 and 1831; let's say 480.
• Start from the cluster (village) whose interval includes 480 and
select further clusters by repeatedly adding the sampling
interval: 480, 2311, 4142, 5973, 7804.
• Accordingly, clusters 1, 4, 6, 8 and 11 will be included.
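The selection above can be reproduced with a short Python sketch of systematic PPS sampling. It assumes the interval is rounded down to a whole number (9156 // 5 = 1831). It uses the standard `bisect` module to find the cluster whose cumulative range contains each selection point. `pps_systematic` is an illustrative name, not a standard function.

```python
import bisect
import itertools
import random

def pps_systematic(sizes, n_clusters, start=None, seed=None):
    """Return 1-based numbers of clusters chosen by systematic PPS sampling."""
    cum = list(itertools.accumulate(sizes))        # step 2: cumulative frequency
    k = cum[-1] // n_clusters                      # step 3: interval, rounded down
    if start is None:
        start = random.Random(seed).randint(1, k)  # step 4: random start j
    points = [start + i * k for i in range(n_clusters)]
    # Step 5: the first cumulative total >= a point marks the cluster
    # whose interval contains that point.
    return [bisect.bisect_left(cum, p) + 1 for p in points]

sizes = [1028, 555, 390, 1309, 698, 907, 432, 897, 677, 501, 1094, 668]
print(pps_systematic(sizes, 5, start=480))  # [1, 4, 6, 8, 11]
```

Note that a cluster larger than K can contain two selection points and be picked twice. How to handle that case is outside this sketch.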

Stratified random sampling

• It is done when the population is known to be heterogeneous
with regard to some factors, and those factors are used for
stratification.

• The population is divided into homogeneous, mutually
exclusive groups called strata, and then independent samples
are selected from each stratum.

Stratified random sampling,…

• Any of the sampling methods can be used to sample within
each stratum.

• A population can be stratified by any variable that is available
for all units on the sampling frame prior to sampling (e.g., age,
sex, province of residence, income, etc.).

Reasons for stratified sampling

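• It can be a more efficient sampling strategy: when strata are
internally homogeneous, estimates are more precise for the
same total sample size.

• It ensures an adequate sample size for sub‐groups in the
population of interest.

• Each stratum becomes an independent population, and you will
need to decide the sample size for each stratum.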
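Because the sample size must be decided for each stratum, any sketch needs an allocation rule. The Python below assumes proportional allocation, where each stratum gets a share of n proportional to its size. This is one common choice, not one prescribed above. The code then draws an independent simple random sample within each stratum. The strata and their member IDs are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation, then an independent SRS within each stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation (an assumption). Rounding means the stratum
        # sizes may not add up to exactly n when n * N_h / N is fractional.
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 600 urban and 400 rural households, numbered by stratum
strata = {"urban": list(range(1, 601)), "rural": list(range(601, 1001))}
sample = stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in sample.items()})  # {'urban': 60, 'rural': 40}
```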

Cluster sampling
• Sometimes it is too expensive to spread a sample across the
population as a whole.

• Travel costs can become high if interviewers have to
survey people from one end of the country to the other.

• To reduce costs, researchers may choose a cluster sampling
technique.

• The clusters should be homogeneous, that is, similar to one
another, so that the selected clusters can represent the
non-selected ones.

Steps in cluster sampling
• Cluster sampling divides the population into groups or
clusters.
• A number of clusters are selected randomly to represent the
total population, and then all units within selected clusters are
included in the sample.
• No units from non‐selected clusters are included in the
sample—they are represented by those from selected clusters.
• This differs from stratified sampling, where some units are
selected from each group

Example

• In a school-based study, we assume students of the same
school are homogeneous.

• We can randomly select sections and include all students of the
selected sections only.
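The school example can be sketched in Python using a hypothetical list of sections and their students. Whole sections are drawn at random, and every student in a chosen section enters the sample. Because sections differ in size, the final sample size depends on which sections happen to be drawn. `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick clusters at random, keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # SRS of cluster labels
    return {label: clusters[label] for label in chosen}  # all units of each chosen cluster

# Hypothetical school: six sections of different sizes
sections = {
    "9A": [f"9A-{i}" for i in range(1, 41)],
    "9B": [f"9B-{i}" for i in range(1, 36)],
    "9C": [f"9C-{i}" for i in range(1, 43)],
    "10A": [f"10A-{i}" for i in range(1, 39)],
    "10B": [f"10B-{i}" for i in range(1, 31)],
    "10C": [f"10C-{i}" for i in range(1, 45)],
}
selected = cluster_sample(sections, n_clusters=2, seed=3)
students = [s for members in selected.values() for s in members]
print(sorted(selected), len(students))
```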

Cluster sampling,…
• One reason for using cluster sampling is cost reduction.

• It creates 'pockets' of sampled units instead of spreading the
sample over the whole territory, which reduces cost.

• Another reason is that sometimes a list of all units in the
population is not available, while a list of all clusters is either
available or easy to create.

Cluster sampling,…

• The main drawback is a loss of efficiency when compared
with simple random sampling.

• It is usually better to survey a large number of small clusters
instead of a small number of large clusters.
• This is because neighboring units tend to be more alike,
resulting in a sample that does not represent the whole
spectrum of opinions or situations present in the overall
population.

Cluster sampling

• Another drawback to cluster sampling is that you do not have
total control over the final sample size.

• Because clusters differ in size and all units in the selected
clusters are included, the final size may be larger or smaller than
you expected.

Multi‐stage sampling

• Involves picking a sample from within each chosen cluster,
rather than including all units in the cluster.
• Requires at least two stages:
– In the first stage, large groups or clusters are identified and
selected.
– These clusters contain more population units than are
needed for the final sample.

Multi‐stage sampling,…
• In the second stage, population units are picked from within
the selected clusters for a final sample.

– If more than two stages are used, the process of choosing
population units within clusters continues until there is a
final sample.
• It keeps the benefit of a concentrated sample, which reduces cost.

Multi‐stage sampling,…

• However, the sample is not as concentrated as in one-stage
cluster sampling, and the sample size is still bigger than for a
simple random sample of comparable precision.

• A list of all of the units in the population is not needed. All you
need is a list of clusters and a list of the units in the selected
clusters.

Multi‐stage sampling,…

• More information is needed in this type of sample than what is
required in cluster sampling.

• However, multi‐stage sampling still saves a great amount of
time and effort by not having to create a list of all of the units
in a population.
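A two-stage design can be sketched by combining the previous ideas: first draw a simple random sample of clusters, then draw a simple random sample of units inside each selected cluster. The fixed number of units per cluster (`m_per_cluster`) is a hypothetical choice made for illustration, and real designs may allocate differently. Only the selected clusters need a unit-level list.

```python
import random

def two_stage_sample(clusters, n_clusters, m_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of m units within each chosen cluster."""
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(clusters), n_clusters)
    # Only the chosen clusters need a unit list at this point.
    return {
        label: rng.sample(clusters[label], min(m_per_cluster, len(clusters[label])))
        for label in first_stage
    }

# Hypothetical frame: 10 villages with 50 to 140 households each
villages = {
    f"village-{v}": [f"hh-{v}-{h}" for h in range(1, 41 + 10 * v)]
    for v in range(1, 11)
}
sample = two_stage_sample(villages, n_clusters=3, m_per_cluster=10, seed=5)
print({label: len(units) for label, units in sample.items()})
```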

Non‐probability sampling
• There is an assumption that characteristics are evenly
distributed within the population.
• Any sample would then be representative, and because of that,
results would be accurate.
• Because elements are chosen arbitrarily, there is no way to
estimate the probability of any one element being included in
the sample.

• No assurance is given that each item has a chance of being
included, making it impossible either to estimate sampling
variability or to identify possible bias.
Non‐probability sampling,…

• Reliability cannot be measured in nonprobability sampling; the
only way to address data quality is to compare some of the
survey results with available information about the population.

• There is no assurance that the estimates will meet an
acceptable level of error.

• Researchers are reluctant to use these methods because there is
no way to measure the precision of the resulting sample.

Non‐probability sampling,…

• Despite these drawbacks, non‐probability sampling methods
can be useful when descriptive comments about the sample
itself are desired.

• They are quick, inexpensive and convenient.

• There are also other circumstances, such as some research
settings, in which it is unfeasible or impractical to conduct
probability sampling.

Types of nonprobability sampling
1. Convenience or haphazard sampling

2. Volunteer sampling

3. Judgment sampling

4. Quota sampling

Convenience sampling

• Also referred to as haphazard or accidental sampling.

• It is not normally representative of the target population
because sample units are only selected if they can be accessed
easily and conveniently.
• It is easy to use, but that advantage is greatly offset by the
presence of bias.
• Although useful applications of the technique are limited, it
can deliver accurate results when the population is
homogeneous.

Convenience sampling,…
• For example:
• A scientist could use this method to determine whether a
lake is polluted.

• Assuming that the lake water is well‐mixed, any sample
would yield similar information.

• A scientist could safely draw water anywhere on the
lake without fretting about whether or not the sample is
representative.
Volunteer sampling
• Occurs when people volunteer their services for the study.

• In psychological experiments or pharmaceutical trials (drug
testing), for example, it would be difficult and unethical to
enlist random participants from the general public. In these
instances, the sample is taken from a group of volunteers.

Judgment sampling
• A sample is taken based on certain judgments about the overall
population.

• The underlying assumption is that the investigator will select
units that are characteristic of the population.

• The critical issue here is objectivity: how much can judgment
be relied upon to arrive at a typical sample?

Judgment sampling,…

• It is subject to the researcher's biases and is perhaps even more
biased than haphazard sampling.

• Since any preconceptions the researcher may have are
reflected in the sample, large biases can be introduced if these
preconceptions are inaccurate.

• It is often used in exploratory studies such as pre‐testing of
questionnaires and focus groups.

Judgment sampling,…

• Researchers also prefer to use this method in laboratory settings
where the choice of experimental subjects (i.e., animal,
human) reflects the investigator's pre‐existing beliefs about the
population.

• One advantage of judgment sampling is the reduced cost and
time involved in acquiring the sample.

Quota sampling
• This is one of the most common forms of non‐probability
sampling.

• Sampling is done until a specific number of units (quotas) for
various sub‐populations have been selected.

• Since there are no rules as to how these quotas are to be filled,
it is really a means for satisfying sample size objectives for
certain sub‐populations.
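The quota-filling rule can be illustrated with a small Python sketch. It assumes respondents arrive in whatever order the interviewer happens to meet them (a hypothetical list here). Each respondent is accepted until the quota for their sub-population is full. Nothing random decides who is approached, which is why the selection probabilities remain unknown. `fill_quotas` is an illustrative name.

```python
def fill_quotas(arrivals, quotas):
    """Accept (person, group) arrivals in order until each group's quota is met."""
    remaining = dict(quotas)
    accepted = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:   # this group still needs people
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # every quota filled: stop
            break
    return accepted, remaining

# Hypothetical people met at a market, tagged by sex
arrivals = [("p1", "F"), ("p2", "F"), ("p3", "M"), ("p4", "F"),
            ("p5", "F"), ("p6", "M"), ("p7", "M")]
accepted, remaining = fill_quotas(arrivals, {"F": 2, "M": 2})
print(accepted, remaining)  # ['p1', 'p2', 'p3', 'p6'] {'F': 0, 'M': 0}
```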

Quota sampling,…

• It does not meet the basic requirement of randomness.

• Some units may have no chance of selection, or the chance of
selection may be unknown.

• Therefore, the sample may be biased.

• It is generally less expensive than random sampling.

Quota sampling,…

• It is an effective sampling method when information is urgently
required, and it can be carried out independently of existing
sampling frames.

• In many cases where the population has no suitable frame,
quota sampling may be the only appropriate sampling method.

Sampling Distribution
• The distribution of all possible values that can be assumed by
some statistic, computed from samples of the same size
randomly drawn from the same population.

• Used to make statistical inferences about the unknown
parameter.
• Let us consider a variable X whose population consists of
N elements denoted by (X1, . . ., XN). This means that whenever
we draw a sample of size n, the values in the sample must be
selected from these elements in the population.
Sampling Distribution,…
• Before a sample is drawn, we do not know which values from
the population will be selected, although we know that the n
values in the sample must come from the N values in the
population.

• In that case, a random variable can be defined for each value
of the sample; these are denoted by (X1, . . ., Xn).

• A realization of the random sample can be represented as
X1 = x1, . . ., Xn = xn, where x1, . . ., xn are the observed values.

Sampling Distribution,…

• A random sample implies that each random variable in the
random sample can take any of the N values from the
population.

• In other words, with replacement, the n random variables in the
random sample can take $N \times N \times \cdots \times N = N^n$ possible samples.
• To avoid repeating the same element within a sample, we use
the other approach, called sampling without replacement.

Sampling Distribution,…

• Sampling without replacement ensures that if one value has
already been drawn from the population into a sample, it is
dropped from the next selection.

• In that case, the number of possible samples is
$N(N-1)\cdots(N-n+1)$, where there is no replacement within a
single sample and the order of selection is taken into account.

Sampling Distribution,…

• However, the same n elements can be arranged in n! different
orders, so when order is taken into account, each set of
elements appears in n! samples.

• Hence, to select samples without repetition within the sample
(without replacement) and between samples (disregarding
order), we have
$\frac{N(N-1)\cdots(N-n+1)}{n!} = \binom{N}{n}$
samples without replacement, disregarding the order of selection
in samples.

Sampling Distribution,…
• Example:
– Let us consider a population with four elements represented
by X1 =1, X2=3, X3 =2, and X4 =4.

– The population size is N = 4. We want to draw a sample of
size n = 2. Let us define the random sample as (X1, X2).

– Then we can select the samples either with replacement
or without replacement.

Sampling Distribution,…

• With replacement: The number of samples with replacement is
$4^2 = 16$.

• The samples are: (1, 1), (1, 3), (1, 2), (1, 4), (3, 1), (3, 3),
(3, 2), (3, 4), (2, 1), (2, 3), (2, 2), (2, 4), (4, 1), (4, 3), (4, 2), (4, 4).

• Without replacement: The number of samples without
replacement within each sample is $4 \times 3 = 12$, and the samples
are (1, 3), (1, 2), (1, 4), (3, 1), (3, 2), (3, 4), (2, 1), (2, 3), (2, 4),
(4, 1), (4, 3), (4, 2).
54
Sampling Distribution,…

• It is clearly seen from these possible samples that each sample
is repeated 2! = 2 × 1 = 2 times if we disregard the order of
selection, as with (1, 3) and (3, 1).

• Hence, the number of samples without replacement within
each sample and disregarding order between samples is
$\binom{4}{2} = 6$.
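The three counts in this example (16, 12 and 6) can be checked by brute-force enumeration. The Python sketch below uses the standard `itertools` functions `product`, `permutations` and `combinations`, which generate exactly the three kinds of samples described. It compares each count with $N^n$, $N(N-1)\cdots(N-n+1)$ and $\binom{N}{n}$, using `math.perm` and `math.comb` (available from Python 3.8).

```python
import math
from itertools import combinations, permutations, product

population = [1, 3, 2, 4]   # X1 = 1, X2 = 3, X3 = 2, X4 = 4
N, n = len(population), 2

with_replacement = list(product(population, repeat=n))  # ordered, repeats allowed
ordered_without = list(permutations(population, n))     # ordered, no repeats
unordered_without = list(combinations(population, n))   # order disregarded

assert len(with_replacement) == N ** n == 16
assert len(ordered_without) == math.perm(N, n) == 12
assert len(unordered_without) == math.comb(N, n) == 6
print(unordered_without)  # [(1, 3), (1, 2), (1, 4), (3, 2), (3, 4), (2, 4)]
```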

Sampling Distribution,…

• If we sample without replacement and disregard order, then
the samples are: (1, 3), (1, 2), (1, 4), (3, 2), (3, 4), and (2, 4).

• The important point here is that we select only one sample out
of all the possible samples, using either sampling with or without
replacement.

• Let us consider a hypothetical situation where all the possible
samples of the same sample size, n, could be drawn.

Sampling Distribution,…

• In that situation, we could find the value of the statistic for all the
possible samples.
• As an example, let us consider the sample mean as the statistic.
• If there are $N^n$ possible samples with replacement, then the
number of values of the sample mean will be $N^n$ too.
• Similarly, disregarding order, the number of samples without
replacement is $\binom{N}{n}$, giving $\binom{N}{n}$ values of the mean.

Sampling Distribution,…

• We can find the frequency distribution and probability
distribution of the mean based on these values.

• This probability distribution is known as the sampling
distribution.

• In other words, the probability distribution of a statistic is known
as its sampling distribution.
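To make this concrete, the sketch below builds the sampling distribution of the mean for the earlier four-element population (1, 3, 2, 4) with n = 2, drawn without replacement and disregarding order. Each of the six samples is treated as equally likely, and `Fraction` keeps the probabilities exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 2, 4]
n = 2

samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]   # one mean per sample
counts = Counter(means)                          # frequency distribution
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in distribution.items():
    print(float(m), p)
# 1.5 1/6
# 2.0 1/6
# 2.5 1/3
# 3.0 1/6
# 3.5 1/6
```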

Sampling Distribution of the Sample Mean
• The sampling distribution of the sample mean is widely used
in statistics.

• Suppose that we have a population with mean μ and variance
σ², and let X1, X2, . . ., Xn be a random sample of size n
selected randomly from this population.
• We know that the sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Sampling Distribution of the Sample Mean,…
• Example:
– Let us consider a population of size N = 5 and a sample of
size n = 3.
– The population elements are X1 = 30, X2 = 25, X3 = 40,
X4 = 27, and X5 = 35.
– The random sample is (X1, X2, X3).
– The number of samples without replacement and
disregarding order is $\binom{5}{3} = 10$.

Sampling Distribution of the Sample Mean,…
• The population mean is $\mu = \frac{\sum_{i=1}^{5} X_i}{N} = \frac{157}{5} = 31.40$

• The mean of the 10 sample means is
$\mu_{\bar{x}} = \frac{\sum_{j=1}^{10} \bar{x}_j}{10} = \frac{314}{10} = 31.40$

Sampling Distribution of the Sample Mean,…
• The population mean of the variable X and the mean of the
sample means from all random samples of size n are exactly
the same.
• From the above example, we observe that in both cases the
mean is 31.40:
$E(X) = \mu$
$E(\bar{x}) = \mu$
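The figures 157/5 and 314/10 can be reproduced by enumerating the 10 samples of size 3. The Python sketch below uses exact fractions so that floating-point rounding cannot blur the equality between the population mean and the mean of the sample means.

```python
from fractions import Fraction
from itertools import combinations

population = [30, 25, 40, 27, 35]
N, n = len(population), 3

samples = list(combinations(population, n))         # 10 unordered samples
sample_means = [Fraction(sum(s), n) for s in samples]

mu = Fraction(sum(population), N)                   # 157/5
mu_xbar = sum(sample_means) / len(sample_means)     # 314/10
print(len(samples), sum(sample_means), float(mu), float(mu_xbar))  # 10 314 31.4 31.4
```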

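Characteristics of the Sampling Distribution of $\bar{x}$
Mean and variance of $\bar{x}$: If X1, X2, . . ., Xn is a random sample of
size n from any distribution with mean µ and variance σ², then:
1. The mean of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.
2. The variance of $\bar{x}$ is $\sigma^2_{\bar{x}} = \sigma^2/n$.
3. The standard deviation (standard error) of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.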

Sampling from Normal Population
– If X1, X2, . . ., Xn is a random sample of size n from a
normal population with mean µ and variance σ², that is,
Normal(µ, σ²), then the sample mean has a normal
distribution with
– mean = µ
– variance = σ²/n
– Each random variable in the random sample satisfies the
assumption that Xi ~ N(µ, σ²).
• Hence $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

Central Limit Theorem

• Let X1, X2, . . ., Xn be a random sample of size n from a
non-normal population with mean µ and variance σ².

• If the sample size n is large (n → ∞), then the sample mean
has approximately a normal distribution with mean µ and
variance σ²/n, that is,
– $\bar{x} \approx N(\mu, \sigma^2/n)$
– $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$

Central Limit Theorem,…
• We can use this result when sampling from a non-normal
distribution with known variance σ², provided the sample size
is large.
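A quick simulation can make the theorem tangible. The Python sketch below assumes an exponential population with rate 1, which has mean µ = 1 and variance σ² = 1 and is strongly right-skewed, so clearly non-normal. It draws many samples of size n = 50 and checks three things: that the sample means centre on µ, that their variance is close to σ²/n = 0.02, and that about 95% of them fall within ±1.96 standard errors, as a normal distribution would predict. The population, n and the number of repetitions are illustrative choices.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

mu, sigma2, n = 1.0, 1.0, 50
means = simulate_sample_means(n, reps=10_000)

se = (sigma2 / n) ** 0.5                          # sigma / sqrt(n)
z = [(m - mu) / se for m in means]                # standardised sample means
coverage = sum(abs(v) < 1.96 for v in z) / len(z) # about 0.95 under N(0, 1)

print(round(statistics.fmean(means), 3), round(statistics.pvariance(means), 4), round(coverage, 3))
```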

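Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• For independent random samples of sizes n1 and n2 from two
populations with means µ1, µ2 and variances σ1², σ2², the mean of
the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is µ1 − µ2.
• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

• The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.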

σ² unknown, Normal population

• If X1, X2, . . ., Xn is a random sample of size n from a normal
distribution with mean μ and unknown variance σ², that is,
Normal(μ, σ²), then the statistic

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

has a t distribution with n − 1 degrees of freedom, where s is the
sample standard deviation.
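As a small worked example, the Python sketch below computes this statistic for a hypothetical sample of five measurements and a hypothesised mean µ = 5.0. `statistics.stdev` uses the n − 1 divisor, which matches the sample standard deviation s in the formula. Looking up probabilities from the t distribution itself would need a t table or a library such as SciPy, and that step is not shown.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return t = (x_bar - mu) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (divisor n - 1)
    return (x_bar - mu) / (s / math.sqrt(n)), n - 1

data = [5.1, 4.9, 5.6, 5.0, 5.4]          # hypothetical measurements
t, df = t_statistic(data, mu=5.0)
print(round(t, 3), df)  # 1.534 4
```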

Distribution of the sample proportion

• It is constructed experimentally in exactly the same manner as in
the case of the arithmetic mean.

• From the finite population, we would take all possible samples
of a given size and, for each sample, compute the sample
proportion, $\hat{p}$.

• When the sample size is large, the distribution of sample
proportions is approximately normal by virtue of the central limit
theorem.
Distribution of the sample proportion,…

• The mean of the sampling distribution, $\mu_{\hat{p}}$ (the average of all
the possible sample proportions), equals the true population
proportion P.

• The variance of the sample proportion is $\sigma^2_{\hat{p}} = P(1-P)/n$.
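Both results can be checked exactly, without simulation, if the number of "successes" in a sample follows a binomial distribution. This holds when sampling with replacement or from a very large population, which is an assumption of the sketch. For small finite populations sampled without replacement, the variance is somewhat smaller. The code takes P = 0.3 and n = 20 as illustrative values and uses exact fractions.

```python
from fractions import Fraction
from math import comb

def proportion_moments(P, n):
    """Exact mean and variance of p_hat = X / n when X ~ Binomial(n, P)."""
    probs = {k: comb(n, k) * P**k * (1 - P)**(n - k) for k in range(n + 1)}
    mean = sum(Fraction(k, n) * p for k, p in probs.items())
    var = sum((Fraction(k, n) - mean) ** 2 * p for k, p in probs.items())
    return mean, var

P, n = Fraction(3, 10), 20
mean, var = proportion_moments(P, n)
print(mean, var, P * (1 - P) / n)  # 3/10 21/2000 21/2000
```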

The end !

THANK YOU FOR YOUR ATTENTION!!
