
Cluster sampling

Definition.
Cluster sampling is a sampling plan used when
mutually homogeneous yet internally heterogeneous
groupings are evident in a statistical population.
In this sampling plan, the total population is divided
into these groups (known as clusters) and a simple
random sample of the groups is selected.
The elements in each cluster are then sampled.

Each cluster should be a small-scale representation of the total population.
The clusters should be mutually exclusive and collectively exhaustive.
Types of Cluster Sampling
If all elements in each sampled cluster are sampled, then this is
referred to as a "one-stage" cluster sampling plan.
If a simple random subsample of elements is selected within
each of these groups, this is referred to as a "two-stage" cluster
sampling plan.
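The two plans can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of city blocks and households, the function names, and the parameters `t` (clusters to select) and `m` (elements subsampled per cluster) are hypothetical and do not come from the source.

```python
import random


def one_stage_cluster_sample(clusters, t, rng):
    """Select t clusters by simple random sampling and keep every element in them."""
    chosen = rng.sample(sorted(clusters), t)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, t, m, rng):
    """Select t clusters, then a simple random subsample of m elements in each."""
    chosen = rng.sample(sorted(clusters), t)
    return {name: rng.sample(list(clusters[name]), m) for name in chosen}


# Hypothetical population: 5 city blocks (clusters) with 4 households each.
population = {f"block_{i}": [f"hh_{i}_{j}" for j in range(4)] for i in range(5)}

rng = random.Random(0)  # fixed seed so the selection is reproducible
one_stage = one_stage_cluster_sample(population, t=2, rng=rng)
two_stage = two_stage_cluster_sample(population, t=2, m=3, rng=rng)
```

In the one-stage result every household of each chosen block appears; in the two-stage result only `m` households per chosen block do.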
A common motivation for cluster sampling is to reduce the
total number of interviews and costs given the desired
accuracy. For a fixed sample size, the expected random error is
smaller when most of the variation in the population is present
internally within the groups, and not between the groups.
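The claim about within-group and between-group variation can be checked exactly on a toy population. The sketch below is an illustration only: the eight values, the two groupings, and the function name `sampling_variance` are hypothetical. It enumerates every possible sample of `t` clusters and computes the variance of the resulting sample means.

```python
from itertools import combinations
from statistics import mean, pvariance


def sampling_variance(clusters, t):
    """Exact variance of the one-stage cluster-sample mean over all samples of t clusters."""
    estimates = [
        mean(value for cluster in chosen for value in cluster)
        for chosen in combinations(clusters, t)
    ]
    return pvariance(estimates)


# The same eight values, grouped in two different ways (hypothetical data).
heterogeneous_within = [(1, 8), (2, 7), (3, 6), (4, 5)]  # variation lies inside clusters
homogeneous_within = [(1, 2), (3, 4), (5, 6), (7, 8)]    # variation lies between clusters

var_het = sampling_variance(heterogeneous_within, t=2)
var_hom = sampling_variance(homogeneous_within, t=2)
```

With the first grouping every cluster has the same mean, so every sample gives the same estimate and the variance is zero. With the second grouping the estimate depends heavily on which clusters happen to be drawn.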
Clusters are usually geographical areas. In practice, cluster
sampling designs and complex designs are used in large-scale
surveys because:
• There are usually no adequate sampling frames of elements to use in direct sampling, but frames of clusters are usually available or can be easily obtained.
• Clustering leads to savings in cost and time, both between and within clusters. Thus it tends to be cheap and quick.
• Single-stage cluster sampling leads to simple field instructions and training, thereby reducing errors. Generally, cluster sampling facilitates better control of some non-sampling errors.
Advantages of cluster sampling
• Can be cheaper than other sampling plans, e.g. fewer travel expenses and lower administration costs.
• Feasibility: This sampling plan takes large populations into account. Since these groups are so large, deploying any other sampling plan would be very costly.
• Economy: The two major regular items of expenditure, i.e. traveling and listing, are greatly reduced in this method. For example, compiling research information about every household in a city would be very costly, whereas compiling information about various blocks of the city will be more economical.
• Reduced variability: in the rare case of a negative intra-class correlation between subjects within a cluster, the estimators produced by cluster sampling will yield more accurate estimates than data obtained from a simple random sample (i.e. the design effect will be smaller than 1). This is not a commonplace scenario.
• Major use: when the sampling frame of all elements is not available, we can resort only to cluster sampling.
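The link between the intra-class correlation and the design effect can be made explicit with a standard approximation for one-stage sampling of equal-sized clusters. It is not stated in the source and is added here only as an illustration; the symbols $B$ (number of elements per cluster) and $\rho$ (intra-class correlation) are assumptions of this sketch.

```latex
\mathrm{deff} \;=\; \frac{\operatorname{var}_{\text{cluster}}(\bar{y})}{\operatorname{var}_{\text{SRS}}(\bar{y})}
\;\approx\; 1 + (B - 1)\,\rho ,
\qquad -\frac{1}{B-1} \le \rho \le 1 .
```

Under this approximation the design effect falls below 1 only when $\rho < 0$, and for a fixed positive $\rho$ it grows with the cluster size $B$.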
Disadvantages of cluster sampling
• Higher sampling error: Estimates are generally less efficient than those from direct sampling, and the efficiency decreases as the number of stages of sampling increases (since cluster sampling is usually done in a number of stages). There is a possibility of high sampling errors since only a limited number of clusters are included, leaving a significant proportion of the population unsampled.
• Elements within a cluster tend strongly to exhibit similar characteristics, so a certain characteristic may be either over-represented or under-represented, which skews the results of the study.
Comparison between Clustering and Stratification

| No. | Stratification | Clustering |
| --- | --- | --- |
| 1 | A fraction of the population is investigated. | A fraction of the population is investigated. |
| 2 | Every stratum is considered, and a sample is drawn from each. | Only a sample of clusters is considered. |
| 3 | For better estimates, units within each stratum should be homogeneous, but there is heterogeneity between strata. | For better estimates, units within each cluster should be heterogeneous, while clusters are homogeneous. |
| 4 | Higher costs are incurred. | Lower costs are incurred. |
| 5 | A higher precision is achieved than in SRS. | A lower precision is achieved than in SRS. |
| 6 | Within each stratum, the sample size … | The sample size varies if the clusters … |

Single Stage Cluster Sampling
• List all clusters and use a random sampling technique to select some of these clusters.
• The units in the selected clusters are all enumerated.
• Clusters may be equal or unequal in size.

Clusters of Equal Sizes
Let:
• N = the population size
• T = total number of clusters in the population
• t = number of clusters selected in the sample
• B = size of each cluster
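The unbiased estimator of the population mean based on the sample is:
$$\bar{y} = \frac{1}{t}\sum_{i=1}^{t}\bar{y}_i$$
where $\bar{y}$ is the mean per elementary unit and
$$\bar{y}_i = \frac{1}{B}\sum_{j=1}^{B} y_{ij}$$
is the mean of the $i$th cluster.
$$\therefore\ \bar{y} = \frac{1}{Bt}\sum_{i=1}^{t}\sum_{j=1}^{B} y_{ij}$$

The estimator of the variance of $\bar{y}$ is
$$\operatorname{var}(\bar{y}) = \frac{1-f_i}{t}\cdot\frac{1}{t-1}\sum_{i=1}^{t}\left(\bar{y}_i - \bar{y}\right)^2$$
The computational formula is
$$\operatorname{var}(\bar{y}) = \frac{1-f_i}{B^2\,t\,(t-1)}\left[\sum_{i=1}^{t} y_i^2 - \frac{y^2}{t}\right]$$
where $f_i = \dfrac{t}{T}$ is the first-stage sampling fraction,
$$y_i = \sum_{j=1}^{B} y_{ij}$$
is the $i$th cluster total, and
$$y = \sum_{i=1}^{t} y_i = \sum_{i=1}^{t}\sum_{j=1}^{B} y_{ij}$$
is the overall sample total. The two forms agree because $y_i = B\,\bar{y}_i$ and $y = Bt\,\bar{y}$.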
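A small worked example may help make the estimators concrete. The sketch below is only an illustration: the function name `cluster_mean_and_variance`, the sample values, and the choice T = 10 are hypothetical. It computes the sample mean per elementary unit and the variance estimate in two ways, from the cluster totals (the computational formula) and as (1 − t/T)/t times the sample variance of the cluster means, to show that both give the same number.

```python
def cluster_mean_and_variance(sample_clusters, T):
    """One-stage cluster sampling with equal cluster sizes.

    sample_clusters: list of t lists, each holding the B observed values of one cluster.
    T: total number of clusters in the population.
    Returns (y_bar, variance via cluster means, variance via cluster totals).
    """
    t = len(sample_clusters)
    if t < 2:
        raise ValueError("at least two sampled clusters are needed")
    B = len(sample_clusters[0])
    if any(len(cluster) != B for cluster in sample_clusters):
        raise ValueError("all clusters must have the same size B")

    f = t / T  # first-stage sampling fraction
    cluster_totals = [sum(cluster) for cluster in sample_clusters]  # y_i
    cluster_means = [total / B for total in cluster_totals]         # y_bar_i
    y_bar = sum(cluster_means) / t                                  # mean per elementary unit

    # Form based on the spread of the cluster means around y_bar.
    var_from_means = (1 - f) / t * sum((m - y_bar) ** 2 for m in cluster_means) / (t - 1)

    # Computational form based on cluster totals and the overall sample total y.
    y = sum(cluster_totals)
    sum_sq_totals = sum(total ** 2 for total in cluster_totals)
    var_from_totals = (1 - f) / (B ** 2 * t * (t - 1)) * (sum_sq_totals - y ** 2 / t)

    return y_bar, var_from_means, var_from_totals


# Hypothetical sample: t = 3 clusters of size B = 3, drawn from T = 10 clusters.
sample = [[2, 4, 6], [1, 3, 5], [4, 6, 8]]
y_bar, var_means, var_totals = cluster_mean_and_variance(sample, T=10)
```

For this sample the cluster totals are 12, 9 and 18, so the sample mean is 39/9 = 13/3, and both variance expressions evaluate to 49/90.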
