Cluster Sampling
Reading Assignment:
Scheaffer, Mendenhall, Ott, & Gerow
Chapter 8
2/1/13
Goals for this Lecture
What is Cluster Sampling?
Illustration
[Figure: flow diagram with the labels "For each SSU, enumerate all the households", "Draw random sample of households from each block", and "Final sample (of households)".]
Advantages and Disadvantages
• Advantages:
  – For some populations, an explicit sampling frame cannot be constructed
    • But a frame of clusters (area, organizational, etc.) can be constructed, with sampling then done within clusters
  – For some efforts, an SRS is too expensive to conduct
    • E.g., drawing an SRS from the US population for an in-person interview
• Disadvantage: Cluster sampling generally provides less precision than SRS or stratified sampling
When To Use Cluster Sampling
Example: Small Village
Source: Survey Methodology, 1st ed., Groves, et al, 2004.
Effect of Cluster Sampling
* Assuming an infinitely sized population (or sampling with replacement) to make the calculations easy.
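Cluster Sampling Notation
• Estimator for the mean:
$$\bar{y}_{cl} = \frac{\sum_{i=1}^{m} t_i}{\sum_{i=1}^{m} n_i} = \frac{1}{m\bar{n}} \sum_{i=1}^{m} t_i$$
• Variance of $\bar{y}_{cl}$:
$$\mathrm{Var}(\bar{y}_{cl}) = \left(1 - \frac{m}{M}\right) \frac{s_r^2}{m\bar{N}^2}$$
where
$$s_r^2 = \frac{1}{m-1} \sum_{i=1}^{m} \left(t_i - \bar{y}_{cl}\, n_i\right)^2$$
Here $\bar{n} = \frac{1}{m}\sum_{i=1}^{m} n_i$ is the average size of the sampled clusters and $\bar{N} = N/M$ is the average cluster size in the population. If $\bar{N}$ is not known, estimate it with $\bar{n}$.
• Estimator for the total:
$$\hat{\tau} = N\bar{y}_{cl} = N\,\frac{\sum_{i=1}^{m} t_i}{\sum_{i=1}^{m} n_i} = \frac{N}{m\bar{n}} \sum_{i=1}^{m} t_i$$
• Variance of $\hat{\tau}$:
$$\mathrm{Var}(\hat{\tau}) = N^2\,\mathrm{Var}(\bar{y}_{cl}) = N^2 \left(1 - \frac{m}{M}\right) \frac{s_r^2}{m\bar{N}^2} = M^2 \left(1 - \frac{m}{M}\right) \frac{s_r^2}{m}$$
with $s_r^2$ defined as for the mean.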
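The estimators on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `cluster_ratio_estimates` and its return keys are hypothetical, and it assumes that the average cluster size in the variance formula is N/M when N is known and the sample average of the n_i otherwise.

```python
def cluster_ratio_estimates(totals, sizes, M, N=None):
    """Ratio estimates from a one-stage cluster sample (illustrative sketch).

    totals: cluster totals t_i for the m sampled clusters
    sizes:  cluster sizes n_i for the same clusters
    M:      number of clusters in the population
    N:      population size; if None, the average cluster size is
            estimated by the sample average and no total is returned
    """
    m = len(totals)
    y_cl = sum(totals) / sum(sizes)              # estimated mean per element
    n_bar = sum(sizes) / m                       # average sampled cluster size
    N_bar = N / M if N is not None else n_bar    # average cluster size used in Var
    s_r2 = sum((t - y_cl * n) ** 2 for t, n in zip(totals, sizes)) / (m - 1)
    fpc = 1 - m / M                              # finite population correction
    result = {
        "mean": y_cl,
        "s_r2": s_r2,
        "var_mean": fpc * s_r2 / (m * N_bar ** 2),
    }
    if N is not None:
        result["total"] = N * y_cl
        result["var_total"] = M ** 2 * fpc * s_r2 / m
    return result
```

For example, `cluster_ratio_estimates([10, 20, 30], [5, 5, 10], M=30, N=200)` uses three made-up clusters; the returned `var_total` equals N² times `var_mean`, as the algebra on the slide requires.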
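What if N is Not Known?
• Estimate the total from the sampled cluster totals alone, $\hat{\tau} = M\bar{y}_t$, with
$$\mathrm{Var}(\hat{\tau}) = M^2 \left(1 - \frac{m}{M}\right) \frac{s_t^2}{m}$$
with
$$s_t^2 = \frac{1}{m-1} \sum_{i=1}^{m} \left(t_i - \bar{y}_t\right)^2$$
where $\bar{y}_t = \frac{1}{m}\sum_{i=1}^{m} t_i$ is the mean of the sampled cluster totals
• Again, note that the calculations are based on the idea that only the clusters are random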
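A short sketch of the calculation when N is unknown, assuming the total is estimated by scaling the mean of the sampled cluster totals by M (the standard form that goes with the s_t² expression on this slide). The function name `cluster_total_unbiased` is hypothetical.

```python
def cluster_total_unbiased(totals, M):
    """Estimate a population total from cluster totals only (N not needed).

    totals: cluster totals t_i for the m sampled clusters
    M:      number of clusters in the population
    Returns (tau_hat, var_tau_hat).
    """
    m = len(totals)
    y_t = sum(totals) / m                                   # mean cluster total
    s_t2 = sum((t - y_t) ** 2 for t in totals) / (m - 1)    # variance among totals
    tau_hat = M * y_t
    var_tau = M ** 2 * (1 - m / M) * s_t2 / m
    return tau_hat, var_tau
```

Only the cluster totals enter the computation, which reflects the point that the clusters are the only random element in the design.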
Example: NAEP
• Assume:
  – 40,000 4th grade classrooms in US
  – $n_i = 25$ students per classroom
• Sampling procedure:
  – Select m classrooms
  – Visit each classroom and collect data on all students
• If m = 8, will have data on 200 students
• Note the differences from SRS
  – Not all groups of 200 students can be sampled
  – Students in each classroom are more likely to be alike
Mean & Variance Computations
where $s_r^2 = \frac{1}{7} \sum_{i=1}^{8} \left(t_i - 25\,\bar{y}_{cl}\right)^2$
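To make the computation concrete, the sketch below works through the m = 8, n_i = 25 case using the cluster-sampling formulas for the mean. The eight classroom totals are invented purely for illustration and are not NAEP data; M = 40,000 comes from the example's assumptions.

```python
M = 40_000        # classrooms in the population (from the example)
n_i = 25          # students per classroom (from the example)

# Hypothetical classroom score totals t_i, made up for illustration only
totals = [5500, 5625, 5400, 5750, 5300, 5600, 5450, 5575]
m = len(totals)   # 8 sampled classrooms

# With equal cluster sizes, sum(n_i) = m * n_i
y_cl = sum(totals) / (m * n_i)

# s_r^2 = (1/7) * sum over the 8 classrooms of (t_i - 25 * y_cl)^2
s_r2 = sum((t - n_i * y_cl) ** 2 for t in totals) / (m - 1)

# Var(y_cl) = (1 - m/M) * s_r^2 / (m * Nbar^2), with Nbar = 25 here
var_y_cl = (1 - m / M) * s_r2 / (m * n_i ** 2)

# Approximate bound on the error of estimation (two standard errors)
bound = 2 * var_y_cl ** 0.5
```

Because every classroom has the same size, the average cluster size is exactly 25 and does not need to be estimated.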
The Design Effect
Effective Sample Size
Sampling within Selected Clusters
(Approximate) Sample Size Calculations for Estimating Means
• Typically, we want to determine a sample size in terms of the number of clusters (m) to achieve a particular margin of error B
• So, solving the following for m
$$2\sqrt{\left(1 - \frac{m}{M}\right) \frac{\sigma_r^2}{m\bar{N}^2}} = B$$
gives
$$m = \frac{M\sigma_r^2}{M B^2 \bar{N}^2 / 4 + \sigma_r^2}$$
• Remember to inflate to allow for nonresponse (if relevant)
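The sample size formula for means is easy to wrap in a helper. In this sketch the function name `clusters_needed_for_mean` is hypothetical, and dividing by an anticipated response rate is one common way to inflate for nonresponse, assumed here rather than taken from the slide.

```python
from math import ceil

def clusters_needed_for_mean(sigma_r2, M, N_bar, B, response_rate=1.0):
    """Number of clusters m so that 2 * SE(y_cl) is about B.

    sigma_r2:      planning value for sigma_r^2 (e.g., s_r^2 from a pilot)
    M:             number of clusters in the population
    N_bar:         average cluster size
    B:             desired margin of error for the mean
    response_rate: expected fraction of clusters responding (assumption)
    """
    m = M * sigma_r2 / (M * B ** 2 * N_bar ** 2 / 4 + sigma_r2)
    return ceil(m / response_rate)   # round up, then allow for nonresponse
```

With made-up planning values `sigma_r2=100`, `M=100`, `N_bar=5`, `B=1`, the formula gives about 13.8, so 14 clusters are needed; an 80% response rate raises that to 18.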
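(Approximate) Sample Size Calculations for Estimating Totals
• Now solve the following for m (the number of clusters):
$$2M\sqrt{\left(1 - \frac{m}{M}\right) \frac{\sigma_r^2}{m}} = B$$
which gives
$$m = \frac{M\sigma_r^2}{B^2 / (4M) + \sigma_r^2}$$
• If the total is estimated from the cluster totals alone (N not known), the same formula applies with $\sigma_t^2$ in place of $\sigma_r^2$
• Again, remember to inflate to account for the nonresponse rate (if relevant)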
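The same kind of helper works for totals. The function name `clusters_needed_for_total` is hypothetical; the variance argument is whichever planning value matches the estimator in use, and the nonresponse adjustment (dividing by an expected response rate) is an assumption of this sketch.

```python
from math import ceil

def clusters_needed_for_total(sigma2, M, B, response_rate=1.0):
    """Number of clusters m so that 2 * SE(tau_hat) is about B.

    sigma2:        planning value for the cluster-level variance
    M:             number of clusters in the population
    B:             desired margin of error for the total
    response_rate: expected fraction of clusters responding (assumption)
    """
    m = M * sigma2 / (B ** 2 / (4 * M) + sigma2)
    return ceil(m / response_rate)   # round up, then allow for nonresponse
```

For instance, with made-up values `sigma2=100`, `M=100`, and `B=500`, the formula gives about 13.8, so 14 clusters; the margin of error B is on the scale of the total, which is why it is so much larger than a margin for a mean.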
Sampling with Probability Proportional to Size (pps)
• Sometimes it makes sense to sample clusters in proportion to their size
  – Puts more emphasis on the larger clusters where most of the observations are
• E.g., sampling clusters equally when there are a lot of small clusters and only a few very large ones could be very inefficient
• For cluster sampling, sampling with probability proportional to size (pps) means
$$\Pr(\text{select cluster } i) = \frac{n_i}{N}$$
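A pps draw can be sketched with the Python standard library. This illustration assumes clusters are drawn with replacement, so that each draw selects cluster i with probability n_i / N; the function names and the example sizes are hypothetical.

```python
import random

def selection_probabilities(sizes):
    """Per-draw selection probability n_i / N for each cluster."""
    N = sum(sizes)
    return [n / N for n in sizes]

def pps_sample(sizes, m, seed=None):
    """Draw m cluster indices with replacement, weighted by cluster size."""
    rng = random.Random(seed)          # local generator for reproducibility
    return rng.choices(range(len(sizes)), weights=sizes, k=m)

# Made-up cluster sizes: two very large clusters and a few small ones
sizes = [500, 20, 0, 30, 450]
probs = selection_probabilities(sizes)
chosen = pps_sample(sizes, m=8, seed=1)
```

Because sampling is with replacement, a large cluster can appear in `chosen` more than once, while a cluster of size zero can never be selected.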
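Estimation Summary With PPS
• Total
  – Estimator for the total:
$$\hat{\tau}_{pps} = \frac{N}{m} \sum_{i=1}^{m} \bar{y}_i$$
  – Variance of $\hat{\tau}_{pps}$:
$$\mathrm{Var}(\hat{\tau}_{pps}) = \frac{N^2}{m(m-1)} \sum_{i=1}^{m} \left(\bar{y}_i - \hat{\mu}_{pps}\right)^2$$
where $\bar{y}_i = t_i / n_i$ is the mean of sampled cluster $i$ and $\hat{\mu}_{pps} = \frac{1}{m}\sum_{i=1}^{m} \bar{y}_i$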
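The pps estimator can be sketched as follows. This assumes the y_i in the formulas are the per-cluster means (the standard form for pps sampling with replacement, where the overbars appear to have been lost from the slide text) and that the estimated mean is the simple average of those cluster means. The function name `pps_estimates` is hypothetical.

```python
def pps_estimates(cluster_means, N):
    """Estimate the mean and total from a pps (with replacement) cluster sample.

    cluster_means: mean of y within each of the m sampled clusters
                   (a cluster drawn twice appears twice)
    N:             number of elements in the population
    Returns (mu_hat, tau_hat, var_tau_hat).
    """
    m = len(cluster_means)
    mu_hat = sum(cluster_means) / m                 # average of cluster means
    tau_hat = N * mu_hat                            # (N / m) * sum of cluster means
    ss = sum((y - mu_hat) ** 2 for y in cluster_means)
    var_tau = N ** 2 * ss / (m * (m - 1))
    return mu_hat, tau_hat, var_tau
```

Unlike the equal-probability estimators, no cluster sizes enter the calculation: the unequal selection probabilities already account for them.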
What We Have Covered