Professional Documents
Culture Documents
Thus
E(ȳsys) = (Ȳ1 + ... + Ȳk )/k = Ȳ ,
so the systematic mean is unbiased for the population mean.
Let Yij = Yi+k(j−1), for i = 1, 2, ..., k and j = 1, 2, ..., n.
k
2 1X
Var(ȳsys) = E(ȳsys − Ȳ ) = (Ȳi − Ȳ )2.
k i=1
Recall
N
X
2
(N − 1)S = (Yi − Ȳ )2
i=1
Xk Xn
= (Yij − Ȳ )2
i=1 j=1
k X
X n k
X
= (Yij − Ȳi)2 + n (Ȳi − Ȳ )2
i=1 j=1 i=1
= k(n − 1)Sw2 + knVar(ȳsys) = (N − k)Sw2 + N Var(ȳsys)
SydU STAT3014 (2015) Second semester Dr. J. Chan 62
STAT3014/3914 Applied Stat.-Sampling C4-Sys. & Cluster Sam.
since N = nk. Thus Var(ȳsys) is the between cluster variance and Sw2 is
the within cluster variance.
N −1 2 N −k 2
Var(ȳsys) = S − Sw
N N
where
ns ns
!
1 X 1 X
s2ȳ = (ȳi − ȳ¯)2 = ȳi2 − nsȳ¯2 .
ns − 1 i=1
ns − 1 i=1
Systematic Sampling
Parameter Point Estimate Estimated Variance
One systematic sample
n k
1X 1X
Mean Y y= yi Var(Yb sy ) = (ȳj − ȳ)2
n i=1 k j=1
N N −1 2 N −k 2
(Sampling interval k = ) = S − Sw
n N N
" n k #
1 X X
where Sw2 = (yij − ȳi )2
N −k i=1 j=1
" k
n X
#
1 X
S2 = (yij − ȳ)2
N −1 i=1 j=1
s n
0 N 1 X
(Sampling interval k = ns ) s2y = (y j − y)2
n ns − 1 j=1
ns
!
1 X 2
= y 2j − ns y
ns − 1 j=1
4.2.2 Notation
N = Number of clusters in the population.
Mi = Size of cluster i, i = 1, . . . , N .
M= N
P
i=1 Mi = Population size of elements.
PN
M M i
M̄ = = i=1
N N = Population average cluster size.
yij = y-value for element j element in cluster i, i = 1, .., N ; j = 1, .., Mi.
PMi
Yi = yi = j=1 yij = y-total for cluster i, i = 1, . . . , N .
Ri = Ȳ¯i = Myii = y-mean per element for cluster i, i = 1, . . . , N .
Y = N
P PN PMi
y
i=1 i = i=1 j=1 yij = Population y-total.
Y
Ȳ = N = Population mean per cluster.
R = Ȳ¯ = M Y
= Population mean per element.
n = Number of clusters in the sample.
For 1-stage sampling
m = ni=1 Mi = Sample size of elements.
P
ȳ = n1 ni=1
P
y = Sample mean per cluster.
Pn i
y
r = ȳ¯ = Pni=1Mii = Sample mean per element.
i=1
×M
R ←→ Y
M
since M̄ × N = N × N = M.
and the variance estimates follow from the ratio estimators in SRS where
Pn 2
Pn 2 Pn 2
Pn 2
2 i=1 (yi − rMi ) i=1 yi − 2r i=1 yi Mi + r i=1 Mi
sr = = .
n−1 n−1
The analysis is simpler when the clusters are of equal size, that is, Mi =
M/N = M̄ , i = 1, 2, .., N where M and N are the total number of
elements and clusters respectively.
Recall from ANOVA
N X M̄ N X
M̄ N X
M̄
(Yij − Ȳ¯ )2 = (Yij − Ȳ¯i)2 + (Ȳ¯i − Ȳ¯ )2
X X X
and so the cluster estimator is preferable when Sb2 < S 2. Note that
N
1 X
Sb2 = (Yi − Ȳ )2
M̄ (N − 1) i=1
to survey all the elements in the selected clusters when the cluster
sizes are large. A natural way is to sub-sample elements from the
selected clusters resulting in a 2-stage cluster sampling where at
Stage 1: a SRS of clusters is drawn,
Stage 2: a SRS of elements is taken from each cluster selected at stage 1.
2. Further notation
3. Estimation
Mi
P
The observed cluster totals yi = yij are estimated by
j=1
i m
Mi X
ŷi = Miȳi· = yij (1)
mi j=1
where ȳi· is the sample mean per elements for the elements yij se-
lected from cluster i at stage 2 and then replace yi by ŷi.
Hence the additional variance of var(Yb̄ c2,r ) and var(Ȳ c2,r ) are ob-
b̄
tained by dividing the additional variance of var(Ybc2,r ) with N 2 and
M 2 respectively.
For ratio estimator:
s2 n 2
syi
1 n r N X
2 m i
var(Rbc2,r ) = 1 − + M 1 − ,
M̄ 2 N n nM 2 i=1 i Mi mi
s2 n 2
syi
n r 1 X mi
var(Yb̄ c2,r ) = 1− + Mi2 1 − ,
N n nN i=1 Mi mi
2 n 2
mi syi
2
n sr
N X
2
var(Yc2,r ) = N 1 −
b + M 1−
N n n i=1 i Mi mi
n
P
ŷi
i=1
where R
b = r̂ = = Ȳ c2,r is a mean per element estimate and
b̄
Pn
Mi
i=1
n n n n
2
ŷi2 2
Mi2
P P P P
(ŷi − r̂Mi) − 2r̂ ŷiMi + r̂
i=1 i=1 i=1 i=1
s2r = = (3)
n−1 n−1
is obtained by substituting ŷi for yi in s2r for 1-stage cluster sampling.
where n n
2
(ŷi − ȳˆ) ŷi2 − nȳˆ2
P P
i=1 i=1
s2y = =
n−1 n−1
is obtained by substituting ŷi for yi in s2y for 1-stage cluster sampling
n
ˆ 1
P
and ȳ = n ŷi is the sample mean of ŷi.
i=1
Note: It can be shown that R bc2, Yb̄ c2 and Ybc2 are exactly unbiased
while Rbc2,r , Yb̄ c2,r and Ybc2,r are approximately unbiased for large sam-
ple sizes.
Summary
Ratio Ordinary Variance Additional variance in
2 2
Est. (Sr ) Est. (Sy ) in stage 1 in stage 2 ŷi = Mi ȳ¯i → yi
n
√ ȳˆ n S2 mi s2yi
1 N X 2
Mean/ele. R r̂ X 1− M 1−
M̄ M̄ 2 N n nM 2 i=1 i Mi mi
n
√ n S2 mi s2yi
1 X 2
Mean/clus. Ȳ M̄ r̂ X ȳˆ 1− M 1−
N n nN i=1 i Mi mi
n
√ n S2 mi s2yi
2
NX 2
Total Y M r̂ X N ȳ ˆ N 1− M 1−
N n n i=1 i Mi mi
Note:
√
(a) ’ ’ or ’X’ indicates whether we can use the estimator when M
is unknown. In the case when M is unknown, M̄ is replaced by
n
1
P
the sample mean m̄ = n Mi and
i=1
L L 2
syl
1 X 1 X
2 nl
R
bstc1 = Wl ȳl X var(Rbstc1 ) = W l 1 −
M̄ l=1 M̄ 2 l=1 Nl nl
L L
√ nl s2yl
X X
2
Y stc1 =
b̄ Wl ȳl var(Y stc1 ) =
b̄ Wl 1 −
Nl nl
l=1 l=1
L L 2
√ syl
X
2
X
2 n l
Ybstc1 = N Wl ȳl var(Ybstc1 ) = N Wl 1 −
Nl nl
l=1 l=1
Pnl
yli
Separate ratio estimator: rl = Pni=1
l
, s2srl = s2yl − 2rl sxyl + rl2 s2xl
i=1 Mli
L L 2
1 X 1 X
2 nl ssrl
Rbstc1,sr = Wl rl M̄l X var(Rbstc1,sr ) = W l 1 −
M̄ l=1 M̄ 2 l=1 Nl n l
L L 2
X X n l ssrl
Yb̄ stc1,sr = Wl rl M̄l X var(Yb̄ stc1,sr ) = Wl2 1 −
Nl nl
l=1 l=1
L L 2
X
2
X
2 n l ssrl
Ybstc1,sr = N Wl rl M̄l X var(Ybstc1,sr ) = N Wl 1 −
Nl nl
l=1 l=1
W1 ȳ1 + W2 ȳ2
Combine ratio estimator: rc = , s2crl = s2yl − 2rc sxyl + rc2 s2xl
W1 m̄1 + W2 m̄2
L 2
√
1 X nl scrl
R
bstc1,cr = rc var(Rbstc1,cr ) =
2
Wl2 1 −
M̄ l=1 Nl nl
L 2
X n l scrl
Yb̄ stc1,cr = M̄ rc X var(Yb̄ stc1,cr ) = Wl2 1 −
Nl nl
l=1
L 2
X n l scrl
Ybstc1,cr = M rc X var(Ybstc1,cr ) = N 2 Wl2 1 −
Nl nl
l=1
(a) We have
y1 53, 160 y 54, 700
r1 = = = 8, 801.325 r2 = 2 = = 11, 163.265 sep. r., mean/ele.
m1 6.04 m2 4.90
N1 y 1 + N2 y 2 415 × 53, 160 + 168 × 54, 700
rc = = = 9, 385.248 comb. r., mean/ele.
N1 m1 + N2 m2 415 × 6.04 + 168 × 4.90
M1 2, 500 M2 850
M1 = = = 6.024 M2 = = = 5.060 pop. mean cluster size
N1 415 N2 168
M 3, 350
M= = = 5.7461 overall pop. mean cluster size
N 583
N1 415 N2 168
W1 = = = 0.7118 W2 = = = 0.2882 wt. prop. to no. of cluster
N 583 N 583
(ii) Estimate of the average income per individual in the two cities
combined and its standard error using separate ratio estimator
SydU STAT3014 (2015) Second semester Dr. J. Chan 79
STAT3014/3914 Applied Stat.-Sampling C4-Sys. & Cluster Sam.
are
sample average of income per block
R
bstc1,sr =
population average no. of residents per block
1
= (W1M̄1r1 + W2M̄2r2)
M̄
583 415 2500 53160 168 850 54700
= × × + × ×
3350 583 415 6.04 583 168 4.90
= 9, 400.623
2 2
1 2 n1 s r1 2 n2 sr2
var(Yb stc1,sr ) = W 1 1 − + W 2 1 −
M̄ 2 N1 n1 N2 n2
5832 4152 25 25189.3082
= 1− +
33502 5832 415 25
1682 10 7192.9702
= 1−
5832 168 10
= 366938.7745
√
se(Yb stc1,sr ) = 366938.7745 = 605.7547
are
sample estimate of total income
R
bstc1,cr =
sample estimate of total no. of residents
N1y 1 + N2y 2
= rc =
N1m1 + N2m2
415 × 53, 160 + 168 × 54, 700
= = 9, 385.25
415
× 6.04 + 168 ×
2 4.90 2
1 n 1 s cr1 n2 scr2
var(R
bstc1,cr ) = N12 1 − + N22 1 −
c2
M N n1 N2 n2
1
25 25998.6562
1 2
= 415 1 − +
3329.82 415 25
2
10 8656.483
1682 1 −
168 10
= 407060.1502
√
se(R
bstc1,cr ) = 407060.1502 = 638.0127
Note that
1. The 3 estimates, Rbstc1 = 9, 328.66, R
bstc1,sr = 9, 400.62 and R
bstc1,cr =
9, 385.25 using respectively ordinary, separate ratio and combined
ratio estimators are similar.
2. Since
ȳ1s1m 53160(2.3714)
0.30315 = ρ1 < = = 0.479048
2m̄1s1y 2(6.04)(21784.322)
s1sr > s1y . Also since se(Yb stc1,sr ) = 605.755 < se(R bstc1,cr ) =
638.013 < se(R bstc1) = 721.92, the ratio estimate is better than
the ordinary estimate because the block sizes M2j are highly and
positively related to the total incomes y2j for block j of city 2.
3. Between the two ratio estimates, the separate ratio estimate R bstc1 is
again better because there is great difference between the two ratios
of average income per residents r1 = 8, 801.3245 and r2 = 11, 163.265
in the 2 cities and their sample sizes of blocks, n1 = 25 and n2 = 10
are not too small. Hence the assumption of a common ratio in the
combined ratio estimator is not appropriate.