Understanding Systematic Sampling Techniques
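The k possible systematic samples, each selected with probability 1/k, and their means are:
Sample 1: $y_1, y_{k+1}, y_{2k+1}, \ldots, y_{(n-1)k+1}$, with mean $\bar y_1 = \frac{1}{n}\sum_{j=0}^{n-1} y_{jk+1}$
Sample 2: $y_2, y_{k+2}, y_{2k+2}, \ldots, y_{(n-1)k+2}$, with mean $\bar y_2 = \frac{1}{n}\sum_{j=0}^{n-1} y_{jk+2}$
⋮
Sample i: $y_i, y_{k+i}, y_{2k+i}, \ldots, y_{(n-1)k+i}$, with mean $\bar y_i = \frac{1}{n}\sum_{j=0}^{n-1} y_{jk+i}$
⋮
Sample k: $y_k, y_{2k}, y_{3k}, \ldots, y_{nk}$, with mean $\bar y_k = \frac{1}{n}\sum_{j=0}^{n-1} y_{(j+1)k}$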
Systematic Sampling:
Systematic sampling is a simple and versatile probability sampling design. Suppose a population of $N = nk$ units $y_1, y_2, \ldots, y_N$ is arranged in some fixed order, and define the i-th possible sample as the subset $\{y_i, y_{k+i}, \ldots, y_{(n-1)k+i}\}$, $i = 1, 2, \ldots, k$. If one of these k subsets is selected at random (equivalently, a random start i is chosen from $1, 2, \ldots, k$ and every k-th unit after it is taken), the selected subset constitutes the i-th systematic sample of size n, and k is called the sampling interval.
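The mean of the i-th systematic sample is
$$\bar y_i = \frac{1}{n}\sum_{j=0}^{n-1} y_{jk+i}.$$
Writing $y_{ij} = y_{(j-1)k+i}$ for the j-th unit of the i-th sample, the same mean is
$$\bar y_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} y_{ij}.$$
The population mean is
$$\bar Y = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}.$$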
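The indexing above can be made concrete with a minimal Python sketch. It assumes the population is a Python list in its fixed order with $N = nk$; the helper names `systematic_samples` and `y_ij` are hypothetical, introduced only for illustration. With 0-based indexing, sample i is simply the slice `population[i-1::k]`.

```python
def systematic_samples(population, k):
    """Split an ordered population of size N = n*k into its k systematic samples.

    Sample i (1-based) is y_i, y_{k+i}, ..., y_{(n-1)k+i}; with 0-based
    Python indexing that is the slice population[i-1::k].
    """
    N = len(population)
    if k <= 0 or N % k != 0:
        raise ValueError("this sketch assumes N is a positive multiple of k")
    return [population[i::k] for i in range(k)]


def y_ij(population, k, i, j):
    """The j-th unit of the i-th sample (both 1-based): y_ij = y_{(j-1)k+i}."""
    return population[(j - 1) * k + i - 1]


population = list(range(1, 13))  # hypothetical ordered population, N = 12
k = 4                            # sampling interval, so n = 3
samples = systematic_samples(population, k)
means = [sum(s) / len(s) for s in samples]    # ybar_i.
pop_mean = sum(population) / len(population)  # Ybar

print(samples)   # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
print(means)     # [5.0, 6.0, 7.0, 8.0]
print(pop_mean)  # 6.5
```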
Systematic sampling - 2
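The variance of the systematic sample mean $\bar y_{sy}$ is
$$\operatorname{var}(\bar y_{sy}) = E(\bar y_{i\cdot} - \bar Y)^2 = \frac{1}{k}\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar Y)^2 = \frac{1}{n^2k}\sum_{i=1}^{k}(n\bar y_{i\cdot} - n\bar Y)^2$$
$$= \frac{1}{n^2k}\sum_{i=1}^{k}\left(\sum_{j=1}^{n} y_{ij} - \frac{1}{k}\sum_{i'=1}^{k}\sum_{j=1}^{n} y_{i'j}\right)^2 = \frac{1}{n^2k}\left[\sum_{i=1}^{k} y_{i\cdot}^2 - \frac{1}{k}\left(\sum_{i=1}^{k} y_{i\cdot}\right)^2\right],$$
where $y_{i\cdot} = \sum_{j=1}^{n} y_{ij} = n\bar y_{i\cdot}$ is the total of the i-th sample.
For the estimated total $\hat Y_{sy} = N\bar y_{sy}$, the variance is
$$\operatorname{var}(\hat Y_{sy}) = N^2\operatorname{var}(\bar y_{sy}) = \frac{N}{n}\left[\sum_{i=1}^{k} y_{i\cdot}^2 - \frac{1}{k}\left(\sum_{i=1}^{k} y_{i\cdot}\right)^2\right].$$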
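As a numerical cross-check of the two expressions for $\operatorname{var}(\bar y_{sy})$ and of the variance of the estimated total, here is a small sketch using exact rational arithmetic (`fractions.Fraction`). It assumes integer population values with $N = nk$; the function names and the population values are hypothetical.

```python
from fractions import Fraction


def var_sys_by_definition(population, k):
    """var(ybar_sy) = (1/k) * sum_i (ybar_i. - Ybar)^2, exact arithmetic."""
    samples = [population[i::k] for i in range(k)]
    n = len(samples[0])
    ybar = Fraction(sum(population), len(population))
    return sum((Fraction(sum(s), n) - ybar) ** 2 for s in samples) / k


def var_sys_from_totals(population, k):
    """var(ybar_sy) = [sum_i y_i.^2 - (sum_i y_i.)^2 / k] / (n^2 k), y_i. = sample total."""
    samples = [population[i::k] for i in range(k)]
    n = len(samples[0])
    totals = [sum(s) for s in samples]
    bracket = sum(t * t for t in totals) - Fraction(sum(totals) ** 2, k)
    return bracket / (n * n * k)


def var_total_from_totals(population, k):
    """var(Yhat_sy) = (N/n) * [sum_i y_i.^2 - (sum_i y_i.)^2 / k]."""
    N = len(population)
    n = N // k
    totals = [sum(population[i::k]) for i in range(k)]
    bracket = sum(t * t for t in totals) - Fraction(sum(totals) ** 2, k)
    return Fraction(N, n) * bracket


pop = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 11, 12]  # hypothetical values, N = 12, k = 4
print(var_sys_by_definition(pop, 4), var_sys_from_totals(pop, 4))  # 21/4 21/4
```

Exact fractions are used so that the two forms can be compared with `==` rather than with a floating-point tolerance.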
Theorem 1: In systematic sampling with interval k, the sample mean $\bar y_{sy}$ is an unbiased estimator of the population mean $\bar Y$.
Proof:
In systematic sampling, the whole sample is fixed as soon as the first unit is selected. Since the random start is chosen with probability 1/k, each of the k samples has the same probability 1/k of being selected. By definition, the mean of the i-th systematic sample is
$$\bar y_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} y_{ij},$$
so
$$E(\bar y_{sy}) = \frac{1}{k}\sum_{i=1}^{k}\bar y_{i\cdot} = \frac{1}{k}\sum_{i=1}^{k}\frac{1}{n}\sum_{j=1}^{n} y_{ij} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij} = \bar Y.$$
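Theorem 1 can be checked by enumeration: weight each of the k sample means by 1/k and compare with the population mean. This is a sketch under the assumption of integer values and $N = nk$; the function names and example populations are hypothetical.

```python
from fractions import Fraction


def expected_systematic_mean(population, k):
    """E(ybar_sy): each of the k random starts has probability 1/k."""
    n = len(population) // k
    samples = [population[i::k] for i in range(k)]
    return sum(Fraction(1, k) * Fraction(sum(s), n) for s in samples)


def population_mean(population):
    return Fraction(sum(population), len(population))


for pop, k in [([3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 11, 12], 4),
               ([2, 9, 4, 4, 8, 1], 3),
               (list(range(1, 21)), 5)]:
    # Unbiasedness holds for every ordering of the units, not just these ones.
    assert expected_systematic_mean(pop, k) == population_mean(pop)
print("E(ybar_sy) equals Ybar in every example")
```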
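Theorem 2: The variance of the mean of a systematic sample is
$$\operatorname{var}(\bar y_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{w(sy)},$$
where $S^2 = \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar Y)^2$ is the population variance and
$$S^2_{w(sy)} = \frac{1}{k(n-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{i\cdot})^2 \quad (1)$$
is the variance among units that lie within the same systematic sample.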
Proof:
By definition, the variance of $\bar y_{sy}$ is
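$$\operatorname{var}(\bar y_{sy}) = E(\bar y_{i\cdot} - \bar Y)^2 = \frac{1}{k}\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar Y)^2. \quad (2)$$
Now partition the total sum of squares in the usual way:
$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar Y)^2 = n\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar Y)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{i\cdot})^2 = nk\operatorname{var}(\bar y_{sy}) + k(n-1)S^2_{w(sy)},$$
using (1) and (2). Since $nk = N$,
$$\operatorname{var}(\bar y_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{w(sy)}.$$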
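The sum-of-squares decomposition behind Theorem 2 can be verified exactly on small examples. A minimal sketch, assuming integer values, $N = nk$ and $n \ge 2$ so that $S^2_{w(sy)}$ is defined; `theorem2_check` is a hypothetical helper name.

```python
from fractions import Fraction


def theorem2_check(population, k):
    """Return (var(ybar_sy), right-hand side of Theorem 2), both exact.

    Assumes integer values, N = n*k and n >= 2 (so S_w(sy)^2 is defined).
    """
    N = len(population)
    n = N // k
    samples = [population[i::k] for i in range(k)]
    ybar = Fraction(sum(population), N)
    means = [Fraction(sum(s), n) for s in samples]

    var_sys = sum((m - ybar) ** 2 for m in means) / k
    S2 = sum((y - ybar) ** 2 for y in population) / (N - 1)
    # Within-sample variance S_w(sy)^2, equation (1)
    S2_w = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s) / (k * (n - 1))

    rhs = Fraction(N - 1, N) * S2 - Fraction(k * (n - 1), N) * S2_w
    return var_sys, rhs


for pop, k in [([3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 11, 12], 4), ([2, 9, 4, 4, 8, 1], 3)]:
    lhs, rhs = theorem2_check(pop, k)
    print(lhs == rhs)  # True
```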
Theorem 3: The mean of a systematic sample is more precise than the mean of a simple random sample (without replacement, of the same size n) if and only if
$$S^2_{w(sy)} > S^2.$$
Proof:
If $\bar y$ is the mean of a simple random sample of size n, then
$$\operatorname{var}(\bar y) = \frac{N-n}{N}\cdot\frac{S^2}{n},$$
while the variance of $\bar y_{sy}$ is
$$\operatorname{var}(\bar y_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{w(sy)}.$$
Thus $\operatorname{var}(\bar y_{sy}) < \operatorname{var}(\bar y)$ if and only if
$$\frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{w(sy)} < \frac{N-n}{N}\cdot\frac{S^2}{n}$$
$$\iff \frac{k(n-1)}{N}S^2_{w(sy)} > \frac{N-1}{N}S^2 - \frac{N-n}{N}\cdot\frac{S^2}{n}$$
$$\iff k(n-1)S^2_{w(sy)} > \left(N - 1 - \frac{N-n}{n}\right)S^2$$
$$\iff k(n-1)S^2_{w(sy)} > \left(kn - 1 - \frac{kn-n}{n}\right)S^2$$
$$\iff k(n-1)S^2_{w(sy)} > k(n-1)S^2$$
$$\iff S^2_{w(sy)} > S^2.$$
Hence systematic sampling is more precise than simple random sampling if and only if the variance within the systematic samples is larger than the variance of the population as a whole.
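To see Theorem 3 at work, the sketch below compares $\operatorname{var}(\bar y_{sy})$ with the SRSWOR variance for two hypothetical populations with $N = 12$, $k = 4$: a linear trend, and a periodic pattern whose period equals the sampling interval. It assumes integer values and $n \ge 2$; `compare_with_srs` is a hypothetical name.

```python
from fractions import Fraction


def compare_with_srs(population, k):
    """Return var(ybar_sy), var(ybar) under SRSWOR, S_w(sy)^2 and S^2 (exact)."""
    N = len(population)
    n = N // k
    samples = [population[i::k] for i in range(k)]
    ybar = Fraction(sum(population), N)
    means = [Fraction(sum(s), n) for s in samples]
    var_sys = sum((m - ybar) ** 2 for m in means) / k
    S2 = sum((y - ybar) ** 2 for y in population) / (N - 1)
    S2_w = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s) / (k * (n - 1))
    var_srs = Fraction(N - n, N * n) * S2
    return var_sys, var_srs, S2_w, S2


trend = list(range(1, 13))   # linear trend
periodic = [1, 2, 3, 4] * 3  # period equal to the sampling interval k = 4
for name, pop in [("trend", trend), ("periodic", periodic)]:
    var_sys, var_srs, S2_w, S2 = compare_with_srs(pop, 4)
    # Theorem 3: the two comparisons always agree
    print(name, var_sys < var_srs, S2_w > S2)
# trend True True
# periodic False False
```

In the periodic case every systematic sample consists of identical values, so $S^2_{w(sy)} = 0$ and systematic sampling does worse than SRSWOR.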
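Theorem 4: An alternative form of the variance of $\bar y_{sy}$ is
$$\operatorname{var}(\bar y_{sy}) = \frac{S^2}{n}\cdot\frac{N-1}{N}\left[1 + (n-1)\rho_w\right] = \frac{\sigma^2}{n}\left[1 + (n-1)\rho_w\right],$$
where $\sigma^2 = \frac{N-1}{N}S^2$ and $\rho_w$ is the intra-class correlation coefficient between pairs of units that are in the same systematic sample. Its value depends on the arrangement of the units in the population, and it is defined as
$$\rho_w = \frac{E(y_{ij} - \bar Y)(y_{iu} - \bar Y)}{E(y_{ij} - \bar Y)^2} = \frac{1}{(n-1)(N-1)S^2}\sum_{i=1}^{k}\sum_{j \ne u}(y_{ij} - \bar Y)(y_{iu} - \bar Y), \quad (1)$$
where the inner sum runs over all ordered pairs $j \ne u$ of units within the i-th sample.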
Proof:
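By definition,
$$\operatorname{var}(\bar y_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar Y)^2,$$
so
$$n^2k\operatorname{var}(\bar y_{sy}) = n^2\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar Y)^2 = \sum_{i=1}^{k}(n\bar y_{i\cdot} - n\bar Y)^2 = \sum_{i=1}^{k}\left(\sum_{j=1}^{n} y_{ij} - n\bar Y\right)^2$$
$$= \sum_{i=1}^{k}\left[(y_{i1} - \bar Y) + (y_{i2} - \bar Y) + \cdots + (y_{in} - \bar Y)\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar Y)^2 + \sum_{i=1}^{k}\sum_{j \ne u}(y_{ij} - \bar Y)(y_{iu} - \bar Y) = (N-1)S^2 + (n-1)(N-1)S^2\rho_w,$$
using (1) for the cross-product term. Since $n^2k = nN$,
$$\operatorname{var}(\bar y_{sy}) = \frac{(N-1)S^2}{nN}\left[1 + (n-1)\rho_w\right] = \frac{S^2}{n}\cdot\frac{N-1}{N}\left[1 + (n-1)\rho_w\right] = \frac{\sigma^2}{n}\left[1 + (n-1)\rho_w\right].$$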
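The intra-class correlation form of Theorem 4 can be checked by computing $\rho_w$ directly from definition (1) and comparing with the direct variance. A minimal sketch, assuming integer values, $n \ge 2$ and $S^2 > 0$ (otherwise $\rho_w$ is undefined); the helper names are hypothetical.

```python
from fractions import Fraction


def rho_w(population, k):
    """Intra-class correlation from definition (1); assumes S^2 > 0 and n >= 2."""
    N = len(population)
    n = N // k
    samples = [population[i::k] for i in range(k)]
    ybar = Fraction(sum(population), N)
    S2 = sum((y - ybar) ** 2 for y in population) / (N - 1)
    # Sum over all ordered pairs j != u inside each systematic sample
    cross = sum((s[j] - ybar) * (s[u] - ybar)
                for s in samples for j in range(n) for u in range(n) if j != u)
    return cross / ((n - 1) * (N - 1) * S2)


def var_sys_theorem4(population, k):
    """var(ybar_sy) = sigma^2 / n * [1 + (n - 1) rho_w], sigma^2 = (N - 1) S^2 / N."""
    N = len(population)
    n = N // k
    ybar = Fraction(sum(population), N)
    sigma2 = sum((y - ybar) ** 2 for y in population) / N
    return sigma2 / n * (1 + (n - 1) * rho_w(population, k))


def var_sys_direct(population, k):
    """var(ybar_sy) = (1/k) * sum_i (ybar_i. - Ybar)^2."""
    n = len(population) // k
    ybar = Fraction(sum(population), len(population))
    return sum((Fraction(sum(population[i::k]), n) - ybar) ** 2 for i in range(k)) / k


pop = list(range(1, 13))  # hypothetical linear-trend population, k = 4, n = 3
print(rho_w(pop, 4), var_sys_theorem4(pop, 4), var_sys_direct(pop, 4))  # -49/143 5/4 5/4
```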
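Theorem 5: The intra-class correlation coefficient $\rho_w$ satisfies
$$-\frac{1}{n-1} \le \rho_w \le 1.$$
Proof:
From Theorem 2, since $k(n-1)/N = (n-1)/n$,
$$\operatorname{var}(\bar y_{sy}) = \frac{N-1}{N}S^2 - \frac{n-1}{n}S^2_{w(sy)} = \sigma^2 - \sigma^2_{w(sy)}, \quad (1)$$
where $\sigma^2_{w(sy)} = \frac{n-1}{n}S^2_{w(sy)}$. From Theorem 4,
$$\operatorname{var}(\bar y_{sy}) = \frac{\sigma^2}{n}\left[1 + (n-1)\rho_w\right]. \quad (2)$$
Comparing (1) and (2), it follows that
$$\frac{\sigma^2}{n}\left[1 + (n-1)\rho_w\right] = \sigma^2 - \sigma^2_{w(sy)}$$
$$1 + (n-1)\rho_w = n - \frac{n\sigma^2_{w(sy)}}{\sigma^2}$$
$$(n-1)\rho_w = n - 1 - \frac{n\sigma^2_{w(sy)}}{\sigma^2}$$
$$\rho_w = 1 - \frac{n}{n-1}\cdot\frac{\sigma^2_{w(sy)}}{\sigma^2}. \quad (3)$$
Since $0 \le \sigma^2_{w(sy)}/\sigma^2 \le 1$ (the upper bound holds because the left side of (1) is non-negative), it follows from (3) that $-\frac{1}{n-1} \le \rho_w \le 1$.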
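Note:
Write $V(\bar y_{sys})$ for $\operatorname{var}(\bar y_{sy})$. Since $V(\bar y_{sys}) \ge 0$, the expression in Theorem 4 gives $\rho_w \ge -\frac{1}{n-1}$. Thus the minimum value of $\rho_w$ is $-\frac{1}{n-1}$, and in this case $V(\bar y_{sys}) = 0$.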
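Both limits of Theorem 5 are attained by small hypothetical populations: one in which all systematic samples share the same mean (so $V(\bar y_{sys}) = 0$), and one in which the units within each sample are identical. The sketch computes $\rho_w$ both from definition (1) of Theorem 4 and from equation (3); it assumes integer values, $n \ge 2$ and $S^2 > 0$, and the function names are hypothetical.

```python
from fractions import Fraction


def rho_w(population, k):
    """Definition (1) of Theorem 4; assumes S^2 > 0 and n >= 2."""
    N = len(population)
    n = N // k
    samples = [population[i::k] for i in range(k)]
    ybar = Fraction(sum(population), N)
    S2 = sum((y - ybar) ** 2 for y in population) / (N - 1)
    cross = sum((s[j] - ybar) * (s[u] - ybar)
                for s in samples for j in range(n) for u in range(n) if j != u)
    return cross / ((n - 1) * (N - 1) * S2)


def rho_w_eq3(population, k):
    """Equation (3): rho_w = 1 - n/(n-1) * sigma_w(sy)^2 / sigma^2."""
    N = len(population)
    n = N // k
    ybar = Fraction(sum(population), N)
    sigma2 = sum((y - ybar) ** 2 for y in population) / N
    within = Fraction(0)
    for i in range(k):
        s = population[i::k]
        m = Fraction(sum(s), n)
        within += sum((y - m) ** 2 for y in s)
    sigma2_w = within / N  # equals (n-1)/n * S_w(sy)^2
    return 1 - Fraction(n, n - 1) * sigma2_w / sigma2


lower = [1, 2, 3, 5, 4, 3]  # k = 3: samples [1, 5], [2, 4], [3, 3] all have mean 3
upper = [1, 2, 3, 1, 2, 3]  # k = 3: units within each sample are equal
print(rho_w(lower, 3), rho_w(upper, 3))  # -1 1
```

With $n = 2$ here, the lower limit $-1/(n-1)$ equals $-1$.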
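The relative efficiency of systematic sampling compared with SRSWOR of the same size n, where $\bar y_n$ denotes the SRSWOR sample mean, is
$$E = \frac{V(\bar y_n)}{V(\bar y_{sys})} = \frac{\frac{N-n}{Nn}S^2}{\frac{nk-1}{nk}\cdot\frac{S^2}{n}\left[1 + (n-1)\rho_w\right]} = \frac{n(k-1)}{(nk-1)\left[1 + (n-1)\rho_w\right]}. \quad (1)$$
Obviously this depends on the value of $\rho_w$. Now
$$E > 1 \iff \frac{n(k-1)}{(nk-1)\left[1 + (n-1)\rho_w\right]} > 1 \iff nk - n > nk - 1 + (n-1)(nk-1)\rho_w \iff -(n-1) > (n-1)(nk-1)\rho_w \iff \rho_w < -\frac{1}{nk-1}.$$
Thus systematic sampling is more efficient than SRSWOR if
$$\rho_w < -\frac{1}{nk-1}.$$
On the other hand, SRSWOR is superior to systematic sampling if
$$\rho_w > -\frac{1}{nk-1}.$$
If $\rho_w$ takes its minimum possible value, $\rho_w = -\frac{1}{n-1}$, then $V(\bar y_{sys}) = 0$ and consequently $E = \infty$; in this case the reduction in $V(\bar y_{sys})$ relative to SRSWOR is 100%.
If $\rho_w$ takes its maximum value, $\rho_w = 1$, then from (1) we get $E = \frac{k-1}{nk-1}$.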
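Equation (1) is easy to tabulate. The sketch below evaluates $E$ as a function of $\rho_w$ for hypothetical values $n = 5$, $k = 10$ and confirms the break-even point $\rho_w = -1/(nk-1)$ and the two extreme cases. Returning `float("inf")` at the lower limit is a design choice of this sketch, not something the notes prescribe.

```python
from fractions import Fraction


def efficiency_vs_srswor(rho_w, n, k):
    """E = V(ybar_n) / V(ybar_sys) = n(k-1) / ((nk-1)[1 + (n-1) rho_w]), eq. (1).

    rho_w should lie in [-1/(n-1), 1]; at the lower limit V(ybar_sys) = 0,
    so E is reported as infinite.
    """
    denom = (n * k - 1) * (1 + (n - 1) * Fraction(rho_w))
    if denom == 0:
        return float("inf")
    return Fraction(n * (k - 1)) / denom


n, k = 5, 10
threshold = Fraction(-1, n * k - 1)
print(efficiency_vs_srswor(threshold, n, k))            # 1
print(efficiency_vs_srswor(1, n, k))                    # 9/49
print(efficiency_vs_srswor(Fraction(-1, n - 1), n, k))  # inf
```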
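Suppose the population is divided into n strata, the j-th stratum consisting of the k consecutive units $y_{(j-1)k+1}, \ldots, y_{jk}$, that is, $y_{1j}, y_{2j}, \ldots, y_{kj}$, so that each systematic sample contains exactly one unit from each stratum. Define
$$S^2_{wst} = \frac{1}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot j})^2 \quad (a)$$
and
$$\rho_{wst} = \frac{1}{n(n-1)(k-1)S^2_{wst}}\sum_{i=1}^{k}\sum_{j \ne u}(y_{ij} - \bar y_{\cdot j})(y_{iu} - \bar y_{\cdot u}), \quad (b)$$
where $\bar y_{\cdot j} = \frac{1}{k}\sum_{i=1}^{k} y_{ij}$ is the mean of the j-th stratum. Here $S^2_{wst}$ is the variance among units within the same stratum and $\rho_{wst}$ is the correlation between the deviations from their stratum means of pairs of units in the same systematic sample. Then
$$V(\bar y_{sys}) = \frac{k-1}{nk}S^2_{wst}\left[1 + (n-1)\rho_{wst}\right].$$
We have,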
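$$V(\bar y_{sys}) = \frac{1}{k}\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = \frac{1}{k}\sum_{i=1}^{k}\left(\frac{1}{n}\sum_{j=1}^{n} y_{ij} - \frac{1}{n}\sum_{j=1}^{n}\bar y_{\cdot j}\right)^2 = \frac{1}{n^2k}\sum_{i=1}^{k}\left[\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot j})\right]^2$$
$$= \frac{1}{n^2k}\left[\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot j})^2 + \sum_{i=1}^{k}\sum_{j \ne u}(y_{ij} - \bar y_{\cdot j})(y_{iu} - \bar y_{\cdot u})\right]$$
$$= \frac{1}{n^2k}\left[n(k-1)S^2_{wst} + n(n-1)(k-1)\rho_{wst}S^2_{wst}\right] \quad \text{from (a) and (b)}$$
$$= \frac{k-1}{nk}S^2_{wst}\left[1 + (n-1)\rho_{wst}\right]. \quad \text{(Proved)}$$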
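The identity just proved can be checked numerically by laying the population out as a $k \times n$ array with $y_{ij}$ = unit j of sample i, so that column j is the j-th stratum. A minimal sketch, assuming integer values, $n \ge 2$, $k \ge 2$ and $S^2_{wst} > 0$; `sys_vs_strata` and the example values are hypothetical.

```python
from fractions import Fraction


def sys_vs_strata(population, k):
    """Return (V(ybar_sys), S_wst^2, rho_wst) in exact arithmetic.

    Assumes integer values, N = n*k, n >= 2, k >= 2 and S_wst^2 > 0.
    y[i][j] (0-based) is unit j of systematic sample i, i.e. population[j*k + i];
    column j is the j-th stratum of k consecutive units.
    """
    N = len(population)
    n = N // k
    y = [[Fraction(population[j * k + i]) for j in range(n)] for i in range(k)]
    col_means = [sum(y[i][j] for i in range(k)) / k for j in range(n)]  # ybar_.j
    d = [[y[i][j] - col_means[j] for j in range(n)] for i in range(k)]

    S2_wst = sum(d[i][j] ** 2 for i in range(k) for j in range(n)) / (n * (k - 1))  # (a)
    cross = sum(d[i][j] * d[i][u]
                for i in range(k) for j in range(n) for u in range(n) if j != u)
    rho_wst = cross / (n * (n - 1) * (k - 1) * S2_wst)  # (b)

    grand = Fraction(sum(population), N)  # ybar_..
    V_sys = sum((sum(y[i]) / n - grand) ** 2 for i in range(k)) / k
    return V_sys, S2_wst, rho_wst


pop = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 11, 12]  # hypothetical, N = 12, k = 4, n = 3
V_sys, S2_wst, rho_wst = sys_vs_strata(pop, 4)
# (k-1)/(nk) = 3/12 and n-1 = 2 for this example
assert V_sys == Fraction(3, 12) * S2_wst * (1 + 2 * rho_wst)
```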
Systematic Sampling Vs Stratified Random Sampling:
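We know that
$$V(\bar y_{st}) = \sum_{j=1}^{n}\left(\frac{1}{n_j} - \frac{1}{N_j}\right)p_j^2S_j^2.$$
Here $N_j = k$ and $n_j = 1$ $(j = 1, 2, \ldots, n)$, and $p_j = \frac{N_j}{N} = \frac{k}{nk} = \frac{1}{n}$, so
$$V(\bar y_{st}) = \sum_{j=1}^{n}\left(1 - \frac{1}{k}\right)\frac{1}{n^2}S_j^2 = \frac{k-1}{n^2k}\sum_{j=1}^{n}S_j^2.$$
Since $S_j^2 = \frac{1}{k-1}\sum_{i=1}^{k}(y_{ij} - \bar y_{\cdot j})^2$,
$$V(\bar y_{st}) = \frac{k-1}{n^2k}\sum_{j=1}^{n}\frac{1}{k-1}\sum_{i=1}^{k}(y_{ij} - \bar y_{\cdot j})^2 = \frac{1}{n^2k}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot j})^2 = \frac{k-1}{nk}S^2_{wst},$$
since $S^2_{wst} = \frac{1}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot j})^2$. Hence
$$E = \frac{V(\bar y_{st})}{V(\bar y_{sys})} = \frac{\frac{k-1}{nk}S^2_{wst}}{\frac{k-1}{nk}S^2_{wst}\left[1 + (n-1)\rho_{wst}\right]} = \frac{1}{1 + (n-1)\rho_{wst}}.$$
Thus the relative efficiency of systematic sampling over stratified random sampling depends on the value of $\rho_{wst}$: systematic sampling is more efficient when $\rho_{wst} < 0$, equally efficient when $\rho_{wst} = 0$, and less efficient when $\rho_{wst} > 0$.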
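The stratified comparison can be sketched the same way: compute $V(\bar y_{st})$ from the stratum variances $S_j^2$ with one unit drawn per stratum, then check $V(\bar y_{st}) = \frac{k-1}{nk}S^2_{wst}$ and $E = 1/[1 + (n-1)\rho_{wst}]$. This assumes integer values, $n \ge 2$, $k \ge 2$ and $S^2_{wst} > 0$; the function name and populations are hypothetical.

```python
from fractions import Fraction


def stratified_vs_systematic(population, k):
    """Return (V(ybar_st), V(ybar_sys), S_wst^2, rho_wst), exact arithmetic.

    n strata of k consecutive units; one unit drawn per stratum
    (n_j = 1, N_j = k, p_j = 1/n). Assumes n >= 2, k >= 2, S_wst^2 > 0.
    """
    N = len(population)
    n = N // k
    strata = [[Fraction(v) for v in population[j * k:(j + 1) * k]] for j in range(n)]
    means = [sum(s) / k for s in strata]

    # V(ybar_st) = sum_j (1/n_j - 1/N_j) p_j^2 S_j^2
    S2_j = [sum((v - m) ** 2 for v in s) / (k - 1) for s, m in zip(strata, means)]
    V_st = sum((1 - Fraction(1, k)) * Fraction(1, n * n) * s2 for s2 in S2_j)

    # Deviations from stratum means, d[i][j] for sample i and stratum j
    d = [[strata[j][i] - means[j] for j in range(n)] for i in range(k)]
    S2_wst = sum(x * x for row in d for x in row) / (n * (k - 1))
    cross = sum(row[j] * row[u] for row in d for j in range(n) for u in range(n) if j != u)
    rho_wst = cross / (n * (n - 1) * (k - 1) * S2_wst)

    grand = Fraction(sum(population), N)
    V_sys = sum((Fraction(sum(population[i::k]), n) - grand) ** 2 for i in range(k)) / k
    return V_st, V_sys, S2_wst, rho_wst


V_st, V_sys, S2_wst, rho_wst = stratified_vs_systematic(list(range(1, 13)), 4)
print(V_st, V_sys, rho_wst, V_st / V_sys)  # 5/12 5/4 1 1/3
```

For this linear-trend example $\rho_{wst} = 1$, so $E = 1/(1 + 2) = 1/3$ and stratified sampling is three times as efficient.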
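Example: Compare $V(\bar y_n)$, $V(\bar y_{st})$ and $V(\bar y_{sys})$ for a population with a linear trend.
Solution:
Suppose the population has the linear trend given by the model $Y_i = i$ $(i = 1, 2, \ldots, N)$. Then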
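$$\sum_{i=1}^{N}Y_i = \sum_{i=1}^{N}i = \frac{N(N+1)}{2} \quad\text{and}\quad \sum_{i=1}^{N}Y_i^2 = \sum_{i=1}^{N}i^2 = \frac{N(N+1)(2N+1)}{6},$$
so
$$\bar Y_N = \frac{1}{N}\sum_{i=1}^{N}Y_i = \frac{N+1}{2}$$
and
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y_N)^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N}Y_i^2 - N\bar Y_N^2\right] = \frac{1}{N-1}\left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = \frac{N(N+1)}{12}.$$
For simple random sampling without replacement,
$$V(\bar y_n)_R = \left(\frac{1}{n} - \frac{1}{N}\right)S^2 = \frac{N-n}{Nn}S^2 = \frac{n(k-1)}{n^2k}\cdot\frac{nk(nk+1)}{12} \quad (\text{since } N = nk)$$
$$= \frac{(k-1)(nk+1)}{12}. \quad (1)$$
For stratified random sampling, we know that
$$V(\bar y_{st}) = \frac{k-1}{n^2k}\sum_{j=1}^{n}S_j^2.$$
We have $S^2 = N(N+1)/12$ for a population of N consecutive integers. Since the j-th stratum consists of k consecutive integers, and adding a constant to every value does not change the variance, $S_j^2 = k(k+1)/12$ for every j, so
$$V(\bar y_{st}) = \frac{k-1}{n^2k}\cdot\frac{nk(k+1)}{12} = \frac{k^2-1}{12n}. \quad (2)$$
To find $V(\bar y_{sys})$, we have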
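$$\bar y_{i\cdot} = \text{mean of the values in the } i\text{th sample} = \frac{1}{n}\sum_{j=1}^{n}y_{ij} = \frac{1}{n}\left[i + (i+k) + (i+2k) + \cdots + \{i + (n-1)k\}\right]$$
$$= \frac{1}{n}\left[ni + \{1 + 2 + \cdots + (n-1)\}k\right] = i + \frac{(n-1)k}{2}. \quad (3)$$
Also
$$\bar y_{\cdot\cdot} = \bar Y_N = \frac{N+1}{2} = \frac{nk+1}{2}, \quad (4)$$
so
$$\bar y_{i\cdot} - \bar y_{\cdot\cdot} = i - \frac{k+1}{2}.$$
Hence
$$V(\bar y_{sys}) = \frac{1}{k}\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = \frac{1}{k}\sum_{i=1}^{k}\left(i - \frac{k+1}{2}\right)^2 = \frac{1}{k}\sum_{i=1}^{k}\left[i^2 + \left(\frac{k+1}{2}\right)^2 - 2i\cdot\frac{k+1}{2}\right]$$
$$= \frac{1}{k}\sum_{i=1}^{k}i^2 + \left(\frac{k+1}{2}\right)^2 - \frac{k+1}{k}\sum_{i=1}^{k}i = \frac{(k+1)(2k+1)}{6} + \frac{(k+1)^2}{4} - \frac{(k+1)^2}{2} = \frac{k^2-1}{12}. \quad (5)$$
From (1), (2) and (5), we get
$$V(\bar y_{st}) : V(\bar y_{sys}) : V(\bar y_n) = \frac{k+1}{n} : (k+1) : (nk+1) \approx \frac{1}{n} : 1 : n \quad (\text{approx., for large } k),$$
so that
$$V(\bar y_{st}) \le V(\bar y_{sys}) \le V(\bar y_n).$$
Thus if the population is suspected to have a linear trend, stratified random sampling is the most effective design (with systematic sampling the next best) in eliminating the effect of the linear trend.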
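The closed forms (1), (2) and (5) can be confirmed by brute force for the model $Y_i = i$. This sketch computes each variance directly from its definition in exact arithmetic and compares it with the formulas; it assumes $k \ge 2$ so that each $S_j^2$ is defined, and the function names and $(n, k)$ pairs are hypothetical.

```python
from fractions import Fraction


def sample_var(values):
    """Variance with divisor (len - 1), exact arithmetic."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def linear_trend_variances(n, k):
    """Exact V(ybar_st), V(ybar_sys), V(ybar_n) for Y_i = i, i = 1..N, N = n*k.

    Assumes n >= 1 and k >= 2 (so each stratum variance S_j^2 is defined).
    """
    N = n * k
    pop = list(range(1, N + 1))
    ybar = Fraction(N + 1, 2)

    V_srs = (Fraction(1, n) - Fraction(1, N)) * sample_var(pop)
    strata = [pop[j * k:(j + 1) * k] for j in range(n)]  # k consecutive units each
    V_st = Fraction(k - 1, n * n * k) * sum(sample_var(s) for s in strata)
    V_sys = sum((Fraction(sum(pop[i::k]), n) - ybar) ** 2 for i in range(k)) / k
    return V_st, V_sys, V_srs


for n, k in [(3, 4), (5, 10), (2, 7)]:
    V_st, V_sys, V_srs = linear_trend_variances(n, k)
    assert V_st == Fraction(k * k - 1, 12 * n)            # equation (2)
    assert V_sys == Fraction(k * k - 1, 12)               # equation (5)
    assert V_srs == Fraction((k - 1) * (n * k + 1), 12)   # equation (1)
    assert V_st <= V_sys <= V_srs
print(linear_trend_variances(3, 4))  # (Fraction(5, 12), Fraction(5, 4), Fraction(13, 4))
```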