Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter
which can take care of the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then the method of simple random sampling will yield a
homogeneous sample, and in turn, the sample mean will serve as a good estimator of the population
mean. Thus, if the population is homogeneous with respect to the characteristic under study, then the
sample drawn through simple random sampling is expected to provide a representative sample.
Moreover, the variance of the sample mean depends not only on the sample size and sampling fraction
but also on the population variance. In order to increase the precision of an estimator, we need to use a
sampling scheme which can reduce the heterogeneity in the population. If the population is
heterogeneous with respect to the characteristic under study, then one such sampling procedure is
stratified sampling.
Example: Suppose we want to find the average height of the students in a school with classes 1 to 12. The
height varies a lot, as the students in class 1 are around 6 years old and the students in class 10 are
around 16 years old. So one can divide all the students into different subpopulations or strata such as
Students of class 1, 2 and 3: Stratum 1
Students of class 4, 5 and 6: Stratum 2
Students of class 7, 8 and 9: Stratum 3
Students of class 10, 11 and 12: Stratum 4
Now draw the samples by SRS from each of the strata 1, 2, 3 and 4. All the drawn samples combined
together will constitute the final stratified sample for further analysis.
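The drawing step in the example can be sketched in Python. This is only a minimal illustration: the simulated heights, the stratum sizes of 300 students, the sample size of 10 per stratum and the helper name `draw_stratified_sample` are all assumptions made for the sketch, not part of the example above.

```python
import random

def draw_stratified_sample(strata, sizes, seed=0):
    """Draw an SRSWOR sample of sizes[name] units from every stratum."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population: simulated heights (cm) for the four groups of classes.
gen = random.Random(1)
strata = {
    "Stratum 1": [gen.gauss(120, 6) for _ in range(300)],  # classes 1, 2 and 3
    "Stratum 2": [gen.gauss(138, 7) for _ in range(300)],  # classes 4, 5 and 6
    "Stratum 3": [gen.gauss(155, 8) for _ in range(300)],  # classes 7, 8 and 9
    "Stratum 4": [gen.gauss(168, 8) for _ in range(300)],  # classes 10, 11 and 12
}
sizes = {name: 10 for name in strata}  # illustrative sample sizes

sample = draw_stratified_sample(strata, sizes)
# All drawn samples combined form the final stratified sample.
stratified_sample = [h for units in sample.values() for h in units]
```

The samples are kept per stratum in `sample` because the estimators used later weight each stratum separately.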
Sampling Theory| Chapter 4 | Stratified Sampling | Shalabh, IIT Kanpur
Notations:
We use the following symbols and notations:
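$N$ : Population size
$k$ : Number of strata
$N_i$ : Number of sampling units in the $i$th stratum, with $\sum_{i=1}^{k} N_i = N$
$n_i$ : Number of sampling units drawn from the $i$th stratum, with $\sum_{i=1}^{k} n_i = n$
[Figure: a population of $N$ units divided into strata $1, 2, \ldots, k$; samples of $n_1, n_2, \ldots, n_k$ units are drawn from the respective strata.]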
Strata are constructed such that they are non-overlapping and homogeneous with respect to the
characteristic under study, and such that $\sum_{i=1}^{k} N_i = N$.
Draw a sample of size $n_i$ from the $i$th ($i = 1, 2, \ldots, k$) stratum using SRS (preferably WOR),
independently from each stratum.
All the sampling units drawn from each stratum will constitute a stratified sample of size
$n = \sum_{i=1}^{k} n_i$.
In cluster sampling, the clusters are constructed such that they are
within heterogeneous and
among homogeneous.
[Note: We discuss the cluster sampling later.]
Note that there are $k$ independent samples drawn through SRS of sizes $n_1, n_2, \ldots, n_k$ from each of the
strata. So, one can have $k$ estimators of a parameter based on the sizes $n_1, n_2, \ldots, n_k$, respectively. Our
interest is not to have k different estimators of the parameters, but the ultimate goal is to have a single
estimator. In this case, an important issue is how to combine the different sample information together
into one estimator, which is good enough to provide information about the parameter.
We now consider the estimation of population mean and population variance from a stratified sample.
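$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$ : sample mean from the $i$th stratum
$\bar{Y} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} w_i \bar{Y}_i$ : population mean, where $w_i = \frac{N_i}{N}$.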
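One may consider the sample mean
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{y}_i$$
as a possible estimator of $\bar{Y}$. Since $E(\bar{y}_i) = \bar{Y}_i$ under SRS, $E(\bar{y}) = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{Y}_i$, which in general differs from $\bar{Y}$, so $\bar{y}$ is in general a biased estimator. This motivates looking for an
unbiased estimator of $\bar{Y}$. Consider the stratum mean which is defined as the weighted arithmetic mean
of strata sample means with strata sizes as weights given by
$$\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{y}_i.$$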
Now
$$E(\bar{y}_{st}) = \frac{1}{N} \sum_{i=1}^{k} N_i E(\bar{y}_i) = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i = \bar{Y}.$$
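Variance of $\bar{y}_{st}$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \, \mathrm{Var}(\bar{y}_i) + \sum_{i=1}^{k} \sum_{j(\neq i)=1}^{k} w_i w_j \, \mathrm{Cov}(\bar{y}_i, \bar{y}_j).$$
Since all the samples have been drawn independently from each of the strata by SRSWOR,
$$\mathrm{Cov}(\bar{y}_i, \bar{y}_j) = 0, \quad i \neq j,$$
$$\mathrm{Var}(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} S_i^2$$
where
$$S_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2.$$
Thus
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \, \frac{N_i - n_i}{N_i n_i} S_i^2 = \sum_{i=1}^{k} w_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{S_i^2}{n_i}.$$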
Observe that $\mathrm{Var}(\bar{y}_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the strata.
If $S_i^2$ is small for all $i = 1, 2, \ldots, k$, then $\mathrm{Var}(\bar{y}_{st})$ will also be small.
The total variation in the population is fixed and can be orthogonally partitioned into between and
within strata variations, i.e.,
Total variation = Between strata variation + Within strata variation ($S_i^2$).
Since the total is fixed, a small $S_i^2$ in every stratum means that the “Between strata variation” has to be large. That is why it was
mentioned earlier that the strata are to be constructed such that they are within homogeneous, i.e., $S_i^2$ is small.
Estimate of Variance
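Since the samples have been drawn by SRSWOR,
$$E(s_i^2) = S_i^2$$
where $s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
and $\widehat{\mathrm{Var}}(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} s_i^2$,
so
$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \, \widehat{\mathrm{Var}}(\bar{y}_i) = \sum_{i=1}^{k} w_i^2 \, \frac{N_i - n_i}{N_i n_i} s_i^2.$$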
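As an illustration of these formulas, the following sketch computes $\bar{y}_{st} = \sum w_i \bar{y}_i$ and the estimated variance $\sum w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2$ from per-stratum samples drawn by SRSWOR. The function name `stratified_estimate` and the small data set are made up for the illustration.

```python
from statistics import mean, variance

def stratified_estimate(samples, N_i):
    """Return (y_st, estimated Var(y_st)) for SRSWOR within each stratum.

    samples : list of per-stratum observation lists (each of length n_i >= 2)
    N_i     : list of stratum sizes
    """
    N = sum(N_i)
    y_st = 0.0
    var_hat = 0.0
    for y, Ni in zip(samples, N_i):
        ni = len(y)
        wi = Ni / N
        y_st += wi * mean(y)
        # statistics.variance uses the divisor n_i - 1, i.e. it returns s_i^2
        var_hat += wi ** 2 * (Ni - ni) / (Ni * ni) * variance(y)
    return y_st, var_hat

# Hypothetical data: two strata of sizes 30 and 70.
samples = [[10.0, 12.0, 14.0], [20.0, 22.0, 24.0, 26.0]]
N_i = [30, 70]
y_st, var_hat = stratified_estimate(samples, N_i)
```

For these numbers the stratum means are 12 and 23 with weights 0.3 and 0.7, so `y_st` is 19.7 and `var_hat` is 0.878.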
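Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then in this
case
$$\bar{y}_{st} = \sum_{i=1}^{k} w_i \bar{y}_i,$$
$$E(\bar{y}_{st}) = \bar{Y},$$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \, \frac{N_i - 1}{N_i n_i} S_i^2 = \sum_{i=1}^{k} \frac{w_i^2 \sigma_i^2}{n_i},$$
$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{w_i^2 s_i^2}{n_i},$$
where $\sigma_i^2 = \frac{1}{N_i} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2$.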
Note: The sample size cannot be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size, whereas variability is
inversely proportional to the sample size.
Based on different ideas, some allocation procedures are as follows:
1. Equal allocation
Choose the sample size ni to be the same for all the strata.
Draw samples of equal size from each stratum.
Let $n$ be the sample size and $k$ be the number of strata, then
$n_i = \frac{n}{k}$ for all $i = 1, 2, \ldots, k$.
2. Proportional allocation
For fixed $k$, select $n_i$ such that it is proportional to the stratum size $N_i$, i.e.,
$n_i \propto N_i$
or $n_i = C N_i$
where $C$ is the constant of proportionality. Then
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C N_i$$
or $n = CN$
or $C = \frac{n}{N}$.
Thus $n_i = \frac{n}{N} N_i$.
Such allocation arises from considerations like operational convenience.
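A small sketch of the equal allocation $n_i = n/k$ and the proportional allocation $n_i = \frac{n}{N} N_i$ follows. The function names and the numbers are illustrative assumptions; the returned values are not rounded, and how to round them to integers is a practical choice that the formulas themselves do not fix.

```python
def equal_allocation(n, k):
    """n_i = n / k for every stratum."""
    return [n / k] * k

def proportional_allocation(n, N_i):
    """n_i = (n / N) * N_i, with N the sum of the stratum sizes."""
    N = sum(N_i)
    return [n * Ni / N for Ni in N_i]

# Hypothetical strata sizes and total sample size.
N_i = [200, 300, 500]
n_equal = equal_allocation(100, 4)
n_prop = proportional_allocation(100, N_i)
```

With these inputs `n_equal` is 25 for each of four strata and `n_prop` is 20, 30 and 50.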
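With $n_i = C^{*} N_i S_i$, where $C^{*}$ is the constant of proportionality,
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C^{*} N_i S_i$$
or $n = C^{*} \sum_{i=1}^{k} N_i S_i$
or $C^{*} = \dfrac{n}{\sum_{i=1}^{k} N_i S_i}$.
Thus $n_i = \dfrac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$.
This allocation arises when $\mathrm{Var}(\bar{y}_{st})$ is minimized subject to the constraint that $\sum_{i=1}^{k} n_i$ is prespecified.
There are some limitations to the optimum allocation. The knowledge of $S_i$ ($i = 1, 2, \ldots, k$) is needed to
know $n_i$. If there is more than one characteristic under study, the characteristics may lead to conflicting allocations.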
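The allocation $n_i = n N_i S_i / \sum N_i S_i$ can be sketched as below. The function name and the inputs are assumptions for illustration; in practice the $S_i$ are unknown and would have to be replaced by guesses or pilot estimates, which is the limitation noted above.

```python
def optimum_allocation(n, N_i, S_i):
    """n_i = n * N_i * S_i / sum(N_i * S_i) for a prespecified total n."""
    total = sum(Ni * Si for Ni, Si in zip(N_i, S_i))
    return [n * Ni * Si / total for Ni, Si in zip(N_i, S_i)]

# Hypothetical example: equal stratum sizes, unequal stratum standard deviations.
n_opt = optimum_allocation(40, [100, 100], [1.0, 3.0])
```

Here the more variable stratum receives three times the sample of the other one (10 and 30 units), although both strata have the same size.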
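The cost function considered is $C = C_0 + \sum_{i=1}^{k} C_i n_i$
where
$C$ : total cost
$C_0$ : overhead cost, e.g., setting up the office, training people, etc.
$C_i$ : cost per unit in the $i$th stratum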
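To find $n_i$ under this cost function, consider the Lagrangian function with a Lagrangian
multiplier $\lambda$ as
$$\phi = \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} + \lambda^2 \sum_{i=1}^{k} C_i n_i - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} = \sum_{i=1}^{k} \left[ \frac{w_i S_i}{\sqrt{n_i}} - \lambda \sqrt{C_i n_i} \right]^2 + \text{terms independent of } n_i.$$
The squared terms vanish, and $\phi$ is minimized, when $n_i = \dfrac{1}{\lambda} \dfrac{w_i S_i}{\sqrt{C_i}}$.
How to determine $\lambda$?
There are two ways to determine $\lambda$.
(i) Minimize variability for a fixed cost.
(ii) Minimize cost for given variability.
We consider both cases.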
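For a fixed cost, write $\sum_{i=1}^{k} C_i n_i = C_0^{*}$. With $n_i = \dfrac{1}{\lambda} \dfrac{w_i S_i}{\sqrt{C_i}}$ this gives
$$\sum_{i=1}^{k} \frac{\sqrt{C_i}\, w_i S_i}{\lambda} = C_0^{*}$$
or $\lambda = \dfrac{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i}{C_0^{*}}$.
Substituting $\lambda$ in the expression for $n_i = \dfrac{1}{\lambda} \dfrac{w_i S_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as
$$n_i^{*} = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{C_0^{*}}{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i}.$$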
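For a prespecified variance $V_0$, the optimum $n_i$ is
$$n_i = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{\sum_{i=1}^{k} w_i S_i \sqrt{C_i}}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}}.$$
So the required sample size to estimate $\bar{Y}$ such that the cost $C$ is minimum for a
prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.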
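The two cost-based allocations can be checked numerically. The sketch below assumes the formulas $n_i^{*} = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{C_0^{*}}{\sum \sqrt{C_i} w_i S_i}$ for a fixed sampling cost $C_0^{*} = \sum C_i n_i$, and $n_i = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{\sum w_i S_i \sqrt{C_i}}{V_0 + \sum w_i^2 S_i^2 / N_i}$ for a fixed variance $V_0$. The function names and all numbers are made up for the illustration.

```python
from math import sqrt

def allocation_fixed_cost(budget, w, S, C):
    """Optimum n_i when the sampling cost sum(C_i * n_i) is fixed at `budget`."""
    denom = sum(sqrt(Ci) * wi * Si for wi, Si, Ci in zip(w, S, C))
    return [wi * Si / sqrt(Ci) * budget / denom for wi, Si, Ci in zip(w, S, C)]

def allocation_fixed_variance(V0, w, S, C, N_i):
    """Optimum n_i when Var(y_st) is fixed at V0."""
    num = sum(wi * Si * sqrt(Ci) for wi, Si, Ci in zip(w, S, C))
    denom = V0 + sum(wi ** 2 * Si ** 2 / Ni for wi, Si, Ni in zip(w, S, N_i))
    return [wi * Si / sqrt(Ci) * num / denom for wi, Si, Ci in zip(w, S, C)]

# Hypothetical two-stratum population.
w = [0.4, 0.6]        # stratum weights N_i / N
S = [2.0, 3.0]        # stratum standard deviations
C = [1.0, 4.0]        # cost per unit in each stratum
N_i = [400, 600]

n_cost = allocation_fixed_cost(100.0, w, S, C)
n_var = allocation_fixed_variance(0.05, w, S, C, N_i)

# Consistency checks: the first allocation spends exactly the budget,
# the second attains exactly the prespecified variance.
spent = sum(Ci * ni for Ci, ni in zip(C, n_cost))
achieved = sum(wi ** 2 * Si ** 2 * (1 / ni - 1 / Ni)
               for wi, Si, ni, Ni in zip(w, S, n_var, N_i))
```

Both allocations are proportional to $w_i S_i / \sqrt{C_i}$; only the scaling differs between the two cases.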
Sample size under proportional allocation for fixed cost and for fixed variance
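(i) If the cost $C = C_0$ is fixed, then $C_0 = \sum_{i=1}^{k} C_i n_i$.
Under proportional allocation, $n_i = \frac{n}{N} N_i = n w_i$.
So $C_0 = n \sum_{i=1}^{k} w_i C_i$ or $n = \dfrac{C_0}{\sum_{i=1}^{k} w_i C_i}$. Thus $n_i = \dfrac{C_0 w_i}{\sum_{i=1}^{k} w_i C_i}$.
The required sample size to estimate $\bar{Y}$ in this case is $n = \sum_{i=1}^{k} n_i$.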
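(ii) If the variance is fixed at $V_0$, then $\sum_{i=1}^{k} w_i^2 S_i^2 \left( \frac{1}{n_i} - \frac{1}{N_i} \right) = V_0$. Under proportional allocation, $n_i = n w_i$, so
$$\frac{1}{n} \sum_{i=1}^{k} w_i S_i^2 = V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}$$
or $n = \dfrac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}}$
or $n_i = w_i \, \dfrac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}}$.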
This is known as Bowley’s allocation.
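Under proportional allocation, $n_i = \frac{n}{N} N_i$, so
$$\mathrm{Var}_{prop}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - \frac{n}{N} N_i}{N_i \cdot \frac{n}{N} N_i} \left( \frac{N_i}{N} \right)^2 S_i^2 = \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i S_i^2}{N} = \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2.$$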
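Under optimum allocation, $n_i = \dfrac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$, so
$$\begin{aligned}
V_{opt}(\bar{y}_{st}) &= \sum_{i=1}^{k} \left( \frac{1}{n_i} - \frac{1}{N_i} \right) w_i^2 S_i^2 \\
&= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ w_i^2 S_i^2 \, \frac{\sum_{j=1}^{k} N_j S_j}{n N_i S_i} \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ \frac{1}{n} \cdot \frac{N_i S_i}{N^2} \sum_{j=1}^{k} N_j S_j \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n} \left( \sum_{i=1}^{k} \frac{N_i S_i}{N} \right)^2 - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 .
\end{aligned}$$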
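In order to compare $\mathrm{Var}_{SRS}(\bar{y})$ and $V_{prop}(\bar{y}_{st})$, first we attempt to express $S^2$ as a function of $S_i^2$.
Consider
$$\begin{aligned}
(N - 1) S^2 &= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y})^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 .
\end{aligned}$$
Assume that the strata are large enough that $\dfrac{N_i - 1}{N_i} \approx 1$ and $\dfrac{N - 1}{N} \approx 1$.
Thus
$$S^2 \approx \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2$$
or
$$\frac{N - n}{Nn} S^2 \approx \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2 \quad \left(\text{premultiply by } \frac{N - n}{Nn} \text{ on both sides}\right)$$
or
$$\mathrm{Var}_{SRS}(\bar{y}) \approx V_{prop}(\bar{y}_{st}) + \frac{N - n}{Nn} \sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 .$$
Since $\sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 \geq 0$, it follows that $\mathrm{Var}_{prop}(\bar{y}_{st}) \leq \mathrm{Var}_{SRS}(\bar{y})$.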
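Consider
$$\begin{aligned}
V_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) &= \left[ \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 \right] - \left[ \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 \right] \\
&= \frac{1}{n} \left[ \sum_{i=1}^{k} w_i S_i^2 - \left( \sum_{i=1}^{k} w_i S_i \right)^2 \right] \\
&= \frac{1}{n} \sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n} \bar{S}^2 \\
&= \frac{1}{n} \sum_{i=1}^{k} w_i (S_i - \bar{S})^2
\end{aligned}$$
where $\bar{S} = \sum_{i=1}^{k} w_i S_i$, and the larger gain in efficiency is achieved when the $S_i$ differ more from $\bar{S}$.
Combining the results in (a) and (b), we have $\mathrm{Var}_{opt}(\bar{y}_{st}) \leq \mathrm{Var}_{prop}(\bar{y}_{st}) \leq \mathrm{Var}_{SRS}(\bar{y})$.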
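The ordering of the three variances can be illustrated numerically. The sketch below uses the large-stratum approximation $S^2 \approx \sum w_i S_i^2 + \sum w_i (\bar{Y}_i - \bar{Y})^2$ for the SRS variance, so it is only an illustration under that assumption; the function name and the population values are hypothetical.

```python
def compare_variances(n, N, w, S, Ybar):
    """Return (Var_SRS, Var_prop, Var_opt) under the large-stratum approximation."""
    Y = sum(wi * yi for wi, yi in zip(w, Ybar))                  # population mean
    within = sum(wi * si ** 2 for wi, si in zip(w, S))           # sum w_i S_i^2
    between = sum(wi * (yi - Y) ** 2 for wi, yi in zip(w, Ybar))
    fpc = (N - n) / (N * n)
    v_srs = fpc * (within + between)
    v_prop = fpc * within
    v_opt = sum(wi * si for wi, si in zip(w, S)) ** 2 / n - within / N
    return v_srs, v_prop, v_opt

# Hypothetical population with three strata.
w = [0.2, 0.3, 0.5]
S = [2.0, 4.0, 6.0]
Ybar = [10.0, 20.0, 30.0]
v_srs, v_prop, v_opt = compare_variances(100, 10000, w, S, Ybar)

# Gain of optimum over proportional allocation: (1/n) * sum w_i (S_i - S_bar)^2
S_bar = sum(wi * si for wi, si in zip(w, S))
gain = sum(wi * (si - S_bar) ** 2 for wi, si in zip(w, S)) / 100
```

The difference `v_prop - v_opt` agrees with `gain`, and the difference `v_srs - v_prop` comes entirely from the variation between the stratum means.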
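$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 .$$
In stratified sampling,
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \, \frac{N_i - n_i}{N_i n_i} S_i^2 = \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} .$$
The second term in this expression represents the reduction due to finite population correction.
The confidence limits of $\bar{Y}$ can be obtained as
$$\bar{y}_{st} \pm t \sqrt{\widehat{\mathrm{Var}}(\bar{y}_{st})}$$
assuming $\bar{y}_{st}$ is normally distributed and $\widehat{\mathrm{Var}}(\bar{y}_{st})$ is well determined so that $t$ can be read from
normal distribution tables. If only a few degrees of freedom are provided by each stratum, then $t$ values
are obtained from the table of Student's $t$-distribution.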
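The distribution of $\widehat{\mathrm{Var}}(\bar{y}_{st})$ is generally complex. An approximate method of assigning an effective
number of degrees of freedom $(n_e)$ to $\widehat{\mathrm{Var}}(\bar{y}_{st})$ is
$$n_e = \frac{\left( \sum_{i=1}^{k} g_i s_i^2 \right)^2}{\sum_{i=1}^{k} \dfrac{g_i^2 s_i^4}{n_i - 1}}$$
where $g_i = \dfrac{N_i (N_i - n_i)}{n_i}$ and $\min(n_i - 1) \leq n_e \leq \sum_{i=1}^{k} (n_i - 1)$, assuming $y_{ij}$ are normally distributed.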
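A sketch of these two computations follows: the effective degrees of freedom $n_e$ and normal-theory confidence limits $\bar{y}_{st} \pm t \sqrt{\widehat{\mathrm{Var}}(\bar{y}_{st})}$. The function names and the inputs are hypothetical, and the normal quantile is used for $t$; with few degrees of freedom one would instead look up Student's $t$ with about $n_e$ degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

def effective_df(N_i, n_i, s2_i):
    """Effective degrees of freedom n_e, with g_i = N_i (N_i - n_i) / n_i."""
    g = [Ni * (Ni - ni) / ni for Ni, ni in zip(N_i, n_i)]
    num = sum(gi * s2 for gi, s2 in zip(g, s2_i)) ** 2
    den = sum(gi ** 2 * s2 ** 2 / (ni - 1) for gi, s2, ni in zip(g, s2_i, n_i))
    return num / den

def normal_confidence_limits(y_st, var_hat, level=0.95):
    """y_st +/- t * sqrt(var_hat), with t read from the standard normal."""
    t = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = t * sqrt(var_hat)
    return y_st - half_width, y_st + half_width

# Hypothetical strata sizes, sample sizes and sample variances.
N_i = [200, 300, 500]
n_i = [10, 15, 25]
s2_i = [4.0, 9.0, 16.0]
n_e = effective_df(N_i, n_i, s2_i)
lower, upper = normal_confidence_limits(19.7, 0.878)
```

The value of `n_e` always lies between the smallest $n_i - 1$ and $\sum (n_i - 1)$, as stated above.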
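$n_1 = N_1$
and
$$n_i = \frac{(n - N_1)\, w_i S_i}{\sum_{i=2}^{k} w_i S_i}; \quad i = 2, 3, \ldots, k.$$
Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be
$n_1 = N_1$
$n_2 = N_2$
$$n_i = \frac{(n - N_1 - N_2)\, w_i S_i}{\sum_{i=3}^{k} w_i S_i}; \quad i = 3, 4, \ldots, k.$$
In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
$$\text{Min } \mathrm{Var}(\bar{y}_{st}) = \frac{\left( \sum^{*} w_i S_i \right)^2}{n^{*}} - \frac{\sum^{*} w_i S_i^2}{N}$$
where $\sum^{*}$ denotes the summation over the strata in which $n_i \leq N_i$ and $n^{*}$ is the revised total sample size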
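The revision procedure can be sketched as a loop: compute the optimum allocation, set $n_i = N_i$ in every stratum where the allocation exceeds $N_i$, and reallocate the remaining sample among the other strata. The function name is hypothetical, and the sketch fixes all over-allocated strata found in a round at once rather than one at a time; since fixing a stratum only increases the allocation of the remaining strata, this should lead to the same final allocation.

```python
def revised_optimum_allocation(n, N_i, S_i):
    """Optimum allocation with n_i set to N_i wherever the formula gives n_i > N_i."""
    k = len(N_i)
    alloc = [None] * k
    free = set(range(k))   # strata whose allocation is still to be determined
    remaining = n          # sample size left for the free strata
    while True:
        total = sum(N_i[i] * S_i[i] for i in free)
        trial = {i: remaining * N_i[i] * S_i[i] / total for i in free}
        over = [i for i in free if trial[i] > N_i[i]]
        if not over:
            for i in free:
                alloc[i] = trial[i]
            return alloc
        for i in over:     # take the whole stratum and revise the rest
            alloc[i] = N_i[i]
            remaining -= N_i[i]
            free.remove(i)

# Hypothetical example: a small but highly variable first stratum.
n_revised = revised_optimum_allocation(60, [10, 100, 100], [50.0, 1.0, 1.0])
```

In this example the unrevised formula would assign about 43 units to a stratum with only 10 units, so that stratum is taken completely and the remaining 50 units are split equally between the other two strata.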
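where $Q_i = 1 - P_i$.
Also $\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \dfrac{N_i - n_i}{N_i n_i} w_i^2 S_i^2$.
So, with $S_i^2 = \dfrac{N_i}{N_i - 1} P_i Q_i$,
$$\mathrm{Var}(p_{st}) = \frac{1}{N^2} \sum_{i=1}^{k} \frac{N_i^2 (N_i - n_i)}{N_i - 1} \cdot \frac{P_i Q_i}{n_i} .$$
Under proportional allocation,
$$\mathrm{Var}_{prop}(p_{st}) = \frac{N - n}{Nn} \cdot \frac{1}{N} \sum_{i=1}^{k} \frac{N_i^2 P_i Q_i}{N_i - 1} \approx \frac{N - n}{Nn} \sum_{i=1}^{k} w_i P_i Q_i .$$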
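The best choice of $n_i$ such that it minimizes the variance for fixed total sample size is
$$n_i \propto N_i \sqrt{\frac{N_i P_i Q_i}{N_i - 1}} \approx N_i \sqrt{P_i Q_i} .$$
Thus $n_i = n \, \dfrac{N_i \sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i \sqrt{P_i Q_i}}$.
Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_i n_i$ is
$$n_i = \frac{n N_i \sqrt{\dfrac{P_i Q_i}{C_i}}}{\sum_{i=1}^{k} N_i \sqrt{\dfrac{P_i Q_i}{C_i}}} .$$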
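For proportions, the variance $\mathrm{Var}(p_{st}) = \frac{1}{N^2} \sum \frac{N_i^2 (N_i - n_i)}{N_i - 1} \cdot \frac{P_i Q_i}{n_i}$ and the allocation $n_i \propto N_i \sqrt{P_i Q_i / C_i}$ can be sketched as follows. The function names are hypothetical, and setting all $C_i = 1$ is used here as a convenient way to obtain the fixed-sample-size allocation.

```python
from math import sqrt

def var_p_st(N_i, n_i, P_i):
    """Var(p_st) for SRSWOR within strata, with Q_i = 1 - P_i."""
    N = sum(N_i)
    return sum(Ni ** 2 * (Ni - ni) / (Ni - 1) * P * (1 - P) / ni
               for Ni, ni, P in zip(N_i, n_i, P_i)) / N ** 2

def allocation_for_proportions(n, N_i, P_i, C_i=None):
    """n_i = n * N_i sqrt(P_i Q_i / C_i) / sum(N_i sqrt(P_i Q_i / C_i))."""
    if C_i is None:
        C_i = [1.0] * len(N_i)   # equal unit costs: fixed total sample size case
    score = [Ni * sqrt(P * (1 - P) / Ci) for Ni, P, Ci in zip(N_i, P_i, C_i)]
    total = sum(score)
    return [n * s / total for s in score]

# Hypothetical strata with equal proportions: the allocation becomes proportional.
n_alloc = allocation_for_proportions(40, [100, 300], [0.5, 0.5])
v = var_p_st([100, 300], n_alloc, [0.5, 0.5])
```

When all $P_i$ are equal, the allocation reduces to the proportional one, as the first formula suggests.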
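$$\begin{aligned}
(N - 1) S^2 &= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y})^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[ \sum_{i=1}^{k} w_i \bar{Y}_i^2 - \bar{Y}^2 \right] .
\end{aligned}$$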
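In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$. We consider their estimation one by
one.
$E(s_i^2) = S_i^2$
So $\hat{S}_i^2 = s_i^2$.
$\mathrm{Var}(\bar{y}_i) = E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 = E(\bar{y}_i^2) - \bar{Y}_i^2$
or $\bar{Y}_i^2 = E(\bar{y}_i^2) - \mathrm{Var}(\bar{y}_i)$.
An unbiased estimate of $\bar{Y}_i^2$ is $\bar{y}_i^2 - \widehat{\mathrm{Var}}(\bar{y}_i) = \bar{y}_i^2 - \dfrac{N_i - n_i}{N_i n_i} s_i^2$. By the same argument, an unbiased estimate of $\bar{Y}^2$ is $\bar{y}_{st}^2 - \widehat{\mathrm{Var}}(\bar{y}_{st}) = \bar{y}_{st}^2 - \sum_{i=1}^{k} w_i^2 \dfrac{N_i - n_i}{N_i n_i} s_i^2$.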
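Substituting these estimates in the expression for $(N - 1) S^2$, namely
$$(N - 1) S^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[ \sum_{i=1}^{k} w_i \bar{Y}_i^2 - \bar{Y}^2 \right],$$
the estimate of $S^2$ is obtained as
$$\begin{aligned}
\hat{S}^2 &= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) \hat{S}_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i \widehat{\bar{Y}_i^2} - \widehat{\bar{Y}^2} \right] \\
&= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i \left( \bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i} s_i^2 \right) - \left( \bar{y}_{st}^2 - \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2 \right) \right] \\
&= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i (1 - w_i) \frac{N_i - n_i}{N_i n_i} s_i^2 \right] .
\end{aligned}$$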
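Thus
$$\begin{aligned}
\widehat{\mathrm{Var}}_{SRS}(\bar{y}) &= \frac{N - n}{Nn} \hat{S}^2 \\
&= \frac{N - n}{N (N - 1) n} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N (N - n)}{n N (N - 1)} \left[ \sum_{i=1}^{k} w_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i (1 - w_i) \frac{N_i - n_i}{N_i n_i} s_i^2 \right]
\end{aligned}$$
and
$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 s_i^2 .$$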
If any other particular allocation is used, then substituting the appropriate ni under that allocation,
The subsamples need not necessarily be independent. The assumption of independent subsamples
helps in obtaining an unbiased estimate of the variance of the composite estimator. This is even helpful
if the sample design is complicated and the expression for variance of the composite estimator is
complex.
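Note that
$$\begin{aligned}
E\left[ \widehat{\mathrm{Var}}(\bar{t}) \right] &= \frac{1}{g (g - 1)} E\left[ \sum_{j=1}^{g} (t_j - \theta)^2 - g (\bar{t} - \theta)^2 \right] \\
&= \frac{1}{g (g - 1)} \left[ \sum_{j=1}^{g} \mathrm{Var}(t_j) - g \, \mathrm{Var}(\bar{t}) \right] \\
&= \frac{1}{g (g - 1)} (g^2 - g) \, \mathrm{Var}(\bar{t}) = \mathrm{Var}(\bar{t}) .
\end{aligned}$$
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then the confidence interval of $\theta$ can be
obtained by
$$P\left[ \mathrm{Min}(t_1, t_2, \ldots, t_g) < \theta < \mathrm{Max}(t_1, t_2, \ldots, t_g) \right] = 1 - \left( \frac{1}{2} \right)^{g - 1} .$$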
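A small sketch of how $g$ subsample estimates might be combined follows. It assumes the variance estimator has the form $\widehat{\mathrm{Var}}(\bar{t}) = \frac{1}{g(g-1)} \sum_{j=1}^{g} (t_j - \bar{t})^2$, which is the form implied by the expectation computed above; the function name and the four estimates are made up for the illustration.

```python
from statistics import mean, variance

def combine_subsample_estimates(t):
    """Combine g independent subsample estimates t_1, ..., t_g of a parameter."""
    g = len(t)
    t_bar = mean(t)
    # variance(t) = sum (t_j - t_bar)^2 / (g - 1); dividing by g gives Var-hat(t_bar)
    var_hat = variance(t) / g
    interval = (min(t), max(t))
    # Coverage of (min, max) when each t_j is symmetric about the parameter
    coverage = 1 - 0.5 ** (g - 1)
    return t_bar, var_hat, interval, coverage

# Hypothetical estimates from g = 4 independent subsamples.
t_bar, var_hat, interval, coverage = combine_subsample_estimates([10.0, 12.0, 11.0, 13.0])
```

With four subsamples the interval from the smallest to the largest estimate has confidence coefficient 0.875; each additional subsample halves the remaining probability.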
Let $\hat{Y}_{ij(tot)}$ be an unbiased estimator of the total of the $j$th stratum based on the $i$th subsample,
$i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, k$.
Note: This topic is to be read after the next module on ratio method of estimation. Since it is related to
the stratification, so it is given here.
In post-stratification,
draw a sample by simple random sampling from the population and carry out the survey.
After the completion of the survey, stratify the sampling units to increase the precision of the
estimates.
Assume that the stratum size $N_i$ is fairly accurately known. Let $m_i$ be the number of sampled units that fall in the $i$th stratum, so that
$\sum_{i=1}^{k} m_i = n$.
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier).
Assume $n$ is large enough or the stratification is such that the probability that some $m_i = 0$ is negligibly
small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the sample size
non-zero before evaluating the final estimates.
To find $E\left( \dfrac{1}{m_i} - \dfrac{1}{N_i} \right)$, proceed as follows:
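Consider the estimate of ratio based on ratio method of estimation as
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \qquad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} .$$
We know that
$$E(\hat{R}) - R \approx \frac{N - n}{Nn} \cdot \frac{R S_X^2 - S_{XY}}{\bar{X}^2} .$$
Let
$$x_j = \begin{cases} 1 & \text{if the } j\text{th unit belongs to the } i\text{th stratum} \\ 0 & \text{otherwise} \end{cases}$$
and
$y_j = 1$ for all $j = 1, 2, \ldots, N$.
Then
$$\hat{R} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{n_i}, \qquad R = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} = \frac{N}{N_i},$$
$$S_X^2 = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j^2 - N \bar{X}^2 \right] = \frac{1}{N - 1} \left[ N_i - N \frac{N_i^2}{N^2} \right] = \frac{1}{N - 1} \left[ N_i - \frac{N_i^2}{N} \right],$$
$$S_{XY} = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j Y_j - N \bar{X} \bar{Y} \right] = \frac{1}{N - 1} \left[ N_i - N \cdot \frac{N_i}{N} \right] = 0 .$$
Hence
$$E(\hat{R}) - R = E\left( \frac{n}{n_i} \right) - \frac{N}{N_i} \approx \frac{N (N - n)(N - N_i)}{n N_i^2 (N - 1)} .$$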
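Thus
$$\begin{aligned}
E\left( \frac{1}{n_i} \right) - \frac{1}{N_i} &\approx \frac{N}{n N_i} + \frac{N (N - n)(N - N_i)}{n^2 N_i^2 (N - 1)} - \frac{1}{N_i} \\
&\approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right] .
\end{aligned}$$
Writing $m_i$ in place of $n_i$,
$$E\left( \frac{1}{m_i} - \frac{1}{N_i} \right) \approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right] .$$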
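Now substitute this in the expression of $\mathrm{Var}(\bar{y}_{post})$ as
$$\begin{aligned}
\mathrm{Var}(\bar{y}_{post}) &= \sum_{i=1}^{k} w_i^2 \, E\left( \frac{1}{m_i} - \frac{1}{N_i} \right) S_i^2 \\
&\approx \sum_{i=1}^{k} w_i^2 S_i^2 \cdot \frac{N - n}{(N - 1) n} \cdot \frac{N}{N_i} \left[ 1 + \frac{N}{n N_i} - \frac{1}{n} \right] \\
&= \frac{N - n}{n (N - 1)} \sum_{i=1}^{k} w_i^2 S_i^2 \cdot \frac{1}{w_i} \left[ 1 + \frac{1}{n w_i} - \frac{1}{n} \right] \\
&= \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} w_i S_i^2 \left[ n + \frac{1}{w_i} - 1 \right] \\
&= \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} (n w_i + 1 - w_i) S_i^2 \\
&= \frac{N - n}{n (N - 1)} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} (1 - w_i) S_i^2 .
\end{aligned}$$
Assuming $N - 1 \approx N$,
$$\begin{aligned}
V(\bar{y}_{post}) &= \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 N} \sum_{i=1}^{k} (1 - w_i) S_i^2 \\
&= V_{prop}(\bar{y}_{st}) + \frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_i^2 .
\end{aligned}$$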
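The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately
distributed.
If $S_i^2 = S_w^2$ for all $i$, the second term becomes
$$\begin{aligned}
\frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_w^2 &= \frac{N - n}{N n^2} S_w^2 (k - 1) \quad \left(\text{since } \sum_{i=1}^{k} w_i = 1\right) \\
&= \frac{k - 1}{n} \cdot \frac{N - n}{Nn} S_w^2 \\
&= \frac{k - 1}{n} \mathrm{Var}(\bar{y}_{st}) .
\end{aligned}$$
The increase in the variance over $\mathrm{Var}_{prop}(\bar{y}_{st})$ is small if the average sample size $\bar{n} = \dfrac{n}{k}$ per stratum is
reasonably large.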
Thus a post-stratification with a large sample produces an estimator which is almost as precise as an
estimator in the stratified sampling with proportional allocation.
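The size of the extra term can be illustrated with a short sketch that evaluates $V(\bar{y}_{post}) = V_{prop}(\bar{y}_{st}) + \frac{N - n}{N n^2} \sum (1 - w_i) S_i^2$ under the approximation $N - 1 \approx N$. The function name and the numbers are hypothetical.

```python
def post_stratification_variance(n, N, w, S2):
    """Return (V_prop, extra) so that V(y_post) is approximately V_prop + extra."""
    fpc = (N - n) / (N * n)
    v_prop = fpc * sum(wi * s2 for wi, s2 in zip(w, S2))
    # Contribution of the m_i not being proportionately distributed
    extra = fpc / n * sum((1 - wi) * s2 for wi, s2 in zip(w, S2))
    return v_prop, extra

# Hypothetical population: three strata with a common within-stratum variance.
w = [0.2, 0.3, 0.5]
S2 = [4.0, 4.0, 4.0]
v_prop, extra = post_stratification_variance(50, 1000, w, S2)
relative_increase = extra / v_prop
```

With equal $S_i^2$ the relative increase equals $(k - 1)/n$, here $2/50 = 0.04$, which shows how quickly the penalty for post-stratification shrinks as the sample size grows.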