Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter
which can take care of the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then the method of simple random sampling will yield a
homogeneous sample and in turn, the sample mean will serve as a good estimator of population mean.
Thus, if the population is homogeneous with respect to the characteristic under study, then the sample
drawn through simple random sampling is expected to provide a representative sample. Moreover, the
variance of sample mean not only depends on the sample size and sampling fraction but also on the
population variance. In order to increase the precision of an estimator, we need to use a sampling
scheme which reduces the heterogeneity in the population. If the population is heterogeneous with
respect to the characteristic under study, then one such sampling procedure is stratified sampling.
Example: Suppose we want to find the average height of the students in a school with classes 1 to 12.
The height varies a lot, as the students in class 1 are around 6 years old and the students in class 10
are around 16 years old. So one can divide all the students into different subpopulations or strata,
for instance by class.
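[Figure: a population of N units is divided into k strata of $N_1, N_2, \ldots, N_k$ units; samples of $n_1, n_2, \ldots, n_k$ units are drawn from the respective strata.]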
Strata are constructed such that they are non-overlapping and homogeneous with respect to the
characteristic under study, such that $\sum_{i=1}^{k} N_i = N$.
Draw a sample of size $n_i$ from the ith ($i = 1, 2, \ldots, k$) stratum using SRS (preferably WOR),
independently from each stratum.
All the sampling units drawn from each stratum will constitute a stratified sample of size
$n = \sum_{i=1}^{k} n_i$.
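The drawing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `draw_stratified_sample`, the toy heights, and the chosen sample sizes are assumptions made for the example, not part of the notes.

```python
import random

def draw_stratified_sample(strata, sizes, seed=None):
    """Draw sizes[i] units from strata[i] by SRSWOR, independently per stratum."""
    if len(strata) != len(sizes):
        raise ValueError("need exactly one sample size per stratum")
    rng = random.Random(seed)
    samples = []
    for units, n_i in zip(strata, sizes):
        if not 0 < n_i <= len(units):
            raise ValueError("each n_i must satisfy 0 < n_i <= N_i")
        # random.sample draws without replacement
        samples.append(rng.sample(units, n_i))
    return samples

# Made-up heights (cm) grouped into k = 3 non-overlapping strata.
strata = [
    [110, 112, 115, 118, 120],
    [135, 138, 140, 142, 145, 147],
    [160, 163, 165, 168],
]
sizes = [2, 3, 2]
samples = draw_stratified_sample(strata, sizes, seed=0)
n = sum(len(s) for s in samples)  # total stratified sample size
```

Each stratum gets its own call to `rng.sample`, which mirrors the requirement that the k samples are drawn independently of one another.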
In cluster sampling, the clusters are constructed such that they are
within heterogeneous and
among homogeneous.
[Note: We discuss the cluster sampling later.]
Note that there are k independent samples drawn through SRS of sizes n1 , n2 ,..., nk from each of the
strata. So, one can have k estimators of a parameter based on the sizes n1 , n2 ,..., nk respectively. Our
interest is not to have k different estimators of the parameters but the ultimate goal is to have a single
estimator. In this case, an important issue is how to combine the different sample information together
into one estimator which is good enough to provide the information about the parameter.
We now consider the estimation of population mean and population variance from a stratified sample.
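$\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$ : population mean of ith stratum
$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ : sample mean from ith stratum
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} w_i \bar{Y}_i$ : population mean, where $w_i = \frac{N_i}{N}$.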
unbiased estimator of $\bar{Y}$. Consider the stratum mean, which is defined as the weighted arithmetic mean
of strata sample means with strata sizes as weights, given by
$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{y}_i .$$
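Now
$$E(\bar{y}_{st}) = \frac{1}{N}\sum_{i=1}^{k} N_i E(\bar{y}_i) = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \bar{Y}.$$
Thus $\bar{y}_{st}$ is an unbiased estimator of $\bar{Y}$.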
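The unbiasedness of the weighted mean $\frac{1}{N}\sum N_i \bar{y}_i$ can be checked numerically on a tiny population by enumerating every possible stratified sample. The sketch below uses made-up strata and a hypothetical helper `stratified_mean`; exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations, product

def stratified_mean(samples, stratum_sizes):
    """Weighted mean of the stratum sample means with weights N_i / N."""
    N = sum(stratum_sizes)
    return sum(Fraction(N_i, N) * Fraction(sum(s), len(s))
               for s, N_i in zip(samples, stratum_sizes))

# Toy population with k = 2 strata (values are illustrative only).
strata = [[1, 2, 3], [10, 14, 18, 22]]
sizes = [2, 2]
stratum_sizes = [len(u) for u in strata]
pop_mean = Fraction(sum(sum(u) for u in strata), sum(stratum_sizes))

# Every SRSWOR sample in stratum 1 paired with every one in stratum 2.
all_samples = list(product(*(combinations(u, n_i)
                             for u, n_i in zip(strata, sizes))))

# All stratified samples are equally likely, so the expectation is a plain average.
expected = sum(stratified_mean(s, stratum_sizes)
               for s in all_samples) / len(all_samples)
```

Here `expected` and `pop_mean` coincide, as the derivation predicts.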
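Variance of $\bar{y}_{st}$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 Var(\bar{y}_i) + \sum_{i(\neq j)=1}^{k}\sum_{j=1}^{k} w_i w_j Cov(\bar{y}_i, \bar{y}_j).$$
Since all the samples have been drawn independently from each of the strata by SRSWOR,
$$Cov(\bar{y}_i, \bar{y}_j) = 0, \quad i \neq j,$$
$$Var(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} S_i^2$$
where
$$S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 .$$
Thus
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2 = \sum_{i=1}^{k} w_i^2 \left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.$$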
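As a sanity check on the variance formula $\sum w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2$, the following sketch compares it with the variance obtained by brute force over all possible stratified samples of a small made-up population. The helper names `s2`, `var_stratified_mean` and `y_st` are assumptions for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def s2(values):
    """Mean square with divisor (N_i - 1), as in the definition of S_i^2."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def var_stratified_mean(strata, sizes):
    """Formula value: sum of w_i^2 (N_i - n_i) / (N_i n_i) S_i^2."""
    N = sum(len(u) for u in strata)
    total = Fraction(0)
    for units, n_i in zip(strata, sizes):
        N_i = len(units)
        total += Fraction(N_i, N) ** 2 * Fraction(N_i - n_i, N_i * n_i) * s2(units)
    return total

def y_st(samples, strata):
    """Stratified mean of one particular stratified sample."""
    N = sum(len(u) for u in strata)
    return sum(Fraction(len(u), N) * Fraction(sum(s), len(s))
               for s, u in zip(samples, strata))

strata = [[1, 2, 3], [10, 14, 18, 22]]   # illustrative values
sizes = [2, 2]
formula = var_stratified_mean(strata, sizes)

# Brute force: variance of y_st over all equally likely stratified samples.
means = [y_st(s, strata)
         for s in product(*(combinations(u, n_i) for u, n_i in zip(strata, sizes)))]
centre = sum(means) / len(means)
brute = sum((m - centre) ** 2 for m in means) / len(means)
```

For this toy population both `formula` and `brute` equal 649/294.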
Observe that $Var(\bar{y}_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the
strata. If $S_i^2$ is small for all i = 1, 2, ..., k, then $Var(\bar{y}_{st})$ will also be small. That is why it was
stated earlier that the strata should be homogeneous with respect to the characteristic under study.
For example, units in geographical proximity tend to be more alike. The consumption pattern
in the households will be similar within a lower income group housing society and within a higher
income group housing society, whereas it will differ a lot between the two housing societies based
on income.
Estimate of Variance
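Since the samples have been drawn by SRSWOR,
$$E(s_i^2) = S_i^2$$
where
$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$
and
$$\widehat{Var}(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} s_i^2 ,$$
so
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \widehat{Var}(\bar{y}_i) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2 .$$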
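In practice the estimate $\sum w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2$ is computed from the sampled values alone, given the stratum sizes. A minimal sketch follows, with a hypothetical function name and made-up data; it assumes every stratum contributes at least two observations so that $s_i^2$ is defined.

```python
from statistics import variance

def estimate_var_stratified_mean(samples, stratum_sizes):
    """Estimated variance of the stratified mean under SRSWOR within strata."""
    N = sum(stratum_sizes)
    est = 0.0
    for y, N_i in zip(samples, stratum_sizes):
        n_i = len(y)
        w_i = N_i / N
        s2_i = variance(y)  # sample variance with divisor n_i - 1
        est += w_i ** 2 * (N_i - n_i) / (N_i * n_i) * s2_i
    return est

# Illustrative data: two strata with N_1 = 10 and N_2 = 30 units.
samples = [[4.0, 6.0], [10.0, 14.0, 18.0]]
stratum_sizes = [10, 30]
v_hat = estimate_var_stratified_mean(samples, stratum_sizes)
```

For these numbers the two strata contribute 0.05 and 2.7, so `v_hat` is 2.75 up to floating-point rounding.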
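Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then in this
case
$$\bar{y}_{st} = \sum_{i=1}^{k} w_i \bar{y}_i$$
$$E(\bar{y}_{st}) = \bar{Y}$$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - 1}{N_i n_i} S_i^2 = \sum_{i=1}^{k} \frac{w_i^2 \sigma_i^2}{n_i}$$
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{w_i^2 s_i^2}{n_i}$$
where
$$\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2 .$$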
effective way?
There are two aspects of choosing the sample sizes:
(i) Minimize the cost of survey for a specified precision.
(ii) Maximize the precision for a given cost.
Note: The sample size cannot be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size whereas variability is
inversely proportional to the sample size.
2. Proportional allocation
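For fixed k, select $n_i$ such that it is proportional to the stratum size $N_i$, i.e.,
$$n_i \propto N_i \quad \text{or} \quad n_i = C N_i$$
where C is the constant of proportionality. Then
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C N_i \quad \text{or} \quad n = CN, \quad \text{so} \quad C = \frac{n}{N}.$$
Thus
$$n_i = \frac{n}{N} N_i .$$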
Such allocation arises from the considerations like operational convenience.
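The rule $n_i = \frac{n}{N} N_i$ translates directly into code. The sketch below uses a hypothetical function name and illustrative stratum sizes; it returns unrounded values, and how to round them to integers is a practical choice that these notes do not specify.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n in proportion to the stratum sizes: n_i = n * N_i / N."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]

# Illustrative strata of 500, 300 and 200 units with total sample size 100.
alloc = proportional_allocation([500, 300, 200], n=100)
```

With these inputs the allocation is 50, 30 and 20 units, which is simply the sampling fraction n/N = 0.1 applied to every stratum.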
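$$\sum_{i=1}^{k} n_i = C^* \sum_{i=1}^{k} N_i S_i$$
or
$$n = C^* \sum_{i=1}^{k} N_i S_i$$
or
$$C^* = \frac{n}{\sum_{i=1}^{k} N_i S_i}.$$
Thus
$$n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}.$$
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified).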
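The allocation $n_i = n N_i S_i / \sum N_i S_i$ (often called Neyman allocation in the literature) can be sketched as below. The function name and the numbers are assumptions for illustration, and the stratum standard deviations $S_i$ are treated as known, which in practice they rarely are.

```python
def optimum_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n proportionally to N_i * S_i (minimum variance for fixed n)."""
    products = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]

# Illustrative strata: the smallest stratum is the most variable.
alloc = optimum_allocation([500, 300, 200], [2.0, 5.0, 10.0], n=90)
```

Here the products $N_i S_i$ are 1000, 1500 and 2000, so the 90 units split as 20, 30 and 40: the smallest stratum receives the largest sample because it is the most heterogeneous.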
There are some limitations of the optimum allocation. The knowledge of $S_i$ (i = 1, 2, ..., k) is needed to
determine $n_i$. If there is more than one characteristic under study, they may lead to conflicting allocations.
where
C : total cost
C0 : overhead cost, e.g., setting up of office, training people etc
To find $n_i$ under this cost function, consider the Lagrangian function with Lagrangian
multiplier $\lambda$ as
How to determine $\lambda$?
There are two ways to determine $\lambda$.
(i) Minimize variability for fixed cost.
(ii) Minimize cost for given variability.
We consider both the cases.
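or
$$\sqrt{\lambda} = \frac{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i}{C_0^*}.$$
The optimum $n_i$ is then
$$n_i^* = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{C_0^*}{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i}.$$
The required sample size to estimate $\bar{Y}$ such that the variance is minimum for given cost $C = C_0^*$ is
$$n = \sum_{i=1}^{k} n_i^* .$$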
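A small numerical sketch of the fixed-cost case may help. It assumes the allocation takes the form $n_i^* \propto w_i S_i / \sqrt{C_i}$, scaled so that the sampling cost $\sum C_i n_i$ equals the budget $C_0^*$; the function name, the weights, and the costs are illustrative assumptions.

```python
from math import sqrt

def allocation_fixed_cost(weights, sds, unit_costs, budget):
    """n_i* = (w_i S_i / sqrt(C_i)) * budget / sum_i sqrt(C_i) w_i S_i."""
    denom = sum(sqrt(c) * w * s for w, s, c in zip(weights, sds, unit_costs))
    return [w * s / sqrt(c) * budget / denom
            for w, s, c in zip(weights, sds, unit_costs)]

# Illustrative inputs: stratum 2 is more variable but four times as costly.
weights = [0.5, 0.5]
sds = [4.0, 6.0]
unit_costs = [1.0, 4.0]
alloc = allocation_fixed_cost(weights, sds, unit_costs, budget=160.0)

# The allocation spends the whole budget.
spent = sum(c * n_i for c, n_i in zip(unit_costs, alloc))
```

For these inputs the allocation is 40 and 30 units, costing 40 + 120 = 160. Although stratum 2 is more variable, its higher unit cost pulls its share below that of stratum 1.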
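$$n_i = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{\sum_{i=1}^{k} w_i S_i \sqrt{C_i}}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}}.$$
So the required sample size to estimate $\bar{Y}$ such that the cost C is minimum for a
prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.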
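Under proportional allocation, $n_i = \frac{n}{N} N_i = n w_i$.
So
$$C_0 = n \sum_{i=1}^{k} w_i C_i$$
or
$$n = \frac{C_0}{\sum_{i=1}^{k} w_i C_i}.$$
Thus
$$n_i = \frac{C_0 w_i}{\sum_{i=1}^{k} w_i C_i}.$$
The required sample size to estimate $\bar{Y}$ in this case is $n = \sum_{i=1}^{k} n_i$.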
or
$$n = \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}}$$
or
$$n_i = w_i \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}}.$$
This is known as Bowley's allocation.
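$$Var_{prop}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - \frac{n}{N} N_i}{N_i \cdot \frac{n}{N} N_i} \left(\frac{N_i}{N}\right)^2 S_i^2 = \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 = \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 .$$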
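Under optimum allocation, $n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$, so
$$V_{opt}(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 = \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}$$
$$= \sum_{i=1}^{k} \left[ w_i^2 S_i^2 \frac{\sum_{i=1}^{k} N_i S_i}{n N_i S_i} \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}$$
$$= \sum_{i=1}^{k} \left[ \frac{1}{n} \cdot \frac{N_i S_i}{N^2} \sum_{i=1}^{k} N_i S_i \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}$$
$$= \frac{1}{n} \left( \sum_{i=1}^{k} \frac{N_i S_i}{N} \right)^2 - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} = \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 .$$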
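In order to compare $V_{SRS}(\bar{y})$ and $V_{prop}(\bar{y}_{st})$, first we attempt to express $S^2$ as a function of $S_i^2$.
Consider
$$(N - 1) S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2$$
$$= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 ,$$
so that
$$\frac{N - 1}{N} S^2 = \sum_{i=1}^{k} \frac{N_i - 1}{N} S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2 .$$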
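For large $N_i$ and $N$, we use the approximations
$$\frac{N_i - 1}{N_i} \approx 1 \quad \text{and} \quad \frac{N - 1}{N} \approx 1 .$$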
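Thus
$$S^2 = \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2$$
or
$$\frac{N - n}{Nn} S^2 = \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2$$
(premultiply by $\frac{N - n}{Nn}$ on both sides)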
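$$Var_{SRS}(\bar{y}) = V_{prop}(\bar{y}_{st}) + \frac{N - n}{Nn} \sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 .$$
Since $\sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 \geq 0$, it follows that $V_{prop}(\bar{y}_{st}) \leq Var_{SRS}(\bar{y})$.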
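Consider
$$V_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) = \left[ \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 \right] - \left[ \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 \right]$$
$$= \frac{1}{n} \left[ \sum_{i=1}^{k} w_i S_i^2 - \left( \sum_{i=1}^{k} w_i S_i \right)^2 \right] = \frac{1}{n} \sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n} \bar{S}^2 = \frac{1}{n} \sum_{i=1}^{k} w_i (S_i - \bar{S})^2$$
where
$$\bar{S} = \sum_{i=1}^{k} w_i S_i .$$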
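The identity $V_{prop} - V_{opt} = \frac{1}{n}\sum w_i (S_i - \bar{S})^2$ is easy to confirm numerically. The sketch below evaluates both sides for made-up weights and stratum standard deviations; the function names `v_prop` and `v_opt` are assumptions for this illustration.

```python
def v_prop(w, S, n, N):
    """Variance under proportional allocation: (N - n)/(N n) * sum w_i S_i^2."""
    return (N - n) / (N * n) * sum(wi * si ** 2 for wi, si in zip(w, S))

def v_opt(w, S, n, N):
    """Variance under optimum allocation: (sum w_i S_i)^2 / n - sum w_i S_i^2 / N."""
    s_bar = sum(wi * si for wi, si in zip(w, S))
    return s_bar ** 2 / n - sum(wi * si ** 2 for wi, si in zip(w, S)) / N

# Illustrative weights and stratum standard deviations.
w = [0.5, 0.3, 0.2]
S = [2.0, 5.0, 10.0]
n, N = 90, 1000

gap = v_prop(w, S, n, N) - v_opt(w, S, n, N)
s_bar = sum(wi * si for wi, si in zip(w, S))
rhs = sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, S)) / n
```

The gap is nonnegative and grows with the spread of the $S_i$ around $\bar{S}$, so optimum allocation gains most over proportional allocation when the strata differ widely in variability.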
$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 .$$
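In stratified sampling,
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2 = \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} .$$
The second term in this expression represents the reduction due to finite population correction.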
The confidence limits of $\bar{Y}$ can be obtained as
$$\bar{y}_{st} \pm t \sqrt{\widehat{Var}(\bar{y}_{st})}$$
where the value of t is read from normal distribution tables. If only a few degrees of freedom are provided by each stratum, then t values
are obtained from the table of Student's t-distribution.
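number of degrees of freedom $(n_e)$ to $\widehat{Var}(\bar{y}_{st})$ is
$$n_e = \frac{\left( \sum_{i=1}^{k} g_i s_i^2 \right)^2}{\sum_{i=1}^{k} \frac{g_i^2 s_i^4}{n_i - 1}}$$
where
$$g_i = \frac{N_i (N_i - n_i)}{n_i}$$
and
$$Min(n_i - 1) \leq n_e \leq \sum_{i=1}^{k} (n_i - 1) .$$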
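The effective degrees of freedom can be computed directly from the stratum sizes, sample sizes and sample variances. The sketch below uses a hypothetical function name and made-up inputs, and assumes every $n_i \geq 2$.

```python
def effective_df(stratum_sizes, sample_sizes, sample_vars):
    """n_e = (sum g_i s_i^2)^2 / sum(g_i^2 s_i^4 / (n_i - 1)), g_i = N_i (N_i - n_i) / n_i."""
    g = [N_i * (N_i - n_i) / n_i for N_i, n_i in zip(stratum_sizes, sample_sizes)]
    num = sum(g_i * s2 for g_i, s2 in zip(g, sample_vars)) ** 2
    den = sum(g_i ** 2 * s2 ** 2 / (n_i - 1)
              for g_i, s2, n_i in zip(g, sample_vars, sample_sizes))
    return num / den

# Illustrative two-stratum example.
sample_sizes = [10, 20]
n_e = effective_df([100, 200], sample_sizes, [4.0, 9.0])

# With a single stratum the formula reduces to n - 1.
n_e_single = effective_df([100], [10], [4.0])
```

The value of `n_e` lies between the smallest $n_i - 1$ and the sum of the $n_i - 1$, as the stated bounds require, and is generally not an integer.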
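$$n_1 = N_1$$
and
$$n_i = \frac{(n - N_1) w_i S_i}{\sum_{i=2}^{k} w_i S_i} ; \quad i = 2, 3, \ldots, k .$$
Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be
$$n_1 = N_1$$
$$n_2 = N_2$$
$$n_i = \frac{(n - N_1 - N_2) w_i S_i}{\sum_{i=3}^{k} w_i S_i} ; \quad i = 3, 4, \ldots, k .$$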
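The revision procedure can be written as a loop: allocate optimally, fix $n_i = N_i$ in any stratum where the allocation exceeds the stratum size, and re-allocate the remaining sample among the other strata. The sketch below is one possible implementation under stated assumptions: it caps all offending strata in a single pass rather than one at a time, it uses $N_i S_i$ in place of $w_i S_i$ (the ratio is the same since $w_i = N_i / N$), and it assumes all $S_i > 0$. The function name and data are hypothetical.

```python
def optimum_allocation_capped(stratum_sizes, sds, n):
    """Optimum allocation with n_i fixed at N_i wherever the formula exceeds N_i."""
    if not 0 < n <= sum(stratum_sizes):
        raise ValueError("n must satisfy 0 < n <= N")
    alloc = [0.0] * len(stratum_sizes)
    free = list(range(len(stratum_sizes)))   # strata not yet fixed at N_i
    remaining = n
    while free:
        total = sum(stratum_sizes[i] * sds[i] for i in free)
        trial = {i: remaining * stratum_sizes[i] * sds[i] / total for i in free}
        over = [i for i in free if trial[i] > stratum_sizes[i]]
        if not over:
            for i in free:
                alloc[i] = trial[i]
            break
        for i in over:
            # Take the whole stratum and remove it from further allocation.
            alloc[i] = float(stratum_sizes[i])
            remaining -= stratum_sizes[i]
            free.remove(i)
    return alloc

# Illustrative case: stratum 1 is tiny but extremely variable.
alloc = optimum_allocation_capped([10, 100, 200], [50.0, 2.0, 1.0], n=50)
```

The unrestricted formula would ask for about 27.8 units from a stratum of only 10, so that stratum is sampled completely and the remaining 40 units are split 20 and 20 between the other two strata.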
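In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
$$Min\, Var(\bar{y}_{st}) = \frac{\left( \sum^* w_i S_i \right)^2}{n^*} - \frac{\sum^* w_i S_i^2}{N}$$
where $\sum^*$ denotes the summation over the strata in which $n_i \leq N_i$ and $n^*$ is the revised total sample
size in these strata.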
where $Q_i = 1 - P_i$.
Also
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 S_i^2 .$$
So
$$Var(p_{st}) = \frac{1}{N^2} \sum_{i=1}^{k} \frac{N_i^2 (N_i - n_i)}{N_i - 1} \cdot \frac{P_i Q_i}{n_i} .$$
$$Var_{prop}(p_{st}) = \frac{N - n}{N} \cdot \frac{1}{Nn} \sum_{i=1}^{k} \frac{N_i^2 P_i Q_i}{N_i - 1} \approx \frac{N - n}{Nn} \sum_{i=1}^{k} w_i P_i Q_i .$$
$$\widehat{Var}_{prop}(p_{st}) = \frac{N - n}{Nn} \sum_{i=1}^{k} w_i \frac{n_i p_i q_i}{n_i - 1} .$$
The best choice of $n_i$ such that it minimizes the variance for fixed total sample size is
$$n_i \propto N_i \sqrt{\frac{N_i P_i Q_i}{N_i - 1}} \approx N_i \sqrt{P_i Q_i} .$$
Thus
$$n_i = n \frac{N_i \sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i \sqrt{P_i Q_i}} .$$
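For proportions the stratum standard deviation is approximately $\sqrt{P_i Q_i}$, so the fixed-sample-size allocation becomes $n_i \propto N_i \sqrt{P_i Q_i}$. A minimal sketch follows, with a hypothetical function name and made-up stratum proportions, which in practice would have to be guessed or estimated in advance.

```python
from math import sqrt

def optimum_allocation_proportions(stratum_sizes, proportions, n):
    """Allocate n proportionally to N_i * sqrt(P_i * Q_i), with Q_i = 1 - P_i."""
    terms = [N_i * sqrt(P * (1 - P)) for N_i, P in zip(stratum_sizes, proportions)]
    total = sum(terms)
    return [n * t / total for t in terms]

# Illustrative strata with assumed proportions 0.5 and 0.2.
alloc = optimum_allocation_proportions([400, 600], [0.5, 0.2], n=110)
```

The terms $N_i \sqrt{P_i Q_i}$ are 200 and 240, giving an allocation of 50 and 60 units. Because $\sqrt{P_i Q_i}$ changes slowly unless $P_i$ is close to 0 or 1, this allocation is often close to proportional allocation.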
Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_i n_i$ is
$$n_i = \frac{n N_i \sqrt{\frac{P_i Q_i}{C_i}}}{\sum_{i=1}^{k} N_i \sqrt{\frac{P_i Q_i}{C_i}}} .$$
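$$(N - 1) S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2$$
$$= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2$$
$$= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2$$
$$= \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[ \sum_{i=1}^{k} w_i \bar{Y}_i^2 - \bar{Y}^2 \right] .$$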
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$. We consider their estimation one by
one.
$$E(s_i^2) = S_i^2 .$$
So $\hat{S}_i^2 = s_i^2$.
$$Var(\bar{y}_i) = E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 = E(\bar{y}_i^2) - \bar{Y}_i^2$$
or
$$\bar{Y}_i^2 = E(\bar{y}_i^2) - Var(\bar{y}_i) .$$
An unbiased estimate of $\bar{Y}_i^2$ is
$$\bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i} s_i^2 .$$
So an estimate of $\bar{Y}^2$ is
$$\hat{\bar{Y}}^2 = \bar{y}_{st}^2 - \widehat{Var}(\bar{y}_{st}) = \bar{y}_{st}^2 - \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 s_i^2 .$$
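$$\widehat{Var}_{SRS}(\bar{y}) = \frac{N - n}{Nn} \hat{S}^2 = \frac{N - n}{N(N - 1)n} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N(N - n)}{nN(N - 1)} \left[ \sum_{i=1}^{k} w_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i (1 - w_i) \frac{N_i - n_i}{N_i n_i} s_i^2 \right]$$
and
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 s_i^2 .$$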
The subsamples need not necessarily be independent. The assumption of independent subsamples
helps in obtaining an unbiased estimate of the variance of the composite estimator. This is even
helpful if the sample design is complicated and the expression for variance of the composite estimator
is complex.
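$$E\left[ \widehat{Var}(\bar{t}) \right] = \frac{1}{g(g - 1)} E\left[ \sum_{j=1}^{g} (t_j - \theta)^2 - g(\bar{t} - \theta)^2 \right]$$
$$= \frac{1}{g(g - 1)} \left[ \sum_{j=1}^{g} Var(t_j) - g\, Var(\bar{t}) \right]$$
$$= \frac{1}{g(g - 1)} (g^2 - g) Var(\bar{t}) = Var(\bar{t}) .$$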
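The expectation above is consistent with estimating $Var(\bar{t})$ by $\frac{1}{g(g-1)} \sum_{j=1}^{g} (t_j - \bar{t})^2$, because $\sum (t_j - \theta)^2 - g(\bar{t} - \theta)^2 = \sum (t_j - \bar{t})^2$. Taking that form as an assumption, a sketch with a hypothetical function name and made-up subsample estimates looks like this:

```python
from statistics import mean

def variance_of_combined_estimate(estimates):
    """Estimate Var(t_bar) from g independent subsample estimates t_1, ..., t_g."""
    g = len(estimates)
    if g < 2:
        raise ValueError("at least two subsample estimates are needed")
    t_bar = mean(estimates)
    return sum((t - t_bar) ** 2 for t in estimates) / (g * (g - 1))

# Illustrative estimates from g = 3 independent subsamples.
estimates = [10.0, 12.0, 14.0]
v_hat = variance_of_combined_estimate(estimates)
```

For these three estimates the squared deviations sum to 8, so `v_hat` is 8/6. No knowledge of the sampling design inside each subsample is needed, which is the practical appeal of the approach.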
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then the confidence interval of $\theta$ can be
obtained by
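Let $\hat{Y}_{ij(tot)}$ be an unbiased estimator of the total of the jth stratum based on the ith subsample,
i = 1, 2, ..., L; j = 1, 2, ..., k.
$$\widehat{Var}(\hat{Y}_{j(tot)}) = \frac{1}{L(L - 1)} \sum_{i=1}^{L} (\hat{Y}_{ij(tot)} - \hat{Y}_{j(tot)})^2 .$$
Summing over the k strata gives
$$\frac{1}{L(L - 1)} \sum_{i=1}^{L} \sum_{j=1}^{k} (\hat{Y}_{ij(tot)} - \hat{Y}_{j(tot)})^2 .$$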
In post stratification,
draw a sample by simple random sampling from the population and carry out the survey.
After the completion of survey, stratify the sampling units to increase the precision of the
estimates.
Assume that the stratum size $N_i$ is fairly accurately known. Let $m_i$ be the number of sampled units that fall in the ith stratum, so that
$$\sum_{i=1}^{k} m_i = n .$$
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier).
Assume n is large enough, or the stratification is such, that the probability that some $m_i = 0$ is
negligibly small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the
sample size in each stratum nonzero.
To find $E\left( \frac{1}{m_i} \right) - \frac{1}{N_i}$, proceed as follows:
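Consider the estimate of ratio based on the ratio method of estimation as
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} , \quad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} .$$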
We know that
$$E(\hat{R}) - R \approx \frac{N - n}{Nn} \cdot \frac{R S_X^2 - S_{XY}}{\bar{X}^2} .$$
Let
$$x_j = \begin{cases} 1 & \text{if the jth unit belongs to the ith stratum} \\ 0 & \text{otherwise} \end{cases}$$
and
$$y_j = 1 \text{ for all } j = 1, 2, \ldots, N .$$
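Then
$$\hat{R} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{n_i} , \quad R = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} = \frac{N}{N_i} ,$$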
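$$S_X^2 = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j^2 - N \bar{X}^2 \right] = \frac{1}{N - 1} \left[ N_i - N \frac{N_i^2}{N^2} \right] = \frac{1}{N - 1} \left[ N_i - \frac{N_i^2}{N} \right]$$
$$S_{XY} = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j Y_j - N \bar{X} \bar{Y} \right] = \frac{1}{N - 1} \left[ N_i - N \cdot \frac{N_i}{N} \right] = 0 .$$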
$$E(\hat{R}) - R = E\left( \frac{n}{n_i} \right) - \frac{N}{N_i} \approx \frac{N (N - n)(N - N_i)}{n N_i^2 (N - 1)} .$$
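Thus
$$E\left( \frac{1}{n_i} \right) - \frac{1}{N_i} \approx \frac{N}{n N_i} + \frac{N (N - n)(N - N_i)}{n^2 N_i^2 (N - 1)} - \frac{1}{N_i} \approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right] .$$
Replacing $n_i$ by $m_i$,
$$E\left( \frac{1}{m_i} \right) - \frac{1}{N_i} \approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right] .$$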
Now substitute this in the expression of $Var(\bar{y}_{post})$ to obtain
$$Var(\bar{y}_{post}) \approx \frac{N - n}{n (N - 1)} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} (1 - w_i) S_i^2 .$$
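Assuming $N - 1 \approx N$,
$$V(\bar{y}_{post}) = \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 N} \sum_{i=1}^{k} (1 - w_i) S_i^2 = V_{prop}(\bar{y}_{st}) + \frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_i^2 .$$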
The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately
distributed.
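If $S_i^2 = S_w^2$, say, for all i, then the last term in the expression is
$$\frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_w^2 = \frac{N - n}{N n^2} S_w^2 (k - 1) \quad \left( \text{since } \sum_{i=1}^{k} w_i = 1 \right)$$
$$= \frac{k - 1}{n} \cdot \frac{N - n}{Nn} S_w^2 = \frac{k - 1}{n} Var(\bar{y}_{st}) .$$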
The increase in the variance over $Var_{prop}(\bar{y}_{st})$ is small if the average sample size $\bar{n} = \frac{n}{k}$ per stratum is
reasonably large.
Thus a post stratification with a large sample produces an estimator which is almost as precise as an
estimator in the stratified sampling with proportional allocation.
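To see how small the penalty for post-stratification can be, the two terms $\frac{N-n}{Nn}\sum w_i S_i^2$ and $\frac{N-n}{Nn^2}\sum (1 - w_i) S_i^2$ can be evaluated for a concrete case. The sketch below uses a hypothetical function name and made-up inputs with equal stratum variances, so the ratio of the second term to the first should be $(k - 1)/n$.

```python
def post_stratification_variance(w, S2, n, N):
    """Return (proportional-allocation term, extra term) of V(y_post), taking N - 1 as N."""
    v_prop = (N - n) / (N * n) * sum(wi * s for wi, s in zip(w, S2))
    extra = (N - n) / (N * n ** 2) * sum((1 - wi) * s for wi, s in zip(w, S2))
    return v_prop, extra

# Illustrative case: k = 4 equally sized strata with common variance 4.
k = 4
w = [0.25] * k
S2 = [4.0] * k
n, N = 100, 10000

v_prop, extra = post_stratification_variance(w, S2, n, N)
relative_increase = extra / v_prop
```

With n = 100 and k = 4 the relative increase is 3/100, so the post-stratified estimator loses only about 3 percent in variance relative to proportional allocation.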