
Understanding Stratified Sampling Techniques

Stratified sampling involves dividing a population into non-overlapping sub-populations, or strata, that are homogeneous within but heterogeneous between. A random sample is then selected from each stratum, allowing for more accurate estimates of population parameters by accounting for known characteristics related to the variable of interest. Key principles include ensuring strata are exhaustive and homogeneous, and the process involves careful selection of stratification variables and sample allocation.

04 Stratified sampling

❖ Introduction:
Sometimes the population is divided into several non-overlapping sub-populations or groups
in such a way that units within a group are homogeneous while units in different groups are
heterogeneous. These sub-populations or groups are called strata.

Sampling frames are constructed for each of these strata and the sampling is performed
independently within each stratum. If a random sampling strategy is followed to select the
sample within each stratum, the whole procedure is called stratified random sampling.

❖ Stratified random sampling:


Stratified sampling is a sampling plan in which the population is divided into several
non-overlapping strata and a random sample is selected from each stratum. The strata are
formed so that units within a stratum are homogeneous while units in different strata are
heterogeneous.

Strata are generally formed on the basis of some known characteristic of the population that
is believed to be related to the variable of interest. This characteristic is known as the
auxiliary variable, stratification variable or stratification factor.

The justification of adopting stratified sampling is that if we know nothing about the structure
of the population apart from its size, we cannot do better than take a simple random sample.
However, it is an extreme situation that we know nothing about the population other than its
size.

Most often, we know that a population consists of individuals who can be classified by their
characteristics such as religion, socio-economic status, income, expenditure, occupation, level
of education, etc.

For example, in studying the living and working conditions of the people, the type of area
(that is, City Corporation, municipal, urban, semi-urban, rural, etc.) may serve as a
stratification variable, since this variable is believed to be related to the living and working
conditions of the people.

For another example, in studying the television viewing habit among the university students,
academic performance score (high, medium, low) or place of residence (urban, rural) may
serve as stratification variables, since each of these variables is believed to be related to the
television viewing habit of the students.

The stratification process has the following salient features:


1) The entire population is divided into several distinct sub-populations, called
strata.
2) Within each stratum, a separate and independent sample is selected.
3) For each individual stratum, stratum mean, proportion, variance and other
characteristics are computed.
4) These estimates are then properly weighted to form a combined estimate for
the entire population.

❖ Principles of stratification:
For stratification, a few principles should be followed to take full advantage of stratified
sampling. These are:


1) The strata should be non-overlapping and exhaustive.


2) The strata should be made as homogeneous as possible, ensuring greater
similarity within the strata than between the strata.
3) Strata are to be formed on the basis of some known characteristics of the
population, which are believed to have some relationship with the subject of
inquiry and variable of interest.
4) When stratification with respect to the characteristics under study becomes
difficult for practical reasons, administrative convenience may be considered
as the basis for forming the strata.
5) To improve the sampling design, strata should be formed on the basis of
natural characteristics.
6) Past data, intuition, expert judgment or preliminary findings from pilot surveys
may also be used to set up the strata. This, however, requires that we have
prior knowledge of the nature of the population from which we are sampling.

❖ Steps Involved in Stratified Sampling:


In carrying out stratified sampling, some important points need to be carefully considered
(Murthy, 1977). These include, among others,
1) Choice of stratification variable
2) Formation of strata
3) Number of strata
4) Sampling within the strata
5) Allocating sample to strata

❖ Estimators and their properties:


Consider a population consisting of $N$ units. For stratified sampling, the population of $N$
units is first divided into $k$ distinct classes or groups of sizes $N_1, N_2, \ldots, N_k$ such that
$\sum_{i=1}^{k} N_i = N$. These sub-populations are our strata.

When the strata have been identified, samples of predetermined sizes are drawn from each
stratum. If $n_1, n_2, \ldots, n_k$ denote the sizes of the samples to be drawn from strata of sizes
$N_1, N_2, \ldots, N_k$, respectively, then $\sum_{i=1}^{k} n_i = n$.

If a random sampling procedure is followed in selecting samples from each stratum, the
whole procedure is described as stratified random sampling. Now, let us suppose that $y_{ij}$ is
the value of the $j$th unit in the $i$th stratum. So, the mean of the $N_i$ elements in the $i$th stratum
is given by:

$$\bar{Y}_i = \frac{Y_i}{N_i} = \frac{1}{N_i} \sum_{j=1}^{N_i} y_{ij}$$

The stratum variance is given by:

$$S_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \left( y_{ij} - \bar{Y}_i \right)^2$$

The sample mean for the $i$th stratum is given by:

$$\bar{y}_i = \frac{y_i}{n_i} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$$


The sample variance is defined as:

$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2$$

Layout of stratified sampling

| Element | Stratum 1 | Stratum 2 | … | Stratum i | … | Stratum k |
|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{21}$ | … | $y_{i1}$ | … | $y_{k1}$ |
| 2 | $y_{12}$ | $y_{22}$ | … | $y_{i2}$ | … | $y_{k2}$ |
| ⋮ | ⋮ | ⋮ | … | ⋮ | … | ⋮ |
| j | $y_{1j}$ | $y_{2j}$ | … | $y_{ij}$ | … | $y_{kj}$ |
| ⋮ | ⋮ | ⋮ | … | ⋮ | … | ⋮ |
| Last unit | $y_{1N_1}$ | $y_{2N_2}$ | … | $y_{iN_i}$ | … | $y_{kN_k}$ |

The summary quantities take the same form in every stratum $i = 1, 2, \ldots, k$:

| Quantity | Stratum i |
|---|---|
| Population size | $N_i$ |
| Population total | $Y_i = \sum_{j=1}^{N_i} y_{ij}$ |
| Sample size | $n_i$ |
| Sample total | $y_i = \sum_{j=1}^{n_i} y_{ij}$ |
| Population mean | $\bar{Y}_i = \frac{Y_i}{N_i} = \frac{1}{N_i} \sum_{j=1}^{N_i} y_{ij}$ |
| Sample mean | $\bar{y}_i = \frac{y_i}{n_i} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$ |
| Population variance | $S_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} ( y_{ij} - \bar{Y}_i )^2$ |
| Sample variance | $s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} ( y_{ij} - \bar{y}_i )^2$ |

The weighted population mean is given by:

$$\bar{Y}_w = \sum_{i=1}^{k} W_i \bar{Y}_i$$

That is, the population mean is equal to the sum of the $k$ strata means $\bar{Y}_i$, each multiplied by
its proper weight $W_i$, where $W_i = \frac{N_i}{N}$ and $\sum_{i=1}^{k} W_i = 1$. The weighted mean $\bar{Y}_w$ is then equal to the
ordinary population mean $\bar{Y}$, which is evident from the following:

$$\bar{Y}_w = \sum_{i=1}^{k} W_i \bar{Y}_i = \sum_{i=1}^{k} \left( \frac{N_i}{N} \right) \bar{Y}_i = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i = \frac{1}{N} \sum_{i=1}^{k} Y_i = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{N_i} y_{ij} = \bar{Y}$$

The mean of the overall stratified sample is necessarily a weighted mean and is defined as:

$$\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{y}_i = \sum_{i=1}^{k} W_i \bar{y}_i$$

The sample mean $\bar{y}_i$ ($i = 1, 2, \ldots, k$) is obtained separately and independently within each
stratum and is then multiplied by the stratum weight $W_i$. These products, when summed over all
$k$ strata, give the weighted sample mean $\bar{y}_{st}$. Note that the mean $\bar{y}_{st}$ is different from the
overall sample mean $\bar{y}$, which is defined as:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{y}_i = \sum_{i=1}^{k} w_i \bar{y}_i, \qquad w_i = \frac{n_i}{n}$$

In general, $\bar{y}_{st}$ coincides with $\bar{y}$ only when the sampling fraction $\frac{n_i}{N_i}$ is the same in all the
strata. That is, $\frac{n_i}{N_i} = \frac{n}{N}$, or equivalently $\frac{n_i}{n} = \frac{N_i}{N}$. This stratification is referred to as stratification with
proportional allocation of $n_i$.

In a stratified sampling plan, each stratum can be considered a separate population, from each
of which a separate simple random sample is selected. The simple estimator of the total in
stratum 1 is $N_1 \bar{y}_1$, that in stratum 2 is $N_2 \bar{y}_2$, and so on. Therefore, a reasonable estimator of
the sum of the stratum totals is the sum of these estimators:

$$N_1 \bar{y}_1 + N_2 \bar{y}_2 + \cdots + N_k \bar{y}_k = \sum_{i=1}^{k} N_i \bar{y}_i$$

This is the stratified estimator $\hat{Y}_{st}$ of the population total, since

$$\hat{Y}_{st} = N \bar{y}_{st} = N \sum_{i=1}^{k} W_i \bar{y}_i = \sum_{i=1}^{k} N_i \bar{y}_i$$
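
The two estimators above can be illustrated with a minimal Python sketch. The function name `stratified_estimates` and the data are hypothetical and chosen only for illustration; the code simply evaluates $\bar{y}_{st} = \sum W_i \bar{y}_i$ and $\hat{Y}_{st} = N \bar{y}_{st}$.

```python
from statistics import mean


def stratified_estimates(stratum_sizes, samples):
    """Return (y_st, Y_hat_st) from stratum sizes N_i and per-stratum samples."""
    N = sum(stratum_sizes)
    # Stratum weights W_i = N_i / N
    weights = [N_i / N for N_i in stratum_sizes]
    # Weighted sample mean: sum of W_i * (sample mean of stratum i)
    y_st = sum(W_i * mean(sample) for W_i, sample in zip(weights, samples))
    # Estimated population total: N * y_st
    return y_st, N * y_st


# Made-up data: three strata of sizes 50, 30 and 20
stratum_sizes = [50, 30, 20]
samples = [[10, 12, 14], [20, 22], [30, 34, 32, 32]]
y_st, total_hat = stratified_estimates(stratum_sizes, samples)
```

With these numbers the stratum means are 12, 21 and 32, so $\bar{y}_{st} = 0.5(12) + 0.3(21) + 0.2(32) = 18.7$, whereas the unweighted mean of all nine observations is $206/9 \approx 22.9$. The two differ because the sampling fractions are not equal across strata.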

❖ Theorem: (4.1):
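If in every stratum the sample mean $\bar{y}_i$ is unbiased, then $\bar{y}_{st}$ is an unbiased estimator of the
population mean $\bar{Y}$, which is evident from the following:

$$\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{y}_i \implies E\left( \bar{y}_{st} \right) = \frac{1}{N} \sum_{i=1}^{k} N_i E\left( \bar{y}_i \right) = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i = \frac{1}{N} \sum_{i=1}^{k} Y_i = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{N_i} y_{ij} = \bar{Y}$$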

❖ Theorem: (4.2):
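In stratified random sampling, the variance of $\bar{y}_{st}$ is given by:

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}$$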

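This theorem can be verified as follows. Because the samples are drawn independently in the
different strata,

$$\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i \implies \operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \operatorname{var}\left( \bar{y}_i \right)$$

Within stratum $i$ a simple random sample is drawn, so $\operatorname{var}( \bar{y}_i ) = \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i}$, and therefore

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}$$

Equivalently, writing $\sigma_i^2 = \frac{N_i - 1}{N_i} S_i^2$ for the stratum variance with divisor $N_i$,

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{\sigma_i^2}{n_i}$$

If the sampling fractions $\frac{n_i}{N_i}$ are negligible in all strata, then we have that

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i}$$

With proportional allocation, that is, with $\frac{n_i}{N_i} = \frac{n}{N} = f$, we have that

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \left( \frac{1 - f}{n} \right) \sum_{i=1}^{k} W_i S_i^2$$

With proportional allocation, if in addition all the strata have the same variance, say
$S^2$, we have that

$$\operatorname{var}\left( \bar{y}_{st} \right) = \frac{1 - f}{n} \sum_{i=1}^{k} W_i S_i^2 = \frac{1 - f}{n} \sum_{i=1}^{k} W_i S^2 = \left( \frac{1 - f}{n} \right) S^2$$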

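If $\hat{Y} = N \bar{y}_{st}$ is the estimate of the population total, then we have that

$$\operatorname{var}( \hat{Y} ) = N^2 \operatorname{var}\left( \bar{y}_{st} \right) = N^2 \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \sum_{i=1}^{k} N_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \sum_{i=1}^{k} N_i ( N_i - n_i ) \frac{S_i^2}{n_i}$$

With proportional allocation, that is, with $\frac{n_i}{N_i} = \frac{n}{N}$, we have from the above equation that

$$\operatorname{var}( \hat{Y} ) = N^2 \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \sum_{i=1}^{k} N_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \left( \frac{1 - f}{n} \right) N \sum_{i=1}^{k} N_i S_i^2 = \left( \frac{N - n}{n} \right) \sum_{i=1}^{k} N_i S_i^2$$

Now, $v( \bar{y}_{st} ) = s^2( \bar{y}_{st} ) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{s_i^2}{n_i}$ is an unbiased estimator of $\operatorname{var}( \bar{y}_{st} )$, which is evident
from the following:

$$E\left[ v( \bar{y}_{st} ) \right] = E\left[ s^2( \bar{y}_{st} ) \right] = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{E( s_i^2 )}{n_i} = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i} = \operatorname{var}\left( \bar{y}_{st} \right)$$

Again, we have from the above that

$$v( \bar{y}_{st} ) = s^2( \bar{y}_{st} ) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{s_i^2}{n_i} = \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{N_i} = \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i s_i^2}{N}$$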
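
As a rough illustration of the estimator $v(\bar{y}_{st})$, the following Python sketch evaluates it for made-up data. The function name and the numbers are assumptions for illustration only; `statistics.variance` uses the divisor $n_i - 1$, matching the definition of $s_i^2$.

```python
from statistics import variance


def estimated_variance_of_stratified_mean(stratum_sizes, samples):
    """v(y_st) = sum of W_i^2 * (1 - n_i/N_i) * s_i^2 / n_i over all strata."""
    N = sum(stratum_sizes)
    v = 0.0
    for N_i, sample in zip(stratum_sizes, samples):
        n_i = len(sample)
        W_i = N_i / N
        s2_i = variance(sample)  # sample variance with divisor n_i - 1
        v += W_i ** 2 * (1 - n_i / N_i) * s2_i / n_i
    return v


# Made-up data: each stratum needs at least two sampled units
stratum_sizes = [50, 30, 20]
samples = [[10, 12, 14], [20, 22], [30, 34, 32, 32]]
v = estimated_variance_of_stratified_mean(stratum_sizes, samples)
```

Each stratum must contribute at least two observations, since $s_i^2$ is undefined for a single unit.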

❖ Theorem: (4.3):
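In stratified random sampling, the variance of the sample proportion $p_{st}$ is given by:

$$\operatorname{var}( p_{st} ) = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{P_i ( 1 - P_i )}{n_i}$$

which is evident from the following, since for a 0/1 variable the stratum variance is $\sigma_i^2 = P_i ( 1 - P_i )$:

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{\sigma_i^2}{n_i} = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{P_i ( 1 - P_i )}{n_i}$$

If the sampling fractions $\frac{n_i}{N_i}$ are negligible in all strata, then we have that

$$\operatorname{var}( p_{st} ) = \sum_{i=1}^{k} W_i^2 \frac{P_i ( 1 - P_i )}{n_i}$$

With proportional allocation, that is, with $\frac{n_i}{N_i} = \frac{n}{N}$, we have that

$$\operatorname{var}( p_{st} ) = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{P_i ( 1 - P_i )}{n_i} \approx \sum_{i=1}^{k} \frac{N_i}{N} \left( 1 - \frac{n_i}{N_i} \right) \frac{W_i P_i ( 1 - P_i )}{n_i} = \frac{1 - f}{n} \sum_{i=1}^{k} W_i P_i ( 1 - P_i )$$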

❖ Confidence interval:
Estimates of population parameters such as means, proportions and totals under stratified
sampling can be judged for their validity by constructing confidence intervals, that is, interval
estimates having a given probability of containing the population parameters. When $n_i$
and $( N_i - n_i )$ are large, an approximate $100 ( 1 - \alpha ) \%$ confidence interval for $\bar{Y}$ is given by:

$$\bar{y}_{st} \pm z_{\alpha/2} \sqrt{ v( \bar{y}_{st} ) }$$

And that for the population proportion $P$ in a given category is:

$$p_{st} \pm z_{\alpha/2} \sqrt{ v( p_{st} ) }$$

Here, $v( \bar{y}_{st} )$ and $v( p_{st} )$ are the unbiased estimates of $\operatorname{var}( \bar{y}_{st} )$ and $\operatorname{var}( p_{st} )$, respectively.
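
A minimal Python sketch of the interval computation is given below. The helper name `confidence_interval` and the input values are hypothetical; the normal quantile $z_{\alpha/2}$ is taken from the standard library's `NormalDist`, and the half-width is $z_{\alpha/2}$ times the square root of the estimated variance.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(estimate, variance_estimate, alpha=0.05):
    """Approximate 100(1 - alpha)% interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * sqrt(variance_estimate)
    return estimate - half_width, estimate + half_width


# Made-up inputs: a stratified mean and its estimated variance
lower, upper = confidence_interval(18.7, 0.42)
```

The same helper applies to a proportion by passing $p_{st}$ and $v(p_{st})$.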

❖ Estimating sample size:


We note that the variance of the stratified estimator depends not only on the overall sample
size but also on its allocation among different strata. One possible approach is to assume that
this allocation has been decided upon; specifically, that a given fraction $w_i$ of the total sample
size $n$ will be allocated to each stratum. Since $\sum_{i=1}^{k} n_i = n$, the size of the sample from stratum
$i$ is therefore $n w_i$. That is, $n_i = n w_i$; $i = 1, 2, \ldots, k$. We assume that the estimate has a specified
variance $V_0$. Recall that the expression for $\operatorname{var}( \bar{y}_{st} )$ is:

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}$$

Replacing the expression $\operatorname{var}( \bar{y}_{st} )$ by $V_0$, we write,

$$V_0 = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N} \implies V_0 = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n w_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N} \quad ( \text{putting } n_i = n w_i )$$

$$\implies \frac{1}{n} \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{w_i} = V_0 + \sum_{i=1}^{k} \frac{W_i S_i^2}{N} \implies n = \frac{ \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{w_i} }{ V_0 + \sum_{i=1}^{k} \frac{W_i S_i^2}{N} }$$

$$\implies n = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{N^2 w_i} }{ V_0 + \sum_{i=1}^{k} \frac{N_i S_i^2}{N^2} } = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{w_i} }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } \quad ( \text{for estimating the mean} )$$

If an estimate of the population proportion is desired, the required sample size may be
obtained by using the formula:

$$n = \frac{ \sum_{i=1}^{k} \frac{N_i^2 P_i ( 1 - P_i )}{w_i} }{ N^2 V_0 + \sum_{i=1}^{k} N_i P_i ( 1 - P_i ) }$$

Here, $P_i$ is the population proportion in the $i$th stratum and $V_0 = \operatorname{var}( p_{st} )$. All the equations
above involve unknown population characteristics and are thus of no practical use as they
stand. However, they can provide a rough indication of the required sample size if estimates
of $S_i$ or $P_i$ are available from a pilot or similar study.

Another difficulty with these formulas is that the fraction $w_i$ of the sample allocated to each
stratum must be decided beforehand.
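
The sample-size formula for the mean can be checked numerically with a small Python sketch. The function names, the stratum standard deviations and the allocation fractions below are assumptions made for illustration (as the text notes, in practice $S_i$ would come from a pilot or similar study). The sketch computes $n$ for a target variance $V_0$ and then substitutes $n_i = n w_i$ back into $\operatorname{var}(\bar{y}_{st})$.

```python
def variance_of_stratified_mean(stratum_sizes, stratum_sds, sample_sizes):
    """var(y_st) = sum of W_i^2 * (1 - n_i/N_i) * S_i^2 / n_i."""
    N = sum(stratum_sizes)
    return sum(
        (N_i / N) ** 2 * (1 - n_i / N_i) * S_i ** 2 / n_i
        for N_i, S_i, n_i in zip(stratum_sizes, stratum_sds, sample_sizes)
    )


def required_sample_size(stratum_sizes, stratum_sds, fractions, V0):
    """n = sum(N_i^2 S_i^2 / w_i) / (N^2 V0 + sum(N_i S_i^2))."""
    N = sum(stratum_sizes)
    numerator = sum(
        N_i ** 2 * S_i ** 2 / w_i
        for N_i, S_i, w_i in zip(stratum_sizes, stratum_sds, fractions)
    )
    denominator = N ** 2 * V0 + sum(
        N_i * S_i ** 2 for N_i, S_i in zip(stratum_sizes, stratum_sds)
    )
    return numerator / denominator


# Hypothetical inputs: S_i guessed from a pilot study, w_i fixed in advance
stratum_sizes = [50, 120, 70, 60]
stratum_sds = [4.0, 10.0, 6.0, 8.0]
fractions = [0.2, 0.4, 0.2, 0.2]
V0 = 1.0

n = required_sample_size(stratum_sizes, stratum_sds, fractions, V0)
sample_sizes = [n * w_i for w_i in fractions]
achieved = variance_of_stratified_mean(stratum_sizes, stratum_sds, sample_sizes)
```

Here `achieved` reproduces $V_0$ up to floating-point error. In a real design $n$ and the $n_i$ would be rounded up to whole units, which makes the achieved variance slightly smaller than $V_0$.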

❖ Allocating sample size to strata:


Once we specify the total number $n$ of observations to be included in the sample, the next
important task is to decide on the number of observations to be taken from each individual
stratum. This is known as the problem of allocation. The variability of the observations
within the strata is an important consideration in the allocation of sample sizes: the more
homogeneous the strata are made, the greater will be the precision of the stratified sample.
The allocation of the sample to different strata is governed by three
factors, which are as follows:
1) Total number of units in each stratum ($N_i$).
2) Variability of observations within the stratum ($S_i^2$).
3) Cost of taking observations per sampling unit in each stratum.


There are usually five methods of allocation of sample size to different strata in a stratified
sampling procedure. These are:
1) Arbitrary allocation
2) Equal allocation
3) Proportional allocation
4) Neyman allocation or minimum variance allocation or optimum allocation
with fixed sample size n .
5) Cost optimum allocation.

❖ Arbitrary allocation:
In this allocation, the choice of the sample sizes for different strata depends entirely on the
convenience of the sampler. This is thus a purposive allocation. The only restriction is that
the sample sizes in different strata add to $n$, the total sample size. That is,
$\sum_{i=1}^{k} n_i = n$.

❖ Equal allocation:
In this allocation, the same number of elements is drawn from each stratum. In other words,
for the stratum $i$, the sample size is given by:

$$n_i = \frac{n}{k}$$

Here, $n_i$ is the size of the sample to be drawn from the $i$th stratum, $n$ is the overall sample
size and $k$ is the number of strata in the population. Thus, for example, if a population
consists of $k = 4$ strata and a sample of $n = 500$ is to be allocated to these strata, then the
principle of equal allocation asserts that $n_i = \frac{n}{k} = \frac{500}{4} = 125$.

This implies that 125 units will be drawn from each of the four strata. The equal allocation
approach is of considerable practical interest for reasons of administrative convenience or
ease in fieldwork.

❖ Proportional allocation:
Proportional allocation is the most widely used method of allocation. It is achieved when the
sampling fraction is the same for every stratum. Under this design, the size of the sample in a
stratum is proportional to the size of the population in the stratum.

Symbolically, if $n_i$ represents the sample size and $N_i$ the population size in the $i$th stratum,
then the sampling fraction $\frac{n_i}{N_i}$ is specified to be the same for each stratum. In other words,
the number of elements $n_i$ taken from each stratum is given by:

$$n_i = \left( \frac{n}{N} \right) N_i$$

This implies that proportionally allocated stratified sampling gives each sampling unit in the
entire population the same probability of selection. Thus, if $N = 300$ and $n = 30$, then
$\frac{n}{N} = \frac{30}{300} = 0.10$. That is, 10% of the units in each stratum are to be included in the
sample.


❖ Example: (4.14)
A population with 300 elements is divided into 4 strata. The numbers of units in these strata
are 50, 120, 70 and 60, respectively. A stratified sample of 30 is to be selected. Use
proportional allocation technique to allocate sample size to different strata.

Solution:
Here, $N = 300$, $N_1 = 50$, $N_2 = 120$, $N_3 = 70$, $N_4 = 60$, $n = 30$ and $\frac{n}{N} = \frac{30}{300} = 0.10$. Using the
proportional allocation technique, the sample sizes for the different strata are:

$$n_1 = \left( \frac{1}{10} \right) 50 = 5, \quad n_2 = \left( \frac{1}{10} \right) 120 = 12, \quad n_3 = \left( \frac{1}{10} \right) 70 = 7 \quad \text{and} \quad n_4 = \left( \frac{1}{10} \right) 60 = 6$$

Note that all four strata have a uniform sampling fraction $\frac{n_i}{N_i} = \frac{n}{N} = \frac{30}{300} = 0.10$ that equals the
overall sampling fraction.
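
The arithmetic of this example can be reproduced with a short Python sketch. The function name `proportional_allocation` is chosen here for illustration; it evaluates $n_i = (n/N) N_i$ for each stratum.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n units so that n_i = (n / N) * N_i in every stratum."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]


# Stratum sizes and total sample size from Example 4.14
allocation = proportional_allocation([50, 120, 70, 60], 30)
# allocation == [5.0, 12.0, 7.0, 6.0]
```

In this example every $n_i$ is a whole number. In general the results are fractional and must be rounded, after which the rounded sizes should be checked to confirm that they still add up to $n$.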

How does the variance under proportional allocation compare with the variance of a
simple random sample of the same size?

We make this comparison by relating the variance of $\bar{y}_{st}$ under proportional allocation
to the variance of $\bar{y}$ under simple random sampling.
Recall that the variance of the stratified sample mean is:

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n_i}{N_i - 1} \right) \frac{\sigma_i^2}{n_i}$$

Here, $W_i = \frac{N_i}{N}$. But, under proportional allocation,

$$n_i = \left( \frac{n}{N} \right) N_i \implies n_i = n \left( \frac{N_i}{N} \right) \implies n_i = n W_i$$

Putting this value in the above equation and writing $\operatorname{var}( \bar{y}_{st} )_P$ for $\operatorname{var}( \bar{y}_{st} )$ under proportional allocation, we have that

$$\operatorname{var}\left( \bar{y}_{st} \right)_P = \sum_{i=1}^{k} W_i^2 \left( \frac{N_i - n W_i}{N_i - 1} \right) \frac{\sigma_i^2}{n W_i} = \frac{1}{n} \sum_{i=1}^{k} W_i \left( \frac{N_i - n W_i}{N_i - 1} \right) \sigma_i^2$$

$$\approx \frac{1}{n} \sum_{i=1}^{k} W_i \left( \frac{N_i - n W_i}{N_i} \right) \sigma_i^2 = \frac{1}{n} \sum_{i=1}^{k} W_i \left( 1 - \frac{n}{N} \right) \sigma_i^2 \quad ( \text{for large } N_i )$$

$$= \frac{1 - f}{n} \sum_{i=1}^{k} W_i \sigma_i^2$$

Now, recall that the variance of the mean of a simple random sample is given by:

$$\operatorname{var}( \bar{y} ) = \left( \frac{N - n}{N - 1} \right) \frac{\sigma^2}{n} \approx \left( \frac{N - n}{N} \right) \frac{\sigma^2}{n} = \left( \frac{1 - f}{n} \right) \sigma^2 \quad ( \text{for large } N )$$

We see that the above two equations differ only with respect to the last term. We shall later
show that the variance of the mean of the simple random sample is at least as large as that of the
stratified sample under proportional allocation. That is,

$$\operatorname{var}( \bar{y} ) \ge \operatorname{var}\left( \bar{y}_{st} \right)_P$$

The values of $\operatorname{var}( \bar{y} )$ and $\operatorname{var}( \bar{y}_{st} )_P$ may be used to show the gain in precision from the use of
proportionate stratification. The simple random sample usually serves as a useful basis for this
comparison.

A commonly used measure for this comparison is the design effect, which is the ratio of the
variance of the estimator for a complex design to the variance of the estimator of a simple
random sample of the same size. For example, suppose that $\operatorname{var}( \bar{y}_{st} )_P = 0.29$ and $\operatorname{var}( \bar{y} ) = 1.70$.
Then the design effect is given by:

$$\text{Deff} = \frac{ \operatorname{var}( \bar{y}_{st} )_P }{ \operatorname{var}( \bar{y} ) } = \frac{0.29}{1.70} = 0.17$$

The variance of the proportionate stratified sample is thus 83% smaller than that of the same
sized simple random sample. It implies that a simple random sample of size $\frac{3}{0.17} \approx 18$ is needed to
give the same precision as a proportionate stratified sample of size $n = 3$.
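
The design-effect arithmetic above can be reproduced in a few lines of Python. The function name is illustrative; the two variances are the values quoted in the example.

```python
def design_effect(var_design, var_srs):
    """Ratio of the design's variance to the SRS variance at the same n."""
    return var_design / var_srs


deff = design_effect(0.29, 1.70)  # about 0.17
reduction = 1 - deff              # about 0.83, i.e. 83% smaller variance
# SRS size giving the same precision as a stratified sample of size 3
equivalent_srs_size = 3 / deff    # about 17.6, i.e. 18 after rounding up
```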

The above results lead to the conclusion that for large N , the stratified estimator under
proportional allocation has no greater (and in practice smaller) variance than the simple
random sample of the same size.

Put simply, for large $N$ proportional stratified sampling is at least as precise as simple random
sampling. A similar conclusion can be drawn for the variance of the estimator of a population
proportion based on a simple random sample. For these reasons, proportional stratified sampling
is generally preferred to simple random sampling wherever it is feasible.

How does the estimator y st compare with y when the sampling is done under
proportional allocation?

We show below that when the stratified sample is proportional, the estimator $\bar{y}_{st}$ is equal to
the ordinary sample mean $\bar{y}$ of all the observations. Further, $p_{st}$ is equal to the ordinary
proportion of all the sample observations belonging to the category $C$. Let $y_{ij}$ be the $j$th
observation in the sample from the $i$th stratum. Since the sampling is made proportional,

$$n_i = \left( \frac{n}{N} \right) N_i \implies n_i = n \left( \frac{N_i}{N} \right) \implies n_i = n W_i$$

Thus,

$$\left( \bar{y}_{st} \right)_P = \sum_{i=1}^{k} W_i \bar{y}_i = \sum_{i=1}^{k} W_i \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{k} W_i \frac{1}{n W_i} \sum_{j=1}^{n_i} y_{ij} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} = \bar{y}$$

For proportion, we have that

$$\left( p_{st} \right)_P = \sum_{i=1}^{k} W_i p_i = \sum_{i=1}^{k} \left( \frac{N_i}{N} \right) p_i = \sum_{i=1}^{k} \left( \frac{n_i}{n} \right) p_i = p$$

The result is important in the sense that, instead of calculating weighted averages or
proportions, it may be more convenient to pool all the sample observations and compute the
average or proportions instead.

This approach is more popular than any other stratified sampling procedure. Proportionate
sampling will generally have higher statistical efficiency than will a simple random sample.
This method is also much easier to carry out than other stratified methods.

A third advantage is that such a sampling procedure provides a self-weighting sample; the
population mean can be estimated simply by calculating the mean of all sample cases. On the
other hand, proportionate samples will often gain little precision in statistical efficiency if the
strata means and variances are similar for the major variables under study.

Note that in proportionate sampling, each stratum is properly represented so that the sample
drawn from it is proportionate to the stratum share of the total population.

❖ Neyman allocation or minimum variance allocation:


In this allocation procedure, it is assumed that the sampling cost per unit is the same in every
stratum and that the total sample size is fixed. So, this is also known as optimum allocation with
fixed sample size $n$. In this case, the sampling fraction in each stratum is made proportional
to the standard deviation of that stratum: the more homogeneous the stratum, the smaller its
standard deviation and the smaller the proportion of it included in the sample. That is, for
fixed $\sum_{i=1}^{k} n_i = n$, the minimum $\operatorname{var}( \bar{y}_{st} )$ occurs if the $k$ values of $f_i = \frac{n_i}{N_i}$ are chosen so that

$$f_i \propto S_i \implies n_i \propto N_i S_i \implies n_i = A N_i S_i$$

$$\implies \sum_{i=1}^{k} n_i = A \sum_{i=1}^{k} N_i S_i \implies A = \frac{n}{ \sum_{i=1}^{k} N_i S_i }$$

$$\implies n_i = n \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i } \implies f_i = \frac{n_i}{N_i} = n \frac{1}{N_i} \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i } = n \frac{S_i}{ \sum_{i=1}^{k} N_i S_i }$$

In other words, for a given size of the sample, the above allocation yields the estimate of the
mean with the maximum precision. In applying this formula, the problem is to obtain an
estimate of $S_i$, which is usually unknown. In most cases, rough estimates based on a previous
investigation or a small-scale study may be employed.

If the objective is to minimize the variance of the stratified estimator $p_{st}$ of the proportion of
elements in the population belonging to a given category subject to the same total sample
constraint, the optimal solution can be shown to be:

$$n_i = n \frac{ N_i \sqrt{ P_i ( 1 - P_i ) } }{ \sum_{i=1}^{k} N_i \sqrt{ P_i ( 1 - P_i ) } }$$
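
A minimal Python sketch of the Neyman allocation formula $n_i = n N_i S_i / \sum N_i S_i$ follows. The function name and the stratum standard deviations are assumptions made for illustration; in practice the $S_i$ would be rough estimates from earlier work.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n so that n_i is proportional to N_i * S_i."""
    products = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical standard deviations for four strata
stratum_sizes = [50, 120, 70, 60]
stratum_sds = [4.0, 10.0, 6.0, 8.0]
allocation = neyman_allocation(stratum_sizes, stratum_sds, 30)
```

Compared with proportional allocation of the same 30 units (5, 12, 7, 6), this moves sample toward the large, highly variable second stratum. The formula can also return $n_i > N_i$ for a small but very variable stratum, in which case that stratum is usually sampled completely and the remainder reallocated.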
❖ Theorem: (4.6):
The variance of the stratified sample mean, $\operatorname{var}( \bar{y}_{st} )$, is a minimum for a fixed total sample
size $n$ if $n_i \propto N_i S_i$.

Proof:
Our problem here is to see how a given total sample size $n$ should be allocated among
different strata so that the stratified estimator $\bar{y}_{st}$ of the population mean will have the
smallest possible variance. Formally, the problem is to determine $n_1, n_2, \ldots, n_k$ so as to
minimize:

$$\operatorname{var}( \bar{y}_{st} ) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \left( \frac{S_i^2}{n_i} \right)$$

subject to the constraint that the total sample size equals $n = \sum_{i=1}^{k} n_i$. This is equivalent to
minimizing the function:

$$\phi = \operatorname{var}( \bar{y}_{st} ) + \lambda \left( \sum_{i=1}^{k} n_i - n \right) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \left( \frac{S_i^2}{n_i} \right) + \lambda \left( \sum_{i=1}^{k} n_i - n \right)$$

for $n_i$, $\lambda$ being an unknown Lagrange multiplier. For a minimum of the function, we should have

$$\frac{\partial \phi}{\partial n_i} = 0 \quad \text{and} \quad \frac{\partial^2 \phi}{\partial n_i^2} > 0$$

Now differentiating the above function with respect to $n_i$ and equating the derivative to zero,
we have that

$$\frac{\partial \phi}{\partial n_i} = - \frac{W_i^2 S_i^2}{n_i^2} - 0 + \lambda = 0 \implies n_i = \frac{W_i S_i}{\sqrt{\lambda}} \qquad (1)$$

$$\frac{\partial^2 \phi}{\partial n_i^2} = \frac{2 W_i^2 S_i^2}{n_i^3} > 0$$

Now from the above we have

$$n_i = \frac{W_i S_i}{\sqrt{\lambda}} \implies \sum_{i=1}^{k} n_i = \frac{ \sum_{i=1}^{k} W_i S_i }{\sqrt{\lambda}} \implies n = \frac{ \sum_{i=1}^{k} W_i S_i }{\sqrt{\lambda}} \qquad (2)$$

Now, dividing equation (1) by (2), we have that

$$\frac{n_i}{n} = \frac{W_i S_i}{ \sum_{i=1}^{k} W_i S_i } \implies n_i = n \frac{W_i S_i}{ \sum_{i=1}^{k} W_i S_i } = n \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i } \implies n_i \propto N_i S_i$$

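Thus, the theorem is proved. The above result implies that the allocation made in accordance
with it yields the estimate of the mean with the greatest precision. A formula for the minimum
variance with fixed $n$ is obtained by substituting the value of $n_i$ into the general formula for
$\operatorname{var}( \bar{y}_{st} )$, which is given by:

$$\operatorname{var}\left( \bar{y}_{st} \right) = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}$$

$$\implies \operatorname{var}\left( \bar{y}_{st} \right)_{\text{Neyman}} = \operatorname{var}\left( \bar{y}_{st} \right)_{\min} = \sum_{i=1}^{k} \frac{ W_i^2 S_i^2 }{ n \dfrac{W_i S_i}{ \sum_{j=1}^{k} W_j S_j } } - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}$$

$$= \frac{ \left( \sum_{i=1}^{k} W_i S_i \right)^2 }{n} - \frac{ \sum_{i=1}^{k} W_i S_i^2 }{N} = \frac{ \left( \sum_{i=1}^{k} N_i S_i \right)^2 }{ n N^2 } - \frac{ \sum_{i=1}^{k} N_i S_i^2 }{ N^2 }$$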


❖ Theorem: (4.7):
A proportional stratified sample is optimal (that is, the best possible stratified sample) if all
the $S_i$ are the same.
Proof:
We have proved that for optimum $n_i$,

$$n_i = n \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i }$$

If the strata do not differ in variance, that is, $S_1 = S_2 = \cdots = S_k = S$, then the
above expression assumes the form:

$$n_i = n \frac{N_i S}{ S \sum_{i=1}^{k} N_i } = \left( \frac{n}{N} \right) N_i$$

The above is the formula for the sample size when sampling is made under proportional
allocation. Thus, the theorem is proved.

❖ Theorem: (4.8):
The sample size required to estimate the mean under Neyman allocation with a given
variance $V_0$ is given by:

$$n = \frac{ \left( \sum_{i=1}^{k} W_i S_i \right)^2 }{ V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2 }$$

And that under proportional allocation, it is given by:

$$n = \frac{ \sum_{i=1}^{k} W_i S_i^2 }{ V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2 } \qquad \left( \text{Here, } W_i = \frac{N_i}{N} \right)$$
i =1
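Proof:
In Neyman allocation with fixed $n$,

$$n_i = n \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i } \implies \frac{n_i}{n} = w_i = \frac{N_i S_i}{ \sum_{i=1}^{k} N_i S_i }$$

Now, we know that

$$n = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{w_i} }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{N_i S_i} \sum_{j=1}^{k} N_j S_j }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ \sum_{i=1}^{k} N_i S_i \sum_{j=1}^{k} N_j S_j }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 }$$

$$\implies n = \frac{ \left( \sum_{i=1}^{k} N_i S_i \right)^2 }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ \left( \sum_{i=1}^{k} W_i S_i \right)^2 }{ V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2 }$$

To prove the second part, we note that in proportional allocation

$$n_i = n \left( \frac{N_i}{N} \right) \implies \frac{n_i}{n} = w_i = \frac{N_i}{N}$$

So, we have from the above that

$$n = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{w_i} }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ \sum_{i=1}^{k} \frac{N_i^2 S_i^2}{N_i / N} }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ N \sum_{i=1}^{k} N_i S_i^2 }{ N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2 } = \frac{ \sum_{i=1}^{k} W_i S_i^2 }{ V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2 }$$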
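
The two sample-size formulas of Theorem 4.8 can be compared numerically with the following Python sketch. The function names and the stratum standard deviations are hypothetical; only the formulas themselves come from the theorem.

```python
def required_n_neyman(stratum_sizes, stratum_sds, V0):
    """n = (sum W_i S_i)^2 / (V0 + (1/N) sum W_i S_i^2)."""
    N = sum(stratum_sizes)
    W = [N_i / N for N_i in stratum_sizes]
    numerator = sum(W_i * S_i for W_i, S_i in zip(W, stratum_sds)) ** 2
    denominator = V0 + sum(W_i * S_i ** 2 for W_i, S_i in zip(W, stratum_sds)) / N
    return numerator / denominator


def required_n_proportional(stratum_sizes, stratum_sds, V0):
    """n = sum W_i S_i^2 / (V0 + (1/N) sum W_i S_i^2)."""
    N = sum(stratum_sizes)
    W = [N_i / N for N_i in stratum_sizes]
    weighted_var = sum(W_i * S_i ** 2 for W_i, S_i in zip(W, stratum_sds))
    return weighted_var / (V0 + weighted_var / N)


# Hypothetical inputs
stratum_sizes = [50, 120, 70, 60]
stratum_sds = [4.0, 10.0, 6.0, 8.0]
n_neyman = required_n_neyman(stratum_sizes, stratum_sds, V0=1.0)
n_proportional = required_n_proportional(stratum_sizes, stratum_sds, V0=1.0)
```

The denominators are identical, and $( \sum W_i S_i )^2 \le \sum W_i S_i^2$ because $\sum W_i = 1$, so Neyman allocation never needs more units than proportional allocation for the same $V_0$. The two coincide when all the $S_i$ are equal, in line with Theorem 4.7.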

❖ Cost-optimum allocation:
In this allocation procedure, it is assumed that the sampling cost per unit differs from stratum
to stratum and that the total sampling cost (cost of conducting the survey) is fixed. So, this is also
known as optimum allocation with fixed sampling cost.

In this case, the sampling fraction in each stratum is made proportional to the standard
deviation of that stratum (the more homogeneous the stratum, the smaller its standard
deviation and the smaller the proportion of it included in the sample) and inversely
proportional to the square root of the cost $c_i$, where $c_i$ is the sampling cost per unit in the $i$th stratum.
That is, for fixed sampling cost $C = c_0 + \sum_{i=1}^{k} n_i c_i$, where $c_0$ is the fixed overhead cost of
taking a sample, $C$ is the given budget and $c_i$ represents the average variable cost per unit in
the $i$th stratum, the minimum $\operatorname{var}( \bar{y}_{st} )$ occurs if the $k$ values of $f_i$ are chosen so that

$$f_i \propto \frac{S_i}{\sqrt{c_i}} \implies n_i \propto \frac{N_i S_i}{\sqrt{c_i}} \implies n_i = A \frac{N_i S_i}{\sqrt{c_i}}$$

$$\implies \sum_{i=1}^{k} n_i = A \sum_{i=1}^{k} \frac{N_i S_i}{\sqrt{c_i}} \implies A = \frac{n}{ \sum_{i=1}^{k} \frac{N_i S_i}{\sqrt{c_i}} }$$

$$\implies n_i = n \frac{ N_i S_i / \sqrt{c_i} }{ \sum_{i=1}^{k} N_i S_i / \sqrt{c_i} } \implies f_i = \frac{n_i}{N_i} = n \frac{1}{N_i} \frac{ N_i S_i / \sqrt{c_i} }{ \sum_{i=1}^{k} N_i S_i / \sqrt{c_i} } = n \frac{ S_i / \sqrt{c_i} }{ \sum_{i=1}^{k} N_i S_i / \sqrt{c_i} }$$

In other words, for a given total cost, the above allocation yields the estimate of the
mean with the maximum precision. In applying this formula, the problem is to obtain an
estimate of $S_i$, which is usually unknown. In most cases, rough estimates based on a previous
investigation or a small-scale study may be employed.

If the objective is to minimize the variance of the stratified estimator $p_{st}$ of the proportion of
elements in the population belonging to a given category subject to the same cost
constraint, the optimal solution can be shown to be:

$$n_i = n \frac{ N_i \sqrt{ P_i ( 1 - P_i ) / c_i } }{ \sum_{i=1}^{k} N_i \sqrt{ P_i ( 1 - P_i ) / c_i } }$$
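
The cost-optimum rule $n_i \propto N_i S_i / \sqrt{c_i}$ can be sketched in Python as follows. The function name, standard deviations and unit costs are assumptions for illustration only.

```python
from math import sqrt


def cost_optimum_allocation(stratum_sizes, stratum_sds, unit_costs, n):
    """Allocate n so that n_i is proportional to N_i * S_i / sqrt(c_i)."""
    terms = [
        N_i * S_i / sqrt(c_i)
        for N_i, S_i, c_i in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(terms)
    return [n * t / total for t in terms]


# Hypothetical inputs: the second stratum is four times as costly per unit
stratum_sizes = [50, 120, 70, 60]
stratum_sds = [4.0, 10.0, 6.0, 8.0]
unit_costs = [1.0, 4.0, 1.0, 1.0]
allocation = cost_optimum_allocation(stratum_sizes, stratum_sds, unit_costs, 30)
```

Because the cost enters through its square root, quadrupling the unit cost of a stratum halves its term $N_i S_i / \sqrt{c_i}$ rather than quartering it. With equal unit costs the rule reduces to Neyman allocation.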

❖ Theorem: (4.9):
The variance of stratified random sample var( y st ) is the minimum for a fixed sampling cost,
k
N i Si
C = c0 +  n i c i if ni 
ci
.
i =1

Proof:
Here, our problem is to find n i ( i = 1, 2 , . . . , k ) so as to minimize var ( yst ) subject to the
constraints that the total sampling cost does not exceed a given budget C . That is,
k
 n i c i = C − c 0 . This is equivalent to minimizing the function:
i =1

 k  k  n   Si2   k 
 = var ( yst ) +    n i c i − C + c 0  =  Wi 2 1 − i    +   nici − C + c0 
   ni 
 i =1  i =1 N i   i =1 

for ni ,  being an unknown Lagrange’s multiplier. Our calculus knowledge dictates that for
a minimum of the function, we should have
 2 
=0 and 0
 ni  n i2
Now differentiating the above function with respect to ni and equating the derivative to zero,
we have that
   Wi 2 Si2  Wi Si
= − −0  +  ci = 0 = ni = (1)
 ni  n i2   ci
 
 2  2 Wi 2 Si2
= = 0
 n i2 n i3

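Now, from the above we have

\[
n_i = \frac{W_i S_i}{\sqrt{\lambda} \sqrt{c_i}}
\;\Rightarrow\; \sum_{i=1}^{k} n_i = \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{\lambda} \sqrt{c_i}}
\;\Rightarrow\; n = \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{\lambda} \sqrt{c_i}} \qquad (2)
\]

Now, dividing equation (1) by (2), we have

\[
\frac{n_i}{n} = \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
\;\Rightarrow\; n_i = n \, \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
= n \, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}
\;\Rightarrow\; n_i \propto \frac{N_i S_i}{\sqrt{c_i}}
\]

The corresponding sampling fraction in stratum $i$ is

\[
f_i = \frac{n_i}{N_i} = n \, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}} \cdot \frac{1}{N_i}
= n \, \frac{S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}
\]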
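
The minimizing property in Theorem 4.9 can be checked numerically. The sketch below, in Python, scales the allocation $n_i \propto N_i S_i / \sqrt{c_i}$ to exhaust a budget and compares its variance with a few other allocations of the same cost. All function names and the three-stratum figures are hypothetical, and the $n_i$ are treated as continuous quantities rather than whole units.

```python
from math import sqrt

def var_stratified_mean(n_alloc, N, S):
    """var(ybar_st) = sum of W_i^2 * (1 - n_i/N_i) * S_i^2 / n_i."""
    N_total = sum(N)
    return sum(
        (Ni / N_total) ** 2 * (1 - ni / Ni) * Si ** 2 / ni
        for ni, Ni, Si in zip(n_alloc, N, S)
    )

def allocation_for_budget(C, c0, N, S, c):
    """n_i proportional to N_i*S_i/sqrt(c_i), scaled so that sum(c_i*n_i) = C - c0."""
    weights = [Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N, S, c)]
    scale = (C - c0) / sum(ci * w for ci, w in zip(c, weights))
    return [scale * w for w in weights]

def cost(n_alloc, c0, c):
    """Total cost C = c0 + sum(c_i * n_i)."""
    return c0 + sum(ci * ni for ci, ni in zip(c, n_alloc))

# Hypothetical three-stratum population (illustrative values only).
N = [400, 300, 300]      # stratum sizes N_i
S = [10.0, 20.0, 30.0]   # stratum standard deviations S_i
c = [1.0, 4.0, 9.0]      # cost per unit c_i
C, c0 = 500.0, 70.0      # total budget and overhead cost

best = allocation_for_budget(C, c0, N, S, c)   # close to [40, 30, 30]

# Other allocations that spend exactly the same budget.
alternatives = [[44.0, 29.0, 30.0], [36.0, 31.0, 30.0], [40.0, 39.0, 26.0]]

v_best = var_stratified_mean(best, N, S)
v_alt = [var_stratified_mean(a, N, S) for a in alternatives]
print(v_best, v_alt)   # every alternative has a larger variance than v_best
```

Because the variance is a convex function of the $n_i$ and the cost constraint is linear, the stationary point found with the Lagrange multiplier is the minimum over all allocations of equal cost, which is what the comparison reflects.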

❖ Theorem: (4.10):
Show that the total sample size for a stratified sample optimally allocated in terms of cost,
subject to a fixed variance $V_0$, is given by:

\[
n = \frac{\left( \sum_{i=1}^{k} W_i S_i \sqrt{c_i} \right) \left( \sum_{i=1}^{k} \dfrac{W_i S_i}{\sqrt{c_i}} \right)}{V_0 + \dfrac{1}{N} \sum_{i=1}^{k} W_i S_i^2}
\]

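Proof:
Here, we have assumed that $\mathrm{var}(\bar{y}_{st})$ is fixed at a specified value $V_0$. Thus, we have

\[
V_0 = \mathrm{var}(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \left( 1 - \frac{n_i}{N_i} \right) \frac{S_i^2}{n_i}
= \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i S_i^2}{N}
\;\Rightarrow\; \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} = V_0 + \sum_{i=1}^{k} \frac{W_i S_i^2}{N}
\]

Substituting the cost-optimal allocation
$n_i = n \, \dfrac{W_i S_i / \sqrt{c_i}}{\sum_{j=1}^{k} W_j S_j / \sqrt{c_j}}$ into the left-hand side gives

\[
\sum_{i=1}^{k} W_i^2 S_i^2 \, \frac{\sum_{j=1}^{k} W_j S_j / \sqrt{c_j}}{n \, W_i S_i / \sqrt{c_i}} = V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2
\]

\[
\Rightarrow\; \frac{1}{n} \left( \sum_{i=1}^{k} W_i S_i \sqrt{c_i} \right) \left( \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{c_i}} \right) = V_0 + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2
\]

\[
\Rightarrow\; n = \frac{\left( \sum_{i=1}^{k} W_i S_i \sqrt{c_i} \right) \left( \sum_{i=1}^{k} \dfrac{W_i S_i}{\sqrt{c_i}} \right)}{V_0 + \dfrac{1}{N} \sum_{i=1}^{k} W_i S_i^2}
= \frac{\left( \sum_{i=1}^{k} N_i S_i \sqrt{c_i} \right) \left( \sum_{i=1}^{k} \dfrac{N_i S_i}{\sqrt{c_i}} \right)}{N^2 V_0 + \sum_{i=1}^{k} N_i S_i^2}
\]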

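A formula for computing $n$ for a proportion is obtained by replacing $S_i$ with $\sqrt{P_i (1 - P_i)}$:

\[
n = \frac{\left( \sum_{i=1}^{k} N_i \sqrt{P_i (1 - P_i)} \sqrt{c_i} \right) \left( \sum_{i=1}^{k} N_i \sqrt{\dfrac{P_i (1 - P_i)}{c_i}} \right)}{N^2 V_0 + \sum_{i=1}^{k} N_i P_i (1 - P_i)}
\]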
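
The sample size formula of Theorem 4.10 can be sketched in Python as follows. The function names and the three-stratum inputs are hypothetical. As a sanity check, the helper `achieved_variance` allocates the computed $n$ with $n_i \propto N_i S_i / \sqrt{c_i}$ and recomputes $\mathrm{var}(\bar{y}_{st})$, which should reproduce the target $V_0$.

```python
from math import sqrt

def sample_size_for_variance(V0, N, S, c):
    """Total n for a fixed variance V0 (Theorem 4.10, written in terms of N_i)."""
    N_total = sum(N)
    a = sum(Ni * Si * sqrt(ci) for Ni, Si, ci in zip(N, S, c))
    b = sum(Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N, S, c))
    return a * b / (N_total ** 2 * V0 + sum(Ni * Si ** 2 for Ni, Si in zip(N, S)))

def achieved_variance(n, N, S, c):
    """var(ybar_st) when n is split with n_i proportional to N_i*S_i/sqrt(c_i)."""
    N_total = sum(N)
    weights = [Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N, S, c)]
    total = sum(weights)
    n_alloc = [n * w / total for w in weights]
    return sum(
        (Ni / N_total) ** 2 * (1 - ni / Ni) * Si ** 2 / ni
        for ni, Ni, Si in zip(n_alloc, N, S)
    )

# Hypothetical three-stratum population (illustrative values only).
N = [400, 300, 300]      # stratum sizes N_i
S = [10.0, 20.0, 30.0]   # stratum standard deviations S_i
c = [1.0, 4.0, 9.0]      # cost per unit c_i
V0 = 3.87                # target variance of the stratified mean

n = sample_size_for_variance(V0, N, S, c)   # close to 100 for these inputs
check = achieved_variance(n, N, S, c)       # should reproduce V0
print(round(n, 6), round(check, 6))
```

For a proportion, the same function can be called with `S` replaced by the values $\sqrt{P_i (1 - P_i)}$.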
❖ Theorem: (4.11):
For the fixed cost function $C = c_0 + \sum_{i=1}^{k} c_i n_i$, the minimum sample size required for estimating
the population mean is given by:

\[
n = \frac{(C - c_0) \sum_{i=1}^{k} \dfrac{N_i S_i}{\sqrt{c_i}}}{\sum_{i=1}^{k} N_i S_i \sqrt{c_i}}
\]

Moreover, the optimum allocation for fixed cost reduces to the optimum allocation for fixed sample
size if the cost per unit is the same in all strata. If, further, $S_i$ is the same in all strata,
this allocation leads to proportional allocation.

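Proof:
We know that when the total sample size has been decided, the cost-optimum solution for $n_i$
is given by:

\[
n_i = n \, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}
\]

With $n_i$ defined as above, the cost function $C = c_0 + \sum_{i=1}^{k} c_i n_i$ takes the form

\[
C = c_0 + \sum_{i=1}^{k} c_i \, n \, \frac{N_i S_i / \sqrt{c_i}}{\sum_{j=1}^{k} N_j S_j / \sqrt{c_j}}
\;\Rightarrow\; C - c_0 = \frac{n \sum_{i=1}^{k} N_i S_i \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}
\;\Rightarrow\; n = \frac{(C - c_0) \sum_{i=1}^{k} \dfrac{N_i S_i}{\sqrt{c_i}}}{\sum_{i=1}^{k} N_i S_i \sqrt{c_i}}
\]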

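Now, if the cost per unit is the same in all strata, that is, $c_i = c$, then we have

\[
n_i = n \, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}
= n \, \frac{N_i S_i / \sqrt{c}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c}}
= n \, \frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}
\]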

The above is the sample size under optimum allocation when the total sample size is fixed
(Neyman allocation). If, further, $S_i$ is the same in all strata, that is, $S_i = S$, then we
have

\[
n_i = n \, \frac{N_i S}{\sum_{i=1}^{k} N_i S} = n \, \frac{N_i}{N} = \frac{n}{N} \, N_i
\]

The above is the sample size in each stratum under proportional allocation.
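
The two reductions in Theorem 4.11 can be illustrated numerically. The Python sketch below uses hypothetical function names and a hypothetical three-stratum population: it first computes the total sample size that a budget allows, then shows that equal unit costs give Neyman allocation and that equal costs together with equal $S_i$ give proportional allocation.

```python
from math import sqrt

def allocation(n, N, S, c):
    """Cost-optimal split of n: n_i proportional to N_i*S_i/sqrt(c_i)."""
    weights = [Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N, S, c)]
    total = sum(weights)
    return [n * w / total for w in weights]

def sample_size_for_cost(C, c0, N, S, c):
    """Total n affordable under C = c0 + sum(c_i*n_i) with the cost-optimal split."""
    num = (C - c0) * sum(Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N, S, c))
    den = sum(Ni * Si * sqrt(ci) for Ni, Si, ci in zip(N, S, c))
    return num / den

# Hypothetical three-stratum population (illustrative values only).
N = [400, 300, 300]      # stratum sizes N_i
S = [10.0, 20.0, 30.0]   # stratum standard deviations S_i

n = sample_size_for_cost(500.0, 70.0, N, S, [1.0, 4.0, 9.0])   # close to 100

# Equal unit costs: the split coincides with Neyman allocation.
equal_cost = allocation(95, N, S, [2.0, 2.0, 2.0])
neyman = [95 * Ni * Si / sum(a * b for a, b in zip(N, S)) for Ni, Si in zip(N, S)]

# Equal unit costs and equal S_i: the split coincides with proportional allocation.
equal_cost_and_sd = allocation(100, N, [5.0, 5.0, 5.0], [2.0, 2.0, 2.0])
proportional = [100 * Ni / sum(N) for Ni in N]

print([round(x, 6) for x in equal_cost], [round(x, 6) for x in equal_cost_and_sd])
```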

❖ Precision of stratified sampling:


To compare the precision of stratified sampling with that of simple random sampling, we
study the variance of the estimated mean under each design. Two variants of stratified
sampling, proportional allocation and Neyman allocation, are considered in this comparison.

Let the variances of the estimated means under simple random sampling, proportional
allocation and Neyman allocation be denoted by $V_R$, $V_P$ and $V_N$, respectively. In all cases,
the finite population correction is assumed to be negligible.

Now, if the terms in $n_i / N_i$ and $n / N$ are ignored, then it can be proved that
$V_R \ge V_P \ge V_N$, which is verified as follows:


1− f  2 S
2
 n 
VR =   S =  if is ignored 
 n  n  N 

k  n  S2 k W 2S2  ni 
VP =  Wi 2 1 − i  i =  i i  if is ignored 
i =1  Ni  ni i =1 ni  Ni 
 under proportional allocation 
kWi Si2  
=  n ni n 
n  n = N = =
i =1
Ni N 
i i
 N
k  n  S2 k W 2S2  ni 
VN =  Wi 2 1 − i  i =  i i  if is ignored 
i =1  Ni  ni i =1 ni  Ni 
2
 k   under Neyman allocation 
2   Wi Si   
k Wi Si2  i =1   WS 
= = ni = n k i i
Wi Si n  
i =1 n k   Wi Si 
 
 Wi Si i =1

i =1

So, we have from the above that

\[
V_R = \frac{S^2}{n}, \qquad
V_P = \sum_{i=1}^{k} \frac{W_i S_i^2}{n}, \qquad
V_N = \frac{\left( \sum_{i=1}^{k} W_i S_i \right)^2}{n}
\]

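Now, by definition,

\[
S^2 = \frac{1}{N - 1} \sum_{i=1}^{k} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y})^2
\;\Rightarrow\; (N - 1) S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y})^2
\]

Writing $y_{ij} - \bar{Y} = (y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})$ and expanding the square,

\[
(N - 1) S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2
+ \sum_{i=1}^{k} \sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2
+ 2 \sum_{i=1}^{k} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)(\bar{Y}_i - \bar{Y})
\]

\[
= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
+ 2 \sum_{i=1}^{k} (\bar{Y}_i - \bar{Y}) \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)
\]

The last term vanishes because $\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i) = 0$ in every stratum, so

\[
(N - 1) S^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
\]

\[
\Rightarrow\; N \left( 1 - \frac{1}{N} \right) S^2 = \sum_{i=1}^{k} N_i \left( 1 - \frac{1}{N_i} \right) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
\]

Since $1/N_i$ and $1/N$ are negligible,

\[
N S^2 = \sum_{i=1}^{k} N_i S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
\]

Dividing both sides by $N n$,

\[
\frac{S^2}{n} = \frac{\sum_{i=1}^{k} W_i S_i^2}{n} + \frac{\sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2}{N n}
\]

\[
\Rightarrow\; V_R = V_P + \text{a non-negative quantity} \;\Rightarrow\; V_R \ge V_P
\]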
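
The exact decomposition $(N - 1) S^2 = \sum (N_i - 1) S_i^2 + \sum N_i (\bar{Y}_i - \bar{Y})^2$ used in this argument can be checked on a toy population. The Python sketch below uses a made-up population of nine values in three strata; the data and variable names are purely illustrative.

```python
from statistics import mean, variance

# Hypothetical population split into three strata (illustrative values only).
strata = [
    [2.0, 4.0, 6.0],
    [10.0, 12.0, 14.0, 16.0],
    [30.0, 34.0],
]

population = [y for stratum in strata for y in stratum]
N = len(population)
grand_mean = mean(population)

# Left-hand side: (N - 1) * S^2, where variance() uses the divisor N - 1.
total_ss = (N - 1) * variance(population)

# Right-hand side: within-stratum part plus between-stratum part.
within_ss = sum((len(s) - 1) * variance(s) for s in strata)
between_ss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in strata)

print(total_ss, within_ss + between_ss)   # the two values agree up to rounding
```

In this example almost all of the variation lies between strata, which is the situation in which stratification gains the most over simple random sampling.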

Now, the difference between $V_P$ and $V_N$ is given by:

\[
V_P - V_N = \sum_{i=1}^{k} \frac{W_i S_i^2}{n} - \frac{\left( \sum_{i=1}^{k} W_i S_i \right)^2}{n}
= \frac{1}{N n} \left[ \sum_{i=1}^{k} N_i S_i^2 - \frac{\left( \sum_{i=1}^{k} N_i S_i \right)^2}{N} \right]
= \frac{1}{N n} \sum_{i=1}^{k} N_i (S_i - \bar{S})^2
\]

Here, $\bar{S} = \dfrac{1}{N} \sum_{i=1}^{k} N_i S_i$. Hence

\[
V_P - V_N = \text{a non-negative quantity} \;\Rightarrow\; V_P \ge V_N \;\Rightarrow\; V_R \ge V_P \ge V_N
\]
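
The ordering $V_R \ge V_P \ge V_N$ can be illustrated with a short Python sketch. It assumes, as in the derivation, that the finite population correction and the terms $1/N$ and $1/N_i$ are negligible, so that $S^2 \approx \sum W_i S_i^2 + \sum W_i (\bar{Y}_i - \bar{Y})^2$. The function name and the stratum weights, standard deviations and means are hypothetical.

```python
def compare_variances(n, W, S, means):
    """Return (V_R, V_P, V_N), ignoring the finite population correction."""
    grand_mean = sum(Wi * mi for Wi, mi in zip(W, means))
    within = sum(Wi * Si ** 2 for Wi, Si in zip(W, S))
    between = sum(Wi * (mi - grand_mean) ** 2 for Wi, mi in zip(W, means))
    V_R = (within + between) / n   # S^2 / n, with S^2 taken as within + between
    V_P = within / n               # proportional allocation
    V_N = sum(Wi * Si for Wi, Si in zip(W, S)) ** 2 / n   # Neyman allocation
    return V_R, V_P, V_N

# Hypothetical three-stratum population (illustrative values only).
W = [0.4, 0.3, 0.3]          # stratum weights W_i = N_i / N
S = [10.0, 20.0, 30.0]       # stratum standard deviations S_i
means = [50.0, 60.0, 80.0]   # stratum means

V_R, V_P, V_N = compare_variances(100, W, S, means)
print(round(V_R, 4), round(V_P, 4), round(V_N, 4))   # approximately 5.86 4.3 3.61
```

The gap between $V_R$ and $V_P$ comes from the differences among stratum means, while the gap between $V_P$ and $V_N$ comes from the differences among stratum standard deviations.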

❖ Advantages of stratified sampling:


1) Stratification tends to decrease the variances of the sample estimates. This
results in a smaller bound on the error of estimation. This is particularly true if
measurements within strata are homogeneous.
2) The cost per observation in the survey may be reduced by stratification.
3) When separate estimates for population parameters for each sub-population
within an overall population are required, stratification is rewarding.
4) Stratification makes it possible to use different sampling designs in different
strata.
5) Stratification is particularly effective when there are extreme values in
the population, which can be segregated into separate strata, thereby reducing
the variability within strata.
6) Stratified sampling is most effective in handling heterogeneous populations,
such as data on wages of industrial workers (which vary from industry to
industry), amount of rainfall (which differs among various geographical
areas) and the like.
7) Stratification provides a chance to improve sampling design considerably if
the strata could be formed on the basis of natural characteristics.
8) In stratified sampling, confidence intervals may be constructed individually
for the parameter of interest in each stratum. This is an added advantage over
other methods of sampling.
9) The estimates in various strata may be made with whatever precision is
desired simply by adjusting the sample size selected from each stratum.

❖ Disadvantage of stratified sampling:


The major disadvantage of stratified sampling is that it may take more time to select the
sample than would be the case for simple random sampling. More time is involved because
complete frames are necessary within each of the strata and each stratum must be sampled.
