
Introduction to Sampling Theory

Lecture 34
Cluster Sampling

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Slides can be downloaded from http://home.iitk.ac.in/~shalab/sp

Estimation of a Proportion in case of Equal Cluster:
Now, we consider the problem of estimating the proportion of units in the population having a specified attribute on the basis of a sample of clusters. Let this proportion be P.

Suppose that a sample of n clusters is drawn from N clusters by SRSWOR.

Define

$P$ : population proportion, $Q = 1 - P$,
$P_i$ : proportion of elements in the $i$th cluster belonging to the specified category, $Q_i = 1 - P_i$, $i = 1, 2, \ldots, N$,
$y_{ij} = 1$ if the $j$th unit in the $i$th cluster belongs to the specified category (i.e. possesses the given attribute), and $y_{ij} = 0$ otherwise.

Then we find that

$$\bar{y}_i = P_i, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} P_i = P.$$

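As a quick numerical check of these two identities, the following sketch builds a small made-up population of N = 4 clusters with M = 5 units each (the 0/1 values and variable names are illustrative assumptions, not data from the lecture). It confirms that each cluster mean of the indicators is the cluster proportion, and that the mean of the cluster proportions equals the overall proportion.

```python
from fractions import Fraction

# Hypothetical population: N = 4 clusters, M = 5 units each.
# y[i][j] = 1 if unit j of cluster i has the attribute, else 0.
y = [
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
N = len(y)
M = len(y[0])

# Cluster proportions P_i are the cluster means of the indicators.
P_i = [Fraction(sum(row), M) for row in y]

# Population proportion computed directly from all NM units.
P = Fraction(sum(sum(row) for row in y), N * M)

# Mean of the cluster proportions; equals P because all clusters have size M.
mean_of_P_i = sum(P_i) / N

print("P_i =", P_i)
print("P =", P, " mean of P_i =", mean_of_P_i)
```

Exact fractions are used so that the equality can be checked without floating-point tolerance.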
Sampling for Proportions and Percentages (SRS)
Suppose a sample of size n is drawn from a population of size N by
simple random sampling.
Let
a : number of units in the sample which fall into class C, and
(n - a) : number of units in the sample which fall into class C*.

The sample proportion of units in class C is $p = \dfrac{a}{n}$.

The sample proportion of units in class C* is $q = \dfrac{n-a}{n} = 1 - \dfrac{a}{n} = 1 - p$.

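Associate an indicator variable $Y$ with the characteristic under study and, for $i = 1, 2, \ldots, N$, define

$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit belongs to } C, \\ 0 & \text{if the } i\text{th unit belongs to } C^*. \end{cases}$$

Population total: $Y_{TOTAL} = \sum_{i=1}^{N} Y_i = A$

Population mean: $\bar{Y} = \dfrac{\sum_{i=1}^{N} Y_i}{N} = \dfrac{A}{N} = P$

Sample mean: $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n} = \dfrac{a}{n} = p.$
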
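Note that $\sum_{i=1}^{N} Y_i^2 = A = NP$ (each $Y_i$ is 0 or 1, so $Y_i^2 = Y_i$), so we can write $S^2$ in terms of $P$ and $Q$ as follows:

$$\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \\
&= \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) \\
&= \frac{1}{N-1}\left(NP - NP^2\right) \\
&= \frac{N}{N-1}PQ.
\end{aligned}$$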
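Similarly, $\sum_{i=1}^{n} y_i^2 = a = np$, so we can write $s^2$ in terms of $p$ and $q$ as follows:

$$\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 \\
&= \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) \\
&= \frac{1}{n-1}\left(np - np^2\right) \\
&= \frac{n}{n-1}pq.
\end{aligned}$$
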
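The closed forms for $S^2$ and $s^2$ can be checked numerically. The sketch below uses a hypothetical 0/1 population and a hypothetical 0/1 sample (both invented for illustration) and compares the variance computed from its definition, with divisor one less than the number of values, against $\frac{N}{N-1}PQ$ and $\frac{n}{n-1}pq$. The helper name `variance_n_minus_1` is an assumption of this example.

```python
from fractions import Fraction

def variance_n_minus_1(values):
    """Variance with divisor (len(values) - 1), computed exactly."""
    k = len(values)
    mean = Fraction(sum(values), k)
    return sum((v - mean) ** 2 for v in values) / (k - 1)

# Hypothetical population of N = 8 units; 1 means the unit is in class C.
population = [1, 0, 1, 1, 0, 0, 1, 0]
N = len(population)
P = Fraction(sum(population), N)
Q = 1 - P
S2_direct = variance_n_minus_1(population)
S2_formula = Fraction(N, N - 1) * P * Q

# Hypothetical sample of n = 4 units.
sample = [1, 0, 1, 0]
n = len(sample)
p = Fraction(sum(sample), n)
q = 1 - p
s2_direct = variance_n_minus_1(sample)
s2_formula = Fraction(n, n - 1) * p * q

print("S^2:", S2_direct, S2_formula)
print("s^2:", s2_direct, s2_formula)
```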
Estimation of a Proportion in case of Equal Cluster:
In terms of the cluster proportions, we also find that

$$S_i^2 = \frac{M P_i Q_i}{M-1},$$

$$S_w^2 = \frac{M\sum_{i=1}^{N} P_i Q_i}{N(M-1)},$$

$$S^2 = \frac{NMPQ}{NM-1},$$

$$\begin{aligned}
S_b^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(P_i - P)^2 \\
&= \frac{1}{N-1}\left[\sum_{i=1}^{N} P_i^2 - NP^2\right] \\
&= \frac{1}{N-1}\left[-\sum_{i=1}^{N} P_i(1 - P_i) + \sum_{i=1}^{N} P_i - NP^2\right] \\
&= \frac{1}{N-1}\left[NPQ - \sum_{i=1}^{N} P_i Q_i\right].
\end{aligned}$$

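The four variance expressions $S_i^2$, $S_w^2$, $S^2$ and $S_b^2$ can be verified on a small example. The sketch below assumes a made-up population of N = 4 clusters of M = 5 indicator values each (illustrative data only) and compares each quantity computed from its definition with the closed form in terms of $P_i$, $Q_i$, $P$ and $Q$.

```python
from fractions import Fraction

def variance_n_minus_1(values):
    """Variance with divisor (len(values) - 1), computed exactly."""
    k = len(values)
    mean = Fraction(sum(values), k)
    return sum((v - mean) ** 2 for v in values) / (k - 1)

# Hypothetical population: N = 4 clusters, M = 5 indicator values each.
y = [
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
N, M = len(y), len(y[0])
P_i = [Fraction(sum(row), M) for row in y]
Q_i = [1 - p for p in P_i]
P = sum(P_i) / N
Q = 1 - P
sum_PQ = sum(p * q for p, q in zip(P_i, Q_i))

# Quantities computed from their definitions.
S_i2_direct = [variance_n_minus_1(row) for row in y]      # within each cluster
S_w2_direct = sum(S_i2_direct) / N                        # mean within-cluster variance
S2_direct = variance_n_minus_1([u for row in y for u in row])  # all NM units
S_b2_direct = variance_n_minus_1(P_i)                     # between cluster means

# Closed forms in terms of proportions.
S_i2_formula = [Fraction(M, M - 1) * p * q for p, q in zip(P_i, Q_i)]
S_w2_formula = Fraction(M, N * (M - 1)) * sum_PQ
S2_formula = Fraction(N * M, N * M - 1) * P * Q
S_b2_formula = (N * P * Q - sum_PQ) / (N - 1)

print(S_w2_direct, S2_direct, S_b2_direct)
```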
Then, using the result that $\bar{y}_{cl}$ is an unbiased estimator of $\bar{Y}$, we find that

$$\hat{P}_{cl} = \frac{1}{n}\sum_{i=1}^{n} P_i$$

is an unbiased estimator of $P$ and

$$\operatorname{Var}(\hat{P}_{cl}) = \left(\frac{N-n}{Nn}\right)\frac{NPQ - \sum_{i=1}^{N} P_i Q_i}{N-1}.$$

This variance of $\hat{P}_{cl}$ can be expressed as

$$\operatorname{Var}(\hat{P}_{cl}) = \frac{N-n}{N-1}\cdot\frac{PQ}{nM}\left[1 + (M-1)\rho\right],$$

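Because the number of possible SRSWOR samples of clusters is small for a toy population, unbiasedness and the first variance formula can be checked by brute-force enumeration. The sketch below assumes hypothetical cluster proportions for N = 4 clusters and a sample of n = 2 clusters; it averages $\hat{P}_{cl}$ over all equally likely samples and compares the resulting variance with $\frac{N-n}{Nn}\cdot\frac{NPQ-\sum P_iQ_i}{N-1}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical cluster proportions for N = 4 equal-sized clusters.
P_i = [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(1, 5)]
N = len(P_i)
n = 2  # number of clusters drawn by SRSWOR

P = sum(P_i) / N
Q = 1 - P
sum_PQ = sum(p * (1 - p) for p in P_i)

# Every SRSWOR sample of n clusters is equally likely, so expectations
# are plain averages over all combinations.
estimates = [sum(s) / n for s in combinations(P_i, n)]
expected = sum(estimates) / len(estimates)
var_enum = sum((e - expected) ** 2 for e in estimates) / len(estimates)

# Variance from the closed-form expression.
var_formula = Fraction(N - n, N * n) * (N * P * Q - sum_PQ) / (N - 1)

print("E(P_hat) =", expected, " P =", P)
print("Var by enumeration =", var_enum, " Var by formula =", var_formula)
```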
where the value of $\rho$ can be obtained as

$$\rho = \frac{M(N-1)S_b^2 - N S_w^2}{(MN-1)S^2}.$$

By substituting $S_b^2$, $S_w^2$ and $S^2$ in $\rho$, we obtain

$$\rho = 1 - \frac{M}{M-1}\cdot\frac{\frac{1}{N}\sum_{i=1}^{N} P_i Q_i}{PQ}.$$

The variance of $\hat{P}_{cl}$ can be estimated unbiasedly by

$$\begin{aligned}
\widehat{\operatorname{Var}}(\hat{P}_{cl}) &= \frac{N-n}{nN}\, s_b^2 \\
&= \frac{N-n}{nN}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(P_i - \hat{P}_{cl})^2 \\
&= \frac{N-n}{Nn(n-1)}\left[n\hat{P}_{cl}\hat{Q}_{cl} - \sum_{i=1}^{n} P_i Q_i\right]
\end{aligned}$$

where $\hat{Q}_{cl} = 1 - \hat{P}_{cl}$.

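The variance estimator can be written as a short function and its unbiasedness checked by enumeration. This is a minimal sketch under assumptions: the function name `estimate_var_P_cl` and the cluster proportions are invented for illustration, and the sample must contain at least two clusters so that the divisor $n-1$ is nonzero.

```python
from fractions import Fraction
from itertools import combinations

def estimate_var_P_cl(sample_P, N):
    """Estimated variance of P_hat_cl from the proportions of n sampled clusters.

    Implements (N - n) / (N n (n - 1)) * [n P_hat Q_hat - sum P_i Q_i]; needs n >= 2.
    """
    n = len(sample_P)
    P_hat = sum(sample_P) / n
    Q_hat = 1 - P_hat
    sum_pq = sum(p * (1 - p) for p in sample_P)
    return Fraction(N - n, N * n * (n - 1)) * (n * P_hat * Q_hat - sum_pq)

# Hypothetical cluster proportions for N = 4 equal-sized clusters.
P_i = [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(1, 5)]
N, n = len(P_i), 2
P = sum(P_i) / N
Q = 1 - P
sum_PQ = sum(p * (1 - p) for p in P_i)

# True variance of P_hat_cl from the population formula.
true_var = Fraction(N - n, N * n) * (N * P * Q - sum_PQ) / (N - 1)

# Average of the estimator over all equally likely SRSWOR samples.
samples = list(combinations(P_i, n))
avg_estimate = sum(estimate_var_P_cl(s, N) for s in samples) / len(samples)

print("true variance =", true_var, " average estimate =", avg_estimate)
```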
The efficiency of cluster sampling relative to SRSWOR is given by

$$E = \frac{M(N-1)}{(MN-1)}\cdot\frac{1}{1 + (M-1)\rho} = \frac{(N-1)}{(NM-1)}\cdot\frac{NPQ}{NPQ - \sum_{i=1}^{N} P_i Q_i}.$$

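If $N$ is large, then $\dfrac{N-1}{NM-1} \approx \dfrac{1}{M}$, so that $E \approx \dfrac{1}{M}\cdot\dfrac{NPQ}{NPQ - \sum_{i=1}^{N} P_i Q_i}$.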
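The intraclass correlation $\rho$ and the two expressions for the efficiency $E$ can be evaluated side by side. The sketch below assumes hypothetical cluster proportions for N = 4 clusters of size M = 5 (illustrative values only); it computes $\rho$ both from $S_b^2$, $S_w^2$, $S^2$ and from the proportion form, and then $E$ from each of its two expressions.

```python
from fractions import Fraction

# Hypothetical cluster proportions for N = 4 clusters of M = 5 units each.
P_i = [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(1, 5)]
N, M = len(P_i), 5
P = sum(P_i) / N
Q = 1 - P
sum_PQ = sum(p * (1 - p) for p in P_i)

# Variance components in terms of proportions.
S_b2 = (N * P * Q - sum_PQ) / (N - 1)
S_w2 = Fraction(M, N * (M - 1)) * sum_PQ
S2 = Fraction(N * M, N * M - 1) * P * Q

# rho from the variance components and from the proportion form.
rho_components = (M * (N - 1) * S_b2 - N * S_w2) / ((M * N - 1) * S2)
rho_proportions = 1 - Fraction(M, M - 1) * sum_PQ / (N * P * Q)

# Efficiency relative to SRSWOR, computed from both expressions.
E_from_rho = Fraction(M * (N - 1), M * N - 1) / (1 + (M - 1) * rho_proportions)
E_from_proportions = Fraction(N - 1, N * M - 1) * (N * P * Q) / (N * P * Q - sum_PQ)

print("rho =", rho_components, rho_proportions)
print("E =", E_from_rho, E_from_proportions)
```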

An estimator of the total number of elements belonging to a specified category is obtained by multiplying $\hat{P}_{cl}$ by $NM$, i.e. by $NM\hat{P}_{cl}$.

The expressions of variance and its estimator are obtained by multiplying the corresponding expressions for $\hat{P}_{cl}$ by $N^2M^2$.

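A minimal sketch of the estimator of the total and its estimated variance is given below. The function name `estimate_total` and the sample values are assumptions made for illustration; the variance part reuses the estimator of $\operatorname{Var}(\hat{P}_{cl})$ for equal clusters, scaled by $N^2M^2$, and needs at least two sampled clusters.

```python
from fractions import Fraction

def estimate_total(sample_P, N, M):
    """Return (estimated total, estimated variance of that total).

    sample_P holds the proportions P_i of the n sampled clusters (n >= 2),
    N is the number of clusters in the population, M the common cluster size.
    """
    n = len(sample_P)
    P_hat = sum(sample_P) / n
    Q_hat = 1 - P_hat
    sum_pq = sum(p * (1 - p) for p in sample_P)
    var_P_hat = Fraction(N - n, N * n * (n - 1)) * (n * P_hat * Q_hat - sum_pq)
    total_hat = N * M * P_hat
    var_total_hat = (N * M) ** 2 * var_P_hat
    return total_hat, var_total_hat

# Hypothetical sample: n = 2 clusters out of N = 4, each of size M = 5.
total_hat, var_total_hat = estimate_total([Fraction(3, 5), Fraction(1, 5)], N=4, M=5)
print("estimated total =", total_hat, " estimated variance =", var_total_hat)
```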
Case of Unequal Cluster:
In practice, clusters of equal size are available only when they are planned that way.
For example, in a screw manufacturing company, the packets of screws can be prepared such that every packet contains the same number of screws.

In real applications, it is hard to get clusters of equal size.

For example,
• villages with equal areas are difficult to find,
• districts with the same number of persons are difficult to find,
• the number of members in a household may not be the same in each household in a given area.

Cluster Sampling:

[Figure: a population of N clusters (Cluster 1, Cluster 2, …, Cluster N) containing M1, M2, …, MN units, from which a sample of n clusters (Cluster 1, Cluster 2, …, Cluster n) containing M1, M2, …, Mn units is selected.]

Case of Unequal Cluster:
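Let there be $N$ clusters and let $M_i$ be the size of the $i$th cluster. Let

$$M_0 = \sum_{i=1}^{N} M_i,$$

$$\bar{M} = \frac{1}{N}\sum_{i=1}^{N} M_i,$$

$$\bar{y}_i = \frac{1}{M_i}\sum_{j=1}^{M_i} y_{ij} \;:\; \text{mean of the } i\text{th cluster},$$

$$\bar{Y} = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar{y}_i = \frac{1}{N}\sum_{i=1}^{N}\frac{M_i}{\bar{M}}\,\bar{y}_i.$$
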
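The notation for unequal clusters can be made concrete with a small example. The sketch below assumes a made-up population of N = 4 clusters of sizes 2, 3, 1 and 4 (the values are illustrative only) and computes $M_0$, $\bar{M}$, the cluster means $\bar{y}_i$, and the population mean $\bar{Y}$ in the three equivalent ways.

```python
from fractions import Fraction

# Hypothetical population: N = 4 clusters of unequal sizes.
clusters = [[4, 6], [1, 2, 3], [10], [5, 5, 8, 2]]
N = len(clusters)

M_i = [len(c) for c in clusters]          # cluster sizes
M_0 = sum(M_i)                            # total number of units
M_bar = Fraction(M_0, N)                  # average cluster size
ybar_i = [Fraction(sum(c), len(c)) for c in clusters]  # cluster means

# Population mean computed in three equivalent ways.
Y_bar_units = Fraction(sum(sum(c) for c in clusters), M_0)
Y_bar_weighted = sum(Fraction(m, M_0) * yb for m, yb in zip(M_i, ybar_i))
Y_bar_scaled = sum(m / M_bar * yb for m, yb in zip(M_i, ybar_i)) / N

print("M_0 =", M_0, " M_bar =", M_bar)
print("cluster means =", ybar_i)
print("Y_bar =", Y_bar_units, Y_bar_weighted, Y_bar_scaled)
```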
Suppose that n clusters are selected with SRSWOR and all the
elements in these selected clusters are surveyed.

Assume that the $M_i$'s ($i = 1, 2, \ldots, N$) are known.

Based on this scheme, several estimators can be obtained to estimate the population mean. We consider four types of such estimators.

1. Mean of Cluster Means
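Consider the simple arithmetic mean of the cluster means as

$$\bar{y}_c = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i.$$

Then

$$E(\bar{y}_c) = \frac{1}{N}\sum_{i=1}^{N}\bar{y}_i \neq \bar{Y} \quad \left(\text{where } \bar{Y} = \sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar{y}_i\right).$$
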
1. Mean of Cluster Means: Bias
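The bias of $\bar{y}_c$ is

$$\begin{aligned}
\operatorname{Bias}(\bar{y}_c) &= E(\bar{y}_c) - \bar{Y} \\
&= \frac{1}{N}\sum_{i=1}^{N}\bar{y}_i - \sum_{i=1}^{N}\left(\frac{M_i}{M_0}\right)\bar{y}_i \\
&= -\frac{1}{M_0}\left[\sum_{i=1}^{N} M_i\bar{y}_i - \frac{M_0}{N}\sum_{i=1}^{N}\bar{y}_i\right] \\
&= -\frac{1}{M_0}\left[\sum_{i=1}^{N} M_i\bar{y}_i - \frac{\left(\sum_{i=1}^{N} M_i\right)\left(\sum_{i=1}^{N}\bar{y}_i\right)}{N}\right] \\
&= -\frac{1}{M_0}\sum_{i=1}^{N}(M_i - \bar{M})(\bar{y}_i - \bar{Y}) \\
&= -\left(\frac{N-1}{M_0}\right)S_{my}.
\end{aligned}$$

$\operatorname{Bias}(\bar{y}_c) = 0$ if $M_i$ and $\bar{y}_i$ are uncorrelated.
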
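The bias expression can be checked exactly on a toy population by enumerating every SRSWOR sample of clusters. The sketch below assumes hypothetical cluster sizes and cluster means for N = 4 clusters and a sample of n = 2 clusters; it compares $E(\bar{y}_c) - \bar{Y}$, obtained by averaging over all samples, with $-\frac{N-1}{M_0}S_{my}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical cluster sizes and cluster means for N = 4 clusters.
M_i = [2, 3, 1, 4]
ybar_i = [Fraction(5), Fraction(2), Fraction(10), Fraction(5)]
N, n = len(M_i), 2

M_0 = sum(M_i)
M_bar = Fraction(M_0, N)
Y_bar = sum(m * yb for m, yb in zip(M_i, ybar_i)) / M_0  # size-weighted mean

# E(ybar_c): average of the estimator over all equally likely samples.
estimates = [sum(s) / n for s in combinations(ybar_i, n)]
E_ybar_c = sum(estimates) / len(estimates)
bias_enum = E_ybar_c - Y_bar

# Bias from the closed form -(N - 1) / M_0 * S_my.
S_my = sum((m - M_bar) * (yb - Y_bar) for m, yb in zip(M_i, ybar_i)) / (N - 1)
bias_formula = -Fraction(N - 1, M_0) * S_my

print("bias by enumeration =", bias_enum, " bias by formula =", bias_formula)
```

In this example the largest cluster mean belongs to the smallest cluster, so $S_{my}$ is negative and the unweighted mean of cluster means overestimates $\bar{Y}$.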
1. Mean of Cluster Means: MSE and Estimate of Variance
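The mean squared error is

$$\begin{aligned}
MSE(\bar{y}_c) &= \operatorname{Var}(\bar{y}_c) + \left[\operatorname{Bias}(\bar{y}_c)\right]^2 \\
&\approx \frac{N-n}{Nn}\,S_b^2 + \left(\frac{N-1}{M_0}\right)^2 S_{my}^2
\end{aligned}$$

where

$$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i - \bar{Y})^2,$$

$$S_{my} = \frac{1}{N-1}\sum_{i=1}^{N}(M_i - \bar{M})(\bar{y}_i - \bar{Y}).$$

An estimate of $\operatorname{Var}(\bar{y}_c)$ is

$$\widehat{\operatorname{Var}}(\bar{y}_c) = \frac{N-n}{Nn}\,s_b^2$$

where $s_b^2 = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}(\bar{y}_i - \bar{y}_c)^2$.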
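The estimate of $\operatorname{Var}(\bar{y}_c)$ can be sketched as a small function. The function name `estimate_var_ybar_c` and the cluster means below are assumptions for illustration; the check at the end enumerates all SRSWOR samples of n = 2 clusters from a hypothetical population of N = 4 clusters and compares the average of the estimates with the sampling variance of $\bar{y}_c$ obtained by enumeration.

```python
from fractions import Fraction
from itertools import combinations

def estimate_var_ybar_c(sample_means, N):
    """Estimate Var(ybar_c) as (N - n) / (N n) * s_b^2 from n >= 2 sampled cluster means."""
    n = len(sample_means)
    ybar_c = sum(sample_means) / n
    s_b2 = sum((yb - ybar_c) ** 2 for yb in sample_means) / (n - 1)
    return Fraction(N - n, N * n) * s_b2

# Hypothetical cluster means for N = 4 clusters.
ybar_i = [Fraction(5), Fraction(2), Fraction(10), Fraction(5)]
N, n = len(ybar_i), 2

# Sampling variance of ybar_c obtained by enumerating all samples.
samples = list(combinations(ybar_i, n))
estimates = [sum(s) / n for s in samples]
mean_est = sum(estimates) / len(estimates)
var_enum = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)

# Average of the variance estimates over the same samples.
avg_var_hat = sum(estimate_var_ybar_c(s, N) for s in samples) / len(samples)

print("Var(ybar_c) by enumeration =", var_enum)
print("average of the estimates   =", avg_var_hat)
```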
