
Chapter 5

Ratio and Product Methods of Estimation

An important objective in any statistical estimation procedure is to obtain estimators of the parameters of interest with high precision. It is also well understood that incorporating more information in the estimation procedure yields better estimators, provided the information is valid and proper. Such auxiliary information is used through the ratio method of estimation to obtain an improved estimator of the population mean. In the ratio method of estimation, auxiliary information on a variable that is linearly related to the variable under study is utilized to estimate the population mean.

Let Y be the variable under study and X be an auxiliary variable which is correlated with Y. The observations x_i on X and y_i on Y are obtained for each sampling unit. The population mean \bar{X} of X (or equivalently the population total X_{tot}) must be known. For example, the x_i's may be the values of the y_i's from
- some earlier completed census,
- some earlier survey,
- some characteristic on which it is easy to obtain information, etc.

For example, if y_i is the quantity of fruits produced in the ith plot, then x_i can be the area of the ith plot or the production of fruits in the same plot in the previous year.

Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be a random sample of size n on the paired variable (X, Y) drawn, preferably by SRSWOR, from a population of size N. The ratio estimator of the population mean \bar{Y} is

\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\bar{X} = \hat{R}\bar{X},

assuming the population mean \bar{X} is known. The ratio estimator of the population total Y_{tot} = \sum_{i=1}^{N} Y_i is

\hat{Y}_{R(tot)} = \frac{y_{tot}}{x_{tot}} X_{tot}

where X_{tot} = \sum_{i=1}^{N} X_i is the population total of X, which is assumed to be known, and y_{tot} = \sum_{i=1}^{n} y_i and x_{tot} = \sum_{i=1}^{n} x_i are the sample totals of Y and X respectively. \hat{Y}_{R(tot)} can be equivalently expressed as

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 1
\hat{Y}_{R(tot)} = \frac{\bar{y}}{\bar{x}} X_{tot} = \hat{R} X_{tot}.

Looking at the structure of the ratio estimators, note that the ratio method estimates the relative change \frac{Y_{tot}}{X_{tot}} that occurred after the (x_i, y_i) were observed. It is clear that if the ratios \frac{y_i}{x_i} are nearly the same for all i = 1, 2, ..., n, then the value of \frac{y_{tot}}{x_{tot}} (or equivalently \frac{\bar{y}}{\bar{x}}) varies little from sample to sample, and the ratio estimator will be of high precision.
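As a quick numerical illustration, the ratio estimates of the population mean and total can be computed as below (a minimal sketch; the function name and toy data are ours, not part of the notes):

```python
def ratio_estimates(y, x, X_bar, N):
    """Ratio estimates of the population mean and total from a
    paired sample (y, x) drawn by SRSWOR, with known X_bar."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    R_hat = y_bar / x_bar          # sample ratio ybar/xbar
    Y_R = R_hat * X_bar            # ratio estimate of the population mean
    Y_R_tot = R_hat * N * X_bar    # ratio estimate of the population total
    return R_hat, Y_R, Y_R_tot

# toy data: y roughly proportional to x, so the ratio is stable
x = [2.0, 4.0, 6.0, 8.0]
y = [3.1, 5.9, 9.2, 11.8]
R_hat, Y_R, Y_R_tot = ratio_estimates(y, x, X_bar=5.0, N=100)
```

Note that the estimate of the total is simply N times the estimate of the mean, since X_{tot} = N\bar{X}.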

Bias and mean squared error of ratio estimator:


Assume that the random sample (x_i, y_i), i = 1, 2, ..., n is drawn by SRSWOR and that the population mean \bar{X} is known. Then, averaging over all \binom{N}{n} possible samples,

E(\hat{\bar{Y}}_R) = \frac{1}{\binom{N}{n}} \sum_{s=1}^{\binom{N}{n}} \frac{\bar{y}_s}{\bar{x}_s}\,\bar{X} \neq \bar{Y} \quad (\text{in general}),

where \bar{y}_s and \bar{x}_s denote the sample means in the sth possible sample. Moreover, it is difficult to find exact expressions for E\!\left(\frac{\bar{y}}{\bar{x}}\right) and E\!\left(\frac{\bar{y}^2}{\bar{x}^2}\right). So we approximate them and proceed as follows:
Let

\varepsilon_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}} \;\Rightarrow\; \bar{y} = (1 + \varepsilon_0)\bar{Y}

\varepsilon_1 = \frac{\bar{x} - \bar{X}}{\bar{X}} \;\Rightarrow\; \bar{x} = (1 + \varepsilon_1)\bar{X}.
Since SRSWOR is being followed,

E(\varepsilon_0) = 0
E(\varepsilon_1) = 0

E(\varepsilon_0^2) = \frac{1}{\bar{Y}^2} E(\bar{y} - \bar{Y})^2 = \frac{1}{\bar{Y}^2}\,\frac{N-n}{Nn}\, S_Y^2 = \frac{f}{n}\,\frac{S_Y^2}{\bar{Y}^2} = \frac{f}{n} C_Y^2

where f = \frac{N-n}{N}, \quad S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \quad and \quad C_Y = \frac{S_Y}{\bar{Y}} is the coefficient of variation related to Y.
Similarly,

E(\varepsilon_1^2) = \frac{f}{n} C_X^2

E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar{X}\bar{Y}} E[(\bar{x} - \bar{X})(\bar{y} - \bar{Y})]
 = \frac{1}{\bar{X}\bar{Y}}\,\frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})
 = \frac{1}{\bar{X}\bar{Y}}\,\frac{f}{n}\, S_{XY}
 = \frac{1}{\bar{X}\bar{Y}}\,\frac{f}{n}\,\rho S_X S_Y
 = \frac{f}{n}\,\rho\,\frac{S_X}{\bar{X}}\,\frac{S_Y}{\bar{Y}}
 = \frac{f}{n}\,\rho C_X C_Y

where C_X = \frac{S_X}{\bar{X}} is the coefficient of variation related to X and \rho is the population correlation coefficient between X and Y.

Writing \hat{\bar{Y}}_R in terms of the \varepsilon's, we get

\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\bar{X} = \frac{(1+\varepsilon_0)\bar{Y}}{(1+\varepsilon_1)\bar{X}}\bar{X} = (1+\varepsilon_0)(1+\varepsilon_1)^{-1}\bar{Y}.

Assuming |\varepsilon_1| < 1, the term (1+\varepsilon_1)^{-1} may be expanded as an infinite series and it will be convergent. This assumption means that \left|\frac{\bar{x}-\bar{X}}{\bar{X}}\right| < 1, i.e., the possible estimate \bar{x} of the population mean \bar{X} lies between 0 and 2\bar{X}. This is likely to hold true if the variation in \bar{x} is not large. In order to ensure that the variation in \bar{x} is small, assume that the sample size n is fairly large. With this assumption,

\hat{\bar{Y}}_R = \bar{Y}(1+\varepsilon_0)(1-\varepsilon_1+\varepsilon_1^2-\ldots)
 = \bar{Y}(1+\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).

So the estimation error of \hat{\bar{Y}}_R is

\hat{\bar{Y}}_R - \bar{Y} = \bar{Y}(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).

When the sample size is large, \varepsilon_0 and \varepsilon_1 are likely to be small quantities, and so the terms involving second and higher powers of \varepsilon_0 and \varepsilon_1 are negligibly small. In such a case

\hat{\bar{Y}}_R - \bar{Y} \approx \bar{Y}(\varepsilon_0 - \varepsilon_1)

and

E(\hat{\bar{Y}}_R - \bar{Y}) = 0.

So the ratio estimator is an unbiased estimator of the population mean up to the first order of approximation.

If we assume that only the terms in \varepsilon_0 and \varepsilon_1 of powers higher than two are negligibly small (which is more realistic than assuming that powers higher than one are negligibly small), then the estimation error of \hat{\bar{Y}}_R can be approximated as

\hat{\bar{Y}}_R - \bar{Y} \approx \bar{Y}(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1).

Then the bias of \hat{\bar{Y}}_R is given by

E(\hat{\bar{Y}}_R - \bar{Y}) = \bar{Y}\left(0 - 0 + \frac{f}{n}C_X^2 - \frac{f}{n}\rho C_X C_Y\right)

Bias(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R - \bar{Y}) = \frac{f}{n}\bar{Y}\, C_X (C_X - \rho C_Y)

up to the second order of approximation. The bias generally decreases as the sample size grows large.

The bias of \hat{\bar{Y}}_R is zero, i.e.,

Bias(\hat{\bar{Y}}_R) = 0

if E(\varepsilon_1^2 - \varepsilon_0\varepsilon_1) = 0,

or if \frac{Var(\bar{x})}{\bar{X}^2} - \frac{Cov(\bar{x},\bar{y})}{\bar{X}\bar{Y}} = 0,

or if \frac{1}{\bar{X}^2}\left[Var(\bar{x}) - \frac{\bar{X}}{\bar{Y}}Cov(\bar{x},\bar{y})\right] = 0,

or if Var(\bar{x}) - \frac{Cov(\bar{x},\bar{y})}{R} = 0 \quad (assuming \bar{X} \neq 0),

or if R = \frac{\bar{Y}}{\bar{X}} = \frac{Cov(\bar{x},\bar{y})}{Var(\bar{x})},

which is satisfied when the regression line of Y on X passes through the origin.

Now, to find the mean squared error, consider

MSE(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R - \bar{Y})^2
 = E\left[\bar{Y}^2(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots)^2\right]
 \approx E\left[\bar{Y}^2(\varepsilon_0^2+\varepsilon_1^2-2\varepsilon_0\varepsilon_1)\right].

Under the assumption |\varepsilon_1| < 1, with the terms in \varepsilon_0 and \varepsilon_1 of powers higher than two negligibly small,

MSE(\hat{\bar{Y}}_R) = \bar{Y}^2\left[\frac{f}{n}C_X^2 + \frac{f}{n}C_Y^2 - 2\frac{f}{n}\rho C_X C_Y\right]
 = \frac{\bar{Y}^2 f}{n}\left[C_X^2 + C_Y^2 - 2\rho C_X C_Y\right]
up to the second order of approximation.
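The second-order bias and MSE approximations above can be evaluated numerically from full population values, e.g. to study how fast they shrink with n (an illustrative sketch; the function name is ours):

```python
import math

def ratio_bias_mse_approx(Y_pop, X_pop, n):
    """Second-order approximations: Bias = (f/n)*Ybar*CX*(CX - rho*CY),
    MSE = (f/n)*Ybar^2*(CX^2 + CY^2 - 2*rho*CX*CY)."""
    N = len(Y_pop)
    Ybar = sum(Y_pop) / N
    Xbar = sum(X_pop) / N
    SY2 = sum((yi - Ybar) ** 2 for yi in Y_pop) / (N - 1)
    SX2 = sum((xi - Xbar) ** 2 for xi in X_pop) / (N - 1)
    SXY = sum((xi - Xbar) * (yi - Ybar)
              for xi, yi in zip(X_pop, Y_pop)) / (N - 1)
    CY = math.sqrt(SY2) / Ybar
    CX = math.sqrt(SX2) / Xbar
    rho = SXY / math.sqrt(SX2 * SY2)
    f = (N - n) / N
    bias = (f / n) * Ybar * CX * (CX - rho * CY)
    mse = (f / n) * Ybar ** 2 * (CX ** 2 + CY ** 2 - 2 * rho * CX * CY)
    return bias, mse

# when Y is exactly proportional to X (regression through the origin),
# rho = 1 and CX = CY, so both approximations vanish
X_pop = [float(i) for i in range(1, 21)]
Y_pop = [2.0 * xi for xi in X_pop]
bias, mse = ratio_bias_mse_approx(Y_pop, X_pop, n=5)
```

This numerically reproduces the condition derived above: the bias vanishes when the regression of Y on X passes through the origin.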

Efficiency of ratio estimator in comparison to SRSWOR


The ratio estimator is a better estimator of \bar{Y} than the sample mean based on SRSWOR if

MSE(\hat{\bar{Y}}_R) < Var_{SRS}(\bar{y})

or if \frac{f}{n}\bar{Y}^2(C_X^2 + C_Y^2 - 2\rho C_X C_Y) < \frac{f}{n}\bar{Y}^2 C_Y^2

or if C_X^2 - 2\rho C_X C_Y < 0

or if \rho > \frac{1}{2}\frac{C_X}{C_Y}.

Thus the ratio estimator is more efficient than the sample mean based on SRSWOR if

\rho > \frac{1}{2}\frac{C_X}{C_Y} \quad if \; R > 0

and \rho < -\frac{1}{2}\frac{C_X}{C_Y} \quad if \; R < 0.

It is clear from this expression that the success of the ratio estimator depends on how closely the auxiliary variable is related to the variable under study.

Upper limit of ratio estimator:


Consider

Cov(\hat{R}, \bar{x}) = E(\hat{R}\bar{x}) - E(\hat{R})E(\bar{x})
 = E\left(\frac{\bar{y}}{\bar{x}}\,\bar{x}\right) - E(\hat{R})E(\bar{x})
 = \bar{Y} - E(\hat{R})\bar{X}.

Thus

E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{Cov(\hat{R},\bar{x})}{\bar{X}} = R - \frac{Cov(\hat{R},\bar{x})}{\bar{X}}

Bias(\hat{R}) = E(\hat{R}) - R = -\frac{Cov(\hat{R},\bar{x})}{\bar{X}} = -\frac{\rho_{\hat{R},\bar{x}}\,\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}}

where \rho_{\hat{R},\bar{x}} is the correlation between \hat{R} and \bar{x}; \sigma_{\hat{R}} and \sigma_{\bar{x}} are the standard errors of \hat{R} and \bar{x} respectively.

Thus

|Bias(\hat{R})| = \frac{|\rho_{\hat{R},\bar{x}}|\,\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}} \leq \frac{\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}} \quad (since \; |\rho_{\hat{R},\bar{x}}| \leq 1),

assuming \bar{X} > 0. Thus

\frac{|Bias(\hat{R})|}{\sigma_{\hat{R}}} \leq \frac{\sigma_{\bar{x}}}{\bar{X}} = C_{\bar{x}}

where C_{\bar{x}} is the coefficient of variation of \bar{x}. If C_{\bar{x}} < 0.1, then the bias in \hat{R} may be safely regarded as negligible in relation to the standard error of \hat{R}.

Alternative form of MSE (YˆR )


Consider

\sum_{i=1}^{N}(Y_i - RX_i)^2 = \sum_{i=1}^{N}\left[(Y_i - \bar{Y}) + (\bar{Y} - RX_i)\right]^2
 = \sum_{i=1}^{N}\left[(Y_i - \bar{Y}) - R(X_i - \bar{X})\right]^2 \quad (using \; \bar{Y} = R\bar{X})
 = \sum_{i=1}^{N}(Y_i - \bar{Y})^2 + R^2\sum_{i=1}^{N}(X_i - \bar{X})^2 - 2R\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})

so that

\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - RX_i)^2 = S_Y^2 + R^2 S_X^2 - 2RS_{XY}.

The MSE of \hat{\bar{Y}}_R has already been derived; it can now be expressed as

MSE(\hat{\bar{Y}}_R) = \frac{f\bar{Y}^2}{n}(C_Y^2 + C_X^2 - 2\rho C_X C_Y)
 = \frac{f}{n}\bar{Y}^2\left(\frac{S_Y^2}{\bar{Y}^2} + \frac{S_X^2}{\bar{X}^2} - 2\frac{S_{XY}}{\bar{X}\bar{Y}}\right)
 = \frac{f}{n}(S_Y^2 + R^2 S_X^2 - 2RS_{XY})
 = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2
 = \frac{N-n}{nN(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2.

Estimate of MSE (YˆR )

Let U_i = Y_i - RX_i, i = 1, 2, ..., N. Then the MSE of \hat{\bar{Y}}_R can be expressed as

MSE(\hat{\bar{Y}}_R) = \frac{f}{n}\,\frac{1}{N-1}\sum_{i=1}^{N}(U_i - \bar{U})^2 = \frac{f}{n}S_U^2

where S_U^2 = \frac{1}{N-1}\sum_{i=1}^{N}(U_i - \bar{U})^2.

Based on this, a natural estimator of MSE(\hat{\bar{Y}}_R) is

\widehat{MSE}(\hat{\bar{Y}}_R) = \frac{f}{n}s_u^2

where

s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i - \bar{u})^2
 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{R}(x_i - \bar{x})\right]^2
 = s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy},

with \hat{R} = \frac{\bar{y}}{\bar{x}}.
Based on the expression

MSE(\hat{\bar{Y}}_R) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2,

an estimate of MSE(\hat{\bar{Y}}_R) is

\widehat{MSE}(\hat{\bar{Y}}_R) = \frac{f}{n(n-1)}\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2
 = \frac{f}{n}(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}).
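Both sample-based forms of the MSE estimate agree, since \bar{u} = \bar{y} - \hat{R}\bar{x} = 0 by construction of \hat{R}. A small sketch (names and toy data ours):

```python
def mse_hat_ratio(y, x, N):
    """Two equivalent estimates of MSE of the ratio estimator:
    (f/n)(sy2 + Rhat^2*sx2 - 2*Rhat*sxy)  and
    f/(n(n-1)) * sum (y_i - Rhat*x_i)^2."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    Rhat = ybar / xbar
    sy2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sx2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    f = (N - n) / N
    form1 = (f / n) * (sy2 + Rhat ** 2 * sx2 - 2 * Rhat * sxy)
    form2 = (f / (n * (n - 1))) * sum((yi - Rhat * xi) ** 2
                                      for xi, yi in zip(x, y))
    return form1, form2

form1, form2 = mse_hat_ratio(y=[2.2, 3.9, 6.1, 8.0, 9.8],
                             x=[1.0, 2.0, 3.0, 4.0, 5.0], N=50)
```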
Confidence interval of ratio estimator
If the sample is large enough that the normal approximation is applicable, then the 100(1-\alpha)% confidence intervals of \bar{Y} and R are

\left[\hat{\bar{Y}}_R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)},\;\; \hat{\bar{Y}}_R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)}\right]

and

\left[\hat{R} - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})},\;\; \hat{R} + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})}\right]

respectively, where Z_{\alpha/2} is the normal deviate corresponding to the given confidence coefficient (1-\alpha).

If (\bar{x}, \bar{y}) follows a bivariate normal distribution, then (\bar{y} - R\bar{x}) is normally distributed. If SRS is followed for drawing the sample, then, assuming R is known, the statistic

\frac{\bar{y} - R\bar{x}}{\sqrt{\dfrac{N-n}{Nn}\left(s_y^2 + R^2 s_x^2 - 2Rs_{xy}\right)}}

is approximately N(0,1). This can also be used for finding confidence limits; see Cochran (1977, Chapter 6, page 156) for more details.
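Under the normal approximation, the confidence interval for \bar{Y} can be computed directly from the ratio estimate and its estimated variance (a sketch; names ours, z = 1.96 giving roughly 95% coverage):

```python
def ratio_mean_ci(y, x, X_bar, N, z=1.96):
    """Large-sample confidence interval for the population mean based on
    the ratio estimator: YR_hat +/- z * sqrt(estimated variance)."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    Rhat = ybar / xbar
    YR = Rhat * X_bar
    sy2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sx2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    var_hat = ((N - n) / (N * n)) * (sy2 + Rhat ** 2 * sx2 - 2 * Rhat * sxy)
    half = z * var_hat ** 0.5
    return YR - half, YR + half

lo, hi = ratio_mean_ci(y=[2.2, 3.9, 6.1, 8.0, 9.8],
                       x=[1.0, 2.0, 3.0, 4.0, 5.0], X_bar=3.5, N=50)
```

Here the point estimate \hat{\bar{Y}}_R = (\bar{y}/\bar{x})\bar{X} = 7.0 sits at the centre of the interval.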

Conditions under which the ratio estimator is optimum

The ratio estimator \hat{\bar{Y}}_R is the best linear unbiased estimator of \bar{Y} when

(i) the relationship between y_i and x_i is linear and passes through the origin, i.e.,

y_i = \beta x_i + e_i,

where the e_i's are independent with E(e_i \mid x_i) = 0 and \beta is the slope parameter;

(ii) the variance of y_i about this line is proportional to x_i, i.e.,

Var(y_i \mid x_i) = E(e_i^2) = C x_i

where C is a constant.

Proof: Consider the class of linear estimators \hat{\beta} = \sum_{i=1}^{n}\ell_i y_i, where y_i = \beta x_i + e_i and the \ell_i's are constants. If the n sample values x_i are kept fixed, then in repeated sampling

E(\hat{\beta}) = \beta\sum_{i=1}^{n}\ell_i x_i

and

Var(\hat{\beta}) = \sum_{i=1}^{n}\ell_i^2\, Var(y_i \mid x_i) = C\sum_{i=1}^{n}\ell_i^2 x_i.

So E(\hat{\beta}) = \beta, i.e., \hat{\beta} is unbiased, when \sum_{i=1}^{n}\ell_i x_i = 1.

Consider the minimization of Var(\hat{\beta}) subject to the unbiasedness condition \sum_{i=1}^{n}\ell_i x_i = 1 using a Lagrangian function. The Lagrangian function with Lagrangian multiplier \lambda is

\varphi = C\sum_{i=1}^{n}\ell_i^2 x_i - 2\lambda\left(\sum_{i=1}^{n}\ell_i x_i - 1\right).

Now

\frac{\partial\varphi}{\partial\ell_i} = 0 \;\Rightarrow\; C\ell_i x_i = \lambda x_i \;\Rightarrow\; \ell_i = \frac{\lambda}{C}, \quad i = 1, 2, ..., n

\frac{\partial\varphi}{\partial\lambda} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\ell_i x_i = 1.

Substituting \ell_i = \lambda/C into \sum_{i=1}^{n}\ell_i x_i = 1 gives \frac{\lambda}{C} = \frac{1}{n\bar{x}}, so

\ell_i = \frac{1}{n\bar{x}}.

Thus

\hat{\beta} = \frac{\sum_{i=1}^{n} y_i}{n\bar{x}} = \frac{\bar{y}}{\bar{x}},

so \hat{\beta} = \bar{y}/\bar{x} is the best in the class of linear unbiased estimators under these conditions, and consequently \hat{\bar{Y}}_R = \hat{\beta}\bar{X} is the best linear unbiased estimator of \bar{Y}.

Alternative approach:
This result can alternatively be derived as follows:

The ratio estimator \hat{R} = \frac{\bar{y}}{\bar{x}} is the best linear unbiased estimator of R = \frac{\bar{Y}}{\bar{X}} if the following two conditions hold:
(i) For fixed x, E(y) = \beta x, i.e., the line of regression of y on x is a straight line passing through the origin.
(ii) For fixed x, Var(y) \propto x, i.e., Var(y) = \lambda x where \lambda is the constant of proportionality.

Proof: Let y = (y_1, y_2, ..., y_n)' and x = (x_1, x_2, ..., x_n)' be the two vectors of observations on the y's and x's. Hence for any fixed x,

E(y) = \beta x
Var(y) = \Omega = \lambda\, diag(x_1, x_2, ..., x_n)

where diag(x_1, x_2, ..., x_n) is the diagonal matrix with x_1, x_2, ..., x_n as the diagonal elements.

The best linear unbiased estimator of \beta is obtained by minimizing

S^2 = (y - \beta x)'\Omega^{-1}(y - \beta x) = \sum_{i=1}^{n}\frac{(y_i - \beta x_i)^2}{\lambda x_i}.

Solving

\frac{\partial S^2}{\partial\beta} = 0 \;\Rightarrow\; \sum_{i=1}^{n}(y_i - \hat{\beta}x_i) = 0

or \hat{\beta} = \frac{\bar{y}}{\bar{x}} = \hat{R}.

Thus \hat{R} is the best linear unbiased estimator of R. Consequently, \hat{R}\bar{X} = \hat{\bar{Y}}_R is the best linear unbiased estimator of \bar{Y}.

Ratio estimator in stratified sampling
Suppose a population of size N is divided into k strata. The objective is to estimate the population mean \bar{Y} using the ratio method of estimation.

In such a situation, a random sample of size n_i is drawn by SRSWOR from the ith stratum of size N_i on the variable under study Y and the auxiliary variable X.
Let
y_{ij}: jth observation on Y from the ith stratum,
x_{ij}: jth observation on X from the ith stratum, i = 1, 2, ..., k; j = 1, 2, ..., n_i.

An estimator of \bar{Y} based on the philosophy of stratified sampling can be derived in the following two possible ways:

1. Separate ratio estimator

- First employ the ratio method of estimation separately in each stratum and obtain the ratio estimators \hat{\bar{Y}}_{Ri}, i = 1, 2, ..., k, assuming the stratum means \bar{X}_i to be known.
- Then combine all the estimates using a weighted arithmetic mean.

This gives the separate ratio estimator as

\hat{\bar{Y}}_{Rs} = \sum_{i=1}^{k}\frac{N_i\hat{\bar{Y}}_{Ri}}{N} = \sum_{i=1}^{k} w_i\hat{\bar{Y}}_{Ri} = \sum_{i=1}^{k} w_i\frac{\bar{y}_i}{\bar{x}_i}\bar{X}_i

where

\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}: sample mean of Y from the ith stratum,

\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}: sample mean of X from the ith stratum,

\bar{X}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij}: mean of all the X units in the ith stratum.

No assumption is made that the true ratio remains constant from stratum to stratum. The estimator depends on information on each \bar{X}_i.

2. Combined ratio estimator:
- First find the stratum-weighted means of the y's and x's as

\bar{y}_{st} = \sum_{i=1}^{k} w_i\bar{y}_i
\bar{x}_{st} = \sum_{i=1}^{k} w_i\bar{x}_i.

- Then define the combined ratio estimator as

\hat{\bar{Y}}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}

where \bar{X} is the population mean of X based on all the N = \sum_{i=1}^{k} N_i units. It does not depend on information on each \bar{X}_i but only on \bar{X}.

Properties of separate ratio estimator:

Note that there is an analogy between \bar{Y} = \sum_{i=1}^{k} w_i\bar{Y}_i and \hat{\bar{Y}}_{Rs} = \sum_{i=1}^{k} w_i\hat{\bar{Y}}_{Ri}.

We have already derived the approximate bias of \hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\bar{X} as

E(\hat{\bar{Y}}_R) = \bar{Y} + \frac{\bar{Y}f}{n}(C_X^2 - \rho C_X C_Y).

So for \hat{\bar{Y}}_{Ri}, we can write

E(\hat{\bar{Y}}_{Ri}) = \bar{Y}_i + \frac{\bar{Y}_i f_i}{n_i}(C_{iX}^2 - \rho_i C_{iX} C_{iY})

where

\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} Y_{ij}, \quad \bar{X}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij},

f_i = \frac{N_i - n_i}{N_i}, \quad C_{iY}^2 = \frac{S_{iY}^2}{\bar{Y}_i^2}, \quad C_{iX}^2 = \frac{S_{iX}^2}{\bar{X}_i^2},

S_{iY}^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i)^2, \quad S_{iX}^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(X_{ij} - \bar{X}_i)^2,

\rho_i: correlation coefficient between the observations on X and Y in the ith stratum,
C_{iX}: coefficient of variation of the X values in the ith stratum.

Thus

E(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k} w_i E(\hat{\bar{Y}}_{Ri})
 = \sum_{i=1}^{k} w_i\left[\bar{Y}_i + \frac{\bar{Y}_i f_i}{n_i}(C_{iX}^2 - \rho_i C_{iX} C_{iY})\right]
 = \bar{Y} + \sum_{i=1}^{k}\frac{w_i\bar{Y}_i f_i}{n_i}(C_{iX}^2 - \rho_i C_{iX} C_{iY})

Bias(\hat{\bar{Y}}_{Rs}) = E(\hat{\bar{Y}}_{Rs}) - \bar{Y} = \sum_{i=1}^{k}\frac{w_i\bar{Y}_i f_i}{n_i} C_{iX}(C_{iX} - \rho_i C_{iY})

up to the second order of approximation.

Assuming the finite population corrections f_i to be approximately 1, n_i = n/k, and C_{iX}, C_{iY} and \rho_i to be the same for all the strata, say C_x, C_y and \rho respectively, we have

Bias(\hat{\bar{Y}}_{Rs}) = \frac{k\bar{Y}}{n}(C_x^2 - \rho C_x C_y).

Thus the bias is negligible when the sample size within each stratum is sufficiently large, and \hat{\bar{Y}}_{Rs} is unbiased when C_{iX} = \rho_i C_{iY}.

Now we derive the approximate MSE of \hat{\bar{Y}}_{Rs}. We have already derived the MSE of \hat{\bar{Y}}_R earlier as

MSE(\hat{\bar{Y}}_R) = \frac{\bar{Y}^2 f}{n}(C_X^2 + C_Y^2 - 2\rho C_X C_Y) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2

where R = \frac{\bar{Y}}{\bar{X}}.

Thus the MSE of the ratio estimator up to the second order of approximation based on the ith stratum is

MSE(\hat{\bar{Y}}_{Ri}) = \frac{\bar{Y}_i^2 f_i}{n_i}(C_{iX}^2 + C_{iY}^2 - 2\rho_i C_{iX} C_{iY}) = \frac{f_i}{n_i(N_i - 1)}\sum_{j=1}^{N_i}(Y_{ij} - R_i X_{ij})^2

and so

MSE(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k} w_i^2\, MSE(\hat{\bar{Y}}_{Ri})
 = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\bar{Y}_i^2(C_{iX}^2 + C_{iY}^2 - 2\rho_i C_{iX} C_{iY})
 = \sum_{i=1}^{k} w_i^2\,\frac{f_i}{n_i(N_i - 1)}\sum_{j=1}^{N_i}(Y_{ij} - R_i X_{ij})^2.

An estimate of MSE(\hat{\bar{Y}}_{Rs}) can be found by substituting the unbiased estimators s_{ix}^2, s_{iy}^2 and s_{ixy} of S_{iX}^2, S_{iY}^2 and S_{iXY} respectively for the ith stratum, and estimating R_i = \bar{Y}_i/\bar{X}_i by r_i = \bar{y}_i/\bar{x}_i:

\widehat{MSE}(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}(s_{iy}^2 + r_i^2 s_{ix}^2 - 2r_i s_{ixy}).

Also

\widehat{MSE}(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i(n_i - 1)}\sum_{j=1}^{n_i}(y_{ij} - r_i x_{ij})^2.
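The separate ratio estimator can be sketched as follows (the data structure and names are ours; each stratum carries its own SRSWOR sample and its known stratum mean):

```python
def separate_ratio_estimate(strata):
    """Separate ratio estimator: sum_i w_i * (ybar_i/xbar_i) * Xbar_i,
    where w_i = N_i / N.  Each stratum dict holds the sample ('y', 'x'),
    the stratum size 'N_i' and the known stratum mean 'Xbar_i'."""
    N = sum(s["N_i"] for s in strata)
    total = 0.0
    for s in strata:
        ybar_i = sum(s["y"]) / len(s["y"])
        xbar_i = sum(s["x"]) / len(s["x"])
        total += (s["N_i"] / N) * (ybar_i / xbar_i) * s["Xbar_i"]
    return total

strata = [
    {"y": [2.0, 4.0], "x": [1.0, 2.0], "N_i": 60, "Xbar_i": 3.0},
    {"y": [10.0, 14.0], "x": [5.0, 7.0], "N_i": 40, "Xbar_i": 6.0},
]
Y_Rs = separate_ratio_estimate(strata)
```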
Properties of combined ratio estimator:
Here

\hat{\bar{Y}}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X} = \frac{\sum_{i=1}^{k} w_i\bar{y}_i}{\sum_{i=1}^{k} w_i\bar{x}_i}\bar{X} = \hat{R}_c\bar{X}.

It is difficult to find the exact expressions for the bias and mean squared error of \hat{\bar{Y}}_{Rc}, so we find their approximate expressions.

Define

\varepsilon_1 = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \quad \varepsilon_2 = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}.

Then

E(\varepsilon_1) = 0, \quad E(\varepsilon_2) = 0,

E(\varepsilon_1^2) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_i n_i}\,\frac{w_i^2 S_{iY}^2}{\bar{Y}^2} = \sum_{i=1}^{k}\frac{f_i}{n_i}\,\frac{w_i^2 S_{iY}^2}{\bar{Y}^2} \quad \left(recall \; that \; in \; the \; case \; of \; \hat{\bar{Y}}_R, \; E(\varepsilon_0^2) = \frac{f}{n}\,\frac{S_Y^2}{\bar{Y}^2} = \frac{f}{n}C_Y^2\right),

E(\varepsilon_2^2) = \sum_{i=1}^{k}\frac{f_i}{n_i}\,\frac{w_i^2 S_{iX}^2}{\bar{X}^2},

E(\varepsilon_1\varepsilon_2) = \sum_{i=1}^{k}\frac{f_i}{n_i}\,\frac{w_i^2 S_{iXY}}{\bar{X}\bar{Y}}.

Thus, assuming |\varepsilon_2| < 1,

\hat{\bar{Y}}_{Rc} = \frac{(1+\varepsilon_1)\bar{Y}}{(1+\varepsilon_2)\bar{X}}\bar{X}
 = \bar{Y}(1+\varepsilon_1)(1-\varepsilon_2+\varepsilon_2^2-\ldots)
 = \bar{Y}(1+\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2-\ldots).

Retaining the terms up to order two, for the same reason as in the case of \hat{\bar{Y}}_R,

\hat{\bar{Y}}_{Rc} \approx \bar{Y}(1+\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2)
\hat{\bar{Y}}_{Rc} - \bar{Y} \approx \bar{Y}(\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2).

The approximate bias of \hat{\bar{Y}}_{Rc} up to the second order of approximation is

Bias(\hat{\bar{Y}}_{Rc}) = E(\hat{\bar{Y}}_{Rc} - \bar{Y})
 \approx \bar{Y} E(\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2)
 = \bar{Y}\left[0 - 0 - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_2^2)\right]
 = \bar{Y}\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar{X}^2} - \frac{S_{iXY}}{\bar{X}\bar{Y}}\right)
 = \bar{Y}\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar{X}^2} - \frac{\rho_i S_{iX} S_{iY}}{\bar{X}\bar{Y}}\right)
 = \frac{\bar{Y}}{\bar{X}}\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2 S_{iX}\left(\frac{S_{iX}}{\bar{X}} - \frac{\rho_i S_{iY}}{\bar{Y}}\right)
 = R\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2 S_{iX}(C_{iX} - \rho_i C_{iY})

where R = \frac{\bar{Y}}{\bar{X}}, \rho_i is the correlation coefficient between the observations on Y and X in the ith stratum, and C_{iX} and C_{iY} are the coefficients of variation of X and Y respectively in the ith stratum.

The mean squared error up to the second order of approximation is

MSE(\hat{\bar{Y}}_{Rc}) = E(\hat{\bar{Y}}_{Rc} - \bar{Y})^2
 \approx \bar{Y}^2 E(\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2)^2
 \approx \bar{Y}^2 E(\varepsilon_1^2+\varepsilon_2^2-2\varepsilon_1\varepsilon_2)
 = \bar{Y}^2\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar{X}^2} + \frac{S_{iY}^2}{\bar{Y}^2} - \frac{2S_{iXY}}{\bar{X}\bar{Y}}\right)
 = \bar{Y}^2\sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar{X}^2} + \frac{S_{iY}^2}{\bar{Y}^2} - \frac{2\rho_i S_{iX} S_{iY}}{\bar{X}\bar{Y}}\right)
 = \sum_{i=1}^{k}\frac{f_i}{n_i}w_i^2\left(R^2 S_{iX}^2 + S_{iY}^2 - 2\rho_i R S_{iX} S_{iY}\right).

An estimate of MSE(\hat{\bar{Y}}_{Rc}) can be obtained by replacing S_{iX}^2, S_{iY}^2 and S_{iXY} by their unbiased estimators s_{ix}^2, s_{iy}^2 and s_{ixy} respectively, and R = \frac{\bar{Y}}{\bar{X}} by r = \frac{\bar{y}_{st}}{\bar{x}_{st}}. Thus the following estimate is obtained:

\widehat{MSE}(\hat{\bar{Y}}_{Rc}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left(r^2 s_{ix}^2 + s_{iy}^2 - 2r s_{ixy}\right).
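The combined ratio estimator needs only the overall population mean \bar{X}, not the individual stratum means. A sketch parallel to the separate-estimator one (names ours):

```python
def combined_ratio_estimate(strata, Xbar):
    """Combined ratio estimator: (ybar_st / xbar_st) * Xbar with
    ybar_st = sum_i w_i*ybar_i and xbar_st = sum_i w_i*xbar_i."""
    N = sum(s["N_i"] for s in strata)
    ybar_st = sum((s["N_i"] / N) * sum(s["y"]) / len(s["y"]) for s in strata)
    xbar_st = sum((s["N_i"] / N) * sum(s["x"]) / len(s["x"]) for s in strata)
    return (ybar_st / xbar_st) * Xbar

strata = [
    {"y": [2.0, 4.0], "x": [1.0, 2.0], "N_i": 60},
    {"y": [10.0, 14.0], "x": [5.0, 7.0], "N_i": 40},
]
Y_Rc = combined_ratio_estimate(strata, Xbar=4.0)
```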

Comparison of combined and separate ratio estimators
An obvious question is which of the estimators \hat{\bar{Y}}_{Rs} or \hat{\bar{Y}}_{Rc} is better, so we compare their MSEs. Note that the only difference in the terms of these MSEs is in the form of the ratio: R_i = \bar{Y}_i/\bar{X}_i appears in MSE(\hat{\bar{Y}}_{Rs}) whereas R = \bar{Y}/\bar{X} appears in MSE(\hat{\bar{Y}}_{Rc}). Thus

\Delta = MSE(\hat{\bar{Y}}_{Rc}) - MSE(\hat{\bar{Y}}_{Rs})
 = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left[(R^2 - R_i^2)S_{iX}^2 + 2(R_i - R)\rho_i S_{iX} S_{iY}\right]
 = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left[(R - R_i)^2 S_{iX}^2 + 2(R - R_i)(R_i S_{iX}^2 - \rho_i S_{iX} S_{iY})\right].

The difference \Delta depends on
(i) the magnitude of the difference between the stratum ratios (R_i) and the whole-population ratio (R);
(ii) the value of (R_i S_{iX}^2 - \rho_i S_{iX} S_{iY}), which is usually small and vanishes when the regression line of y on x is linear and passes through the origin within each stratum, since then

R_i S_{iX}^2 - \rho_i S_{iX} S_{iY} = 0, \quad i.e., \quad R_i = \frac{\rho_i S_{iX} S_{iY}}{S_{iX}^2},

which is the slope parameter of the regression of y on x in the ith stratum. In such a case

MSE(\hat{\bar{Y}}_{Rc}) > MSE(\hat{\bar{Y}}_{Rs})

but

|Bias(\hat{\bar{Y}}_{Rc})| < |Bias(\hat{\bar{Y}}_{Rs})|.

So unless R_i varies considerably, the use of \hat{\bar{Y}}_{Rc} would provide an estimate of \bar{Y} with negligible bias and precision as good as \hat{\bar{Y}}_{Rs}.

- If R_i \neq R, \hat{\bar{Y}}_{Rs} can be more precise but its bias may be large.
- If R_i \approx R, \hat{\bar{Y}}_{Rc} can be as precise as \hat{\bar{Y}}_{Rs} but its bias will be small. It also does not require knowledge of \bar{X}_1, \bar{X}_2, ..., \bar{X}_k.

Ratio estimators with reduced bias:
Ratio-type estimators that are unbiased or have smaller bias than \hat{R}, \hat{\bar{Y}}_R or \hat{Y}_{R(tot)} are useful in sample surveys. There are several approaches to derive such estimators. We consider here two such approaches:

1. Unbiased ratio-type estimators:

Under SRS, the ratio estimator of the population mean \bar{Y} has the form \frac{\bar{y}}{\bar{x}}\bar{X}. As an alternative to this, we consider the following estimator of the population mean:

\hat{\bar{Y}}_{R0} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}\bar{X}.

Let R_i = \frac{Y_i}{X_i}, i = 1, 2, ..., N, and let \bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i denote the mean of the sampled ratios r_i = y_i/x_i. Then

\hat{\bar{Y}}_{R0} = \bar{r}\bar{X}

and

Bias(\hat{\bar{Y}}_{R0}) = E(\hat{\bar{Y}}_{R0}) - \bar{Y} = E(\bar{r})\bar{X} - \bar{Y}.

Since under SRSWOR

E(\bar{r}) = \frac{1}{N}\sum_{i=1}^{N} R_i = \bar{R},

we get Bias(\hat{\bar{Y}}_{R0}) = \bar{R}\bar{X} - \bar{Y}.

Using the result that under SRSWOR, Cov(\bar{x}, \bar{y}) = \frac{N-n}{Nn}S_{XY}, it also follows that

Cov(\bar{r}, \bar{x}) = \frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(R_i - \bar{R})(X_i - \bar{X})
 = \frac{N-n}{Nn}\,\frac{1}{N-1}\left(\sum_{i=1}^{N} R_i X_i - N\bar{R}\bar{X}\right)
 = \frac{N-n}{Nn}\,\frac{1}{N-1}\left(\sum_{i=1}^{N}\frac{Y_i}{X_i}X_i - N\bar{R}\bar{X}\right)
 = \frac{N-n}{Nn}\,\frac{1}{N-1}\left(N\bar{Y} - N\bar{R}\bar{X}\right)
 = \frac{N-n}{n(N-1)}\left[-Bias(\hat{\bar{Y}}_{R0})\right].
Thus, using the result that in SRSWOR, Cov(\bar{x}, \bar{y}) = \frac{N-n}{Nn}S_{XY}, and therefore Cov(\bar{r}, \bar{x}) = \frac{N-n}{Nn}S_{RX}, we have

Bias(\hat{\bar{Y}}_{R0}) = -\frac{n(N-1)}{N-n}Cov(\bar{r}, \bar{x})
 = -\frac{n(N-1)}{N-n}\,\frac{N-n}{Nn}S_{RX}
 = -\left(\frac{N-1}{N}\right)S_{RX}

where S_{RX} = \frac{1}{N-1}\sum_{i=1}^{N}(R_i - \bar{R})(X_i - \bar{X}).
The following result helps in obtaining an unbiased estimator of the population mean. Under the SRSWOR set-up,

E(s_{xy}) = S_{xy}

where s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) and S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y}).

So an unbiased estimator of the bias Bias(\hat{\bar{Y}}_{R0}) = -\frac{N-1}{N}S_{RX} is obtained as

\widehat{Bias}(\hat{\bar{Y}}_{R0}) = -\frac{N-1}{N}s_{rx}
 = -\frac{N-1}{N(n-1)}\sum_{i=1}^{n}(r_i - \bar{r})(x_i - \bar{x})
 = -\frac{N-1}{N(n-1)}\left(\sum_{i=1}^{n} r_i x_i - n\bar{r}\bar{x}\right)
 = -\frac{N-1}{N(n-1)}\left(\sum_{i=1}^{n}\frac{y_i}{x_i}x_i - n\bar{r}\bar{x}\right)
 = -\frac{N-1}{N(n-1)}(n\bar{y} - n\bar{r}\bar{x})
 = -\frac{n(N-1)}{N(n-1)}(\bar{y} - \bar{r}\bar{x}).

So

\widehat{Bias}(\hat{\bar{Y}}_{R0}) = -\frac{n(N-1)}{N(n-1)}(\bar{y} - \bar{r}\bar{x})

satisfies E\left[\widehat{Bias}(\hat{\bar{Y}}_{R0})\right] = E(\hat{\bar{Y}}_{R0} - \bar{Y}). Thus

E\left[\hat{\bar{Y}}_{R0} - \widehat{Bias}(\hat{\bar{Y}}_{R0})\right] = \bar{Y}

or

E\left[\hat{\bar{Y}}_{R0} + \frac{n(N-1)}{N(n-1)}(\bar{y} - \bar{r}\bar{x})\right] = \bar{Y}.

Thus

\hat{\bar{Y}}_{R0} + \frac{n(N-1)}{N(n-1)}(\bar{y} - \bar{r}\bar{x}) = \bar{r}\bar{X} + \frac{n(N-1)}{N(n-1)}(\bar{y} - \bar{r}\bar{x})

is an unbiased estimator of the population mean.
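The bias-corrected estimator above can be sketched as follows (names ours; note that \bar{y} - \bar{r}\bar{x}, and with it the correction, vanishes when y is exactly proportional to x):

```python
def unbiased_ratio_type_estimate(y, x, X_bar, N):
    """rbar*Xbar plus the bias correction n(N-1)/(N(n-1)) * (ybar - rbar*xbar),
    where rbar is the mean of the individual ratios y_i/x_i."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    rbar = sum(yi / xi for yi, xi in zip(y, x)) / n
    correction = n * (N - 1) / (N * (n - 1)) * (ybar - rbar * xbar)
    return rbar * X_bar + correction

# exactly proportional data: every y_i/x_i = 3, so the estimate is 3*X_bar
est = unbiased_ratio_type_estimate(y=[3.0, 6.0, 12.0],
                                   x=[1.0, 2.0, 4.0], X_bar=2.5, N=30)
```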

2. Jackknife method for obtaining a ratio estimate with lower bias

The jackknife method is used to get rid of the term of order 1/n in the bias of an estimator. Suppose E(\hat{R}) can be expanded, after ignoring the finite population correction, as

E(\hat{R}) = R + \frac{a_1}{n} + \frac{a_2}{n^2} + \ldots

Let n = mg and suppose the sample is divided at random into g groups, each of size m. Then

E(g\hat{R}) = gR + \frac{ga_1}{gm} + \frac{ga_2}{g^2m^2} + \ldots = gR + \frac{a_1}{m} + \frac{a_2}{gm^2} + \ldots

Let \hat{R}_i^* = \frac{\sum^* y_j}{\sum^* x_j}, where \sum^* denotes the summation over all values of the sample except the ith group. So \hat{R}_i^* is based on a simple random sample of size m(g-1), and we can express

E(\hat{R}_i^*) = R + \frac{a_1}{m(g-1)} + \frac{a_2}{m^2(g-1)^2} + \ldots

or

E\left[(g-1)\hat{R}_i^*\right] = (g-1)R + \frac{a_1}{m} + \frac{a_2}{m^2(g-1)} + \ldots

Thus

E\left[g\hat{R} - (g-1)\hat{R}_i^*\right] = R + \frac{a_2}{gm^2} - \frac{a_2}{m^2(g-1)} + \ldots = R - \frac{a_2}{g(g-1)m^2} + \ldots

or

E\left[g\hat{R} - (g-1)\hat{R}_i^*\right] = R - \frac{a_2}{n^2}\,\frac{g}{g-1} + \ldots

Hence the bias of \left[g\hat{R} - (g-1)\hat{R}_i^*\right] is of order \frac{1}{n^2}.

Now g estimates of this form can be obtained, one for each group. The jackknife or Quenouille estimator is the average of these g estimators:

\hat{R}_Q = g\hat{R} - (g-1)\frac{\sum_{i=1}^{g}\hat{R}_i^*}{g}.
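A sketch of Quenouille's estimator, dropping one group of m units at a time (the grouping by consecutive blocks is our simplification; in practice the groups are formed at random):

```python
def quenouille_ratio(y, x, g):
    """Jackknife (Quenouille) ratio estimator:
    g*Rhat - (g-1) * mean of the leave-one-group-out ratios."""
    n = len(y)
    assert n % g == 0, "sample size must be divisible by the number of groups"
    m = n // g
    Rhat = sum(y) / sum(x)                     # equals ybar/xbar
    R_stars = []
    for i in range(g):
        # drop the ith block of m units
        ys = y[:i * m] + y[(i + 1) * m:]
        xs = x[:i * m] + x[(i + 1) * m:]
        R_stars.append(sum(ys) / sum(xs))
    return g * Rhat - (g - 1) * sum(R_stars) / g

# for exactly proportional data every ratio equals 2, so R_Q = 2
R_Q = quenouille_ratio(y=[2.0, 4.0, 6.0, 8.0], x=[1.0, 2.0, 3.0, 4.0], g=2)
```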
g

Product method of estimation:

The ratio estimator is more efficient than the sample mean under SRSWOR if \rho > \frac{1}{2}\frac{C_x}{C_y} when R > 0, which is usually the case. This shows that if the auxiliary information is such that \rho < -\frac{1}{2}\frac{C_x}{C_y}, then we cannot use the ratio method of estimation to improve the sample mean as an estimator of the population mean. So there is a need for another type of estimator which also makes use of information on the auxiliary variable X. The product estimator is an attempt in this direction.

The product estimator of the population mean \bar{Y} is defined as

\hat{\bar{Y}}_P = \frac{\bar{y}\bar{x}}{\bar{X}},

assuming the population mean \bar{X} to be known.

We now derive the bias and variance of \hat{\bar{Y}}_P. Let

\varepsilon_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \quad \varepsilon_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}.

(i) Bias of \hat{\bar{Y}}_P:

We write \hat{\bar{Y}}_P as

\hat{\bar{Y}}_P = \frac{\bar{y}\bar{x}}{\bar{X}} = \bar{Y}(1+\varepsilon_0)(1+\varepsilon_1) = \bar{Y}(1+\varepsilon_0+\varepsilon_1+\varepsilon_0\varepsilon_1).

Taking expectations, we obtain the bias of \hat{\bar{Y}}_P as

Bias(\hat{\bar{Y}}_P) = \bar{Y}E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar{X}}Cov(\bar{y}, \bar{x}) = \frac{f}{n\bar{X}}S_{xy},

which shows that the bias of \hat{\bar{Y}}_P decreases as n increases. The bias of \hat{\bar{Y}}_P can be estimated by

\widehat{Bias}(\hat{\bar{Y}}_P) = \frac{f}{n\bar{X}}s_{xy}.

(ii) MSE of \hat{\bar{Y}}_P:

Writing \hat{\bar{Y}}_P in terms of \varepsilon_0 and \varepsilon_1, we find that the mean squared error of the product estimator \hat{\bar{Y}}_P up to the second order of approximation is given by

MSE(\hat{\bar{Y}}_P) = E(\hat{\bar{Y}}_P - \bar{Y})^2 = \bar{Y}^2 E(\varepsilon_0+\varepsilon_1+\varepsilon_0\varepsilon_1)^2 \approx \bar{Y}^2 E(\varepsilon_0^2+\varepsilon_1^2+2\varepsilon_0\varepsilon_1).

Here the terms in (\varepsilon_0, \varepsilon_1) of degree greater than two are assumed to be negligible. Using the expected values, we find that

MSE(\hat{\bar{Y}}_P) = \frac{f}{n}\left[S_Y^2 + R^2 S_X^2 + 2RS_{XY}\right].

(iii) Estimation of MSE of \hat{\bar{Y}}_P:

The mean squared error of \hat{\bar{Y}}_P can be estimated by

\widehat{MSE}(\hat{\bar{Y}}_P) = \frac{f}{n}\left[s_y^2 + r^2 s_x^2 + 2rs_{xy}\right]

where r = \bar{y}/\bar{x}.
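A sketch of the product estimator and its estimated MSE (function name and toy numbers are ours):

```python
def product_estimate(y, x, X_bar, N):
    """Product estimator Yp = ybar*xbar/Xbar and its estimated MSE
    (f/n)(sy2 + r^2*sx2 + 2*r*sxy), with r = ybar/xbar."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    Yp = ybar * xbar / X_bar
    r = ybar / xbar
    sy2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sx2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    f = (N - n) / N
    mse_hat = (f / n) * (sy2 + r ** 2 * sx2 + 2 * r * sxy)
    return Yp, mse_hat

Yp, mse_hat = product_estimate(y=[4.0, 6.0], x=[1.0, 3.0], X_bar=2.0, N=20)
```

Note the only change relative to the ratio estimator's MSE estimate is the sign of the covariance term, which is why the product form pays off when the correlation is strongly negative.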

(iv) Comparison with SRSWOR:

From the variance of the sample mean under SRSWOR and the MSE of the product estimator, we obtain

Var_{SRS}(\bar{y}) - MSE(\hat{\bar{Y}}_P) = -\frac{f}{n}RS_X(2\rho S_Y + RS_X),

where Var_{SRS}(\bar{y}) = \frac{f}{n}S_Y^2. This shows that \hat{\bar{Y}}_P is more efficient than the sample mean \bar{y} for

\rho < -\frac{1}{2}\frac{C_x}{C_y} \quad if \; R > 0

and for

\rho > -\frac{1}{2}\frac{C_x}{C_y} \quad if \; R < 0.

Multivariate Ratio Estimator
Let y be the study variable and X_1, X_2, ..., X_p be p auxiliary variables assumed to be correlated with y. Further, it is assumed that X_1, X_2, ..., X_p are independent. Let \bar{Y}, \bar{X}_1, \bar{X}_2, ..., \bar{X}_p be the population means of the variables y, X_1, X_2, ..., X_p. We assume that an SRSWOR of size n is selected from the population of N units. The following notation will be used:

S_i^2: the population mean sum of squares for the variable X_i,
s_i^2: the sample mean sum of squares for the variable X_i,
S_0^2: the population mean sum of squares for the study variable y,
s_0^2: the sample mean sum of squares for the study variable y,
C_i = \frac{S_i}{\bar{X}_i}: coefficient of variation of the variable X_i,
C_0 = \frac{S_0}{\bar{Y}}: coefficient of variation of the variable y,
\rho_i = \frac{S_{iy}}{S_i S_0}: coefficient of correlation between y and X_i,
\hat{\bar{Y}}_{Ri} = \frac{\bar{y}}{\bar{x}_i}\bar{X}_i: ratio estimator of \bar{Y} based on X_i,

where i = 1, 2, ..., p. Then the multivariate ratio estimator of \bar{Y} is given as

\hat{\bar{Y}}_{MR} = \sum_{i=1}^{p} w_i\hat{\bar{Y}}_{Ri}, \quad \sum_{i=1}^{p} w_i = 1,
 = \bar{y}\sum_{i=1}^{p} w_i\frac{\bar{X}_i}{\bar{x}_i}.

(i) Bias of the multivariate ratio estimator:

The approximate bias of \hat{\bar{Y}}_{Ri} up to the second order of approximation is

Bias(\hat{\bar{Y}}_{Ri}) = \frac{f}{n}\bar{Y}(C_i^2 - \rho_i C_i C_0).

The bias of \hat{\bar{Y}}_{MR} is obtained as

Bias(\hat{\bar{Y}}_{MR}) = \sum_{i=1}^{p} w_i\,\frac{\bar{Y}f}{n}(C_i^2 - \rho_i C_i C_0) = \frac{\bar{Y}f}{n}\sum_{i=1}^{p} w_i C_i(C_i - \rho_i C_0).

(ii) Variance of the multivariate ratio estimator:
The variance of \hat{\bar{Y}}_{Ri} up to the second order of approximation is given by

Var(\hat{\bar{Y}}_{Ri}) = \frac{f}{n}\bar{Y}^2(C_0^2 + C_i^2 - 2\rho_i C_0 C_i).

The variance of \hat{\bar{Y}}_{MR} up to the second order of approximation is obtained as

Var(\hat{\bar{Y}}_{MR}) = \frac{f}{n}\bar{Y}^2\sum_{i=1}^{p} w_i^2(C_0^2 + C_i^2 - 2\rho_i C_0 C_i).
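The multivariate ratio estimator is just a weighted mix of the single-auxiliary ratio estimators; a minimal sketch (names ours):

```python
def multivariate_ratio_estimate(ybar, xbars, Xbars, w):
    """Y_MR = ybar * sum_i w_i * Xbar_i / xbar_i, with sum(w) = 1."""
    assert abs(sum(w) - 1.0) < 1e-12, "weights must sum to 1"
    return ybar * sum(wi * Xi / xi for wi, Xi, xi in zip(w, Xbars, xbars))

# two auxiliary variables with equal weights
Y_MR = multivariate_ratio_estimate(ybar=10.0, xbars=[2.0, 5.0],
                                   Xbars=[4.0, 5.0], w=[0.5, 0.5])
```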
