
RATIO AND REGRESSION METHODS OF ESTIMATION

In the theory of simple random sampling, we considered estimators that use only the observed values of the characteristic under study. Often, the characteristic $y$ under study is highly correlated with an auxiliary characteristic $x$, and data on $x$ are either readily available or can easily be collected for all the units in the population. The knowledge of $x$, being additional information about the population under study, is termed auxiliary or supplementary information. In such situations it is customary to consider estimators of the characteristic $y$ that use the data on $x$ and are more efficient than estimators that use data on $y$ alone. Two such methods of estimation are:
i) Ratio method of estimation
ii) Regression method of estimation.

Ratio method of estimation


Notation
$y_i$: measurement of the main variable on the $i$-th unit of the population.
$x_i$: measurement of the auxiliary variable on the $i$-th unit of the population.

\[ Y = \sum_{i=1}^{N} y_i , \] the population total of $y$.

\[ \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{Y}{N} , \] the population mean of $y$.

\[ X = \sum_{i=1}^{N} x_i , \] the population total of $x$.

\[ \bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{X}{N} , \] the population mean of $x$.

\[ R = \frac{Y}{X} = \frac{\bar{Y}}{\bar{X}} , \] the ratio of the population totals or means of $y$ and $x$.

$\rho$: correlation coefficient between $y$ and $x$ in the population.

Suppose it is desired to estimate $Y$, $\bar{Y}$ or $R$ by drawing a simple random sample of $n$ units from the population. Based on the $n$ pairs of observations $(y_i, x_i)$, $i = 1, 2, \ldots, n$, let $\bar{y}$ and $\bar{x}$ be the sample means of $y$ and $x$ respectively, and suppose the population total $X$ or mean $\bar{X}$ is known. The ratio estimators of the population ratio $R$, the total $Y$, and the mean $\bar{Y}$ are defined by

\[ \hat{R} = \frac{\bar{y}}{\bar{x}} , \qquad \hat{Y}_R = \frac{\bar{y}}{\bar{x}}\, X = \hat{R}\, X , \qquad \hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\, \bar{X} = \hat{R}\, \bar{X} , \] respectively.
For example, if $y$ is the number of bullocks on a holding and $x$ its area in acres, the ratio $\hat{R}$ is an estimator of the number of bullocks per acre in the population. The product of $\hat{R}$ with $\bar{X}$ (the average size of a holding in acres) provides an estimator of $\bar{Y}$ (the average number of bullocks per holding in the population), i.e.

$y$: number of bullocks on a holding
$x$: area in acres of a holding
$\hat{R} = \bar{y}/\bar{x}$: estimate of the number of bullocks per acre in the population
$\hat{\bar{Y}}_R = \hat{R}\,\bar{X}$: estimate of the average number of bullocks per holding in the population
In sampling hospitals to estimate the number of patient-days of care during a particular
month, y may be the number of patient-days of care provided by a hospital during the month,
and x the number of beds in the hospital. As another example, y and x may denote the
values of the characteristic under study on two successive occasions, e.g. the acreage under a
crop during the current year and the previous year respectively.
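For concreteness, the following Python sketch computes the three ratio estimates from a small sample; the data, $N$ and $\bar{X}$ are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical srs of n = 5 holdings: y = bullocks, x = area in acres.
y = np.array([4.0, 6.0, 10.0, 9.0, 5.0])
x = np.array([10.0, 18.0, 25.0, 22.0, 12.0])

N = 200        # population size (assumed)
X_bar = 17.0   # known population mean of x (assumed)

R_hat = y.mean() / x.mean()   # estimate of bullocks per acre
Y_bar_R = R_hat * X_bar       # estimate of mean bullocks per holding
Y_tot_R = R_hat * N * X_bar   # estimate of total bullocks in the population
print(R_hat, Y_bar_R, Y_tot_R)
```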
Theorem: In srswor, for large $n$, $\hat{R} = \bar{y}/\bar{x}$ is approximately unbiased for the population ratio $R$ and has the approximate variance

\[ V(\hat{R}) = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \sum_{i=1}^{N} (y_i - R\,x_i)^2 , \]

where $f = n/N$ is the sampling fraction.

Proof: Consider

\[ \hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\,\bar{x}}{\bar{x}} \approx \frac{\bar{y} - R\,\bar{x}}{\bar{X}} , \] since for large $n$, $\bar{x} \approx \bar{X}$.

Under this condition,

\[ E(\hat{R} - R) = \frac{1}{\bar{X}}\, E(\bar{y} - R\,\bar{x}) = \frac{1}{\bar{X}}\, [E(\bar{y}) - R\, E(\bar{x})] = \frac{1}{\bar{X}} \Big( \bar{Y} - \frac{\bar{Y}}{\bar{X}}\, \bar{X} \Big) = 0 , \quad \Rightarrow\ E(\hat{R}) \approx R . \]

Alternative method

By definition,

\[ E(\hat{R}) = E\Big( \frac{\bar{y}}{\bar{x}} \Big) \neq E(\bar{y})\, E\Big( \frac{1}{\bar{x}} \Big) , \] since $\bar{y}$ and $\bar{x}$ are not independent but highly correlated.

If $n$ is large, we can take $\bar{x} \approx \bar{X}$; under this condition $E(\hat{R})$ reduces to

\[ E(\hat{R}) \approx \frac{1}{\bar{X}}\, E(\bar{y}) = \frac{\bar{Y}}{\bar{X}} = R . \]

This shows that for large $n$, $\hat{R}$ can be taken as an unbiased estimate of $R$.
To obtain the variance, we have

\[ V(\hat{R}) = E(\hat{R} - R)^2 = E\Big( \frac{\bar{y} - R\,\bar{x}}{\bar{x}} \Big)^2 \approx \frac{1}{\bar{X}^2}\, E(\bar{y} - R\,\bar{x})^2 . \]  (4.1)

Now consider the variate

\[ d_i = y_i - R\,x_i , \quad i = 1, 2, \ldots, N . \]

Let $\bar{d}$ and $\bar{D}$ be the sample mean and population mean of the variable $d$ respectively, where

\[ \bar{d} = \frac{1}{n}\sum_{i=1}^{n} (y_i - R\,x_i) = \bar{y} - R\,\bar{x} , \qquad \bar{D} = \frac{1}{N}\sum_{i=1}^{N} (y_i - R\,x_i) = \bar{Y} - R\,\bar{X} = 0 , \quad \text{since } R = \frac{\bar{Y}}{\bar{X}} . \]

As sampling is simple random, wor, then

\[ E(\bar{d}) = \bar{D} = 0 , \qquad V(\bar{d}) = \Big( \frac{1}{n} - \frac{1}{N} \Big) S_d^2 = \frac{1-f}{n}\, S_d^2 , \]  (4.2)

where

\[ S_d^2 = \frac{1}{N-1}\sum_{i=1}^{N} (d_i - \bar{D})^2 = \frac{1}{N-1}\sum_{i=1}^{N} d_i^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - R\,x_i)^2 \]  (4.3)

and

\[ V(\bar{d}) = E(\bar{d} - \bar{D})^2 = E(\bar{d})^2 = E(\bar{y} - R\,\bar{x})^2 . \]  (4.4)

In view of equations (4.1), (4.2), (4.3), and (4.4), we get

\[ V(\hat{R}) = \frac{V(\bar{d})}{\bar{X}^2} = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \sum_{i=1}^{N} (y_i - R\,x_i)^2 . \]

Alternative expressions of $V(\hat{R})$

i) In terms of the correlation coefficient: The correlation coefficient $\rho$ between $y$ and $x$ is defined by

\[ \rho = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{Y})^2 \, \sum_{i=1}^{N} (x_i - \bar{X})^2}} = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X})}{(N-1)\, S_y S_x} , \quad \text{as } S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{X})^2 , \]

or $\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) = (N-1)\,\rho\, S_y S_x$, and hence

\[ V(\hat{R}) = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \sum_{i=1}^{N} (y_i - R\,x_i)^2 = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \sum_{i=1}^{N} (y_i - R\,\bar{X} + R\,\bar{X} - R\,x_i)^2 \]

\[ = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \sum_{i=1}^{N} [\, (y_i - \bar{Y}) - R\,(x_i - \bar{X}) \,]^2 , \quad \text{since } R\,\bar{X} = \bar{Y} , \]

\[ = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \Big[ \sum_{i=1}^{N} (y_i - \bar{Y})^2 + R^2 \sum_{i=1}^{N} (x_i - \bar{X})^2 - 2R \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \Big] \]

\[ = \frac{1-f}{n\,(N-1)\,\bar{X}^2} \big[ (N-1) S_y^2 + (N-1) R^2 S_x^2 - 2R\,(N-1)\,\rho\, S_y S_x \big] = \frac{1-f}{n\,\bar{X}^2} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) \]

\[ = \frac{1-f}{n}\, R^2 \Big( \frac{S_y^2}{\bar{Y}^2} + \frac{S_x^2}{\bar{X}^2} - \frac{2\,\rho\, S_y S_x}{\bar{Y}\bar{X}} \Big) = \frac{1-f}{n}\, R^2 \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) , \]

where $C_y = S_y/\bar{Y}$ and $C_x = S_x/\bar{X}$ are the coefficients of variation of $y$ and $x$ respectively; thus $C_{yy} = C_y^2$ and $C_{xx} = C_x^2$ are the squares of the coefficients of variation and are also called relative variances.
ii) In terms of covariance: The covariance of $y$ and $x$ is defined by

\[ S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) = \rho\, S_y S_x , \] so that

\[ V(\hat{R}) = \frac{1-f}{n}\, R^2 \Big( \frac{S_y^2}{\bar{Y}^2} + \frac{S_x^2}{\bar{X}^2} - \frac{2\, S_{yx}}{\bar{Y}\bar{X}} \Big) = \frac{1-f}{n}\, R^2 \big( C_{yy} + C_{xx} - 2\, C_{yx} \big) , \]

where $C_{yx} = S_{yx}/(\bar{Y}\bar{X})$ is called the relative covariance.

Estimation of $V(\hat{R})$

Taking $\dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2$ as an estimate of $\dfrac{1}{N-1}\sum_{i=1}^{N} (y_i - R\,x_i)^2$, we obtain

\[ \hat{V}(\hat{R}) = v(\hat{R}) = \frac{1-f}{n\,(n-1)\,\bar{X}^2} \sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2 . \]

i) In terms of the correlation coefficient:

\[ v(\hat{R}) = \frac{1-f}{n\,\bar{X}^2} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, r\, s_y s_x \big) , \quad \text{where } r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{(n-1)\, s_y s_x} , \quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 . \]

ii) In terms of covariance:

\[ v(\hat{R}) = \frac{1-f}{n\,\bar{X}^2} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{yx} \big) , \quad \text{since } s_{yx} = r\, s_y s_x . \]
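These variance estimates are straightforward to compute; a sketch in Python follows (the two forms agree exactly, because $\bar{y} - \hat{R}\,\bar{x} = 0$, so the deviations $y_i - \hat{R}\,x_i$ have sample mean zero).

```python
import numpy as np

def v_R_hat(y, x, N, X_bar):
    """Estimated variance of R_hat = ybar/xbar under srswor,
    in both of the equivalent forms above (a sketch)."""
    n = len(y)
    f = n / N
    R_hat = y.mean() / x.mean()
    d = y - R_hat * x                      # deviations y_i - R_hat x_i
    v1 = (1 - f) / (n * (n - 1) * X_bar**2) * np.sum(d**2)
    s_y2, s_x2 = y.var(ddof=1), x.var(ddof=1)
    s_yx = np.cov(y, x)[0, 1]              # sample covariance (n-1 divisor)
    v2 = (1 - f) / (n * X_bar**2) * (s_y2 + R_hat**2 * s_x2 - 2 * R_hat * s_yx)
    return v1, v2                          # identical up to rounding
```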
Corollary: $\hat{\bar{Y}}_R = \hat{R}\,\bar{X} = \dfrac{\bar{y}}{\bar{x}}\,\bar{X}$ is approximately unbiased for estimating the population mean, with approximate variance

\[ V(\hat{\bar{Y}}_R) = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} (y_i - R\,x_i)^2 , \] provided the sample size $n$ is large.

Proof: For large $n$ we have $E(\hat{R}) \approx R$, so that $E(\hat{\bar{Y}}_R) = E(\hat{R}\,\bar{X}) = \bar{X}\, E(\hat{R}) \approx \bar{X} R = \bar{Y}$.

Further,

\[ V(\hat{\bar{Y}}_R) = V(\hat{R}\,\bar{X}) = \bar{X}^2\, V(\hat{R}) = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} (y_i - R\,x_i)^2 . \]

Alternative expressions of $V(\hat{\bar{Y}}_R)$

i) In terms of the correlation coefficient: By definition,

\[ V(\hat{\bar{Y}}_R) = \bar{X}^2\, V(\hat{R}) = \frac{1-f}{n} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) = \frac{1-f}{n}\, \bar{Y}^2 \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) . \]

ii) In terms of covariance:

\[ V(\hat{\bar{Y}}_R) = \bar{X}^2\, V(\hat{R}) = \frac{1-f}{n} \big( S_y^2 + R^2 S_x^2 - 2R\, S_{yx} \big) = \frac{1-f}{n}\, \bar{Y}^2 \big( C_{yy} + C_{xx} - 2\, C_{yx} \big) . \]

Estimation of $V(\hat{\bar{Y}}_R)$

Taking $\dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2$ as an estimate of $\dfrac{1}{N-1}\sum_{i=1}^{N} (y_i - R\,x_i)^2$, we obtain

\[ \hat{V}(\hat{\bar{Y}}_R) = \bar{X}^2\, v(\hat{R}) = \frac{1-f}{n\,(n-1)} \sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2 = \frac{1-f}{n} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, r\, s_y s_x \big) . \]

Also,

\[ \hat{V}(\hat{\bar{Y}}_R) = \frac{1-f}{n} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{yx} \big) . \]

Corollary: $\hat{Y}_R = \hat{R}\,X = \hat{R}\,N\bar{X}$ is approximately unbiased for estimating the population total, with approximate variance

\[ V(\hat{Y}_R) = \frac{N^2 (1-f)}{n\,(N-1)} \sum_{i=1}^{N} (y_i - R\,x_i)^2 , \] provided the sample size $n$ is large.

Proof: If $n$ is large, $E(\hat{R}) \approx R$, so

\[ E(\hat{Y}_R) = E(\hat{R}\,N\bar{X}) = N\bar{X}\, E(\hat{R}) \approx N\bar{X}\, R = N\bar{Y} = Y , \]

and

\[ V(\hat{Y}_R) = V(\hat{R}\,N\bar{X}) = N^2 \bar{X}^2\, V(\hat{R}) = \frac{N^2 (1-f)}{n\,(N-1)} \sum_{i=1}^{N} (y_i - R\,x_i)^2 . \]

Alternative expressions of $V(\hat{Y}_R)$

i) In terms of the correlation coefficient:

\[ V(\hat{Y}_R) = N^2 \bar{X}^2\, V(\hat{R}) = \frac{N^2 (1-f)}{n\,(N-1)} \sum_{i=1}^{N} (y_i - R\,x_i)^2 = \frac{1-f}{n}\, Y^2 \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) . \]

ii) In terms of covariance:

\[ V(\hat{Y}_R) = N^2 \bar{X}^2\, V(\hat{R}) = \frac{N^2 (1-f)}{n} \big( S_y^2 + R^2 S_x^2 - 2R\, S_{yx} \big) = \frac{1-f}{n}\, Y^2 \big( C_{yy} + C_{xx} - 2\, C_{yx} \big) . \]

Estimation of $V(\hat{Y}_R)$

Taking $\dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2$ as an estimate of $\dfrac{1}{N-1}\sum_{i=1}^{N} (y_i - R\,x_i)^2$, we obtain

\[ \hat{V}(\hat{Y}_R) = N^2 \bar{X}^2\, \hat{V}(\hat{R}) = \frac{N^2 (1-f)}{n\,(n-1)} \sum_{i=1}^{n} (y_i - \hat{R}\,x_i)^2 = \frac{N^2 (1-f)}{n} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, r\, s_y s_x \big) . \]

Also,

\[ \hat{V}(\hat{Y}_R) = \frac{N^2 (1-f)}{n} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{yx} \big) . \]
Corollary: Show that, to the first order of approximation,

\[ (CV)^2 = \frac{1-f}{n} \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) . \]

Proof: For any estimator $t$, $CV(t) = \sqrt{V(t)}\,/\,E(t)$. We know that

\[ V(\hat{R}) = \frac{1-f}{n}\, R^2 \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) , \] so that

\[ [CV(\hat{R})]^2 = \frac{V(\hat{R})}{R^2} = \frac{1-f}{n} \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) , \quad \text{since } R \approx E(\hat{R}) . \]

Similarly, we obtain the same $(CV)^2$ for $\hat{Y}_R$ and $\hat{\bar{Y}}_R$.

Note: The quantity $(CV)^2$ is called the relative variance and is the same for all three estimates $\hat{R}$, $\hat{Y}_R$ and $\hat{\bar{Y}}_R$.

Corollary: If $C_x = C_y = C$, show that the relative variance is $V\Big( \dfrac{\hat{R}}{R} \Big) = \dfrac{2(1-f)}{n}\, C^2 (1 - \rho)$.

Proof: By definition,

\[ V\Big( \frac{\hat{R}}{R} \Big) = \frac{1}{R^2}\, V(\hat{R}) = \frac{1}{R^2} \Big[ \frac{1-f}{n}\, R^2 \big( C_{yy} + C_{xx} - 2\rho\, C_y C_x \big) \Big] = \frac{1-f}{n} \big( C^2 + C^2 - 2\rho\, C^2 \big) = \frac{2(1-f)}{n}\, C^2 (1 - \rho) . \]
Example: In a locality there are 50 lanes. In 2005, 6250 persons were living there. Recently, a sample of 5 lanes showed the number of residents changing as follows:

Lane number            :   1    2    3    4    5
Persons living in 2005 : 100  150  160  200  140
Persons living recently: 120  160  200  170  150

Estimate the standard error of the number of persons residing in the locality using
i) the recent sample only;
ii) the information about 2005 as well as the recent sample.

Solution:

i) $\bar{y} = 160$, so $\hat{Y} = N\bar{y} = 8000$, and $V(\hat{Y}) = N^2 \Big( \dfrac{N-n}{Nn} \Big) S^2$.

Since $S^2$ is unknown, we use its estimate $s_y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = 850$; hence,

\[ \hat{V}(\hat{Y}) = 2500 \times \frac{45}{250} \times 850 = 382500 , \qquad SE(\hat{Y}) = \sqrt{\hat{V}(\hat{Y})} \approx 618 . \]

ii) Using the ratio method, with $x$ the 2005 count and $X = 6250$,

\[ \hat{Y}_R = \hat{R}\, X = \frac{160}{150} \times 6250 \approx 6667 , \]

and

\[ v(\hat{Y}_R) = \frac{N^2 (1-f)}{n} \big( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{yx} \big) , \]

where $s_x^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 1300$ and $s_{yx} = \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = 750$.

Therefore, with $\hat{R} = 160/150$,

\[ v(\hat{Y}_R) = 450 \times \Big( 850 + \frac{256}{225} \times 1300 - 2 \times \frac{16}{15} \times 750 \Big) = 450 \times 729.11 \approx 328100 , \]

and hence $SE(\hat{Y}_R) \approx 573$, noticeably smaller than the standard error based on the recent sample alone.
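The arithmetic can be verified with a few lines of Python:

```python
import numpy as np

x = np.array([100., 150., 160., 200., 140.])   # persons in 2005
y = np.array([120., 160., 200., 170., 150.])   # persons recently
N, n, X_tot = 50, 5, 6250.0
f = n / N

Y_hat = N * y.mean()                            # 8000
v_Y = N**2 * (1 - f) / n * y.var(ddof=1)        # 450 * 850 = 382500

R_hat = y.mean() / x.mean()
Y_hat_R = R_hat * X_tot                         # ~6667
s_yx = np.cov(y, x)[0, 1]                       # 750
v_YR = N**2 * (1 - f) / n * (
    y.var(ddof=1) + R_hat**2 * x.var(ddof=1) - 2 * R_hat * s_yx)
print(round(np.sqrt(v_Y)), round(np.sqrt(v_YR)))  # 618 and 573
```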

Comparison of the ratio estimate with the mean per unit estimate

The conditions under which the ratio estimator is superior to the mean per unit estimator are worked out by comparing their variances. In srswor, the variance of the mean per unit is

\[ V(\bar{y}) = \Big( \frac{1}{n} - \frac{1}{N} \Big) S^2 = \frac{1-f}{n}\, S_y^2 , \]

and the variance of the mean based on the ratio method is

\[ V(\hat{\bar{Y}}_R) = \frac{1-f}{n} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) . \]

Obviously, the ratio estimate $\hat{\bar{Y}}_R$ will be more precise than $\bar{y}$ if and only if $V(\hat{\bar{Y}}_R) < V(\bar{y})$, i.e.

\[ \frac{1-f}{n} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) < \frac{1-f}{n}\, S_y^2 \]

\[ \Leftrightarrow\ R^2 S_x^2 < 2\rho\, R\, S_y S_x \quad \text{or} \quad \rho > \frac{R\, S_x}{2 S_y} = \frac{1}{2}\, \frac{S_x / \bar{X}}{S_y / \bar{Y}} = \frac{1}{2}\, \frac{CV(x)}{CV(y)} . \]

Theorem: In simple random sampling, the bias of the ratio estimator $\hat{R}$ is

\[ B(\hat{R}) = -\, \frac{Cov(\hat{R}, \bar{x})}{\bar{X}} . \]

Proof: We know that

\[ Cov(\hat{R}, \bar{x}) = E(\hat{R}\,\bar{x}) - E(\bar{x})\, E(\hat{R}) = E(\bar{y}) - \bar{X}\, E(\hat{R}) = \bar{Y} - \bar{X}\, E(\hat{R}) , \]

so that

\[ E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{1}{\bar{X}}\, Cov(\hat{R}, \bar{x}) = R - \frac{1}{\bar{X}}\, Cov(\hat{R}, \bar{x}) , \]

and hence

\[ B(\hat{R}) = E(\hat{R}) - R = -\, \frac{1}{\bar{X}}\, Cov(\hat{R}, \bar{x}) . \]

Corollary: Prove that $\dfrac{|B(\hat{R})|}{\sigma_{\hat{R}}} \le CV(\bar{x})$.

Proof: We know that

\[ B(\hat{R}) = -\, \frac{1}{\bar{X}}\, \rho_{\hat{R},\bar{x}}\, \sigma_{\hat{R}}\, \sigma_{\bar{x}} \quad \Rightarrow \quad |B(\hat{R})| \le \frac{\sigma_{\hat{R}}\, \sigma_{\bar{x}}}{\bar{X}} , \] since the correlation between $\hat{R}$ and $\bar{x}$ cannot exceed 1 in absolute value.

Hence,

\[ \frac{|B(\hat{R})|}{\sigma_{\hat{R}}} \le \frac{\sigma_{\bar{x}}}{\bar{X}} = CV(\bar{x}) . \]

Corollary: Prove that $\hat{R}$ is an unbiased estimator of $R$ if $\rho(\hat{R}, \bar{x}) = 0$, where $\rho$ stands for the correlation coefficient.

Proof: We know that the bias of the ratio estimator $\hat{R}$ is

\[ E(\hat{R}) - R = -\, \frac{1}{\bar{X}}\, Cov(\hat{R}, \bar{x}) = -\, \frac{1}{\bar{X}}\, \rho_{\hat{R},\bar{x}}\, \sigma_{\hat{R}}\, \sigma_{\bar{x}} = 0 \quad \text{when } \rho_{\hat{R},\bar{x}} = 0 . \]
Theorem: If $(y_i, x_i)$ is a pair of variates defined on every unit of the population and $\bar{y}$, $\bar{x}$ are the corresponding means from a simple random sample of size $n$, then their covariance is

\[ E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{N-n}{nN}\, \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) . \]

Proof: Consider the variate

\[ u_i = y_i + x_i , \quad i = 1, 2, \ldots, N . \]

Let $\bar{u}$ and $\bar{U}$ be the sample mean and population mean of the variable $u$ respectively, where $\bar{u} = \bar{y} + \bar{x}$ and $\bar{U} = \bar{Y} + \bar{X}$. As sampling is simple random, wor, then

\[ E(\bar{u}) = \bar{U} , \qquad V(\bar{u}) = E(\bar{u} - \bar{U})^2 = \frac{N-n}{nN}\, S_u^2 , \quad \text{where } S_u^2 = \frac{1}{N-1}\sum_{i=1}^{N} (u_i - \bar{U})^2 . \]

That is,

\[ E(\bar{y} + \bar{x} - \bar{Y} - \bar{X})^2 = \frac{N-n}{nN}\, \frac{1}{N-1} \sum_{i=1}^{N} (y_i + x_i - \bar{Y} - \bar{X})^2 \]

or

\[ E[(\bar{y} - \bar{Y}) + (\bar{x} - \bar{X})]^2 = \frac{N-n}{nN}\, \frac{1}{N-1} \sum_{i=1}^{N} [(y_i - \bar{Y}) + (x_i - \bar{X})]^2 \]

or

\[ E(\bar{y} - \bar{Y})^2 + E(\bar{x} - \bar{X})^2 + 2 E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{N-n}{nN}\, \frac{1}{N-1} \Big[ \sum_{i=1}^{N} (y_i - \bar{Y})^2 + \sum_{i=1}^{N} (x_i - \bar{X})^2 + 2 \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \Big] . \]

Since $E(\bar{y} - \bar{Y})^2$ and $E(\bar{x} - \bar{X})^2$ equal the corresponding first two terms on the right, it follows that

\[ E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{N-n}{nN}\, \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) . \]

Theorem: Show that the first approximation to the relative bias of the ratio estimator in simple random sampling, wor, is given by

\[ \frac{B(\hat{R})}{R} = \frac{1-f}{n\,\bar{X}\bar{Y}} \big( R\, S_x^2 - \rho\, S_y S_x \big) = \frac{1-f}{n} \big( C_{xx} - \rho\, C_y C_x \big) . \]

Proof: We know that

\[ \hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\,\bar{x}}{\bar{x}} = \frac{\bar{y} - R\,\bar{x}}{\bar{X} + (\bar{x} - \bar{X})} = \frac{\bar{y} - R\,\bar{x}}{\bar{X}} \Big( 1 + \frac{\bar{x} - \bar{X}}{\bar{X}} \Big)^{-1} . \]

Expanding by a Taylor series, we get

\[ \hat{R} - R = \frac{\bar{y} - R\,\bar{x}}{\bar{X}} \Big[ 1 - \frac{\bar{x} - \bar{X}}{\bar{X}} + \cdots \Big] , \quad \text{as } (1+t)^{-1} = 1 - t + t^2 - \cdots + (-1)^r t^r + \cdots \ \text{is valid for } |t| < 1 . \]

Ignoring terms of second and higher order, we have

\[ E(\hat{R} - R) = \frac{1}{\bar{X}} \Big[ E(\bar{y} - R\,\bar{x}) - \frac{1}{\bar{X}}\, E(\bar{y} - R\,\bar{x})(\bar{x} - \bar{X}) \Big] . \]

Now $E(\bar{y} - R\,\bar{x}) = \bar{Y} - R\,\bar{X} = 0$, and

\[ E[(\bar{y} - R\,\bar{x})(\bar{x} - \bar{X})] = E[\bar{y}(\bar{x} - \bar{X})] - R\, E[\bar{x}(\bar{x} - \bar{X})] = E[(\bar{y} - \bar{Y})(\bar{x} - \bar{X})] - R\, E(\bar{x} - \bar{X})^2 , \quad \text{since } E(\bar{x} - \bar{X}) = 0 , \]

\[ = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) - R\, \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} (x_i - \bar{X})^2 = \frac{1-f}{n} \big( \rho\, S_y S_x - R\, S_x^2 \big) . \]

Hence,

\[ E(\hat{R} - R) = B(\hat{R}) = \frac{1-f}{n\,\bar{X}^2} \big( R\, S_x^2 - \rho\, S_y S_x \big) \]

and

\[ \frac{B(\hat{R})}{R} = \frac{1-f}{n\,\bar{X}^2\,(\bar{Y}/\bar{X})} \big( R\, S_x^2 - \rho\, S_y S_x \big) = \frac{1-f}{n} \Big( \frac{S_x^2}{\bar{X}^2} - \frac{\rho\, S_y S_x}{\bar{Y}\bar{X}} \Big) = \frac{1-f}{n} \big( C_{xx} - \rho\, C_y C_x \big) . \]

Note: The bias in the ratio estimator becomes zero when $\rho = R\, S_x / S_y$, i.e. when $R = \rho\, S_y / S_x$, because then

\[ B(\hat{R}) = \frac{1-f}{n\,\bar{X}^2} \big( R\, S_x^2 - \rho\, S_y S_x \big) = \frac{1-f}{n\,\bar{X}^2} \Big( \frac{\rho\, S_y}{S_x}\, S_x^2 - \rho\, S_y S_x \Big) = 0 , \]

which is satisfied only if the line of regression of $y$ on $x$ passes through the origin.

Unbiased ratio type estimators

A frequently used technique in the construction of unbiased ratio type estimators is to adjust an estimator for its bias. Ratio-type estimators obtained through this approach have been proposed by several authors.

Hartley-Ross unbiased ratio type estimator

Define the unit ratios $r_i = y_i / x_i$ and their sample mean

\[ \bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{x_i} . \]

Then

\[ Bias(\bar{r}) = E(\bar{r} - R) = E(r_i - R) , \quad \text{as in srs } E(\bar{r}) = E(r_i) , \]

\[ = E\Big( \frac{y_i}{x_i} \Big) - \frac{\bar{Y}}{\bar{X}} = \frac{1}{\bar{X}} \Big[ \bar{X}\, \frac{1}{N}\sum_{i=1}^{N} \frac{y_i}{x_i} - \frac{1}{N}\sum_{i=1}^{N} y_i \Big] = -\, \frac{1}{N\bar{X}} \sum_{i=1}^{N} \frac{y_i}{x_i}\, (x_i - \bar{X}) = -\, \frac{1}{N\bar{X}} \sum_{i=1}^{N} r_i\, (x_i - \bar{X}) . \]

Since an unbiased estimate of $\dfrac{1}{N-1}\sum_{i=1}^{N} r_i (x_i - \bar{X})$ is

\[ \frac{1}{n-1}\sum_{i=1}^{n} r_i (x_i - \bar{x}) = \frac{1}{n-1} \Big( \sum_{i=1}^{n} \frac{y_i}{x_i}\, x_i - n\,\bar{x}\,\bar{r} \Big) = \frac{n}{n-1} \big( \bar{y} - \bar{r}\,\bar{x} \big) , \]

the bias

\[ Bias(\bar{r}) = E(\bar{r}) - R = -\, \frac{N-1}{N\bar{X}}\, \frac{1}{N-1}\sum_{i=1}^{N} r_i (x_i - \bar{X}) \] is estimated unbiasedly by $-\dfrac{n\,(N-1)}{(n-1)\,N\bar{X}} \big( \bar{y} - \bar{r}\,\bar{x} \big)$.

Correcting $\bar{r}$ for this bias, we obtain an unbiased ratio type estimator of $R$:

\[ \hat{R}_{HR} = \bar{r} + \frac{n\,(N-1)}{(n-1)\,N\,\bar{X}} \big( \bar{y} - \bar{r}\,\bar{x} \big) . \]

Further,

\[ \hat{Y}_{HR} = \hat{R}_{HR}\, X = \bar{r}\, X + \frac{n\,(N-1)}{n-1} \big( \bar{y} - \bar{r}\,\bar{x} \big) , \]

and

\[ \hat{\bar{Y}}_{HR} = \hat{R}_{HR}\, \bar{X} = \bar{r}\, \bar{X} + \frac{n\,(N-1)}{(n-1)\,N} \big( \bar{y} - \bar{r}\,\bar{x} \big) . \]

For large samples, an approximation to the sampling variance is given by

\[ V(\hat{R}_{HR}) = \frac{S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x}{n\,\bar{X}^2} , \quad \text{where } R = E\Big( \frac{y_i}{x_i} \Big) , \]

\[ V(\hat{\bar{Y}}_{HR}) = \bar{X}^2\, V(\hat{R}_{HR}) = \frac{1}{n} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) , \]

and

\[ V(\hat{Y}_{HR}) = N^2 \bar{X}^2\, V(\hat{R}_{HR}) = \frac{N^2}{n} \big( S_y^2 + R^2 S_x^2 - 2R\,\rho\, S_y S_x \big) . \]
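A sketch of the Hartley-Ross estimates in Python (function and argument names are illustrative):

```python
import numpy as np

def hartley_ross(y, x, N, X_bar):
    """Hartley-Ross unbiased ratio-type estimates of R, the population
    mean and the total, following the formulas above (a sketch)."""
    n = len(y)
    r_bar = np.mean(y / x)                       # mean of unit ratios r_i
    adj = n * (N - 1) / ((n - 1) * N * X_bar) * (y.mean() - r_bar * x.mean())
    R_HR = r_bar + adj                           # bias-corrected ratio
    return R_HR, R_HR * X_bar, R_HR * N * X_bar  # ratio, mean, total
```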
 

Note: A great deal of work has been done along these lines, but most of it is of a theoretical nature and difficult to use in practice.

Optimum property of the ratio estimator

With srs from an infinite population, the ratio estimate of the population mean $\bar{Y}$ is the best linear unbiased estimate if the following two conditions are satisfied:

i) the relation between $x_i$ and $y_i$ is a straight line through the origin;

ii) the variance of $y_i$ about this line is proportional to $x_i$.

Proof:

i) $E(y_i) = E(y_i \mid x_i) = B\, x_i$; under this condition, we can write the model

\[ y_i = B\, x_i + e_i , \]

where the $e_i$ are independent of the $x_i$ and $x_i > 0$; for fixed $x_i$, $E(e_i) = E(e_i \mid x_i) = 0$.

ii) $V(y_i) = V(B x_i) + V(e_i \mid x_i) + 2\, Cov(B x_i, e_i) = V(e_i \mid x_i)$, the covariance being zero since $B x_i$ is fixed for given $x_i$ and $e_i$ is independent of it. Thus $V(y_i) \propto x_i$ if and only if $V(e_i \mid x_i) = A\, x_i$ for some constant $A$.

Let

\[ b = \sum_{i=1}^{n} \ell_i\, y_i \]

be a linear estimate of $B$. Then

\[ E(b) = \sum_{i=1}^{n} \ell_i\, E(y_i \mid x_i) = B \sum_{i=1}^{n} \ell_i\, x_i = B \quad \text{if and only if} \quad \sum_{i=1}^{n} \ell_i\, x_i = 1 . \]

Thus the class $b = \sum_i \ell_i y_i$ with $\sum_i \ell_i x_i = 1$ is the class of linear unbiased estimators of $B$, and

\[ V(b) = V\Big( \sum_{i=1}^{n} \ell_i y_i \Big) = \sum_{i=1}^{n} \ell_i^2\, V(y_i) = A \sum_{i=1}^{n} \ell_i^2\, x_i . \]

Our problem now is to find the estimator in this class for which $V(b)$ is minimum subject to the condition $\sum_i \ell_i x_i = 1$. Using the Lagrange multiplier technique, define the function

\[ \phi = V(b) - \lambda \Big( \sum_i \ell_i x_i - 1 \Big) = A \sum_i \ell_i^2 x_i - \lambda \Big( \sum_i \ell_i x_i - 1 \Big) . \]

Differentiating $\phi$ with respect to $\ell_i$ and equating to zero, we get

\[ \frac{\partial \phi}{\partial \ell_i} = 0 \ \Rightarrow\ 2A\, \ell_i x_i = \lambda\, x_i \quad \text{or} \quad \ell_i = \frac{\lambda}{2A} \ \text{for all } i . \]

But

\[ \sum_{i=1}^{n} \ell_i x_i = 1 \ \Rightarrow\ \frac{\lambda}{2A} \sum_i x_i = 1 , \quad \text{so} \quad \ell_i = \frac{1}{\sum_i x_i} \ \text{for all } i . \]

Hence the estimator $b$ in the linear class has minimum $V(b)$ for $\ell_i = 1 / \sum_i x_i$, and the best linear unbiased estimator of $B$ is

\[ b = \sum_{i=1}^{n} \ell_i y_i = \frac{\sum_i y_i}{\sum_i x_i} = \frac{\bar{y}}{\bar{x}} = \hat{R} . \]

Averaging the model over the population gives $\bar{Y} = B\, \bar{X}$; since $b = \hat{R}$ is the best linear unbiased estimate of $B$, the best linear unbiased estimate of $\bar{Y}$ is $\hat{\bar{Y}} = b\, \bar{X} = \hat{R}\, \bar{X}$, which is the ratio estimate of $\bar{Y}$.
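The optimality can also be checked numerically: under the model above, weighted least squares through the origin with weights $1/x_i$ (i.e. minimizing $\sum_i (y_i - B x_i)^2 / x_i$) has the closed form $b = \sum_i y_i / \sum_i x_i = \hat{R}$. A sketch with simulated data, assuming scipy is available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=30)
y = 2.5 * x + rng.normal(0.0, np.sqrt(x))   # V(e_i) proportional to x_i

b_ratio = y.sum() / x.sum()                 # the ratio estimator of B
res = minimize_scalar(lambda B: np.sum((y - B * x) ** 2 / x))
print(b_ratio, res.x)                       # the two agree
```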

Ratio estimators in stratified random sampling


In stratified random sampling there are two ways of forming the ratio estimator of a population total:
i) Separate ratio estimator
ii) Combined ratio estimator

Separate ratio estimator

If $\bar{y}_i$ and $\bar{x}_i$ are the sample means computed from the $i$-th stratum, the ratios $\bar{y}_i / \bar{x}_i$ are computed separately for each stratum, and with knowledge of the population total $X_i$ for the $i$-th stratum we may define the estimator $\hat{Y}_{Rs}$ ($s$ for separate) as

\[ \hat{Y}_{Rs} = \sum_{i=1}^{k} \frac{\bar{y}_i}{\bar{x}_i}\, X_i . \]

Theorem: If an independent simple random sample is drawn in each stratum and the sample sizes are large in all strata, then

\[ V(\hat{Y}_{Rs}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + R_i^2 S_{xi}^2 - 2 R_i\, \rho_i\, S_{yi} S_{xi} \big) , \]

where $R_i = Y_i / X_i = \bar{Y}_i / \bar{X}_i$ and $\rho_i$ are the true ratio and the coefficient of correlation, respectively, in the $i$-th stratum.

Proof: By definition, the ratio estimator for the $i$-th stratum total is

\[ \hat{Y}_{Ri} = \frac{\bar{y}_i}{\bar{x}_i}\, X_i = \hat{R}_i\, X_i , \qquad V(\hat{Y}_{Ri}) = \frac{N_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + R_i^2 S_{xi}^2 - 2 R_i\, \rho_i\, S_{yi} S_{xi} \big) . \]

Since $\hat{Y}_{Rs} = \sum_{i=1}^{k} \hat{Y}_{Ri}$ and sampling is independent in each stratum,

\[ V(\hat{Y}_{Rs}) = \sum_{i=1}^{k} V(\hat{Y}_{Ri}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + R_i^2 S_{xi}^2 - 2 R_i\, \rho_i\, S_{yi} S_{xi} \big) . \]

Note: $\hat{Y}_{Rs}$ is not an unbiased estimator of the population total and has, within each stratum, the same properties as the ratio estimator; its relative bias is

\[ \frac{B(\hat{Y}_{Rs})}{Y} = \sum_{i=1}^{k} \frac{Y_i}{Y}\, \frac{1-f_i}{n_i} \big( C_{xxi} - \rho_i\, C_{yi} C_{xi} \big) . \]

Corollary: In stratified random sampling, wor, an almost unbiased estimator of $V(\hat{Y}_{Rs})$ is given by

\[ \hat{V}(\hat{Y}_{Rs}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( s_{yi}^2 + \hat{R}_i^2 s_{xi}^2 - 2 \hat{R}_i\, s_{yxi} \big) , \]

where $s_{yxi}$ stands for the estimated covariance in the $i$-th stratum.

Combined ratio estimator

If the $X_i$ are not known, but only $X$ is known, and if the $R_i$'s do not differ considerably from stratum to stratum and the $n_i$'s are not large enough, one can use the combined ratio estimator $\hat{Y}_{Rc}$ ($c$ for combined) for a sample from a stratified population:

\[ \hat{Y}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\, X = \frac{\hat{Y}_{st}}{\hat{X}_{st}}\, X , \quad \text{where} \quad \bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i\, \bar{y}_i = \frac{\hat{Y}_{st}}{N} , \qquad \bar{x}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i\, \bar{x}_i = \frac{\hat{X}_{st}}{N} . \]

Theorem: If the total sample size $n$ is large and simple random sampling, wor, is done in each stratum independently, then $\hat{Y}_{Rc}$ is a consistent estimator and

\[ V(\hat{Y}_{Rc}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + R^2 S_{xi}^2 - 2 R\, \rho_i\, S_{yi} S_{xi} \big) . \]

Proof: By definition,

\[ V(\hat{Y}_{Rc}) = E(\hat{Y}_{Rc} - Y)^2 = E\Big( \frac{\bar{y}_{st}}{\bar{x}_{st}}\, X - Y \Big)^2 = E\Big( \frac{N (\bar{y}_{st}\, \bar{X} - \bar{x}_{st}\, \bar{Y})}{\bar{x}_{st}} \Big)^2 = N^2\, E\Big[ (\bar{y}_{st} - R\, \bar{x}_{st})^2\, \frac{\bar{X}^2}{\bar{x}_{st}^2} \Big] \approx N^2\, E(\bar{y}_{st} - R\, \bar{x}_{st})^2 , \]

writing $\bar{X}/\bar{x}_{st} \approx 1$. Now consider the variate

\[ u_{ij} = y_{ij} - R\, x_{ij} , \quad j = 1, 2, \ldots, N_i ; \] then

\[ \bar{u}_{st} = \bar{y}_{st} - R\, \bar{x}_{st} , \qquad \bar{U} = \bar{Y} - R\, \bar{X} = 0 , \]

and

\[ V(\bar{u}_{st}) = \sum_{i=1}^{k} \Big( \frac{1}{n_i} - \frac{1}{N_i} \Big) W_i^2\, S_{ui}^2 = \frac{1}{N^2} \sum_{i=1}^{k} \frac{N_i (N_i - n_i)}{n_i}\, S_{ui}^2 , \]

where

\[ S_{ui}^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (u_{ij} - \bar{U}_i)^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \big( y_{ij} - R\, x_{ij} - \bar{Y}_i + R\, \bar{X}_i \big)^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \big[ (y_{ij} - \bar{Y}_i) - R\, (x_{ij} - \bar{X}_i) \big]^2 \]

\[ = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \big[ (y_{ij} - \bar{Y}_i)^2 + R^2 (x_{ij} - \bar{X}_i)^2 - 2R\, (y_{ij} - \bar{Y}_i)(x_{ij} - \bar{X}_i) \big] = S_{yi}^2 + R^2 S_{xi}^2 - 2 R\, \rho_i\, S_{yi} S_{xi} , \]

and hence

\[ V(\bar{u}_{st}) = \frac{1}{N^2} \sum_{i=1}^{k} \frac{N_i (N_i - n_i)}{n_i} \big( S_{yi}^2 + R^2 S_{xi}^2 - 2 R\, \rho_i\, S_{yi} S_{xi} \big) . \]

Also,

\[ V(\bar{u}_{st}) = E(\bar{u}_{st} - \bar{U})^2 = E(\bar{u}_{st})^2 = E(\bar{y}_{st} - R\, \bar{x}_{st})^2 , \] and therefore

\[ V(\hat{Y}_{Rc}) = N^2\, V(\bar{u}_{st}) = \sum_{i=1}^{k} \frac{N_i (N_i - n_i)}{n_i} \big( S_{yi}^2 + R^2 S_{xi}^2 - 2 R\, \rho_i\, S_{yi} S_{xi} \big) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + R^2 S_{xi}^2 - 2 R\, \rho_i\, S_{yi} S_{xi} \big) . \]

Corollary: In stratified random sampling, wor, an almost unbiased estimator of $V(\hat{Y}_{Rc})$ is

\[ \hat{V}(\hat{Y}_{Rc}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big( s_{yi}^2 + \hat{R}^2 s_{xi}^2 - 2 \hat{R}\, s_{yxi} \big) , \]

where $s_{yxi}$ stands for the estimated covariance in the $i$-th stratum.

Comparison of separate and combined ratio estimators

To make a comparative study, we can write

\[ V(\hat{Y}_{Rc}) - V(\hat{Y}_{Rs}) = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big[ (R^2 - R_i^2)\, S_{xi}^2 - 2\, (R - R_i)\, \rho_i\, S_{yi} S_{xi} \big] \]

\[ = \sum_{i=1}^{k} \frac{N_i^2 (1-f_i)}{n_i} \big[ (R - R_i)^2 S_{xi}^2 + 2\, (R_i - R)\, \big( \rho_i\, S_{yi} S_{xi} - R_i\, S_{xi}^2 \big) \big] . \]
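For illustration, a sketch computing both estimates of the population total from stratified data (all names and inputs hypothetical):

```python
import numpy as np

def stratified_ratio(samples, N_i, X_i):
    """Separate and combined ratio estimates of the population total.
    samples: list of (y, x) sample arrays per stratum; N_i: stratum sizes;
    X_i: stratum totals of x (a sketch)."""
    X = np.sum(X_i)
    N = np.sum(N_i)
    # separate: one ratio per stratum
    Y_Rs = sum((y.mean() / x.mean()) * Xi for (y, x), Xi in zip(samples, X_i))
    # combined: one overall ratio from the stratified means
    y_st = sum(Ni * y.mean() for (y, _), Ni in zip(samples, N_i)) / N
    x_st = sum(Ni * x.mean() for (_, x), Ni in zip(samples, N_i)) / N
    Y_Rc = (y_st / x_st) * X
    return Y_Rs, Y_Rc
```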

Product estimator

If the correlation coefficient between the variable under study $y$ and the auxiliary variable $x$ is negative, we cannot make use of the ratio estimator, because it gives precise results only when the correlation coefficient is greater than $\frac{1}{2}\,(C_x / C_y)$. In such situations, another type of estimator for the mean $\bar{Y}$ and the total $Y$ is defined as

\[ \hat{\bar{Y}}_P = \bar{y}\, \frac{\bar{x}}{\bar{X}} , \qquad \hat{Y}_P = N\, \bar{y}\, \frac{\bar{x}}{\bar{X}} = \hat{Y}\, \frac{\bar{x}}{\bar{X}} , \]

which may be termed the product estimators.

Note: For the product estimator in a large simple random sample, the coefficient of variation of either $\hat{\bar{Y}}_P$ or $\hat{Y}_P$ is given by

\[ (CV)^2 = \frac{1-f}{n} \big( C_{yy} + C_{xx} + 2\, C_{yx} \big) . \]
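A minimal sketch of the product estimates:

```python
import numpy as np

def product_estimates(y, x, N, X_bar):
    """Product estimates of the population mean and total (a sketch);
    appropriate when y and x are negatively correlated."""
    Y_bar_P = y.mean() * x.mean() / X_bar
    return Y_bar_P, N * Y_bar_P
```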

Regression method of estimation

Like ratio estimators, linear regression estimators make use of auxiliary information (a variable) correlated with the variable under study to increase precision. Ratio estimators give increased precision when the regression of the study variable $y$ on the auxiliary variable $x$ is linear and passes through the origin, i.e. when the regression equation of $y$ on $x$ is $y = b\,x$. When the regression of $y$ on $x$ is linear but the regression line does not pass through the origin, it is better to use estimators based on linear regression.

Let a srs of size $n$ be drawn from a population of size $N$, with $y_i$ and $x_i$ measured on each unit of the sample, and let the population mean $\bar{X}$ of the $x$ variate be known. The linear regression estimators of the population mean $\bar{Y}$ and population total $Y$ are given by

\[ \bar{y}_{lr} = \bar{y} + b\, (\bar{X} - \bar{x}) , \qquad \hat{Y}_{lr} = N\, [\, \bar{y} + b\, (\bar{X} - \bar{x}) \,] , \]

where $lr$ denotes linear regression and $b$ is an estimate of the regression coefficient $B$ of $y$ on $x$ in the population (an estimate of the change in $y$ per unit change in $x$). The rationale of this estimate is that if $\bar{x}$ is below average, we should expect $\bar{y}$ also to be below average by an amount $b\, (\bar{X} - \bar{x})$, because of the regression of $y_i$ on $x_i$.

Watson, D.J. (1937) used a regression of leaf area on leaf weight to estimate the average area of the leaves on a plant. The procedure was to weigh all the leaves on the plant; for a small sample of leaves, the area and the weight of each leaf were determined, and the sample mean leaf area was then adjusted by means of the regression on leaf weight. The point of the application is, of course, that the weight of a leaf can be found quickly, while determining its area is more time-consuming. In another application, described by Yates, F. (1960), an eye estimate of the volume of timber was made on each of a population of 1/10-acre plots, and the actual timber volume was measured for a sample of the plots. The regression estimate adjusts the sample mean of the actual measurements by the regression on the rapid estimates.
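A sketch of the linear regression estimates, with $b$ computed from the sample by least squares:

```python
import numpy as np

def linear_regression_estimate(y, x, N, X_bar):
    """Linear regression estimates of the population mean and total,
    with b the least-squares slope from the sample (a sketch)."""
    b = np.cov(y, x)[0, 1] / x.var(ddof=1)    # b = s_yx / s_x^2
    y_lr = y.mean() + b * (X_bar - x.mean())
    return y_lr, N * y_lr
```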
Theorem: In simple random sampling, wor, in which $b_0$ is a pre-assigned constant, the linear regression estimate $\bar{y}_{lr} = \bar{y} + b_0 (\bar{X} - \bar{x})$ is an unbiased estimate of $\bar{Y}$ with variance

\[ V(\bar{y}_{lr}) = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} [\, (y_i - \bar{Y}) - b_0 (x_i - \bar{X}) \,]^2 = \frac{1-f}{n} \big( S_y^2 + b_0^2 S_x^2 - 2 b_0 S_{yx} \big) . \]

Note: In most applications $b$ is estimated from the sample itself. However, sometimes it is reasonable to choose the value of $b$ in advance; the estimator is then called the difference estimator.

Proof: By definition,

\[ E(\bar{y}_{lr}) = E[\bar{y} + b_0 (\bar{X} - \bar{x})] = E(\bar{y}) + b_0 \bar{X} - b_0 E(\bar{x}) = \bar{Y} + b_0 \bar{X} - b_0 \bar{X} = \bar{Y} . \]

To obtain the variance, consider the variate

\[ u_i = y_i - b_0 (x_i - \bar{X}) , \quad i = 1, 2, \ldots, N . \]

Let $\bar{u}$ and $\bar{U}$ be the sample mean and population mean of the variable $u$ respectively, where

\[ \bar{u} = \frac{1}{n} \sum_{i=1}^{n} [\, y_i - b_0 (x_i - \bar{X}) \,] = \bar{y} - b_0 (\bar{x} - \bar{X}) = \bar{y} + b_0 (\bar{X} - \bar{x}) = \bar{y}_{lr} \]  (4.5)

and

\[ \bar{U} = \frac{1}{N} \sum_{i=1}^{N} [\, y_i - b_0 (x_i - \bar{X}) \,] = \bar{Y} - b_0\, \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{X}) = \bar{Y} . \]

As sampling is simple random, wor,

\[ V(\bar{u}) = \Big( \frac{1}{n} - \frac{1}{N} \Big) S_u^2 = \frac{1-f}{n}\, S_u^2 , \quad \text{where} \quad S_u^2 = \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{U})^2 = \frac{1}{N-1} \sum_{i=1}^{N} [\, (y_i - \bar{Y}) - b_0 (x_i - \bar{X}) \,]^2 . \]

Therefore, from equation (4.5), we get

\[ V(\bar{y}_{lr}) = V(\bar{u}) = \frac{1-f}{n\,(N-1)} \sum_{i=1}^{N} [\, (y_i - \bar{Y}) - b_0 (x_i - \bar{X}) \,]^2 \]

\[ = \frac{1-f}{n\,(N-1)} \Big[ \sum_{i=1}^{N} (y_i - \bar{Y})^2 + b_0^2 \sum_{i=1}^{N} (x_i - \bar{X})^2 - 2 b_0 \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \Big] = \frac{1-f}{n} \big( S_y^2 + b_0^2 S_x^2 - 2 b_0 S_{yx} \big) . \]
Corollary: In simple random sampling, wor, an unbiased estimate of $V(\bar{y}_{lr})$ is

\[ \hat{V}(\bar{y}_{lr}) = v(\bar{y}_{lr}) = \frac{1-f}{n} \big( s_y^2 + b_0^2 s_x^2 - 2 b_0 s_{yx} \big) , \]

where

\[ s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 , \quad s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 , \quad s_{yx} = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) . \]

Theorem: The value of $b_0$ which minimizes $V(\bar{y}_{lr})$ is

\[ B = \frac{S_{yx}}{S_x^2} = \frac{\rho\, S_y S_x}{S_x^2} = \rho\, \frac{S_y}{S_x} \]

(called the linear regression coefficient of $y$ on $x$ in the population), and the resulting minimum variance is

\[ V_{min}(\bar{y}_{lr}) = \frac{1-f}{n}\, S_y^2 (1 - \rho^2) . \]

Proof: Write

\[ b_0 = B + d = \frac{S_{yx}}{S_x^2} + d , \quad \text{with } d \ne 0 \text{ unless } b_0 = B . \]  (4.6)

Substituting equation (4.6) into the expression for $V(\bar{y}_{lr})$, we get

\[ V(\bar{y}_{lr}) = \frac{1-f}{n} \Big[ S_y^2 + \Big( \frac{S_{yx}}{S_x^2} + d \Big)^2 S_x^2 - 2 \Big( \frac{S_{yx}}{S_x^2} + d \Big) S_{yx} \Big] \]

\[ = \frac{1-f}{n} \Big[ S_y^2 + \frac{S_{yx}^2}{S_x^2} + d^2 S_x^2 + 2 d\, S_{yx} - \frac{2 S_{yx}^2}{S_x^2} - 2 d\, S_{yx} \Big] = \frac{1-f}{n} \Big[ S_y^2 - \frac{S_{yx}^2}{S_x^2} + d^2 S_x^2 \Big] . \]

Clearly the right-hand side of this expression is minimized when $d = 0$, i.e. $b_0 = B$, and then

\[ V_{min}(\bar{y}_{lr}) = \frac{1-f}{n} \Big[ S_y^2 - \Big( \frac{S_{yx}}{S_x} \Big)^2 \Big] = \frac{1-f}{n} \Big[ S_y^2 - \Big( \frac{\rho\, S_y S_x}{S_x} \Big)^2 \Big] = \frac{1-f}{n}\, S_y^2 (1 - \rho^2) . \]

Alternative method: We know that

\[ V(\bar{y}_{lr}) = \frac{1-f}{n} \big( S_y^2 + b_0^2 S_x^2 - 2 b_0 S_{yx} \big) . \]

To obtain the value of $b_0$ for which $V(\bar{y}_{lr})$ is minimum, differentiate $V(\bar{y}_{lr})$ with respect to $b_0$ and equate to zero:

\[ \frac{\partial V(\bar{y}_{lr})}{\partial b_0} = \frac{1-f}{n} \big( 2 b_0 S_x^2 - 2 S_{yx} \big) = 0 \quad \Rightarrow \quad b_0 = \frac{S_{yx}}{S_x^2} . \]

Therefore,

\[ V(\bar{y}_{lr}) = \frac{1-f}{n} \Big[ S_y^2 + \Big( \frac{S_{yx}}{S_x^2} \Big)^2 S_x^2 - 2\, \frac{S_{yx}}{S_x^2}\, S_{yx} \Big] = \frac{1-f}{n} \Big[ S_y^2 - \Big( \frac{S_{yx}}{S_x} \Big)^2 \Big] = \frac{1-f}{n} \Big[ S_y^2 - \Big( \frac{\rho\, S_y S_x}{S_x} \Big)^2 \Big] = \frac{1-f}{n}\, S_y^2 (1 - \rho^2) . \]
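A quick numerical check of this minimization, with illustrative (made-up) population values:

```python
import numpy as np

# Check that V(b0) = ((1-f)/n)(S_y^2 + b0^2 S_x^2 - 2 b0 S_yx)
# is minimized at B = S_yx / S_x^2, with minimum ((1-f)/n) S_y^2 (1 - rho^2).
S_y2, S_x2, S_yx, f, n = 25.0, 9.0, 12.0, 0.1, 40

V = lambda b0: (1 - f) / n * (S_y2 + b0**2 * S_x2 - 2 * b0 * S_yx)
grid = np.linspace(-2.0, 5.0, 100001)
print(grid[np.argmin(V(grid))], S_yx / S_x2)            # both ~1.3333
rho2 = S_yx**2 / (S_y2 * S_x2)
print(V(S_yx / S_x2), (1 - f) / n * S_y2 * (1 - rho2))  # equal minima
```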
  
Theorem: If $b$ is the least squares estimate of $B$ and $\bar{y}_{lr} = \bar{y} + b\, (\bar{X} - \bar{x})$, then under srs of size $n$,

\[ V(\bar{y}_{lr}) \approx \frac{1-f}{n}\, S_y^2 (1 - \rho^2) , \]

provided $n$ is large enough that the error $(b - B)$ in $b$ is negligible.

Proof: Introduce the residual variate $e_i$, defined by the relation

\[ e_i = y_i - \bar{Y} - B\, (x_i - \bar{X}) . \]

Averaging over the sample values of the $e_i$'s, we get

\[ \bar{e} = \bar{y} - \bar{Y} - B\, (\bar{x} - \bar{X}) , \qquad \bar{y}_{lr} = \bar{e} + \bar{Y} + B\, (\bar{x} - \bar{X}) + b\, (\bar{X} - \bar{x}) , \] so that

\[ \bar{y}_{lr} = \bar{Y} + \bar{e} + (b - B)\, (\bar{X} - \bar{x}) . \]

Consider

\[ b = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} [\, \bar{Y} + B\, (x_i - \bar{X}) + e_i \,]\, (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = B + \frac{\sum_{i=1}^{n} e_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} , \]

since $\sum_i (x_i - \bar{x}) = 0$ and $\sum_i (x_i - \bar{X})(x_i - \bar{x}) = \sum_i (x_i - \bar{x})^2$.

We use two properties of the $e_i$: $\sum_{i=1}^{N} e_i = 0$, and

\[ \sum_{i=1}^{N} e_i (x_i - \bar{X}) = \sum_{i=1}^{N} [\, y_i - \bar{Y} - B\, (x_i - \bar{X}) \,]\, (x_i - \bar{X}) = \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) - B \sum_{i=1}^{N} (x_i - \bar{X})^2 = 0 , \]

by the definition of $B$, i.e. $B = S_{yx} / S_x^2$.

Thus $E(b - B) \approx 0$, since $\dfrac{1}{n-1} \sum_{i=1}^{n} e_i (x_i - \bar{x})$ is an unbiased estimate of $\dfrac{1}{N-1} \sum_{i=1}^{N} e_i (x_i - \bar{X}) = 0$. Therefore, neglecting the term in $(b - B)$,

\[ \bar{y}_{lr} \approx \bar{Y} + \bar{e} . \]

By definition, $V(\bar{y}_{lr}) = E(\bar{y}_{lr} - \bar{Y})^2 = E(\bar{e})^2$, and $V(\bar{e}) = E[\bar{e} - E(\bar{e})]^2 = E(\bar{e})^2$, as $E(\bar{e}) = 0$; hence $V(\bar{y}_{lr}) = V(\bar{e})$.

Since $\bar{e}$ is the sample mean of the $e_i$'s under srswor,

\[ V(\bar{e}) = \frac{1-f}{n}\, S_e^2 , \quad \text{where, the population mean of the } e_i \text{ being zero,} \quad S_e^2 = \frac{1}{N-1} \sum_{i=1}^{N} e_i^2 = \frac{1}{N-1} \sum_{i=1}^{N} [\, y_i - \bar{Y} - B\, (x_i - \bar{X}) \,]^2 \]

\[ = \frac{1}{N-1} \Big[ \sum_{i=1}^{N} (y_i - \bar{Y})^2 + B^2 \sum_{i=1}^{N} (x_i - \bar{X})^2 - 2 B \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \Big] = S_y^2 + B^2 S_x^2 - 2 B\, S_{yx} \]

\[ = S_y^2 + \frac{S_{yx}^2}{S_x^2} - \frac{2 S_{yx}^2}{S_x^2} = S_y^2 - \Big( \frac{S_{yx}}{S_x} \Big)^2 = S_y^2 - \Big( \frac{\rho\, S_y S_x}{S_x} \Big)^2 = S_y^2 (1 - \rho^2) , \]

and hence

\[ V(\bar{y}_{lr}) = \frac{1-f}{n}\, S_y^2 (1 - \rho^2) . \]

Estimation of $V(\bar{y}_{lr})$

As a sample estimate of $V(\bar{y}_{lr})$, valid in large samples, we may use

\[ \hat{V}(\bar{y}_{lr}) = v(\bar{y}_{lr}) = \frac{1-f}{n}\, s_y^2 (1 - r^2) = \frac{1-f}{n} \big( s_y^2 - b^2 s_x^2 \big) , \quad \text{since } b = r\, \frac{s_y}{s_x} . \]

In terms of the sample sums of squares and products,

\[ v(\bar{y}_{lr}) = \frac{1-f}{n\,(n-1)} \Big[ \sum_{i=1}^{n} (y_i - \bar{y})^2 - \frac{\big( \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) \big)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Big] \]

\[ = \frac{1-f}{n\,(n-1)} \Big[ \sum_{i=1}^{n} (y_i - \bar{y})^2 - b \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) \Big] \]

\[ = \frac{1-f}{n\,(n-1)} \Big[ \sum_{i=1}^{n} (y_i - \bar{y})^2 + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2 b \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) \Big] , \]

using $b^2 \sum_i (x_i - \bar{x})^2 = b \sum_i (y_i - \bar{y})(x_i - \bar{x})$,

\[ = \frac{1-f}{n\,(n-1)} \sum_{i=1}^{n} [\, (y_i - \bar{y}) - b\, (x_i - \bar{x}) \,]^2 . \]

According to standard regression theory, it is suggested that the divisor $(n-2)$ be used instead of $(n-1)$.
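A sketch of this variance estimate, with the choice of divisor exposed as an option:

```python
import numpy as np

def v_y_lr(y, x, N, divisor=2):
    """Large-sample estimate of V(ybar_lr); divisor=2 uses n-2 as
    suggested by standard regression theory, divisor=1 uses n-1."""
    n = len(y)
    f = n / N
    b = np.cov(y, x)[0, 1] / x.var(ddof=1)       # least-squares slope
    resid = (y - y.mean()) - b * (x - x.mean())  # regression residuals
    return (1 - f) / n * np.sum(resid**2) / (n - divisor)
```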

Comparison of the linear regression estimate with the ratio and mean per unit estimates

For large samples,

\[ V(\bar{y}) = \frac{1-f}{n}\, S_y^2 \]  (4.7)

\[ V(\bar{y}_R) = \frac{1-f}{n} \big( S_y^2 + R^2 S_x^2 - 2\rho\, R\, S_y S_x \big) \]  (4.8)

\[ V(\bar{y}_{lr}) = \frac{1-f}{n}\, S_y^2 (1 - \rho^2) \]  (4.9)

From equations (4.7) and (4.9) it is clear that $V(\bar{y}_{lr}) < V(\bar{y})$ unless $\rho = 0$, in which case $V(\bar{y}_{lr}) = V(\bar{y})$ and the two estimates are equally precise.

From equations (4.8) and (4.9), $\bar{y}_{lr}$ will be more precise than $\bar{y}_R$ if and only if $V(\bar{y}_{lr}) < V(\bar{y}_R)$. Consider

\[ S_y^2 (1 - \rho^2) \le S_y^2 + R^2 S_x^2 - 2\rho\, R\, S_y S_x \ \Leftrightarrow\ \rho^2 S_y^2 + R^2 S_x^2 - 2\rho\, R\, S_y S_x \ge 0 \ \Leftrightarrow\ \big( \rho\, S_y - R\, S_x \big)^2 \ge 0 \]

\[ \Leftrightarrow\ \Big( \frac{S_{yx}}{S_x^2} - R \Big)^2 S_x^2 \ge 0 \ \Leftrightarrow\ (B - R)^2\, S_x^2 \ge 0 . \]  (4.10)

The left-hand side of (4.10) is a perfect square and hence always non-negative. Thus we conclude that the linear regression estimate is always at least as precise as the ratio estimate, with equality only when $B = R$. If $B = R$, then $b \approx \hat{R} = \bar{y}/\bar{x}$, and

\[ \bar{y}_{lr} = \bar{y} + b\, (\bar{X} - \bar{x}) = \bar{y} + \frac{\bar{y}}{\bar{x}}\, (\bar{X} - \bar{x}) = \frac{\bar{y}}{\bar{x}}\, \bar{X} = \hat{R}\, \bar{X} = \hat{\bar{Y}}_R . \]

This means that the linear regression and ratio estimates have the same variance, and this occurs only when the regression of $y$ on $x$ is a straight line passing through the origin.
Corollary: In srs, the bias of $\bar{y}_{lr}$ is approximated by $B(\bar{y}_{lr}) = -\, Cov(b, \bar{x})$, which will be negligible if the sample size is large.

Proof: We have $\bar{y}_{lr} = \bar{y} + b\, (\bar{X} - \bar{x})$, so that

\[ E(\bar{y}_{lr}) = \bar{Y} - E[\, b\, (\bar{x} - \bar{X}) \,] = \bar{Y} - [\, E(b\,\bar{x}) - \bar{X}\, E(b) \,] = \bar{Y} - Cov(b, \bar{x}) \]

\[ \Rightarrow\ B(\bar{y}_{lr}) = E(\bar{y}_{lr}) - \bar{Y} = -\, Cov(b, \bar{x}) . \]

Regression estimates in stratified sampling

As in the case of the ratio estimator, two types of regression estimators can be used in stratified sampling.

Separate regression estimator

A separate regression estimator $\bar{y}_{lrs}$ ($s$ for separate) in stratified sampling may be defined as

\[ \bar{y}_{lrs} = \sum_{i=1}^{k} W_i\, [\, \bar{y}_i + b_i\, (\bar{X}_i - \bar{x}_i) \,] = \sum_{i=1}^{k} W_i\, \bar{y}_{lri} , \]

where $\bar{y}_{lri} = \bar{y}_i + b_i\, (\bar{X}_i - \bar{x}_i)$ is the regression estimate for the $i$-th stratum mean.

This estimate is appropriate when it is thought that the true regression coefficients $B_i$ vary from stratum to stratum.

Theorem: If sampling is independent in different strata and the sample size is large enough in each stratum, then $\bar{y}_{lrs}$ is an almost unbiased estimator, and its approximate variance is given by

\[ V(\bar{y}_{lrs}) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + b_i^2 S_{xi}^2 - 2 b_i\, \rho_i\, S_{yi} S_{xi} \big) . \]

Corollary: If $b_i = B_i$, the true regression coefficient in stratum $i$, the minimum value of the variance may be written as

\[ V_{min}(\bar{y}_{lrs}) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i}\, S_{yi}^2 (1 - \rho_i^2) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \Big( S_{yi}^2 - \frac{S_{yxi}^2}{S_{xi}^2} \Big) . \]

Combined regression estimator

A combined regression estimator $\bar{y}_{lrc}$ ($c$ for combined) in stratified sampling may be defined as

\[ \bar{y}_{lrc} = \bar{y}_{st} + b_c\, (\bar{X} - \bar{x}_{st}) , \]

where $\bar{y}_{st} = \sum_{i=1}^{k} W_i\, \bar{y}_i$ and $\bar{x}_{st} = \sum_{i=1}^{k} W_i\, \bar{x}_i$ are the stratified sample means of the $y$ and $x$ variates, and $b_c$ is the pooled estimate

\[ b_c = \frac{ \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(x_{ij} - \bar{x}_i) }{ \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 } . \]

Theorem: If sampling is independent in different strata and the sample size is large enough in each stratum, the variance of $\bar{y}_{lrc}$ is given by

\[ V(\bar{y}_{lrc}) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + b^2 S_{xi}^2 - 2 b\, S_{yxi} \big) . \]

Corollary: The value of $b$ that minimizes this variance is obtained by differentiating $V(\bar{y}_{lrc})$ with respect to $b$:

\[ \frac{\partial}{\partial b} V(\bar{y}_{lrc}) = 0 \ \Rightarrow\ \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \big( 2 b\, S_{xi}^2 - 2 S_{yxi} \big) = 0 \quad \text{or} \quad \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \big( b\, S_{xi}^2 - S_{yxi} \big) = 0 , \]

so that

\[ b = \frac{ \displaystyle \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i}\, S_{yxi} }{ \displaystyle \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i}\, S_{xi}^2 } = B_c . \]

The quantity $B_c$ is a weighted mean of the stratum regression coefficients $B_i = S_{yxi} / S_{xi}^2$. If we write

\[ a_i = \frac{W_i^2 (1-f_i)}{n_i}\, S_{xi}^2 , \quad \text{then} \quad B_c = \sum_i a_i B_i \Big/ \sum_i a_i , \]

and

\[ V_{min}(\bar{y}_{lrc}) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i} \big( S_{yi}^2 + B_c^2 S_{xi}^2 - 2 B_c\, S_{yxi} \big) = \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i}\, S_{yi}^2 + \Big( \sum_i a_i \Big) B_c^2 - 2 B_c \sum_{i=1}^{k} a_i B_i , \]

using $\dfrac{W_i^2 (1-f_i)}{n_i}\, S_{yxi} = a_i\, \dfrac{S_{yxi}}{S_{xi}^2} = a_i B_i$.

Comparison of separate and combined regression estimators

To make a comparative study, we can write

\[ V_{min}(\bar{y}_{lrc}) - V_{min}(\bar{y}_{lrs}) = \Big( \sum_i a_i \Big) B_c^2 - 2 B_c \sum_i a_i B_i + \sum_{i=1}^{k} \frac{W_i^2 (1-f_i)}{n_i}\, \frac{S_{yxi}^2}{S_{xi}^2} \]

\[ = \Big( \sum_i a_i \Big) B_c^2 - 2 B_c \sum_i a_i B_i + \sum_i a_i B_i^2 = \sum_i a_i\, \big( B_i - B_c \big)^2 . \]

This result shows that, with the optimum choices of the coefficients, the separate estimate has a smaller variance than the combined estimate unless the population regression coefficients $B_i$ are the same in all strata, in which case the two estimates are equally efficient. These optimum choices would, of course, require advance knowledge of the $S_{yxi}$ and $S_{xi}^2$ values.
