Housila P. Singh
School of Studies in Statistics, Vikram University
Ujjain - 456 010 (M. P.), India
Florentin Smarandache
University of New Mexico, Gallup, USA
Published in:
M. Khoshnevisan, S. Saxena, H. P. Singh, S. Singh, F. Smarandache (Editors)
RANDOMNESS AND OPTIMAL ESTIMATION IN DATA SAMPLING
(second edition)
American Research Press, Rehoboth, USA, 2002
ISBN: 1-931233-68-3
pp. 44 - 61
Abstract
This paper proposes a family of estimators of population mean using information on several auxiliary
variables and analyzes its properties in the presence of measurement errors.
Keywords: Population mean, Study variate, Auxiliary variates, Bias, Mean squared error, Measurement
errors.
1. INTRODUCTION
The discrepancies between the values observed on the variables under consideration for the sampled
units and the corresponding true values are termed measurement errors. The standard theory of
survey sampling generally assumes that data collected through surveys are free of measurement
or response errors. In reality such a supposition does not hold, and the data may be contaminated with
measurement errors for various reasons; see, e.g., Cochran (1963) and Sukhatme et al (1984).
One of the major sources of measurement errors in surveys is the nature of the variables. This may happen in
the case of qualitative variables. Simple examples of such variables are intelligence, preference, specific
abilities, utility, aggressiveness, tastes, etc. In many sample surveys it is recognized that errors of
measurement can also arise from the person being interviewed, from the interviewer, from the supervisor or
leader of a team of interviewers, and from the processor who transmits the information from the recorded
interview onto the punched cards or tapes that will be analyzed; see, for instance, Cochran (1968). Another
source of measurement error arises when the variable is conceptually well defined but observations can be
obtained only on some closely related substitutes, termed proxies or surrogates. Such a situation is
encountered when one needs to measure the economic status or the level of education of individuals; see
Shalabh (1997) and Sud and Srivastava (2000). In the presence of measurement errors, inferences may be
misleading; see Biemer et al (1991), Fuller (1995) and Manisha and Singh (2001).
There is today a great deal of research on measurement errors in surveys. In this paper an attempt is made
to study the impact of measurement errors on a family of estimators of the population mean that uses
multi-auxiliary information.
2. THE SUGGESTED FAMILY OF ESTIMATORS
Let Y be the study variate whose population mean µ0 is to be estimated using information on p (>1) auxiliary
variates X1, X2, ..., Xp. Further, let \( \mu' = (\mu_1, \mu_2, \ldots, \mu_p) \) be the population mean row
vector of the vector \( X' = (X_1, X_2, \ldots, X_p) \). Assume that a simple random sample of size n is drawn
from the population on the study character Y and the auxiliary characters X1, X2, ..., Xp. For the sake of
simplicity we assume that the observations are recorded with measurement errors, that is,
\[
y_j = Y_j + E_j, \qquad x_{ij} = X_{ij} + \eta_{ij}, \qquad i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, n,
\]
where \( Y_j \) and \( X_{ij} \) are the correct values of the characteristics Y and \( X_i \)
(i = 1, 2, ..., p; j = 1, 2, ..., n).
For the sake of simplicity in exposition, we assume that the errors \( E_j \) are stochastic with mean zero and
variance \( \sigma_{(0)}^2 \) and are uncorrelated with the \( Y_j \). The errors \( \eta_{ij} \) in \( x_{ij} \)
are distributed independently of each other and of the \( X_{ij} \), with mean zero and variance
\( \sigma_{(i)}^2 \) (i = 1, 2, ..., p). Also the \( E_j \) and \( \eta_{ij} \) are uncorrelated, although the
\( Y_j \) and \( X_{ij} \) are correlated.
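The measurement-error model above is easy to simulate. The following minimal sketch (all parameter values, and the linear relation between Y and the auxiliaries, are illustrative assumptions, not taken from the paper) generates true values \( Y_j, X_{ij} \) and their fallible counterparts \( y_j = Y_j + E_j \) and \( x_{ij} = X_{ij} + \eta_{ij} \):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 500, 2                     # sample size and number of auxiliary variates (illustrative)
mu = np.array([10.0, 20.0])       # true auxiliary means mu_i (assumed values)

# True (error-free) values: Y related to the X_i through an assumed linear model.
X = rng.normal(mu, [2.0, 3.0], size=(n, p))
Y = 5.0 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 1.0, n)

# Observed values contaminated by measurement errors with mean zero:
# E_j ~ N(0, sigma_(0)^2), eta_ij ~ N(0, sigma_(i)^2), mutually uncorrelated.
sigma_0 = 0.5
sigma_i = np.array([0.3, 0.4])
y = Y + rng.normal(0.0, sigma_0, n)           # y_j = Y_j + E_j
x = X + rng.normal(0.0, sigma_i, (n, p))      # x_ij = X_ij + eta_ij

print(y.mean(), x.mean(axis=0))   # fallible sample means, still centred near the true means
```

Because the errors have mean zero, the fallible sample means remain unbiased for the corresponding true means; it is the variances that are inflated, which is the effect studied below.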
Define
\[
u_i = \frac{\bar{x}_i}{\mu_i} \ (i = 1, 2, \ldots, p), \qquad
u^T = (u_1, u_2, \ldots, u_p)_{1 \times p}, \qquad
e^T = (1, 1, \ldots, 1)_{1 \times p},
\]
\[
\bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j, \qquad
\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}.
\]
Consider the family of estimators of µ0 defined by
\[
\hat{\mu}_g = g(\bar{y}, u^T) \tag{2.1}
\]
where \( g(\bar{y}, u^T) \) is a function of \( \bar{y}, u_1, u_2, \ldots, u_p \) such that
\[
g(\mu_0, e^T) = \mu_0
\quad \Rightarrow \quad
\left. \frac{\partial g(\cdot)}{\partial \bar{y}} \right|_{(\mu_0, e^T)} = 1,
\]
and which satisfies the following regularity conditions, where Q denotes a bounded closed convex subset of
(p+1)-dimensional real space containing the point \( (\mu_0, e^T) \):
1. The function \( g(\bar{y}, u^T) \) is continuous and bounded in Q.
2. The first and second order partial derivatives of the function \( g(\bar{y}, u^T) \) exist and are continuous and
bounded in Q.
Expanding \( g(\bar{y}, u^T) \) about the point \( (\mu_0, e^T) \) in a second-order Taylor's series, we obtain
\[
\hat{\mu}_g = g(\mu_0, e^T)
+ (\bar{y} - \mu_0) \left. \frac{\partial g(\cdot)}{\partial \bar{y}} \right|_{(\mu_0, e^T)}
+ (u - e)^T g^{(1)}(\mu_0, e^T)
+ \frac{1}{2} \Big\{ (\bar{y} - \mu_0)^2 \left. \frac{\partial^2 g(\cdot)}{\partial \bar{y}^2} \right|_{(\bar{y}^*, u^{*T})}
+ 2 (\bar{y} - \mu_0) (u - e)^T \left. \frac{\partial g^{(1)}(\cdot)}{\partial \bar{y}} \right|_{(\bar{y}^*, u^{*T})}
+ (u - e)^T g^{(2)}(\bar{y}^*, u^{*T}) (u - e) \Big\} \tag{2.2}
\]
where \( g^{(1)}(\cdot) \) denotes the p-element column vector of first partial derivatives of \( g(\cdot) \)
with respect to u, and \( g^{(2)}(\cdot) \) denotes the \( p \times p \) matrix of second partial derivatives of
\( g(\cdot) \) with respect to u. Taking expectation of (2.2), we obtain
\[
E(\hat{\mu}_g) = \mu_0 + O(n^{-1}) \tag{2.3}
\]
from which it follows that the bias of \( \hat{\mu}_g \) is of the order \( n^{-1} \), and hence its contribution
to the mean squared error is of the order \( n^{-2} \).
To the first degree of approximation, the mean squared error of \( \hat{\mu}_g \) is
\[
MSE(\hat{\mu}_g)
= E \big\{ (\bar{y} - \mu_0) + (u - e)^T g^{(1)}(\mu_0, e^T) \big\}^2
= \frac{1}{n} \left[ \mu_0^2 \left( C_0^2 + C_{(0)}^2 \right)
+ 2 \mu_0 \, b^T g^{(1)}(\mu_0, e^T)
+ \big( g^{(1)}(\mu_0, e^T) \big)^T A \, \big( g^{(1)}(\mu_0, e^T) \big) \right] \tag{2.4}
\]
which is minimized for
\[
g^{(1)}(\mu_0, e^T) = -\mu_0 A^{-1} b. \tag{2.5}
\]
Here \( C_0 = \sigma_0 / \mu_0 \) and \( C_{(0)} = \sigma_{(0)} / \mu_0 \);
\( b^T = (b_1, b_2, \ldots, b_p) \) with \( b_i = \rho_{0i} C_0 C_i \); and
\( A = [a_{ij}] \) is the \( p \times p \) matrix with \( a_{ii} = C_i^2 + C_{(i)}^2 \) and
\( a_{ij} = \rho_{ij} C_i C_j \) \( (i \neq j) \).
Thus the resulting minimum MSE of \( \hat{\mu}_g \) is given by
\[
\min MSE(\hat{\mu}_g) = \frac{\mu_0^2}{n} \left[ C_0^2 + C_{(0)}^2 - b^T A^{-1} b \right] \tag{2.6}
\]
so that
\[
MSE(\hat{\mu}_g) \geq \frac{\mu_0^2}{n} \left[ C_0^2 + C_{(0)}^2 - b^T A^{-1} b \right]. \tag{2.7}
\]
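The optimum (2.5) and the bound (2.6)–(2.7) can be checked numerically. A minimal sketch, assuming illustrative values for µ0, C0, C(0), the Ci, C(i) and the correlations (here \( b_i = \rho_{0i} C_0 C_i \), and A has diagonal \( C_i^2 + C_{(i)}^2 \) with off-diagonal \( \rho_{ij} C_i C_j \)):

```python
import numpy as np

# Illustrative (assumed) population quantities for p = 2 auxiliary variates.
n, mu0 = 100, 50.0
C0, C0_err = 0.10, 0.04                     # C_0 and C_(0)
C = np.array([0.12, 0.15])                  # C_i
C_err = np.array([0.05, 0.06])              # C_(i)
rho = np.array([[1.0, 0.3], [0.3, 1.0]])    # rho_ij among the auxiliaries
rho0 = np.array([0.7, 0.6])                 # rho_0i with the study variate

b = rho0 * C0 * C                           # b_i = rho_0i C_0 C_i
A = rho * np.outer(C, C)
np.fill_diagonal(A, C**2 + C_err**2)        # a_ii = C_i^2 + C_(i)^2

def mse(g1):
    """MSE (2.4) for a given vector g^(1)(mu0, e^T)."""
    return (mu0**2 * (C0**2 + C0_err**2) + 2 * mu0 * b @ g1 + g1 @ A @ g1) / n

g1_opt = -mu0 * np.linalg.solve(A, b)       # optimising value (2.5)
min_mse = (mu0**2 / n) * (C0**2 + C0_err**2 - b @ np.linalg.solve(A, b))  # (2.6)

print(min_mse, mse(g1_opt))                 # the two agree
print(mse(g1_opt + 1.0) >= min_mse)         # any other choice obeys (2.7)
```

Since A is positive definite, plugging any non-optimal vector into (2.4) can only increase the MSE, which is exactly the content of (2.7).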
It is to be mentioned that the family of estimators µ̂ g at (2.1) is very large. The following estimators:
\[
\hat{\mu}_g^{(1)} = \bar{y} \sum_{i=1}^{p} \omega_i \frac{\mu_i}{\bar{x}_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Olkin (1958)]}
\]
\[
\hat{\mu}_g^{(2)} = \bar{y} \sum_{i=1}^{p} \omega_i \frac{\bar{x}_i}{\mu_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Singh (1967)]}
\]
\[
\hat{\mu}_g^{(3)} = \bar{y} \, \frac{\sum_{i=1}^{p} \omega_i \mu_i}{\sum_{i=1}^{p} \omega_i \bar{x}_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Shukla (1966) and John (1969)]}
\]
\[
\hat{\mu}_g^{(4)} = \bar{y} \, \frac{\sum_{i=1}^{p} \omega_i \bar{x}_i}{\sum_{i=1}^{p} \omega_i \mu_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Sahai et al (1980)]}
\]
\[
\hat{\mu}_g^{(5)} = \bar{y} \prod_{i=1}^{p} \left( \frac{\mu_i}{\bar{x}_i} \right)^{\omega_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Mohanty and Pattanaik (1984)]}
\]
\[
\hat{\mu}_g^{(6)} = \bar{y} \left( \sum_{i=1}^{p} \frac{\omega_i \bar{x}_i}{\mu_i} \right)^{-1}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Mohanty and Pattanaik (1984)]}
\]
\[
\hat{\mu}_g^{(7)} = \bar{y} \prod_{i=1}^{p} \left( \frac{\bar{x}_i}{\mu_i} \right)^{\omega_i}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Tuteja and Bahl (1991)]}
\]
\[
\hat{\mu}_g^{(8)} = \bar{y} \left( \sum_{i=1}^{p} \frac{\omega_i \mu_i}{\bar{x}_i} \right)^{-1}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Tuteja and Bahl (1991)]}
\]
\[
\hat{\mu}_g^{(9)} = \bar{y} \left[ \omega_{p+1} + \sum_{i=1}^{p} \omega_i \frac{\mu_i}{\bar{x}_i} \right], \quad \sum_{i=1}^{p+1} \omega_i = 1
\]
\[
\hat{\mu}_g^{(10)} = \bar{y} \left[ \omega_{p+1} + \sum_{i=1}^{p} \omega_i \frac{\bar{x}_i}{\mu_i} \right], \quad \sum_{i=1}^{p+1} \omega_i = 1
\]
\[
\hat{\mu}_g^{(11)} = \bar{y} \left[ \sum_{i=1}^{q} \omega_i \frac{\mu_i}{\bar{x}_i} + \sum_{i=q+1}^{p} \omega_i \frac{\bar{x}_i}{\mu_i} \right], \quad \sum_{i=1}^{q} \omega_i + \sum_{i=q+1}^{p} \omega_i = 1 \qquad \text{[Srivastava (1965) and Rao and Mudholkar (1967)]}
\]
\[
\hat{\mu}_g^{(12)} = \bar{y} \prod_{i=1}^{p} \left( \frac{\bar{x}_i}{\mu_i} \right)^{\alpha_i} \ (\text{the } \alpha_i \text{ are suitably chosen constants}) \qquad \text{[Srivastava (1967)]}
\]
\[
\hat{\mu}_g^{(13)} = \bar{y} \prod_{i=1}^{p} \left( 2 - \frac{\bar{x}_i}{\mu_i} \right)^{\alpha_i} \qquad \text{[Sahai and Ray (1980)]}
\]
\[
\hat{\mu}_g^{(14)} = \bar{y} \prod_{i=1}^{p} \frac{\mu_i}{\mu_i + \alpha_i (\bar{x}_i - \mu_i)} \qquad \text{[Walsh (1970)]}
\]
\[
\hat{\mu}_g^{(15)} = \bar{y} \exp \left\{ \sum_{i=1}^{p} \theta_i \log u_i \right\} \qquad \text{[Srivastava (1971)]}
\]
\[
\hat{\mu}_g^{(16)} = \bar{y} \exp \left\{ \sum_{i=1}^{p} \theta_i (u_i - 1) \right\} \qquad \text{[Srivastava (1971)]}
\]
\[
\hat{\mu}_g^{(17)} = \bar{y} \sum_{i=1}^{p} \omega_i \exp \{ (\theta_i / \omega_i) \log u_i \}, \quad \sum_{i=1}^{p} \omega_i = 1 \qquad \text{[Srivastava (1971)]}
\]
\[
\hat{\mu}_g^{(18)} = \bar{y} + \sum_{i=1}^{p} \alpha_i (\bar{x}_i - \mu_i)
\]
etc. may be identified as particular members of the suggested family of estimators \( \hat{\mu}_g \). The MSE of these
estimators can easily be obtained from (2.4). In the presence of measurement errors, the variance of the
conventional unbiased estimator \( \bar{y} \) is
\[
V(\bar{y}) = \frac{\mu_0^2}{n} \left( C_0^2 + C_{(0)}^2 \right). \tag{2.8}
\]
It follows from (2.6) and (2.8) that the minimum MSE of \( \hat{\mu}_g \) is no larger than the variance of the
conventional unbiased estimator \( \bar{y} \).
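Several of the listed members are simple plug-in formulas in the sample means. A sketch computing estimators (1), (3), (12) and (18) on simulated data (the weights, constants and the population model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
mu = np.array([10.0, 20.0])                    # known auxiliary means mu_i (assumed)
x = rng.normal(mu, [2.0, 3.0], (n, p))         # observed auxiliaries
y = 5.0 + 0.4 * x[:, 0] + 0.2 * x[:, 1] + rng.normal(0, 1, n)   # true mean is 13

ybar, xbar = y.mean(), x.mean(axis=0)
w = np.array([0.5, 0.5])                       # weights with sum 1 (illustrative)
alpha = np.array([0.3, 0.2])                   # suitably chosen constants (illustrative)

olkin    = ybar * np.sum(w * mu / xbar)              # estimator (1), Olkin (1958)
shukla   = ybar * np.sum(w * mu) / np.sum(w * xbar)  # estimator (3)
power    = ybar * np.prod((xbar / mu) ** alpha)      # estimator (12), Srivastava (1967)
diff_est = ybar + np.sum(alpha * (xbar - mu))        # estimator (18)
print(olkin, shukla, power, diff_est)
```

All four values sit close to \( \bar{y} \), differing only through the auxiliary-based correction factors; the efficiency comparison among them is governed by (2.4).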
On substituting \( \sigma_{(0)}^2 = 0 \) and \( \sigma_{(i)}^2 = 0 \ \forall i = 1, 2, \ldots, p \) in (2.4), we obtain the no-measurement-error case.
In this case the MSE reduces to
\[
MSE(\hat{\mu}_g)
= \frac{1}{n} \left[ C_0^2 \mu_0^2
+ 2 \mu_0 \, b^T g^{*(1)}(\mu_0, e^T)
+ \big( g^{*(1)}(\mu_0, e^T) \big)^T A^* \, g^{*(1)}(\mu_0, e^T) \right]
= MSE(\hat{\mu}_g^*) \tag{2.9}
\]
where
\[
\hat{\mu}_g^* = g^* \left( \bar{Y}, \frac{\bar{X}_1}{\mu_1}, \frac{\bar{X}_2}{\mu_2}, \ldots, \frac{\bar{X}_p}{\mu_p} \right)
= g^*(\bar{Y}, U^T) \tag{2.10}
\]
and \( \bar{Y} \) and \( \bar{X}_i \) \( (i = 1, 2, \ldots, p) \) are the sample means of the characteristics Y and \( X_i \)
based on the true measurements. The MSE (2.9) is minimized for
\[
g^{*(1)}(\mu_0, e^T) = -\mu_0 A^{*-1} b. \tag{2.11}
\]
Thus the minimum MSE of \( \hat{\mu}_g^* \) is given by
\[
\min MSE(\hat{\mu}_g^*)
= \frac{\mu_0^2}{n} \left[ C_0^2 - b^T A^{*-1} b \right]
= \frac{\sigma_0^2}{n} \left( 1 - R^2 \right) \tag{2.12}
\]
where \( A^* = [a^*_{ij}] \) is a \( p \times p \) matrix with \( a^*_{ij} = \rho_{ij} C_i C_j \), and R stands for the
multiple correlation coefficient of Y on X1, X2, …, Xp.
From (2.6) and (2.12), the increase in the minimum MSE of \( \hat{\mu}_g \) due to measurement errors is
obtained as
\[
\min MSE(\hat{\mu}_g) - \min MSE(\hat{\mu}_g^*)
= \frac{\mu_0^2}{n} \left[ C_{(0)}^2 + b^T A^{*-1} b - b^T A^{-1} b \right] > 0.
\]
This is due to the fact that the measurement errors introduce the variances of the fallible measurements of the study
variate Y and the auxiliary variates Xi. Hence there is a need to take the contribution of measurement errors
into account.
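The inflation of the minimum MSE can be illustrated numerically; the sketch below (with assumed illustrative parameter values) evaluates (2.6) and (2.12), where the measurement errors enter through \( C_{(0)}^2 \) and the inflated diagonal of A:

```python
import numpy as np

# Assumed illustrative parameters (same notation as the text).
n, mu0 = 100, 50.0
C0, C0_err = 0.10, 0.04
C, C_err = np.array([0.12, 0.15]), np.array([0.05, 0.06])
rho = np.array([[1.0, 0.3], [0.3, 1.0]])
rho0 = np.array([0.7, 0.6])

b = rho0 * C0 * C
A_star = rho * np.outer(C, C)                # error-free case: a*_ij = rho_ij C_i C_j
A = A_star + np.diag(C_err**2)               # measurement errors inflate the diagonal

mse_with    = (mu0**2 / n) * (C0**2 + C0_err**2 - b @ np.linalg.solve(A, b))   # (2.6)
mse_without = (mu0**2 / n) * (C0**2 - b @ np.linalg.solve(A_star, b))          # (2.12)
print(mse_with - mse_without)   # positive: measurement errors inflate the minimum MSE
```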
To obtain the bias of the estimator \( \hat{\mu}_g \), we further assume that the third partial derivatives of
\( g(\bar{y}, u^T) \) exist and are continuous and bounded. Expanding \( g(\bar{y}, u^T) \) about the point
\( (\mu_0, e^T) \) in a third-order Taylor's series, we obtain
\[
\hat{\mu}_g = g(\mu_0, e^T)
+ (\bar{y} - \mu_0) \left. \frac{\partial g(\cdot)}{\partial \bar{y}} \right|_{(\mu_0, e^T)}
+ (u - e)^T g^{(1)}(\mu_0, e^T)
+ \frac{1}{2} \Big\{ (\bar{y} - \mu_0)^2 \left. \frac{\partial^2 g(\cdot)}{\partial \bar{y}^2} \right|_{(\mu_0, e^T)}
+ 2 (\bar{y} - \mu_0) (u - e)^T g^{(12)}(\mu_0, e^T)
+ (u - e)^T g^{(2)}(\mu_0, e^T) (u - e) \Big\}
+ \frac{1}{6} \left\{ (\bar{y} - \mu_0) \frac{\partial}{\partial \bar{y}}
+ (u - e)^T \frac{\partial}{\partial u} \right\}^3 g(\bar{y}^*, u^{*T}) \tag{3.1}
\]
Noting that
\[
g(\mu_0, e^T) = \mu_0, \qquad
\left. \frac{\partial g(\cdot)}{\partial \bar{y}} \right|_{(\mu_0, e^T)} = 1, \qquad
\left. \frac{\partial^2 g(\cdot)}{\partial \bar{y}^2} \right|_{(\mu_0, e^T)} = 0,
\]
and taking expectation, we obtain the bias of the family of estimators \( \hat{\mu}_g \) to the first degree of
approximation:
\[
B(\hat{\mu}_g) = \frac{1}{2} \left[ E \left\{ (u - e)^T g^{(2)}(\mu_0, e^T) (u - e) \right\}
+ \frac{2 \mu_0}{n} \, b^T g^{(12)}(\mu_0, e^T) \right] \tag{3.2}
\]
where \( b^T = (b_1, b_2, \ldots, b_p) \) with \( b_i = \rho_{0i} C_0 C_i \) \( (i = 1, 2, \ldots, p) \), and
\( g^{(12)}(\cdot) = \partial g^{(1)}(\cdot) / \partial \bar{y} \). Thus we see that the bias of \( \hat{\mu}_g \)
depends also upon the second-order partial derivatives of the function \( g(\cdot) \), and hence on the particular
form chosen. The biases and mean squared errors of the estimators \( \hat{\mu}_g^{(i)} \), i = 1 to 18, up to terms of
order \( n^{-1} \), along with the values of \( g^{(1)}(\mu_0, e^T) \), \( g^{(2)}(\mu_0, e^T) \) and
\( g^{(12)}(\mu_0, e^T) \), are given in Table 3.1.
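As an illustration, the first-order bias (3.2) of Olkin's estimator (1), for which \( g^{(2)}(\mu_0, e^T) = 2\mu_0\,\mathrm{diag}(\omega_1,\ldots,\omega_p) \) and \( g^{(12)}(\mu_0, e^T) = -\omega \), can be evaluated directly, using \( E[(u-e)(u-e)^T] = A/n \) (all parameter values below are illustrative assumptions):

```python
import numpy as np

# First-order bias (3.2) for estimator (1). Illustrative assumed parameters.
n, mu0 = 100, 50.0
C0 = 0.10
C, C_err = np.array([0.12, 0.15]), np.array([0.05, 0.06])
rho = np.array([[1.0, 0.3], [0.3, 1.0]])
rho0 = np.array([0.7, 0.6])
w = np.array([0.5, 0.5])

b = rho0 * C0 * C
A = rho * np.outer(C, C)
np.fill_diagonal(A, C**2 + C_err**2)

g2 = 2 * mu0 * np.diag(w)     # g^(2)(mu0, e^T) for estimator (1)
g12 = -w                      # g^(12)(mu0, e^T) for estimator (1)

# E{(u-e)^T g^(2) (u-e)} = tr(g^(2) A)/n, so (3.2) becomes:
bias = 0.5 * np.trace(g2 @ A) / n + (mu0 / n) * b @ g12
print(bias)
```

The bias is of order \( n^{-1} \), as (2.3) asserts; its contribution to the MSE is negligible at this order of approximation.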
Table 3.1 Biases and mean squared errors of various estimators of µ0

For each estimator the entries \( g^{(1)}(\mu_0, e^T) \), \( g^{(2)}(\mu_0, e^T) \), \( g^{(12)}(\mu_0, e^T) \),
the bias B and the mean squared error are listed; each bias follows from (3.2) and each MSE from (2.4). The
notation used is: \( \tilde{\omega}^T = (\omega_1, \ldots, \omega_p) \);
\( \tilde{W} = \operatorname{diag}(\omega_1, \ldots, \omega_p) \);
\( C^T = (C_1^2 + C_{(1)}^2, \ldots, C_p^2 + C_{(p)}^2) \);
\( \tilde{\omega}^{*T} = (\omega_1^*, \ldots, \omega_p^*) \) with \( \omega_i^* = \omega_i \mu_i \) and
\( s = \sum_{i=1}^{p} \omega_i \mu_i \);
\( \tilde{\omega}_{(1)}^T = (-\omega_1, \ldots, -\omega_q, \omega_{q+1}, \ldots, \omega_p) \);
\( \tilde{W}_{(1)} = \operatorname{diag}(\omega_1, \ldots, \omega_q, 0, \ldots, 0) \);
\( C^{*T} = (C_1^2 + C_{(1)}^2, \ldots, C_q^2 + C_{(q)}^2, 0, \ldots, 0) \);
\( \tilde{\alpha}^T = (\alpha_1, \ldots, \alpha_p) \);
\( D_\alpha = \operatorname{diag}(\alpha_1, \ldots, \alpha_p) \);
\( D_{\alpha^2} = \operatorname{diag}(\alpha_1^2, \ldots, \alpha_p^2) \);
\( \tilde{\theta}^T = (\theta_1, \ldots, \theta_p) \);
\( \Theta = \operatorname{diag}(\theta_1, \ldots, \theta_p) \);
\( \Theta^* = \operatorname{diag}\{\theta_1(\theta_1/\omega_1 - 1), \ldots, \theta_p(\theta_p/\omega_p - 1)\} \);
\( \tilde{\alpha}^{*T} = (\alpha_1 \mu_1, \ldots, \alpha_p \mu_p) \); O denotes the null matrix.

\( \hat{\mu}_g^{(1)} \): \( g^{(1)} = -\mu_0 \tilde{\omega} \), \( g^{(2)} = 2\mu_0 \tilde{W} \), \( g^{(12)} = -\tilde{\omega} \);
\( B = \frac{\mu_0}{n}[C^T \tilde{\omega} - b^T \tilde{\omega}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 - 2 b^T \tilde{\omega} + \tilde{\omega}^T A \tilde{\omega}] \).

\( \hat{\mu}_g^{(2)} \): \( g^{(1)} = \mu_0 \tilde{\omega} \), \( g^{(2)} = O \), \( g^{(12)} = \tilde{\omega} \);
\( B = \frac{\mu_0}{n} b^T \tilde{\omega} \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 + 2 b^T \tilde{\omega} + \tilde{\omega}^T A \tilde{\omega}] \).

\( \hat{\mu}_g^{(3)} \): \( g^{(1)} = -\mu_0 \tilde{\omega}^*/s \), \( g^{(2)} = 2\mu_0 \tilde{\omega}^* \tilde{\omega}^{*T}/s^2 \), \( g^{(12)} = -\tilde{\omega}^*/s \);
\( B = \frac{\mu_0}{n}\left[\frac{\tilde{\omega}^{*T} A \tilde{\omega}^*}{s^2} - \frac{b^T \tilde{\omega}^*}{s}\right] \);
\( MSE = \frac{\mu_0^2}{n}\left[C_0^2 + C_{(0)}^2 - \frac{2 b^T \tilde{\omega}^*}{s} + \frac{\tilde{\omega}^{*T} A \tilde{\omega}^*}{s^2}\right] \).

\( \hat{\mu}_g^{(4)} \): \( g^{(1)} = \mu_0 \tilde{\omega}^*/s \), \( g^{(2)} = O \), \( g^{(12)} = \tilde{\omega}^*/s \);
\( B = \frac{\mu_0}{n} \frac{b^T \tilde{\omega}^*}{s} \);
\( MSE = \frac{\mu_0^2}{n}\left[C_0^2 + C_{(0)}^2 + \frac{2 b^T \tilde{\omega}^*}{s} + \frac{\tilde{\omega}^{*T} A \tilde{\omega}^*}{s^2}\right] \).

\( \hat{\mu}_g^{(5)} \): \( g^{(1)} = -\mu_0 \tilde{\omega} \), \( g^{(2)} = \mu_0(\tilde{\omega}\tilde{\omega}^T + \tilde{W}) \), \( g^{(12)} = -\tilde{\omega} \);
\( B = \frac{\mu_0}{2n}[\tilde{\omega}^T A \tilde{\omega} + C^T \tilde{\omega} - 2 b^T \tilde{\omega}] \);
MSE as for \( \hat{\mu}_g^{(1)} \).

\( \hat{\mu}_g^{(6)} \): \( g^{(1)} = -\mu_0 \tilde{\omega} \), \( g^{(2)} = 2\mu_0 \tilde{\omega}\tilde{\omega}^T \), \( g^{(12)} = -\tilde{\omega} \);
\( B = \frac{\mu_0}{n}[\tilde{\omega}^T A \tilde{\omega} - b^T \tilde{\omega}] \);
MSE as for \( \hat{\mu}_g^{(1)} \).

\( \hat{\mu}_g^{(7)} \): \( g^{(1)} = \mu_0 \tilde{\omega} \), \( g^{(2)} = \mu_0(\tilde{\omega}\tilde{\omega}^T - \tilde{W}) \), \( g^{(12)} = \tilde{\omega} \);
\( B = \frac{\mu_0}{2n}[\tilde{\omega}^T A \tilde{\omega} - C^T \tilde{\omega} + 2 b^T \tilde{\omega}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 + 2 b^T \tilde{\omega} + \tilde{\omega}^T A \tilde{\omega}] \).

\( \hat{\mu}_g^{(8)} \): \( g^{(1)} = \mu_0 \tilde{\omega} \), \( g^{(2)} = 2\mu_0(\tilde{\omega}\tilde{\omega}^T - \tilde{W}) \), \( g^{(12)} = \tilde{\omega} \);
\( B = \frac{\mu_0}{n}[\tilde{\omega}^T A \tilde{\omega} - C^T \tilde{\omega} + b^T \tilde{\omega}] \);
MSE as for \( \hat{\mu}_g^{(7)} \).

\( \hat{\mu}_g^{(9)} \): \( g^{(1)} = -\mu_0 \tilde{\omega} \), \( g^{(2)} = 2\mu_0 \tilde{W} \), \( g^{(12)} = -\tilde{\omega} \);
B and MSE as for \( \hat{\mu}_g^{(1)} \).

\( \hat{\mu}_g^{(10)} \): \( g^{(1)} = \mu_0 \tilde{\omega} \), \( g^{(2)} = O \), \( g^{(12)} = \tilde{\omega} \);
\( B = \frac{\mu_0}{n} b^T \tilde{\omega} \);
MSE as for \( \hat{\mu}_g^{(7)} \).

\( \hat{\mu}_g^{(11)} \): \( g^{(1)} = \mu_0 \tilde{\omega}_{(1)} \), \( g^{(2)} = 2\mu_0 \tilde{W}_{(1)} \), \( g^{(12)} = \tilde{\omega}_{(1)} \);
\( B = \frac{\mu_0}{n}[C^{*T} \tilde{\omega} + b^T \tilde{\omega}_{(1)}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 + 2 b^T \tilde{\omega}_{(1)} + \tilde{\omega}_{(1)}^T A \tilde{\omega}_{(1)}] \).

\( \hat{\mu}_g^{(12)} \): \( g^{(1)} = \mu_0 \tilde{\alpha} \), \( g^{(2)} = \mu_0(\tilde{\alpha}\tilde{\alpha}^T - D_\alpha) \), \( g^{(12)} = \tilde{\alpha} \);
\( B = \frac{\mu_0}{2n}[\tilde{\alpha}^T A \tilde{\alpha} - C^T \tilde{\alpha} + 2 b^T \tilde{\alpha}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 + 2 b^T \tilde{\alpha} + \tilde{\alpha}^T A \tilde{\alpha}] \).

\( \hat{\mu}_g^{(13)} \): \( g^{(1)} = -\mu_0 \tilde{\alpha} \), \( g^{(2)} = \mu_0(\tilde{\alpha}\tilde{\alpha}^T - D_\alpha) \), \( g^{(12)} = -\tilde{\alpha} \);
\( B = \frac{\mu_0}{2n}[\tilde{\alpha}^T A \tilde{\alpha} - C^T \tilde{\alpha} - 2 b^T \tilde{\alpha}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 - 2 b^T \tilde{\alpha} + \tilde{\alpha}^T A \tilde{\alpha}] \).

\( \hat{\mu}_g^{(14)} \): \( g^{(1)} = -\mu_0 \tilde{\alpha} \), \( g^{(2)} = \mu_0(\tilde{\alpha}\tilde{\alpha}^T + D_{\alpha^2}) \), \( g^{(12)} = -\tilde{\alpha} \);
\( B = \frac{\mu_0}{2n}\left[\tilde{\alpha}^T A \tilde{\alpha} + \sum_{i=1}^{p} \alpha_i^2 (C_i^2 + C_{(i)}^2) - 2 b^T \tilde{\alpha}\right] \);
MSE as for \( \hat{\mu}_g^{(13)} \).

\( \hat{\mu}_g^{(15)} \): \( g^{(1)} = \mu_0 \tilde{\theta} \), \( g^{(2)} = \mu_0(\tilde{\theta}\tilde{\theta}^T - \Theta) \), \( g^{(12)} = \tilde{\theta} \);
\( B = \frac{\mu_0}{2n}[\tilde{\theta}^T A \tilde{\theta} - C^T \tilde{\theta} + 2 b^T \tilde{\theta}] \);
\( MSE = \frac{\mu_0^2}{n}[C_0^2 + C_{(0)}^2 + 2 b^T \tilde{\theta} + \tilde{\theta}^T A \tilde{\theta}] \).

\( \hat{\mu}_g^{(16)} \): \( g^{(1)} = \mu_0 \tilde{\theta} \), \( g^{(2)} = \mu_0 \tilde{\theta}\tilde{\theta}^T \), \( g^{(12)} = \tilde{\theta} \);
\( B = \frac{\mu_0}{2n}[\tilde{\theta}^T A \tilde{\theta} + 2 b^T \tilde{\theta}] \);
MSE as for \( \hat{\mu}_g^{(15)} \).

\( \hat{\mu}_g^{(17)} \): \( g^{(1)} = \mu_0 \tilde{\theta} \), \( g^{(2)} = \mu_0 \Theta^* \), \( g^{(12)} = \tilde{\theta} \);
\( B = \frac{\mu_0}{2n}\left[\sum_{i=1}^{p} \theta_i (\theta_i/\omega_i - 1)(C_i^2 + C_{(i)}^2) + 2 b^T \tilde{\theta}\right] \);
MSE as for \( \hat{\mu}_g^{(15)} \).

\( \hat{\mu}_g^{(18)} \): \( g^{(1)} = \tilde{\alpha}^* \), \( g^{(2)} = O \), \( g^{(12)} = 0 \); unbiased;
\( MSE = \frac{1}{n}[\mu_0^2 (C_0^2 + C_{(0)}^2) + 2 \mu_0 b^T \tilde{\alpha}^* + \tilde{\alpha}^{*T} A \tilde{\alpha}^*] \).
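A small Monte Carlo check of the Table 3.1 entries is straightforward: simulate the measurement-error model, compute the estimator repeatedly, and compare the empirical MSE with the first-order expression. The sketch below does this for estimator (1) under assumed illustrative population parameters:

```python
import numpy as np

# Monte Carlo check of the first-order MSE of estimator (1) under measurement
# errors. All population parameters below are illustrative assumptions.
rng = np.random.default_rng(2)
reps, n = 4000, 100
mu0, mu = 50.0, np.array([10.0, 20.0])
S0, S = 5.0, np.array([1.2, 3.0])            # true-value standard deviations
sig0, sig = 2.0, np.array([0.5, 1.2])        # measurement-error standard deviations
corr = np.array([[1.0, 0.7, 0.6],            # corr(Y, X1, X2)
                 [0.7, 1.0, 0.3],
                 [0.6, 0.3, 1.0]])
sd = np.array([S0, *S])
cov = corr * np.outer(sd, sd)
w = np.array([0.5, 0.5])

Z = rng.multivariate_normal(np.r_[mu0, mu], cov, size=(reps, n))
y = Z[..., 0] + rng.normal(0, sig0, (reps, n))        # y_j = Y_j + E_j
x = Z[..., 1:] + rng.normal(0, sig, (reps, n, 2))     # x_ij = X_ij + eta_ij

est = y.mean(axis=1) * (w * mu / x.mean(axis=1)).sum(axis=1)   # estimator (1)
emp_mse = np.mean((est - mu0) ** 2)

# First-order MSE from Table 3.1 / (2.4).
C0, C0e = S0 / mu0, sig0 / mu0
C, Ce = S / mu, sig / mu
b = corr[0, 1:] * C0 * C
A = corr[1:, 1:] * np.outer(C, C) + np.diag(Ce**2)
th_mse = (mu0**2 / n) * (C0**2 + C0e**2 - 2 * b @ w + w @ A @ w)
print(emp_mse, th_mse)
```

With these coefficients of variation the neglected higher-order terms are of order \( n^{-2} \), so the empirical and first-order values agree closely.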
4. ESTIMATORS BASED ON ESTIMATED OPTIMUM VALUES
It may be noted that the minimum MSE (2.6) is obtained only when the optimum values of the constants
involved in the estimator, which are functions of the unknown population parameters µ0, b and A, are
known exactly. To use such estimators in practice, one has to rely on some guessed values of the parameters
µ0, b and A, obtained either through past experience or through a pilot sample survey. Das and Tripathi
(1978, sec. 3) have illustrated that even if the values of the parameters used in the estimator are not exactly
equal to their optimum values as given by (2.5) but are close to them, the resulting estimator will be better
than the conventional unbiased estimator \( \bar{y} \). For further discussion on this issue, the reader is
referred to Murthy (1967) and Reddy (1973), among others.
On the other hand, if the experimenter is unable to guess the values of the population parameters owing to lack of
experience, it is advisable to replace the unknown population parameters by their consistent estimators. Let
\( \hat{\phi} \) be a consistent estimator of \( \phi = A^{-1} b \). We then replace \( \phi \) by \( \hat{\phi} \) in the
optimum \( \hat{\mu}_g \), resulting in the estimator \( \hat{\mu}_{g(est)} \), say, which will now be a function of
\( \bar{y} \), u and \( \hat{\phi} \). Thus we define
\[
\hat{\mu}_{g(est)} = g^{**}(\bar{y}, u^T, \hat{\phi}^T) \tag{4.1}
\]
where \( g^{**}(\cdot) \) is a function of \( (\bar{y}, u^T, \hat{\phi}^T) \) such that
\[
g^{**}(\mu_0, e^T, \phi^T) = \mu_0 \text{ for all } \mu_0
\quad \Rightarrow \quad
\left. \frac{\partial g^{**}(\cdot)}{\partial \bar{y}} \right|_{(\mu_0, e^T, \phi^T)} = 1,
\]
\[
\left. \frac{\partial g^{**}(\cdot)}{\partial u} \right|_{(\mu_0, e^T, \phi^T)}
= \left. \frac{\partial g(\cdot)}{\partial u} \right|_{(\mu_0, e^T)}
= -\mu_0 A^{-1} b = -\mu_0 \phi \tag{4.2}
\]
and
\[
\left. \frac{\partial g^{**}(\cdot)}{\partial \hat{\phi}} \right|_{(\mu_0, e^T, \phi^T)} = 0.
\]
With these conditions and following Srivastava and Jhajj (1983), it can be shown to the first degree of
approximation that
\[
MSE(\hat{\mu}_{g(est)}) = \min MSE(\hat{\mu}_g).
\]
Thus, if the optimum values of the constants involved in the estimator are replaced by their consistent
estimators and conditions (4.2) hold true, the resulting estimator \( \hat{\mu}_{g(est)} \) will have the same
asymptotic mean squared error as the optimum \( \hat{\mu}_g \). Our work needs to be extended, and future
research will explore this aspect further.
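The plug-in idea of this section can be sketched as follows: the unknown optimum constants are replaced by consistent moment estimates computed from the fallible observations themselves. Here the difference-type member (18) is used, with its constants estimated by regressing the observed y on the observed x (the population model and all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu = 300, np.array([10.0, 20.0])          # known auxiliary means (assumed)

X = rng.multivariate_normal(mu, [[1.44, 1.08], [1.08, 9.0]], n)
Y = 5.0 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, n)   # true mean is 13
y = Y + rng.normal(0, 2.0, n)                # measurement errors in the study variate
x = X + rng.normal(0, [0.5, 1.2], (n, 2))    # measurement errors in the auxiliaries

# Consistent moment estimates of the covariances of the fallible measurements.
S = np.cov(np.column_stack([y, x]), rowvar=False)
beta_hat = np.linalg.solve(S[1:, 1:], S[1:, 0])   # estimated optimum constants

# Estimator (18) with alpha_i = -beta_hat_i, i.e. the plug-in regression form.
mu_hat = y.mean() + beta_hat @ (mu - x.mean(axis=0))
print(mu_hat)
```

Because the estimated constants converge to their optimum values, the plug-in estimator attains the same asymptotic MSE as the optimum member of the family, as the argument above shows.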
REFERENCES
Biemer, P.P., Groves, R.M., Lyberg, L.E., Mathiowetz, N.A. and Sudman, S. (1991): Measurement Errors in
Surveys. Wiley, New York.
Das, A.K. and Tripathi, T.P. (1978): Use of auxiliary information in estimating the finite population
variance. Sankhya, C.
Fuller, W.A. (1995): Estimation in the presence of measurement error. International Statistical Review, 63,
2, 121-147.
Manisha and Singh, R.K. (2001): An estimation of population mean in the presence of measurement errors.
Mohanty, S. and Pattanaik, L.M. (1984): Alternative multivariate ratio estimators using geometric and
harmonic means.
Murthy, M.N. (1967): Sampling Theory and Methods. Statistical Publishing Society, Calcutta.
Olkin, I. (1958): Multivariate ratio estimation for finite population. Biometrika, 45, 154-165.
Rao, P.S.R.S. and Mudholkar, G.S. (1967): Generalized multivariate estimators for the mean of a finite
population. Jour. Amer. Statist. Assoc., 62, 1009-1012.
Reddy, V.N. (1973): On ratio and product methods of estimation. Sankhya, B, 35, 307-316.
Reddy, V.N. and Rao, T.J. (1977): Modified PPS method of estimation. Sankhya, C, 39, 185-197.
Sahai, A., Chander, R. and Mathur, A.K. (1980): An alternative multivariate product estimator. Jour. Ind.
Soc. Agri. Statist.
Sahai, A. and Ray, S.K. (1980): An efficient estimator using auxiliary information. Metrika, 27, 271-275.
Sahai, A. and Sahai, A. (1985): On efficient use of auxiliary information. Jour. Statist. Plann. Inference, 12,
203-212.
Shalabh (1997): Ratio method of estimation in the presence of measurement errors. Jour. Ind. Soc. Agri.
Statist.
Shukla, G.K. (1966): An alternative multivariate ratio estimate for finite population. Cal. Statist. Assoc.
Bull.
Singh, M.P. (1967): Multivariate product method of estimation for finite population. Jour. Ind. Soc. Agri.
Statist.
Srivastava, S.K. (1965): An estimator of the mean of a finite population using several auxiliary characters.
Srivastava, S.K. (1967): An estimator using auxiliary information in sample surveys. Cal. Statist. Assoc.
Bull.
Srivastava, S.K. (1971): A generalized estimator for the mean of a finite population using multi-auxiliary
information. Jour. Amer. Statist. Assoc., 66, 404-407.
Srivastava, S.K. (1980): A class of estimators using auxiliary information in sample surveys. Canad. Jour.
Statist., 8, 253-254.
Srivastava, S.K. and Jhajj, H.S. (1983): A class of estimators of the population mean using multi-auxiliary
information.
Srivenkataramana, T. and Tracy, D.S. (1984): Positive and negative valued auxiliary variates in surveys.
Sud, U.C. and Srivastava, S.K. (2000): Estimation of population mean in repeat surveys in the presence of
measurement errors.
Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S. and Ashok, C. (1984): Sampling Theory of Surveys with
Applications. Iowa State University Press, Ames.
Tankou, V. and Dharmadhikari, S. (1989): Improvement of ratio-type estimators. Biom. Jour., 31 (7), 795-
802.
Tuteja, R.K. and Bahl, Shashi (1991): Multivariate product estimators. Cal. Statist. Assoc. Bull., 42, 109-
115.
Walsh, J.E. (1970): Generalization of ratio estimate for population total. Sankhya, A, 32, 99-106.