Mse
$\hat{\theta} = r(X_1, \ldots, X_n)$. There are several methods to obtain an estimator for $\theta$, such as the MLE, the method of moments, and the Bayesian method.
A difficulty that arises is that, since we can usually apply more than one of these methods
in a particular situation, we are often faced with the task of choosing between estimators. Of
course, it is possible that different methods of finding estimators will yield the same answer
(as we have seen in the MLE handout), which makes the evaluation a bit easier, but, in many
cases, different methods will lead to different estimators. We need, therefore, some criteria
to choose among them.
We will study several measures of the quality of an estimator, so that we can choose the
best. Some of these measures tell us the quality of the estimator with small samples, while
other measures tell us the quality of the estimator with large samples. The latter are also
known as asymptotic properties of estimators.
1 Mean Square Error (MSE) of an Estimator
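Let $\hat{\theta}$ be the estimator of the unknown parameter $\theta$ from the random sample $X_1, X_2, \ldots, X_n$.
Then clearly the deviation of $\hat{\theta}$ from the true value of $\theta$, $|\hat{\theta} - \theta|$, measures the quality of the estimator; equivalently, we can use $(\hat{\theta} - \theta)^2$ for the ease of computation. Since $\hat{\theta}$ is a
random variable, we should take an average to evaluate the quality of the estimator. Thus,
we introduce the following.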
Definition: The mean square error (MSE) of an estimator $\hat{\theta}$ of a parameter $\theta$ is the function
of $\theta$ defined by $E(\hat{\theta} - \theta)^2$, and this is denoted as $MSE_{\hat{\theta}}$.
This is also called the risk function of an estimator, with $(\hat{\theta} - \theta)^2$ called the quadratic loss
function. The expectation is with respect to the random variables $X_1, \ldots, X_n$ since they are
the only random components in the expression.
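Notice that the MSE measures the average squared difference between the estimator $\hat{\theta}$ and
the parameter $\theta$, a somewhat reasonable measure of performance for an estimator. In general,
any increasing function of the absolute distance $|\hat{\theta} - \theta|$ would serve to measure the goodness of an estimator. The MSE, however, has the interpretation
$$MSE_{\hat{\theta}} = E(\hat{\theta} - \theta)^2 = Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2 = Var(\hat{\theta}) + (\text{Bias of } \hat{\theta})^2 .$$
This is so because
$$\begin{aligned}
E(\hat{\theta} - \theta)^2 &= E(\hat{\theta}^2) + E(\theta^2) - 2\theta E(\hat{\theta}) \\
&= Var(\hat{\theta}) + [E(\hat{\theta})]^2 + \theta^2 - 2\theta E(\hat{\theta}) \\
&= Var(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2 .
\end{aligned}$$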
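Definition: The bias of an estimator $\hat{\theta}$ of a parameter $\theta$ is the difference between the
expected value of $\hat{\theta}$ and $\theta$; that is, $Bias(\hat{\theta}) = E(\hat{\theta}) - \theta$. An estimator is called unbiased if its bias is zero, that is, if $E(\hat{\theta}) = \theta$ for all $\theta$.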
Thus, the MSE has two components: one measures the variability of the estimator (precision)
and the other measures its bias (accuracy). An estimator that has good MSE properties
has small combined variance and bias. To find an estimator with good MSE properties, we
need to find estimators that control both variance and bias.
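The decomposition of the MSE into variance plus squared bias can be checked numerically. The following is a minimal sketch, not part of the original notes: it simulates a deliberately biased estimator (the divisor-$n$ sample variance of a normal sample) and compares the empirical MSE with the empirical variance plus the squared empirical bias. The sample size, number of replications, seed, and the helper name `mse_decomposition` are illustrative assumptions.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from simulated estimates."""
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    variance = statistics.pvariance(estimates)  # divides by the number of replications
    bias = statistics.fmean(estimates) - theta
    return mse, variance, bias

rng = random.Random(0)   # fixed seed, chosen arbitrarily
theta = 4.0              # true variance of N(0, 2^2); hypothetical setting
n, reps = 10, 5000       # hypothetical sample size and replication count

estimates = []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    estimates.append(statistics.pvariance(sample))  # divisor n, hence biased

mse, variance, bias = mse_decomposition(estimates, theta)
print(f"MSE = {mse:.4f}, Var + Bias^2 = {variance + bias ** 2:.4f}, Bias = {bias:.4f}")
```

For the empirical quantities the identity holds exactly (up to floating-point rounding), because it is an algebraic identity; the simulation only makes the two components visible separately.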
For an unbiased estimator $\hat{\theta}$, we have
$$MSE_{\hat{\theta}} = E(\hat{\theta} - \theta)^2 = Var(\hat{\theta}),$$
and so, if an estimator is unbiased, its MSE is equal to its variance.
Example 1: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with density function
$f(x|\sigma) = \frac{1}{2\sigma} \exp\left(-\frac{|x|}{\sigma}\right)$. The maximum likelihood estimator for $\sigma$,
$$\hat{\sigma} = \frac{\sum_{i=1}^{n} |X_i|}{n},$$
is unbiased.
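Solution: Let us first calculate $E(|X|)$ and $E(|X|^2)$ as
$$\begin{aligned}
E(|X|) &= \int_{-\infty}^{\infty} |x| f(x|\sigma)\,dx = \int_{-\infty}^{\infty} |x| \frac{1}{2\sigma} \exp\left(-\frac{|x|}{\sigma}\right) dx \\
&= \sigma \int_{0}^{\infty} \frac{x}{\sigma} \exp\left(-\frac{x}{\sigma}\right) d\frac{x}{\sigma} = \sigma \int_{0}^{\infty} y e^{-y}\,dy = \sigma\,\Gamma(2) = \sigma
\end{aligned}$$
and
$$\begin{aligned}
E(|X|^2) &= \int_{-\infty}^{\infty} |x|^2 f(x|\sigma)\,dx = \int_{-\infty}^{\infty} |x|^2 \frac{1}{2\sigma} \exp\left(-\frac{|x|}{\sigma}\right) dx \\
&= \sigma^2 \int_{0}^{\infty} \frac{x^2}{\sigma^2} \exp\left(-\frac{x}{\sigma}\right) d\frac{x}{\sigma} = \sigma^2 \int_{0}^{\infty} y^2 e^{-y}\,dy = \sigma^2\,\Gamma(3) = 2\sigma^2 .
\end{aligned}$$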
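Therefore,
$$E(\hat{\sigma}) = E\left(\frac{|X_1| + \cdots + |X_n|}{n}\right) = \frac{E(|X_1|) + \cdots + E(|X_n|)}{n} = \sigma .$$
So $\hat{\sigma}$ is an unbiased estimator for $\sigma$.
Thus the MSE of $\hat{\sigma}$ is equal to its variance, i.e.
$$\begin{aligned}
MSE_{\hat{\sigma}} &= E(\hat{\sigma} - \sigma)^2 = Var(\hat{\sigma}) = Var\left(\frac{|X_1| + \cdots + |X_n|}{n}\right) \\
&= \frac{Var(|X_1|) + \cdots + Var(|X_n|)}{n^2} = \frac{Var(|X|)}{n} \\
&= \frac{E(|X|^2) - (E(|X|))^2}{n} = \frac{2\sigma^2 - \sigma^2}{n} = \frac{\sigma^2}{n} .
\end{aligned}$$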
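The result of Example 1 can be illustrated by simulation. The sketch below is an addition to the notes and rests on stated assumptions: it draws samples from the density $\frac{1}{2\sigma}\exp(-|x|/\sigma)$ by attaching a random sign to an exponential variable with mean $\sigma$, and compares the empirical mean and MSE of $\hat{\sigma}$ with $\sigma$ and $\sigma^2/n$. The values of `sigma`, `n`, `reps`, the seed, and the helper names are hypothetical choices.

```python
import random
import statistics

def laplace_sample(rng, sigma, n):
    """Draw n values with density exp(-|x| / sigma) / (2 * sigma)."""
    # |X| is exponential with mean sigma; attach a random sign.
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / sigma) for _ in range(n)]

def sigma_hat(sample):
    """MLE of sigma from Example 1: the average absolute value."""
    return statistics.fmean(abs(x) for x in sample)

rng = random.Random(1)            # fixed seed, chosen arbitrarily
sigma, n, reps = 2.0, 20, 20000   # hypothetical settings

estimates = [sigma_hat(laplace_sample(rng, sigma, n)) for _ in range(reps)]

empirical_mean = statistics.fmean(estimates)
empirical_mse = statistics.fmean((e - sigma) ** 2 for e in estimates)
theoretical_mse = sigma ** 2 / n  # sigma^2 / n, as derived in Example 1

print(f"mean of sigma_hat = {empirical_mean:.4f} (sigma = {sigma})")
print(f"empirical MSE = {empirical_mse:.4f}, sigma^2 / n = {theoretical_mse:.4f}")
```

Both empirical quantities should be close to their theoretical values, with the gap shrinking as `reps` grows.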
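The Statistic $S^2$: Recall that if $X_1, \ldots, X_n$ come from a normal distribution with variance
$\sigma^2$, then the sample variance $S^2$ is defined as
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} .$$
It can be shown that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$. From the properties of the $\chi^2$ distribution, we have
$$E\left[\frac{(n - 1)S^2}{\sigma^2}\right] = n - 1 \;\Rightarrow\; E(S^2) = \sigma^2$$
and
$$Var\left[\frac{(n - 1)S^2}{\sigma^2}\right] = 2(n - 1) \;\Rightarrow\; Var(S^2) = \frac{2\sigma^4}{n - 1} .$$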
Example 2: Let $X_1, X_2, \ldots, X_n$ be i.i.d. from $N(\mu, \sigma^2)$ with expected value $\mu$ and variance
$\sigma^2$. Then $\bar{X}$ is an unbiased estimator for $\mu$, and $S^2$ is an unbiased estimator for $\sigma^2$.
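Solution: We have
$$E(\bar{X}) = E\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{E(X_1) + \cdots + E(X_n)}{n} = \mu .$$
Therefore, $\bar{X}$ is an unbiased estimator. The MSE of $\bar{X}$ is
$$MSE_{\bar{X}} = E(\bar{X} - \mu)^2 = Var(\bar{X}) = \frac{\sigma^2}{n} .$$
This is because
$$Var(\bar{X}) = Var\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{Var(X_1) + \cdots + Var(X_n)}{n^2} = \frac{\sigma^2}{n} .$$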
Similarly, as we showed above, $E(S^2) = \sigma^2$, so $S^2$ is an unbiased estimator for $\sigma^2$, and the MSE
of $S^2$ is given by
$$MSE_{S^2} = E(S^2 - \sigma^2)^2 = Var(S^2) = \frac{2\sigma^4}{n - 1} .$$
Although many unbiased estimators are also reasonable from the standpoint of MSE, be
aware that controlling bias does not guarantee that MSE is controlled. In particular, it is
sometimes the case that a trade-off occurs between variance and bias in such a way that
a small increase in bias can be traded for a larger decrease in variance, resulting in an
improvement in MSE.
Example 3: An alternative estimator for $\sigma^2$ of a normal population is the maximum likelihood
or method of moments estimator
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n - 1}{n} S^2 .$$
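It is straightforward to calculate
$$E(\hat{\sigma}^2) = E\left(\frac{n - 1}{n} S^2\right) = \frac{n - 1}{n} \sigma^2,$$
so $\hat{\sigma}^2$ is a biased estimator for $\sigma^2$. The variance of $\hat{\sigma}^2$ can also be calculated as
$$Var(\hat{\sigma}^2) = Var\left(\frac{n - 1}{n} S^2\right) = \frac{(n - 1)^2}{n^2} Var(S^2) = \frac{(n - 1)^2}{n^2} \cdot \frac{2\sigma^4}{n - 1} = \frac{2(n - 1)\sigma^4}{n^2} .$$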
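Hence the MSE of $\hat{\sigma}^2$ is given by
$$E(\hat{\sigma}^2 - \sigma^2)^2 = Var(\hat{\sigma}^2) + (\text{Bias})^2 = \frac{2(n - 1)\sigma^4}{n^2} + \left(\frac{n - 1}{n} \sigma^2 - \sigma^2\right)^2 = \frac{2n - 1}{n^2} \sigma^4 .$$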
We thus have (using the conclusion from Example 2)
$$MSE_{\hat{\sigma}^2} = \frac{2n - 1}{n^2} \sigma^4 < \frac{2n}{n^2} \sigma^4 = \frac{2\sigma^4}{n} < \frac{2\sigma^4}{n - 1} = MSE_{S^2} .$$
This shows that $\hat{\sigma}^2$ has smaller MSE than $S^2$. Thus, by trading off variance for bias, the
MSE is improved.
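The comparison between $S^2$ and $\hat{\sigma}^2$ can also be seen in a small simulation. This is a hedged sketch added for illustration: it draws normal samples, computes both estimators from the same sum of squares, and compares the empirical MSEs with the formulas $2\sigma^4/(n-1)$ and $(2n-1)\sigma^4/n^2$ derived above. The settings `sigma2`, `n`, `reps`, and the seed are arbitrary assumptions.

```python
import random

rng = random.Random(2)             # fixed seed, chosen arbitrarily
sigma2, n, reps = 1.0, 5, 40000    # hypothetical settings
sigma = sigma2 ** 0.5

sq_err_s2 = 0.0    # accumulated squared error of S^2 (divisor n - 1)
sq_err_mle = 0.0   # accumulated squared error of the divisor-n estimator

for _ in range(reps):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
    sq_err_s2 += (ss / (n - 1) - sigma2) ** 2
    sq_err_mle += (ss / n - sigma2) ** 2

mse_s2 = sq_err_s2 / reps
mse_mle = sq_err_mle / reps
theory_s2 = 2 * sigma2 ** 2 / (n - 1)            # 2 sigma^4 / (n - 1)
theory_mle = (2 * n - 1) * sigma2 ** 2 / n ** 2  # (2n - 1) sigma^4 / n^2

print(f"S^2:       empirical {mse_s2:.4f}, theory {theory_s2:.4f}")
print(f"divisor n: empirical {mse_mle:.4f}, theory {theory_mle:.4f}")
```

Because both estimators are computed from the same samples, the ordering of the two empirical MSEs is much more stable than either value on its own.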
The above example does not imply that $S^2$ should be abandoned as an estimator of $\sigma^2$. The
above argument shows that, on average, $\hat{\sigma}^2$ will be closer to $\sigma^2$ than $S^2$ if MSE is used as a
measure. However, $\hat{\sigma}^2$ is biased and will, on the average, underestimate $\sigma^2$. This fact alone
may make us uncomfortable about using $\hat{\sigma}^2$ as an estimator for $\sigma^2$.
In general, since MSE is a function of the parameter, there will not be one best estimator
in terms of MSE. Often, the MSE curves of two estimators will cross each other; that is, for some
parameter values one is better, and for other values the other is better. However, even this
partial information can sometimes provide guidelines for choosing between estimators.
One way to make the problem of finding a best estimator tractable is to limit the class of
estimators. A popular way of restricting the class of estimators is to consider only unbiased
estimators and choose the estimator with the lowest variance.
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of a parameter $\theta$, that is, $E(\hat{\theta}_1) = \theta$ and $E(\hat{\theta}_2) = \theta$,
then their mean squared errors are equal to their variances, so we should choose the estimator
with the smallest variance.
A property of unbiased estimators: Suppose both $A$ and $B$ are unbiased estimators for
an unknown parameter $\theta$. Then the linear combination of $A$ and $B$, $W = aA + (1 - a)B$, is also an unbiased estimator for
any $a$, since $E(W) = a\theta + (1 - a)\theta = \theta$.
Example 4: This problem is connected with the estimation of the variance of a normal
distribution with unknown mean from a sample $X_1, X_2, \ldots, X_n$ of i.i.d. normal random
variables. For what value of $\rho$ does $\rho \sum_{i=1}^{n} (X_i - \bar{X})^2$ have the minimal MSE?
Please note that if $\rho = \frac{1}{n-1}$, we get $S^2$ in Example 2; when $\rho = \frac{1}{n}$, we get $\hat{\sigma}^2$ in Example 3.
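Solution:
As in the above examples, we define
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} .$$
Then,
$$E(S^2) = \sigma^2 \quad \text{and} \quad Var(S^2) = \frac{2\sigma^4}{n - 1} .$$
Let
$$e_\rho = \rho \sum_{i=1}^{n} (X_i - \bar{X})^2 = \rho (n - 1) S^2$$
and let $t = \rho(n - 1)$. Then
$$E(e_\rho) = \rho(n - 1)E(S^2) = \rho(n - 1)\sigma^2 = t\sigma^2$$
and
$$Var(e_\rho) = \rho^2 (n - 1)^2 Var(S^2) = \frac{2t^2}{n - 1} \sigma^4 .$$
We can calculate the MSE of $e_\rho$ as
$$\begin{aligned}
MSE(e_\rho) &= Var(e_\rho) + [\text{Bias}]^2 = Var(e_\rho) + \left[E(e_\rho) - \sigma^2\right]^2 \\
&= Var(e_\rho) + (t\sigma^2 - \sigma^2)^2 = Var(e_\rho) + (t - 1)^2 \sigma^4 .
\end{aligned}$$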
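Plugging in the results above, we have
$$MSE(e_\rho) = \frac{2t^2}{n - 1} \sigma^4 + (t - 1)^2 \sigma^4 = f(t)\,\sigma^4,$$
where
$$f(t) = \frac{2t^2}{n - 1} + (t - 1)^2 = \frac{n + 1}{n - 1} t^2 - 2t + 1 .$$
When $t = \frac{n-1}{n+1}$, $f(t)$ achieves its minimal value, which is $\frac{2}{n+1}$. That is, the minimal value of
$MSE(e_\rho)$ is $\frac{2\sigma^4}{n+1}$, attained with $\rho(n - 1) = t = \frac{n-1}{n+1}$, i.e. $\rho = \frac{1}{n+1}$.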
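From the conclusion in Example 3, we have
$$MSE_{\hat{\sigma}^2} = \frac{2n - 1}{n^2} \sigma^4 < \frac{2\sigma^4}{n - 1} = MSE_{S^2} .$$
It is straightforward to verify that
$$MSE_{\hat{\sigma}^2} = \frac{2n - 1}{n^2} \sigma^4 > \frac{2\sigma^4}{n + 1} = MSE(e_\rho)$$
when $\rho = \frac{1}{n+1}$.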
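The minimization in Example 4 can be verified with exact rational arithmetic. The sketch below is an illustrative addition: the function name `scaled_mse`, the choice $n = 10$, and the candidate grid $\rho = 1/k$ are assumptions. It evaluates $MSE(e_\rho)/\sigma^4 = \frac{2t^2}{n-1} + (t-1)^2$ with $t = \rho(n-1)$ for the three scalings discussed above and searches the grid for the minimizer.

```python
from fractions import Fraction

def scaled_mse(rho, n):
    """MSE of rho * sum (X_i - Xbar)^2 divided by sigma^4, for a normal sample of size n."""
    t = rho * (n - 1)
    return 2 * t ** 2 / (n - 1) + (t - 1) ** 2

n = 10  # hypothetical sample size

mse_s2 = scaled_mse(Fraction(1, n - 1), n)    # rho = 1/(n-1): S^2
mse_mle = scaled_mse(Fraction(1, n), n)       # rho = 1/n: divisor-n estimator
mse_best = scaled_mse(Fraction(1, n + 1), n)  # rho = 1/(n+1): claimed minimizer

# Search rho = 1/k over a grid of k; the grid is an illustrative choice.
candidates = [Fraction(1, k) for k in range(1, 51)]
best_rho = min(candidates, key=lambda rho: scaled_mse(rho, n))

print(mse_s2, mse_mle, mse_best, best_rho)
```

Using `Fraction` avoids rounding, so the three values can be compared exactly with $\frac{2}{n-1}$, $\frac{2n-1}{n^2}$, and $\frac{2}{n+1}$.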
2 Efficiency of an Estimator
As we pointed out earlier, Fisher information can be used to bound the variance of an
estimator. In this section, we will define some quantitative measures for an estimator using
Fisher information.
2.1 Efficient Estimator
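Suppose $\hat{\theta} = r(X_1, \ldots, X_n)$ is an estimator for $\theta$, and suppose $E(\hat{\theta}) = m(\theta)$, a function of $\theta$;
then $\hat{\theta}$ is an unbiased estimator of $m(\theta)$. By the information inequality,
$$Var(\hat{\theta}) \ge \frac{[m'(\theta)]^2}{nI(\theta)} .$$
When the equality holds, the estimator $\hat{\theta}$ is said to be an efficient estimator of its expectation
$m(\theta)$. Of course, if $m(\theta) = \theta$, then $\hat{\theta}$ is an unbiased estimator for $\theta$.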
Example 5: Suppose that $X_1, \ldots, X_n$ form a random sample from a Bernoulli distribution
for which the parameter $p$ is unknown. Show that $\bar{X}$ is an efficient estimator of $p$.
Proof: If $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$, then $E(\bar{X}) = p$ and $Var(\bar{X}) = p(1-p)/n$. By Example
3 from the Fisher information lecture note, the Fisher information is $I(p) = 1/[p(1 - p)]$.
Therefore the variance of $\bar{X}$ is equal to the lower bound $1/[nI(p)]$ provided by the information
inequality, and $\bar{X}$ is an efficient estimator of $p$.
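The equality between $Var(\bar{X})$ and the information bound in Example 5 can be checked exactly. In this sketch, which is an addition to the notes, the Fisher information is computed directly from the definition $I(p) = E[(\frac{\partial}{\partial p} \log f(X|p))^2]$ by summing over the two outcomes; the values of `p` and `n` and the function name are hypothetical.

```python
from fractions import Fraction

def bernoulli_fisher_information(p):
    """I(p) = E[(d/dp log f(X|p))^2] for one Bernoulli(p) observation."""
    score_at_1 = 1 / p          # derivative of log p, the case X = 1
    score_at_0 = -1 / (1 - p)   # derivative of log(1 - p), the case X = 0
    return p * score_at_1 ** 2 + (1 - p) * score_at_0 ** 2

p, n = Fraction(3, 10), 25      # hypothetical parameter value and sample size

info = bernoulli_fisher_information(p)
lower_bound = 1 / (n * info)    # information bound for unbiased estimators of p
var_xbar = p * (1 - p) / n      # variance of the sample mean

print(info, lower_bound, var_xbar)
```

With rational inputs the two quantities agree exactly, which is the statement that $\bar{X}$ attains the bound.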
Recall that in the proof of the information inequality, we used the Cauchy-Schwarz inequality,
$$\left\{Cov_\theta[\hat{\theta}, l_n'(X|\theta)]\right\}^2 \le Var_\theta[\hat{\theta}]\, Var_\theta[l_n'(X|\theta)] .$$
From the proof procedure, we know that if the equality holds in the Cauchy-Schwarz inequality,
then the equality will hold in the information inequality. The Cauchy-Schwarz inequality becomes an
equality if and only if there is a linear relation between $\hat{\theta}$ and $l_n'(X|\theta)$, and in that case the information inequality becomes an equality as well. In other words, $\hat{\theta}$
will be an efficient estimator if and only if there exist functions $u(\theta)$ and $v(\theta)$ such that
$$\hat{\theta} = u(\theta)\, l_n'(X|\theta) + v(\theta) .$$
The functions $u(\theta)$ and $v(\theta)$ may depend on $\theta$ but not on the observations $X_1, \ldots, X_n$.
Because $\hat{\theta}$ is an estimator, it cannot involve the parameter $\theta$. Therefore, in order for $\hat{\theta}$ to
be efficient, it must be possible to find functions $u(\theta)$ and $v(\theta)$ such that the parameter $\theta$
will actually be canceled from the right side of the above equation, and the value of $\hat{\theta}$ will
depend on the observations $X_1, \ldots, X_n$ and not on $\theta$.
Example 6: Suppose that $X_1, \ldots, X_n$ form a random sample from a Poisson distribution
for which the parameter $\theta$ is unknown. Show that $\bar{X}$ is an efficient estimator of $\theta$.
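Proof: The joint p.m.f. of $X_1, \ldots, X_n$ is
$$f_n(x|\theta) = \prod_{i=1}^{n} f(x_i|\theta) = \frac{e^{-n\theta}\, \theta^{n\bar{x}}}{\prod_{i=1}^{n} x_i!} .$$
Then
$$l_n(X|\theta) = -n\theta + n\bar{X} \log \theta - \sum_{i=1}^{n} \log(X_i!),$$
and
$$l_n'(X|\theta) = -n + \frac{n\bar{X}}{\theta} .$$
If we now let $u(\theta) = \theta/n$ and $v(\theta) = \theta$, then
$$\bar{X} = u(\theta)\, l_n'(X|\theta) + v(\theta) .$$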
Since the statistic $\bar{X}$ has been represented as a linear function of $l_n'(X|\theta)$, it follows that $\bar{X}$
will be an efficient estimator of its expectation $\theta$. In other words, the variance of $\bar{X}$ will
attain the lower bound given by the information inequality.
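The linear representation in Example 6 can be confirmed numerically: for any value of $\theta$, the expression $u(\theta)\, l_n'(X|\theta) + v(\theta)$ with $u(\theta) = \theta/n$ and $v(\theta) = \theta$ should return the sample mean, so that $\theta$ cancels. The following sketch is an illustrative addition; the sample of counts and the grid of $\theta$ values are made up, and the helper names are hypothetical.

```python
from fractions import Fraction

def score(sample, theta):
    """l'_n(X|theta) = -n + (sum of X_i) / theta for a Poisson sample."""
    return -len(sample) + Fraction(sum(sample)) / theta

def linear_form(sample, theta):
    """u(theta) * l'_n(X|theta) + v(theta) with u = theta / n and v = theta."""
    n = len(sample)
    u = theta / n
    v = theta
    return u * score(sample, theta) + v

sample = [3, 0, 2, 5, 1]                    # hypothetical Poisson counts
x_bar = Fraction(sum(sample), len(sample))  # sample mean

# Evaluate the linear form on a grid of theta values (illustrative choice).
values = [linear_form(sample, Fraction(k, 4)) for k in range(1, 21)]

print(x_bar, set(values))
```

Every entry of `values` equals the sample mean, whichever $\theta$ is plugged in, which is exactly the cancellation the efficiency criterion requires.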
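Suppose $\hat{\theta}$ is an efficient estimator for its expectation $E(\hat{\theta}) = m(\theta)$, and let $T = a\hat{\theta} + b$ be a linear function of $\hat{\theta}$, where $a$ and $b$ are constants. Then $E(T) = a\,m(\theta) + b$, and the information inequality gives
$$Var(T) \ge \frac{a^2 [m'(\theta)]^2}{nI(\theta)} .$$
We also have
$$Var(T) = Var(a\hat{\theta} + b) = a^2 Var(\hat{\theta}) = a^2 \frac{[m'(\theta)]^2}{nI(\theta)},$$
since $\hat{\theta}$ is an efficient estimator for $m(\theta)$, so that $Var(\hat{\theta})$ attains its lower bound. Hence $Var(T)$ attains its lower bound as well, and $T$ is an efficient estimator of its expectation.
Now consider an exponential family with density $f(x|\theta) = \exp[c(\theta)T(x) + d(\theta) + S(x)]$. For a random sample $X_1, \ldots, X_n$ from this family,
$$l_n(X|\theta) = \sum_{i=1}^{n} \log f(X_i|\theta) = \sum_{i=1}^{n} [c(\theta)T(X_i) + d(\theta) + S(X_i)] = c(\theta) \sum_{i=1}^{n} T(X_i) + nd(\theta) + \sum_{i=1}^{n} S(X_i),$$
and
$$l_n'(X|\theta) = c'(\theta) \sum_{i=1}^{n} T(X_i) + nd'(\theta) .$$
Therefore, there is a linear relation between $\sum_{i=1}^{n} T(X_i)$ and $l_n'(X|\theta)$:
$$\sum_{i=1}^{n} T(X_i) = \frac{1}{c'(\theta)}\, l_n'(X|\theta) - \frac{nd'(\theta)}{c'(\theta)} .$$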
Thus, the sufficient statistic $\sum_{i=1}^{n} T(X_i)$ is an efficient estimator of its expectation. Any linear
function of $\sum_{i=1}^{n} T(X_i)$ is a sufficient statistic and is an efficient estimator of its expectation.
Specifically, if the MLE of $\theta$ is a linear function of the sufficient statistic, then the MLE is an efficient
estimator of $\theta$.
Example 7. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for
which the mean $\mu$ is known and the variance $\sigma^2$ is unknown. Construct an efficient estimator
for $\sigma^2$.
Solution: Let $\theta = \sigma^2$ be the unknown variance. Then the p.d.f. is
$$f(x|\theta) = \frac{1}{\sqrt{2\pi\theta}} \exp\left\{-\frac{1}{2\theta}(x - \mu)^2\right\},$$
which can be recognized as a member of the exponential family with $T(x) = (x - \mu)^2$. So
$\sum_{i=1}^{n} (X_i - \mu)^2$ is an efficient estimator for its expectation. Since $E[(X_i - \mu)^2] = \sigma^2$,
$E[\sum_{i=1}^{n} (X_i - \mu)^2] = n\sigma^2$. Therefore, $\sum_{i=1}^{n} (X_i - \mu)^2 / n$ is an efficient estimator for $\sigma^2$.
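Example 7 can be illustrated by simulation. The sketch below is an addition under stated assumptions: it uses the standard result, not derived in these notes, that the Fisher information for $\theta = \sigma^2$ in a normal model with known mean is $I(\theta) = 1/(2\theta^2)$, so the information bound for unbiased estimators of $\sigma^2$ is $2\sigma^4/n$. The settings `mu`, `sigma2`, `n`, `reps`, and the seed are hypothetical.

```python
import random
import statistics

rng = random.Random(3)                    # fixed seed, chosen arbitrarily
mu, sigma2, n, reps = 1.0, 4.0, 10, 20000 # hypothetical settings
sigma = sigma2 ** 0.5

estimates = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    # Estimator from Example 7: average squared deviation from the known mean.
    estimates.append(sum((x - mu) ** 2 for x in sample) / n)

empirical_mean = statistics.fmean(estimates)
empirical_var = statistics.pvariance(estimates)
bound = 2 * sigma2 ** 2 / n   # assumed bound 2 sigma^4 / n, from I(theta) = 1 / (2 theta^2)

print(f"mean = {empirical_mean:.4f} (sigma^2 = {sigma2})")
print(f"variance = {empirical_var:.4f}, bound = {bound:.4f}")
```

The empirical variance should sit close to the bound rather than above it by a visible margin, which is what efficiency predicts.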
2.2 Efficiency and Relative Efficiency
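For an estimator $\hat{\theta}$, if $E(\hat{\theta}) = m(\theta)$, then the ratio between the CR lower bound and $Var(\hat{\theta})$
is called the efficiency of the estimator $\hat{\theta}$, denoted as $e(\hat{\theta})$, i.e.
$$e(\hat{\theta}) = \frac{[m'(\theta)]^2 / [nI(\theta)]}{Var(\hat{\theta})} .$$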
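By the information inequality, we have $e(\hat{\theta}) \le 1$ for any estimator $\hat{\theta}$.
Recall that the mean square error satisfies $MSE(\hat{\theta}) = Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2$.
If the estimator is unbiased, then $MSE(\hat{\theta}) = Var(\hat{\theta})$, so unbiased estimators can be compared through their variances. Given two estimators $\hat{\theta}$ and $\tilde{\theta}$ of $\theta$, the efficiency of $\hat{\theta}$ relative to $\tilde{\theta}$ is defined as
$$e(\hat{\theta}, \tilde{\theta}) = \frac{Var(\tilde{\theta})}{Var(\hat{\theta})} .$$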
Thus, if the efficiency is smaller than 1, $\hat{\theta}$ has a larger variance than $\tilde{\theta}$ has. This comparison
is most meaningful when both $\hat{\theta}$ and $\tilde{\theta}$ are unbiased or when both have the same bias.
Frequently, the variances of $\hat{\theta}$ and $\tilde{\theta}$ are of the form
$$Var(\hat{\theta}) = \frac{c_1}{n} \quad \text{and} \quad Var(\tilde{\theta}) = \frac{c_2}{n},$$
where $n$ is the sample size. If this is the case, the efficiency can be interpreted as the ratio
of sample sizes necessary to obtain the same variance for both $\hat{\theta}$ and $\tilde{\theta}$.
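Example 8: Let $Y_1, \ldots, Y_n$ denote a random sample from the uniform distribution on the
interval $(0, \theta)$. Consider two estimators,
$$\hat{\theta}_1 = 2\bar{Y} \quad \text{and} \quad \hat{\theta}_2 = \frac{n + 1}{n} Y_{(n)},$$
where $Y_{(n)} = \max(Y_1, \ldots, Y_n)$. Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.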
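Solution: Because each $Y_i$ follows a uniform distribution on the interval $(0, \theta)$, $\mu = E(Y_i) = \theta/2$ and $\sigma^2 = Var(Y_i) = \theta^2/12$. Therefore,
$$E(\hat{\theta}_1) = E(2\bar{Y}) = 2E(\bar{Y}) = 2\mu = \theta,$$
so $\hat{\theta}_1$ is unbiased. Furthermore,
$$Var(\hat{\theta}_1) = Var(2\bar{Y}) = 4Var(\bar{Y}) = \frac{4\sigma^2}{n} = \frac{\theta^2}{3n} .$$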
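To find the mean and variance of $\hat{\theta}_2$, recall that the density function of $Y_{(n)}$ is given by
$$g_{(n)}(y) = n[F_Y(y)]^{n-1} f_Y(y) = \begin{cases} n \left(\dfrac{y}{\theta}\right)^{n-1} \dfrac{1}{\theta} & \text{for } 0 \le y \le \theta \\ 0 & \text{otherwise.} \end{cases}$$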
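Thus,
$$E(Y_{(n)}) = \frac{n}{\theta^n} \int_{0}^{\theta} y^n\,dy = \frac{n}{n + 1} \theta,$$
and it follows that $E(\hat{\theta}_2) = E\{[(n + 1)/n]\,Y_{(n)}\} = \theta$, i.e. $\hat{\theta}_2$ is an unbiased estimator for $\theta$. Similarly,
$$E(Y_{(n)}^2) = \frac{n}{\theta^n} \int_{0}^{\theta} y^{n+1}\,dy = \frac{n}{n + 2} \theta^2,$$
therefore, the variance of $Y_{(n)}$ is
$$Var(Y_{(n)}) = E(Y_{(n)}^2) - [E(Y_{(n)})]^2 = \left[\frac{n}{n + 2} - \left(\frac{n}{n + 1}\right)^2\right] \theta^2 .$$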
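Thus,
$$Var(\hat{\theta}_2) = Var\left(\frac{n + 1}{n} Y_{(n)}\right) = \left(\frac{n + 1}{n}\right)^2 \left[\frac{n}{n + 2} - \left(\frac{n}{n + 1}\right)^2\right] \theta^2 = \frac{\theta^2}{n(n + 2)} .$$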
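Finally, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$ is given by
$$e(\hat{\theta}_1, \hat{\theta}_2) = \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)} = \frac{\theta^2 / [n(n + 2)]}{\theta^2 / (3n)} = \frac{3}{n + 2} .$$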
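The algebra in Example 8 can be re-derived with exact rational arithmetic. This sketch is an illustrative addition: it rebuilds $Var(\hat{\theta}_2)/\theta^2$ from the moments $E(Y_{(n)}^k) = \frac{n}{n+k}\theta^k$ computed above and divides by $Var(\hat{\theta}_1)/\theta^2 = \frac{1}{3n}$. The function names and the set of sample sizes are assumptions.

```python
from fractions import Fraction

def max_moment(n, k):
    """E(Y_(n)^k) / theta^k for the maximum of n Uniform(0, theta) draws."""
    return Fraction(n, n + k)

def relative_efficiency(n):
    """Efficiency of 2 * Ybar relative to (n + 1) / n * Y_(n); theta^2 cancels."""
    var1 = Fraction(1, 3 * n)                            # Var(2 * Ybar) / theta^2
    var_max = max_moment(n, 2) - max_moment(n, 1) ** 2   # Var(Y_(n)) / theta^2
    var2 = Fraction(n + 1, n) ** 2 * var_max             # Var of the rescaled maximum
    return var2 / var1

# Sample sizes chosen for illustration only.
effs = {n: relative_efficiency(n) for n in (1, 2, 5, 10, 100)}

for n, e in effs.items():
    print(n, e)
```

The efficiency decreases toward zero as $n$ grows, so for large samples the estimator based on the maximum has a much smaller variance than $2\bar{Y}$.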
3 Exercises
Exercise 1. $X$, the cosine of the angle at which electrons are emitted in muon decay, has a
density
$$f(x) = \frac{1 + \alpha x}{2}, \quad -1 \le x \le 1, \quad -1 \le \alpha \le 1 .$$
The parameter $\alpha$ is related to polarization. Show that $E(X) = \frac{\alpha}{3}$. Consider an estimator
for the parameter $\alpha$, $\hat{\alpha} = 3\bar{X}$. Compute the variance, the bias, and the mean square error
of this estimator.
Exercise 2. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution
for which the mean $\mu$ is unknown and the variance $\sigma^2$ is known. Show that $\bar{X}$ is an efficient
estimator of $\mu$.
Exercise 3. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from a Poisson
distribution with mean $\lambda$. Consider $\hat{\lambda}_1 = (X_1 + X_2)/2$ and $\hat{\lambda}_2 = \bar{X}$. Find the efficiency of $\hat{\lambda}_1$
relative to $\hat{\lambda}_2$.
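Exercise 4. Suppose that $Y_1, \ldots, Y_n$ denote a random sample of size $n$ from an exponential
distribution with density function given by
$$f(y) = \begin{cases} \dfrac{1}{\theta} e^{-y/\theta} & \text{for } y > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Consider two estimators $\hat{\theta}_1 = nY_{(1)}$ and $\hat{\theta}_2 = \bar{Y}$, where $Y_{(1)} = \min(Y_1, \ldots, Y_n)$. Please show
that both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$, find their MSE, and find the efficiency of $\hat{\theta}_1$
relative to $\hat{\theta}_2$.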