Mse

Math 541: Statistical Theory II
Methods of Evaluating Estimators

Instructor: Songfeng Zheng
Let X
1
, X
2
, , X
n
be n i.i.d. random variables, i.e., a random sample from f(x|), where
is unknown. An estimator of is a function of (only) the n random variables, i.e., a statistic
= r(X
1
, , X
n
). There are several method to obtain an estimator for , such as the MLE,
method of moment, and Bayesian method.
A diculty that arises is that since we can usually apply more than one of these methods
in a particular situation, we are often face with the task of choosing between estimators. Of
course, it is possible that dierent methods of nding estimators will yield the same answer
(as we have see in the MLE handout), which makes the evaluation a bit easier, but, in many
cases, dierent methods will lead to dierent estimators. We need, therefore, some criteria
to choose among them.
We will study several measures of the quality of an estimator, so that we can choose the
best. Some of these measures tell us the quality of the estimator with small samples, while
other measures tell us the quality of the estimator with large samples. The latter are also
known as asymptotic properties of estimators.
1 Mean Square Error (MSE) of an Estimator
Let

be the estimator of the unknown parameter from the random sample X
1
, X
2
, , X
n
.
Then clearly the deviation from

to the true value of , |
|, measures the quality of

the estimator, or equivalently, we can use (
)
2
for the ease of computation. Since

is a
random variable, we should take average to evaluation the quality of the estimator. Thus,
we introduce the following
Denition: The mean square error (MSE) of an estimator

of a parameter is the function
of dened by E(
)
2
, and this is denoted as MSE
.
This is also called the risk function of an estimator, with (
)
2
called the quadratic loss
function. The expectation is with respect to the random variables X
1
, , X
n
since they are
the only random components in the expression.
Notice that the MSE measures the average squared dierence between the estimator

and
the parameter , a somewhat reasonable measure of performance for an estimator. In general,
any increasing function of the absolute distance |
| would serve to measure the goodness

1
2
of an estimator (mean absolute error, E(|
|), is a reasonable alternative. But MSE has

at least two advantages over other distance measures: First, it is analytically tractable and,
secondly, it has the interpretation
MSE
= E(
)
2
= V ar(
) + (E(
) )
2
= V ar(
) + (Bias of

)
2
This is so because
E(
)
2
= E(
2
) +E(
2
) 2E(
)
= V ar(
) + [E(
)]
2
+
2
2E(
)
= V ar(
) + [E(
) ]
2
Denition: The bias of an estimator

of a parameter is the dierence between the
expected value of

and ; that is, Bias(
) = E(
). An estimator whose bias is identically

equal to 0 is called unbiased estimator and satises E(
) = for all .
Thus, MSE has two components, one measures the variability of the estimator (precision)
and the other measures the its bias (accuracy). An estimator that has good MSE properties
has small combined variance and bias. To nd an estimator with good MSE properties, we
need to nd estimators that control both variance and bias.
For an unbiased estimator

, we have
MSE
= E(
)
2
= V ar(
)
and so, if an estimator is unbiased, its MSE is equal to its variance.
Example 1: Suppose X
1
, X
2
, , X
n
are i.i.d. random variables with density function
f(x|) =
1
2
exp
_
|x|
_
, the maximum likelihood estimator for
=
n
i=1
|X
i
|
n
is unbiased.
Solution: Let us rst calculate E(|X|) and E(|X|
2
) as
E(|X|) =
_

|x|f(x|)dx =
_

|x|
1
2
exp
_
|x|
_
dx
=
_

0
x
exp
_
_
d
x
=
_

0
ye
y
dy = (2) =
and
E(|X|
2
) =
_

|x|
2
f(x|)dx =
_

|x|
2
1
2
exp
_
|x|
_
dx
=
2
_

0
x
2
2
exp
_
_
d
x
=
2
_

0
y
2
e
y
dy = (3) = 2
2
3
Therefore,
E( ) = E
_
|X
1
| + +|X
n
|
n
_
=
E(|X
1
|) + +E(|X
n
|)
n
=
So is an unbiased estimator for .
Thus the MSE of is equal to its variance, i.e.
MSE

= E( )
2
= V ar( ) = V ar
_
|X
1
| + +|X
n
|
n
_
=
V ar(|X
1
|) + +V ar(|X
n
|)
n
2
=
V ar(|X|)
n
=
E(|X|
2
) (E(|X|))
2
n
=
2
2
2
n
=

2
n
The Statistic S
2
: Recall that if X
1
, , X
n
come from a normal distribution with variance
2
, then the sample variance S
2
is dened as
S
2
=
n
i=1
(X
i

X)
2
n 1
It can be shown that
(n1)S
2
2

2
n1
. From the properties of
2
distribution, we have
E
_
(n 1)S
2
2
_
= n 1 E(S
2
) =
2
and
V ar
_
(n 1)S
2
2
_
= 2(n 1) V ar(S
2
) =
2
4
n 1
Example 2: Let X
1
, X
2
, , X
n
be i.i.d. from N(,
2
) with expected value and variance
2
, then

X is an unbiased estimator for , and S
2
is an unbiased estimator for
2
.
Solution: We have
E(

X) = E
_
X
1
+ +X
n
n
_
=
E(X
1
) + +E(X
n
)
n
=
Therefore,

X is an unbiased estimator. The MSE of

X is
MSE
X
= E(

X )
2
= V ar(

X) =

2
n
This is because
V ar(

X) = V ar
_
X
1
+ +X
n
n
_
=
V ar(X
1
) + +V ar(X
n
)
n
2
=

2
n
4
Similarly, as we showed above, E(S
2
) =
2
, S
2
is an unbiased estimator for
2
, and the MSE
of S
2
is given by
MSE
S
2 = E(S
2
2
) = V ar(S
2
) =
2
4
n 1
.
Although many unbiased estimators are also reasonable from the standpoint of MSE, be
aware that controlling bias does not guarantee that MSE is controlled. In particular, it is
sometimes the case that a trade-o occurs between variance and bias in such a way that
a small increase in bias can be traded for a larger decrease in variance, resulting in an
improvement in MSE.
Example 3: An alternative estimator for
2
of a normal population is the maximum likeli-
hood or method of moment estimator
2
=
1
n
n
i=1
(X
i

X)
2
=
n 1
n
S
2
It is straightforward to calculate
E(

2
) = E
_
n 1
n
S
2
_
=
n 1
n

2
so

2
is a biased estimator for
2
. The variance of

2
can also be calculated as
V ar(

2
) = V ar
_
n 1
n
S
2
_
=
(n 1)
2
n
2
V ar(S
2
) =
(n 1)
2
n
2
2
4
n 1
=
2(n 1)
4
n
2
.
Hence the MSE of

2
is given by
E(

2
)
2
= V ar(

2
) + (Bias)
2
=
2(n 1)
4
n
2
+
_
n 1
n

2
2
_
2
=
2n 1
n
2

4
We thus have (using the conclusion from Example 2)
MSE
2
=
2n 1
n
2

4
<
2n
n
2
4
=
2
4
n
<
2
4
n 1
= MSE
S
2.
This shows that

2
has smaller MSE than S
2
. Thus, by trading o variance for bias, the
MSE is improved.
The above example does not imply that S
2
should be abandoned as an estimator of
2
. The
above argument shows that, on average,

2
will be closer to
2
than S
2
if MSE is used as a
measure. However,

2
is biased and will, on the average, underestimate
2
. This fact alone
may make us uncomfortable about using

2
as an estimator for
2
.
In general, since MSE is a function of the parameter, there will not be one best estimator
in terms of MSE. Often, the MSE of two estimators will cross each other, that is, for some
5
parameter values, one is better, for other values, the other is better. However, even this
partial information can sometimes provide guidelines for choosing between estimators.
One way to make the problem of nding a best estimator tractable is to limit the class of
estimators. A popular way of restricting the class of estimators, is to consider only unbiased
estimators and choose the estimator with the lowest variance.
If

1
and

2
are both unbiased estimators of a parameter , that is, E(
1
) = and E(
2
) = ,
then their mean squared errors are equal to their variances, so we should choose the estimator
with the smallest variance.
A property of Unbiased estimator: Suppose both A and B are unbiased estimator for
an unknown parameter , then the linear combination of A and B: W = aA+(1 a)B, for
any a is also an unbiased estimator.
Example 4: This problem is connected with the estimation of the variance of a normal
distribution with unknown mean from a sample X
1
, X
2
, , X
n
of i.i.d. normal random
variables. For what value of does
n
i=1
(X
i

X)
2
have the minimal MSE?
Please note that if =
1
n1
, we get S
2
in example 2; when =
1
n
, we get

2
in example 3.
Solution:
As in above examples, we dene
S
2
=
n
i=1
(X
i

X)
2
n 1
Then,
E(S
2
) =
2
and Var(S
2
) =
2
4
n 1
Let
e
=
n
i=1
(X
i

X)
2
= (n 1)S
2
and let t = (n 1) Then
E(e
) = (n 1)E(S
2
) = (n 1)
2
= t
2
and
V ar(e
) =
2
(n 1)
2
V ar(S
2
) =
2t
2
n 1
4
We can Calculate the MSE of e
as
MSE(e
) = V ar(e
) + [Bias]
2
= V ar(e
) +
_
E(e
)
2
_
2
= V ar(e
) + (t
2
2
)
2
= V ar(e
) + (t 1)
2
4
.
6
Plug in the results before, we have
MSE(e
) =
2t
2
n 1
4
+ (t 1)
2
4
= f(t)
4
where
f(t) =
2t
2
n 1
+ (t 1)
2
=
_
n + 1
n 1
t
2
2t + 1
_
when t =
n1
n+1
, f(t) achieves its minimal value, which is
2
n+1
. That is the minimal value of
MSE(e
) =
2
4
n+1
, with (n 1) = t =
n1
n+1
, i.e. =
1
n+1
.
From the conclusion in example 3, we have
MSE
2
=
2n 1
n
2

4
<
2
4
n 1
= MSE
S
2.
It is straightforward to verify that
MSE
2
=
2n 1
n
2

4
2
4
n + 1
= MSE(e
)
when =
1
n+1
.
2 Eciency of an Estimator
As we pointed out earlier, Fisher information can be used to bound the variance of an
estimator. In this section, we will dene some quantity measures for an estimator using
Fisher information.
2.1 Ecient Estimator
Suppose

= r(X
1
, , X
n
) is an estimator for , and suppose E(
) = m(), a function of ,
then T is an unbiased estimator of m(). By information inequality,
Var(
)
[m
()]
2
nI()
when the equality holds, the estimator

is said to be an ecient estimator of its expectation
m(). Of course, if m() = , then T is an unbiased estimator for .
Example 5: Suppose that X
1
, , X
n
form a random sample from a Bernoulli distribution
for which the parameter p is unknown. Show that

X is an ecient estimator of p.
7
Proof: If X
1
, , X
n
Bernoulli(p), then E(

X) = p, and Var(

X) = p(1p)/n. By example
3 from the sher information lecture note, the sher information is I(p) = 1/[p(1 p)].
Therefore the variance of

X is equal to the lower bound 1/[nI(p)] provided by the information
inequality, and

X is an ecient estimator of p.
Recall that in the proof of information inequality, we used the Cauchy-Schwartz inequality,
_
Cov
, l
n
(X|)]
_
2
Var
]Var
[l
n
(X|)].
From the proof procedure, we know that if the equality holds in Cauchy-Schwartz inequality,
then the equality will hold in information inequality. We know that if and only if there
is a linear relation between

and l
n
(X|), the Cauchy-Schwartz inequality will become an
equality, and hence the information inequality will become an equality. In other words,

will be an ecient estimator if and only if there exist functions u() and v() such that
= u()l
n
(X|) +v().
The functions u() and v() may depend on but not depend on the observations X
1
, , X
n
.
Because

is an estimator, it cannot involve the parameter . Therefore, in order for

to
be ecient, it must be possible to nd functions u() and v() such that the parameter
will actually be canceled from the right side of the above equation, and the value of

will
depend on the observations X
1
, , X
n
and not on .
Example 6: Suppose that X
1
, , X
n
form a random sample from a Poisson distribution
for which the parameter is unknown. Show that

X is an ecient estimator of .
Proof: The joint p.m.f. of X
1
, , X
n
is
f
n
(x|) =
n
i=1
f(x
i
|) =
e
n
n x
n
i=1
x
i
!
.
Then
l
n
(X|) = n +n

X log
n
i=1
log(X
i
!),
and
l
n
(X|) = n +
n

X
.
If we now let u() = /n and v() = , then
X = u()l
n
(X|) +v().
Since the statistic

X has been represented as a linear function of l
n
(X|), it follows that

X
will be an ecient estimator of its expectation . In other words, the variance of

X will
attain the lower bound given by the information inequality.
8
Suppose

is an ecient estimator for its expectation E(
) = m(). Let a statistic T be a

linear function of

, i.e. T = a
+ b, where a and b are constants. Then T is an ecient

estimator for E(T), i.e., a linear function of an ecient estimator is an ecient estimator
for its expectation.
Proof: We can see that E(T) = aE(
) +b = am() +b, by information inequality

Var(T)
a
2
[m
()]
2
nI()
.
We also have
Var(T) = Var(a
+b) = a
2
Var(
) = a
2
[m
()]
2
nI()
,
since

is an ecient estimator for m(), Var(
) attains its lower bound. Our computation

shows that the variance of T can attain its lower bound, which implies that T is an ecient
estimator for E(T).
Now, let us consider the exponential family distribution
f(x|) = exp[c()T(x) +d() +S(x)],
and we suppose there is a random sample X
1
, , X
n
from this distribution. We will show
that the sucient statistic

n
i=1
T(X
i
) is an ecient estimator of its expectation.
Clearly,
l
n
(X|) =
n
i=1
log f(X
i
|) =
n
i=1
[c()T(X
i
) +d() +S(X
i
)] = c()
n
i=1
T(X
i
)+nd()+
n
i=1
S(X
i
),
and
l
n
(X|) = c
()
n
i=1
T(X
i
) +nd
().
Therefore, there is a linear relation between

n
i=1
T(X
i
) and l
n
(X|):
n
i=1
T(X
i
) =
1
c
()
l
n
(X|)
nd
()
c
()
.
Thus, the sucient statistic
n
i=1
T(X
i
) is an ecient estimator of its expectation. Any linear
function of

n
i=1
T(X
i
) is a sucient statistic and is an ecient estimator of its expectation.
Specically, if the MLE of is a linear function of sucient statistic, then MLE is ecient
estimator of .
Example 7. Suppose that X
1
, , X
n
form a random sample from a normal distribution for
which the mean is known and the variance
2
is unknown. Construct an ecient estimator
for
2
.
9
Solution: Let =
2
be the unknown variance. Then the p.d.f. is
f(x|) =
1
2
exp
_
1
2
(x )
2
_
,
which can be recognized as a member of exponential family with T(x) = (x )
2
. So
n
i=1
(X
i
)
2
is an ecient estimator for its expectation. Since E[(X
i
)
2
] =
2
,
E[
n
i=1
(X
i
)
2
] = n
2
. Therefore,

n
i=1
(X
i
)
2
/n is an ecient estimator for
2
.
2.2 Eciency and Relative Eciency
For an estimator

, if E(
) = m(), then the ratio between the CR lower bound and Var(
)
is called the eciency of the estimator

, denoted as e(
), i.e.
e(
) =
[m
()]
2
/[nI()]
Var(
)
.
By the information inequality, we have e(
) 1 for any estimator

.
Note: some textbooks or materials dene ecient estimator and eciency of an estimator
only for unbiased estimator, which is a special case of m() = in our denitions.
If an estimator is unbiased and and its variance attains the Cramer-Rao lower bound, then
it is called the minimum variance unbiased estimator (MVUE).
To evaluate an estimator

, we dened the mean squared error as
MSE(
) = Var(
) + (E(
) )
2
If the estimator is unbiased, then MSE(
) = Var(
). When two estimators are both unbiased,

comparison of their MSEs reduces to comparison of their variances.
Given two unbiased estimators,

and

, of a parameter , the relative eciency of

relative to

is dened as
e(
,

) =
Var(
)
Var(
)
.
Thus, if the eciency is smaller than 1,

has a larger variance than

has. This comparison
is most meaningful when both

and

are unbiased or when both have the same bias.
Frequently, the variances of

and

are of the form
var(
) =
c
1
n
and var(
) =
c
2
n
where n is the sample size. If this is the case, the eciency can be interpreted as the ratio
of sample sizes necessary to obtain the same variance for both

and

.
10
Example 8: Let Y
1
, , Y
n
denote a random sample from the uniform distribution on the
interval (0, ). Consider two estimators,
1
= 2
Y and

2
=
n + 1
n
Y
(n)
,
where Y
(n)
= max(Y
1
, , Y
n
). Find the eciency of

1
relative to

2
.
Solution: Because each Y
i
follows a uniform distribution on the interval (0, ), = E(Y
i
) =
/2, and
2
= Var(Y
i
) =
2
/12. Therefore,
E(
1
) = E(2
Y ) = 2E(
Y ) = 2 = ,
so

1
is unbiased. Furthermore
Var(
1
) = Var(2
Y ) = 4Var(
Y ) = 4
2
n
=

2
3n
.
To nd the mean and variance of

2
, recall that the density function of Y
(n)
is given by
g
(n)
(y) = n[F
Y
(y)]
n1
f
Y
(y) =
_
_
_
n
_
y
_
n1
1
for 0 y
0 otherwise
Thus,
E(Y
(n)
) =
n
n
_

0
y
n
dy =
n
n + 1
,
it follows that E(
2
) = E{[(n + 1)/n]Y
(n)
} = , i.e.

2
is an unbiased estimator for .
E(Y
2
(n)
) =
n
n
_

0
y
n+1
dy =
n
n + 2
2
,
therefore, the variance of Y
(n)
is
Var(Y
(n)
) = E(Y
2
(n)
) E(Y
(n)
)
2
=
n
n + 2
_
n
n + 1
_
2
.
Thus,
Var(
2
) = Var
_
n + 1
n
Y
(n)
_
=
_
n + 1
n
_
2
_
n
n + 2
_
n
n + 1
_
2
_
=

2
n(n + 2)
.
Finally, the eciency of

1
relative to

2
is given by
e(
1
,

2
) =
Var(
2
)
Var(
1
)
=

2
/[n(n + 2)]
2
/(3n)
=
3
n + 2
.
11
3 Exercises
Exercise 1. X, the cosine of the angle at which electrons are emitted in muon decay has a
density
f(x) =
1 +X
2
1 x 1 1 1
The parameter is related to polarization. Show that E(X) =

3
. Consider an estimator
for the parameter , = 3

X. Computer the variance, the bias, and the mean square error
of this estimator.
Exercise 2. Suppose that X
1
, , X
n
form a random sample from a normal distribution
for which the mean is unknown and the variance
2
is known. Show that

X is an ecient
estimator of .
Exercise 3. Suppose that X
1
, , X
n
form a random sample of size n from a Poisson
distribution with mean . Consider

1
= (X
1
+ X
2
)/2 and

2
=

X. Find the eciency of
1
relative to

2
.
Exercise 4. Suppose that Y
1
, , Y
n
denote a random sample of size n form an exponential
distribution with density function given by
f(y) =
_
1
e
y/
fory > 0
0 otherwise
Consider two estimators

1
= nY
(1)
, and

2
=

Y , where Y
(1)
= min(Y
1
, , Y
n
). Please show
that both

1
and

2
are unbiased estimator of , nd their MSE, and nd the eciency of
1
relative to

2
.

Mse

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Mse

Uploaded by

Copyright:

Available Formats

Math 541: Statistical Theory II

Methods of Evaluating Estimators

|, measures the quality of

| would serve to measure the goodness

|), is a reasonable alternative. But MSE has

). An estimator whose bias is identically

) = m(). Let a statistic T be a

+ b, where a and b are constants. Then T is an ecient

) +b = am() +b, by information inequality

) attains its lower bound. Our computation

) 1 for any estimator

). When two estimators are both unbiased,

You might also like