$P([\,|X_n - X| \ge \varepsilon\,]) \to 0$ as $n \to \infty$.
Let $\varepsilon > 0$ be arbitrary, with $\varepsilon < 1$. Then for $n > 1/\varepsilon$, we have
$$0 \le P([\,|X_n - 0| \ge \varepsilon\,]) = P([X_n \ge \varepsilon]) \le P([X_n \ge 1/n]) = \frac{1}{n} \to 0 \text{ as } n \to \infty.$$
$$[\,|f(X_n, Y_n) - f(a,b)| \ge \varepsilon\,] \subset [\,|X_n - a| \ge \delta\,] \cup [\,|Y_n - b| \ge \delta\,].$$
Taking the probabilities of both sides, and using Boole's inequality, we obtain
$$P([\,|f(X_n, Y_n) - f(a,b)| \ge \varepsilon\,]) \le P([\,|X_n - a| \ge \delta\,]) + P([\,|Y_n - b| \ge \delta\,]) \to 0 \text{ as } n \to \infty,$$
i.e., $f(X_n, Y_n) \to_P f(a,b)$ as $n \to \infty$.
Definition. If $X$ is an observable random variable, we say it is an unbiased estimate of the constant $\theta$ if $E(X) = \theta$.
The next definition appears to be a little odd, but this is the way a lot of mathematical statisticians speak and write.
Definition. If $\{X_n\}$ is an infinite sequence of observable random variables, then we say "$X_n$ is a consistent estimate of $\theta$" if $X_n \to_P \theta$ as $n \to \infty$.
There are many examples of unbiased estimates. If $X_n$ is $\mathrm{Bin}(n, p)$, then $E(X_n) = np$, or $E(X_n/n) = p$. Thus $X_n/n$ is an unbiased estimate of $p$. Also, by Bernoulli's theorem and the definition of convergence in probability given above, $X_n/n$ is a consistent estimate of $p$.
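To make the binomial example concrete, here is a small simulation sketch (an added illustration, not from the text; the value of $p$, the sample sizes, and the seed are arbitrary choices) showing the relative frequency $X_n/n$ settling near $p$ as $n$ grows.

import random

def binomial_proportion(n, p, rng):
    """Simulate X_n ~ Bin(n, p) and return the estimate X_n / n."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(0)
p = 0.3
for n in (10, 100, 1000, 10000):
    print(n, binomial_proportion(n, p, rng))
# The printed proportions approach p = 0.3, illustrating consistency;
# E(X_n / n) = p for every n, which is the unbiasedness statement above.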
Here is an example where we can obtain three different consistent estimates of a particular parameter. Let $X$ be a random variable with an absolutely continuous distribution with density given by
$$f_X(x) = \begin{cases} 1 & \text{if } \theta < x < \theta + 1 \\ 0 & \text{otherwise.} \end{cases}$$
Suppose $X_1, X_2, \ldots$ are independent observations on $X$, and consider first $U_n = \min\{X_i : 1 \le i \le n\}$. For any $\varepsilon$ with $0 < \varepsilon < 1$,
$$0 \le P([\,|U_n - \theta| \ge \varepsilon\,]) = P([U_n \ge \theta + \varepsilon]) = P\Big(\bigcap_{j=1}^n [X_j \ge \theta + \varepsilon]\Big) = \prod_{j=1}^n P([X_j \ge \theta + \varepsilon]) = (1 - \varepsilon)^n \to 0 \text{ as } n \to \infty,$$
so $U_n \to_P \theta$ as $n \to \infty$.
Similarly, let $V_n = \dfrac{X_1 + \cdots + X_n}{n} - \dfrac{1}{2}$. Since $E(X) = \theta + \frac{1}{2}$, the law of large numbers gives
$$P([\,|V_n - \theta| \ge \varepsilon\,]) = P\Big(\Big[\Big|\frac{X_1 + \cdots + X_n}{n} - E(X)\Big| \ge \varepsilon\Big]\Big) \to 0$$
as $n \to \infty$, for all $\varepsilon > 0$. Note also that if the observable random variable $W_n$ is defined by
$$W_n = \max\{X_i : 1 \le i \le n\} - 1,$$
then one can easily prove that $W_n \to_P \theta$ as $n \to \infty$.
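As an illustration of this example (added here, not part of the text; the true $\theta$, the sample sizes, and the seed are arbitrary), one can simulate all three estimates and watch them approach $\theta$.

import random

def three_estimates(n, theta, rng):
    """Return (U_n, V_n, W_n) for a sample of size n from the uniform distribution on (theta, theta + 1)."""
    xs = [theta + rng.random() for _ in range(n)]
    u = min(xs)                # U_n = min, converges to theta from above
    v = sum(xs) / n - 0.5      # V_n = sample mean - 1/2, since E(X) = theta + 1/2
    w = max(xs) - 1.0          # W_n = max - 1, converges to theta from below
    return u, v, w

rng = random.Random(0)
theta = 2.0
for n in (10, 100, 1000, 10000):
    print(n, three_estimates(n, theta, rng))
# All three printed triples draw closer to theta = 2.0 as n increases.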
EXERCISES
An advantage of a consistent estimate of a parameter is that one is assured that the farther out one goes in the sequence, the closer one will get to this parameter in a certain sense. Especially if one does not have an unbiased estimate of the parameter but does have a consistent estimate, then one can take comfort in the fact that the larger $n$ is, the closer the estimate is to the value of the parameter. A very useful method of finding consistent estimates is the method of moments, which is developed in this section. We first need an important lemma.
Lemma 1. If $\{X_n\}$ are independent and identically distributed random variables with finite $2r$th moment, where $r$ is a positive integer, then
$$\frac{1}{n}\sum_{i=1}^n X_i^r \to_P E(X_1^r) \text{ as } n \to \infty.$$
Proof: We first observe that the $r$th powers, $\{X_n^r\}$, are independent and identically distributed with common finite second moment. Hence by the law of large numbers we obtain the conclusion.
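As a quick numerical sanity check of Lemma 1 (an added illustration; the exponential population and the seed are arbitrary choices): for $X$ exponential with mean $1$, $E(X^r) = r!$, and the sample $r$th moments drift toward those values.

import math
import random

def sample_rth_moment(n, r, rng):
    """Average of X_i^r over n independent Exponential(1) observations."""
    return sum(rng.expovariate(1.0) ** r for _ in range(n)) / n

rng = random.Random(0)
for r in (1, 2, 3):
    target = math.factorial(r)      # E(X^r) = r! for Exponential(1)
    print(r, target, [round(sample_rth_moment(n, r, rng), 3) for n in (100, 10000)])
# For each r the sample moments approach E(X^r), as Lemma 1 asserts.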
Lemma 2. If $X, X_1, X_2, \ldots$ are random variables that are constant, i.e., there exist constants $a, a_1, a_2, \ldots$ such that $P([X = a]) = 1$ and $P([X_n = a_n]) = 1$ for all $n$, and if $a_n \to a$ as $n \to \infty$, then $X_n \to_P X = a$ as $n \to \infty$.
Proof: Let $\varepsilon > 0$ be arbitrary. Since by hypothesis $a_n \to a$ as $n \to \infty$, there exists a positive integer $N$ such that $|a_n - a| < \varepsilon$ for all values of $n$ that satisfy $n > N$. Hence $P([\,|X_n - X| < \varepsilon\,]) = 1$ for all $n > N$, from which it follows that $P([\,|X_n - X| \ge \varepsilon\,]) = 0$ for all $n > N$, i.e., $X_n \to_P X = a$ as $n \to \infty$.
We shall consider the case of a sequence of independent observations $X_1, X_2, \ldots$ on a random variable $X$. This means that each random variable in the sequence has the same distribution function as does $X$ and is an independent observation on the same population that $X$ is. To keep things simple, suppose the distribution function of $X$ depends on two unknown parameters, $\theta_1$ and $\theta_2$, and suppose we wish to estimate both of these parameters with consistent estimates. Let us suppose that this sequence of random variables has a finite common fourth moment. Then by the above lemma,
$$\frac{1}{n}\sum_{i=1}^n X_i^r \to_P E(X_1^r) \text{ as } n \to \infty \text{ for } r = 1, 2.$$
Write $m_1 = E(X)$ and $m_2 = E(X^2)$. These moments are functions of the parameters, say $m_1 = u(\theta_1, \theta_2)$ and $m_2 = v(\theta_1, \theta_2)$. Further we assume that a continuous inverse of this mapping exists, namely, that one can solve these two equations for $\theta_1$ and $\theta_2$ as continuous functions, $\theta_1 = g(m_1, m_2)$ and $\theta_2 = h(m_1, m_2)$, of $m_1$ and $m_2$. Let us denote $\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and $V_n = \frac{1}{n}\sum_{i=1}^n X_i^2$. Then by theorem 1 in section 9.1 and Lemma 1 above,
$$g(\overline{X}_n, V_n) \to_P g(m_1, m_2) = \theta_1$$
and
$$h(\overline{X}_n, V_n) \to_P h(m_1, m_2) = \theta_2$$
as $n \to \infty$; that is, $g(\overline{X}_n, V_n)$ and $h(\overline{X}_n, V_n)$ are consistent estimates of $\theta_1$ and $\theta_2$. Two commonly used estimates of $\mathrm{Var}(X)$ are
$$\hat{\sigma}_n^2 = \frac{1}{n}\sum_{k=1}^n (X_k - \overline{X}_n)^2$$
and
$$s_n^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \overline{X}_n)^2.$$
Theorem 1. If $X_1, X_2, \ldots$ are independent observations on a random variable $X$ which has a finite fourth moment, then both $\hat{\sigma}_n^2$ and $s_n^2$ are consistent estimates of $\mathrm{Var}(X)$, but $\hat{\sigma}_n^2$ is not an unbiased estimate of $\mathrm{Var}(X)$.
Proof: We first show that $\hat{\sigma}_n^2 = \frac{1}{n}\sum_{k=1}^n X_k^2 - \overline{X}_n^2$. Indeed,
$$\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n \big(X_i^2 - 2\overline{X}_n X_i + \overline{X}_n^2\big) = \frac{1}{n}\sum_{i=1}^n X_i^2 - 2\overline{X}_n^2 + \overline{X}_n^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \overline{X}_n^2.$$
Now, we apply the law of large numbers, theorem 2 of section 9.1, and Lemma 1 above. Observing that the function $f(x, y) = y - x^2$ is continuous in $(x, y)$, and since $\mathrm{Var}(X) = E(X^2) - (E(X))^2$, we may conclude that $\hat{\sigma}_n^2 \to_P \mathrm{Var}(X)$ as $n \to \infty$. Since $s_n^2 = \frac{n}{n-1}\hat{\sigma}_n^2$ and since by Lemma 2 the constant random variable $\frac{n}{n-1} \to_P 1$, we again obtain by theorem 2 in section 9.1, by means of the continuous function $g(x, y) = xy$, that $s_n^2 \to_P \mathrm{Var}(X)$.
Now we compute
$$E(\overline{X}_n^2) = \frac{1}{n^2} E\Big(\sum_{i=1}^n X_i^2 + \sum_{j \ne k} X_j X_k\Big) = \frac{1}{n^2}\Big(n E(X^2) + \sum_{j \ne k} E(X_j) E(X_k)\Big) = \frac{1}{n^2}\big(n E(X^2) + n(n-1)(E(X))^2\big) = \frac{1}{n} E(X^2) + \frac{n-1}{n}(E(X))^2.$$
Thus,
$$E(\hat{\sigma}_n^2) = \frac{n-1}{n}\big(E(X^2) - (E(X))^2\big) = \frac{n-1}{n}\,\mathrm{Var}(X),$$
which implies that $\hat{\sigma}_n^2$ is not an unbiased estimate of $\mathrm{Var}(X)$. However, since $E(s_n^2) = \frac{n}{n-1} E(\hat{\sigma}_n^2)$, it follows that $s_n^2$ is an unbiased estimate of $\mathrm{Var}(X)$.
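A simulation sketch of this bias result (an added illustration; the normal population, sample size, and replication count are arbitrary): averaging many replications of $\hat{\sigma}_n^2$ lands near $\frac{n-1}{n}\mathrm{Var}(X)$, while the average of $s_n^2$ lands near $\mathrm{Var}(X)$.

import random

def variance_estimates(n, rng):
    """Return (sigma_hat_sq, s_sq) computed from n independent N(0, 4) observations."""
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / n, ss / (n - 1)

rng = random.Random(0)
n, reps = 5, 20000
pairs = [variance_estimates(n, rng) for _ in range(reps)]
print(sum(p[0] for p in pairs) / reps)   # roughly (n - 1)/n * 4 = 3.2
print(sum(p[1] for p in pairs) / reps)   # roughly Var(X) = 4.0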
Example 3. Let $X$ be a random variable with an absolutely continuous distribution with density given by
$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a < x < b \\ 0 & \text{if } x \le a \text{ or } x \ge b, \end{cases}$$
where $a < b$ are unknown. Then
$$E(X) = \frac{a+b}{2} \quad\text{and}\quad E(X^2) = \frac{1}{3}(a^2 + ab + b^2).$$
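The example presumably continues by inverting these two moment equations; a sketch of that computation (supplied here, with $\overline{X}_n$ and $V_n$ denoting the first two sample moments as above, and $\hat{a}_n$, $\hat{b}_n$ being names introduced for the resulting estimates):
$$m_1 = \frac{a+b}{2}, \qquad m_2 - m_1^2 = \frac{(b-a)^2}{12},$$
so that
$$a = m_1 - \sqrt{3(m_2 - m_1^2)}, \qquad b = m_1 + \sqrt{3(m_2 - m_1^2)},$$
and replacing $m_1$ by $\overline{X}_n$ and $m_2$ by $V_n$ gives the method of moments estimates
$$\hat{a}_n = \overline{X}_n - \sqrt{3(V_n - \overline{X}_n^2)}, \qquad \hat{b}_n = \overline{X}_n + \sqrt{3(V_n - \overline{X}_n^2)}.$$
Since these are continuous functions of $(\overline{X}_n, V_n)$ wherever $V_n > \overline{X}_n^2$, they are consistent estimates of $a$ and $b$ by the results above.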
EXERCISES
In brief, a maximum likelihood estimate is a function $g$ of the observable random variables $X$ that maximizes the probability of the value of $X$ that is observed. If the random variables $X$ have a joint discrete density, a density that is sometimes written as $f_X(x \mid \theta)$ in order to show its dependence on the parameter $\theta$, a maximum likelihood estimate of $\theta$ is a function $\hat{\theta} = \hat{\theta}(X)$ that maximizes $f_X(X \mid \theta)$. Note that $x$ is replaced by $X$; this is because $X$ is the value of $X$ that you get, so $f_X(X \mid \theta)$ is the probability of getting the value that you get. If this is difficult to understand, have patience, and wait for the examples that follow. In the case where the joint distribution is absolutely continuous and has density $f_X(x \mid \theta)$, then it is loosely said that $f_X(x \mid \theta)\,dx$ is the probability of $X$ taking a value in a small neighborhood of $x$, and so the maximum likelihood estimate of $\theta$ in this case is that value of $\theta$ that maximizes the probability of getting close to, or in a neighborhood of, the value of $X$ that one observes, i.e., $\hat{\theta} = \hat{\theta}(X)$ in this case is the value of $\theta$ that maximizes the joint density. At this point it is perhaps best to consider examples.
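To make the phrase "maximizes $f_X(X \mid \theta)$" concrete, here is a small numerical sketch (an added illustration, not from the text; the normal model with known variance, the grid of candidate values, and the seed are arbitrary choices): evaluate the joint density of the observed sample over a grid of $\theta$ values and pick the maximizer.

import math
import random

def log_likelihood(theta, xs):
    """Logarithm of the joint N(theta, 1) density evaluated at the observed sample xs."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)

rng = random.Random(0)
xs = [rng.gauss(3.0, 1.0) for _ in range(50)]     # observed sample; true theta is 3
grid = [i / 100 for i in range(0, 601)]           # candidate theta values in [0, 6]
theta_hat = max(grid, key=lambda t: log_likelihood(t, xs))
print(theta_hat, sum(xs) / len(xs))               # the grid maximizer sits next to the sample mean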
Example 1. The normal distribution. Suppose $X_1, \ldots, X_n$ is a sample of independent observations on a random variable $X$ which is $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2 > 0$ are unknown. We know that their joint density is
$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\Big\}.$$
Since $\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \overline{X}_n)^2 + n(\overline{X}_n - \mu)^2$, the exponent is largest when $\mu = \overline{X}_n$. This value of $\mu$ maximizes the density no matter what the value of $\sigma^2$ is. So suppose that, instead of looking only for the maximum likelihood estimate of $\mu$, we are also interested in the value of the pair $(\mu, \sigma^2)$ that maximizes the joint density. To do this, let $\mu = \overline{X}_n$ and look for the value of $\sigma^2$ that maximizes the joint density when $\mu = \overline{X}_n$. Taking the derivative with respect to $\sigma^2$ we find that the joint density is maximized when
$$\sigma^2 = \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \overline{X}_n)^2.$$
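The derivative computation alluded to above runs roughly as follows (a sketch supplied here, working with the logarithm of the joint density for convenience):
$$\log f = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \overline{X}_n)^2, \qquad \frac{\partial \log f}{\partial(\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (X_i - \overline{X}_n)^2.$$
Setting this derivative equal to $0$ and solving gives $\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \overline{X}_n)^2$, and this critical point is the maximizer since $\log f \to -\infty$ as $\sigma^2 \to 0^+$ or $\sigma^2 \to \infty$.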
Example 2. The uniform distribution. Suppose $X_1, \ldots, X_n$ are independent observations on a random variable $X$ that is uniformly distributed over $(0, \theta]$, where $\theta > 0$ is unknown, so that the joint density is $\theta^{-n}$ when $0 < x_i \le \theta$ for every $i$, and is $0$ otherwise. From this expression of the joint density, it is clear that if we take $\hat{\theta}$ such that $\hat{\theta} < \max\{x_1, \ldots, x_n\}$, then the joint density is $0$. When this joint density is positive, the smaller one takes $\hat{\theta}$ the larger the density becomes. The smallest value that $\hat{\theta}$ can be, then, is $\hat{\theta} = \max\{x_1, \ldots, x_n\}$. Thus $\hat{\theta}(X_1, \ldots, X_n) = \max\{X_1, \ldots, X_n\}$ is a maximum likelihood estimate of $\theta$. It is not difficult to prove that $\hat{\theta}(X_1, \ldots, X_n)$ is not an unbiased estimate of $\theta$ but is a consistent estimate of $\theta$.
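A sketch of that last claim, under the uniform-on-$(0, \theta]$ model described above (the computation is supplied here, not taken from the text): the distribution function of $\hat{\theta} = \max\{X_1, \ldots, X_n\}$ gives
$$P([\hat{\theta} \le t]) = \Big(\frac{t}{\theta}\Big)^n \text{ for } 0 \le t \le \theta, \qquad E(\hat{\theta}) = \int_0^\theta t\,\frac{n t^{n-1}}{\theta^n}\,dt = \frac{n}{n+1}\,\theta < \theta,$$
which shows $\hat{\theta}$ is not unbiased; on the other hand, for $0 < \varepsilon < \theta$,
$$P([\,|\hat{\theta} - \theta| \ge \varepsilon\,]) = P([\hat{\theta} \le \theta - \varepsilon]) = \Big(1 - \frac{\varepsilon}{\theta}\Big)^n \to 0 \text{ as } n \to \infty,$$
so $\hat{\theta}$ is a consistent estimate of $\theta$.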
Example 3. The hypergeometric distribution. In a lake there are $N$ fish, where $N$ is unknown. The problem is to estimate the value of $N$. Someone catches $r$ fish (all at the same time), marks each with a spot and returns all of them to the lake. After a reasonable length of time, during which these "tagged" fish are assumed to have distributed themselves "at random" in the lake, someone catches $s$ fish (again, all at once). (Note: $r$ and $s$ are considered to be fixed, predetermined constants.) Among these $s$ fish caught there will be $X$ tagged fish, where $X$ is a random variable. The discrete density of $X$ is given by
$$f_X(x \mid N) = \frac{\binom{r}{x}\binom{N-r}{s-x}}{\binom{N}{s}}$$
for integers $x$ satisfying $\max(0,\, s - (N - r)) \le x \le \min(r, s)$,
and $f_X(x \mid N) = 0$ for all other values of $x$. The problem of finding a maximum likelihood estimate of $N$ is to find that value $\hat{N} = \hat{N}(x)$ of $N$ for which $f_X(x \mid N)$ is maximized. In order to accomplish this we consider the ratio
$$R(N) = \frac{f_X(x \mid N)}{f_X(x \mid N - 1)}.$$
For those values of $N$ for which $R(N) > 1$ we know that $f_X(x \mid N) > f_X(x \mid N - 1)$, and for those values of $N$ for which $R(N) < 1$ we know that $f_X(x \mid N)$ is a decreasing function of $N$. Using the formula for the density of $X$ we note (after a certain amount of algebra) that $R(N) > 1$ if and only if $N < rs/x$, and $R(N) < 1$ if and only if $N > rs/x$. We see that $f_X(x \mid N)$ reaches its maximum value (as a function of $N$) when $N = [rs/x]$. Thus a maximum likelihood estimate of $N$ is $\hat{N} = [rs/X]$. (This is a method used in wildlife estimation and is usually referred to as the capture-recapture method.)
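As a small numerical illustration of the capture-recapture estimate (added here; the values of $r$, $s$, and $x$ below are made up), one can confirm that $\hat{N} = [rs/x]$ does maximize the hypergeometric density:

from math import comb

def hypergeom_density(x, N, r, s):
    """f_X(x | N): probability of x tagged fish among s caught, from a lake of N fish of which r are tagged."""
    if x > r or x > s or s - x > N - r:
        return 0.0
    return comb(r, x) * comb(N - r, s - x) / comb(N, s)

r, s, x = 100, 50, 3                  # hypothetical tag count, size of second catch, tagged recaptures
N_hat = (r * s) // x                  # the maximum likelihood estimate [rs/x] = 1666
candidates = range(r + s - x, 5001)   # N must be at least r + s - x for the density to be positive
best = max(candidates, key=lambda N: hypergeom_density(x, N, r, s))
print(N_hat, best)                    # both print 1666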
Example 4. The general linear model. In the general linear model presented in chapter 8, $Y = X\beta + Z$, we saw that the joint distribution of $Y$ was $N(X\beta, \sigma^2 I_n)$. Thus its joint density is
$$f_Y(y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big\{-\frac{1}{2\sigma^2}(y - X\beta)^t (y - X\beta)\Big\}.$$
For each fixed $\sigma^2$, the joint density is maximized over $\beta$ by minimizing $(y - X\beta)^t(y - X\beta)$, which by chapter 8 occurs at the least squares solution. We therefore substitute $\hat{\beta} = (X^t X)^{-1} X^t Y$ for $\beta$ in the formula for the joint density and differentiate with respect to $\sigma^2$. Upon so doing, and solving for $\sigma^2$, we obtain that the joint density is maximized when $\beta = \hat{\beta} = (X^t X)^{-1} X^t Y$ and $\sigma^2 = \hat{\sigma}^2$, where
$$\hat{\sigma}^2 = \frac{1}{n}\,\| Y - X\hat{\beta} \|^2.$$
However, this value $\hat{\sigma}^2$ is not an unbiased estimate of $\sigma^2$; this was shown in chapter 8.
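A compact numerical sketch of these formulas (an added illustration; the design matrix, coefficients, and noise level below are invented) using NumPy:

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design matrix with an intercept column
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=1.5, size=n)                 # Z has standard deviation 1.5

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)     # (X^t X)^{-1} X^t Y
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / n                   # maximum likelihood estimate of sigma^2
print(beta_hat, sigma2_hat)                      # close to (1.0, 2.0, -0.5) and to 1.5^2 = 2.25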
EXERCISES
$$f_X(x \mid \theta) = \begin{cases} \theta(1 - x)^{\theta - 1} & \text{if } x \in (0, 1) \\ 0 & \text{otherwise,} \end{cases}$$