17 Statistics I Methods of Estimation
Having modelled the data or the real-life situation, we often assume a family of distributions or models indexed by one or more parameters. We would like to choose a member of the family guided by the available data. The methods used to estimate these unknown parameters are called methods of estimation.

Method of Moments (MOM): Suppose the parameter can be expressed as a function of the population moments,
$$\theta = h(\mu_1, \mu_2, \ldots, \mu_k).$$
Here we assume that the first $2k$ moments of the distribution exist, $E(X^{2k}) < \infty$. (If $2k$ moments exist then one can find the mean squared error of the MOM estimator.) The population moments are
$$\mu_r = E(X^r), \quad 1 \le r \le 2k.$$
The MOM estimator is obtained by replacing each population moment $\mu_r$ by the corresponding sample moment $m_r = \frac{1}{n}\sum_{i=1}^n X_i^r$, giving $\hat\theta_{MOM} = h(m_1, m_2, \ldots, m_k)$.
Remark 1 This method is very simple and intuitive. For large samples MOM estimators exhibit good properties.
Under suitable assumptions we can show that $m_k \xrightarrow{P} \mu_k$, $k \ge 1$, and $\hat\theta_n \xrightarrow{P} \theta$ as $n \to \infty$. This convergence in probability is often referred to as consistency. Here $n$ denotes the sample size. MOM estimators, however, need not be unique. But again, for large samples, various MOM estimators for the parameter $\theta$, say, are essentially the same. For instance, if $\hat\theta_1$ and $\hat\theta_2$ are two different MOM estimators for $\theta$, then $\hat\theta_{1n} - \hat\theta_{2n} \xrightarrow{P} 0$ as $n \to \infty$. These properties are studied in greater detail in large sample theory or asymptotic theory.
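To make the consistency statement concrete, here is a small simulation, our own illustration rather than part of the notes: for $X$ uniform on $[1,2]$ the first population moment is $\mu_1 = E(X) = 1.5$, and the sample moment $m_1 = \bar X$ settles down to $\mu_1$ as $n$ grows.

```python
import random

# Simulation sketch (not from the notes) of consistency of sample moments:
# for X ~ Uniform[1, 2], mu_1 = E(X) = 1.5, and m_1 = xbar -> mu_1 as n grows.

def sample_mean(n, rng):
    return sum(rng.uniform(1.0, 2.0) for _ in range(n)) / n

rng = random.Random(42)
for n in (10, 1000, 100000):
    print(n, sample_mean(n, rng))
```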
Example 2 In a factory with a large number of workers, a typical worker earns monthly overtime remuneration anywhere between Rs 1000 and Rs 2000. It may be assumed that the monthly overtime earned by a worker (in thousand Rs.) is uniformly distributed on $[1, 2]$. In the month of October, on account of Deepaawalee, all workers got an equal amount of festival bonus. If $n$ randomly chosen workers got remuneration packets $X_1, X_2, \ldots, X_n$ that consisted of overtime + Deepaawalee bonus (in thousand Rs.), then obtain the method of moments (MOM) estimator as well as the maximum likelihood estimator (MLE) for $\theta$, the festival bonus.
Solution 3 Let $\theta$ denote the common value of the bonus (in thousand Rs). Let $X$ denote the remuneration packet of a typical worker; then we argue that $X$ is uniformly distributed on $[1+\theta, 2+\theta]$. The mean $\mu = E(X)$ of such a uniform distribution is $\theta + \frac{3}{2}$. We thus solve $\mu = \theta + \frac{3}{2}$ for $\theta$ and obtain $\theta = \mu - \frac{3}{2}$. Thus the rationale of the method of moments suggests that the MOM estimator for $\theta$ is given by $\hat\theta_{MOM} = \bar X - \frac{3}{2}$. Here we use the sample mean $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$.
Remark 4 Note that $E(\hat\theta_{MOM}) = \theta$ since $E(\bar X) = \mu = \theta + \frac{3}{2}$. Thus our MOM estimator $\hat\theta_{MOM}$ is unbiased for $\theta$. It is easy to see that $Var(X)$, the variance of $X$, is given by $\frac{1}{12}$. Hence $Var(\hat\theta_{MOM}) = \frac{1}{12n}$. Since $Var(\hat\theta_{MOM}) = \frac{1}{12n}$ goes to $0$ as $n \to \infty$, we have $\hat\theta_{MOM} \xrightarrow{P} \theta$. In this case $\hat\theta_{MOM}$ has desirable properties. However, if $\bar X < \frac{3}{2}$ then $\hat\theta_{MOM}$ takes negative values. One way to get around this drawback is to truncate the estimator at $0$ from below.
Given data $x = (x_1, x_2, \ldots, x_n)$, the function $L(\theta|x) = \prod_{i=1}^n f(x_i|\theta)$, viewed as a function of $\theta$, is called the likelihood function. We try to maximize $L(\theta|x)$ as a function of $\theta$ for the given data $x$. If $\max_{\theta \in \Theta} L(\theta|x_1, x_2, \ldots, x_n)$ is attained at $\hat\theta = \hat\theta(x)$, then $\hat\theta$ is called the MLE for $\theta$. Such a $\hat\theta$ may not always exist. Even if it does, we may not know its explicit form. One can, however, under some regularity conditions, prove the existence and good large sample properties of $\hat\theta$.
For the bonus example, with $X$ uniform on $[1+\theta, 2+\theta]$,
$$f_X(x|\theta) = 1 \text{ if } x \in [1+\theta, 2+\theta], \qquad = 0 \text{ otherwise,}$$
so that
$$L(\theta|x) = 1 \text{ if } 1+\theta \le x_i \le 2+\theta \;\; \forall i, \qquad = 0 \text{ otherwise.}$$
Rewriting, we get $1+\theta \le x_{\min}$ and $x_{\max} \le 2+\theta$, or
$$x_{\max} - 2 \le \theta \le x_{\min} - 1.$$
Thus any value in the interval $[x_{\max} - 2, \; x_{\min} - 1]$ would be a candidate for $\hat\theta_{MLE}$. Here $x_{\max} = \max_{1 \le i \le n} x_i$ and $x_{\min} = \min_{1 \le i \le n} x_i$.
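As a quick numerical illustration (our own sketch; the true $\theta$ and the sample size are arbitrary choices), the following snippet computes $\hat\theta_{MOM} = \bar X - \frac{3}{2}$ and the interval $[x_{\max}-2, x_{\min}-1]$ of MLE candidates from simulated data:

```python
import random

# Sketch for the festival-bonus example (theta and n chosen arbitrarily):
# X ~ Uniform[1 + theta, 2 + theta], theta_hat_MOM = xbar - 3/2, and every
# point of [x_max - 2, x_min - 1] maximizes the likelihood.
rng = random.Random(1)
theta = 0.8                                    # true bonus, in thousand Rs.
xs = [rng.uniform(1.0 + theta, 2.0 + theta) for _ in range(500)]

theta_mom = sum(xs) / len(xs) - 1.5            # MOM estimate
mle_interval = (max(xs) - 2.0, min(xs) - 1.0)  # all MLE candidates
print(theta_mom, mle_interval)
```

Note that the true $\theta$ always lies in the MLE candidate interval, since every observation satisfies $1+\theta \le x_i \le 2+\theta$.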
Remark 8 In this example the MLE is not very satisfactory, whereas the MOM estimator has good properties. However, if $\theta < \frac{1}{2}$ then $\hat\theta_{MOM}$ may take negative values. In such a situation we can truncate the MOM estimator from below: $\hat\theta_{MOMT} = \hat\theta_{MOM}$ whenever $\hat\theta_{MOM}$ is nonnegative, and $\hat\theta_{MOMT} = 0$ whenever $\hat\theta_{MOM}$ is negative. It is easy to see that $MSE(\hat\theta_{MOMT}) \le MSE(\hat\theta_{MOM})$. Here, if $T$ is an estimator for $\theta$, the mean squared error (MSE) of $T$ is defined as $MSE(T) = E(T - \theta)^2$.
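The MSE inequality in fact holds sample by sample when $\theta \ge 0$: truncating a negative estimate at $0$ can only move it closer to $\theta$. A short Monte Carlo sketch (our own illustration; the values of $\theta$, $n$ and the number of replications are arbitrary):

```python
import random

# Monte Carlo sketch (not from the notes) of MSE(theta_hat_MOMT) <=
# MSE(theta_hat_MOM): truncating a negative estimate at 0 moves it closer to
# theta whenever theta >= 0, so the inequality holds sample by sample.
rng = random.Random(7)

def mse_pair(theta, n, reps=20000):
    se_raw = se_trunc = 0.0
    for _ in range(reps):
        xbar = sum(rng.uniform(1.0 + theta, 2.0 + theta) for _ in range(n)) / n
        t = xbar - 1.5               # theta_hat_MOM
        tt = max(t, 0.0)             # theta_hat_MOMT (truncated at 0)
        se_raw += (t - theta) ** 2
        se_trunc += (tt - theta) ** 2
    return se_raw / reps, se_trunc / reps

mse_raw, mse_trunc = mse_pair(theta=0.1, n=5)
print(mse_raw, mse_trunc)
```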
The log-likelihood here is
$$\mathcal{L}(\theta|x) = \log L(\theta|x) = \text{Const} - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \theta)^2.$$
Differentiating w.r.t. $\theta$, treating $x$ and $\sigma$ fixed, we get
$$\frac{\partial \mathcal{L}(\theta|x)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \theta).$$
Equating this derivative to $0$ and solving gives $\hat\theta_{MLE} = \bar x$.
Remark 12 The above example would also work for the laboratory experiment for the determination of $g$, as mentioned earlier.
Remark 13 (Consistency of MLEs) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x|\theta)$. Let $L(\theta|x) = \prod_{i=1}^n f(x_i|\theta)$ be the likelihood function. Let $\hat\theta$ denote the MLE of $\theta$. Let $\eta(\theta)$ be a continuous function of $\theta$. Under appropriate regularity conditions, $\eta(\hat\theta)$ is a consistent estimator of $\eta(\theta)$. Recall that consistency means convergence in probability.
Remark 14 (Asymptotic efficiency of MLEs) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x|\theta)$. Let $L(\theta|x) = \prod_{i=1}^n f(x_i|\theta)$ be the likelihood function. Let $\hat\theta$ denote the MLE of $\theta$. Let $\eta(\theta)$ be a continuous function of $\theta$. (In particular we may take $\eta(\theta) = \theta$.) Under appropriate regularity conditions, $\sqrt{n}\,(\eta(\hat\theta) - \eta(\theta)) \xrightarrow{d} N(0, \nu(\theta))$, where
$$\nu(\theta) = \frac{[\eta'(\theta)]^2}{I_1(\theta)}$$
is the Cramér–Rao Lower Bound (CRLB). Here $I_1(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\right]$ is a measure of the information about $\theta$ in a single observation. Thus $\eta(\hat\theta)$ is a consistent and asymptotically efficient estimator of $\eta(\theta)$. These aspects will be studied later in large sample theory.
Example 15 Finally, let us recall the example of predicting the outcome of elections, where we have either a Binomial model or a Hypergeometric model. In either case we can show that the MLE for the unknown proportion is the sample proportion.
Example 16 The angle at which electrons are emitted in muon decay has a distribution with pdf given by
$$f(x|\alpha) = \frac{1}{2}(1 + \alpha x), \quad -1 \le x \le 1, \; -1 \le \alpha \le 1,$$
where $x = \cos\theta$. The parameter $\alpha$ is related to polarization. Physical considerations dictate that $|\alpha| \le \frac{1}{3}$, but we note that $f(x|\alpha)$ is a pdf for $|\alpha| \le 1$. Let $X_1, X_2, \ldots, X_n$ be a random sample of measurements on $\cos\theta$. Obtain the method of moments estimator for $\alpha$. Also obtain the maximum likelihood estimator for $\alpha$.
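A numerical sketch for this example (our own illustration; the inverse-CDF sampler and the ternary search are implementation choices not given in the notes). Since $E(X) = \int_{-1}^{1} x \cdot \frac{1}{2}(1+\alpha x)\,dx = \frac{\alpha}{3}$, the MOM estimator is $\hat\alpha_{MOM} = 3\bar X$; the MLE maximizes the concave log-likelihood $\sum_i \log\frac{1+\alpha x_i}{2}$, here restricted to the physical range $|\alpha| \le \frac{1}{3}$:

```python
import math
import random

# Sketch for the muon-decay example (illustrative implementation choices):
# MOM: E(X) = alpha/3, so alpha_hat_MOM = 3 * xbar.
# MLE: maximize the concave log-likelihood sum(log((1 + alpha*x)/2)).

def sample_x(alpha, rng):
    # Invert F(x) = (x + 1)/2 + alpha*(x^2 - 1)/4 = u (quadratic in x).
    u = rng.random()
    if abs(alpha) < 1e-12:
        return 2.0 * u - 1.0
    a, b, c = alpha / 4.0, 0.5, 0.5 - alpha / 4.0 - u
    x = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return max(-1.0, min(1.0, x))      # clamp tiny float overshoot

def log_lik(alpha, xs):
    return sum(math.log((1.0 + alpha * x) / 2.0) for x in xs)

def mle(xs, lo=-1.0 / 3.0, hi=1.0 / 3.0, iters=60):
    # Ternary search is valid because the log-likelihood is concave in alpha.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_lik(m1, xs) < log_lik(m2, xs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

rng = random.Random(3)
alpha_true = 0.3
xs = [sample_x(alpha_true, rng) for _ in range(2000)]
a_mom = 3.0 * sum(xs) / len(xs)
a_mle = mle(xs)
print(a_mom, a_mle)
```

Note that $\hat\alpha_{MOM}$ may fall outside $[-\frac{1}{3}, \frac{1}{3}]$ (or even outside $[-1,1]$), whereas the search above keeps the MLE inside the physical range.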
Remark 17 The muon is one of nature’s fundamental “building blocks of matter” and acts in many ways as if it
were an unstable heavy electron, for reasons no one fully understands.
Example 18 Let $X_1, X_2, \ldots, X_n$ be a random sample from Gamma$(\alpha, \lambda)$, $\alpha > 0$, $\lambda > 0$. Find MOM estimators and MLEs for $\alpha$ and $\lambda$.
If $X$ has the Gamma$(\alpha, \lambda)$ distribution, then the pdf (probability density function) is given by
$$f_X(x|\alpha, \lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)} \exp(-\lambda x)\, x^{\alpha-1}\, I_{(0,\infty)}(x).$$
Therefore, the joint density function of the iid random variables $X_1, X_2, \ldots, X_n$ is given by
$$f_X(x|\alpha, \lambda) = \prod_{i=1}^n f_X(x_i|\alpha, \lambda) = \prod_{i=1}^n \frac{\lambda^\alpha}{\Gamma(\alpha)} \exp(-\lambda x_i)\, x_i^{\alpha-1}\, I_{(0,\infty)}(x_i) = \frac{\lambda^{n\alpha}}{[\Gamma(\alpha)]^n} \exp\left(-\lambda \sum_{i=1}^n x_i\right) \prod_{i=1}^n x_i^{\alpha-1}\, I_{(0,\infty)}(x_i).$$
Writing $\mathcal{L}(\alpha, \lambda|x) = \log L(\alpha, \lambda|x) = n\alpha\log\lambda - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i=1}^n \log x_i - \lambda\sum_{i=1}^n x_i$ for $x_i > 0$, the partial derivatives are
$$\frac{\partial \mathcal{L}(\alpha,\lambda|x)}{\partial \alpha} = n\log\lambda - \frac{n}{\Gamma(\alpha)}\frac{\partial\,\Gamma(\alpha)}{\partial\alpha} + \sum_{i=1}^n \log x_i$$
$$\frac{\partial \mathcal{L}(\alpha,\lambda|x)}{\partial \lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^n x_i.$$
Equating these to $0$, the second equation gives
$$\lambda = \frac{\alpha}{\bar x}$$
and substituting this into the first,
$$n\log\frac{\alpha}{\bar x} - \frac{n}{\Gamma(\alpha)}\frac{\partial\,\Gamma(\alpha)}{\partial\alpha} + \sum_{i=1}^n \log x_i = 0$$
or
$$\log\alpha - \frac{1}{\Gamma(\alpha)}\frac{\partial\,\Gamma(\alpha)}{\partial\alpha} = \log\bar x - \overline{\log x} \qquad (1)$$
where $\overline{\log x} = \frac{1}{n}\sum_{i=1}^n \log x_i$, the mean of the $\log x_i$ values.
Since the function $\log x$ is concave, by Jensen's inequality $\overline{\log x} - \log\bar x \le 0$, so the right-hand side of (1) is nonnegative.
We thus have an explicit closed-form expression for $\lambda$, which of course involves $\alpha$; for $\alpha$, however, we must make do with the implicit equation (1): there is no closed-form solution for $\alpha$. The function involved is numerically very well behaved, so if a numerical solution is desired it can be found using, for example, Newton's method. An initial value of $\alpha$ may be found using the method of moments or some other appropriate method.
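The Newton iteration for (1) can be sketched as follows. This is our own illustration: the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$ is approximated by a central difference of `math.lgamma` (in practice a library routine such as `scipy.special.digamma` is preferable), and the starting value $0.5/c$ comes from the large-$\alpha$ approximation $\log\alpha - \psi(\alpha) \approx \frac{1}{2\alpha}$.

```python
import math
import random

# Newton's-method sketch for equation (1): log(alpha) - psi(alpha) = c,
# where c = log(xbar) - mean(log x); then lambda_hat = alpha_hat / xbar.
# psi is approximated by a central difference of lgamma (illustrative only).

def digamma(a, h=1e-5):
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)

def fit_gamma_mle(xs, tol=1e-8, max_iter=100):
    n = len(xs)
    xbar = sum(xs) / n
    c = math.log(xbar) - sum(math.log(x) for x in xs) / n  # >= 0 by Jensen
    alpha = 0.5 / c if c > 0 else 1.0                      # starting value
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - c
        h = 1e-4
        gprime = 1.0 / alpha - (digamma(alpha + h) - digamma(alpha - h)) / (2.0 * h)
        alpha_new = alpha - g / gprime
        if alpha_new <= 0:
            alpha_new = alpha / 2.0       # keep the iterate positive
        if abs(alpha_new - alpha) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, alpha / xbar            # (alpha_hat, lambda_hat)

random.seed(123)
data = [random.gammavariate(2.0, 0.5) for _ in range(5000)]  # shape 2, rate 2
print(fit_gamma_mle(data))
```

Here `random.gammavariate(alpha, beta)` uses `beta` as a scale parameter, so scale $0.5$ corresponds to rate $\lambda = 2$.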
Thus
$$\hat\lambda_{mle} = \frac{\hat\alpha_{mle}}{\bar X}$$
and $\hat\alpha_{mle}$ is available only as a numerical solution to (1). Note that this is how we should present our estimators.
Remark: This example illustrates that the MLE may exist but need not be available in closed form.
It remains to check that we have a maximum, and not a minimum, at $(\hat\alpha_{mle}, \hat\lambda_{mle})$. Note that for any given value of $\alpha > 0$, $\frac{\partial^2 \mathcal{L}(\alpha,\lambda|x)}{\partial\lambda^2} = -\frac{n\alpha}{\lambda^2} < 0$, so the MLE for $\lambda$ is indeed given by $\hat\lambda_{mle} = \hat\alpha_{mle}/\bar X$. Further, the algorithm for finding $\hat\alpha_{mle}$ would take care of the fact that it is a maximum and not a minimum.
Example 19 Let $X_1, X_2, \ldots, X_n$ be a random sample from N$(\mu, \sigma^2)$, $\mu \in (-\infty, \infty)$, $\sigma^2 > 0$. Find MOM estimators and MLEs for $\mu$ and $\sigma^2$.
For the N$(\mu, \sigma^2)$ distribution, the pdf (probability density function) is given by
$$f_X(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right).$$
Therefore, the joint density function of the iid random variables $X_1, X_2, \ldots, X_n$ is given by
$$f_X(x|\mu, \sigma^2) = \prod_{i=1}^n f_X(x_i|\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right).$$
Case One: $\sigma^2 > 0$ is known.
We now find the partial derivatives of $\mathcal{L}(\mu, \sigma^2|x)$:
$$\frac{\partial \mathcal{L}(\mu, \sigma^2|x)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu)$$
$$\frac{\partial^2 \mathcal{L}(\mu, \sigma^2|x)}{\partial\mu^2} = -\frac{n}{\sigma^2} < 0 \quad \text{for all } \mu.$$
Equating the first order partial derivative to $0$ and solving, we get $\mu = \bar x$. Therefore
$$\hat\mu_{mle} = \bar X.$$
Note that $\hat\mu_{mle} = \bar X$ does not depend on $\sigma^2 > 0$, which is assumed to be known. We shall use this observation in the following case.
Case Two: $\sigma^2 > 0$ is unknown.
Earlier, in Case One, we observed that $\hat\mu_{mle} = \bar X$ does not depend on $\sigma^2 > 0$, which was assumed to be known. Therefore $\bar X$ would continue to be the MLE for $\mu$. We now find the MLE for $\sigma^2$:
$$\frac{\partial \mathcal{L}(\mu, \sigma^2|x)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (x_i - \mu)^2$$
$$\frac{\partial^2 \mathcal{L}(\mu, \sigma^2|x)}{\partial(\sigma^2)^2} = \frac{n}{2(\sigma^2)^2} - \frac{\sum_{i=1}^n (x_i - \mu)^2}{(\sigma^2)^3} = -\frac{n}{(\sigma^2)^2}\left[\frac{1}{\sigma^2}\cdot\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - \frac{1}{2}\right].$$
Equating the first order partial derivative to $0$ and solving, we get
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2.$$
Further note that $\frac{\partial^2 \mathcal{L}(\mu, \sigma^2|x)}{\partial(\sigma^2)^2}$ at $\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$ is given by
$$\frac{\partial^2 \mathcal{L}(\mu, \sigma^2|x)}{\partial(\sigma^2)^2} = -\frac{n}{2(\sigma^2)^2} < 0.$$
Thus
$$\hat\mu_{mle} = \bar X \quad \text{and} \quad \hat\sigma^2_{mle} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2.$$
Note that this is how we should present our estimators.
Remark: We could have followed the routine method of simultaneously optimizing $\mathcal{L}(\mu, \sigma^2|x)$ w.r.t. $\mu$ and $\sigma^2$.
Method of Moments:
If both $\mu$ and $\sigma^2$ are unknown, we note that $E(X) = \mu$ and $E(X^2) = \mu^2 + \sigma^2$, as $Var(X) = \sigma^2$. Using the MOM rationale we get
$$\hat\mu_{MOM} = \bar X = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad \hat\sigma^2_{MOM} = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2.$$
Note that the MLEs and MOM estimators agree. It can be checked that $\hat\mu_{MOM}$ is unbiased and consistent, while $\hat\sigma^2_{MOM}$ is asymptotically unbiased and consistent.
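A quick numerical check (our own sketch, with arbitrary sample values) that the two algebraic forms of the variance estimator, and hence the MLE and MOM estimators, coincide:

```python
import random

# Check (illustrative) that the MLE and MOM formulas agree for normal data:
# sigma2_hat = (1/n) * sum((x_i - xbar)^2) = (1/n) * sum(x_i^2) - xbar^2.
rng = random.Random(11)
xs = [rng.gauss(10.0, 2.0) for _ in range(1000)]
n = len(xs)

xbar = sum(xs) / n                                # mu_hat (MLE and MOM)
s2_mle = sum((x - xbar) ** 2 for x in xs) / n     # MLE form
s2_mom = sum(x * x for x in xs) / n - xbar ** 2   # MOM form
print(xbar, s2_mle, s2_mom)
```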
Example 20 Suppose we have a random sample of size $n$ from N$(\mu, \sigma^2)$ with $\sigma^2 = \sigma_0^2$ known. We know that the sample mean $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ is unbiased for $\mu$. The distribution of $\bar X$ is N$(\mu, \frac{\sigma_0^2}{n})$. Therefore $Z = \frac{\sqrt{n}(\bar X - \mu)}{\sigma_0}$ has the N$(0,1)$ distribution. This may be used to suggest a confidence interval (CI) for the unknown $\mu$.
Let $z_\gamma$ be the $\gamma$-quantile of N$(0,1)$, so that $\Pr(Z < z_\gamma) = \gamma$. Note therefore that $\Pr\left(|Z| < z_{(1-\frac{\alpha}{2})}\right) = 1 - \alpha$. Equivalently,
$$\Pr\left(\bar X - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})} < \mu < \bar X + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right) = 1 - \alpha.$$
Therefore we may suggest
$$\left[\bar X - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}, \; \bar X + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right]$$
as a $100(1-\alpha)\%$ confidence interval for $\mu$.
When $\sigma^2$ is unknown, we replace $\sigma_0$ by the sample standard deviation $S$ and use the Student's $t$ distribution, which gives the interval
$$\left[\bar X - \frac{S}{\sqrt{n}}\, t_{(n-1),(1-\frac{\alpha}{2})}, \; \bar X + \frac{S}{\sqrt{n}}\, t_{(n-1),(1-\frac{\alpha}{2})}\right].$$
Here $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$, the statistic $T = \frac{\sqrt{n}(\bar X - \mu)}{S}$ has the $t_{n-1}$ distribution, and
$$\Pr\left(|T| < t_{(n-1),(1-\frac{\alpha}{2})}\right) = 1 - \alpha.$$
Remark 22 It is important to interpret a $100(1-\alpha)\%$ confidence interval (CI) correctly, i.e., to interpret statements like
$$\Pr\left(\mu \in \left[\bar X - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}, \; \bar X + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right]\right) = 1 - \alpha$$
correctly. Here $\mu$ is not a random quantity; it is unknown but fixed. The interval estimator, or confidence interval, on the other hand, is random, and it varies from sample to sample. Suppose we repeat the experiment of drawing a random sample of size $n$ from N$(\mu, \sigma_0^2)$, say, a large number of times; then we can expect the unknown $\mu$ to belong to the 95% CI in about 95% of the trials.
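This repeated-sampling interpretation can be checked by simulation, a sketch with arbitrary choices of $\mu$, $\sigma_0$ and $n$, using $z_{0.975} \approx 1.96$:

```python
import random

# Simulation sketch of the coverage interpretation: mu stays fixed, the
# interval varies from sample to sample, and close to 95% of the intervals
# [xbar - z*sigma0/sqrt(n), xbar + z*sigma0/sqrt(n)] should cover mu.
rng = random.Random(5)
mu, sigma0, n = 50.0, 3.0, 25
z = 1.96                                 # approx. 0.975-quantile of N(0, 1)
half = z * sigma0 / n ** 0.5             # half-width of the interval

reps = 10000
covered = sum(
    1
    for _ in range(reps)
    if abs(sum(rng.gauss(mu, sigma0) for _ in range(n)) / n - mu) < half
)
print(covered / reps)                    # close to 0.95
```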