
1 Methods of Estimation

Having modelled the data or the real-life situation, we often assume a family of distributions or models indexed by one or more parameters. We would like to choose a member of the family suggested by the available data. The methods used to estimate these unknown parameters are called methods of estimation.

1.1 Method of Moments


This method is based on a simple rationale. Since we have a random sample, it acts like a miniature population, so any feature of the population may be estimated using the corresponding feature of the sample. Suppose we have a random sample from a distribution with probability mass function or probability density function $f_X(x\mid\theta)$, $\theta \in \Theta$. Here $\theta$ is called the indexing parameter and $\Theta$, the set of all possible values of $\theta$, is called the parameter space. In practice $\theta$ is unknown. In other words, we have a family of distributions specified by $\{f_X(x\mid\theta) : \theta \in \Theta\}$. Our random sample or data set comes from one member of this family. If we knew that member, we would have complete knowledge of the probabilistic behaviour of the data, and we could spell out various properties of the statistics based on the data. Hence it is important to know the parameter value; however, we can only estimate that value from the given data.
We first express the parameter of interest as a function of the first few, say $k$, moments of the distribution:
$$\theta = h(\mu_1, \mu_2, \ldots, \mu_k).$$
Here we assume that the first $2k$ moments of the distribution exist, i.e. $E(X^{2k}) < \infty$. (If $2k$ moments exist then one can find the mean squared error of the MOM estimator.)
$$\mu_r = E(X^r), \quad 1 \le r \le 2k.$$
$\mu_r$ is the $r$th population moment.


We then define
$$\hat{\theta} = h(m_1, m_2, \ldots, m_k)$$
where
$$m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \quad 1 \le r \le k.$$
$m_r$ is the $r$th sample moment. $\hat{\theta}$ is called the method of moments (MOM) estimator for $\theta$.

Remark 1 This method is very simple and intuitive. For large samples MOM estimators exhibit good properties. Under suitable assumptions we can show that $m_k \xrightarrow{P} \mu_k$, $k \ge 1$, and $\hat{\theta}_n \xrightarrow{P} \theta$ as $n \to \infty$. This convergence in probability is often referred to as consistency. Here $n$ denotes the sample size. MOM estimators, however, need not be unique. But again, for large samples the various MOM estimators for a parameter, say $\theta$, are essentially the same. For instance, if $\hat{\theta}_1$ and $\hat{\theta}_2$ are two different MOM estimators for $\theta$, then $\hat{\theta}_{1n} - \hat{\theta}_{2n} \xrightarrow{P} 0$ as $n \to \infty$. These properties are studied in greater detail in large sample theory or asymptotic theory.

Example 2 In a factory with a large number of workers, a typical worker earns monthly overtime remuneration anywhere between Rs 1000 and Rs 2000. It may be assumed that the monthly overtime earned by a worker (in thousand Rs.) is uniformly distributed on $[1, 2]$. In the month of October, on account of Deepaawalee, all workers got an equal amount of festival bonus. If $n$ randomly chosen workers got remuneration packets $X_1, X_2, \ldots, X_n$ that consisted of overtime + Deepaawalee bonus (in thousand Rs.), obtain the method of moments (MOM) estimator as well as the maximum likelihood estimator (MLE) for $\theta$, the festival bonus.

Solution 3 Let $\theta$ denote the common value of the bonus (in thousand Rs). Let $X$ denote the remuneration packet of a typical worker; then we argue that $X$ is uniformly distributed on $[1+\theta, 2+\theta]$. The mean $\mu = E(X)$ of such a uniform distribution is $\theta + \frac{3}{2}$. We thus solve $\mu = \theta + \frac{3}{2}$ for $\theta$ and obtain $\theta = \mu - \frac{3}{2}$. Thus the rationale of the method of moments suggests that the MOM estimator for $\theta$ is given by $\hat{\theta}_{MOM} = \bar{X} - \frac{3}{2}$. Here we use the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ as an estimator for the population mean $\mu = \theta + \frac{3}{2}$.

Remark 4 Note that $E(\hat{\theta}_{MOM}) = \theta$ since $E(\bar{X}) = \mu = \theta + \frac{3}{2}$. Thus our MOM estimator $\hat{\theta}_{MOM}$ is unbiased for $\theta$. It is easy to see that $Var(X)$, the variance of $X$, is given by $\frac{1}{12}$. Hence $Var(\hat{\theta}_{MOM}) = \frac{1}{12n}$. Since $Var(\hat{\theta}_{MOM}) = \frac{1}{12n}$ goes to $0$ as $n \to \infty$, we have $\hat{\theta}_{MOM} \xrightarrow{P} \theta$. In this case $\hat{\theta}_{MOM}$ has desirable properties. However, if $\bar{X} < \frac{3}{2}$ then $\hat{\theta}_{MOM}$ takes negative values. One way to get around this drawback is to truncate the estimator at $0$ from below.
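For concreteness, here is a minimal Python sketch (numpy only; the true bonus value, sample size, and seed are hypothetical choices made purely for simulation) of the MOM estimator $\hat{\theta}_{MOM} = \bar{X} - \frac{3}{2}$ and its truncated version:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true festival bonus (in thousand Rs.) -- for simulation only.
theta_true = 0.8
n = 50

# Remuneration packets: overtime Uniform[1, 2] plus the common bonus theta.
x = rng.uniform(1.0, 2.0, size=n) + theta_true

# MOM estimator: sample mean minus 3/2 (since the population mean is theta + 3/2).
theta_mom = x.mean() - 1.5

# Truncated version: never report a negative bonus.
theta_mom_trunc = max(theta_mom, 0.0)

print(theta_mom, theta_mom_trunc)
```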

Let us consider one more example.


Example 5 Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, $-\infty < \mu < \infty$ and $\sigma^2 > 0$. Find MOM estimators and MLEs for $\mu$ and $\sigma^2$.

Solution 6 If $X$ has a $N(\mu, \sigma^2)$ distribution then $E(X) = \mu$ and $E(X^2) = \mu^2 + \sigma^2$, since $Var(X) = \sigma^2$. Using the MOM rationale we get $\hat{\mu}_{MOM} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\hat{\sigma}^2_{MOM} = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$. It can be checked that $\hat{\mu}_{MOM}$ is unbiased and consistent, and $\hat{\sigma}^2_{MOM}$ is asymptotically unbiased and consistent.

1.2 Maximum Likelihood Method


This is the most commonly used method of estimation. Suppose again that we have a random sample from a distribution with probability mass function (pmf) or probability density function (pdf) $f_X(x\mid\theta)$, $\theta \in \Theta$. To fix ideas, let us first consider the discrete case. We actually have a data set from the distribution $f_X(x\mid\theta)$, $\theta \in \Theta$. The chance of getting a data set depends on the parameter value $\theta$. We now turn around and ask: for which value of $\theta$ does the data set that we actually observed have the maximum chance of being observed? The maximum likelihood estimator (MLE) is an outcome of this rationale.
To fix ideas, suppose we have a random sample $X_1, X_2, \ldots, X_n$ from $f_X(x\mid\theta)$, $\theta \in \Theta \subseteq \mathbb{R}$. Suppose we are interested in obtaining an MLE for $\theta$. We define
$$L(\theta\mid x_1, x_2, \ldots, x_n) = L(\theta\mid \mathbf{x}) = \prod_{i=1}^{n} f_X(x_i\mid\theta),$$
called the likelihood function. We try to maximize $L(\theta\mid\mathbf{x})$ as a function of $\theta$ for the given data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. If $\max_{\theta \in \Theta} L(\theta\mid x_1, x_2, \ldots, x_n)$ is attained at $\hat{\theta} = \hat{\theta}(\mathbf{x})$, then $\hat{\theta}$ is called the MLE for $\theta$.
Such a $\hat{\theta}$ may not always exist. Even if it does, we may not know its explicit form. One can, however, under some regularity conditions, prove the existence and good large sample properties of $\hat{\theta}$.

Example 7 Continuation of Example (2). The pdf of $X$ is given by
$$f_X(x\mid\theta) = \begin{cases} 1 & \text{if } x \in [1+\theta,\ 2+\theta] \\ 0 & \text{otherwise.} \end{cases}$$
The likelihood function is therefore given by
$$L(\theta\mid\mathbf{x}) = \begin{cases} 1 & \text{if } 1+\theta \le x_i \le 2+\theta \ \ \forall\, i \\ 0 & \text{otherwise.} \end{cases}$$
Rewriting we get
$$L(\theta\mid\mathbf{x}) = \begin{cases} 1 & \text{if } 1+\theta \le x_{\min} \text{ and } x_{\max} \le 2+\theta \\ 0 & \text{otherwise,} \end{cases}$$
or
$$L(\theta\mid\mathbf{x}) = \begin{cases} 1 & \text{if } x_{\max} - 2 \le \theta \le x_{\min} - 1 \\ 0 & \text{otherwise.} \end{cases}$$
Thus any value in the interval $[x_{\max} - 2,\ x_{\min} - 1]$ would be a candidate for $\hat{\theta}_{MLE}$. Here $x_{\max} = \max_{1 \le i \le n} x_i$ and $x_{\min} = \min_{1 \le i \le n} x_i$.
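A similarly minimal sketch (same hypothetical simulated setting as before) that computes the interval of MLE candidates $[x_{\max}-2,\ x_{\min}-1]$ alongside the MOM estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.8          # hypothetical value, for simulation only
x = rng.uniform(1.0, 2.0, size=50) + theta_true

# Every theta in [x_max - 2, x_min - 1] maximizes the likelihood.
mle_lower = x.max() - 2.0
mle_upper = x.min() - 1.0

theta_mom = x.mean() - 1.5
print((mle_lower, mle_upper), theta_mom)
```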

Remark 8 In this example the MLE is not very satisfactory, whereas the MOM estimator has good properties. However, if $\theta < \frac{1}{2}$ then $\hat{\theta}_{MOM}$ may take negative values. In such a situation we can think of truncating the MOM estimator from below: $\hat{\theta}_{MOMT} = \hat{\theta}_{MOM}$ whenever $\hat{\theta}_{MOM}$ is nonnegative and $\hat{\theta}_{MOMT} = 0$ whenever $\hat{\theta}_{MOM}$ is negative. It is easy to see that $MSE(\hat{\theta}_{MOMT}) \le MSE(\hat{\theta}_{MOM})$. Here, if $T$ is an estimator for $\theta$, then the mean squared error (MSE) of $T$ is defined as $MSE(T) = E(T-\theta)^2$.
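A quick Monte Carlo sketch (with hypothetical values of $\theta$ and $n$, chosen so that truncation actually occurs) illustrating the inequality $MSE(\hat{\theta}_{MOMT}) \le MSE(\hat{\theta}_{MOM})$:

```python
import numpy as np

rng = np.random.default_rng(7)
theta_true, n, reps = 0.2, 20, 20000   # hypothetical values; theta < 1/2 so truncation matters

# reps independent samples of size n from Uniform[1 + theta, 2 + theta].
x = rng.uniform(1.0, 2.0, size=(reps, n)) + theta_true
theta_mom = x.mean(axis=1) - 1.5
theta_momt = np.maximum(theta_mom, 0.0)   # truncated-at-zero version

mse_mom = np.mean((theta_mom - theta_true) ** 2)
mse_momt = np.mean((theta_momt - theta_true) ** 2)
print(mse_mom, mse_momt)   # the truncated estimator's MSE is no larger
```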

Let us now consider an example where the MLE has good properties.

Example 9 This is a continuation of Example (5).


Solution 10 We assume first that in $N(\mu, \sigma^2)$, $\mu$ is unknown and $\sigma^2$ is known. The normal pdf is given by
$$f_X(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right).$$
Therefore the likelihood function is given by
$$L(\mu\mid\mathbf{x}) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right).$$
Often it is useful to deal with the log-likelihood function $\mathcal{L}(\mu\mid\mathbf{x}) = \log L(\mu\mid\mathbf{x})$:
$$\mathcal{L}(\mu\mid\mathbf{x}) = \log L(\mu\mid\mathbf{x}) = \text{Const} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$
Differentiating w.r.t. $\mu$, treating $\mathbf{x}$ and $\sigma^2$ as fixed, we get
$$\frac{\partial \mathcal{L}(\mu\mid\mathbf{x})}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu).$$
Equating to $0$ and solving we get $\hat{\mu}_{MLE} = \bar{X}$.
Also note that $\frac{\partial^2 \mathcal{L}(\mu\mid\mathbf{x})}{\partial\mu^2} = -\frac{n}{\sigma^2} < 0$.
Example 11 We can deal with the case where $\sigma^2$ is unknown.

Remark 12 The above example would also work for the laboratory experiment for the determination of $g$, as mentioned earlier on.
Remark 13 (Consistency of MLEs.) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x\mid\theta)$. Let $L(\theta\mid\mathbf{x}) = \prod_{i=1}^{n} f(x_i\mid\theta)$ be the likelihood function. Let $\hat{\theta}$ denote the MLE of $\theta$. Let $\tau(\theta)$ be a continuous function of $\theta$. Under appropriate regularity conditions, $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$. Recall that consistency means convergence in probability.

Remark 14 (Asymptotic efficiency of MLEs.) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x\mid\theta)$. Let $L(\theta\mid\mathbf{x}) = \prod_{i=1}^{n} f(x_i\mid\theta)$ be the likelihood function. Let $\hat{\theta}$ denote the MLE of $\theta$. Let $\tau(\theta)$ be a continuous function of $\theta$. (In particular we may have $\tau(\theta) = \theta$.) Under appropriate regularity conditions $\sqrt{n}\,\big(\tau(\hat{\theta}) - \tau(\theta)\big) \xrightarrow{d} N(0, \nu(\theta))$, where $\nu(\theta) = \frac{[\tau'(\theta)]^2}{I_1(\theta)}$ is the Cramér-Rao Lower Bound (CRLB). Here $I_1(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right]$. $I_1(\theta)$ is a measure of the information about $\theta$ in a single observation. Thus, $\tau(\hat{\theta})$ is a consistent and asymptotically efficient estimator of $\tau(\theta)$. These aspects will be studied later in large sample theory.

Example 15 Finally, let us recall the example of predicting the outcome of an election, where we have either a Binomial model or a Hypergeometric model. In either case we can show that the MLE for the unknown proportion is the sample proportion.

Example 16 The angle $\theta$ at which the electrons are emitted in muon decay has a distribution with pdf given by
$$f(x\mid\alpha) = \frac{1}{2}(1 + \alpha x), \quad -1 \le x \le 1 \ \text{ and } \ -1 \le \alpha \le 1,$$
where $x = \cos\theta$. The parameter $\alpha$ is related to polarization. Physical considerations dictate that $|\alpha| \le \frac{1}{3}$, but we note that $f(x\mid\alpha)$ is a pdf for $|\alpha| \le 1$. Let $X_1, X_2, \ldots, X_n$ be a random sample of measurements on $\cos\theta$. Obtain the method of moments estimator for $\alpha$. Also obtain the maximum likelihood estimator for $\alpha$.
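As a purely numerical sketch of how one might approach this exercise (not the intended worked solution; the data below are simulated from the $\alpha = 0$ member of the family, just for illustration): since $E(X) = \int_{-1}^{1} x\,\frac{1}{2}(1+\alpha x)\,dx = \frac{\alpha}{3}$, the MOM rationale suggests $3\bar{X}$, while the MLE maximizes $\sum_i \log\frac{1}{2}(1+\alpha x_i)$ numerically over the physical range $|\alpha| \le \frac{1}{3}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
# Hypothetical data: n measurements of cos(theta) in [-1, 1] (alpha = 0 case).
x = rng.uniform(-1.0, 1.0, size=200)

# MOM: E(X) = alpha/3, so estimate alpha by 3 * sample mean (clipped to |alpha| <= 1/3).
alpha_mom = np.clip(3.0 * x.mean(), -1/3, 1/3)

# MLE: maximize the log-likelihood sum(log((1 + alpha*x)/2)) over |alpha| <= 1/3.
neg_loglik = lambda a: -np.sum(np.log(0.5 * (1.0 + a * x)))
res = minimize_scalar(neg_loglik, bounds=(-1/3, 1/3), method="bounded")
alpha_mle = res.x

print(alpha_mom, alpha_mle)
```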

Remark 17 The muon is one of nature’s fundamental “building blocks of matter” and acts in many ways as if it
were an unstable heavy electron, for reasons no one fully understands.

Example 18 Let $X_1, X_2, \ldots, X_n$ be a random sample from Gamma$(\alpha, \lambda)$, $\alpha > 0$, $\lambda > 0$. Find MOM estimators and MLEs for $\alpha$ and $\lambda$.

If $X$ has a Gamma$(\alpha, \lambda)$ distribution then the pdf (probability density function) is given by
$$f_X(x\mid\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\exp(-\lambda x)\, x^{\alpha-1}\, I_{(0,\infty)}(x).$$
Therefore, the joint density function of the iid random variables $X_1, X_2, \ldots, X_n$ is given by
$$f_X(\mathbf{x}\mid\alpha, \lambda) = \prod_{i=1}^{n} f_X(x_i\mid\alpha, \lambda) = \prod_{i=1}^{n} \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\exp(-\lambda x_i)\, x_i^{\alpha-1}\, I_{(0,\infty)}(x_i) = \frac{\lambda^{n\alpha}}{[\Gamma(\alpha)]^{n}}\exp\left(-\lambda\sum_{i=1}^{n} x_i\right)\prod_{i=1}^{n} x_i^{\alpha-1}\, I_{(0,\infty)}(x_i).$$

We can write down the likelihood as well as log-likelihood functions:
$$L(\alpha, \lambda\mid\mathbf{x}) = \frac{\lambda^{n\alpha}}{[\Gamma(\alpha)]^{n}}\exp\left(-\lambda\sum_{i=1}^{n} x_i\right)\prod_{i=1}^{n} x_i^{\alpha-1},$$
$$\mathcal{L}(\alpha, \lambda\mid\mathbf{x}) = \log L(\alpha, \lambda\mid\mathbf{x}) = n\alpha\log\lambda - n\log[\Gamma(\alpha)] - \lambda\sum_{i=1}^{n} x_i + (\alpha-1)\sum_{i=1}^{n}\log x_i.$$

We now find the partial derivatives of $\mathcal{L}(\alpha, \lambda\mid\mathbf{x})$:
$$\frac{\partial\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\alpha} = n\log\lambda - \frac{n}{\Gamma(\alpha)}\frac{\partial\Gamma(\alpha)}{\partial\alpha} + \sum_{i=1}^{n}\log x_i,$$
$$\frac{\partial\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^{n} x_i.$$
Equating the partial derivatives to $0$ and solving we get
$$\lambda = \frac{\alpha}{\bar{x}}$$
and
$$n\log\frac{\alpha}{\bar{x}} - \frac{n}{\Gamma(\alpha)}\frac{\partial\Gamma(\alpha)}{\partial\alpha} + \sum_{i=1}^{n}\log x_i = 0,$$
or
$$\log\alpha - \frac{1}{\Gamma(\alpha)}\frac{\partial\Gamma(\alpha)}{\partial\alpha} = \log\bar{x} - \overline{\log x}, \tag{1}$$
where $\overline{\log x} = \frac{1}{n}\sum_{i=1}^{n}\log x_i$, the mean of the $\log x_i$ values.

Since the function $\log x$ is concave, by Jensen's inequality $\log\bar{x} - \overline{\log x} \ge 0$.
We thus have an explicit closed-form expression for $\lambda$, which of course involves $\alpha$; however, we have to make do with an implicit solution for $\alpha$: there is no closed-form solution for $\alpha$. The function is numerically very well behaved, so if a numerical solution is desired, it can be found using, for example, Newton's method. An initial value of $\alpha$ may be found using the method of moments or some other appropriate method.
Thus
$$\hat{\lambda}_{MLE} = \frac{\hat{\alpha}_{MLE}}{\bar{X}},$$
and $\hat{\alpha}_{MLE}$ is available only as a numerical solution to (1).
Note that this is how we should present our estimators.
Remark: This example illustrates that the MLE may exist but need not be available in closed form.
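A minimal numerical sketch of solving (1), assuming the data are in an array `x`; it uses `scipy.special.digamma` for $\frac{1}{\Gamma(\alpha)}\frac{\partial\Gamma(\alpha)}{\partial\alpha}$ and a simple Newton iteration, initialized (as suggested above) with a method-of-moments-type starting value. The helper name `gamma_mle` and the simulated parameter values are hypothetical choices:

```python
import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle(x, tol=1e-10, max_iter=100):
    """Newton's method for the shape alpha solving equation (1):
    log(alpha) - digamma(alpha) = log(mean(x)) - mean(log(x)),
    then lambda = alpha / mean(x)."""
    x = np.asarray(x, dtype=float)
    c = np.log(x.mean()) - np.log(x).mean()          # RHS of (1); >= 0 by Jensen
    alpha = x.mean() ** 2 / x.var()                  # MOM-type starting value
    for _ in range(max_iter):
        g = np.log(alpha) - digamma(alpha) - c       # g(alpha) = 0 at the MLE
        g_prime = 1.0 / alpha - polygamma(1, alpha)  # derivative of g
        step = g / g_prime
        alpha -= step
        if abs(step) < tol:
            break
    return alpha, alpha / x.mean()                   # (alpha_hat, lambda_hat)

# Example usage with simulated Gamma data (shape 2, rate 3 -- hypothetical values).
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1/3.0, size=500)
print(gamma_mle(x))
```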

Remark: The second order partial derivatives are
$$\frac{\partial^2\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\alpha^2} = \frac{n}{[\Gamma(\alpha)]^2}\left(\frac{\partial\Gamma(\alpha)}{\partial\alpha}\right)^2 - \frac{n}{\Gamma(\alpha)}\frac{\partial^2\Gamma(\alpha)}{\partial\alpha^2},$$
$$\frac{\partial^2\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\alpha\,\partial\lambda} = \frac{n}{\lambda},$$
$$\frac{\partial^2\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\lambda^2} = -\frac{n\alpha}{\lambda^2}.$$
There should be some way of checking the negative definiteness of the matrix of second order derivatives
$$\begin{pmatrix} \dfrac{n}{[\Gamma(\alpha)]^2}\left(\dfrac{\partial\Gamma(\alpha)}{\partial\alpha}\right)^2 - \dfrac{n}{\Gamma(\alpha)}\dfrac{\partial^2\Gamma(\alpha)}{\partial\alpha^2} & \dfrac{n}{\lambda} \\ \dfrac{n}{\lambda} & -\dfrac{n\alpha}{\lambda^2} \end{pmatrix}$$
at $(\hat{\alpha}_{MLE}, \hat{\lambda}_{MLE})$.
Note that for any given value of $\alpha > 0$, $\frac{\partial^2\mathcal{L}(\alpha, \lambda\mid\mathbf{x})}{\partial\lambda^2} = -\frac{n\alpha}{\lambda^2} < 0$, so the MLE for $\lambda$ is given by $\hat{\lambda}_{MLE} = \frac{\alpha}{\bar{X}}$. Further, the algorithm for finding $\hat{\alpha}_{MLE}$ would take care of the fact that it is a maximum and not a minimum.
Example 19 Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, $\mu \in (-\infty, \infty)$, $\sigma^2 > 0$. Find MOM estimators and MLEs for $\mu$ and $\sigma^2$.
For the $N(\mu, \sigma^2)$ distribution the pdf (probability density function) is given by
$$f_X(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right).$$
Therefore, the joint density function of the iid random variables $X_1, X_2, \ldots, X_n$ is given by
$$f_X(\mathbf{x}\mid\mu, \sigma^2) = \prod_{i=1}^{n} f_X(x_i\mid\mu, \sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x_i-\mu)^2\right) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right).$$
We can write down the likelihood as well as log-likelihood functions:
$$L(\mu, \sigma^2\mid\mathbf{x}) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right),$$
$$\mathcal{L}(\mu, \sigma^2\mid\mathbf{x}) = \log L(\mu, \sigma^2\mid\mathbf{x}) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$

Case One: $\sigma^2 > 0$ is known.
We now find the partial derivatives of $\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})$ with respect to $\mu$:
$$\frac{\partial\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu),$$
$$\frac{\partial^2\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial\mu^2} = -\frac{n}{\sigma^2} < 0 \quad \text{for all } \mu.$$
Equating the first order partial derivative to $0$ and solving we get
$$\mu = \bar{x}.$$
Therefore
$$\hat{\mu}_{MLE} = \bar{X}.$$
Note that $\hat{\mu}_{MLE} = \bar{X}$ does not depend on $\sigma^2 > 0$, which is assumed to be known. We shall use this observation in the following case.
Case Two: $\sigma^2 > 0$ is unknown.
Earlier, in Case One, we observed that $\hat{\mu}_{MLE} = \bar{X}$ does not depend on $\sigma^2 > 0$, which was assumed to be known. Therefore $\bar{X}$ continues to be the MLE for $\mu$. We now find the MLE for $\sigma^2$:
$$\frac{\partial\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i-\mu)^2,$$
$$\frac{\partial^2\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial(\sigma^2)^2} = \frac{n}{2(\sigma^2)^2} - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{(\sigma^2)^3} = \frac{n}{(\sigma^2)^2}\left[\frac{1}{2} - \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^2}\right].$$
Equating the first order partial derivative to $0$ and solving we get
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2.$$
Further note that $\frac{\partial^2\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial(\sigma^2)^2}$ evaluated at $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$ is given by
$$\frac{\partial^2\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})}{\partial(\sigma^2)^2} = -\frac{n}{2(\sigma^2)^2} < 0.$$
But $\hat{\mu}_{MLE} = \bar{X}$. Therefore
$$\widehat{\sigma^2}_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
Thus in Case Two
$$\hat{\mu}_{MLE} = \bar{X} \quad \text{and} \quad \widehat{\sigma^2}_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
Note that this is how we should present our estimators.
Remark: We could have followed the routine method of simultaneously optimizing $\mathcal{L}(\mu, \sigma^2\mid\mathbf{x})$ w.r.t. $\mu$ and $\sigma^2$.

Method of Moments:

If both $\mu$ and $\sigma^2$ are unknown, we note that $E(X) = \mu$ and $E(X^2) = \mu^2 + \sigma^2$, since $Var(X) = \sigma^2$. Using the MOM rationale we get $\hat{\mu}_{MOM} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\hat{\sigma}^2_{MOM} = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$. Thus
$$\hat{\mu}_{MOM} = \bar{X} \quad \text{and} \quad \widehat{\sigma^2}_{MOM} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
Note that the MLEs and the MOM estimators agree. It can be checked that $\hat{\mu}_{MOM}$ is unbiased and consistent, and $\hat{\sigma}^2_{MOM}$ is asymptotically unbiased and consistent.
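A minimal sketch (simulated data; the true values of $\mu$ and $\sigma^2$ are hypothetical) computing these estimators; note that the MLE/MOM estimator of $\sigma^2$ divides by $n$, not $n-1$:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical true values, for simulation only.
mu_true, sigma2_true = 5.0, 4.0
x = rng.normal(mu_true, np.sqrt(sigma2_true), size=200)

mu_hat = x.mean()                        # MLE and MOM estimator of mu
sigma2_hat = np.mean((x - mu_hat) ** 2)  # MLE and MOM estimator of sigma^2 (divides by n)

# np.var uses ddof=0 by default, so it matches the MLE / MOM estimator.
assert np.isclose(sigma2_hat, np.var(x))
print(mu_hat, sigma2_hat)
```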

1.3 Interval Estimation and Confidence Intervals


We have just seen some methods and examples of point estimation. One can also talk of interval estimation. In an estimation problem it is as important, if not more so, to give a measure of accuracy of the estimator used as it is to choose the estimator itself. We therefore report an estimate of the variance or the standard error of the estimator used, as a measure of accuracy, along with the estimator itself.
Another approach is to suggest interval estimators rather than point estimators. This is closely linked to confidence intervals. Let us consider a simple example.

Example 20 Suppose we have a random sample of size $n$ from $N(\mu, \sigma_0^2)$ with $\sigma^2 = \sigma_0^2$ known. We know that the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is unbiased for $\mu$. The distribution of $\bar{X}$ is $N\left(\mu, \frac{\sigma_0^2}{n}\right)$. Therefore $Z = \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma_0}$ has the $N(0,1)$ distribution. This may be used to suggest a confidence interval (CI) for the unknown $\mu$.
Let $z_{\gamma}$ be the $\gamma$-quantile of $N(0,1)$, so that $\Pr(Z < z_{\gamma}) = \gamma$. Note therefore that $\Pr\left(|Z| < z_{(1-\frac{\alpha}{2})}\right) = 1 - \alpha$. Equivalently, $\Pr\left(\bar{X} - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})} < \mu < \bar{X} + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right) = 1 - \alpha$.
Therefore we may suggest
$$\left(\bar{X} - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})},\ \ \bar{X} + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right)$$
as a $100(1-\alpha)\%$ confidence interval for $\mu$.
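A minimal sketch of this interval, using `scipy.stats.norm.ppf` for the quantile $z_{(1-\frac{\alpha}{2})}$; the sample, $\sigma_0$, and $n$ below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
sigma0 = 2.0                           # known standard deviation (hypothetical)
n, alpha = 25, 0.05
x = rng.normal(10.0, sigma0, size=n)   # simulated sample; true mu = 10 is hypothetical

z = norm.ppf(1 - alpha / 2)            # z_{(1 - alpha/2)} quantile of N(0, 1)
half_width = z * sigma0 / np.sqrt(n)
ci = (x.mean() - half_width, x.mean() + half_width)
print(ci)                              # 95% confidence interval for mu
```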


Example 21 This is a continuation of Example (20). If $\sigma^2$ is unknown, then we can use the fact that $T = \frac{\sqrt{n}(\bar{X}-\mu)}{S}$ has a $t$ distribution with $(n-1)$ degrees of freedom. Let $t_{(n-1,\,\gamma)}$ be the $\gamma$-quantile of the $t$ distribution with $(n-1)$ degrees of freedom, so that $\Pr\left(T < t_{(n-1,\,\gamma)}\right) = \gamma$. Thus $\Pr\left(|T| < t_{((n-1),\,(1-\frac{\alpha}{2}))}\right) = 1 - \alpha$.
Therefore we may suggest
$$\left(\bar{X} - \frac{S}{\sqrt{n}}\, t_{((n-1),\,(1-\frac{\alpha}{2}))},\ \ \bar{X} + \frac{S}{\sqrt{n}}\, t_{((n-1),\,(1-\frac{\alpha}{2}))}\right)$$
as a $100(1-\alpha)\%$ confidence interval for $\mu$.
Here $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ and
$$\Pr\left(|T| < t_{((n-1),\,(1-\frac{\alpha}{2}))}\right) = 1 - \alpha.$$
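A corresponding sketch for the unknown-$\sigma^2$ case, using `scipy.stats.t.ppf` and the sample standard deviation with divisor $n-1$ (again with hypothetical simulated data):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(6)
n, alpha = 25, 0.05
x = rng.normal(10.0, 2.0, size=n)    # simulated sample; true values are hypothetical

s = x.std(ddof=1)                    # S, with divisor n - 1
tq = t.ppf(1 - alpha / 2, df=n - 1)  # t_{(n-1, 1 - alpha/2)} quantile
half_width = tq * s / np.sqrt(n)
ci = (x.mean() - half_width, x.mean() + half_width)
print(ci)                            # 95% confidence interval for mu
```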

Remark 22 It is important to interpret a $100(1-\alpha)\%$ confidence interval (CI) correctly, i.e., to interpret statements like
$$\Pr\left(\mu \in \left(\bar{X} - \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})},\ \bar{X} + \frac{\sigma_0}{\sqrt{n}}\, z_{(1-\frac{\alpha}{2})}\right)\right) = 1 - \alpha$$
correctly. Here $\mu$ is not a random quantity; it is unknown but fixed. It is the interval estimator, or the confidence interval, that is random, and it varies from sample to sample. Suppose we repeat the experiment of drawing a random sample of size $n$ from $N(\mu, \sigma_0^2)$, say, a large number of times; then we can expect the unknown $\mu$ to belong to the $95\%$ CI in about $95\%$ of the trials.
