
Bayes and Minimax Estimation

Prior Distribution
Let $f(\theta)$ be the probability distribution of the parameter $\theta$, which summarizes the objective information about $\theta$ prior to obtaining the sample observations. Among candidate distributions we prefer an $f(\theta)$ with smaller variance. This $f(\theta)$ is called the prior distribution of $\theta$.

Posterior Distribution
Consider a random variable $X$ whose distribution, denoted by $f(x \mid \theta)$, depends on an unknown parameter $\theta$.

Let $x_1, x_2, \dots, x_n$ be a random sample; then the joint distribution can be written as
\[ f(x_1, x_2, \dots, x_n \mid \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta). \]

The posterior distribution of $\theta$ is the conditional distribution of $\theta$ given the sample values. So,
\[ f(\theta \mid x_1, x_2, \dots, x_n) = \frac{f(x_1, x_2, \dots, x_n, \theta)}{f(x_1, x_2, \dots, x_n)}, \]
where
\[ f(x_1, x_2, \dots, x_n, \theta) = f(x_1, x_2, \dots, x_n \mid \theta) \, f(\theta) \quad \text{is the joint distribution of the sample and } \theta, \]
\[ f(x_1, x_2, \dots, x_n) = \int f(\theta) \, f(x_1, x_2, \dots, x_n \mid \theta) \, d\theta. \]

Thus $f(\theta \mid x_1, x_2, \dots, x_n)$ is known as the posterior distribution of $\theta$.

Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\lambda$, having the density function
\[ f(x \mid \lambda) = \lambda e^{-\lambda x}; \quad x > 0. \]
Assume that the prior distribution of $\lambda$ is given by
\[ g(\lambda) = k e^{-k\lambda}; \quad \lambda > 0. \]
That is, $\lambda$ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of $\lambda$.

Solution:
We know that the posterior distribution of $\lambda$ is given by
\[ f(\lambda \mid x_1, x_2, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \lambda) \, g(\lambda)}{\int \prod_{i=1}^{n} f(x_i \mid \lambda) \, g(\lambda) \, d\lambda} = \frac{\lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \cdot k e^{-k\lambda}}{\int_0^\infty \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \cdot k e^{-k\lambda} \, d\lambda} = \frac{\lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)}}{\int_0^\infty \lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)} \, d\lambda}. \]
Since
\[ \int_0^\infty \lambda^{(n+1)-1} e^{-\lambda\left(\sum x_i + k\right)} \, d\lambda = \frac{\Gamma(n+1)}{\left(\sum_{i=1}^{n} x_i + k\right)^{n+1}}, \]
we obtain
\[ f(\lambda \mid x_1, x_2, \dots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i + k\right)^{n+1}}{\Gamma(n+1)} \, \lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)}. \]
\[ \therefore \ f(\lambda \mid x_1, x_2, \dots, x_n) \sim \text{Gamma}\left(n+1, \ \sum_{i=1}^{n} x_i + k\right); \quad \lambda > 0. \]

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from a normal distribution with the density
\[ f(x \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}(x-\mu)^2\right\}; \quad -\infty < x < \infty. \]
Assume that the prior distribution of $\mu$ is given by
\[ g(\mu) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\}; \quad -\infty < \mu < \infty. \]
That is, $\mu$ is standard normal. Find the posterior distribution of $\mu$.


Solution:
We know that the posterior distribution of $\mu$ is given by
\[ f(\mu \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu)}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\}}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\} d\mu} \]
\[ = \frac{\exp\left\{-\frac{1}{2}\left(\sum x_i^2 - 2\mu \sum x_i + n\mu^2 + \mu^2\right)\right\}}{\int \exp\left\{-\frac{1}{2}\left(\sum x_i^2 - 2\mu \sum x_i + n\mu^2 + \mu^2\right)\right\} d\mu} = \frac{\exp\left\{-\frac{1}{2}\left[(n+1)\mu^2 - 2n\bar{x}\mu\right]\right\}}{\int \exp\left\{-\frac{1}{2}\left[(n+1)\mu^2 - 2n\bar{x}\mu\right]\right\} d\mu} \]
\[ = \frac{\exp\left\{-\frac{n+1}{2}\left[\mu^2 - 2\frac{n}{n+1}\bar{x}\mu\right]\right\}}{\int \exp\left\{-\frac{n+1}{2}\left[\mu^2 - 2\frac{n}{n+1}\bar{x}\mu\right]\right\} d\mu} = \frac{\exp\left\{-\frac{n+1}{2}\left(\mu - \frac{n}{n+1}\bar{x}\right)^2\right\}}{\int \exp\left\{-\frac{n+1}{2}\left(\mu - \frac{n}{n+1}\bar{x}\right)^2\right\} d\mu} \]
\[ = \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt{n+1}}} \exp\left\{-\frac{1}{2}\left(\frac{\mu - \frac{n}{n+1}\bar{x}}{1/\sqrt{n+1}}\right)^2\right\} \]
\[ \therefore \ f(\mu \mid x_1, x_2, \dots, x_n) \sim N\left(\frac{n}{n+1}\bar{x}, \ \frac{1}{n+1}\right); \quad -\infty < \mu < \infty. \]
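A short sketch (the sample values are made up) that checks this closed form against brute-force normalization of likelihood times prior on a grid:

```python
import numpy as np

x = np.array([0.2, -0.5, 1.1, 0.7])   # hypothetical sample
n, xbar = len(x), x.mean()
print(n * xbar / (n + 1), 1.0 / (n + 1))   # closed form: N(n*xbar/(n+1), 1/(n+1))

# Brute force: normalize likelihood * prior over a grid of mu values.
mu = np.linspace(-5, 5, 20001)
log_post = -0.5 * ((x[:, None] - mu) ** 2).sum(axis=0) - 0.5 * mu**2
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, mu)
mean = np.trapz(mu * post, mu)
var = np.trapz((mu - mean) ** 2 * post, mu)
print(mean, var)   # agrees with the closed form above
```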

Posterior Bayes estimator

Let $X_1, X_2, \dots, X_n$ be a random sample from a density $f(x \mid \theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g(\theta)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g(\theta)$ is defined to be
\[ E\left[\tau(\Theta) \mid X_1, X_2, \dots, X_n\right]. \]
Here, it is given that
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \tau(\theta) \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, x_2, \dots, x_n) \, d\theta = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]

One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter.
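The defining ratio of integrals can be evaluated numerically for any likelihood and prior; a minimal quadrature sketch (the function and its arguments are ours, not from the text):

```python
import numpy as np

def posterior_bayes_estimate(x, tau, lik, prior, grid):
    """E[tau(Theta) | x]: the ratio of integrals above, computed on a theta grid."""
    w = prior(grid) * np.prod([lik(xi, grid) for xi in x], axis=0)
    return np.trapz(tau(grid) * w, grid) / np.trapz(w, grid)

# Bernoulli likelihood with a uniform prior and tau(theta) = theta
# (anticipating the next example, where the answer is (sum x_i + 1)/(n + 2)).
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
est = posterior_bayes_estimate(
    x=[1, 0, 1, 1],
    tau=lambda t: t,
    lik=lambda xi, t: t**xi * (1 - t) ** (1 - xi),
    prior=lambda t: np.ones_like(t),
    grid=grid)
print(est)   # close to (3 + 1) / (4 + 2) = 0.6667
```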

Example:
Let $X_1, X_2, \dots, X_n$ denote a random sample from the Bernoulli density
\[ f(x \mid \theta) = \theta^x (1-\theta)^{1-x} \, I_{\{0,1\}}(x) \quad \text{for } 0 \le \theta \le 1. \]
Assume that the prior distribution of $\theta$ is given by
\[ g(\theta) = I_{[0,1]}(\theta). \]
That is, $\theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\theta$, and find the Bayes estimators of $\theta$ and $\theta(1-\theta)$.

Solution:
We know that the posterior distribution of $\theta$ is given by
\[ f(\theta \mid x_1, x_2, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} = \frac{\theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, I_{[0,1]}(\theta)}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} \]
\[ = \frac{\theta^{\left(\sum x_i + 1\right)-1} (1-\theta)^{\left(n - \sum x_i + 1\right)-1}}{B\left(\sum_{i=1}^{n} x_i + 1, \ n - \sum_{i=1}^{n} x_i + 1\right)} \]
\[ \therefore \ f(\theta \mid x_1, \dots, x_n) \sim \text{Beta (first kind)}\left(\sum_{i=1}^{n} x_i + 1, \ n - \sum_{i=1}^{n} x_i + 1\right); \quad 0 \le \theta \le 1. \]

Again, we have that the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g(\theta) = I_{[0,1]}(\theta)$ is given by
\[ E\left[\Theta \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \theta \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \frac{\int \theta \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} \]
\[ = \frac{\int_0^1 \theta^{\sum x_i + 1} (1-\theta)^{n - \sum x_i} \, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} = \frac{B\left(\sum x_i + 2, \ n - \sum x_i + 1\right)}{B\left(\sum x_i + 1, \ n - \sum x_i + 1\right)} = \frac{\Gamma\left(\sum x_i + 2\right) \Gamma(n+2)}{\Gamma\left(\sum x_i + 1\right) \Gamma(n+3)} \]
\[ \therefore \ E\left[\Theta \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\sum_{i=1}^{n} x_i + 1}{n+2}. \]

Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is given by $\dfrac{\sum_{i=1}^{n} x_i + 1}{n+2}$. Contrast this with the maximum likelihood estimator of $\theta$, which is $\dfrac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$. We know that $\bar{x}$ is unbiased and UMVUE, whereas the posterior Bayes estimator is not unbiased.
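A quick simulation (with $n$ and $\theta$ chosen arbitrarily) illustrating the bias of $(\sum x_i + 1)/(n+2)$ alongside the unbiasedness of $\bar{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 10, 200_000

x = rng.binomial(1, theta, size=(reps, n))
s = x.sum(axis=1)
print((s / n).mean())              # ~ 0.3: the MLE is unbiased
print(((s + 1) / (n + 2)).mean())  # ~ (n*theta + 1)/(n + 2) = 0.333...: biased
```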

Again, we have that the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g(\theta) = I_{[0,1]}(\theta)$ is given by
\[ E\left[\Theta(1-\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \theta(1-\theta) \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \frac{\int \theta(1-\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} \]
\[ = \frac{\int_0^1 \theta^{\sum x_i + 1} (1-\theta)^{n - \sum x_i + 1} \, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} = \frac{B\left(\sum x_i + 2, \ n - \sum x_i + 2\right)}{B\left(\sum x_i + 1, \ n - \sum x_i + 1\right)} = \frac{\Gamma\left(\sum x_i + 2\right) \Gamma\left(n - \sum x_i + 2\right) \Gamma(n+2)}{\Gamma\left(\sum x_i + 1\right) \Gamma\left(n - \sum x_i + 1\right) \Gamma(n+4)} \]
\[ \therefore \ E\left[\Theta(1-\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n - \sum_{i=1}^{n} x_i + 1\right)}{(n+3)(n+2)}. \]

Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is given by $\dfrac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n - \sum_{i=1}^{n} x_i + 1\right)}{(n+3)(n+2)}$. We noted in the above example that the posterior Bayes estimator that we obtained was not unbiased.
The following remark states that in general a posterior Bayes estimator is not unbiased.

Remark: Let $T_G^* = t_G^*(X_1, X_2, \dots, X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\theta)$. If both $T_G^*$ and $\tau(\Theta)$ have finite variance, then either
\[ \operatorname{Var}\left[T_G^* - \tau(\Theta)\right] = 0 \]
or $T_G^*$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1, or $T_G^*$ is not an unbiased estimator.

Proof:
Let us suppose that $T_G^*$ is an unbiased estimator of $\tau(\theta)$; that is,
\[ E\left[T_G^* \mid \Theta = \theta\right] = \tau(\theta) \quad \text{for all } \theta. \]
By the definition, we have that
\[ T_G^* = t_G^*(X_1, X_2, \dots, X_n) = E\left[\tau(\Theta) \mid X_1, X_2, \dots, X_n\right]. \]
Now, we have that
\[ \operatorname{Var}\left[T_G^*\right] = E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[E\left(T_G^* \mid \Theta\right)\right] = E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] \quad (1) \]
and
\[ \operatorname{Var}\left[\tau(\Theta)\right] = E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] + \operatorname{Var}\left[E\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] = E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] + \operatorname{Var}\left[T_G^*\right], \]
so that
\[ \operatorname{Var}\left[T_G^*\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right]. \quad (2) \]
Now, from equations (1) and (2) we have that
\[ E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] \]
\[ \Rightarrow \ E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] = 0. \]
Now, since both $E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right]$ and $E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right]$ are non-negative and their sum is zero, both are zero.

In particular, $E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] = 0$, and since $\operatorname{Var}\left(T_G^* \mid \Theta\right)$ is non-negative and has zero expectation,
\[ \operatorname{Var}\left(T_G^* \mid \Theta\right) = 0 \quad \text{with probability } 1. \]
Hence $T_G^* = E\left[T_G^* \mid \Theta\right] = \tau(\Theta)$ with probability 1; that is, $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1.


Loss Function
Consider estimating $g(\theta)$, and let $t = t(x_1, x_2, \dots, x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted by $l(t;\theta)$, is defined to be a real-valued function satisfying

1) $l(t;\theta) \ge 0$ for all possible estimates $t$ and all $\theta$ in $\Theta$;

2) $l(t;\theta) = 0$ for $t = g(\theta)$.

$l(t;\theta)$ equals the loss incurred if one estimates $g(\theta)$ to be $t$ when $\theta$ is the true parameter value.

The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.
Several Possible Loss Functions

1) $l_1(t;\theta) = \left[t - g(\theta)\right]^2$. It is called the squared error loss function.

2) $l_2(t;\theta) = \left|t - g(\theta)\right|$. It is called the absolute error loss function.

3) $l_3(t;\theta) = \begin{cases} A & \text{if } \left|t - g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t - g(\theta)\right| \le \varepsilon \end{cases}$, where $A > 0$.

4) $l_4(t;\theta) = \lambda(\theta) \left|t - g(\theta)\right|^r$ for $\lambda(\theta) > 0$ and $r > 0$.

Note that both $l_1$ and $l_2$ increase as the error $\left|t - g(\theta)\right|$ increases in magnitude. $l_3$ says that we lose nothing if the estimate $t$ is within $\varepsilon$ units of $g(\theta)$, and otherwise we lose the amount $A$. $l_4$ is a general loss function that includes both $l_1$ and $l_2$ as special cases.
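For concreteness, a direct transcription of these four loss functions (parameter names are ours):

```python
import numpy as np

def l1(t, g):                     # squared error
    return (t - g) ** 2

def l2(t, g):                     # absolute error
    return np.abs(t - g)

def l3(t, g, A=1.0, eps=0.1):     # lose A outside an eps-band around g(theta)
    return np.where(np.abs(t - g) > eps, A, 0.0)

def l4(t, g, lam=1.0, r=2):       # lambda(theta) * |t - g|^r; lam=1, r=2 gives l1
    return lam * np.abs(t - g) ** r
```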


Risk Function
For a given loss function $l(\cdot;\cdot)$, the risk function, denoted by $R_t(\theta)$, of an estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be
\[ R_t(\theta) = E\left[l(T;\theta)\right]. \]
The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then
\[ R(\theta, t) = E\left[l(T;\theta)\right] = \int \cdots \int l\left(t(x_1, x_2, \dots, x_n); \theta\right) \prod_{i=1}^{n} f(x_i;\theta) \, dx_i. \]
Or, we can consider the random variable $T$ itself: if $f_T(t)$ denotes the density of the estimator $T$, then
\[ R(\theta, t) = E\left[l(T;\theta)\right] = \int l(t;\theta) \, f_T(t) \, dt. \]
In either case, the expectation averages out the values of $x_1, x_2, \dots, x_n$. In the Bayesian framework $\theta$ is itself considered random, so the risk $R_t(\Theta)$ is itself a random variable.
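A Monte Carlo sketch of this averaging (the estimator, loss, and $\theta$ are chosen only for illustration):

```python
import numpy as np

def mc_risk(theta, estimator, loss, n=10, reps=100_000, seed=0):
    """Approximate R_t(theta) = E[l(t(X); theta)] for N(theta, 1) samples."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=(reps, n))
    return loss(estimator(x), theta).mean()

# Risk of the sample mean under squared error at theta = 2: about 1/n = 0.1.
print(mc_risk(2.0, lambda x: x.mean(axis=1), lambda t, th: (t - th) ** 2))
```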

Possible Risk Functions

1) Corresponding to the loss function $l_1(t;\theta) = \left[t - g(\theta)\right]^2$, the risk function is given by
\[ R_t(\theta) = E\left[l_1(T;\theta)\right] = E\left[T - g(\theta)\right]^2. \]

2) Corresponding to the loss function $l_2(t;\theta) = \left|t - g(\theta)\right|$, the risk function is given by
\[ R_t(\theta) = E\left[l_2(T;\theta)\right] = E\left|T - g(\theta)\right|. \]
It is called the mean absolute error.

3) Corresponding to the loss function
\[ l_3(t;\theta) = \begin{cases} A & \text{if } \left|t - g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t - g(\theta)\right| \le \varepsilon, \end{cases} \quad A > 0, \]
the risk function is given by $R_t(\theta) = E\left[l_3(T;\theta)\right] = A \, P\left[\left|T - g(\theta)\right| > \varepsilon\right]$.

4) Corresponding to the loss function $l_4(t;\theta) = \lambda(\theta)\left|t - g(\theta)\right|^r$ for $\lambda(\theta) > 0$ and $r > 0$, the risk function is given by
\[ R_t(\theta) = E\left[l_4(T;\theta)\right] = \lambda(\theta) \, E\left|T - g(\theta)\right|^r. \]

When is a loss function said to be Convex and Strictly Convex?
A loss function $L(t;\theta)$, regarded for fixed $\theta$ as a real-valued function of $t$ defined over an open interval $I = (a,b)$, is said to be convex if for any $a < t < t^* < b$ and any $0 < \gamma < 1$,
\[ L\left[\gamma t + (1-\gamma)t^*; \theta\right] \le \gamma L(t;\theta) + (1-\gamma) L\left(t^*;\theta\right). \quad (1) \]
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^*$ and $\gamma$.

Convexity is a very strong condition, which implies, for example, that $L$ is continuous in $(a,b)$ and has a left and a right derivative at every point of $(a,b)$.


Determination of Convexity
Determining whether or not a loss function is convex is often easy with the help of the following two criteria.
a) If $L$ is defined and differentiable on $(a,b)$, then a necessary and sufficient condition for $L$ to be convex is that
\[ L'(t) \le L'\left(t^*\right) \quad \text{for all } a < t < t^* < b. \quad (1) \]
The function is strictly convex iff the inequality in (1) is strict for all $t < t^*$.
b) If $L$ is twice differentiable, then the necessary and sufficient condition (1) is equivalent to
\[ L''(t) \ge 0 \quad \text{for all } a < t < b, \]
with strict inequality sufficient for strict convexity.
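Criterion (b) applied to the squared error loss, as a one-line symbolic check (a sketch assuming SymPy is available):

```python
import sympy as sp

t, g = sp.symbols('t g', real=True)
L = (t - g) ** 2          # squared error loss as a function of t
print(sp.diff(L, t, 2))   # prints 2 > 0, so l1 is strictly convex in t
```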

Bayes Estimator with Respect to a Loss Function

The Bayes estimator of the parameter $\theta$ is the function $d$ of the sample observations $x_1, x_2, \dots, x_n$ that minimizes the expected risk, where the expected risk is defined as
\[ B(d) = E\left[R(d,\theta)\right] = \int R(d,\theta) f(\theta) \, d\theta = \int \left[\int \cdots \int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) \, dx_1 \cdots dx_n\right] f(\theta) \, d\theta. \quad (*) \]
Now, interchanging the order of integration, we can write $(*)$ as
\[ B(d) = \int \cdots \int \left[\int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) f(\theta) \, d\theta\right] dx_1 \cdots dx_n. \quad (**) \]
The function $B(d)$ will be minimized if we can find a function $d$ that minimizes the quantity within the square brackets of equation $(**)$ for every set of $x$ values. That is, the Bayes estimator of $\theta$ is the function $d$ of $x_1, x_2, \dots, x_n$ that minimizes
\[ \int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) f(\theta) \, d\theta. \]
Since $f(x_1, \dots, x_n, \theta) = f(x_1, \dots, x_n \mid \theta) f(\theta)$ and
\[ f(\theta \mid x_1, \dots, x_n) = \frac{f(x_1, \dots, x_n \mid \theta) f(\theta)}{f(x_1, \dots, x_n)}, \]
we have
\[ \int l\left(d(x); \theta\right) f(x_1, \dots, x_n, \theta) \, d\theta = f(x_1, \dots, x_n) \int l\left(d(x); \theta\right) f(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Thus the Bayes estimator of $\theta$ is the value $\hat{\theta}$ that minimizes
\[ \int l\left(d(x); \theta\right) f(\theta \mid x_1, \dots, x_n) \, d\theta = Y \ (\text{say}). \]
If the loss function is the squared error, i.e. $l\left(d(x);\theta\right) = \left[d(x) - \theta\right]^2$, then
\[ Y = \int \left[d(x) - \theta\right]^2 f(\theta \mid x_1, \dots, x_n) \, d\theta = \left[d(x)\right]^2 \int f(\theta \mid x_1, \dots, x_n) \, d\theta - 2 d(x) \int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta + \int \theta^2 f(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Thus, minimizing $Y$ with respect to $d(x)$,
\[ \frac{\partial Y}{\partial\left[d(x)\right]} = 0 \ \Rightarrow \ 2 d(x) \int f(\theta \mid x_1, \dots, x_n) \, d\theta - 2 \int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta = 0 \]
\[ \Rightarrow \ d(x) = \frac{\int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta}{\int f(\theta \mid x_1, \dots, x_n) \, d\theta} = \text{the expected posterior value (posterior mean) of } \theta. \]
Hence, $d(x)$ is the Bayes estimate of $\theta$ if the loss function is squared error.
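A numerical check of this minimization, using a Beta(4, 2) posterior chosen only for concreteness:

```python
import numpy as np
from scipy.optimize import minimize_scalar

theta = np.linspace(1e-6, 1 - 1e-6, 10001)
post = theta**3 * (1 - theta)          # unnormalized Beta(4, 2) "posterior"
post /= np.trapz(post, theta)

Y = lambda d: np.trapz((d - theta) ** 2 * post, theta)   # posterior expected loss
dhat = minimize_scalar(Y, bounds=(0, 1), method='bounded').x
print(dhat, np.trapz(theta * post, theta))   # both ~ the posterior mean 4/6
```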

Advantages of the Bayesian Approach

The Bayesian approach has the following advantages over the classical approach.
a) We make inferences about the unknown parameters given the data, whereas in the classical approach we look at the long-run behaviour, e.g. in 95% of experiments $p$ will lie between the computed lower and upper confidence limits.
b) The posterior distribution tells the whole story, and if a point estimate or a confidence interval is desired, it can be obtained immediately from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.
Note:
a) A decision rule $\delta$ is said to be uniformly better than a decision rule $\delta'$ if $R(\theta, \delta) \le R(\theta, \delta')$ for all $\theta \in \Theta$, with strict inequality holding for some $\theta$.
b) A decision rule $\delta^*$ is said to be uniformly best in a class of decision rules $D$ if $\delta^*$ is uniformly better than any other decision rule $\delta \in D$.
c) A decision rule $\delta$ is said to be admissible in a class $D$ if there exists no other decision rule in $D$ which is uniformly better than $\delta$.
Example: Let $X_1, X_2, \dots, X_n$ be independent $N(\theta, \sigma^2)$ variables, where $\theta$ is unknown but $\sigma^2$ is known. Let the prior distribution of $\theta$ be $N(\mu, \sigma_0^2)$. Find the Bayes estimate of $\theta$.

Solution:
The joint conditional distribution of the sample given $\theta$ is
\[ f(x_1, \dots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{1}{2\sigma^2} \sum (x_i - \theta)^2\right\} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2 - \frac{1}{2\sigma^2}\sum (x_i - \bar{x})^2\right\} \]
\[ \Rightarrow \ f(x_1, \dots, x_n \mid \theta) \propto \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2\right\}. \]
The posterior distribution of $\theta$ given $x$ is
\[ g(\theta \mid x_1, \dots, x_n) = \frac{f(\theta) \, f(x_1, \dots, x_n \mid \theta)}{f(x_1, \dots, x_n)} \propto f(\theta) \, f(x_1, \dots, x_n \mid \theta) \]
\[ \propto \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2 - \frac{1}{2\sigma_0^2}(\theta - \mu)^2\right\} \propto \exp\left\{-\frac{n\sigma_0^2 + \sigma^2}{2\sigma_0^2 \sigma^2}\left(\theta - \frac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}\right)^2\right\} \]
\[ \therefore \ f(\theta \mid x) \sim N\left(\frac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}, \ \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}\right). \]
If the loss function is squared error, the Bayes estimator of $\theta$ is $\dfrac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}$.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^*(\cdot, \dots, \cdot)$ which minimizes
\[ \int l\left(t(x_1, x_2, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, x_2, \dots, x_n) \, d\theta \]
as a function of $t(\cdot, \dots, \cdot)$.

Proof
For a general loss function $l(t;\theta)$, we seek the estimator, say $t^*(\cdot, \dots, \cdot)$, which minimizes the expression
\[ \int R_t(\theta) g(\theta) \, d\theta = \int E\left[l(T;\theta)\right] g(\theta) \, d\theta = \int E\left[l\left(t(X_1, X_2, \dots, X_n); \theta\right)\right] g(\theta) \, d\theta \]
\[ = \int \left[\int \cdots \int l\left(t(x_1, \dots, x_n); \theta\right) f_{X_1, X_2, \dots, X_n \mid \Theta = \theta}(x_1, x_2, \dots, x_n) \prod_{i=1}^{n} dx_i\right] g(\theta) \, d\theta \]
\[ = \int \cdots \int \left[\int l\left(t(x_1, \dots, x_n); \theta\right) \frac{f_{X_1, \dots, X_n \mid \Theta = \theta}(x_1, \dots, x_n) \, g(\theta)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)} \, d\theta\right] f_{X_1, \dots, X_n}(x_1, \dots, x_n) \prod_{i=1}^{n} dx_i \]
\[ = \int \cdots \int \left[\int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta\right] f_{X_1, \dots, X_n}(x_1, \dots, x_n) \prod_{i=1}^{n} dx_i. \]
Since the integrand is non-negative, the double integral can be minimized if the expression within the brackets, which is sometimes called the posterior risk, is minimized for each $x_1, x_2, \dots, x_n$.

So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Hence, the theorem is proved.


Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$; that is,
\[ l(t;\theta) = \left[t(x_1, \dots, x_n) - \tau(\theta)\right]^2. \]
Then the Bayes estimator of $\tau(\theta)$ is given by
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Here, the loss function is the squared-error loss function. So we have that the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int \left[t(x_1, \dots, x_n) - \tau(\theta)\right]^2 f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int \left[\tau(\theta) - t(x_1, \dots, x_n)\right]^2 f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
But the expression above is the conditional expectation of
\[ \left[\tau(\Theta) - t(x_1, \dots, x_n)\right]^2 \]
with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$, which is minimized as a function of $t(x_1, \dots, x_n)$ by taking $t^*(x_1, \dots, x_n)$ equal to the conditional expectation of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.
\[ \left[\text{Recall that } E\left[(Z - a)^2\right] \text{ is minimized as a function of } a \text{ for } a^* = E[Z].\right] \]
Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]
Hence, the theorem is proved.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the absolute-error loss function for estimating $\tau(\theta)$; that is,
\[ l(t;\theta) = \left|t(x_1, \dots, x_n) - \tau(\theta)\right|. \]
Then the Bayes estimator of $\tau(\theta)$ is given by the median of the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Here, the loss function is the absolute-error loss function. So we have that the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int \left|t(x_1, \dots, x_n) - \tau(\theta)\right| f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int \left|\tau(\theta) - t(x_1, \dots, x_n)\right| f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
But the expression above is the conditional expectation of
\[ \left|t(x_1, \dots, x_n) - \tau(\Theta)\right| \]
with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$, which is minimized as a function of $t(x_1, \dots, x_n)$ by taking $t^*(x_1, \dots, x_n)$ equal to the conditional median of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.
\[ \left[\text{Recall that } E\left|Z - a\right| \text{ is minimized as a function of } a \text{ for } a^* = \text{median of } Z.\right] \]
Hence, the Bayes estimator of $\tau(\theta)$ with respect to the absolute-error loss function is given by the median of the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$. (Proved)
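A grid check of this fact for a skewed posterior (Beta(2, 5), chosen arbitrarily):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

post = stats.beta(2, 5)          # a skewed stand-in for the posterior of tau(Theta)
theta = np.linspace(0, 1, 20001)
pdf = post.pdf(theta)

Y = lambda a: np.trapz(np.abs(a - theta) * pdf, theta)   # expected absolute loss
ahat = minimize_scalar(Y, bounds=(0, 1), method='bounded').x
print(ahat, post.median())       # the minimizer is the posterior median
```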


Example: Let X1 , X 2 , , X n denote a random sample from normal distribution with the density
1  1 2
f x |   exp   x     ;   x  
2  2 

Assume that the prior distribution of  is given by


1  1 2
g    exp    0   ;    
2  2 

That is,  is standard normal. Write 0  x0 when convenient. Find the Bayes estimator of    with
respect to the squared error loss function.
Solution
We know that the Bayes estimator of $\tau(\mu) = \mu$ with respect to the squared-error loss function is given by
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \mu \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu}. \]
We know that the posterior distribution of $\mu$ is given by
\[ f(\mu \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu)}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\mu - \mu_0\right)^2\right\}}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\mu - \mu_0\right)^2\right\} d\mu}. \]
Completing the square in $\mu$ (and writing $x_0 = \mu_0$, so that the exponent involves $\sum_{i=0}^{n} x_i$), this reduces to
\[ f(\mu \mid x_1, \dots, x_n) = \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt{n+1}}} \exp\left\{-\frac{1}{2}\left(\frac{\mu - \frac{\sum_{i=0}^{n} x_i}{n+1}}{\frac{1}{\sqrt{n+1}}}\right)^2\right\}; \]
that is, the posterior is $N\left(\frac{\sum_{i=0}^{n} x_i}{n+1}, \frac{1}{n+1}\right)$. So the Bayes estimator of $\mu$ with respect to the squared-error loss function is given by
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \mu \, f(\mu \mid x_1, \dots, x_n) \, d\mu. \]
Now, let
\[ z = \sqrt{n+1}\left(\mu - \frac{\sum_{i=0}^{n} x_i}{n+1}\right) \quad \Rightarrow \quad \mu = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}, \qquad d\mu = \frac{1}{\sqrt{n+1}} \, dz. \]
Now, we have that
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \int_{-\infty}^{\infty} \left(\frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}\right) \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}z^2\right\} dz \]
\[ = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \cdot \frac{1}{\sqrt{2\pi}} \left[\int_{-\infty}^{0} z \exp\left\{-\frac{1}{2}z^2\right\} dz + \int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz\right] \]
\[ = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \cdot \frac{1}{\sqrt{2\pi}} \left[-\int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz + \int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz\right] \]
\[ \therefore \ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\sum_{i=0}^{n} x_i}{n+1}. \]
So, here we have that the Bayes estimator of $\mu$ with respect to the squared-error loss is given by
\[ \frac{\sum_{i=0}^{n} x_i}{n+1} = \frac{x_0 + \sum_{i=1}^{n} x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1}. \]
Since the posterior distribution of $\mu$ is normal, its mean and median are the same. Hence,
\[ \frac{\sum_{i=0}^{n} x_i}{n+1} = \frac{x_0 + \sum_{i=1}^{n} x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1} \]
is also the Bayes estimator with respect to the absolute-error loss function.

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from the uniform distribution with the density
\[ f(x \mid \theta) = \frac{1}{\theta} \, I_{(0,\theta)}(x). \]
Assume that the prior distribution of $\theta$ is given by $g(\theta) = I_{(0,1)}(\theta)$.
That is, $\theta$ is standard uniform. Find the Bayes estimator of $\tau(\theta) = \theta$ with respect to the weighted squared-error loss function
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}. \]

Solution:
We know that the Bayes estimator of $\tau(\theta)$ with respect to a general loss function such as
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} \]
is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
We know that the posterior distribution of $\theta$ is given by
\[ f(\theta \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n \prod_{i=1}^{n} I_{(0,\theta)}(x_i) \, I_{(0,1)}(\theta)}{\int_0^1 \left(\frac{1}{\theta}\right)^n \prod_{i=1}^{n} I_{(0,\theta)}(x_i) \, d\theta}. \]
With $y_n = \max(x_1, \dots, x_n)$, the product of indicators equals $I_{(y_n,1)}(\theta)$, so
\[ f(\theta \mid x_1, \dots, x_n) = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\int_{y_n}^{1} \theta^{-n} \, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\frac{1}{n-1}\left(\frac{1}{y_n^{\,n-1}} - 1\right)}. \]
Now the Bayes estimator of $\tau(\theta) = \theta$ with respect to the loss function $l(t;\theta) = (t-\theta)^2/\theta^2$ is that estimator which minimizes
\[ \int \frac{(t-\theta)^2}{\theta^2} \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int_{y_n}^{1} \frac{(t-\theta)^2}{\theta^2} \cdot \frac{\theta^{-n}}{\frac{1}{n-1}\left(\frac{1}{y_n^{\,n-1}} - 1\right)} \, d\theta, \]
or, equivalently, that estimator which minimizes
\[ \int_{y_n}^{1} \frac{(t-\theta)^2}{\theta^{n+2}} \, d\theta = t^2 \int_{y_n}^{1} \frac{1}{\theta^{n+2}} \, d\theta - 2t \int_{y_n}^{1} \frac{1}{\theta^{n+1}} \, d\theta + \int_{y_n}^{1} \frac{1}{\theta^{n}} \, d\theta. \quad (A) \]
Here, equation $(A)$ is a quadratic equation in $t$. This quadratic equation assumes its minimum for
\[ t^* = \frac{\int_{y_n}^{1} \theta^{-(n+1)} \, d\theta}{\int_{y_n}^{1} \theta^{-(n+2)} \, d\theta} = \frac{\frac{1}{n}\left(\frac{1}{y_n^{\,n}} - 1\right)}{\frac{1}{n+1}\left(\frac{1}{y_n^{\,n+1}} - 1\right)} = \frac{n+1}{n} \cdot \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \cdot y_n. \]
So, the Bayes estimator of $\tau(\theta) = \theta$ with respect to the loss function
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} \]
is given by
\[ t^*(y_n) = \frac{n+1}{n} \cdot \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \cdot y_n. \]

Admissible Estimator
For two estimators $T_1 = t_1(X_1, X_2, \dots, X_n)$ and $T_2 = t_2(X_1, X_2, \dots, X_n)$, the estimator $T_1$ is defined to be a better estimator than $T_2$ if and only if
\[ R_{t_1}(\theta) \le R_{t_2}(\theta) \quad \text{for all } \theta \text{ in } \Theta, \text{ and} \]
\[ R_{t_1}(\theta) < R_{t_2}(\theta) \quad \text{for at least one } \theta \text{ in } \Theta. \]
An estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be admissible if and only if there is no better estimator.

Example: Using the squared error loss function $l(t,\theta) = (t-\theta)^2$, estimators for the location parameter $\theta$ of a normal distribution given a sample of size $n$ include the sample mean $t_1(x) = \bar{x}$, the sample median $t_2(x) = m$, and the weighted mean $t_3(x) = \sum w_i x_i$ with $\sum w_i = 1$. Their respective risk functions are
\[ R_1 = \frac{\sigma^2}{n}, \qquad R_2 \approx 1.57\,\frac{\sigma^2}{n}, \qquad R_3 = \frac{\sigma^2}{n} + \sigma^2 \sum \left(w_i - \bar{w}\right)^2, \quad \bar{w} = \frac{1}{n}. \]
Since $R_1 \le R_2$ and $R_1 \le R_3$ for every $\theta$, neither the median nor any weighted mean is better than $\bar{x}$; indeed $\bar{x}$ is an admissible estimator of the location parameter $\theta$ of the normal distribution.
Inadmissibility of an Estimator
An estimator $t$ is said to be inadmissible if there exists another estimator $t'$ which dominates it, i.e. such that
\[ R(\theta, t') \le R(\theta, t) \quad \text{for all } \theta \text{ in } \Theta, \text{ and} \]
\[ R(\theta, t') < R(\theta, t) \quad \text{for some } \theta \text{ in } \Theta. \]

Finding an Inadmissible Estimator
To detect the inadmissibility of an estimator $t$, we may use the following lemma.
Let the range of $\tau(\theta)$ be the closed interval $[a,b]$, let the loss function satisfy $L(\theta, t) \ge 0$, and suppose that for any fixed $\theta$, $L(\theta, t)$ increases as $t$ moves away from $\tau(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a,b]$ with positive probability is inadmissible.

Properties of Admissible Estimators

The properties of admissible estimators are as follows.
a) If the loss function $L$ is strictly convex, then every admissible estimator must be non-randomized.
b) If $L$ is strictly convex and $t$ is an admissible estimator of $\tau(\theta)$, and if $t'$ is another estimator with the same risk function, i.e. $R(\theta, t) = R(\theta, t')$ for all $\theta$, then $t = t'$ with probability 1.
c) Any unique Bayes estimator is admissible.

Minimax Estimator
An estimator $t^*$ is defined to be a minimax estimator if and only if
\[ \sup_{\theta} R\left(\theta, t^*\right) \le \sup_{\theta} R(\theta, t) \quad \text{for every estimator } t. \]

Properties of Minimax Estimators

The properties of minimax estimators are given below.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) If $g(\theta)$ is a prior distribution of $\theta$ such that $\int R\left(\theta, t_g\right) g(\theta) \, d\theta = \sup_{\theta} R\left(\theta, t_g\right)$, then
   i. $t_g$ is minimax;
   ii. if $t_g$ is the unique Bayes solution with respect to $g(\theta)$, it is the unique minimax procedure.
c) If a Bayes estimator $t_g$ has constant risk, then it is minimax.
d) If $t'$ dominates a minimax estimator $t$, then $t'$ is also minimax.
e) If an estimator has constant risk and is admissible, it is minimax.
f) The best equivariant estimator is frequently minimax.

Example: Suppose that $\Theta = \{\theta_1, \theta_2\}$, where $\theta_1$ corresponds to oil and $\theta_2$ to no oil. Let $A = \{a_1, a_2, a_3\}$, where $a_i$ corresponds to choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the losses for the decision problem.

                    Drill $a_1$   Sell $a_2$   Partial $a_3$
Oil $\theta_1$           0            10             5
No oil $\theta_2$       12             1             6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on. An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values coded as 0 and 1, given by

                    $x = 0$   $x = 1$
Oil $\theta_1$         0.3       0.7
No oil $\theta_2$      0.6       0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7:
\[ P(X = 0 \mid \theta_1) = 0.3 \quad \text{and} \quad P(X = 1 \mid \theta_1) = 0.7. \]
Now the possible decision rules $\delta_i(x)$ are:

 $x$    $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
  0       $a_1$      $a_1$      $a_1$      $a_2$      $a_2$      $a_2$      $a_3$      $a_3$      $a_3$
  1       $a_1$      $a_2$      $a_3$      $a_1$      $a_2$      $a_3$      $a_1$      $a_2$      $a_3$

Here, $\delta_1$ = take action $a_1$ regardless of the value of $X$,
\[ \delta_2 = \begin{cases} \text{take action } a_1 & \text{if } X = 0 \\ \text{take action } a_2 & \text{if } X = 1 \end{cases} \]
and so on. Then the risk of $\delta$ at $\theta$ is
\[ R(\theta, \delta) = E\left[l\left(\theta, \delta(X)\right)\right] = l(\theta, a_1) P\left[\delta(X) = a_1\right] + l(\theta, a_2) P\left[\delta(X) = a_2\right] + l(\theta, a_3) P\left[\delta(X) = a_3\right]. \]
Now,
\[ R(\theta_1, \delta_2) = 0 \times 0.3 + 10 \times 0.7 = 7, \qquad R(\theta_2, \delta_2) = 12 \times 0.6 + 1 \times 0.4 = 7.6. \]
Thus we get:

                                                    $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
$R(\theta_1, \delta_i)$                                  0          7         3.5         3         10         6.5        1.5        8.5         5
$R(\theta_2, \delta_i)$                                 12         7.6        9.6        5.4         1          3         8.4        4.0         6
Max$\left[R(\theta_1, \delta_i), R(\theta_2, \delta_i)\right]$  12  7.6       9.6        5.4        10         6.5        8.4        8.5         6

\[ \min_i \ \text{Max}\left[R(\theta_1, \delta_i), R(\theta_2, \delta_i)\right] = 5.4 \]
Thus the minimax solution is
\[ \delta_4(x) = \begin{cases} a_2 & \text{if } x = 0 \\ a_1 & \text{if } x = 1 \end{cases} \]
Again, $R(\theta_1, \delta_4) = 3 < R(\theta_1, \delta_2) = 7$ and $R(\theta_2, \delta_4) = 5.4 < R(\theta_2, \delta_2) = 7.6$, so $\delta_4$ dominates $\delta_2$ and $\delta_2$ is inadmissible.



Suppose that in our oil-drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable $\theta$ with possible values $\theta_1, \theta_2$ and frequency function
\[ \pi(\theta_1) = 0.2, \qquad \pi(\theta_2) = 0.8, \]
so that the Bayes risk is
\[ R(\delta) = E\left[R(\theta, \delta)\right] = 0.2 \, R(\theta_1, \delta) + 0.8 \, R(\theta_2, \delta). \]
Thus
\[ R(\delta_1) = 0.2 \times 0 + 0.8 \times 12 = 9.6, \]
\[ R(\delta_2) = 0.2 \times 7 + 0.8 \times 7.6 = 7.48, \]
\[ R(\delta_3) = 0.2 \times 3.5 + 0.8 \times 9.6 = 8.38, \]
and so on. So we compute the following table:

 $\delta_i$      $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
$R(\delta_i)$       9.6        7.48        8.38        4.92        2.8        3.7        7.02        4.9        5.8

In the Bayesian framework $\delta$ is preferable to $\delta'$ if and only if it has smaller Bayes risk. If there is a rule $\delta^*$ which attains the minimum Bayes risk, i.e. such that $R\left(\delta^*\right) = \min_i R(\delta_i) = 2.8$, then it is called a Bayes rule. From this example we see that $\delta_5$, with Bayes risk 2.8, is the unique Bayes rule for our prior distribution.
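The whole example can be reproduced mechanically; a sketch enumerating the nine rules:

```python
import numpy as np
from itertools import product

loss = np.array([[0, 10, 5],    # theta1 (oil): losses for a1, a2, a3
                 [12, 1, 6]])   # theta2 (no oil)
px = np.array([[0.3, 0.7],      # P(X=0 | theta1), P(X=1 | theta1)
               [0.6, 0.4]])     # P(X=0 | theta2), P(X=1 | theta2)

rules = list(product(range(3), repeat=2))   # delta = (action at x=0, action at x=1)
# R(theta, delta) = sum over x of l(theta, delta(x)) * P(X=x | theta)
risk = np.array([[sum(loss[t, d[x]] * px[t, x] for x in (0, 1))
                  for d in rules] for t in (0, 1)])

print(rules[risk.max(axis=0).argmin()])    # minimax rule (1, 0): a2 at x=0, a1 at x=1
bayes = 0.2 * risk[0] + 0.8 * risk[1]      # Bayes risks under the prior (0.2, 0.8)
print(rules[bayes.argmin()], bayes.min())  # Bayes rule (1, 1): always a2, risk 2.8
```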

Example: Let $X \sim b(1, p)$, $p \in \left\{\frac{1}{4}, \frac{1}{2}\right\}$, and $A = \{a_1, a_2\}$. Let the loss function be defined as follows.

                        $a_1$   $a_2$
$p_1 = \frac{1}{4}$       1       4
$p_2 = \frac{1}{2}$       3       2

The set of decision rules includes four functions $\delta_1, \delta_2, \delta_3, \delta_4$, defined by
\[ \delta_1(0) = \delta_1(1) = a_1; \qquad \delta_2(0) = a_1, \ \delta_2(1) = a_2; \qquad \delta_3(0) = a_2, \ \delta_3(1) = a_1; \qquad \delta_4(0) = \delta_4(1) = a_2. \]
The risk function takes the following values:

 $\delta_i$   $R(p_1, \delta_i)$   $R(p_2, \delta_i)$   $\max_{p} R(p, \delta_i)$
 $\delta_1$          1                    3                     3
 $\delta_2$         7/4                  5/2                   5/2
 $\delta_3$        13/4                  5/2                  13/4
 $\delta_4$          4                    2                     4

\[ \min_i \max_{p} R(p, \delta_i) = \frac{5}{2} \]
Thus the minimax solution is
\[ \delta_2(x) = \begin{cases} a_1 & \text{if } x = 0 \\ a_2 & \text{if } x = 1 \end{cases} \]