
Bayes and Minimax Estimation

Prior Distribution
Let $f(\theta)$ be the probability distribution of the parameter $\theta$, which summarizes the objective information about $\theta$ prior to obtaining the sample observations. Among candidate distributions we prefer an $f(\theta)$ with smaller variance. This $f(\theta)$ is called the prior distribution of $\theta$.

Posterior Distribution
Consider a random variable $X$ whose distribution, denoted by $f(x \mid \theta)$, depends on an unknown parameter $\theta$.

Let $x_1, x_2, \dots, x_n$ be a random sample; then the joint distribution can be written as
\[ f(x_1, x_2, \dots, x_n \mid \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta). \]

The posterior distribution of $\theta$ is the conditional distribution of $\theta$ given the sample values. So,
\[ f(\theta \mid x_1, x_2, \dots, x_n) = \frac{f(x_1, x_2, \dots, x_n, \theta)}{f(x_1, x_2, \dots, x_n)}, \]
where
\[ f(x_1, x_2, \dots, x_n, \theta) = f(x_1, x_2, \dots, x_n \mid \theta) \, f(\theta) \quad \text{is the joint distribution of the sample and } \theta, \]
\[ f(x_1, x_2, \dots, x_n) = \int f(\theta) \, f(x_1, x_2, \dots, x_n \mid \theta) \, d\theta. \]

Thus $f(\theta \mid x_1, x_2, \dots, x_n)$ is known as the posterior distribution of $\theta$.

Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\lambda$, having the density function
\[ f(x \mid \lambda) = \lambda e^{-\lambda x}; \quad x > 0. \]
Assume that the prior distribution of $\lambda$ is given by
\[ g(\lambda) = k e^{-k\lambda}; \quad \lambda > 0. \]
That is, $\lambda$ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of $\lambda$.

Solution:
We know that the posterior distribution of $\lambda$ is given by
\[ f(\lambda \mid x_1, x_2, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \lambda) \, g(\lambda)}{\int \prod_{i=1}^{n} f(x_i \mid \lambda) \, g(\lambda) \, d\lambda} = \frac{\lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \cdot k e^{-k\lambda}}{\int_0^\infty \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \cdot k e^{-k\lambda} \, d\lambda} = \frac{\lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)}}{\int_0^\infty \lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)} \, d\lambda}. \]
Since
\[ \int_0^\infty \lambda^{(n+1)-1} e^{-\lambda\left(\sum x_i + k\right)} \, d\lambda = \frac{\Gamma(n+1)}{\left(\sum_{i=1}^{n} x_i + k\right)^{n+1}}, \]
we obtain
\[ f(\lambda \mid x_1, x_2, \dots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i + k\right)^{n+1}}{\Gamma(n+1)} \, \lambda^n e^{-\lambda\left(\sum_{i=1}^{n} x_i + k\right)}. \]
\[ \therefore \ f(\lambda \mid x_1, x_2, \dots, x_n) \sim \text{Gamma}\left(n+1, \ \sum_{i=1}^{n} x_i + k\right); \quad \lambda > 0. \]

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from a normal distribution with the density
\[ f(x \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}(x-\mu)^2\right\}; \quad -\infty < x < \infty. \]
Assume that the prior distribution of $\mu$ is given by
\[ g(\mu) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\}; \quad -\infty < \mu < \infty. \]
That is, $\mu$ is standard normal. Find the posterior distribution of $\mu$.


Solution:
We know that the posterior distribution of $\mu$ is given by
\[ f(\mu \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu)}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\}}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\mu^2\right\} d\mu} \]
\[ = \frac{\exp\left\{-\frac{1}{2}\left(\sum x_i^2 - 2\mu \sum x_i + n\mu^2 + \mu^2\right)\right\}}{\int \exp\left\{-\frac{1}{2}\left(\sum x_i^2 - 2\mu \sum x_i + n\mu^2 + \mu^2\right)\right\} d\mu} = \frac{\exp\left\{-\frac{1}{2}\left[(n+1)\mu^2 - 2n\bar{x}\mu\right]\right\}}{\int \exp\left\{-\frac{1}{2}\left[(n+1)\mu^2 - 2n\bar{x}\mu\right]\right\} d\mu} \]
\[ = \frac{\exp\left\{-\frac{n+1}{2}\left[\mu^2 - 2\frac{n}{n+1}\bar{x}\mu\right]\right\}}{\int \exp\left\{-\frac{n+1}{2}\left[\mu^2 - 2\frac{n}{n+1}\bar{x}\mu\right]\right\} d\mu} = \frac{\exp\left\{-\frac{n+1}{2}\left(\mu - \frac{n}{n+1}\bar{x}\right)^2\right\}}{\int \exp\left\{-\frac{n+1}{2}\left(\mu - \frac{n}{n+1}\bar{x}\right)^2\right\} d\mu} \]
\[ = \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt{n+1}}} \exp\left\{-\frac{1}{2}\left(\frac{\mu - \frac{n}{n+1}\bar{x}}{1/\sqrt{n+1}}\right)^2\right\} \]
\[ \therefore \ f(\mu \mid x_1, x_2, \dots, x_n) \sim N\left(\frac{n}{n+1}\bar{x}, \ \frac{1}{n+1}\right); \quad -\infty < \mu < \infty. \]
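A short sketch (the sample values are made up) that checks this closed form against brute-force normalization of likelihood times prior on a grid:

```python
import numpy as np

x = np.array([0.2, -0.5, 1.1, 0.7])   # hypothetical sample
n, xbar = len(x), x.mean()
print(n * xbar / (n + 1), 1.0 / (n + 1))   # closed form: N(n*xbar/(n+1), 1/(n+1))

# Brute force: normalize likelihood * prior over a grid of mu values.
mu = np.linspace(-5, 5, 20001)
log_post = -0.5 * ((x[:, None] - mu) ** 2).sum(axis=0) - 0.5 * mu**2
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, mu)
mean = np.trapz(mu * post, mu)
var = np.trapz((mu - mean) ** 2 * post, mu)
print(mean, var)   # agrees with the closed form above
```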

Posterior Bayes estimator

Let $X_1, X_2, \dots, X_n$ be a random sample from a density $f(x \mid \theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g(\theta)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g(\theta)$ is defined to be
\[ E\left[\tau(\Theta) \mid X_1, X_2, \dots, X_n\right]. \]
Here, it is given that
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \tau(\theta) \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, x_2, \dots, x_n) \, d\theta = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]

One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter.
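The defining ratio of integrals can be evaluated numerically for any likelihood and prior; a minimal quadrature sketch (the function and its arguments are ours, not from the text):

```python
import numpy as np

def posterior_bayes_estimate(x, tau, lik, prior, grid):
    """E[tau(Theta) | x]: the ratio of integrals above, computed on a theta grid."""
    w = prior(grid) * np.prod([lik(xi, grid) for xi in x], axis=0)
    return np.trapz(tau(grid) * w, grid) / np.trapz(w, grid)

# Bernoulli likelihood with a uniform prior and tau(theta) = theta
# (anticipating the next example, where the answer is (sum x_i + 1)/(n + 2)).
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
est = posterior_bayes_estimate(
    x=[1, 0, 1, 1],
    tau=lambda t: t,
    lik=lambda xi, t: t**xi * (1 - t) ** (1 - xi),
    prior=lambda t: np.ones_like(t),
    grid=grid)
print(est)   # close to (3 + 1) / (4 + 2) = 0.6667
```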

Example:
Let $X_1, X_2, \dots, X_n$ denote a random sample from the Bernoulli density
\[ f(x \mid \theta) = \theta^x (1-\theta)^{1-x} \, I_{\{0,1\}}(x) \quad \text{for } 0 \le \theta \le 1. \]
Assume that the prior distribution of $\theta$ is given by
\[ g(\theta) = I_{[0,1]}(\theta). \]
That is, $\theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\theta$, and find the Bayes estimators of $\theta$ and $\theta(1-\theta)$.

Solution:
We know that the posterior distribution of $\theta$ is given by
\[ f(\theta \mid x_1, x_2, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} = \frac{\theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, I_{[0,1]}(\theta)}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} \]
\[ = \frac{\theta^{\left(\sum x_i + 1\right)-1} (1-\theta)^{\left(n - \sum x_i + 1\right)-1}}{B\left(\sum_{i=1}^{n} x_i + 1, \ n - \sum_{i=1}^{n} x_i + 1\right)} \]
\[ \therefore \ f(\theta \mid x_1, \dots, x_n) \sim \text{Beta (first kind)}\left(\sum_{i=1}^{n} x_i + 1, \ n - \sum_{i=1}^{n} x_i + 1\right); \quad 0 \le \theta \le 1. \]

Again, we have that the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g(\theta) = I_{[0,1]}(\theta)$ is given by
\[ E\left[\Theta \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \theta \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \frac{\int \theta \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} \]
\[ = \frac{\int_0^1 \theta^{\sum x_i + 1} (1-\theta)^{n - \sum x_i} \, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} = \frac{B\left(\sum x_i + 2, \ n - \sum x_i + 1\right)}{B\left(\sum x_i + 1, \ n - \sum x_i + 1\right)} = \frac{\Gamma\left(\sum x_i + 2\right) \Gamma(n+2)}{\Gamma\left(\sum x_i + 1\right) \Gamma(n+3)} \]
\[ \therefore \ E\left[\Theta \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\sum_{i=1}^{n} x_i + 1}{n+2}. \]

Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is given by $\dfrac{\sum_{i=1}^{n} x_i + 1}{n+2}$. Contrast this with the maximum likelihood estimator of $\theta$, which is $\dfrac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$. We know that $\bar{x}$ is unbiased and UMVUE, whereas the posterior Bayes estimator is not unbiased.
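A quick simulation (with $n$ and $\theta$ chosen arbitrarily) illustrating the bias of $(\sum x_i + 1)/(n+2)$ alongside the unbiasedness of $\bar{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 10, 200_000

x = rng.binomial(1, theta, size=(reps, n))
s = x.sum(axis=1)
print((s / n).mean())              # ~ 0.3: the MLE is unbiased
print(((s + 1) / (n + 2)).mean())  # ~ (n*theta + 1)/(n + 2) = 0.333...: biased
```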

Again, we have that the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g(\theta) = I_{[0,1]}(\theta)$ is given by
\[ E\left[\Theta(1-\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \theta(1-\theta) \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \frac{\int \theta(1-\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} \]
\[ = \frac{\int_0^1 \theta^{\sum x_i + 1} (1-\theta)^{n - \sum x_i + 1} \, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \, d\theta} = \frac{B\left(\sum x_i + 2, \ n - \sum x_i + 2\right)}{B\left(\sum x_i + 1, \ n - \sum x_i + 1\right)} = \frac{\Gamma\left(\sum x_i + 2\right) \Gamma\left(n - \sum x_i + 2\right) \Gamma(n+2)}{\Gamma\left(\sum x_i + 1\right) \Gamma\left(n - \sum x_i + 1\right) \Gamma(n+4)} \]
\[ \therefore \ E\left[\Theta(1-\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n - \sum_{i=1}^{n} x_i + 1\right)}{(n+3)(n+2)}. \]

Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is given by $\dfrac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n - \sum_{i=1}^{n} x_i + 1\right)}{(n+3)(n+2)}$. We noted in the above example that the posterior Bayes estimator that we obtained was not unbiased.
The following remark states that in general a posterior Bayes estimator is not unbiased.

Remark: Let $T_G^* = t_G^*(X_1, X_2, \dots, X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\theta)$. If both $T_G^*$ and $\tau(\Theta)$ have finite variance, then either
\[ \operatorname{Var}\left[T_G^* - \tau(\Theta)\right] = 0 \]
or $T_G^*$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1, or $T_G^*$ is not an unbiased estimator.

Proof:
Let us suppose that $T_G^*$ is an unbiased estimator of $\tau(\theta)$; that is,
\[ E\left[T_G^* \mid \Theta = \theta\right] = \tau(\theta) \quad \text{for all } \theta. \]
By the definition, we have that
\[ T_G^* = t_G^*(X_1, X_2, \dots, X_n) = E\left[\tau(\Theta) \mid X_1, X_2, \dots, X_n\right]. \]
Now, we have that
\[ \operatorname{Var}\left[T_G^*\right] = E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[E\left(T_G^* \mid \Theta\right)\right] = E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] \quad (1) \]
and
\[ \operatorname{Var}\left[\tau(\Theta)\right] = E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] + \operatorname{Var}\left[E\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] = E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] + \operatorname{Var}\left[T_G^*\right], \]
so that
\[ \operatorname{Var}\left[T_G^*\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right]. \quad (2) \]
Now, from equations (1) and (2) we have that
\[ E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] \]
\[ \Rightarrow \ E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] + E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right] = 0. \]
Now, since both $E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right]$ and $E\left[\operatorname{Var}\left(\tau(\Theta) \mid X_1, \dots, X_n\right)\right]$ are non-negative and their sum is zero, both are zero.

In particular, $E\left[\operatorname{Var}\left(T_G^* \mid \Theta\right)\right] = 0$, and since $\operatorname{Var}\left(T_G^* \mid \Theta\right)$ is non-negative and has zero expectation,
\[ \operatorname{Var}\left(T_G^* \mid \Theta\right) = 0 \quad \text{with probability } 1. \]
Hence $T_G^* = E\left[T_G^* \mid \Theta\right] = \tau(\Theta)$ with probability 1; that is, $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1.


Loss Function
Consider estimating $g(\theta)$, and let $t = t(x_1, x_2, \dots, x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted by $l(t;\theta)$, is defined to be a real-valued function satisfying

1) $l(t;\theta) \ge 0$ for all possible estimates $t$ and all $\theta$ in $\Theta$;

2) $l(t;\theta) = 0$ for $t = g(\theta)$.

$l(t;\theta)$ equals the loss incurred if one estimates $g(\theta)$ to be $t$ when $\theta$ is the true parameter value.

The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.
Several Possible Loss Functions

1) $l_1(t;\theta) = \left[t - g(\theta)\right]^2$. It is called the squared error loss function.

2) $l_2(t;\theta) = \left|t - g(\theta)\right|$. It is called the absolute error loss function.

3) $l_3(t;\theta) = \begin{cases} A & \text{if } \left|t - g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t - g(\theta)\right| \le \varepsilon \end{cases}$, where $A > 0$.

4) $l_4(t;\theta) = \lambda(\theta) \left|t - g(\theta)\right|^r$ for $\lambda(\theta) > 0$ and $r > 0$.

Note that both $l_1$ and $l_2$ increase as the error $\left|t - g(\theta)\right|$ increases in magnitude. $l_3$ says that we lose nothing if the estimate $t$ is within $\varepsilon$ units of $g(\theta)$, and otherwise we lose the amount $A$. $l_4$ is a general loss function that includes both $l_1$ and $l_2$ as special cases.
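For concreteness, a direct transcription of these four loss functions (parameter names are ours):

```python
import numpy as np

def l1(t, g):                     # squared error
    return (t - g) ** 2

def l2(t, g):                     # absolute error
    return np.abs(t - g)

def l3(t, g, A=1.0, eps=0.1):     # lose A outside an eps-band around g(theta)
    return np.where(np.abs(t - g) > eps, A, 0.0)

def l4(t, g, lam=1.0, r=2):       # lambda(theta) * |t - g|^r; lam=1, r=2 gives l1
    return lam * np.abs(t - g) ** r
```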


Risk Function
For a given loss function $l(\cdot;\cdot)$, the risk function, denoted by $R_t(\theta)$, of an estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be
\[ R_t(\theta) = E\left[l(T;\theta)\right]. \]
The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then
\[ R(\theta, t) = E\left[l(T;\theta)\right] = \int \cdots \int l\left(t(x_1, x_2, \dots, x_n); \theta\right) \prod_{i=1}^{n} f(x_i;\theta) \, dx_i. \]
Or, we can consider the random variable $T$ itself: if $f_T(t)$ denotes the density of the estimator $T$, then
\[ R(\theta, t) = E\left[l(T;\theta)\right] = \int l(t;\theta) \, f_T(t) \, dt. \]
In either case, the expectation averages out the values of $x_1, x_2, \dots, x_n$. In the Bayesian framework $\theta$ is itself considered random, so the risk $R_t(\Theta)$ is itself a random variable.
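A Monte Carlo sketch of this averaging (the estimator, loss, and $\theta$ are chosen only for illustration):

```python
import numpy as np

def mc_risk(theta, estimator, loss, n=10, reps=100_000, seed=0):
    """Approximate R_t(theta) = E[l(t(X); theta)] for N(theta, 1) samples."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=(reps, n))
    return loss(estimator(x), theta).mean()

# Risk of the sample mean under squared error at theta = 2: about 1/n = 0.1.
print(mc_risk(2.0, lambda x: x.mean(axis=1), lambda t, th: (t - th) ** 2))
```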

Possible Risk Functions

1) Corresponding to the loss function $l_1(t;\theta) = \left[t - g(\theta)\right]^2$, the risk function is given by
\[ R_t(\theta) = E\left[l_1(T;\theta)\right] = E\left[T - g(\theta)\right]^2. \]

2) Corresponding to the loss function $l_2(t;\theta) = \left|t - g(\theta)\right|$, the risk function is given by
\[ R_t(\theta) = E\left[l_2(T;\theta)\right] = E\left|T - g(\theta)\right|. \]
It is called the mean absolute error.

3) Corresponding to the loss function
\[ l_3(t;\theta) = \begin{cases} A & \text{if } \left|t - g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t - g(\theta)\right| \le \varepsilon, \end{cases} \quad A > 0, \]
the risk function is given by $R_t(\theta) = E\left[l_3(T;\theta)\right] = A \, P\left[\left|T - g(\theta)\right| > \varepsilon\right]$.

4) Corresponding to the loss function $l_4(t;\theta) = \lambda(\theta)\left|t - g(\theta)\right|^r$ for $\lambda(\theta) > 0$ and $r > 0$, the risk function is given by
\[ R_t(\theta) = E\left[l_4(T;\theta)\right] = \lambda(\theta) \, E\left|T - g(\theta)\right|^r. \]

When is a loss function said to be Convex and Strictly Convex?
A loss function $L(t;\theta)$, regarded for fixed $\theta$ as a real-valued function of $t$ defined over an open interval $I = (a,b)$, is said to be convex if for any $a < t < t^* < b$ and any $0 < \gamma < 1$,
\[ L\left[\gamma t + (1-\gamma)t^*; \theta\right] \le \gamma L(t;\theta) + (1-\gamma) L\left(t^*;\theta\right). \quad (1) \]
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^*$ and $\gamma$.

Convexity is a very strong condition, which implies, for example, that $L$ is continuous in $(a,b)$ and has a left and a right derivative at every point of $(a,b)$.


Determination of Convexity
Determining whether or not a loss function is convex is often easy with the help of the following two criteria.
a) If $L$ is defined and differentiable on $(a,b)$, then a necessary and sufficient condition for $L$ to be convex is that
\[ L'(t) \le L'\left(t^*\right) \quad \text{for all } a < t < t^* < b. \quad (1) \]
The function is strictly convex iff the inequality in (1) is strict for all $t < t^*$.
b) If $L$ is twice differentiable, then the necessary and sufficient condition (1) is equivalent to
\[ L''(t) \ge 0 \quad \text{for all } a < t < b, \]
with strict inequality sufficient for strict convexity.
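Criterion (b) applied to the squared error loss, as a one-line symbolic check (a sketch assuming SymPy is available):

```python
import sympy as sp

t, g = sp.symbols('t g', real=True)
L = (t - g) ** 2          # squared error loss as a function of t
print(sp.diff(L, t, 2))   # prints 2 > 0, so l1 is strictly convex in t
```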

Bayes Estimator with Respect to a Loss Function

The Bayes estimator of the parameter $\theta$ is the function $d$ of the sample observations $x_1, x_2, \dots, x_n$ that minimizes the expected risk, where the expected risk is defined as
\[ B(d) = E\left[R(d,\theta)\right] = \int R(d,\theta) f(\theta) \, d\theta = \int \left[\int \cdots \int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) \, dx_1 \cdots dx_n\right] f(\theta) \, d\theta. \quad (*) \]
Now, interchanging the order of integration, we can write $(*)$ as
\[ B(d) = \int \cdots \int \left[\int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) f(\theta) \, d\theta\right] dx_1 \cdots dx_n. \quad (**) \]
The function $B(d)$ will be minimized if we can find a function $d$ that minimizes the quantity within the square brackets of equation $(**)$ for every set of $x$ values. That is, the Bayes estimator of $\theta$ is the function $d$ of $x_1, x_2, \dots, x_n$ that minimizes
\[ \int l\left(d(x_1, \dots, x_n); \theta\right) f(x_1, \dots, x_n \mid \theta) f(\theta) \, d\theta. \]
Since $f(x_1, \dots, x_n, \theta) = f(x_1, \dots, x_n \mid \theta) f(\theta)$ and
\[ f(\theta \mid x_1, \dots, x_n) = \frac{f(x_1, \dots, x_n \mid \theta) f(\theta)}{f(x_1, \dots, x_n)}, \]
we have
\[ \int l\left(d(x); \theta\right) f(x_1, \dots, x_n, \theta) \, d\theta = f(x_1, \dots, x_n) \int l\left(d(x); \theta\right) f(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Thus the Bayes estimator of $\theta$ is the value $\hat{\theta}$ that minimizes
\[ \int l\left(d(x); \theta\right) f(\theta \mid x_1, \dots, x_n) \, d\theta = Y \ (\text{say}). \]
If the loss function is the squared error, i.e. $l\left(d(x);\theta\right) = \left[d(x) - \theta\right]^2$, then
\[ Y = \int \left[d(x) - \theta\right]^2 f(\theta \mid x_1, \dots, x_n) \, d\theta = \left[d(x)\right]^2 \int f(\theta \mid x_1, \dots, x_n) \, d\theta - 2 d(x) \int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta + \int \theta^2 f(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Thus, minimizing $Y$ with respect to $d(x)$,
\[ \frac{\partial Y}{\partial\left[d(x)\right]} = 0 \ \Rightarrow \ 2 d(x) \int f(\theta \mid x_1, \dots, x_n) \, d\theta - 2 \int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta = 0 \]
\[ \Rightarrow \ d(x) = \frac{\int \theta \, f(\theta \mid x_1, \dots, x_n) \, d\theta}{\int f(\theta \mid x_1, \dots, x_n) \, d\theta} = \text{the expected posterior value (posterior mean) of } \theta. \]
Hence, $d(x)$ is the Bayes estimate of $\theta$ if the loss function is squared error.
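A numerical check of this minimization, using a Beta(4, 2) posterior chosen only for concreteness:

```python
import numpy as np
from scipy.optimize import minimize_scalar

theta = np.linspace(1e-6, 1 - 1e-6, 10001)
post = theta**3 * (1 - theta)          # unnormalized Beta(4, 2) "posterior"
post /= np.trapz(post, theta)

Y = lambda d: np.trapz((d - theta) ** 2 * post, theta)   # posterior expected loss
dhat = minimize_scalar(Y, bounds=(0, 1), method='bounded').x
print(dhat, np.trapz(theta * post, theta))   # both ~ the posterior mean 4/6
```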

Advantages of the Bayesian Approach

The Bayesian approach has the following advantages over the classical approach.
a) We make inferences about the unknown parameters given the data, whereas in the classical approach we look at the long-run behaviour, e.g. in 95% of experiments $p$ will lie between the computed lower and upper confidence limits.
b) The posterior distribution tells the whole story, and if a point estimate or a confidence interval is desired, it can be obtained immediately from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.
Note:
a) A decision rule $\delta$ is said to be uniformly better than a decision rule $\delta'$ if $R(\theta, \delta) \le R(\theta, \delta')$ for all $\theta \in \Theta$, with strict inequality holding for some $\theta$.
b) A decision rule $\delta^*$ is said to be uniformly best in a class of decision rules $D$ if $\delta^*$ is uniformly better than any other decision rule $\delta \in D$.
c) A decision rule $\delta$ is said to be admissible in a class $D$ if there exists no other decision rule in $D$ which is uniformly better than $\delta$.
Example: Let $X_1, X_2, \dots, X_n$ be independent $N(\theta, \sigma^2)$ variables, where $\theta$ is unknown but $\sigma^2$ is known. Let the prior distribution of $\theta$ be $N(\mu, \sigma_0^2)$. Find the Bayes estimate of $\theta$.

Solution:
The joint conditional distribution of the sample given $\theta$ is
\[ f(x_1, \dots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{1}{2\sigma^2} \sum (x_i - \theta)^2\right\} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2 - \frac{1}{2\sigma^2}\sum (x_i - \bar{x})^2\right\} \]
\[ \Rightarrow \ f(x_1, \dots, x_n \mid \theta) \propto \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2\right\}. \]
The posterior distribution of $\theta$ given $x$ is
\[ g(\theta \mid x_1, \dots, x_n) = \frac{f(\theta) \, f(x_1, \dots, x_n \mid \theta)}{f(x_1, \dots, x_n)} \propto f(\theta) \, f(x_1, \dots, x_n \mid \theta) \]
\[ \propto \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \theta)^2 - \frac{1}{2\sigma_0^2}(\theta - \mu)^2\right\} \propto \exp\left\{-\frac{n\sigma_0^2 + \sigma^2}{2\sigma_0^2 \sigma^2}\left(\theta - \frac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}\right)^2\right\} \]
\[ \therefore \ f(\theta \mid x) \sim N\left(\frac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}, \ \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}\right). \]
If the loss function is squared error, the Bayes estimator of $\theta$ is $\dfrac{n\bar{x}\sigma_0^2 + \mu\sigma^2}{n\sigma_0^2 + \sigma^2}$.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^*(\cdot, \dots, \cdot)$ which minimizes
\[ \int l\left(t(x_1, x_2, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, x_2, \dots, x_n) \, d\theta \]
as a function of $t(\cdot, \dots, \cdot)$.

Proof
For a general loss function $l(t;\theta)$, we seek the estimator, say $t^*(\cdot, \dots, \cdot)$, which minimizes the expression
\[ \int R_t(\theta) g(\theta) \, d\theta = \int E\left[l(T;\theta)\right] g(\theta) \, d\theta = \int E\left[l\left(t(X_1, X_2, \dots, X_n); \theta\right)\right] g(\theta) \, d\theta \]
\[ = \int \left[\int \cdots \int l\left(t(x_1, \dots, x_n); \theta\right) f_{X_1, X_2, \dots, X_n \mid \Theta = \theta}(x_1, x_2, \dots, x_n) \prod_{i=1}^{n} dx_i\right] g(\theta) \, d\theta \]
\[ = \int \cdots \int \left[\int l\left(t(x_1, \dots, x_n); \theta\right) \frac{f_{X_1, \dots, X_n \mid \Theta = \theta}(x_1, \dots, x_n) \, g(\theta)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)} \, d\theta\right] f_{X_1, \dots, X_n}(x_1, \dots, x_n) \prod_{i=1}^{n} dx_i \]
\[ = \int \cdots \int \left[\int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta\right] f_{X_1, \dots, X_n}(x_1, \dots, x_n) \prod_{i=1}^{n} dx_i. \]
Since the integrand is non-negative, the double integral can be minimized if the expression within the brackets, which is sometimes called the posterior risk, is minimized for each $x_1, x_2, \dots, x_n$.

So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Hence, the theorem is proved.


Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$; that is,
\[ l(t;\theta) = \left[t(x_1, \dots, x_n) - \tau(\theta)\right]^2. \]
Then the Bayes estimator of $\tau(\theta)$ is given by
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Here, the loss function is the squared-error loss function. So we have that the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int \left[t(x_1, \dots, x_n) - \tau(\theta)\right]^2 f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int \left[\tau(\theta) - t(x_1, \dots, x_n)\right]^2 f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
But the expression above is the conditional expectation of
\[ \left[\tau(\Theta) - t(x_1, \dots, x_n)\right]^2 \]
with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$, which is minimized as a function of $t(x_1, \dots, x_n)$ by taking $t^*(x_1, \dots, x_n)$ equal to the conditional expectation of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.
\[ \left[\text{Recall that } E\left[(Z - a)^2\right] \text{ is minimized as a function of } a \text{ for } a^* = E[Z].\right] \]
Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by
\[ E\left[\tau(\Theta) \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \tau(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta}. \]
Hence, the theorem is proved.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x \mid \theta)$ and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the absolute-error loss function for estimating $\tau(\theta)$; that is,
\[ l(t;\theta) = \left|t(x_1, \dots, x_n) - \tau(\theta)\right|. \]
Then the Bayes estimator of $\tau(\theta)$ is given by the median of the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
Here, the loss function is the absolute-error loss function. So we have that the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[ \int \left|t(x_1, \dots, x_n) - \tau(\theta)\right| f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int \left|\tau(\theta) - t(x_1, \dots, x_n)\right| f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
But the expression above is the conditional expectation of
\[ \left|t(x_1, \dots, x_n) - \tau(\Theta)\right| \]
with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$, which is minimized as a function of $t(x_1, \dots, x_n)$ by taking $t^*(x_1, \dots, x_n)$ equal to the conditional median of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$.
\[ \left[\text{Recall that } E\left|Z - a\right| \text{ is minimized as a function of } a \text{ for } a^* = \text{median of } Z.\right] \]
Hence, the Bayes estimator of $\tau(\theta)$ with respect to the absolute-error loss function is given by the median of the posterior distribution of $\Theta$ given $X_1 = x_1, \dots, X_n = x_n$. (Proved)
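A grid check of this fact for a skewed posterior (Beta(2, 5), chosen arbitrarily):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

post = stats.beta(2, 5)          # a skewed stand-in for the posterior of tau(Theta)
theta = np.linspace(0, 1, 20001)
pdf = post.pdf(theta)

Y = lambda a: np.trapz(np.abs(a - theta) * pdf, theta)   # expected absolute loss
ahat = minimize_scalar(Y, bounds=(0, 1), method='bounded').x
print(ahat, post.median())       # the minimizer is the posterior median
```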


Example: Let X1 , X 2 , , X n denote a random sample from normal distribution with the density
1  1 2
f x |   exp   x     ;   x  
2  2 

Assume that the prior distribution of  is given by


1  1 2
g    exp    0   ;    
2  2 

That is,  is standard normal. Write 0  x0 when convenient. Find the Bayes estimator of    with
respect to the squared error loss function.
Solution
We know that the Bayes estimator of $\tau(\mu) = \mu$ with respect to the squared-error loss function is given by
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\int \mu \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu}. \]
We know that the posterior distribution of $\mu$ is given by
\[ f(\mu \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu)}{\int \prod_{i=1}^{n} f(x_i \mid \mu) \, g(\mu) \, d\mu} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\mu - \mu_0\right)^2\right\}}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2\right\} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\mu - \mu_0\right)^2\right\} d\mu}. \]
Completing the square in $\mu$ (and writing $x_0 = \mu_0$, so that the exponent involves $\sum_{i=0}^{n} x_i$), this reduces to
\[ f(\mu \mid x_1, \dots, x_n) = \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt{n+1}}} \exp\left\{-\frac{1}{2}\left(\frac{\mu - \frac{\sum_{i=0}^{n} x_i}{n+1}}{\frac{1}{\sqrt{n+1}}}\right)^2\right\}; \]
that is, the posterior is $N\left(\frac{\sum_{i=0}^{n} x_i}{n+1}, \frac{1}{n+1}\right)$. So the Bayes estimator of $\mu$ with respect to the squared-error loss function is given by
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \int \mu \, f(\mu \mid x_1, \dots, x_n) \, d\mu. \]
Now, let
\[ z = \sqrt{n+1}\left(\mu - \frac{\sum_{i=0}^{n} x_i}{n+1}\right) \quad \Rightarrow \quad \mu = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}, \qquad d\mu = \frac{1}{\sqrt{n+1}} \, dz. \]
Now, we have that
\[ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \int_{-\infty}^{\infty} \left(\frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}\right) \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}z^2\right\} dz \]
\[ = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \cdot \frac{1}{\sqrt{2\pi}} \left[\int_{-\infty}^{0} z \exp\left\{-\frac{1}{2}z^2\right\} dz + \int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz\right] \]
\[ = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \cdot \frac{1}{\sqrt{2\pi}} \left[-\int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz + \int_{0}^{\infty} z \exp\left\{-\frac{1}{2}z^2\right\} dz\right] \]
\[ \therefore \ E\left[\mu \mid X_1 = x_1, \dots, X_n = x_n\right] = \frac{\sum_{i=0}^{n} x_i}{n+1}. \]
So, here we have that the Bayes estimator of $\mu$ with respect to the squared-error loss is given by
\[ \frac{\sum_{i=0}^{n} x_i}{n+1} = \frac{x_0 + \sum_{i=1}^{n} x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1}. \]
Since the posterior distribution of $\mu$ is normal, its mean and median are the same. Hence,
\[ \frac{\sum_{i=0}^{n} x_i}{n+1} = \frac{x_0 + \sum_{i=1}^{n} x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1} \]
is also the Bayes estimator with respect to the absolute-error loss function.

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from the uniform distribution with the density
\[ f(x \mid \theta) = \frac{1}{\theta} \, I_{(0,\theta)}(x). \]
Assume that the prior distribution of $\theta$ is given by $g(\theta) = I_{(0,1)}(\theta)$.
That is, $\theta$ is standard uniform. Find the Bayes estimator of $\tau(\theta) = \theta$ with respect to the weighted squared-error loss function
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}. \]

Solution:
We know that the Bayes estimator of $\tau(\theta)$ with respect to a general loss function such as
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} \]
is that estimator which minimizes
\[ \int l\left(t(x_1, \dots, x_n); \theta\right) f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta. \]
We know that the posterior distribution of $\theta$ is given by
\[ f(\theta \mid x_1, \dots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta) \, g(\theta) \, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n \prod_{i=1}^{n} I_{(0,\theta)}(x_i) \, I_{(0,1)}(\theta)}{\int_0^1 \left(\frac{1}{\theta}\right)^n \prod_{i=1}^{n} I_{(0,\theta)}(x_i) \, d\theta}. \]
With $y_n = \max(x_1, \dots, x_n)$, the product of indicators equals $I_{(y_n,1)}(\theta)$, so
\[ f(\theta \mid x_1, \dots, x_n) = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\int_{y_n}^{1} \theta^{-n} \, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\frac{1}{n-1}\left(\frac{1}{y_n^{\,n-1}} - 1\right)}. \]
Now the Bayes estimator of $\tau(\theta) = \theta$ with respect to the loss function $l(t;\theta) = (t-\theta)^2/\theta^2$ is that estimator which minimizes
\[ \int \frac{(t-\theta)^2}{\theta^2} \, f_{\Theta \mid X_1 = x_1, \dots, X_n = x_n}(\theta \mid x_1, \dots, x_n) \, d\theta = \int_{y_n}^{1} \frac{(t-\theta)^2}{\theta^2} \cdot \frac{\theta^{-n}}{\frac{1}{n-1}\left(\frac{1}{y_n^{\,n-1}} - 1\right)} \, d\theta, \]
or, equivalently, that estimator which minimizes
\[ \int_{y_n}^{1} \frac{(t-\theta)^2}{\theta^{n+2}} \, d\theta = t^2 \int_{y_n}^{1} \frac{1}{\theta^{n+2}} \, d\theta - 2t \int_{y_n}^{1} \frac{1}{\theta^{n+1}} \, d\theta + \int_{y_n}^{1} \frac{1}{\theta^{n}} \, d\theta. \quad (A) \]
Here, equation $(A)$ is a quadratic equation in $t$. This quadratic equation assumes its minimum for
\[ t^* = \frac{\int_{y_n}^{1} \theta^{-(n+1)} \, d\theta}{\int_{y_n}^{1} \theta^{-(n+2)} \, d\theta} = \frac{\frac{1}{n}\left(\frac{1}{y_n^{\,n}} - 1\right)}{\frac{1}{n+1}\left(\frac{1}{y_n^{\,n+1}} - 1\right)} = \frac{n+1}{n} \cdot \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \cdot y_n. \]
So, the Bayes estimator of $\tau(\theta) = \theta$ with respect to the loss function
\[ l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} \]
is given by
\[ t^*(y_n) = \frac{n+1}{n} \cdot \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \cdot y_n. \]

Admissible Estimator
For two estimators $T_1 = t_1(X_1, X_2, \dots, X_n)$ and $T_2 = t_2(X_1, X_2, \dots, X_n)$, the estimator $T_1$ is defined to be a better estimator than $T_2$ if and only if
\[ R_{t_1}(\theta) \le R_{t_2}(\theta) \quad \text{for all } \theta \text{ in } \Theta, \text{ and} \]
\[ R_{t_1}(\theta) < R_{t_2}(\theta) \quad \text{for at least one } \theta \text{ in } \Theta. \]
An estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be admissible if and only if there is no better estimator.

Example: Using the squared error loss function $l(t,\theta) = (t-\theta)^2$, estimators for the location parameter $\theta$ of a normal distribution given a sample of size $n$ include the sample mean $t_1(x) = \bar{x}$, the sample median $t_2(x) = m$, and the weighted mean $t_3(x) = \sum w_i x_i$ with $\sum w_i = 1$. Their respective risk functions are
\[ R_1 = \frac{\sigma^2}{n}, \qquad R_2 \approx 1.57\,\frac{\sigma^2}{n}, \qquad R_3 = \frac{\sigma^2}{n} + \sigma^2 \sum \left(w_i - \bar{w}\right)^2, \quad \bar{w} = \frac{1}{n}. \]
Since $R_1 \le R_2$ and $R_1 \le R_3$ for every $\theta$, neither the median nor any weighted mean is better than $\bar{x}$; indeed $\bar{x}$ is an admissible estimator of the location parameter $\theta$ of the normal distribution.
Inadmissibility of an Estimator
An estimator $t$ is said to be inadmissible if there exists another estimator $t'$ which dominates it, i.e. such that
\[ R(\theta, t') \le R(\theta, t) \quad \text{for all } \theta \text{ in } \Theta, \text{ and} \]
\[ R(\theta, t') < R(\theta, t) \quad \text{for some } \theta \text{ in } \Theta. \]

Finding an Inadmissible Estimator
To detect the inadmissibility of an estimator $t$, we may use the following lemma.
Let the range of $\tau(\theta)$ be the closed interval $[a,b]$, let the loss function satisfy $L(\theta, t) \ge 0$, and suppose that for any fixed $\theta$, $L(\theta, t)$ increases as $t$ moves away from $\tau(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a,b]$ with positive probability is inadmissible.

Properties of Admissible Estimators

The properties of admissible estimators are as follows.
a) If the loss function $L$ is strictly convex, then every admissible estimator must be non-randomized.
b) If $L$ is strictly convex and $t$ is an admissible estimator of $\tau(\theta)$, and if $t'$ is another estimator with the same risk function, i.e. $R(\theta, t) = R(\theta, t')$ for all $\theta$, then $t = t'$ with probability 1.
c) Any unique Bayes estimator is admissible.

Minimax Estimator
An estimator $t^*$ is defined to be a minimax estimator if and only if
\[ \sup_{\theta} R\left(\theta, t^*\right) \le \sup_{\theta} R(\theta, t) \quad \text{for every estimator } t. \]

Properties of Minimax Estimators

The properties of minimax estimators are given below.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) If $g(\theta)$ is a prior distribution of $\theta$ such that $\int R\left(\theta, t_g\right) g(\theta) \, d\theta = \sup_{\theta} R\left(\theta, t_g\right)$, then
   i. $t_g$ is minimax;
   ii. if $t_g$ is the unique Bayes solution with respect to $g(\theta)$, it is the unique minimax procedure.
c) If a Bayes estimator $t_g$ has constant risk, then it is minimax.
d) If $t'$ dominates a minimax estimator $t$, then $t'$ is also minimax.
e) If an estimator has constant risk and is admissible, it is minimax.
f) The best equivariant estimator is frequently minimax.

Example: Suppose that $\Theta = \{\theta_1, \theta_2\}$, where $\theta_1$ corresponds to oil and $\theta_2$ to no oil. Let $A = \{a_1, a_2, a_3\}$, where $a_i$ corresponds to choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the losses for the decision problem.

                    Drill $a_1$   Sell $a_2$   Partial $a_3$
Oil $\theta_1$           0            10             5
No oil $\theta_2$       12             1             6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on. An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values coded as 0 and 1, given by

                    $x = 0$   $x = 1$
Oil $\theta_1$         0.3       0.7
No oil $\theta_2$      0.6       0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7:
\[ P(X = 0 \mid \theta_1) = 0.3 \quad \text{and} \quad P(X = 1 \mid \theta_1) = 0.7. \]
Now the possible decision rules $\delta_i(x)$ are:

 $x$    $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
  0       $a_1$      $a_1$      $a_1$      $a_2$      $a_2$      $a_2$      $a_3$      $a_3$      $a_3$
  1       $a_1$      $a_2$      $a_3$      $a_1$      $a_2$      $a_3$      $a_1$      $a_2$      $a_3$

Here, $\delta_1$ = take action $a_1$ regardless of the value of $X$,
\[ \delta_2 = \begin{cases} \text{take action } a_1 & \text{if } X = 0 \\ \text{take action } a_2 & \text{if } X = 1 \end{cases} \]
and so on. Then the risk of $\delta$ at $\theta$ is
\[ R(\theta, \delta) = E\left[l\left(\theta, \delta(X)\right)\right] = l(\theta, a_1) P\left[\delta(X) = a_1\right] + l(\theta, a_2) P\left[\delta(X) = a_2\right] + l(\theta, a_3) P\left[\delta(X) = a_3\right]. \]
Now,
\[ R(\theta_1, \delta_2) = 0 \times 0.3 + 10 \times 0.7 = 7, \qquad R(\theta_2, \delta_2) = 12 \times 0.6 + 1 \times 0.4 = 7.6. \]
Thus we get:

                                                    $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
$R(\theta_1, \delta_i)$                                  0          7         3.5         3         10         6.5        1.5        8.5         5
$R(\theta_2, \delta_i)$                                 12         7.6        9.6        5.4         1          3         8.4        4.0         6
Max$\left[R(\theta_1, \delta_i), R(\theta_2, \delta_i)\right]$  12  7.6       9.6        5.4        10         6.5        8.4        8.5         6

\[ \min_i \ \text{Max}\left[R(\theta_1, \delta_i), R(\theta_2, \delta_i)\right] = 5.4 \]
Thus the minimax solution is
\[ \delta_4(x) = \begin{cases} a_2 & \text{if } x = 0 \\ a_1 & \text{if } x = 1 \end{cases} \]
Again, $R(\theta_1, \delta_4) = 3 < R(\theta_1, \delta_2) = 7$ and $R(\theta_2, \delta_4) = 5.4 < R(\theta_2, \delta_2) = 7.6$, so $\delta_4$ dominates $\delta_2$ and $\delta_2$ is inadmissible.



Suppose that in our oil-drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable $\theta$ with possible values $\theta_1, \theta_2$ and frequency function
\[ \pi(\theta_1) = 0.2, \qquad \pi(\theta_2) = 0.8, \]
so that the Bayes risk is
\[ R(\delta) = E\left[R(\theta, \delta)\right] = 0.2 \, R(\theta_1, \delta) + 0.8 \, R(\theta_2, \delta). \]
Thus
\[ R(\delta_1) = 0.2 \times 0 + 0.8 \times 12 = 9.6, \]
\[ R(\delta_2) = 0.2 \times 7 + 0.8 \times 7.6 = 7.48, \]
\[ R(\delta_3) = 0.2 \times 3.5 + 0.8 \times 9.6 = 8.38, \]
and so on. So we compute the following table:

 $\delta_i$      $\delta_1$  $\delta_2$  $\delta_3$  $\delta_4$  $\delta_5$  $\delta_6$  $\delta_7$  $\delta_8$  $\delta_9$
$R(\delta_i)$       9.6        7.48        8.38        4.92        2.8        3.7        7.02        4.9        5.8

In the Bayesian framework $\delta$ is preferable to $\delta'$ if and only if it has smaller Bayes risk. If there is a rule $\delta^*$ which attains the minimum Bayes risk, i.e. such that $R\left(\delta^*\right) = \min_i R(\delta_i) = 2.8$, then it is called a Bayes rule. From this example we see that $\delta_5$, with Bayes risk 2.8, is the unique Bayes rule for our prior distribution.
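The whole example can be reproduced mechanically; a sketch enumerating the nine rules:

```python
import numpy as np
from itertools import product

loss = np.array([[0, 10, 5],    # theta1 (oil): losses for a1, a2, a3
                 [12, 1, 6]])   # theta2 (no oil)
px = np.array([[0.3, 0.7],      # P(X=0 | theta1), P(X=1 | theta1)
               [0.6, 0.4]])     # P(X=0 | theta2), P(X=1 | theta2)

rules = list(product(range(3), repeat=2))   # delta = (action at x=0, action at x=1)
# R(theta, delta) = sum over x of l(theta, delta(x)) * P(X=x | theta)
risk = np.array([[sum(loss[t, d[x]] * px[t, x] for x in (0, 1))
                  for d in rules] for t in (0, 1)])

print(rules[risk.max(axis=0).argmin()])    # minimax rule (1, 0): a2 at x=0, a1 at x=1
bayes = 0.2 * risk[0] + 0.8 * risk[1]      # Bayes risks under the prior (0.2, 0.8)
print(rules[bayes.argmin()], bayes.min())  # Bayes rule (1, 1): always a2, risk 2.8
```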

Example: Let $X \sim b(1, p)$, $p \in \left\{\frac{1}{4}, \frac{1}{2}\right\}$, and $A = \{a_1, a_2\}$. Let the loss function be defined as follows.

                        $a_1$   $a_2$
$p_1 = \frac{1}{4}$       1       4
$p_2 = \frac{1}{2}$       3       2

The set of decision rules includes four functions $\delta_1, \delta_2, \delta_3, \delta_4$, defined by
\[ \delta_1(0) = \delta_1(1) = a_1; \qquad \delta_2(0) = a_1, \ \delta_2(1) = a_2; \qquad \delta_3(0) = a_2, \ \delta_3(1) = a_1; \qquad \delta_4(0) = \delta_4(1) = a_2. \]
The risk function takes the following values:

 $\delta_i$   $R(p_1, \delta_i)$   $R(p_2, \delta_i)$   $\max_{p} R(p, \delta_i)$
 $\delta_1$          1                    3                     3
 $\delta_2$         7/4                  5/2                   5/2
 $\delta_3$        13/4                  5/2                  13/4
 $\delta_4$          4                    2                     4

\[ \min_i \max_{p} R(p, \delta_i) = \frac{5}{2} \]
Thus the minimax solution is
\[ \delta_2(x) = \begin{cases} a_1 & \text{if } x = 0 \\ a_2 & \text{if } x = 1 \end{cases} \]