Prior Distribution
Let $g(\theta)$ be the probability distribution of the parameter $\theta$; it summarizes the objective information about $\theta$ prior to obtaining the sample observations. Such a distribution $g(\theta)$ is called the prior distribution of $\theta$.
Posterior Distribution
Consider a random variable $X$ whose distribution is denoted by $f(x \mid \theta)$. This distribution depends on the parameter $\theta$, which is itself a value of a random variable $\Theta$ with prior density $g(\theta)$. The posterior distribution of $\theta$ is defined as the conditional distribution of $\theta$ given the sample values or sample measures. So,
$$f(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n, \theta)}{f(x_1, x_2, \ldots, x_n)},$$
where
$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)\, g(\theta), \quad \text{the joint distribution of the sample and } \theta,$$
$$f(x_1, x_2, \ldots, x_n) = \int f(x_1, x_2, \ldots, x_n \mid \theta)\, g(\theta)\, d\theta, \quad \text{the marginal distribution of the sample.}$$
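The posterior formula above can be illustrated numerically by evaluating the joint density on a grid of $\theta$ values and normalizing. The following Python snippet is a minimal sketch, not part of the original notes; the exponential likelihood, prior, and data at the bottom are placeholder choices for illustration.

```python
import numpy as np

def grid_posterior(likelihood, prior, theta_grid, data):
    """Return f(theta | x1,...,xn) evaluated on theta_grid."""
    # Joint density at each grid point: f(x1,...,xn | theta) * g(theta)
    joint = np.array([
        np.prod([likelihood(x, th) for x in data]) * prior(th)
        for th in theta_grid
    ])
    # Marginal f(x1,...,xn) approximated by a Riemann sum over the grid
    marginal = joint.sum() * (theta_grid[1] - theta_grid[0])
    return joint / marginal

# Illustrative usage: exponential likelihood with an exponential prior (k = 1)
theta = np.linspace(1e-6, 10, 2000)
data = [0.8, 1.2, 0.5]
post = grid_posterior(lambda x, th: th * np.exp(-th * x),
                      lambda th: np.exp(-th),
                      theta, data)
```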
Example: Let $X_1, X_2, \ldots, X_n$ denote a random sample from the exponential density $f(x \mid \theta) = \theta e^{-\theta x}$, $x > 0$. Assume that the prior distribution of $\theta$ is given by $g(\theta) = k e^{-k\theta}$, $\theta > 0$, for known $k > 0$; that is, $\theta$ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of $\theta$.
Solution:
We know that the posterior distribution of $\theta$ is given by
$$f(\theta \mid x_1, x_2, \ldots, x_n) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta)\, g(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta)\, g(\theta)\, d\theta}
= \frac{\theta^n e^{-\theta \sum_{i=1}^n x_i}\, k e^{-k\theta}}{\int_0^\infty \theta^n e^{-\theta \sum_{i=1}^n x_i}\, k e^{-k\theta}\, d\theta}
= \frac{\theta^n e^{-\theta\left(\sum_{i=1}^n x_i + k\right)}}{\int_0^\infty \theta^n e^{-\theta\left(\sum_{i=1}^n x_i + k\right)}\, d\theta}.$$
Now,
$$\int_0^\infty \theta^{(n+1)-1} e^{-\theta\left(\sum_{i=1}^n x_i + k\right)}\, d\theta = \frac{\Gamma(n+1)}{\left(\sum_{i=1}^n x_i + k\right)^{n+1}},$$
so that
$$f(\theta \mid x_1, \ldots, x_n) = \frac{\left(\sum_{i=1}^n x_i + k\right)^{n+1}}{\Gamma(n+1)}\, \theta^n\, e^{-\theta\left(\sum_{i=1}^n x_i + k\right)}, \quad \theta > 0.$$
That is, $f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)$ is the $\text{Gamma}\!\left(n+1,\ \sum_{i=1}^n x_i + k\right)$ density.
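A quick numerical check of the $\text{Gamma}\!\left(n+1, \sum x_i + k\right)$ result can be done by comparing the grid-normalized posterior with the corresponding gamma density; the sample values and the value of $k$ below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

k = 2.0
x = np.array([0.3, 1.1, 0.7, 0.4])
n = len(x)
theta = np.linspace(1e-6, 15, 4000)

# Unnormalized posterior: theta^n * exp(-theta * (sum(x) + k))
unnorm = theta**n * np.exp(-theta * (x.sum() + k))
post = unnorm / (unnorm.sum() * (theta[1] - theta[0]))

# Gamma(n + 1) with rate sum(x) + k, i.e. scale 1/(sum(x) + k)
gamma_pdf = stats.gamma.pdf(theta, a=n + 1, scale=1.0 / (x.sum() + k))
print(np.max(np.abs(post - gamma_pdf)))  # should be near 0
```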
Example: Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution with the density
$$f(x \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}(x - \theta)^2\right\}, \quad -\infty < x < \infty.$$
Assume that the prior distribution of $\theta$ is given by
$$g(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\theta^2\right\}, \quad -\infty < \theta < \infty;$$
that is, $\theta$ is standard normal. Find the posterior distribution of $\theta$.
Solution:
We know that the posterior distribution of $\theta$ is given by
$$f(\theta \mid x_1, \ldots, x_n) = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\right\}\, \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\theta^2\right\}}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\right\}\, \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\theta^2\right\}\, d\theta}.$$
Collecting the terms involving $\theta$ in the exponent,
$$-\frac{1}{2}\left[\sum_{i=1}^n (x_i - \theta)^2 + \theta^2\right] = -\frac{1}{2}\left[(n+1)\theta^2 - 2n\bar{x}\theta + \sum_{i=1}^n x_i^2\right] = -\frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2 + c,$$
where $c$ does not involve $\theta$. The factor $e^{c}$ cancels between numerator and denominator, and the remaining denominator integral is the normalizing constant of a normal density. Hence
$$f(\theta \mid x_1, \ldots, x_n) = \frac{1}{\sqrt{2\pi \cdot \frac{1}{n+1}}} \exp\left\{-\frac{1}{2 \cdot \frac{1}{n+1}}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2\right\}.$$
That is, $f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)$ is the $N\!\left(\dfrac{n\bar{x}}{n+1},\ \dfrac{1}{n+1}\right)$ density, $-\infty < \theta < \infty$.
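The closed-form posterior above is easy to compute directly; in this short sketch the "observed" sample is simulated, which is an illustrative assumption rather than data from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 1.5
x = rng.normal(theta_true, 1.0, size=20)  # stand-in for an observed sample

n, xbar = len(x), x.mean()
post_mean = n * xbar / (n + 1)   # posterior mean of N(n*xbar/(n+1), 1/(n+1))
post_var = 1.0 / (n + 1)         # posterior variance
print(post_mean, post_var)
```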
Posterior Bayes Estimator
Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x \mid \theta)$, where $\theta$ is a value of a random variable $\Theta$ with known density $g(\theta)$. The posterior Bayes estimator of $\theta$ with respect to the prior $g(\theta)$ is defined as
$$E[\Theta \mid X_1, X_2, \ldots, X_n] = \frac{\int \theta \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}.$$
One might note the similarity between the posterior Bayes estimator of $\theta$ and the Pitman estimator of a location parameter.
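When the two integrals above have no closed form, they can be approximated on a grid. The helper below is a minimal sketch of that ratio-of-integrals computation; the `likelihood` and `prior` callables are placeholders to be supplied by the user.

```python
import numpy as np

def posterior_bayes_estimate(likelihood, prior, theta_grid, data):
    """Approximate E[Theta | data] as a ratio of grid sums.

    The common grid spacing cancels between numerator and denominator,
    so plain sums suffice.
    """
    weights = np.array([
        np.prod([likelihood(x, th) for x in data]) * prior(th)
        for th in theta_grid
    ])
    return np.sum(theta_grid * weights) / np.sum(weights)
```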
Example:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from the Bernoulli density
$$f(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}\, I_{\{0,1\}}(x) \quad \text{for } 0 \le \theta \le 1.$$
Assume that the prior distribution of $\theta$ is given by $g(\theta) = I_{(0,1)}(\theta)$; that is, $\theta$ is uniformly distributed over the interval $(0, 1)$. Find the posterior distribution of $\theta$ and find the posterior Bayes estimator of $\theta$.
Solution:
We know that the posterior distribution of $\theta$ is given by
$$f(\theta \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{f(x_1, x_2, \ldots, x_n, \theta)}{f(x_1, x_2, \ldots, x_n)} = \frac{\prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)}{\int \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}
= \frac{\theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, I_{(0,1)}(\theta)}{\int_0^1 \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, d\theta}
= \frac{\theta^{\left(\sum_{i=1}^n x_i + 1\right) - 1} (1 - \theta)^{\left(n - \sum_{i=1}^n x_i + 1\right) - 1}}{B\!\left(\sum_{i=1}^n x_i + 1,\ n - \sum_{i=1}^n x_i + 1\right)}\, I_{(0,1)}(\theta).$$
That is, $f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)$ is the $\text{Beta}\!\left(\sum_{i=1}^n x_i + 1,\ n - \sum_{i=1}^n x_i + 1\right)$ density, $0 < \theta < 1$.
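The Beta posterior can be handled directly with `scipy.stats`; the 0/1 sample below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy import stats

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])
n, s = len(x), x.sum()

# Posterior Beta(sum(x) + 1, n - sum(x) + 1)
posterior = stats.beta(s + 1, n - s + 1)
print(posterior.mean())  # equals (s + 1) / (n + 2), as derived next
```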
Again, we have that the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g(\theta) = I_{(0,1)}(\theta)$ is given by
$$E[\Theta \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta
= \frac{\int_0^1 \theta \cdot \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, d\theta}{\int_0^1 \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, d\theta}
= \frac{B\!\left(\sum_{i=1}^n x_i + 2,\ n - \sum_{i=1}^n x_i + 1\right)}{B\!\left(\sum_{i=1}^n x_i + 1,\ n - \sum_{i=1}^n x_i + 1\right)}
= \frac{\Gamma\!\left(\sum_{i=1}^n x_i + 2\right) \Gamma(n + 2)}{\Gamma\!\left(\sum_{i=1}^n x_i + 1\right) \Gamma(n + 3)}.$$
Hence,
$$E[\Theta \mid X_1 = x_1, \ldots, X_n = x_n] = \frac{\sum_{i=1}^n x_i + 1}{n + 2}.$$
Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is given by $\dfrac{\sum_{i=1}^n x_i + 1}{n + 2}$. Contrast this to the maximum likelihood estimator of $\theta$, which is $\dfrac{\sum_{i=1}^n x_i}{n} = \bar{x}$. We know that $\bar{x}$ is unbiased and UMVUE, whereas the posterior Bayes estimator is not unbiased.
Again, we have that the posterior Bayes estimator of $\theta(1 - \theta)$ with respect to the prior distribution $g(\theta) = I_{(0,1)}(\theta)$ is given by
$$E[\Theta(1 - \Theta) \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta(1 - \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta$$
$$= \frac{\int_0^1 \theta(1 - \theta) \cdot \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, d\theta}{\int_0^1 \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}\, d\theta}
= \frac{B\!\left(\sum_{i=1}^n x_i + 2,\ n - \sum_{i=1}^n x_i + 2\right)}{B\!\left(\sum_{i=1}^n x_i + 1,\ n - \sum_{i=1}^n x_i + 1\right)}
= \frac{\Gamma\!\left(\sum_{i=1}^n x_i + 2\right) \Gamma\!\left(n - \sum_{i=1}^n x_i + 2\right) \Gamma(n + 2)}{\Gamma\!\left(\sum_{i=1}^n x_i + 1\right) \Gamma\!\left(n - \sum_{i=1}^n x_i + 1\right) \Gamma(n + 4)}.$$
Hence,
$$E[\Theta(1 - \Theta) \mid X_1 = x_1, \ldots, X_n = x_n] = \frac{\left(\sum_{i=1}^n x_i + 1\right)\left(n - \sum_{i=1}^n x_i + 1\right)}{(n + 3)(n + 2)}.$$
Hence, the posterior Bayes estimator of $\theta(1 - \theta)$ with respect to the uniform prior distribution is given by $\dfrac{\left(\sum_{i=1}^n x_i + 1\right)\left(n - \sum_{i=1}^n x_i + 1\right)}{(n + 3)(n + 2)}$. We noted in the above example that the posterior Bayes estimator that we obtained was not unbiased.
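The bias of the uniform-prior estimator of $\theta$ is easy to confirm by simulation; a small Monte Carlo sketch, with $\theta$, $n$, and the replication count chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 10, 200_000
x = rng.binomial(1, theta, size=(reps, n))
s = x.sum(axis=1)

print((s / n).mean())              # ~ 0.3: the MLE xbar is unbiased
print(((s + 1) / (n + 2)).mean())  # ~ (n*theta + 1)/(n + 2) = 0.333...: biased
```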
The following remark states that in general a posterior Bayes estimator is not unbiased.
Remark: Let $T_G^* = t_G^*(X_1, X_2, \ldots, X_n)$ denote the posterior Bayes estimator of $\theta$ with respect to a prior distribution $G$. If both $T_G^*$ and $\Theta$ have finite variance, then either
$$\operatorname{Var}(T_G^* \mid \Theta) = 0,$$
or $T_G^*$ is not an unbiased estimator of $\theta$. That is, either $T_G^*$ estimates $\theta$ correctly with probability one, or it is biased.
Proof: Suppose $T_G^*$ is unbiased, i.e. $E[T_G^* \mid \Theta = \theta] = \theta$ for all $\theta$. By the variance decomposition,
$$\operatorname{Var}(\Theta) = E[\operatorname{Var}(\Theta \mid X_1, X_2, \ldots, X_n)] + \operatorname{Var}(E[\Theta \mid X_1, X_2, \ldots, X_n]) = E[\operatorname{Var}(\Theta \mid X_1, X_2, \ldots, X_n)] + \operatorname{Var}(T_G^*) \quad (1)$$
and
$$\operatorname{Var}(T_G^*) = E[\operatorname{Var}(T_G^* \mid \Theta)] + \operatorname{Var}(E[T_G^* \mid \Theta]) = E[\operatorname{Var}(T_G^* \mid \Theta)] + \operatorname{Var}(\Theta). \quad (2)$$
Substituting (2) into (1) gives
$$E[\operatorname{Var}(T_G^* \mid \Theta)] + E[\operatorname{Var}(\Theta \mid X_1, X_2, \ldots, X_n)] = 0.$$
Now, since both $E[\operatorname{Var}(T_G^* \mid \Theta)]$ and $E[\operatorname{Var}(\Theta \mid X_1, X_2, \ldots, X_n)]$ are non-negative and their sum is zero, each must equal zero. In particular, $E[\operatorname{Var}(T_G^* \mid \Theta)] = 0$, and since $\operatorname{Var}(T_G^* \mid \Theta)$ is non-negative and has zero expectation, $\operatorname{Var}(T_G^* \mid \Theta) = 0$.
Loss Function
Consider estimating $g(\theta)$, and let $t = t(x_1, x_2, \ldots, x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted $l(t; \theta)$, equals the loss incurred if one estimates $g(\theta)$ to be $t$ when $\theta$ is the true parameter value. It is required that
1) $l(t; \theta) \ge 0$ for all $t$, and
2) $l(t; \theta) = 0$ for $t = g(\theta)$.
The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.
Several Possible Loss Functions
1) $l_1(t; \theta) = (t - g(\theta))^2$, the squared error loss;
2) $l_2(t; \theta) = |t - g(\theta)|$, the absolute error loss;
3) $l_3(t; \theta) = \begin{cases} 0 & \text{if } |t - g(\theta)| \le \varepsilon \\ A & \text{if } |t - g(\theta)| > \varepsilon \end{cases}$, where $A > 0$;
4) $l_4(t; \theta) = \lambda(\theta)\, |t - g(\theta)|^r$ for $\lambda(\theta) > 0$ and $r > 0$.
Note that both $l_1$ and $l_2$ increase as the error $|t - g(\theta)|$ increases in magnitude. $l_3$ says that we lose nothing if the estimate $t$ is within $\varepsilon$ units of $g(\theta)$, and otherwise we lose the amount $A$. $l_4$ is a general loss function that includes $l_1$ (with $\lambda(\theta) \equiv 1$, $r = 2$) and $l_2$ (with $\lambda(\theta) \equiv 1$, $r = 1$) as special cases.
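The four loss functions translate directly into code. A minimal sketch, where the identity choice $g(\theta) = \theta$ and the default values of $\varepsilon$, $A$, $\lambda$, $r$ are illustrative assumptions:

```python
# g is the function of theta being estimated (identity here, for illustration)
g = lambda theta: theta

def l1(t, theta):
    return (t - g(theta))**2            # squared error

def l2(t, theta):
    return abs(t - g(theta))            # absolute error

def l3(t, theta, eps=0.1, A=1.0):
    # zero loss within eps of g(theta), constant loss A outside
    return 0.0 if abs(t - g(theta)) <= eps else A

def l4(t, theta, lam=lambda th: 1.0, r=2.0):
    return lam(theta) * abs(t - g(theta))**r   # general weighted |error|^r
```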
Risk Function
The risk function $R_t(\theta)$ of an estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be
$$R_t(\theta) = E[l(T; \theta)].$$
The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x; \theta)$ from which we sampled is a probability density function, then
$$R(\theta, t) = E[l(T; \theta)] = E[l(t(X_1, X_2, \ldots, X_n); \theta)] = \int \cdots \int l(t(x_1, \ldots, x_n); \theta) \prod_{i=1}^n f(x_i; \theta)\, dx_i.$$
Or, we can consider the random variable $T$ with density $f_T(t)$; then
$$R(\theta, t) = E[l(T; \theta)] = \int l(t; \theta)\, f_T(t)\, dt,$$
where $f_T(t)$ is the density of the estimator $T$. In either case, the expectation averages out the values of the sample. In particular, for the squared error loss,
$$R_t(\theta) = E[l_1(T; \theta)] = E\!\left[(T - g(\theta))^2\right],$$
the mean squared error of $T$.
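The risk at a fixed $\theta$ is just an expectation, so it can be approximated by Monte Carlo. A sketch for the sample mean under squared error loss in a normal model; the value of $\theta$, the sample size, and the replication count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 15, 100_000
samples = rng.normal(theta, 1.0, size=(reps, n))

t = samples.mean(axis=1)           # estimator T = xbar on each replicate
risk = np.mean((t - theta)**2)     # approximates E[(T - theta)^2]
print(risk)                        # close to 1/n for this model
```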
A function $L$ defined on an interval $(a, b)$ is said to be convex if, for any $a < t < b$, $a < t^* < b$ and any $0 < \gamma < 1$,
$$L(\gamma t + (1 - \gamma) t^*) \le \gamma L(t) + (1 - \gamma) L(t^*). \quad (1)$$
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^*$ and $\gamma$.
Convexity is a very strong condition which implies, for example, that $L$ is continuous in $(a, b)$, and:
a) If $L$ is differentiable, then (1) is equivalent to
$$L'(t) \le L'(t^*) \quad \text{for all } a < t < t^* < b. \quad (1')$$
The function is strictly convex iff $(1')$ is strict for all $t \ne t^*$.
b) If $L$ is twice differentiable, then the necessary and sufficient condition (1) is equivalent to
$$L''(t) \ge 0 \quad \text{for all } a < t < b.$$
Consider the Bayes risk of a decision rule $d = d(x_1, \ldots, x_n)$ with respect to the prior $f(\theta)$:
$$B(d) = \int \left[\int \cdots \int l(d(x_1, \ldots, x_n); \theta)\, f(x_1, \ldots, x_n \mid \theta)\, dx_1 \cdots dx_n\right] f(\theta)\, d\theta. \quad (*)$$
Interchanging the order of integration,
$$B(d) = \int \cdots \int \left[\int l(d(x_1, \ldots, x_n); \theta)\, f(x_1, \ldots, x_n \mid \theta)\, f(\theta)\, d\theta\right] dx_1 \cdots dx_n. \quad (**)$$
The function $B(d)$ will be minimized if we can find the function $d$ that minimizes the quantity within the brackets of equation $(**)$ for every set of $x$ values. That is, the Bayes rule minimizes
$$\int l(d(x_1, \ldots, x_n); \theta)\, f(x_1, \ldots, x_n \mid \theta)\, f(\theta)\, d\theta.$$
Since $f(\theta \mid x_1, \ldots, x_n) = \dfrac{f(x_1, \ldots, x_n \mid \theta)\, f(\theta)}{f(x_1, \ldots, x_n)}$, the quantity to be minimized equals
$$f(x_1, \ldots, x_n) \int l(d(x); \theta)\, f(\theta \mid x_1, \ldots, x_n)\, d\theta,$$
so it suffices to minimize
$$Y = \int l(d(x); \theta)\, f(\theta \mid x_1, \ldots, x_n)\, d\theta, \quad \text{say}.$$
For the squared error loss $l(d(x); \theta) = (\theta - d(x))^2$,
$$Y = \int \theta^2 f(\theta \mid x_1, \ldots, x_n)\, d\theta - 2 d(x) \int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta + d^2(x) \int f(\theta \mid x_1, \ldots, x_n)\, d\theta.$$
Setting $\partial Y / \partial d = 0$,
$$2 d(x) \int f(\theta \mid x_1, \ldots, x_n)\, d\theta - 2 \int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta = 0$$
$$\Rightarrow \quad d(x) = \frac{\int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta}{\int f(\theta \mid x_1, \ldots, x_n)\, d\theta},$$
the expected posterior. Hence, $d(x)$ is the Bayes estimate for $\theta$ if the loss function is squared error.
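For a loss function without a closed-form minimizer, the posterior risk can be minimized numerically. A sketch, assuming the posterior has already been evaluated as weights on a grid and that `loss` accepts an array of $\theta$ values:

```python
import numpy as np

def bayes_estimate(loss, theta_grid, posterior_weights):
    """Grid search for the t minimizing the posterior risk
    sum_theta loss(t, theta) * f(theta | x)."""
    t_grid = theta_grid  # candidate estimates on the same grid
    risks = [np.sum(loss(t, theta_grid) * posterior_weights)
             for t in t_grid]
    return t_grid[int(np.argmin(risks))]

# e.g. bayes_estimate(lambda t, th: (t - th)**2, grid, w)
# recovers (approximately) the posterior mean, as derived above.
```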
Note:
a) A decision rule $\delta$ is said to be uniformly better than a decision rule $\delta'$ if $R(\theta, \delta) \le R(\theta, \delta')$ for all $\theta$, with strict inequality holding for some $\theta$.
b) The posterior distribution tells the whole story: if a point estimate or a confidence interval is desired, it can be obtained immediately from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.
Example: Let $X_1, \ldots, X_n$ denote a random sample from $N(\theta, \sigma^2)$, where $\sigma^2$ is known, and assume that the prior distribution of $\theta$ is $N(\mu_0, \sigma_0^2)$. Find the Bayes estimator of $\theta$ under squared error loss.
Solution:
The joint conditional distribution of the sample given $\theta$ is
$$f(x_1, \ldots, x_n \mid \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \theta)^2\right\}
= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{n}{2\sigma^2}(\theta - \bar{x})^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \bar{x})^2\right\}.$$
As a function of $\theta$,
$$f(x_1, \ldots, x_n \mid \theta) \propto \exp\left\{-\frac{n}{2\sigma^2}(\theta - \bar{x})^2\right\},$$
so the posterior density satisfies
$$f(\theta \mid x_1, \ldots, x_n) \propto \exp\left\{-\frac{n}{2\sigma^2}(\theta - \bar{x})^2 - \frac{1}{2\sigma_0^2}(\theta - \mu_0)^2\right\}
\propto \exp\left\{-\frac{n\sigma_0^2 + \sigma^2}{2\sigma_0^2\sigma^2}\left(\theta - \frac{n\bar{x}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2}\right)^2\right\}.$$
That is,
$$f(\theta \mid x) \sim N\!\left(\frac{n\bar{x}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2},\ \frac{\sigma_0^2\sigma^2}{n\sigma_0^2 + \sigma^2}\right).$$
If the loss function is squared error, the Bayes estimator of $\theta$ is
$$\frac{n\bar{x}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2}.$$
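The posterior parameters derived above fit in a small helper; the inputs are placeholders for a concrete data set and prior.

```python
import numpy as np

def normal_posterior(x, sigma2, mu0, sigma02):
    """Posterior mean and variance for N(theta, sigma2) data with a
    N(mu0, sigma02) prior on theta."""
    n, xbar = len(x), np.mean(x)
    mean = (n * xbar * sigma02 + mu0 * sigma2) / (n * sigma02 + sigma2)
    var = (sigma02 * sigma2) / (n * sigma02 + sigma2)
    return mean, var  # under squared error loss, `mean` is the Bayes estimator
```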
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x \mid \theta)$, and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t; \theta)$ be the loss function for estimating $\theta$. The Bayes estimator of $\theta$ is that estimator $t^* = t^*(x_1, \ldots, x_n)$ which minimizes
$$\int l(t(x_1, x_2, \ldots, x_n); \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta$$
as a function of $t(x_1, \ldots, x_n)$.
Proof
For a general loss function $l(t; \theta)$, we seek that estimator, say $t^*$, which minimizes the expression
$$\int R_t(\theta)\, g(\theta)\, d\theta = \int E[l(t(X_1, X_2, \ldots, X_n); \theta)]\, g(\theta)\, d\theta
= \int \left[\int_{\mathbb{R}^n} l(t(x_1, \ldots, x_n); \theta)\, f(x_1, x_2, \ldots, x_n \mid \theta) \prod_{i=1}^n dx_i\right] g(\theta)\, d\theta.$$
Interchanging the order of integration and using $f(x_1, \ldots, x_n \mid \theta)\, g(\theta) = f(\theta \mid x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)$, this equals
$$\int_{\mathbb{R}^n} \left\{\int l(t(x_1, \ldots, x_n); \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta\right\} f(x_1, \ldots, x_n) \prod_{i=1}^n dx_i.$$
Since the integrand is non-negative, the double integral can be minimized if the expression within the braces, which is sometimes called the posterior risk, is minimized for each $x_1, x_2, \ldots, x_n$.
So, in general, the Bayes estimator of $\theta$ with respect to the loss function $l(\cdot; \theta)$ and prior density $g(\theta)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\theta$.
Theorem
Let $X_1, \ldots, X_n$ be a random sample from the density $f(x \mid \theta)$, and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t; \theta)$ be the squared-error loss function for estimating $\theta$; that is,
$$l(t; \theta) = (t(x_1, \ldots, x_n) - \theta)^2.$$
Then the Bayes estimator of $\theta$ is given by the mean of the posterior distribution of $\theta$ given $X_1 = x_1, \ldots, X_n = x_n$.
Proof
We know that the Bayes estimator of $\theta$ with respect to the loss function $l(\cdot; \theta)$ and prior density $g(\theta)$ is that estimator which minimizes the posterior risk
$$\int l(t(x_1, \ldots, x_n); \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
Here, the loss function is the squared error loss function. So, we have that the Bayes estimator of $\theta$ is that estimator which minimizes
$$\int (t(x_1, \ldots, x_n) - \theta)^2\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
Differentiating with respect to $t(x_1, \ldots, x_n)$ and setting the derivative equal to zero gives
$$t^*(x_1, \ldots, x_n) = \int \theta\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta = E[\Theta \mid X_1 = x_1, \ldots, X_n = x_n].$$
Hence, the Bayes estimator of $\theta$ with respect to the squared-error loss function is given by
$$E[\Theta \mid X_1 = x_1, \ldots, X_n = x_n] = \frac{\int \theta \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}.$$
Theorem
Let $X_1, \ldots, X_n$ be a random sample from the density $f(x \mid \theta)$, and let $g(\theta)$ be the density of $\Theta$. Further, let $l(t; \theta)$ be the absolute-error loss function for estimating $\theta$; that is,
$$l(t; \theta) = |t(x_1, \ldots, x_n) - \theta|.$$
Then the Bayes estimator of $\theta$ is given by the median of the posterior distribution of $\theta$ given $X_1 = x_1, \ldots, X_n = x_n$.
Proof
We know that the Bayes estimator of $\theta$ with respect to the loss function $l(\cdot; \theta)$ and prior density $g(\theta)$ is that estimator which minimizes the posterior risk
$$\int l(t(x_1, \ldots, x_n); \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
Here, the loss function is the absolute-error loss function. So, we have that the Bayes estimator of $\theta$ is that estimator which minimizes
$$\int |t(x_1, \ldots, x_n) - \theta|\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
This integral is minimized, as a function of $t(x_1, \ldots, x_n)$, for $t^*(x_1, \ldots, x_n)$ equal to the conditional median with respect to the posterior distribution of $\theta$ given $X_1 = x_1, \ldots, X_n = x_n$.
Hence, the Bayes estimator of $\theta$ with respect to the absolute-error loss function is given by the median of the posterior distribution.
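Given posterior weights on a grid, the posterior median of the theorem can be read off the cumulative distribution. A minimal sketch:

```python
import numpy as np

def posterior_median(theta_grid, posterior_weights):
    """Bayes estimate under absolute error loss: the posterior median,
    located where the (normalized) posterior CDF first reaches 1/2."""
    cdf = np.cumsum(posterior_weights)
    cdf = cdf / cdf[-1]
    return theta_grid[int(np.searchsorted(cdf, 0.5))]
```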
Example: Let $X_1, \ldots, X_n$ denote a random sample from $N(\theta, 1)$ and assume that the prior distribution of $\theta$ is $N(0, 1)$; that is, $\theta$ is standard normal. Write $x_0 = 0$ when convenient. Find the Bayes estimator of $\theta$ with respect to the squared error loss function.
Solution
We know that the Bayes estimator of $\theta$ with respect to the squared error loss function is given by
$$E[\Theta \mid X_1 = x_1, \ldots, X_n = x_n] = \frac{\int \theta \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i \mid \theta)\, g(\theta)\, d\theta} = \int \theta\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
From the earlier example, the posterior is $N\!\left(\dfrac{\sum_{i=0}^n x_i}{n+1},\ \dfrac{1}{n+1}\right)$ with $x_0 = 0$, so
$$E[\Theta \mid x_1, \ldots, x_n] = \int \theta\, \frac{1}{\sqrt{2\pi/(n+1)}} \exp\left\{-\frac{n+1}{2}\left(\theta - \frac{\sum_{i=0}^n x_i}{n+1}\right)^2\right\} d\theta.$$
Now, let $z = \sqrt{n+1}\left(\theta - \dfrac{\sum_{i=0}^n x_i}{n+1}\right)$, so that $\theta = \dfrac{z}{\sqrt{n+1}} + \dfrac{\sum_{i=0}^n x_i}{n+1}$ and $d\theta = \dfrac{dz}{\sqrt{n+1}}$. Then
$$E[\Theta \mid x_1, \ldots, x_n] = \int \left(\frac{z}{\sqrt{n+1}} + \frac{\sum_{i=0}^n x_i}{n+1}\right) \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \frac{\sum_{i=0}^n x_i}{n+1} = \frac{n\bar{x}}{n+1},$$
since the standard normal density integrates to one and has mean zero.
Example: Let $X_1, X_2, \ldots, X_n$ denote a random sample from the uniform density
$$f(x \mid \theta) = \frac{1}{\theta}\, I_{(0, \theta)}(x).$$
Assume that the prior distribution of $\theta$ is given by $g(\theta) = I_{(0,1)}(\theta)$; that is, $\theta$ is standard uniform. Find the Bayes estimator of $\theta$ with respect to the loss function
$$l(t; \theta) = \frac{(t - \theta)^2}{\theta^2}.$$
Solution:
We know that the Bayes estimator of $\theta$ with respect to any general loss function such as
$$l(t; \theta) = \frac{(t - \theta)^2}{\theta^2}$$
is that estimator which minimizes the posterior risk
$$\int l(t(x_1, \ldots, x_n); \theta)\, f(\theta \mid X_1 = x_1, \ldots, X_n = x_n)\, d\theta.$$
Here, with $y_n = \max(x_1, \ldots, x_n)$, the posterior density is
$$f(\theta \mid x_1, \ldots, x_n) = \frac{\theta^{-n}\, I_{(y_n, 1)}(\theta)}{\int_{y_n}^1 \theta^{-n}\, d\theta} = \frac{(n-1)\, \theta^{-n}}{y_n^{-(n-1)} - 1}\, I_{(y_n, 1)}(\theta).$$
So the Bayes estimator is that estimator which minimizes
$$\int_{y_n}^1 \frac{(t - \theta)^2}{\theta^2}\, f(\theta \mid x_1, \ldots, x_n)\, d\theta \propto t^2 \int_{y_n}^1 \theta^{-n-2}\, d\theta - 2t \int_{y_n}^1 \theta^{-n-1}\, d\theta + \int_{y_n}^1 \theta^{-n}\, d\theta. \quad (A)$$
Here, equation $(A)$ is a quadratic equation in $t$. This quadratic equation assumes its minimum for
$$t^* = \frac{\int_{y_n}^1 \theta^{-n-1}\, d\theta}{\int_{y_n}^1 \theta^{-n-2}\, d\theta} = \frac{(y_n^{-n} - 1)/n}{(y_n^{-n-1} - 1)/(n+1)} = \frac{(n+1)\left(y_n - y_n^{n+1}\right)}{n\left(1 - y_n^{n+1}\right)}.$$
So, the Bayes estimator of $\theta$ with respect to the loss function
$$l(t; \theta) = \frac{(t - \theta)^2}{\theta^2}$$
is given by
$$t^*(y_n) = \frac{(n+1)\left(y_n - y_n^{n+1}\right)}{n\left(1 - y_n^{n+1}\right)}.$$
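The closed form for $t^*(y_n)$ can be checked numerically by minimizing the posterior risk on a grid; the values of $n$ and $y_n$ below are arbitrary illustrative choices.

```python
import numpy as np

n, yn = 5, 0.6
theta = np.linspace(yn, 1, 20_000)
weights = theta**(-n)        # posterior on (yn, 1), up to normalization
weights = weights / weights.sum()

# Grid search over candidate estimates t for the weighted squared error loss
t_cand = np.linspace(yn, 1, 2_000)
risks = [np.sum((t - theta)**2 / theta**2 * weights) for t in t_cand]
t_num = t_cand[int(np.argmin(risks))]

t_formula = (n + 1) * (yn - yn**(n + 1)) / (n * (1 - yn**(n + 1)))
print(t_num, t_formula)      # the two values should agree closely
```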
Admissible Estimator
For two estimators $T_1 = t_1(X_1, X_2, \ldots, X_n)$ and $T_2 = t_2(X_1, X_2, \ldots, X_n)$, the estimator $T_1$ is defined to be a better estimator than $T_2$ if and only if $R(\theta, t_1) \le R(\theta, t_2)$ for all $\theta$, with strict inequality for at least one $\theta$. An estimator is defined to be admissible if and only if there is no better estimator.
Example: Using the squared error loss function $l(t, \theta) = (t - \theta)^2$, estimators for the location parameter $\theta$ of a normal distribution given a sample of size $n$ include the sample mean $t_1(x) = \bar{x}$ and the sample median $t_2(x)$. The mean has constant risk $\sigma^2/n$ in $\theta$, while (approximately, for large $n$) the median has constant risk $\pi\sigma^2/(2n) > \sigma^2/n$; thus the sample mean is better than the sample median.
Theorem: Suppose that $g(\theta) \in [a, b]$ for all $\theta$ and that, for any fixed $\theta$, $L(\theta, t)$ is increasing as $t$ moves away from $g(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a, b]$ with positive probability is inadmissible.
Remarks: Under a strictly convex loss function, nothing is gained by allowing estimators to be randomized; moreover, if two estimators $t$ and $t'$ have the same risk function, i.e. $R(\theta, t) = R(\theta, t')$ for all $\theta$, then $t = t'$ with probability 1.
Minimax Estimator
An estimator $T^*$ is defined to be a minimax estimator if and only if
$$\sup_\theta R(\theta, t^*) \le \sup_\theta R(\theta, t) \quad \text{for every estimator } t.$$
Theorem: If $t_g$ is a Bayes estimator with respect to a prior $g$ and its risk $R(\theta, t_g)$ is constant in $\theta$, then:
i. $t_g$ is minimax;
ii. if $t_g$ is the unique Bayes solution with respect to $g$, it is the unique minimax procedure.
Example: Suppose that $\Theta = \{\theta_1, \theta_2\}$, where $\theta_1$ corresponds to oil and $\theta_2$ to no oil. Let $A = \{a_1, a_2, a_3\}$, where $a_i$ corresponds to the choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the losses for the decision problem.
| | Drill $a_1$ | Sell $a_2$ | Partial $a_3$ |
|---|---|---|---|
| Oil $\theta_1$ | 0 | 10 | 5 |
| No oil $\theta_2$ | 12 | 1 | 6 |

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on. An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values coded as 0 and 1, given by

| $P(x \mid \theta)$ | $x = 0$ | $x = 1$ |
|---|---|---|
| Oil $\theta_1$ | 0.3 | 0.7 |
| No oil $\theta_2$ | 0.6 | 0.4 |

When there is oil, $x = 0$ occurs with probability 0.3 and $x = 1$ occurs with probability 0.7, i.e. $P(x = 0 \mid \theta_1) = 0.3$ and $P(x = 1 \mid \theta_1) = 0.7$. There are $3^2 = 9$ possible decision rules $\delta_i : \{0, 1\} \to A$:

| $x$ | $\delta_1$ | $\delta_2$ | $\delta_3$ | $\delta_4$ | $\delta_5$ | $\delta_6$ | $\delta_7$ | $\delta_8$ | $\delta_9$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | $a_1$ | $a_1$ | $a_1$ | $a_2$ | $a_2$ | $a_2$ | $a_3$ | $a_3$ | $a_3$ |
| 1 | $a_1$ | $a_2$ | $a_3$ | $a_1$ | $a_2$ | $a_3$ | $a_1$ | $a_2$ | $a_3$ |

Thus, computing $R(\theta, \delta_i) = \sum_x l(\theta, \delta_i(x))\, P(x \mid \theta)$, we get:

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $R(\theta_1, \delta_i)$ | 0 | 7 | 3.5 | 3 | 10 | 6.5 | 1.5 | 8.5 | 5 |
| $R(\theta_2, \delta_i)$ | 12 | 7.6 | 9.6 | 5.4 | 1 | 3 | 8.4 | 4 | 6 |
| $\max\{R(\theta_1, \delta_i), R(\theta_2, \delta_i)\}$ | 12 | 7.6 | 9.6 | 5.4 | 10 | 6.5 | 8.4 | 8.5 | 6 |

The smallest maximum risk is 5.4, attained by $\delta_4$. Thus the minimax solution is
$$\delta_4(x) = \begin{cases} a_2 & \text{if } x = 0 \\ a_1 & \text{if } x = 1. \end{cases}$$
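The risk table and minimax rule above can be reproduced by brute-force enumeration of all nine decision rules; a short sketch:

```python
import itertools

# Loss l(theta, a) and sampling distribution P(x | theta) from the tables above
loss = {('oil', 'a1'): 0, ('oil', 'a2'): 10, ('oil', 'a3'): 5,
        ('no_oil', 'a1'): 12, ('no_oil', 'a2'): 1, ('no_oil', 'a3'): 6}
p_x = {('oil', 0): 0.3, ('oil', 1): 0.7,
       ('no_oil', 0): 0.6, ('no_oil', 1): 0.4}

# A rule maps the observation x in {0, 1} to an action: rule[x] is the action
rules = list(itertools.product(['a1', 'a2', 'a3'], repeat=2))

def risk(state, rule):
    return sum(p_x[(state, x)] * loss[(state, rule[x])] for x in (0, 1))

best = min(rules, key=lambda d: max(risk('oil', d), risk('no_oil', d)))
print(best)  # ('a2', 'a1'): take a2 if x = 0, a1 if x = 1, i.e. delta_4
```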
Now suppose the prior distribution assigns probability 0.2 to $\theta_1$ (oil) and 0.8 to $\theta_2$ (no oil). The Bayes risk $R(\delta_i) = 0.2\, R(\theta_1, \delta_i) + 0.8\, R(\theta_2, \delta_i)$ of each rule is:

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $R(\delta_i)$ | 9.6 | 7.48 | 8.38 | 4.92 | 2.8 | 3.7 | 7.02 | 4.9 | 5.8 |

In the Bayesian framework, $\delta$ is preferable to $\delta'$ if and only if it has smaller Bayes risk. If there is a rule $\delta^*$ which attains the minimum Bayes risk, i.e. such that $R(\delta^*) = \min_i R(\delta_i) = 2.8$, then it is called a Bayes rule. From this example we see that rule $\delta_5$, with $R(\delta_5) = 2.8$, is the unique Bayes rule for our prior distribution.
Example: Let $X \sim b(1, p)$, $p \in \{1/4, 1/2\}$, and $A = \{a_1, a_2\}$. Let the loss function $l(p, a)$ be defined by the following table.

| | $a_1$ | $a_2$ |
|---|---|---|
| $p_1 = 1/4$ | 1 | 4 |
| $p_2 = 1/2$ | 3 | 2 |

The four possible decision rules $\delta_i : \{0, 1\} \to A$ are $\delta_1 = (a_1, a_1)$, $\delta_2 = (a_1, a_2)$, $\delta_3 = (a_2, a_1)$, $\delta_4 = (a_2, a_2)$, where $\delta_i = (\delta_i(0), \delta_i(1))$. Computing the risks $R(p, \delta_i) = \sum_x l(p, \delta_i(x))\, P(x \mid p)$:

| | $\delta_1$ | $\delta_2$ | $\delta_3$ | $\delta_4$ |
|---|---|---|---|---|
| $R(p_1, \delta_i)$ | 1 | 7/4 | 13/4 | 4 |
| $R(p_2, \delta_i)$ | 3 | 5/2 | 5/2 | 2 |
| $\max_p R(p, \delta_i)$ | 3 | 5/2 | 13/4 | 4 |

The smallest maximum risk is 5/2, attained by $\delta_2$. Thus the minimax solution is
$$\delta_2(x) = \begin{cases} a_1 & \text{if } x = 0 \\ a_2 & \text{if } x = 1. \end{cases}$$