
Chapter 1 Basic Concepts

Thomas Bayes (1702-1761): two articles from his pen, published posthumously in 1764 by his friend Richard Price.

Laplace (1774): stated the theorem on inverse probability in general form.

Jeffreys (1939): rediscovered Laplace's work.

Example 1:

$y_i,\ i = 1, 2, \ldots, n$: the lifetimes of batteries.

Assume $y_i \sim N(\mu, \sigma^2)$. Then,

$$p(y \mid \mu, \sigma^2) \propto \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right], \quad y = (y_1, \ldots, y_n)^t.$$

To obtain the information about the values of $\mu$ and $\sigma^2$, two methods are available:

(a) Sampling theory (frequentist):


$\mu$ and $\sigma^2$ are the hypothetical true values. We can use

• point estimation: finding statistics $\hat{\mu}(y)$ and $\hat{\sigma}^2(y)$ to estimate $\mu$ and $\sigma^2$, for example,

$$\hat{\mu}(y) = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{\sigma}^2(y) = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}.$$

• interval estimation: finding interval estimates $\left(\hat{\mu}_1(y), \hat{\mu}_2(y)\right)$ and $\left(\hat{\sigma}_1^2(y), \hat{\sigma}_2^2(y)\right)$ for $\mu$ and $\sigma^2$; for example, the interval estimate for $\mu$ is

$$\left(\bar{y} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{y} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right), \quad P\left(Z \geq z_{\alpha/2}\right) = \frac{\alpha}{2}, \quad Z \sim N(0,1).$$
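As a quick illustration, this interval can be computed numerically. The sketch below is not from the notes; the battery-lifetime data and the choice $\alpha = 0.05$ are made up for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical battery lifetimes in hours (illustrative data only).
y = np.array([41.2, 39.8, 44.1, 40.5, 42.9, 38.7, 43.3, 41.0])
n = len(y)
alpha = 0.05

ybar = y.mean()
s = y.std(ddof=1)                  # sample standard deviation s
z = stats.norm.ppf(1 - alpha / 2)  # z_{alpha/2}, so that P(Z >= z) = alpha/2

lower = ybar - z * s / np.sqrt(n)
upper = ybar + z * s / np.sqrt(n)
print(f"{100 * (1 - alpha):.0f}% interval estimate for mu: ({lower:.2f}, {upper:.2f})")
```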

(b) Bayesian approach:
Introduce a prior density $\pi(\mu, \sigma^2)$ for $\mu$ and $\sigma^2$. Then, after some manipulations, the posterior density (the conditional density given $y$) $f(\mu, \sigma^2 \mid y)$ can be obtained. Based on the posterior density, inferences about $\mu$ and $\sigma^2$ can be obtained.
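To make this concrete, the posterior can be approximated numerically on a grid. The sketch below assumes a flat prior $\pi(\mu, \sigma^2) \propto 1$ on a bounded grid and reuses the hypothetical data from the interval example above; the notes do not specify a prior here, so this is illustration only.

```python
import numpy as np

# Hypothetical battery lifetimes (same illustrative data as above).
y = np.array([41.2, 39.8, 44.1, 40.5, 42.9, 38.7, 43.3, 41.0])
n = len(y)

# Grid over (mu, sigma^2); the ranges are assumptions for the example.
mu = np.linspace(35, 48, 300)
sigma2 = np.linspace(0.5, 20, 300)
M, S2 = np.meshgrid(mu, sigma2)

# log p(y | mu, sigma^2) = -(n/2) log sigma^2 - sum_i (y_i - mu)^2 / (2 sigma^2)
loglik = -0.5 * n * np.log(S2) - ((y[:, None, None] - M) ** 2).sum(axis=0) / (2 * S2)

# Flat prior pi(mu, sigma^2) propto 1  =>  posterior propto likelihood.
post = np.exp(loglik - loglik.max())
post /= post.sum()

# Posterior means as simple point summaries.
print("E(mu | y)      ~", (M * post).sum())
print("E(sigma^2 | y) ~", (S2 * post).sum())
```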

Example 2:

$X \sim b(10, p)$: the number of wins for a gambler in 10 bets, where $p$ is the probability of winning. Then,

$$f(x \mid p) = \binom{10}{x} p^x (1-p)^{10-x}, \quad x = 0, 1, 2, \ldots, 10.$$

(a) Sampling theory (frequentist):

To estimate the parameter $p$, we can employ the maximum likelihood principle. That is, we try to find the estimate $\hat{p}$ that maximizes the likelihood function

$$l(p \mid x) = f(x \mid p) = \binom{10}{x} p^x (1-p)^{10-x}.$$

For example, when $x = 10$,

$$l(p \mid x) = l(p \mid 10) = \binom{10}{10} p^{10} (1-p)^{0} = p^{10}.$$
Thus, $\hat{p} = 1$. This is a sensible estimate: since the gambler won every time, the sensible estimate of the probability of winning should be 1. On the other hand, when $x = 0$,
10 
l  p | x   l  p | 0     p 0 1  p   1  p  .
10 0 10

0
 
Thus, $\hat{p} = 0$. Since the gambler lost every time, the sensible estimate of the probability of winning should be 0. In general, when $x = n$,

$$\hat{p} = \frac{n}{10}, \quad n = 0, 1, \ldots, 10,$$

maximizes the likelihood function.
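A quick numerical check of this result (a sketch only; the value $x = 7$ is an arbitrary illustration):

```python
from scipy.stats import binom
from scipy.optimize import minimize_scalar

n, x = 10, 7  # e.g. 7 wins in 10 bets (arbitrary illustrative value)

# Minimize the negative log-likelihood -log l(p | x) over p in (0, 1).
res = minimize_scalar(lambda p: -binom.logpmf(x, n, p),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")

print("numerical MLE:  ", res.x)   # ~ 0.7
print("closed form x/n:", x / n)   # 0.7
```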

(b) Bayesian approach:

$\pi(p)$: the prior density for $p$, i.e., prior beliefs, in terms of probabilities, about the various possible values of $p$ being true.

Let

$$\pi(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1} \sim \text{Beta}(a, b).$$
Thus, if we know the gambler is a professional gambler, then we can use the following beta density function,

$$\pi(p) = 2p \sim \text{Beta}(2, 1),$$

to describe the winning probability $p$ of the gambler.
The plot of the density function is shown below.

[Figure: Beta(2,1) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 2), increasing linearly in $p$.]

Since a professional gambler is likely to win, higher probability is assigned to large values of $p$.

If we know the gambler has bad luck, then we can use the following beta density function,

$$\pi(p) = 2(1-p) \sim \text{Beta}(1, 2),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(1,2) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 2), decreasing linearly in $p$.]

Since a gambler with bad luck is likely to lose, higher probability is assigned to small values of $p$.

If we feel the winning probability is more likely to be around 0.5, then we can use the following beta density function,

$$\pi(p) = 6p(1-p) \sim \text{Beta}(2, 2),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(2,2) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 1.5), symmetric about $p = 0.5$.]

If we don't have any information about the gambler, then we can use the following beta density function,

$$\pi(p) = 1 \sim \text{Beta}(1, 1),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(1,1) prior density; axes: $p$ (0 to 1) vs. prior density, flat at 1.]
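The four priors above can be drawn together for comparison. A minimal sketch using scipy's beta density (the figure styling is an assumption, not from the notes):

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 200)
priors = {"professional gambler": (2, 1), "bad luck": (1, 2),
          "around 0.5": (2, 2), "no information": (1, 1)}

# One curve per prior density pi(p) = Beta(a, b).
for label, (a, b) in priors.items():
    plt.plot(p, beta.pdf(p, a, b), label=f"Beta({a},{b}): {label}")

plt.xlabel("p")
plt.ylabel("prior density")
plt.legend()
plt.show()
```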

The posterior density of $p$ given $x$ is the conditional density of $p$ given $x$:

$$f(p \mid x) = \frac{f(x, p)}{f(x)} = \frac{\text{joint density of } x \text{ and } p}{\text{marginal density of } x} = \frac{f(x \mid p)\,\pi(p)}{f(x)} \propto f(x \mid p)\,\pi(p) = l(p \mid x)\,\pi(p).$$

Thus, the posterior density of $p$ given $x$ is

$$f(p \mid x) \propto \pi(p)\, l(p \mid x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1} \binom{10}{x} p^x (1-p)^{10-x} = c(a, b, x)\, p^{x+a-1} (1-p)^{b+10-x-1}.$$

In fact,

$$f(p \mid x) = \frac{\Gamma(a+b+10)}{\Gamma(x+a)\,\Gamma(b+10-x)} p^{x+a-1} (1-p)^{b+10-x-1} \sim \text{Beta}(x+a,\ b+10-x).$$

Then, we can use some statistic based on the posterior density, for example, the posterior mean

$$E(p \mid x) = \int_0^1 p\, f(p \mid x)\, dp = \frac{x+a}{a+b+10}.$$
When $x = n$,

$$\hat{p} = E(p \mid n) = \frac{a+n}{a+b+10}$$

is different from the maximum likelihood estimate $n/10$.
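The contrast between the posterior mean and the MLE is easy to tabulate. A minimal sketch for $x = 10$ (all wins) under the four priors above:

```python
from scipy.stats import beta

def posterior_mean(a, b, x, n=10):
    """Posterior mean (x + a) / (a + b + n) of the Beta(x + a, b + n - x) posterior."""
    return beta(x + a, b + n - x).mean()

x = 10  # the gambler wins every bet
for a, b in [(2, 1), (1, 2), (2, 2), (1, 1)]:
    print(f"Beta({a},{b}) prior: posterior mean = {posterior_mean(a, b, x):.3f}, "
          f"MLE = {x / 10:.3f}")
```

Note that the posterior mean shrinks the MLE toward the prior mean $a/(a+b)$; with $x = 10$ it stays strictly below 1, unlike $\hat{p} = 1$.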

Note:

$$f(p \mid x) \propto \pi(p)\, l(p \mid x)$$

(the original information about $p$) $\times$ (the information from the data) = the new information about $p$ given the data.

Properties of Bayesian Analysis:


1. Precise assumptions lead to consequent inferences.
2. Bayesian analysis automatically makes use of all the information from the data.
3. Unacceptable inferences must come from inappropriate assumptions, not from inadequacies of the inferential system.
4. Awkward problems encountered in sampling theory do not arise.
5. Bayesian inference provides a satisfactory way of explicitly
introducing and keeping track of assumptions about prior knowledge or
ignorance.
