
Chapter 1 Basic Concepts

Thomas Bayes (1702-1761): two articles from his pen, published posthumously in 1764 by his friend Richard Price.

Laplace (1774): stated the theorem on inverse probability in general form.

Jeffreys (1939): rediscovered Laplace's work.

Example 1:

$y_i,\ i = 1, 2, \ldots, n$: the lifetimes of batteries.

Assume $y_i \sim N(\mu, \sigma^2)$. Then,

$$p(y \mid \mu, \sigma^2) \propto \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right], \quad y = (y_1, \ldots, y_n)^t.$$

To obtain the information about the values of $\mu$ and $\sigma^2$, two methods are available:

(a) Sampling theory (frequentist):


$\mu$ and $\sigma^2$ are the hypothetical true values. We can use

• point estimation: finding statistics $\hat{\mu}(y)$ and $\hat{\sigma}^2(y)$ to estimate $\mu$ and $\sigma^2$, for example,

$$\hat{\mu}(y) = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{\sigma}^2(y) = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}.$$

• interval estimation: finding interval estimates $\left(\hat{\mu}_1(y), \hat{\mu}_2(y)\right)$ and $\left(\hat{\sigma}_1^2(y), \hat{\sigma}_2^2(y)\right)$ for $\mu$ and $\sigma^2$; for example, the interval estimate for $\mu$ is

$$\left(\bar{y} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{y} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right), \quad P\left(Z \geq z_{\alpha/2}\right) = \frac{\alpha}{2}, \quad Z \sim N(0,1).$$
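As a quick illustration, this interval can be computed numerically. The sketch below is not from the notes; the battery-lifetime data and the choice $\alpha = 0.05$ are made up for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical battery lifetimes in hours (illustrative data only).
y = np.array([41.2, 39.8, 44.1, 40.5, 42.9, 38.7, 43.3, 41.0])
n = len(y)
alpha = 0.05

ybar = y.mean()
s = y.std(ddof=1)                  # sample standard deviation s
z = stats.norm.ppf(1 - alpha / 2)  # z_{alpha/2}, so that P(Z >= z) = alpha/2

lower = ybar - z * s / np.sqrt(n)
upper = ybar + z * s / np.sqrt(n)
print(f"{100 * (1 - alpha):.0f}% interval estimate for mu: ({lower:.2f}, {upper:.2f})")
```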

(b) Bayesian approach:
Introduce a prior density $\pi(\mu, \sigma^2)$ for $\mu$ and $\sigma^2$. Then, after some manipulations, the posterior density (the conditional density given $y$) $f(\mu, \sigma^2 \mid y)$ can be obtained. Based on the posterior density, inferences about $\mu$ and $\sigma^2$ can be obtained.
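To make this concrete, the posterior can be approximated numerically on a grid. The sketch below assumes a flat prior $\pi(\mu, \sigma^2) \propto 1$ on a bounded grid and reuses the hypothetical data from the interval example above; the notes do not specify a prior here, so this is illustration only.

```python
import numpy as np

# Hypothetical battery lifetimes (same illustrative data as above).
y = np.array([41.2, 39.8, 44.1, 40.5, 42.9, 38.7, 43.3, 41.0])
n = len(y)

# Grid over (mu, sigma^2); the ranges are assumptions for the example.
mu = np.linspace(35, 48, 300)
sigma2 = np.linspace(0.5, 20, 300)
M, S2 = np.meshgrid(mu, sigma2)

# log p(y | mu, sigma^2) = -(n/2) log sigma^2 - sum_i (y_i - mu)^2 / (2 sigma^2)
loglik = -0.5 * n * np.log(S2) - ((y[:, None, None] - M) ** 2).sum(axis=0) / (2 * S2)

# Flat prior pi(mu, sigma^2) propto 1  =>  posterior propto likelihood.
post = np.exp(loglik - loglik.max())
post /= post.sum()

# Posterior means as simple point summaries.
print("E(mu | y)      ~", (M * post).sum())
print("E(sigma^2 | y) ~", (S2 * post).sum())
```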

Example 2:

$X \sim b(10, p)$: the number of wins for a gambler in 10 bets, where $p$ is the probability of winning. Then,

$$f(x \mid p) = \binom{10}{x} p^x (1-p)^{10-x}, \quad x = 0, 1, 2, \ldots, 10.$$

(a) Sampling theory (frequentist):

To estimate the parameter $p$, we can employ the maximum likelihood principle. That is, we try to find the estimate $\hat{p}$ that maximizes the likelihood function

$$l(p \mid x) = f(x \mid p) = \binom{10}{x} p^x (1-p)^{10-x}.$$

For example, when $x = 10$,

$$l(p \mid x) = l(p \mid 10) = \binom{10}{10} p^{10} (1-p)^{0} = p^{10}.$$
Thus, $\hat{p} = 1$. This is a sensible estimate: since the gambler won every time, the sensible estimate of the probability of winning should be 1. On the other hand, when $x = 0$,
10 
l  p | x   l  p | 0     p 0 1  p   1  p  .
10 0 10

0
 
Thus, $\hat{p} = 0$. Since the gambler lost every time, the sensible estimate of the probability of winning should be 0. In general, when $x = n$,

$$\hat{p} = \frac{n}{10}, \quad n = 0, 1, \ldots, 10,$$

maximizes the likelihood function.
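A quick numerical check of this result (a sketch only; the value $x = 7$ is an arbitrary illustration):

```python
from scipy.stats import binom
from scipy.optimize import minimize_scalar

n, x = 10, 7  # e.g. 7 wins in 10 bets (arbitrary illustrative value)

# Minimize the negative log-likelihood -log l(p | x) over p in (0, 1).
res = minimize_scalar(lambda p: -binom.logpmf(x, n, p),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")

print("numerical MLE:  ", res.x)   # ~ 0.7
print("closed form x/n:", x / n)   # 0.7
```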

(b) Bayesian approach:

$\pi(p)$: the prior density for $p$, i.e., prior beliefs, in terms of probabilities, about the various possible values of $p$ being true.

Let

$$\pi(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1} \sim \text{Beta}(a, b).$$
Thus, if we know the gambler is a professional gambler, then we can use the following beta density function,

$$\pi(p) = 2p \sim \text{Beta}(2, 1),$$

to describe the winning probability $p$ of the gambler.
The plot of the density function is shown below.

[Figure: Beta(2,1) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 2), increasing linearly in $p$.]

Since a professional gambler is likely to win, higher probability is assigned to large values of $p$.

If we know the gambler has bad luck, then we can use the following beta density function,

$$\pi(p) = 2(1-p) \sim \text{Beta}(1, 2),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(1,2) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 2), decreasing linearly in $p$.]

Since a gambler with bad luck is likely to lose, higher probability is assigned to small values of $p$.

If we feel the winning probability is more likely to be around 0.5, then we can use the following beta density function,

$$\pi(p) = 6p(1-p) \sim \text{Beta}(2, 2),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(2,2) prior density; axes: $p$ (0 to 1) vs. prior density (0 to 1.5), symmetric about $p = 0.5$.]

If we don't have any information about the gambler, then we can use the following beta density function,

$$\pi(p) = 1 \sim \text{Beta}(1, 1),$$

to describe the winning probability $p$ of the gambler. The plot of the density function is shown below.

[Figure: Beta(1,1) prior density; axes: $p$ (0 to 1) vs. prior density, flat at 1.]
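The four priors above can be drawn together for comparison. A minimal sketch using scipy's beta density (the figure styling is an assumption, not from the notes):

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 200)
priors = {"professional gambler": (2, 1), "bad luck": (1, 2),
          "around 0.5": (2, 2), "no information": (1, 1)}

# One curve per prior density pi(p) = Beta(a, b).
for label, (a, b) in priors.items():
    plt.plot(p, beta.pdf(p, a, b), label=f"Beta({a},{b}): {label}")

plt.xlabel("p")
plt.ylabel("prior density")
plt.legend()
plt.show()
```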

The posterior density of $p$ given $x$ is the conditional density of $p$ given $x$:

$$f(p \mid x) = \frac{f(x, p)}{f(x)} = \frac{\text{joint density of } x \text{ and } p}{\text{marginal density of } x} = \frac{f(x \mid p)\,\pi(p)}{f(x)} \propto f(x \mid p)\,\pi(p) = l(p \mid x)\,\pi(p).$$

Thus, the posterior density of $p$ given $x$ is

$$f(p \mid x) \propto \pi(p)\, l(p \mid x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1} \binom{10}{x} p^x (1-p)^{10-x} = c(a, b, x)\, p^{x+a-1} (1-p)^{b+10-x-1}.$$

In fact,

$$f(p \mid x) = \frac{\Gamma(a+b+10)}{\Gamma(x+a)\,\Gamma(b+10-x)} p^{x+a-1} (1-p)^{b+10-x-1} \sim \text{Beta}(x+a,\ b+10-x).$$

Then, we can use some statistic based on the posterior density, for example, the posterior mean

$$E(p \mid x) = \int_0^1 p\, f(p \mid x)\, dp = \frac{x+a}{a+b+10}.$$
When $x = n$,

$$\hat{p} = E(p \mid n) = \frac{a+n}{a+b+10}$$

is different from the maximum likelihood estimate $n/10$.
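The contrast between the posterior mean and the MLE is easy to tabulate. A minimal sketch for $x = 10$ (all wins) under the four priors above:

```python
from scipy.stats import beta

def posterior_mean(a, b, x, n=10):
    """Posterior mean (x + a) / (a + b + n) of the Beta(x + a, b + n - x) posterior."""
    return beta(x + a, b + n - x).mean()

x = 10  # the gambler wins every bet
for a, b in [(2, 1), (1, 2), (2, 2), (1, 1)]:
    print(f"Beta({a},{b}) prior: posterior mean = {posterior_mean(a, b, x):.3f}, "
          f"MLE = {x / 10:.3f}")
```

Note that the posterior mean shrinks the MLE toward the prior mean $a/(a+b)$; with $x = 10$ it stays strictly below 1, unlike $\hat{p} = 1$.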

Note:

$$f(p \mid x) \propto \pi(p)\, l(p \mid x)$$

(the original information about $p$) $\times$ (the information from the data) = the new information about $p$ given the data.

Properties of Bayesian Analysis:


1. Precise assumptions lead to consequent inferences.
2. Bayesian analysis automatically makes use of all the information from the data.
3. Unacceptable inferences must come from inappropriate assumptions, not from inadequacies of the inferential system.
4. Awkward problems encountered in sampling theory do not arise.
5. Bayesian inference provides a satisfactory way of explicitly
introducing and keeping track of assumptions about prior knowledge or
ignorance.
