
Introduction to Bayesian Statistics

Jenný Brynjarsdóttir and Yifang Li


Statistical and Applied Mathematical Sciences Institute (SAMSI)
and North Carolina State University

SAMSI Undergraduate workshop


May 14-18, 2012


Outline

1. Probability
   Conditional Probability
   Example: Special coin
2. Bayesian statistics
   Posterior distribution
   Coin example in Bayesian framework
3. Bayes for the Normal distribution
   Example: Jeremy's IQ
   The Normal-Normal Bayesian analysis
   The interplay of prior information and data
4. Activity


Don't you love probability?

Agatha Christie, The Mirror Crack'd. Toronto: Bantam Books, 1962:

"I think you're begging the question," said Haydock, "and I can see looming ahead one of those terrible exercises in probability where six men have white hats and six men have black hats and you have to work it out by mathematics how likely it is that the hats will get mixed up and in what proportion. If you start thinking about things like that, you would go round the bend. Let me assure you of that!"


Probability is the language of Statistics

There are two schools of thought within Statistics: Frequentist and Bayesian.

Frequentist probability
  The long-run relative frequency of an outcome or event.
  Think: rolling dice, gambling odds, etc.

Subjective probability
  We can put a measure on a degree of belief.
  If we are uncertain about something, we express the uncertainty with a probability distribution.

Bayesians work with both kinds of probabilities.


Conditional probability

Given events A and B in some sample space S,

    P(A | B) = P(A and B) / P(B)

Bayes' Rule:

    P(A | B) = P(B | A) P(A) / P(B)


Inverse probability problems

Turning the conditional probability around:

  We know that if a die is fair we have a 1/6 chance of getting a six.
  We know that if a coin is fair we have a 1/2 chance of getting a head.
  Turn this around: how can we know whether a coin is fair?

[Image of a coin: "Am I fair?"]

  To investigate, we toss the coin! How many times? When are we convinced whether the coin is fair or not?
  As we toss it we get more and more evidence and update our belief about what the probability of getting a head is.
  The tool we use for this updating is Bayes' rule.

Coin example

You have 4 visually identical coins in your pocket: 3 are standard quarters and the 4th is a special coin. The special coin appears identical to the 3 quarters, but has a 70% chance of landing heads up.

You reach into your pocket, randomly select a coin, and toss it. Suppose it lands heads up. What is the probability that the coin is the special coin?

[Image of four coins: "Which one is special?"]


Coin example continued

Start by defining a convenient notation:
  Q = standard quarter,
  S = special (weighted) quarter,
  H = the coin lands with the head facing up

We know the following:
  P(Q) = 0.75 and P(S) = 0.25
  P(H | Q) = 0.50 and P(H | S) = 0.70

What is the probability that the coin is the special quarter S? Using Bayes' Rule:

    P(S | H) = P(H | S) P(S) / P(H) = 0.70 × 0.25 / P(H)

But we also need P(H).



Coin example continued

Law of total probability:

    P(B) = P(B and A) + P(B and Aᶜ)
         = P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ)

Recall that P(Q) = 0.75, P(S) = 0.25, P(H | Q) = 0.50 and P(H | S) = 0.70, so we get

    P(H) = P(H | S) P(S) + P(H | Q) P(Q)
         = 0.70 × 0.25 + 0.50 × 0.75 = 0.55

So we get

    P(S | H) = 0.70 × 0.25 / 0.55 = 0.318
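As a quick check, here is a small Python sketch of this calculation (ours, not from the original slides):

    # Bayes' rule for the special-coin example (illustrative sketch)
    p_S, p_Q = 0.25, 0.75              # priors: special coin vs. standard quarter
    p_H_S, p_H_Q = 0.70, 0.50          # P(heads | special), P(heads | quarter)

    p_H = p_H_S * p_S + p_H_Q * p_Q    # law of total probability: 0.55
    p_S_H = p_H_S * p_S / p_H          # Bayes' rule: ~0.318
    print(p_H, round(p_S_H, 3))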


Coin example continued

What is the probability that the coin is the regular quarter Q?

    P(Q | H) = P(H | Q) P(Q) / [ P(H | S) P(S) + P(H | Q) P(Q) ]
             = 0.50 × 0.75 / (0.70 × 0.25 + 0.50 × 0.75) = 0.682

More general form of Bayes' Rule:

    P(A | B) = P(B | A) P(A) / [ P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ) ]


Bayesian Statistics

Thomas Bayes (1701-1761) introduced Bayes' formula; it was published posthumously in 1763 by his friend Richard Price. A more general formula was presented in 1773 by Pierre-Simon Laplace.

Posterior distribution

Bayes' Rule:

    P(A | B) = P(B | A) P(A) / [ P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ) ]

Posterior distribution:

    π(θ | x) = f(x | θ) π(θ) / ∫ f(x | θ) π(θ) dθ

Prior distribution π(θ): describes our current (prior) knowledge about θ (or A); can be subjective.
Likelihood f(x | θ): the distribution of the data, viewed as a function of the parameter.
Posterior distribution π(θ | x): our updated knowledge about θ (or A) after seeing the data.
Marginal distribution: the denominator ∫ f(x | θ) π(θ) dθ, which normalizes the posterior.
Frequentist inference is based only on the likelihood.

Coin example in Bayesian framework

Suppose we take that same coin and toss it 3 more times, and each toss results in a head. Now what is the probability that the coin is the special quarter?

Previous results: P(Q) = 0.75, P(S) = 0.25, P(H | Q) = 0.50, P(H | S) = 0.70, and

    P(Q | H) = 0.682, P(S | H) = 0.318

We will use Bayes' rule again, but what is our prior, i.e. our current view of P(Q) and P(S)?

    P(Q) = 0.75 or 0.682?
    P(S) = 0.25 or 0.318?

Coin example in Bayesian framework, continued

Likelihood:

    P(HHH | Q) = (0.5)³
    P(HHH | S) = (0.7)³

Posterior:

    P(Q | HHH) = (0.5)³ × 0.682 / [ (0.5)³ × 0.682 + (0.7)³ × 0.318 ] = 0.439
    P(S | HHH) = (0.7)³ × 0.318 / [ (0.7)³ × 0.318 + (0.5)³ × 0.682 ] = 0.561


Coin example in Bayesian framework, continued

In the first round we had a 1/4 chance of having the special coin.
Then we got more information through our first experiment (H), and the chance of having the special coin increased to 31.8%.
After obtaining more data (HHH) we again updated our knowledge, and the chance that we have the special coin is now 56.1%.
Notice how we used the result of the first step, i.e. the posterior probability, as our prior in the next step.
This is an example of a Sequential Bayesian analysis:
"Today's posterior distribution is tomorrow's prior." A small sketch of this updating loop follows below.
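Here is an illustrative Python sketch of the sequential updating (ours, not from the slides): one head at a time, each posterior becomes the next prior.

    # Sequential Bayesian updating for the coin example (illustrative sketch)
    def update(prior_S, p_H_S=0.70, p_H_Q=0.50):
        """Posterior probability of the special coin after observing one head."""
        prior_Q = 1.0 - prior_S
        evidence = p_H_S * prior_S + p_H_Q * prior_Q   # law of total probability
        return p_H_S * prior_S / evidence              # Bayes' rule

    p_S = 0.25                     # initial prior
    for toss in range(4):          # the first head, then three more heads
        p_S = update(p_S)          # today's posterior is tomorrow's prior
        print(f"after head {toss + 1}: P(S | data) = {p_S:.4f}")
    # prints 0.3182, 0.3952, 0.4777, 0.5615 -- the slides' 0.561 differs slightly
    # because they round the intermediate prior to 0.318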


Normal Distribution

Normal density: N(μ, σ²)

    f(x) = 1/√(2πσ²) · exp( −(x − μ)² / (2σ²) )

[Figure: normal density curves for several values of the mean μ and standard deviation σ, illustrating how μ shifts the curve and σ controls its spread]
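As a small illustration (ours, not from the slides), the density can be evaluated directly:

    import math

    # Evaluate the N(mu, sigma^2) density at a point (illustrative sketch)
    def normal_pdf(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

    print(normal_pdf(12.0, 12.0, 2.0))   # peak height 1/sqrt(2*pi*4) ~ 0.1995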

Bayesian analysis: Normal - Normal

Example: Jeremy's IQ

Jeremy, an enthusiastic Georgia Tech student, poses a statistical model for his scores on standard IQ tests. He thinks that, in general, his scores (Y) are normally distributed with unknown mean μ and variance 80. Prior (and expert) opinion is that the IQ of Georgia Tech students, μ, is a normal random variable with mean 110 and variance 120. Jeremy took two IQ tests and scored 98 on the first and 104 on the second. What is the Bayesian estimate of Jeremy's IQ?

The frequentist estimator of μ would be ȳ = (98 + 104)/2 = 101.
To find the Bayesian estimate we want to find the mean of the posterior distribution of Jeremy's IQ.

Bayesian analysis: Normal - Normal

A general setup

Suppose we have independent data Yᵢ sampled from a normal distribution:

    Yᵢ | μ ~ N(μ, σ²),  i = 1, ..., n

Then the sample average X = Ȳ has the following normal distribution:

    Likelihood: X | μ ~ N(μ, σ²/n)

Suppose our prior information about μ can be described with the following normal distribution:

    Prior: μ ~ N(μ₀, σ₀²)

Then what is the posterior distribution for μ?

Bayesian analysis: Normal - Normal

Posterior density:

    p(μ | X) = p(X | μ) p(μ) / ∫ p(X | μ) p(μ) dμ
             ∝ p(X | μ) p(μ)
             ∝ exp{ −(X − μ)² / (2σ²/n) } · exp{ −(μ − μ₀)² / (2σ₀²) }
             ∝ exp{ −(1/2) [ μ² ( 1/(σ²/n) + 1/σ₀² ) − 2μ ( X/(σ²/n) + μ₀/σ₀² ) ] }

(dropping multiplicative terms that do not involve μ).


Bayesian analysis: Normal - Normal

A normal distribution for μ:

    f(μ) ∝ exp{ −(μ − μ̃)² / (2σ̃²) } ∝ exp{ −(1/2) [ μ²/σ̃² − 2μ μ̃/σ̃² ] }

From the last slide:

    p(μ | X) ∝ exp{ −(1/2) [ μ² ( 1/(σ²/n) + 1/σ₀² ) − 2μ ( X/(σ²/n) + μ₀/σ₀² ) ] }

So we can see that the posterior has to be a normal distribution. We simply have to determine μ̃ and σ̃². Set

    σ̃² = ( 1/(σ²/n) + 1/σ₀² )⁻¹ = σ₀² (σ²/n) / (σ₀² + σ²/n)

Then we have

    p(μ | X) ∝ exp{ −(1/(2σ̃²)) [ μ² − 2μ σ̃² ( X/(σ²/n) + μ₀/σ₀² ) ] }

Bayesian analysis: Normal - Normal

From

    p(μ | X) ∝ exp{ −(1/(2σ̃²)) [ μ² − 2μ σ̃² ( X/(σ²/n) + μ₀/σ₀² ) ] }

we see that

    μ̃ = σ̃² ( X/(σ²/n) + μ₀/σ₀² )
       = [ σ₀² (σ²/n) / (σ₀² + σ²/n) ] ( X/(σ²/n) + μ₀/σ₀² )
       = σ₀²/(σ₀² + σ²/n) · X + (σ²/n)/(σ₀² + σ²/n) · μ₀

Bayesian analysis: Normal - Normal

Summary

For

    Likelihood: Ȳ | μ ~ N(μ, σ²/n), where Ȳ is the average of Y₁, ..., Yₙ, and
    Prior: μ ~ N(μ₀, σ₀²)

we have

    Posterior: μ | Ȳ ~ N(μ̃, σ̃²)

where

    μ̃ = σ₀²/(σ₀² + σ²/n) · Ȳ + (σ²/n)/(σ₀² + σ²/n) · μ₀
    σ̃² = σ₀² (σ²/n) / (σ₀² + σ²/n)
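This update is easy to code; here is a small Python sketch of the formulas above (ours, not from the slides):

    # Normal-Normal conjugate update (illustrative sketch of the summary above)
    def normal_normal_posterior(ybar, n, sigma2, mu0, sigma0_2):
        """Posterior mean and variance of mu for Ybar | mu ~ N(mu, sigma2/n)
        and prior mu ~ N(mu0, sigma0_2)."""
        like_var = sigma2 / n
        w = sigma0_2 / (sigma0_2 + like_var)            # weight on the data mean
        post_mean = w * ybar + (1 - w) * mu0
        post_var = sigma0_2 * like_var / (sigma0_2 + like_var)
        return post_mean, post_var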

Bayesian analysis: Normal - Normal

Posterior distribution for Jeremy's IQ

Recall:

    Likelihood: Ȳ | μ ~ N(μ, σ²/n = 80/2)
    Prior: μ ~ N(μ₀ = 110, σ₀² = 120)

The posterior distribution of Jeremy's IQ is normal with mean and variance

    μ̃ = σ₀²/(σ₀² + σ²/n) · Ȳ + (σ²/n)/(σ₀² + σ²/n) · μ₀
       = 120/(120 + 80/2) × 101 + (80/2)/(120 + 80/2) × 110 = 103.25

    σ̃² = σ₀² (σ²/n) / (σ₀² + σ²/n) = 120 × (80/2) / (120 + 80/2) = 30
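Plugging Jeremy's numbers into the update reproduces these values (a self-contained sketch, ours):

    # Jeremy's numbers in the Normal-Normal update (illustrative)
    ybar, n, sigma2, mu0, sigma0_2 = 101.0, 2, 80.0, 110.0, 120.0
    like_var = sigma2 / n                               # 40
    w = sigma0_2 / (sigma0_2 + like_var)                # 0.75, weight on the data
    print(w * ybar + (1 - w) * mu0)                     # 103.25
    print(sigma0_2 * like_var / (sigma0_2 + like_var))  # 30.0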


Bayesian analysis: Normal - Normal

Credible interval for Jeremy's IQ

In Bayesian analysis the result is the posterior distribution. But sometimes we want an estimate of our parameter with uncertainty bounds.

A common Bayesian estimator is the posterior mean, μ̃. Uncertainty bounds are found from percentiles of the posterior distribution. A 95% credible interval for μ is

    μ̃ ± 1.96 σ̃ = 103.25 ± 1.96 √30 = (92.5, 114.0)

Interpretation: there is a 95% chance that Jeremy's IQ (μ) is between 92.5 and 114.
That is NOT the interpretation of the frequentist confidence interval!
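The arithmetic, as a quick sketch (ours):

    import math
    half_width = 1.96 * math.sqrt(30)
    print(103.25 - half_width, 103.25 + half_width)   # ~ (92.5, 114.0)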

Interplay of Prior information and data

The posterior mean is a combination of the prior mean and the data (the sample mean):

    μ̃ = σ₀²/(σ₀² + σ²/n) · Ȳ + (σ²/n)/(σ₀² + σ²/n) · μ₀

If we are very certain of our prior beliefs (σ₀ is small), we go with the prior mean.
If our data are good (σ is small or n is large), we go with the data mean.

Interactive demonstration ... (a small numeric sketch of these weights follows below)
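To see the weights move, here is a tiny sketch (ours, with illustrative numbers) of the weight on the data mean, w = σ₀²/(σ₀² + σ²/n):

    # Weight on the data mean as prior certainty and sample size vary (illustrative)
    sigma2 = 4.0                       # assumed data variance
    for n in (2, 10, 100):
        for sigma0_2 in (0.25, 4.0):   # confident prior vs. vague prior
            w = sigma0_2 / (sigma0_2 + sigma2 / n)
            print(f"n={n:3d}, prior var={sigma0_2}: weight on data = {w:.3f}")

With a confident prior (variance 0.25) and little data (n = 2), the data get weight 0.111; with n = 100 the data dominate regardless of the prior.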


[Four figures: likelihood, prior, and posterior densities under different settings:
 (a) N = 5, prior mean = 5, prior sd = 0.5;
 (b) N = 5, prior mean = 5, prior sd = 2;
 (c) N = 30, prior mean = 5, prior sd = 0.5;
 (d) N = 5, prior mean = 12, prior sd = 0.5]

Connection to Kalman Filter

The posterior variance can alternatively be written as

    σ̃² = σ₀² (σ²/n) / (σ₀² + σ²/n) = ( 1 − σ₀²/(σ₀² + σ²/n) ) σ₀²

The posterior mean can alternatively be written as

    μ̃ = σ₀²/(σ₀² + σ²/n) · Ȳ + (σ²/n)/(σ₀² + σ²/n) · μ₀
       = μ₀ + σ₀²/(σ₀² + σ²/n) · (Ȳ − μ₀)

These alternative forms are what you will see in Nate's lecture on the Kalman Filter - but for the multivariate case.
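A quick numeric check (ours, using Jeremy's numbers) that the two forms agree; the factor σ₀²/(σ₀² + σ²/n) plays the role of the Kalman gain:

    # The "gain" form matches the weighted-average form (illustrative check)
    mu0, sigma0_2, ybar, like_var = 110.0, 120.0, 101.0, 80.0 / 2
    gain = sigma0_2 / (sigma0_2 + like_var)
    print(mu0 + gain * (ybar - mu0))        # 103.25
    print(gain * ybar + (1 - gain) * mu0)   # 103.25, same answer
    print((1 - gain) * sigma0_2)            # 30.0, the posterior variance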

More about Bayesian statistics

The Kalman Filter can be viewed as Bayesian updating, i.e. when the distributions are normal, the variances are known and the model is linear - see Nate's lecture.

When we are not operating with the normal distribution with a known σ, obtaining the posterior distribution can be difficult or not analytically possible. The reason is the integral in

    π(θ | x) = f(x | θ) π(θ) / ∫ f(x | θ) π(θ) dθ

Instead we often try to approximate the posterior density or obtain (approximate) samples from the posterior distribution, e.g. Monte Carlo methods - see the lecture by Alex, Chia and Jessi.
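As a flavor of such approximations, here is a simple grid-approximation sketch (ours; it reuses the Normal-Normal example, where the exact posterior mean 103.25 is known, purely to illustrate replacing the integral with a sum):

    import math

    # Grid approximation of a posterior mean (illustrative sketch)
    def likelihood(mu, ybar=101.0, like_var=40.0):   # Ybar | mu ~ N(mu, 80/2)
        return math.exp(-(ybar - mu) ** 2 / (2 * like_var))

    def prior(mu, mu0=110.0, sigma0_2=120.0):        # mu ~ N(110, 120)
        return math.exp(-(mu - mu0) ** 2 / (2 * sigma0_2))

    grid = [60 + 0.01 * i for i in range(8001)]           # mu values from 60 to 140
    weights = [likelihood(mu) * prior(mu) for mu in grid] # unnormalized posterior
    total = sum(weights)                                  # stands in for the integral
    post_mean = sum(mu * w for mu, w in zip(grid, weights)) / total
    print(post_mean)                                      # ~103.25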



Activity
What is the average height of NBA players?


Hint for determining the prior

[Figure: normal curve with the axis marked from μ − 4σ to μ + 4σ, with 99.7% of the area between μ − 3σ and μ + 3σ]

Almost all (99.7%) samples from a normal distribution fall within 3 standard deviations of the mean.


The end

[Photo: Eyjafjallajökull three weeks before it erupted in 2010]


