Arnaud Doucet
Email: arnaud@cs.ubc.ca
CS students: don't forget to re-register in CS-535D.
2.1 Outline
Bayesian Statistics.
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}.$$
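As a quick numerical sketch of Bayes' rule (posterior proportional to likelihood times prior, normalized by the evidence), here is a grid approximation with purely illustrative numbers not taken from the notes: a Binomial likelihood with x = 7 successes in n = 10 trials and a Uniform(0, 1) prior.

```python
import numpy as np

# Grid approximation of Bayes' rule: posterior ∝ likelihood × prior.
# Illustrative numbers (not from the notes): x = 7 heads in n = 10 tosses,
# Uniform(0, 1) prior on theta; the posterior is then Beta(8, 4).
theta = np.linspace(0.0005, 0.9995, 1000)   # midpoint grid over (0, 1)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                  # Uniform(0, 1) density
likelihood = theta**7 * (1 - theta)**3       # f(x | theta), up to a constant
unnorm = likelihood * prior                  # numerator f(x|theta) pi(theta)
evidence = np.sum(unnorm) * dtheta           # ≈ ∫ f(x|theta) pi(theta) dtheta
posterior = unnorm / evidence                # pi(theta | x)
post_mean = np.sum(theta * posterior) * dtheta
```

The normalization step is exactly the denominator in Bayes' theorem; here the posterior mean should match the Beta(8, 4) mean 8/12.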
Proof:
$$
\begin{aligned}
\operatorname{var}(\theta) &= E\left(\theta^2\right) - \left(E(\theta)\right)^2 \\
&= E\left[E\left(\theta^2 \mid X\right)\right] - \left(E\left[E(\theta \mid X)\right]\right)^2 \\
&= E\left[E\left(\theta^2 \mid X\right)\right] - E\left[\left(E(\theta \mid X)\right)^2\right] \\
&\quad + E\left[\left(E(\theta \mid X)\right)^2\right] - \left(E\left[E(\theta \mid X)\right]\right)^2 \\
&= E\left[\operatorname{var}(\theta \mid X)\right] + \operatorname{var}\left(E(\theta \mid X)\right).
\end{aligned}
$$
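The identity can be checked numerically. The following sketch uses a hypothetical conjugate pair (not from the notes): theta ~ Beta(2, 3) and X | theta ~ Binomial(n, theta), for which E[theta | x] and var(theta | x) are available in closed form by conjugacy.

```python
import numpy as np

# Monte Carlo check of var(theta) = E[var(theta|X)] + var(E[theta|X]).
# Hypothetical model: theta ~ Beta(a, b), X | theta ~ Binomial(n, theta),
# so theta | X = x ~ Beta(a + x, b + n - x) by conjugacy.
rng = np.random.default_rng(0)
a, b, n = 2.0, 3.0, 10
prior_var = a * b / ((a + b) ** 2 * (a + b + 1))   # var(theta) = 0.04

theta = rng.beta(a, b, size=200_000)
x = rng.binomial(n, theta)                          # marginal draws of X
post_mean = (a + x) / (a + b + n)                   # E[theta | X]
post_var = (a + x) * (b + n - x) / ((a + b + n) ** 2 * (a + b + n + 1))
rhs = post_var.mean() + post_mean.var()             # should match var(theta)
```

With 200,000 samples the right-hand side should agree with the closed-form prior variance up to Monte Carlo error.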
We have
$$\pi(x) = \int_0^1 \Pr(X = x \mid \theta)\,\pi(\theta)\,d\theta = \frac{1}{n+1} \quad \text{for } x = 0, \ldots, n.$$
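A short numerical check of this marginal, assuming NumPy and an illustrative n = 10: under a Uniform(0, 1) prior every count x should be equally likely a priori.

```python
import numpy as np
from math import comb

# Check pi(x) = ∫_0^1 C(n, x) theta^x (1 - theta)^{n-x} dtheta = 1/(n + 1)
# for every x = 0, ..., n (illustrative n).
n = 10
G = 100_000
theta = (np.arange(G) + 0.5) / G     # midpoint grid on (0, 1)
marginal = [np.mean(comb(n, x) * theta**x * (1 - theta) ** (n - x))
            for x in range(n + 1)]   # midpoint-rule approximation of each integral
```

All n + 1 entries should be numerically indistinguishable from 1/(n + 1).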
$$\Pr(Y = 1 \mid x) = \int \Pr(Y = 1 \mid \theta, x)\,\pi(\theta \mid x)\,d\theta = \int \theta\,\pi(\theta \mid x)\,d\theta = E\left[\theta \mid x\right] = \frac{x+1}{n+2}.$$
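This is Laplace's rule of succession, and the posterior mean can be checked on a grid. The values n = 10, x = 7 below are illustrative.

```python
import numpy as np

# Rule of succession: after x heads in n tosses under a Uniform prior,
# the predictive probability of a further head is E[theta | x] = (x+1)/(n+2).
n, x = 10, 7
G = 100_000
theta = (np.arange(G) + 0.5) / G           # midpoint grid on (0, 1)
w = theta**x * (1 - theta) ** (n - x)      # unnormalized posterior density
pred = np.sum(theta * w) / np.sum(w)       # E[theta | x], normalization cancels
```

The grid estimate should agree with (x + 1)/(n + 2) = 8/12 to high precision.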
Consider $X_1 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ and $\theta \sim \mathcal{N}(m_0, \sigma_0^2)$. Then

$$\pi(\theta \mid x_1) \propto f(x_1 \mid \theta)\,\pi(\theta) \propto \exp\left( -\frac{(x_1 - \theta)^2}{2\sigma^2} - \frac{(\theta - m_0)^2}{2\sigma_0^2} \right)$$

$$\propto \exp\left( -\frac{\theta^2}{2}\left( \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2} \right) + \theta\left( \frac{x_1}{\sigma^2} + \frac{m_0}{\sigma_0^2} \right) \right)$$

$$\propto \exp\left( -\frac{1}{2\sigma_1^2}\,(\theta - m_1)^2 \right)$$

so that

$$\theta \mid x_1 \sim \mathcal{N}\left(m_1, \sigma_1^2\right)$$

with

$$\frac{1}{\sigma_1^2} = \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}, \quad \text{i.e.} \quad \sigma_1^2 = \frac{\sigma_0^2\,\sigma^2}{\sigma_0^2 + \sigma^2}, \qquad m_1 = \sigma_1^2\left( \frac{x_1}{\sigma^2} + \frac{m_0}{\sigma_0^2} \right).$$
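The update formulas are two lines of arithmetic. A minimal sketch with assumed illustrative values (m0 = 0, sigma0^2 = 4, sigma^2 = 1, x1 = 2, none of which come from the notes):

```python
# One-observation Gaussian conjugate update.
m0, s0sq = 0.0, 4.0      # prior mean and variance (illustrative values)
sigsq = 1.0              # known observation noise variance (assumed)
x1 = 2.0                 # a hypothetical observation

# Precisions add: 1/sigma_1^2 = 1/sigma^2 + 1/sigma_0^2.
s1sq = 1.0 / (1.0 / sigsq + 1.0 / s0sq)
# Posterior mean is a precision-weighted combination of x1 and m0.
m1 = s1sq * (x1 / sigsq + m0 / s0sq)
```

With these numbers the posterior shrinks the observation x1 = 2 toward the prior mean 0: sigma_1^2 = 0.8 and m1 = 1.6.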
To predict the distribution of a new observation $X \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ in light of $x_1$, we use the predictive distribution
$$f(x \mid x_1) = \int f(x \mid \theta)\,\pi(\theta \mid x_1)\,d\theta.$$
Writing $X = \theta + V$ with $V \sim \mathcal{N}(0, \sigma^2)$ independent of $\theta$,
$$E\left[X \mid x_1\right] = E\left[\theta + V \mid x_1\right] = E\left[\theta \mid x_1\right] = m_1.$$
Now assume that you observe a realization $x_2$ of $X_2 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$. The posterior is updated sequentially: the previous posterior plays the role of the prior,
$$\pi(\theta \mid x_1, x_2) \propto f(x_2 \mid \theta)\,\pi(\theta \mid x_1) \propto f(x_1 \mid \theta)\,f(x_2 \mid \theta)\,\pi(\theta).$$
Posterior of $\theta$ at time $n$ is
$$\theta \mid x_1, \ldots, x_n \sim \mathcal{N}\left(m_n, \sigma_n^2\right)$$
where
$$\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}, \quad \text{i.e.} \quad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad m_n = \sigma_n^2\left( \frac{\sum_{i=1}^n x_i}{\sigma^2} + \frac{m_0}{\sigma_0^2} \right).$$
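A sanity check, with illustrative numbers: applying the one-observation update n times should reproduce the batch formulas for m_n and sigma_n^2 exactly.

```python
import numpy as np

# Sequential vs batch Gaussian updating (illustrative values throughout).
rng = np.random.default_rng(1)
m0, s0sq, sigsq = 1.0, 2.0, 0.5
xs = rng.normal(0.3, np.sqrt(sigsq), size=20)   # simulated observations

# Sequential: the current posterior N(m, ssq) is the prior for the next datum.
m, ssq = m0, s0sq
for x in xs:
    ssq_new = 1.0 / (1.0 / sigsq + 1.0 / ssq)
    m = ssq_new * (x / sigsq + m / ssq)
    ssq = ssq_new

# Batch closed form from the notes.
n = len(xs)
sn2 = s0sq * sigsq / (n * s0sq + sigsq)
mn = sn2 * (xs.sum() / sigsq + m0 / s0sq)
```

The two routes agree up to floating-point error, which is exactly the point of the sequential factorization above.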
Asymptotically in $n$, the prior is washed out by the data and
$$E\left[\theta \mid x_1, \ldots, x_n\right] = m_n \approx \widehat{\theta}_{ML} = \frac{1}{n}\sum_{i=1}^n x_i.$$
One can similarly compute the predictive distribution $f(x \mid x_1, \ldots, x_n)$.
Assume you have some counting observations $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{P}(\theta)$; i.e.
$$f(x_i \mid \theta) = e^{-\theta}\,\frac{\theta^{x_i}}{x_i!}$$
and a Gamma prior
$$\pi(\theta) = \mathrm{Ga}(\theta; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha - 1} e^{-\beta\theta}.$$
We have
$$\pi(\theta \mid x_1, \ldots, x_n) = \mathrm{Ga}\left(\theta;\; \alpha + \sum_{i=1}^n x_i,\; \beta + n\right).$$
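The conjugate update can be verified against a brute-force grid computation. The prior parameters and data below are illustrative, not from the notes.

```python
import numpy as np

# Poisson-Gamma conjugacy: with X_i ~ P(theta) i.i.d. and theta ~ Ga(alpha, beta),
# the posterior is Ga(alpha + sum x_i, beta + n). Illustrative data:
alpha, beta = 2.0, 1.0
xs = [3, 1, 4, 2, 2]
a_post = alpha + sum(xs)      # = 14
b_post = beta + len(xs)       # = 6

# Grid check: likelihood × prior, normalized, should have mean a_post/b_post.
theta = np.linspace(1e-4, 15.0, 150_000)
unnorm = theta ** (alpha - 1) * np.exp(-beta * theta)   # Gamma prior kernel
for x in xs:
    unnorm = unnorm * theta**x * np.exp(-theta)         # Poisson kernel (x! cancels)
grid_mean = np.sum(theta * unnorm) / np.sum(unnorm)     # ≈ a_post / b_post
```

The factorials in the Poisson likelihood drop out of the normalized product, so the grid posterior mean matches the Ga(alpha + sum x_i, beta + n) mean.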
If we want to test $H_0: \theta \geq \frac{1}{2}$ vs $H_1: \theta < \frac{1}{2}$ then, in a Bayesian approach, you can simply compute
$$\pi(H_0 \mid x) = 1 - \pi(H_1 \mid x) = \int_{1/2}^{1} \pi(\theta \mid x)\,d\theta.$$
The Bayes factor is the ratio of posterior odds to prior odds:
$$B_{10}(x) = \frac{\pi(H_1 \mid x)}{\pi(H_0 \mid x)} \cdot \frac{\pi(H_0)}{\pi(H_1)}.$$
Bayes factors are not limited to the comparison of models with the
same parameter space.
- if log10(B10) varies between 0 and 0.5, the evidence against H0 is poor,
- if it is between 0.5 and 1, it is substantial,
- if it is between 1 and 2, it is strong,
- if it is above 2, it is decisive.
The Bayes factor tells you whether one should prefer H0 to H1: it does NOT tell you whether any of these models is sensible!
Bayes procedures can be directly used to test a point null hypothesis; i.e. $H_0: \theta = \theta_0$ (that is $\pi_0(\theta) = \delta_{\theta_0}(\theta)$) versus $H_1: \theta \sim \pi_1$, where the prior is then defined as
$$\pi(\theta) = \pi(H_0)\,\pi_0(\theta) + \pi(H_1)\,\pi_1(\theta)$$
and
$$B_{10}(x) = \frac{\pi(x \mid H_1)}{\pi(x \mid H_0)} = \frac{\int f(x \mid \theta)\,\pi_1(\theta)\,d\theta}{f(x \mid \theta_0)}.$$
Assume you have a coin, you toss it 10 times and get x = 10 heads.
Is it biased?
In a Bayesian framework, we test $H_0: \theta = \frac{1}{2}$ versus $H_1: \theta \sim \mathcal{U}\left(\frac{1}{2}, 1\right)$ using
$$B_{10} = \frac{\int_{1/2}^{1} 2\,\theta^{x} (1 - \theta)^{10 - x}\,d\theta}{\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{10 - x}} = \frac{2\int_{1/2}^{1} \theta^{10}\,d\theta}{\left(\frac{1}{2}\right)^{10}} = \frac{2^{11} - 1}{11} \approx 186.$$
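The integral in the numerator is elementary, but it can also be evaluated numerically as a check; the grid below uses the U(1/2, 1) density, which equals 2 on its support.

```python
import numpy as np

# Bayes factor for the coin example: x = 10 heads in n = 10 tosses,
# H0: theta = 1/2 versus H1: theta ~ U(1/2, 1) (density 2 on (1/2, 1)).
x, n = 10, 10
G = 200_000
theta = 0.5 + 0.5 * (np.arange(G) + 0.5) / G          # midpoint grid on (1/2, 1)
num = np.mean(2.0 * theta**x * (1 - theta) ** (n - x)) * 0.5   # pi(x | H1)
den = 0.5**x * 0.5 ** (n - x)                                  # pi(x | H0)
b10 = num / den
```

The exact value is (2^11 - 1)/11 = 2047/11, roughly 186: strong evidence against the fair-coin hypothesis.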
Assume you have $X \mid \theta, \sigma^2 \sim \mathcal{N}(\theta, \sigma^2)$ where $\sigma^2$ is assumed known but $\theta$ is unknown.
We will see that next week, but using vague priors for model selection is a very, very bad idea... (Lindley's paradox).