1
Each observation is binomial, with likelihood
$$p(y_i|\theta_i) \propto \theta_i^{y_i}(1-\theta_i)^{n_i-y_i}$$
The Laplace approximation for a binomial via moment matching is thus $y_i \sim N[n_i\theta_i,\, n_i\theta_i(1-\theta_i)]$. A Beta distribution has the Laplace approximation
$$\theta^{\alpha-1}(1-\theta)^{\beta-1} \approx N\left[\frac{\alpha-1}{\alpha+\beta-2},\; \frac{(\alpha-1)(\beta-1)}{(\alpha+\beta-2)^3}\right]$$
Thus our prior has the approximation
$$\theta_i \approx w_1\, N\left[\frac{\alpha_1-1}{\alpha_1+\beta_1-2},\; \frac{(\alpha_1-1)(\beta_1-1)}{(\alpha_1+\beta_1-2)^3}\right] + (1-w_1)\, N\left[\frac{\alpha_2-1}{\alpha_2+\beta_2-2},\; \frac{(\alpha_2-1)(\beta_2-1)}{(\alpha_2+\beta_2-2)^3}\right]$$
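As a quick sanity check of the Beta approximation, the exact and approximate densities can be overlaid in R (the alpha and beta values here are arbitrary, purely for illustration):

a <- 4; b <- 7
m <- (a - 1) / (a + b - 2)                 # mode of Beta(a, b)
v <- (a - 1) * (b - 1) / (a + b - 2)^3     # Laplace variance
th <- seq(0.01, 0.99, by = 0.01)
plot(th, dbeta(th, a, b), type = "l", xlab = "theta", ylab = "density")
lines(th, dnorm(th, m, sqrt(v)), lty = 2)  # normal approximation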
I believe we want to calculate the distribution $p(Y|\alpha_j, \beta_j, w_1)$:
$$p(Y|\alpha_j, \beta_j, w_1) = \prod_i^K \int N[n_i\theta_i,\, n_i\theta_i(1-\theta_i)]\; p(\theta_i|\alpha_j, \beta_j, w_1)\, d\theta$$
This would appear to be a complete mess that I spent far too much time attempting to sort out. However, if we multiply the likelihood and the prior together first:
$$\theta_i^{y_i}(1-\theta_i)^{n_i-y_i}\left[w_1\,\theta_i^{\alpha_1-1}(1-\theta_i)^{\beta_1-1} + (1-w_1)\,\theta_i^{\alpha_2-1}(1-\theta_i)^{\beta_2-1}\right] = \sum_j w_j\,\theta_i^{y_i+\alpha_j-1}(1-\theta_i)^{n_i-y_i+\beta_j-1}$$
which can then be approximated more easily by
$$w_1\, N\left(\frac{y_i+\alpha_1-1}{n_i+\alpha_1+\beta_1-2},\; \frac{(y_i+\alpha_1-1)(n_i-y_i+\beta_1-1)}{(n_i+\alpha_1+\beta_1-2)^3}\right) + (1-w_1)\, N\left(\frac{y_i+\alpha_2-1}{n_i+\alpha_2+\beta_2-2},\; \frac{(y_i+\alpha_2-1)(n_i-y_i+\beta_2-1)}{(n_i+\alpha_2+\beta_2-2)^3}\right)$$
This has total log-likelihood
$$\sum_i^n \ln \sum_j^2 w_j \left(\frac{(y_i+\alpha_j-1)(n_i-y_i+\beta_j-1)}{(n_i+\alpha_j+\beta_j-2)^3}\right)^{-1/2} \exp\left[-\left(\frac{y_i}{n_i} - \frac{y_i+\alpha_j-1}{n_i+\alpha_j+\beta_j-2}\right)^2 \bigg/\, 2\,\frac{(y_i+\alpha_j-1)(n_i-y_i+\beta_j-1)}{(n_i+\alpha_j+\beta_j-2)^3}\right]$$
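This log-likelihood is straightforward to evaluate numerically. A sketch in R (helper names are my own; dnorm carries the $2\pi$ constant the expression above drops):

# Mean and variance of the approximated posterior component j
comp_mean <- function(y, n, a, b) (y + a - 1) / (n + a + b - 2)
comp_var  <- function(y, n, a, b)
  (y + a - 1) * (n - y + b - 1) / (n + a + b - 2)^3

# Total log-likelihood of the two-component normal approximation
mix_loglik <- function(y, n, a, b, w1) {
  w <- c(w1, 1 - w1)
  dens <- sapply(1:2, function(j)
    w[j] * dnorm(y / n, comp_mean(y, n, a[j], b[j]),
                 sqrt(comp_var(y, n, a[j], b[j]))))
  sum(log(rowSums(dens)))
}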
For simplicity in further notes, let the mean be $u_j$ and the variance $\sigma_j^2$. We will want to set up an EM algorithm as follows for that mess:
0) Select initial estimates for $\alpha_j^0, \beta_j^0, w_1^0$
1) E-Step: Find the expected value of a hidden variable $\gamma_{i,j}$ for each observation based on current estimates of $\alpha_j^t, \beta_j^t, w_j^t$:
$$\gamma_{i,j} = \frac{(w_j^t/\sigma_j^t)\exp\left[-(y_i/n_i - u_j^t)^2/2(\sigma_j^t)^2\right]}{\sum_k (w_k^t/\sigma_k^t)\exp\left[-(y_i/n_i - u_k^t)^2/2(\sigma_k^t)^2\right]}$$
2) M-Step: Set $w_j^{t+1} = \sum_i^n \gamma_{i,j}/n$ and solve
$$\arg\max_{\alpha_j, \beta_j} \sum_i^n \sum_j^2 \gamma_{i,j}^t \ln\left(\frac{1}{\sigma_j}\exp\left[-(y_i/n_i - u_j)^2/2\sigma_j^2\right]\right)$$
where $u_j$ and $\sigma_j^2$ are the functions of $\alpha_j, \beta_j$ given above.
This is actually a massive pain to solve, and I have not managed to do so. As such, only skeleton code for the EM algorithm is submitted; a sketch of how it might be fleshed out follows.
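That skeleton, reusing comp_mean() and comp_var() from above (again, the names are my own, and the M-step falls back on numerical optimization with optim() in place of the analytic solution I could not find):

em_beta_mixture <- function(y, n, a, b, w1, max_iter = 100) {
  w <- c(w1, 1 - w1)
  for (iter in 1:max_iter) {
    # E-step: responsibilities gamma[i, j]
    dens <- sapply(1:2, function(j)
      dnorm(y / n, comp_mean(y, n, a[j], b[j]),
            sqrt(comp_var(y, n, a[j], b[j]))))
    gam <- sweep(dens, 2, w, "*")
    gam <- gam / rowSums(gam)
    # M-step: update the weights, then maximize the weighted
    # log-likelihood over (alpha_j, beta_j) numerically
    w <- colMeans(gam)
    for (j in 1:2) {
      obj <- function(par)
        -sum(gam[, j] *
             dnorm(y / n, comp_mean(y, n, par[1], par[2]),
                   sqrt(comp_var(y, n, par[1], par[2])), log = TRUE))
      fit <- optim(c(a[j], b[j]), obj, method = "L-BFGS-B",
                   lower = c(1 + 1e-6, 1 + 1e-6))  # keep alpha, beta > 1
      a[j] <- fit$par[1]
      b[j] <- fit$par[2]
    }
  }
  list(alpha = a, beta = b, w = w)
}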
2

$u_i$ is Normal$(u, \tau^2)$, so $v_i = a + bu_i$ is Normal$(a+bu,\, b^2\tau^2)$. With the constraint of $\Sigma$ being diagonal, and independent errors for $x$ and $y$ ($\sigma^2 = \sigma_y^2 = \sigma_x^2$), we thus have:
$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim N\left(\begin{pmatrix} u \\ a+bu \end{pmatrix},\; \begin{pmatrix} \tau^2+\sigma^2 & 0 \\ 0 & b^2\tau^2+\sigma^2 \end{pmatrix}\right)$$
Thus the likelihood given $n$ observations is
$$L[(x,y)|a,b,u,\tau,\sigma] = \left(\frac{1}{2\pi\sqrt{(\tau^2+\sigma^2)(b^2\tau^2+\sigma^2)}}\right)^n \exp\left[-\frac{1}{2}\left(\sum_i \frac{(x_i-u)^2}{\tau^2+\sigma^2} + \sum_i \frac{(y_i-a-bu)^2}{b^2\tau^2+\sigma^2}\right)\right]$$
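Although I did not manage to build a sampler around this form, the likelihood itself evaluates directly. A sketch in R (function name mine; dnorm again supplies the normalizing constants):

eiv_loglik <- function(par, x, y) {
  # par = (a, b, u, tau2, sig2): intercept, slope, latent mean, variances
  a <- par[1]; b <- par[2]; u <- par[3]; tau2 <- par[4]; sig2 <- par[5]
  vx <- tau2 + sig2            # marginal variance of x_i
  vy <- b^2 * tau2 + sig2      # marginal variance of y_i
  sum(dnorm(x, u, sqrt(vx), log = TRUE)) +
    sum(dnorm(y, a + b * u, sqrt(vy), log = TRUE))
}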
Given the structure of these two observations (MV Normal with marginal Normals), it makes sense to me to model this as a regression problem. In particular, we have
$$x_i = u_i + \epsilon_1, \qquad y_i = a + bu_i + \epsilon_2, \qquad \epsilon_i \sim N(0, \sigma^2)$$
$$y_i = a + bx_i + \epsilon_2 - b\epsilon_1$$
Thus we end up with:
$$y_i \sim N(a+bx_i,\; b^2\sigma^2+\sigma^2), \qquad Y|X,\sigma^2,a,b \sim N_{56}(a+bX,\; (b^2\sigma^2+\sigma^2)I)$$
However, after considerable issues attempting to sort out how to set this up, I give up and fall back to a regression model with Zellner's g-prior (thus the estimate will have an incorrect variance structure):
$$y|\beta,\sigma^2,X \sim N_{56}(X\beta,\, \sigma^2 I), \qquad P(\beta|\sigma^2,X) \sim N_2(0,\; c\,\sigma^2(X^TX)^{-1}), \qquad P(\sigma^2) \propto \sigma^{-2}$$
Posterior:
$$\propto N_{56}(X\beta,\, \sigma^2 I)\; N_2(0,\; c\,\sigma^2(X^TX)^{-1})\; \sigma^{-2}$$
Conditionals:
$$P(\beta|\sigma^2,y,X) \sim N_2\left(\frac{c}{c+1}\hat{B},\; \frac{c}{c+1}\sigma^2(X^TX)^{-1}\right)$$
$$P(\sigma^2|y,X,\beta) \sim IG\left(\frac{56}{2},\; \frac{s^2}{2} + \frac{1}{2(c+1)}(\beta-\hat{B})^T X^TX(\beta-\hat{B})\right)$$
Using a Gibbs sampler:
1) Select initial $\beta^0$, $\sigma^{2,0}$
2) Update with
$$\sigma^{2,t+1} \sim P(\sigma^2|\beta^t, y, X)$$
$$\beta^{t+1} \sim P(\beta|\sigma^{2,t+1}, y, X)$$
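A minimal sketch of that sampler in R (the function name is mine, and the default g = n for the g-prior constant c is an assumption, not something fixed above):

gibbs_gprior <- function(y, X, g = nrow(X), n_iter = 5000) {
  n    <- nrow(X); p <- ncol(X)
  XtX  <- crossprod(X)
  bhat <- drop(solve(XtX, crossprod(X, y)))  # least-squares estimate
  s2   <- sum((y - X %*% bhat)^2)            # residual sum of squares
  out  <- matrix(NA, n_iter, p + 1)
  colnames(out) <- c(paste0("beta", 1:p), "sigma2")
  beta <- bhat
  for (it in 1:n_iter) {
    # sigma^2 | beta, y, X ~ IG(n/2, s^2/2 + (beta-bhat)'X'X(beta-bhat)/(2(g+1)))
    d    <- beta - bhat
    rate <- s2 / 2 + drop(t(d) %*% XtX %*% d) / (2 * (g + 1))
    sig2 <- 1 / rgamma(1, shape = n / 2, rate = rate)
    # beta | sigma^2, y, X ~ N(g/(g+1) bhat, g/(g+1) sigma^2 (X'X)^{-1})
    beta <- MASS::mvrnorm(1, (g / (g + 1)) * bhat,
                          (g / (g + 1)) * sig2 * solve(XtX))
    out[it, ] <- c(beta, sig2)
  }
  out
}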
We arrive at the estimates:

             Estimate     Variance
a            5.1437016    0.051504521
b           -0.2699121    0.001409253
$\sigma^2$   1.1667646    0.051273862
Convergence plots are in p2.png.

3

Our K observations are $y_i$, the number of survivors out of $n_i$ total, with three factor variables: sex, age, and passenger class. Using a logistic model with a flat prior, we thus have the posterior (in a form that makes it easier to work with in R):
$$p[B|X,(Y,N)] \propto \prod_{i=1}^K (\theta_i)^{y_i}(1-\theta_i)^{n_i-y_i}$$
$$p[B|X,(Y,N)] \propto \prod_{i=1}^K \left(\frac{\exp[X_iB]}{1+\exp[X_iB]}\right)^{y_i} \left(\frac{1}{1+\exp[X_iB]}\right)^{n_i-y_i}$$
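Written that way, the log posterior transcribes almost directly into R (X, y, and n are assumed to already hold the model matrix, survivor counts, and group totals):

log_post <- function(B, X, y, n) {
  eta <- drop(X %*% B)
  # binomial log-likelihood under the logit link; the flat prior adds nothing
  sum(y * eta - n * log1p(exp(eta)))
}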
The initial estimate of B is the maximum likelihood estimator. Metropolis-Hastings then updates by (a sketch follows this list):
1) Generate $B^* \sim N_k(B^{t-1}, T\Sigma)$, with $T$ as a scale factor
2) Calculate $p = \min\left(1,\; \frac{p(B^*|y)}{p(B^{t-1}|y)}\right)$
3) Update $B^t = B^*$ with probability $p$, $B^{t-1}$ with probability $1-p$
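A minimal sketch of those updates, reusing log_post() from above (here Sigma is assumed to be the covariance of the MLE fit and Tscale the scale factor $T$, both my own names):

mh_logistic <- function(X, y, n, B0, Sigma, Tscale = 1, n_iter = 5000) {
  B   <- B0
  lp  <- log_post(B, X, y, n)
  out <- matrix(NA, n_iter, length(B0))
  for (it in 1:n_iter) {
    Bstar  <- MASS::mvrnorm(1, B, Tscale * Sigma)  # step 1: proposal
    lpstar <- log_post(Bstar, X, y, n)
    # steps 2-3: accept with probability min(1, posterior ratio)
    if (log(runif(1)) < lpstar - lp) {
      B  <- Bstar
      lp <- lpstar
    }
    out[it, ] <- B
  }
  out
}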
Our parameter estimates for a no-interaction model are:

              Estimate     Variance
(Intercept)   2.0629318    0.03059069
sexmale      -2.4375442    0.02177024
childkid      1.0813796    0.05988409
class2       -1.0128018    0.03982714
class3       -1.7921001    0.03021637
classC       -0.8602009    0.02758146
This is with 5000 samples after a burn-in of 10000. Graphs for convergence are in p3m1a.png. Convergence seems fine. And for the interaction model:
                   Estimate      Variance
(Intercept)         3.8332383     0.4698347
childkid           41.639925     85.9295259
sexmale            -4.5582154     0.4948292
class2             -1.9454472     0.6169035
class3             -3.9939029     0.4848646
classC             -1.8018233     1.031236
childkid:sexmale    0.6690028     0.2806819
childkid:class2   -28.870286    114.4440457
childkid:class3   -41.665613     85.8944413
sexmale:class2      0.2339378     0.7066308
sexmale:class3      3.0609728     0.5219417
sexmale:classC      1.2792995     1.0829465
This is with 10,000 samples after a burn-in of 100,000. Graphs for convergence are in p3m2a.png. We still do not have good convergence for several variables, particularly childkid and its interactions with class. While these are notably different from the MLE estimates, further inspection (i.e., plugging them into the link function and calculating the probability) indicates they make some sense. Effectively these estimates make a stronger statement for a high survival rate of the base case (female, first class) compared with the MLE estimate, which has the base-case probability more or less at 0.5, with the other estimates pulling down the probability (or drastically increasing it!) for various cases. I was not able to satisfactorily calculate Bayes factors - is this because of the flat prior or my mistake?