Bayesian Logistic Regression
• Exact Bayesian inference for logistic regression,
  p(C1|φ,t) = ∫ σ(wᵀφ) p(w|t) dw,
  is intractable, because the posterior p(w|t) cannot be evaluated in closed form
• Equivalent to finding the Hessian matrix
  S_N⁻¹ = −∇∇ ln p(w|t) = S0⁻¹ + Σ_{n=1}^N y_n(1 − y_n) φ_n φ_nᵀ
Evaluation of Posterior Distribution
• Gaussian prior
p(w)=N(w|m0,S0)
– Where m0 and S0 are hyper-parameters
• Posterior distribution
  p(w|t) ∝ p(w) p(t|w), where t = (t1, …, tN)ᵀ
  – Substituting the likelihood
    p(t|w) = Π_{n=1}^N y_n^{t_n} (1 − y_n)^{1 − t_n}
  – and taking the log gives
    ln p(w|t) = −(1/2)(w − m0)ᵀ S0⁻¹ (w − m0) + Σ_{n=1}^N { t_n ln y_n + (1 − t_n) ln(1 − y_n) } + const
    where y_n = σ(wᵀφ_n)
Gaussian Approximation of Posterior
• Maximize posterior p(w|t) to give
– MAP solution w_MAP
• Done by numerical optimization
– Defines mean of the Gaussian
• Covariance given by
  – the inverse of the matrix of second derivatives of the negative log-likelihood:
    S_N⁻¹ = −∇∇ ln p(w|t) = S0⁻¹ + Σ_{n=1}^N y_n(1 − y_n) φ_n φ_nᵀ
• The resulting Gaussian approximation to the posterior is
    q(w) = N(w | w_MAP, S_N)
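The two steps (numerical optimization for w_MAP, then the Hessian for S_N) can be sketched with Newton's method. This is an illustrative implementation, not the slides' own code; `Phi`, `t`, and the prior below are assumed names:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_approx(Phi, t, m0, S0, n_iter=50):
    """Laplace approximation q(w) = N(w|w_MAP, S_N) for logistic regression.
    w_MAP is found by Newton's method on the negative log posterior;
    S_N^{-1} = S_0^{-1} + sum_n y_n(1-y_n) phi_n phi_n^T."""
    S0_inv = np.linalg.inv(S0)
    w = m0.copy()
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t) + S0_inv @ (w - m0)   # grad of negative log posterior
        R = y * (1.0 - y)
        H = S0_inv + (Phi * R[:, None]).T @ Phi      # Hessian = S_N^{-1}
        w = w - np.linalg.solve(H, grad)             # Newton step
    y = sigmoid(Phi @ w)
    SN = np.linalg.inv(S0_inv + (Phi * (y * (1.0 - y))[:, None]).T @ Phi)
    return w, SN
```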
• Predictive distribution is
  p(C1|t) = ∫ σ(a) p(a) da = ∫ σ(a) N(a | μ_a, σ_a²) da
• The convolution of a sigmoid with a Gaussian is intractable
• Use the probit function instead of the logistic sigmoid
[Figure: the logistic sigmoid plotted over a ∈ [−5, 5], rising from 0 through 0.5 to 1]
Approximation using Probit
p(C1|t) = ∫ σ(a) N(a | μ_a, σ_a²) da
• Use the probit function, which is similar in shape to the logistic sigmoid
  – Defined as Φ(a) = ∫_{−∞}^{a} N(θ | 0, 1) dθ
• Approximate σ(a) by Φ(λa), with λ² = π/8 chosen so the two curves have the same slope at the origin
• Then
  ∫ σ(a) N(a | μ, σ²) da ≈ σ(κ(σ²) μ)
  where κ(σ²) = (1 + πσ²/8)^{−1/2}
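The quality of this approximation can be checked numerically by comparing σ(κ(σ²)μ) against direct quadrature of the convolution. A small sketch, assuming NumPy, with arbitrary example values of μ and σ²:

```python
import numpy as np
from math import pi, sqrt

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kappa(sigma2):
    """kappa(sigma^2) = (1 + pi*sigma^2/8)^(-1/2)."""
    return (1.0 + pi * sigma2 / 8.0) ** -0.5

# Probit-based closed form vs. direct quadrature of ∫ sigma(a) N(a|mu, sigma^2) da
mu, sigma2 = 1.0, 4.0                     # arbitrary illustrative values
a = np.linspace(mu - 10.0 * sqrt(sigma2), mu + 10.0 * sqrt(sigma2), 200001)
gauss = np.exp(-(a - mu) ** 2 / (2.0 * sigma2)) / sqrt(2.0 * pi * sigma2)
exact = np.sum(sigmoid(a) * gauss) * (a[1] - a[0])   # Riemann sum of the convolution
approx = sigmoid(kappa(sigma2) * mu)                 # closed-form approximation
```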
Probit Classification
Applying it to
  p(C1|t) = ∫ σ(a) N(a | μ_a, σ_a²) da
we have
  p(C1|φ,t) ≈ σ(κ(σ_a²) μ_a)
where
  μ_a = w_MAPᵀ φ
  σ_a² = φᵀ S_N φ
The decision boundary corresponding to p(C1|φ,t) = 0.5 is given by
  μ_a = 0
This is the same solution as
  w_MAPᵀ φ = 0
Thus marginalization has no effect on the location of the decision boundary, when minimizing misclassification rate with equal prior probabilities!
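The resulting predictive rule is a one-liner. A sketch assuming NumPy; the w_MAP and S_N values used below are hypothetical, standing in for the output of a Laplace approximation:

```python
import numpy as np
from math import pi

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(phi, w_map, SN):
    """Probit-approximated predictive p(C1|phi,t) ~= sigma(kappa(sigma_a^2) mu_a),
    with mu_a = w_MAP^T phi and sigma_a^2 = phi^T S_N phi."""
    mu_a = w_map @ phi
    var_a = phi @ SN @ phi
    k = (1.0 + pi * var_a / 8.0) ** -0.5
    return sigmoid(k * mu_a)
```

Note that when μ_a = 0 the prediction is exactly 0.5 regardless of the posterior variance, which is the "marginalization has no effect on the decision boundary" point above; away from the boundary, larger variance shrinks the prediction toward 0.5.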
Summary
• Exact Bayesian logistic regression is intractable
• Using the Laplace approximation, the posterior parameter distribution p(w|t) can be approximated as a Gaussian q(w)
• The predictive distribution is a convolution of sigmoids and a Gaussian,
  p(C1|φ) ≈ ∫ σ(wᵀφ) q(w) dw,
  which is in turn approximated using the probit function
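The whole pipeline can be sanity-checked by Monte Carlo: sample w from the Gaussian posterior and average σ(wᵀφ), then compare with the probit-based closed form. The posterior parameters below are made up for illustration:

```python
import numpy as np
from math import pi

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical Laplace posterior q(w) = N(w_MAP, S_N) and test input phi
w_map = np.array([0.5, -1.0])
SN = np.array([[0.3, 0.1],
               [0.1, 0.2]])
phi = np.array([1.0, 0.2])

# Monte Carlo estimate of p(C1|phi) = ∫ sigma(w^T phi) q(w) dw
W = rng.multivariate_normal(w_map, SN, size=200_000)
mc = sigmoid(W @ phi).mean()

# Probit-based closed form: sigma(kappa(sigma_a^2) mu_a)
mu_a = w_map @ phi
var_a = phi @ SN @ phi
approx = sigmoid((1.0 + pi * var_a / 8.0) ** -0.5 * mu_a)
```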