Unit 3 - Generative Models
where we defined

a = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}    (2)

\sigma(a) is the logistic sigmoid function

\sigma(a) = \frac{1}{1 + \exp(-a)}    (3)

or squashing function, because it maps \mathbb{R} (the whole real axis) into a finite interval.

[Figure: the logistic sigmoid \sigma(a), plotted in red for a \in [-5, 5].]
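As a quick numerical illustration, here is a minimal NumPy sketch of the squashing property (the function name `sigmoid` is our choice, not from the slides):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid of eq. (3): maps the whole real axis into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 11)
print(sigmoid(a))      # all values lie strictly between 0 and 1
print(sigmoid(0.0))    # 0.5: the midpoint of the squashed interval
```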
Probabilistic generative models (cont.)
The inverse of the logistic sigmoid is

a = \ln \frac{\sigma}{1 - \sigma}    (5)

It represents the log of the ratio of probabilities for the two classes, \ln \big( p(C_1 \mid x) / p(C_2 \mid x) \big), since

p(C_1 \mid x) = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)}
= \frac{1}{1 + \exp(-a)}
= \sigma\Big( \underbrace{\ln \tfrac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}}_{a} \Big)    (7)
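The equivalence in (7) is easy to check numerically. A small sketch, using made-up values for the joint terms p(x|C_k)p(C_k) at a single input x:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Made-up joint terms p(x|C1)p(C1) and p(x|C2)p(C2) for one x
j1, j2 = 0.12, 0.03

posterior_bayes = j1 / (j1 + j2)    # direct Bayes' rule
a = np.log(j1 / j2)                 # log odds, eq. (2)
print(posterior_bayes, sigmoid(a))  # both print 0.8, as eq. (7) states
```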
For K > 2 classes, the posterior is the normalized exponential¹

p(C_k \mid x) = \frac{\exp(a_k)}{\sum_j \exp(a_j)}, \qquad a_k = \ln \big( p(x \mid C_k)\, p(C_k) \big)

If a_k \gg a_j for all j \neq k, then p(C_k \mid x) \simeq 1 and p(C_j \mid x) \simeq 0.
We now analyse the consequences of choosing specific forms for the class-conditional densities.
¹ It is a multiclass generalisation of the logistic sigmoid, and it is also known as the softmax function.
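A minimal sketch of the saturation behaviour described above (the helper name `softmax` and the max-subtraction for numerical stability are our choices):

```python
import numpy as np

def softmax(a):
    """Normalized exponential over activations a_k."""
    e = np.exp(a - np.max(a))   # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 1.2, 0.8])))   # comparable a_k: graded posteriors
print(softmax(np.array([10.0, 1.2, 0.8])))  # a_1 >> a_j: p(C_1|x) ~ 1, others ~ 0
```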
Probabilistic generative models
Continuous inputs
Assume that the class-conditional densities p(x \mid C_k) are Gaussian, and explore the form of the resulting posterior probabilities p(C_k \mid x). The Gaussians have different means \mu_k but share the same covariance matrix \Sigma:

p(x \mid C_k) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\Big\{ -\tfrac{1}{2} (x - \mu_k)^{\mathsf T} \Sigma^{-1} (x - \mu_k) \Big\}
Continuous inputs (cont.)
With 2 classes,

p(C_1 \mid x) = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)} = \frac{1}{1 + \exp(-a)} = \sigma(a)

and a = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}, we have

p(C_1 \mid x) = \sigma(w^{\mathsf T} x + w_0)    (9)
where

w = \Sigma^{-1} (\mu_1 - \mu_2)    (10)

w_0 = -\frac{1}{2}\, \mu_1^{\mathsf T} \Sigma^{-1} \mu_1 + \frac{1}{2}\, \mu_2^{\mathsf T} \Sigma^{-1} \mu_2 + \ln \frac{p(C_1)}{p(C_2)}    (11)

The quadratic terms in x from the exponents of the Gaussian densities have cancelled (due to the assumption of common covariance matrices), leading to
◮ a linear function of x in the argument of the logistic sigmoid
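A short sketch evaluating (9)-(11) for made-up Gaussian parameters (all numbers below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative parameters: two 2D Gaussians with a shared covariance matrix
mu1 = np.array([1.0, 1.0])
mu2 = np.array([-1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
p1, p2 = 0.6, 0.4                   # class priors p(C1), p(C2)

Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)                              # eq. (10)
w0 = (-0.5 * mu1 @ Sinv @ mu1
      + 0.5 * mu2 @ Sinv @ mu2
      + np.log(p1 / p2))                            # eq. (11)

x = np.array([0.5, 0.0])
print(sigmoid(w @ x + w0))          # p(C1|x) from eq. (9)
```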
Continuous inputs (cont.)
The left-hand plot shows the class-conditional densities for two classes over a 2D input space.
Decision boundaries are surfaces of constant posterior probability p(C_k \mid x):
◮ given by linear functions of x
◮ hence the decision boundaries are linear in input space
Prior probabilities p(C_k) enter only through the bias parameter w_0, so changes in the priors produce parallel shifts of the decision boundary
◮ and, more generally, of the parallel contours of constant posterior probability
Continuous inputs (cont.)
For the K-class case, using

p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{\sum_{j=1}^{K} p(x \mid C_j)\, p(C_j)} = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}

and a_k = \ln \big( p(x \mid C_k)\, p(C_k) \big), we have

a_k(x) = w_k^{\mathsf T} x + w_{k0}    (12)

where

w_k = \Sigma^{-1} \mu_k    (13)

w_{k0} = -\frac{1}{2}\, \mu_k^{\mathsf T} \Sigma^{-1} \mu_k + \ln p(C_k)    (14)
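Likewise, a sketch of (12)-(14) for a hypothetical K = 3 problem, with the posterior recovered via the normalized exponential (parameters are made up for illustration):

```python
import numpy as np

# Illustrative parameters: K = 3 classes in 2D with shared covariance
mus = np.array([[2.0, 0.0],
                [0.0, 2.0],
                [-2.0, -2.0]])      # rows are the class means mu_k
Sigma = np.eye(2)
priors = np.array([0.3, 0.3, 0.4])

Sinv = np.linalg.inv(Sigma)
W = mus @ Sinv                      # rows are w_k, eq. (13)
w0 = -0.5 * np.einsum('ki,ij,kj->k', mus, Sinv, mus) + np.log(priors)  # eq. (14)

x = np.array([1.0, 0.5])
a = W @ x + w0                      # activations a_k(x), eq. (12)
post = np.exp(a - a.max())
post /= post.sum()                  # normalized exponential -> p(C_k|x)
print(post, post.sum())
```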
Consider first the two-class case, each class having a Gaussian density with shared covariance matrix \Sigma, and suppose we have data \{x_n, t_n\}, n = 1, \ldots, N, with prior probabilities p(C_1) = \pi and p(C_2) = 1 - \pi. For a data point x_n from class C_1 (C_2) we have t_n = 1 (t_n = 0), and thus

p(x_n, C_1) = p(C_1)\, p(x_n \mid C_1) = \pi\, \mathcal{N}(x_n \mid \mu_1, \Sigma)
p(x_n, C_2) = p(C_2)\, p(x_n \mid C_2) = (1 - \pi)\, \mathcal{N}(x_n \mid \mu_2, \Sigma)

The log-likelihood function is

\sum_{n=1}^{N} \Big[ \underbrace{t_n \ln \pi + (1 - t_n) \ln(1 - \pi)}_{\pi} + \underbrace{t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma)}_{\mu_1, \Sigma} + \underbrace{(1 - t_n) \ln \mathcal{N}(x_n \mid \mu_2, \Sigma)}_{\mu_2, \Sigma} \Big]    (15)
Probabilistic generative models
The terms of the log-likelihood that depend on \pi are

\sum_{n=1}^{N} \big[ t_n \ln \pi + (1 - t_n) \ln(1 - \pi) \big]    (16)

Setting the derivative with respect to \pi to zero gives

\pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}    (17)

where N_1 (N_2) denotes the number of data points in class C_1 (C_2).
The maximum likelihood estimate for \pi is simply the fraction of points in C_1. This result is easily generalised to the multiclass case, where again the maximum likelihood estimate of the prior probability associated with class C_k is given by the fraction of the training set points assigned to that class.
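In code, (17) is a one-liner (the labels below are made up):

```python
import numpy as np

t = np.array([1, 1, 0, 1, 0, 0, 1, 1])   # made-up binary targets
pi_ml = t.mean()                          # eq. (17): N1 / N
print(pi_ml)                              # 0.625, the fraction of C1 points
```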
Now consider the maximisation with respect to \mu_1. The relevant terms of the log-likelihood are

\sum_{n=1}^{N} t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma) = -\frac{1}{2} \sum_{n=1}^{N} t_n (x_n - \mu_1)^{\mathsf T} \Sigma^{-1} (x_n - \mu_1) + \text{const}    (18)

Setting the derivative with respect to \mu_1 to zero gives

\mu_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n x_n    (19)

and, similarly,

\mu_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n)\, x_n    (20)
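A sketch of (19)-(20) on synthetic data (the generated sample and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labelled sample: 50 points from C1, 70 from C2 (illustrative)
X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(50, 2)),
               rng.normal([-1.0, -1.0], 1.0, size=(70, 2))])
t = np.array([1] * 50 + [0] * 70)

mu1 = (t[:, None] * X).sum(axis=0) / t.sum()              # eq. (19)
mu2 = ((1 - t)[:, None] * X).sum(axis=0) / (1 - t).sum()  # eq. (20)
print(mu1, mu2)   # close to the true means (1, 1) and (-1, -1)
```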
Probabilistic generative models
Finally, the terms of the log-likelihood that depend on \Sigma are

-\frac{N}{2} \ln |\Sigma| - \frac{N}{2} \mathrm{Tr}\big( \Sigma^{-1} S \big)    (21)
where we have defined

S = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2    (22)

S_1 = \frac{1}{N_1} \sum_{n \in C_1} (x_n - \mu_1)(x_n - \mu_1)^{\mathsf T}    (23)

S_2 = \frac{1}{N_2} \sum_{n \in C_2} (x_n - \mu_2)(x_n - \mu_2)^{\mathsf T}    (24)
Probabilistic generative models
Using the standard maximum likelihood result for a Gaussian, we obtain \Sigma = S, a weighted average of the two class covariance matrices:

\Sigma = S = \frac{N_1}{N} \frac{1}{N_1} \sum_{n \in C_1} (x_n - \mu_1)(x_n - \mu_1)^{\mathsf T} + \frac{N_2}{N} \frac{1}{N_2} \sum_{n \in C_2} (x_n - \mu_2)(x_n - \mu_2)^{\mathsf T}
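Putting the pieces together, a minimal sketch of the full maximum likelihood fit, eqs. (17), (19), (20), and (22)-(24), on synthetic data (all names and numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic labelled data: N1 points from C1, N2 from C2 (illustrative)
X1 = rng.normal([1.0, 1.0], 1.0, size=(60, 2))
X2 = rng.normal([-1.0, -1.0], 1.0, size=(40, 2))
N1, N2 = len(X1), len(X2)
N = N1 + N2

pi = N1 / N                                    # eq. (17)
mu1 = X1.mean(axis=0)                          # eq. (19)
mu2 = X2.mean(axis=0)                          # eq. (20)

S1 = (X1 - mu1).T @ (X1 - mu1) / N1            # eq. (23)
S2 = (X2 - mu2).T @ (X2 - mu2) / N2            # eq. (24)
Sigma = (N1 / N) * S1 + (N2 / N) * S2          # eq. (22): Sigma = S

print(pi)
print(mu1, mu2)
print(Sigma)
```

These estimates plug directly into (9)-(11) to give the posterior p(C_1 | x) for any new input x.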