Sam Roweis
October 2, 2002
Expectations
E[a] = \sum_x p(x) a(x)

Probability
B(x and y) = g[B(x), B(y|x)] = g[B(y), B(x|y)]
Joint Probability
[Figure: the joint distribution p(x,y,z) over axes x and y.]

Conditional Probability
[Figure: the conditional distribution p(x,y|z) over axes x and y.]
Bayes' Rule
p(x|y) = \frac{p(y|x) p(x)}{p(y)} = \frac{p(y|x) p(x)}{\sum_{x'} p(y|x') p(x')}
Marginal Probabilities
p(x) = \sum_y p(x, y)
[Figure: marginals p(x) and p(y) of the joint p(x,y).]

Independence
x and y are independent if p(x, y) = p(x) p(y); they are conditionally independent given z if p(x, y|z) = p(x|z) p(y|z) \forall z.
Entropy
H(p) = -\sum_x p(x) \log p(x)
Expected value of -\log p(x) (a function which depends on p(x)!).
H(p) \geq 0, unless there is only one possible outcome, in which case H(p) = 0.
Maximal value when p is uniform.
Tells you the expected "cost" if each event costs -\log p(event).
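The three facts above (zero for a point mass, maximal for uniform, always nonnegative) can be checked directly; a minimal sketch:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log p(x); terms with p(x)=0 contribute 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

h_uniform = entropy([0.25] * 4)          # equals log 4, the maximum
h_point = entropy([1.0, 0.0, 0.0, 0.0])  # a certain outcome: H = 0
h_skewed = entropy([0.7, 0.1, 0.1, 0.1]) # strictly between the two
```

Skipping the p(x) = 0 terms implements the usual convention 0 log 0 = 0.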
Be Careful!
Jensen's Inequality: for convex functions f, E[f(x)] \geq f(E[x]) (and the reverse for concave functions such as \log).
Probability mass functions p(x = k) (for discrete variables) tell us how likely it is to get a particular value for a random variable (possibly conditioned on the values of some other variables).
We can consider various types of variables: binary/discrete (categorical), continuous, interval, and integer counts.
For each type we'll see some basic probability models which are parametrized families of distributions.
For discrete (categorical) quantities, the most basic parametrization is a probability table \theta with p(x = k) = \theta_k, where \theta_k \geq 0 and \sum_k \theta_k = 1.
Statistics
Exponential Family
p(x|\eta) = h(x) \exp\{\eta^\top T(x) - A(\eta)\}

Poisson
\eta = \log\lambda
T(x) = x
A(\eta) = \lambda = e^\eta
h(x) = \frac{1}{x!}
Multinomial
For k draws:
p(x|\theta) = \frac{k!}{x_1! x_2! \cdots x_n!} \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_n^{x_n}
= h(x) \exp\left\{\sum_{i=1}^{n-1} x_i \log\frac{\theta_i}{\theta_n} + k \log\theta_n\right\}
\eta_i = \log\theta_i - \log\theta_n
T(x_i) = x_i
A(\eta) = -k \log\theta_n = k \log\sum_i e^{\eta_i}
h(x) = k!/(x_1! x_2! \cdots x_n!)
Bernoulli
p(x|\pi) = \pi^x (1-\pi)^{1-x} = \exp\left\{x \log\frac{\pi}{1-\pi} + \log(1-\pi)\right\}
\eta = \log\frac{\pi}{1-\pi}
T(x) = x
A(\eta) = -\log(1-\pi) = \log(1 + e^\eta)
h(x) = 1
The logistic function relates the natural parameter and the chance of heads:
\pi = \frac{1}{1 + e^{-\eta}}
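The logistic map and the log-odds are inverses of each other, and A(\eta) = \log(1 + e^\eta) really is -\log(1-\pi); a quick sketch verifying both identities:

```python
import math

def logistic(eta):
    """Map the natural parameter eta to the chance of heads pi."""
    return 1.0 / (1.0 + math.exp(-eta))

def natural_param(pi):
    """Inverse map: eta = log(pi / (1 - pi)), the log-odds."""
    return math.log(pi / (1.0 - pi))

pi = 0.8                              # illustrative value
eta = natural_param(pi)
A = math.log(1.0 + math.exp(eta))     # should equal -log(1 - pi)
```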
The softmax function relates the basic and natural parameters:
\theta_i = \frac{e^{\eta_i}}{\sum_j e^{\eta_j}}
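The softmax map can be sketched as below; subtracting the max before exponentiating is a standard numerical-stability trick (and a harmless one, since adding a constant to every \eta_j leaves \theta unchanged):

```python
import math

def softmax(etas):
    """theta_i = e^{eta_i} / sum_j e^{eta_j}, shifted for stability."""
    m = max(etas)
    exps = [math.exp(e - m) for e in etas]
    z = sum(exps)
    return [e / z for e in exps]

thetas = softmax([0.0, 1.0, 2.0])   # illustrative natural parameters
```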
Multivariate Gaussian
p(x|\mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right\}
Important Gaussian Facts
All marginals and all conditionals of a joint Gaussian are again Gaussian.
[Figure: joint p(x,y) with marginal p(x) and the conditional slice p(y|x=x0).]
Gaussian (normal)
p(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}
\eta = [\mu/\sigma^2 ;\; -1/2\sigma^2]
T(x) = [x ;\; x^2]
A(\eta) = \log\sigma + \mu^2/2\sigma^2
h(x) = 1/\sqrt{2\pi}
Gaussian Marginals/Conditionals
z = \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix}\right)
x \sim \mathcal{N}(a, A)
y \sim \mathcal{N}(b, B)
x|y \sim \mathcal{N}\left(a + C B^{-1}(y - b),\; A - C B^{-1} C^\top\right)
y|x \sim \mathcal{N}\left(b + C^\top A^{-1}(x - a),\; B - C^\top A^{-1} C\right)
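The conditioning formula is easiest to see with scalar blocks, where the matrix inverses collapse to divisions. A minimal sketch with made-up covariance numbers:

```python
# Gaussian conditioning with scalar blocks (illustrative numbers):
# z = [x, y] with means (a, b) and covariance [[A, C], [C, B]].
a, b = 0.0, 0.0
A, B, C = 2.0, 1.0, 0.6   # var(x), var(y), cov(x, y)

def condition_x_given_y(y):
    """x|y ~ N(a + C B^{-1}(y - b), A - C B^{-1} C)."""
    mean = a + (C / B) * (y - b)
    var = A - (C / B) * C
    return mean, var

mean, var = condition_x_given_y(1.0)
```

Note the conditional variance A - C B^{-1} C does not depend on the observed value of y, and is never larger than the marginal variance A: observing y can only reduce (or leave unchanged) our uncertainty about x.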
Parameterizing Conditionals
When one variable is conditioned on others, the parameters of its density can be made functions of the conditioning variables, e.g. a natural parameter \eta = \theta^\top x with mean f(\eta) for some fixed function f(\cdot).
Moments
For continuous variables, the first two moments are \mu = mean and \sigma^2 = variance.
Distributions can also be parametrized through an energy function:
p(x) \propto \exp\{-H(x)\}
H(x) = \sum_i a_i x_i + \sum_{pairs\; ij} w_{ij} x_i x_j
Learning
Set the parameters of the density functions given some (partially) observed data to maximize likelihood or penalized likelihood:
L(\theta; x) = p(x|\theta)
\ell(\theta; x) = \log p(x|\theta)
Maximize \ell(\theta; D), or \ell(\theta; D) + r(\theta) in the penalized case.
Known Models
Maximum Likelihood
L(\theta; D) = \prod_m p(x^m|\theta)
\ell(\theta; D) = \sum_m \log p(x^m|\theta)
\theta_{ML} = \arg\max_\theta \ell(\theta; D)
Example: Multinomial
\ell(\theta; D) = \log \prod_m \theta_1^{[x^m=1]} \cdots \theta_k^{[x^m=k]} = \sum_k N_k \log\theta_k
where N_k = \sum_m [x^m = k].
Maximizing subject to the constraint \sum_k \theta_k = 1 gives:
\theta_k^{ML} = \frac{N_k}{M}
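So the multinomial ML estimate is just the observed frequencies. A minimal sketch with made-up categorical data:

```python
from collections import Counter

# ML for a discrete/categorical model: theta_k = N_k / M,
# i.e. the fraction of observations taking each value (data is made up).
data = [1, 3, 3, 2, 1, 3, 1, 1]
M = len(data)
counts = Counter(data)                       # N_k for each observed k
theta_ml = {k: n / M for k, n in counts.items()}
```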
Example: Univariate Gaussian
\ell(\mu, \sigma^2; D) = -\frac{M}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_m (x^m - \mu)^2
\partial\ell/\partial\mu = \frac{1}{\sigma^2}\sum_m (x^m - \mu)
\partial\ell/\partial\sigma^2 = -\frac{M}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_m (x^m - \mu)^2
\Rightarrow \mu_{ML} = (1/M)\sum_m x^m
\sigma^2_{ML} = (1/M)\sum_m (x^m)^2 - \mu_{ML}^2
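The two Gaussian estimates are the sample mean and the (biased, 1/M) sample variance; the second-moment form above equals the usual squared-deviation form. A sketch on made-up data:

```python
# ML for a univariate Gaussian (illustrative data):
# mu = sample mean; sigma^2 = (1/M) sum x^2 - mu^2.
data = [1.0, 2.0, 3.0, 4.0]
M = len(data)
mu_ml = sum(data) / M
var_ml = sum(x * x for x in data) / M - mu_ml ** 2
```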
Example: Bernoulli
\ell(\pi; D) = \log \prod_m \pi^{x^m}(1-\pi)^{1-x^m} = \sum_m x^m \log\pi + (1 - x^m)\log(1-\pi)
= N_H \log\pi + N_T \log(1-\pi)
where N_H = \sum_m x^m and N_T = \sum_m (1 - x^m).
\Rightarrow \pi_{ML} = \frac{N_H}{N_H + N_T}
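The coin-flip estimate is just the fraction of heads. The sketch below (made-up flips) also checks that the closed form really beats neighboring values of \pi on the log likelihood:

```python
import math

# Bernoulli ML: pi = N_H / (N_H + N_T), on made-up heads/tails data.
flips = [1, 1, 0, 1, 0, 1, 1, 0]
n_h = sum(flips)
n_t = len(flips) - n_h
pi_ml = n_h / (n_h + n_t)

def loglik(pi):
    """N_H log pi + N_T log(1 - pi)."""
    return n_h * math.log(pi) + n_t * math.log(1 - pi)
```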
[Figure: scatter of data points (x, y) with a fitted regression line.]
Example: Linear Regression
p(y|x, \theta) = gauss(\theta^\top x;\; \sigma^2)
Setting the gradient of the log likelihood to zero:
\sum_m (y^m - \theta^\top x^m) x^m = 0
\Rightarrow \theta_{ML} = (X^\top X)^{-1} X^\top Y
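With a single input feature the normal equations collapse to a ratio of sums, which makes the zero-gradient condition easy to verify. A sketch on made-up (x, y) pairs that lie near y = 2x:

```python
# Least squares for one feature: (X^T X)^{-1} X^T Y reduces to
# sum(x*y) / sum(x*x).  Data below is made up, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]
theta_ml = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# The gradient condition sum_m (y^m - theta x^m) x^m = 0 holds at theta_ml.
residual = sum((y - theta_ml * x) * x for x, y in zip(xs, ys))
```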
Example: Exponential Family
\ell(\eta; D) = \log p(D|\eta) = \sum_m \log h(x^m) - M A(\eta) + \eta^\top \sum_m T(x^m)
\frac{\partial\ell}{\partial\eta} = \sum_m T(x^m) - M \frac{\partial A(\eta)}{\partial\eta}
\Rightarrow \left.\frac{\partial A(\eta)}{\partial\eta}\right|_{\eta_{ML}} = \frac{1}{M}\sum_m T(x^m)
At the ML setting, the gradient of A (the model's expected sufficient statistics) matches the empirical average of the sufficient statistics.
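This moment-matching condition recovers each earlier example as a special case. For the Poisson, A(\eta) = e^\eta = \lambda and T(x) = x, so \partial A/\partial\eta = \lambda must equal the empirical mean of the counts; a sketch with made-up count data:

```python
# Poisson as an exponential family: dA/deta = e^eta = lambda, and the
# condition dA/deta = (1/M) sum T(x^m) says lambda_ML is the sample
# mean of the counts (data below is made up).
counts = [2, 0, 3, 1, 2, 4]
lam_ml = sum(counts) / len(counts)   # empirical average of T(x) = x
```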