Convex Optimization
Lecture 16 - Softmax Regression and Neural Networks
Fall 2017
Softmax Regression Neural Networks
Today’s Lecture
1 Softmax Regression
2 Neural Networks
Outline
1 Softmax Regression
2 Neural Networks
Softmax Regression
extend logistic regression to multi-class classification
hypothesis:
\[
h_x(a) =
\begin{bmatrix} P(b = 1 \mid a; x) \\ \vdots \\ P(b = K \mid a; x) \end{bmatrix}
= \frac{1}{\sum_{k=1}^{K} \exp(x^{(k)T} a)}
\begin{bmatrix} \exp(x^{(1)T} a) \\ \vdots \\ \exp(x^{(K)T} a) \end{bmatrix}
\]
parameters to learn:
\[
x = [x^{(1)}, \ldots, x^{(K)}] \in \mathbb{R}^{n \times K}
\]
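The hypothesis above is easy to evaluate numerically. A minimal NumPy sketch (the function name and toy values are illustrative, not from the lecture); subtracting the maximum score before exponentiating leaves the ratios unchanged but avoids overflow:

```python
import numpy as np

def softmax_hypothesis(x, a):
    """h_x(a): the columns of x are the class parameter vectors x^(k)."""
    scores = x.T @ a           # the K inner products x^(k)T a
    scores -= scores.max()     # shift for numerical stability (ratios unchanged)
    e = np.exp(scores)
    return e / e.sum()         # K class probabilities, summing to 1

# n = 2 features, K = 3 classes (toy parameters, for illustration only)
x = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.4]])
p = softmax_hypothesis(x, np.array([1.0, 2.0]))
```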
likelihood is
\[
L(x) = \prod_{i=1}^{m} \prod_{k=1}^{K}
\left[ \frac{\exp(x^{(k)T} a^{(i)})}{\sum_{j=1}^{K} \exp(x^{(j)T} a^{(i)})} \right]^{1\{b^{(i)} = k\}}
\]
log-likelihood is
\[
\ell(x) = \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{b^{(i)} = k\} \cdot
\log \frac{\exp(x^{(k)T} a^{(i)})}{\sum_{j=1}^{K} \exp(x^{(j)T} a^{(i)})}
\]
gradient is
\[
\frac{\partial \ell(x)}{\partial x^{(k)}} = \sum_{i=1}^{m} a^{(i)}
\left( 1\{b^{(i)} = k\} - \frac{\exp(x^{(k)T} a^{(i)})}{\sum_{j=1}^{K} \exp(x^{(j)T} a^{(i)})} \right)
\]
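The gradient has a convenient matrix form: stacking the indicators 1{b^(i) = k} and the predicted probabilities into m × K arrays turns the sum over examples into one matrix product. An illustrative sketch (variable names are mine):

```python
import numpy as np

def softmax_gradient(x, A, b):
    """Returns an (n, K) array whose k-th column is ∂ℓ/∂x^(k)."""
    scores = A @ x
    scores -= scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    P = e / e.sum(axis=1, keepdims=True)   # P[i, k] = predicted P(b = k | a^(i); x)
    Y = np.zeros_like(P)
    Y[np.arange(len(b)), b] = 1.0          # Y[i, k] = 1{b^(i) = k}
    return A.T @ (Y - P)                   # sums a^(i)(1{b^(i)=k} - P[i,k]) over i

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0, 1, 2])
x = np.zeros((2, 3))
g = softmax_gradient(x, A, b)
x = x + 0.1 * g                # one gradient-ascent step on ℓ
```

Since each row of Y − P sums to zero across classes, the columns of the gradient sum to the zero vector, which gives a quick correctness check.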
Outline
1 Softmax Regression
2 Neural Networks
number of layers: \(n_\ell = 3\)
the activation (i.e., output) of unit \(i\) in layer \(\ell\): \(a_i^{(\ell)}\)
computation:
\[
\begin{aligned}
a_1^{(2)} &= f(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)}) \\
a_2^{(2)} &= f(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)}) \\
a_3^{(2)} &= f(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)}) \\
h_{W,b}(x) &= a_1^{(3)} = f(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)})
\end{aligned}
\]
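The three hidden-unit equations collapse into one matrix-vector product, and the output equation into a second. A small sketch, assuming a sigmoid for the generic activation f (shapes and values are illustrative):

```python
import numpy as np

def f(z):
    """The lecture leaves f generic; a sigmoid is assumed here."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(3, 3))   # W^(1): hidden-layer weights
b1 = np.zeros(3)                           # b^(1)
W2 = rng.normal(scale=0.01, size=(1, 3))   # W^(2): output-layer weights
b2 = np.zeros(1)                           # b^(2)

x = np.array([0.5, -1.0, 2.0])
a2 = f(W1 @ x + b1)    # the three a^(2) equations, as one product
h = f(W2 @ a2 + b2)    # h_{W,b}(x) = a_1^(3)
```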
where
\[
a_j^{(\ell-1)} = f(z_j^{(\ell-1)})
\]
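Written generically, each layer forms its pre-activations z from the previous layer's activations and applies f elementwise, which is exactly the a_j = f(z_j) relation above. A one-function sketch (the helper name is mine):

```python
import numpy as np

def forward_layer(W, b, a_prev, f):
    """One layer: z = W a + b, then a = f(z) applied elementwise."""
    z = W @ a_prev + b
    return f(z), z

# identity weights, zero bias, tanh activation (toy choices for illustration)
a, z = forward_layer(np.eye(2), np.zeros(2), np.array([0.0, 1.0]), np.tanh)
```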
\[
J(W, b; x^{(i)}, y^{(i)}) = \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2
\]
characteristics:
• nonconvex – gradient descent used in practice
• initialization: small random values near 0 (but not all zeros)
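The per-example cost and the suggested initialization can be sketched as follows (function names are illustrative, and the 0.01 scale is an assumption, not a value from the lecture):

```python
import numpy as np

def cost(h, y):
    """J(W, b; x^(i), y^(i)) = (1/2) ||h_{W,b}(x^(i)) - y^(i)||^2."""
    return 0.5 * np.sum((h - y) ** 2)

def init_layer(n_out, n_in, scale=0.01, seed=0):
    """Small random weights near 0: with all-zero weights every hidden
    unit would compute the same function, and symmetry is never broken."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=scale, size=(n_out, n_in)), np.zeros(n_out)

W, b = init_layer(3, 2)
J = cost(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```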