
Discussion 5 – Loss Function

Cross Entropy Loss


Recap: Multiclass SVM Loss

Given an example with scores s = f(x_i, W), let s_{y_i} denote the score of the correct class and s_j (j ≠ y_i) the scores of the other classes.

Hinge Loss

L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases}
    = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

Squared Hinge Loss

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2

[Figure: class scores for three example images — (Cat 3.2, Car 5.1, Frog -1.7), (Cat 1.3, Car 4.9, Frog 2.0), (Cat 2.2, Car 2.5, Frog -3.1).]
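A minimal NumPy sketch of the hinge loss above, checked against the first set of scores on this slide. The function names are mine, and I assume (as in the original CS231n example) that Cat is the correct class for those scores:

```python
import numpy as np

def svm_hinge_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM loss: L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0          # the j == y_i term is excluded from the sum
    return margins.sum()

def squared_hinge_loss(scores, correct_class, margin=1.0):
    """Squared hinge variant: penalizes margin violations quadratically."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0
    return np.sum(margins ** 2)

# Scores (Cat, Car, Frog) from the slide, with Cat assumed to be the correct class.
scores = np.array([3.2, 5.1, -1.7])
print(svm_hinge_loss(scores, 0))   # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```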
Recap: Regularization

L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.
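A sketch of this full objective for a linear classifier, assuming an L2 regularizer R(W) = Σ W²; `full_loss`, `data_loss_fn`, and `reg_lambda` are placeholder names, and `data_loss_fn` stands for either per-example loss in this discussion:

```python
import numpy as np

def l2_regularization(W):
    """R(W) = sum of squared weights (L2 regularization)."""
    return np.sum(W * W)

def full_loss(W, X, y, data_loss_fn, reg_lambda=1e-3):
    """L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W).

    X: (N, D) inputs, y: (N,) integer labels,
    data_loss_fn(scores, label) -> per-example loss (hinge or cross-entropy).
    """
    scores = X @ W.T                       # f(x_i, W) = W x_i for a linear classifier
    data_loss = np.mean([data_loss_fn(s, yi) for s, yi in zip(scores, y)])
    return data_loss + reg_lambda * l2_regularization(W)
```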
Cross Entropy Loss: Softmax Classifier

We want to interpret the raw classifier scores s = f(x_i, W) as probabilities.

Softmax function:

P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}

Cross entropy loss (negative log-likelihood of the correct class):

L_i = -\ln P(Y = y_i \mid X = x_i) = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

Example (unnormalized scores -> exp -> normalize to probabilities that sum to 1):

  Cat    3.2   -> exp ->  24.53  -> norm -> 0.13
  Car    5.1   -> exp -> 164.02  -> norm -> 0.87
  Frog  -1.7   -> exp ->   0.18  -> norm -> 0.001

With Cat as the correct class, L_i = -ln(0.13) = 2.04.

Slide Credit: Stanford CS231n
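A small NumPy sketch of the softmax function and the per-example cross-entropy loss, reproducing the numbers above (the function names are mine; subtracting the maximum score is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(scores):
    """P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j)."""
    z = scores - np.max(scores)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(scores, correct_class):
    """L_i = -ln P(Y = y_i | X = x_i)."""
    probs = softmax(scores)
    return -np.log(probs[correct_class])

# Scores from the slide: Cat 3.2, Car 5.1, Frog -1.7, with Cat as the correct class.
scores = np.array([3.2, 5.1, -1.7])
print(softmax(scores))                      # ~[0.13, 0.87, 0.001]
print(cross_entropy_loss(scores, 0))        # -ln(0.13) ≈ 2.04
```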
Entropy: Information Theory

Entropy is a measure of uncertainty:

H(P) = -\sum_y P(y) \ln P(y)

Cross entropy:

H(P, Q) = -\sum_y P(y) \ln Q(y)

KL divergence:

D_{KL}(P \| Q) = \sum_y P(y) \ln \frac{P(y)}{Q(y)}

These are related by H(P, Q) = H(P) + D_{KL}(P \| Q), so the cross entropy is smallest when Q matches P.

Example: encoding the outcome of two coin tosses (HH, TT, HT, TH), measured in bits (log base 2).

Uniform distribution Q:
  Outcome                 HH    TT    HT    TH
  Probability Q(y)        0.25  0.25  0.25  0.25
  Bit representation      00    01    10    11
  No. of bits, -log2 Q    2     2     2     2
  -Q log2 Q               0.5   0.5   0.5   0.5
  Expected no. of bits (entropy): H(Q) = 2

Skewed distribution P:
  Outcome                 HH    TT    HT    TH
  Probability P(y)        0.5   0.25  0.125 0.125
  Bit representation      1     01    000   001
  No. of bits, -log2 P    1     2     3     3
  -P log2 P               0.5   0.5   0.375 0.375
  Expected no. of bits (entropy): H(P) = 1.75

Cross entropy H(P, Q): outcomes drawn from P but encoded with the code built for Q:
  Predicted probability Q(y)  0.25  0.25  0.25  0.25
  Bit representation          00    01    10    11
  No. of bits, -log2 Q        2     2     2     2
  -P log2 Q                   1     0.5   0.25  0.25
  Expected no. of bits (cross entropy): H(P, Q) = 2
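A small NumPy sketch of the three quantities on this slide, checked against the coin-toss tables (entropy in bits to match the tables; the loss formulas elsewhere in the slides use the natural log). The function names are mine:

```python
import numpy as np

def entropy(p, base=2):
    """H(P) = -sum_y P(y) log P(y); base 2 gives the answer in bits."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p) / np.log(base))

def cross_entropy(p, q, base=2):
    """H(P, Q) = -sum_y P(y) log Q(y): expected code length when outcomes
    follow P but the code was built for Q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q) / np.log(base))

def kl_divergence(p, q, base=2):
    """D_KL(P || Q) = sum_y P(y) log(P(y) / Q(y)) = H(P, Q) - H(P)."""
    return cross_entropy(p, q, base) - entropy(p, base)

# Two-coin-toss example from the tables above (outcomes HH, TT, HT, TH).
Q = [0.25, 0.25, 0.25, 0.25]     # uniform distribution
P = [0.5, 0.25, 0.125, 0.125]    # skewed distribution

print(entropy(Q))                # 2.0 bits
print(entropy(P))                # 1.75 bits
print(cross_entropy(P, Q))       # 2.0 bits
print(kl_divergence(P, Q))       # 0.25 bits
```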
Cross Entropy Loss: Softmax Classifier

The softmax output is the predicted distribution Q over classes; the label defines the true distribution P, which puts all of its probability on the correct class. The loss is the cross entropy between them:

L_i = H(P, Q) = -\sum_y P(y) \ln Q(y) = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

Example (same scores as before):

  Class   Score   exp      normalize -> Q (predicted)   P (true, one-hot)
  Cat      3.2     24.53   0.13                         1
  Car      5.1    164.02   0.87                         0
  Frog    -1.7      0.18   0.001                        0

Slide Credit: Stanford CS231n
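A quick NumPy check that the cross entropy H(P, Q) with a one-hot P reduces to the negative log-likelihood of the correct class used earlier (variable names are mine):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))    # shift for numerical stability
    return e / e.sum()

scores = np.array([3.2, 5.1, -1.7])        # Cat, Car, Frog
Q = softmax(scores)                        # predicted distribution ~[0.13, 0.87, 0.001]
P = np.array([1.0, 0.0, 0.0])              # true distribution: the image is a cat

# H(P, Q) = -sum_y P(y) ln Q(y); with a one-hot P this is exactly -ln Q(y_true).
H_PQ = -np.sum(P * np.log(Q))
print(H_PQ, -np.log(Q[0]))                 # both ≈ 2.04
```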
Softmax vs. SVM

Worked example for a linear classifier s = W x + b:

  W = [[ 0.01, -0.05,  0.10,  0.05],
       [ 0.70,  0.20,  0.05,  0.16],
       [ 0.00, -0.45, -0.20,  0.03]]
  x = [-15, 22, -44, 56]
  b = [0.0, 0.2, -0.3]

  s = W x + b = [-2.85, 0.86, 0.28]

Hinge Loss (SVM): computed directly from the score differences.

Cross Entropy Loss (Softmax): exp(s) = [0.058, 2.36, 1.32], normalized to probabilities [0.016, 0.631, 0.353].

Slide Credit: Stanford CS231n
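A sketch of both losses for this worked example. The slide does not label the correct class, so I assume it is the third one (index 2), as in the original CS231n figure; with that assumption the per-example losses come out to roughly 1.58 (hinge) and 1.04 (cross entropy):

```python
import numpy as np

W = np.array([[0.01, -0.05,  0.10, 0.05],
              [0.70,  0.20,  0.05, 0.16],
              [0.00, -0.45, -0.20, 0.03]])
x = np.array([-15.0, 22.0, -44.0, 56.0])
b = np.array([0.0, 0.2, -0.3])
y = 2                                      # assumed correct class (third row)

s = W @ x + b                              # [-2.85, 0.86, 0.28]

# Hinge loss (SVM)
margins = np.maximum(0.0, s - s[y] + 1.0)
margins[y] = 0.0
hinge = margins.sum()

# Cross entropy loss (Softmax)
e = np.exp(s - np.max(s))
probs = e / e.sum()                        # ~[0.016, 0.631, 0.353]
xent = -np.log(probs[y])

print(s, hinge, xent)
```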
SVM vs. Softmax

SVM (hinge loss):

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

• Zero loss if the correct class score is the largest among the class scores and is clearly separated from the others by the margin.
• Non-zero loss if the correct class score is not the largest, or is not clearly separated by the margin.

Softmax (cross entropy loss):

L_i = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

• Converts raw scores into probabilities: each between 0 and 1, and the probabilities of all classes sum to 1.
• The loss is the cross entropy between the predicted probability distribution and the true distribution, i.e. the negative log-likelihood of the true class.

Slide Credit: Stanford CS231n
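One way to see the difference numerically: once the correct class clears the margin, the hinge loss is exactly zero and stops changing, while the cross-entropy loss keeps shrinking (but never reaches zero) as the correct score grows. A small sketch with made-up scores:

```python
import numpy as np

def hinge(scores, y):
    m = np.maximum(0.0, scores - scores[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def xent(scores, y):
    e = np.exp(scores - np.max(scores))
    return -np.log(e[y] / e.sum())

for correct_score in [0.5, 1.0, 3.0, 10.0]:
    s = np.array([correct_score, 0.0, 0.0])   # correct class is index 0
    print(correct_score, hinge(s, 0), xent(s, 0))
# hinge is 0 as soon as the margin of 1 is met; cross entropy keeps decreasing.
```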
Recap

• Dataset of (x, y) pairs
• A score function: s = f(x_i, W)
• A loss function:

  Softmax: L_i = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

  SVM: L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

  Full loss: L = \frac{1}{N} \sum_{i=1}^{N} L_i + R(W)   (data loss + regularization loss)

Slide Credit: Stanford CS231n
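Putting the recap together, a compact sketch of the full pipeline for a linear classifier on a small batch; the shapes, the λ value, and the random data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 5, 4, 3                          # examples, input dim, classes (illustrative)
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)
W = 0.01 * rng.normal(size=(C, D))

scores = X @ W.T                           # score function f(x_i, W)

# Per-example SVM (hinge) losses
correct = scores[np.arange(N), y]
margins = np.maximum(0.0, scores - correct[:, None] + 1.0)
margins[np.arange(N), y] = 0.0
svm_losses = margins.sum(axis=1)

# Per-example softmax (cross-entropy) losses
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
softmax_losses = -np.log(probs[np.arange(N), y])

# Full loss: data loss + regularization loss (L2 regularizer assumed)
reg_lambda = 1e-3
full_svm = svm_losses.mean() + reg_lambda * np.sum(W * W)
full_softmax = softmax_losses.mean() + reg_lambda * np.sum(W * W)
print(full_svm, full_softmax)
```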
