
Discussion 5 – Loss Function

Cross Entropy Loss


Recap: Multiclass SVM Loss

Given an example with scores s = f(x_i, W), let s_{y_i} denote the score of the correct class and s_j (j ≠ y_i) the scores of the other classes.

Hinge Loss

L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases}
    = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

Squared Hinge Loss

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2

[Figure: class scores for three example images — (Cat 3.2, Car 5.1, Frog -1.7), (Cat 1.3, Car 4.9, Frog 2.0), (Cat 2.2, Car 2.5, Frog -3.1).]
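A minimal NumPy sketch of the hinge loss above, checked against the first set of scores on this slide. The function names are mine, and I assume (as in the original CS231n example) that Cat is the correct class for those scores:

```python
import numpy as np

def svm_hinge_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM loss: L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0          # the j == y_i term is excluded from the sum
    return margins.sum()

def squared_hinge_loss(scores, correct_class, margin=1.0):
    """Squared hinge variant: penalizes margin violations quadratically."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0
    return np.sum(margins ** 2)

# Scores (Cat, Car, Frog) from the slide, with Cat assumed to be the correct class.
scores = np.array([3.2, 5.1, -1.7])
print(svm_hinge_loss(scores, 0))   # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```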
Recap: Regularization

L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on the training data.
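A sketch of this full objective for a linear classifier, assuming an L2 regularizer R(W) = Σ W²; `full_loss`, `data_loss_fn`, and `reg_lambda` are placeholder names, and `data_loss_fn` stands for either per-example loss in this discussion:

```python
import numpy as np

def l2_regularization(W):
    """R(W) = sum of squared weights (L2 regularization)."""
    return np.sum(W * W)

def full_loss(W, X, y, data_loss_fn, reg_lambda=1e-3):
    """L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W).

    X: (N, D) inputs, y: (N,) integer labels,
    data_loss_fn(scores, label) -> per-example loss (hinge or cross-entropy).
    """
    scores = X @ W.T                       # f(x_i, W) = W x_i for a linear classifier
    data_loss = np.mean([data_loss_fn(s, yi) for s, yi in zip(scores, y)])
    return data_loss + reg_lambda * l2_regularization(W)
```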
Cross Entropy Loss: Softmax Classifier

We want to interpret the raw classifier scores s = f(x_i, W) as probabilities.

Softmax function:

P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}

Cross entropy loss (negative log-likelihood of the correct class):

L_i = -\ln P(Y = y_i \mid X = x_i) = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

Example (unnormalized scores -> exp -> normalize to probabilities that sum to 1):

  Cat    3.2   -> exp ->  24.53  -> norm -> 0.13
  Car    5.1   -> exp -> 164.02  -> norm -> 0.87
  Frog  -1.7   -> exp ->   0.18  -> norm -> 0.001

With Cat as the correct class, L_i = -ln(0.13) = 2.04.

Slide Credit: Stanford CS231n
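A small NumPy sketch of the softmax function and the per-example cross-entropy loss, reproducing the numbers above (the function names are mine; subtracting the maximum score is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(scores):
    """P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j)."""
    z = scores - np.max(scores)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(scores, correct_class):
    """L_i = -ln P(Y = y_i | X = x_i)."""
    probs = softmax(scores)
    return -np.log(probs[correct_class])

# Scores from the slide: Cat 3.2, Car 5.1, Frog -1.7, with Cat as the correct class.
scores = np.array([3.2, 5.1, -1.7])
print(softmax(scores))                      # ~[0.13, 0.87, 0.001]
print(cross_entropy_loss(scores, 0))        # -ln(0.13) ≈ 2.04
```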
Entropy: Information Theory

Entropy is a measure of uncertainty:

H(P) = -\sum_y P(y) \ln P(y)

Cross entropy:

H(P, Q) = -\sum_y P(y) \ln Q(y)

KL divergence:

D_{KL}(P \| Q) = \sum_y P(y) \ln \frac{P(y)}{Q(y)}

These are related by H(P, Q) = H(P) + D_{KL}(P \| Q), so the cross entropy is smallest when Q matches P.

Example: encoding the outcome of two coin tosses (HH, TT, HT, TH), measured in bits (log base 2).

Uniform distribution Q:
  Outcome                 HH    TT    HT    TH
  Probability Q(y)        0.25  0.25  0.25  0.25
  Bit representation      00    01    10    11
  No. of bits, -log2 Q    2     2     2     2
  -Q log2 Q               0.5   0.5   0.5   0.5
  Expected no. of bits (entropy): H(Q) = 2

Skewed distribution P:
  Outcome                 HH    TT    HT    TH
  Probability P(y)        0.5   0.25  0.125 0.125
  Bit representation      1     01    000   001
  No. of bits, -log2 P    1     2     3     3
  -P log2 P               0.5   0.5   0.375 0.375
  Expected no. of bits (entropy): H(P) = 1.75

Cross entropy H(P, Q): outcomes drawn from P but encoded with the code built for Q:
  Predicted probability Q(y)  0.25  0.25  0.25  0.25
  Bit representation          00    01    10    11
  No. of bits, -log2 Q        2     2     2     2
  -P log2 Q                   1     0.5   0.25  0.25
  Expected no. of bits (cross entropy): H(P, Q) = 2
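A small NumPy sketch of the three quantities on this slide, checked against the coin-toss tables (entropy in bits to match the tables; the loss formulas elsewhere in the slides use the natural log). The function names are mine:

```python
import numpy as np

def entropy(p, base=2):
    """H(P) = -sum_y P(y) log P(y); base 2 gives the answer in bits."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p) / np.log(base))

def cross_entropy(p, q, base=2):
    """H(P, Q) = -sum_y P(y) log Q(y): expected code length when outcomes
    follow P but the code was built for Q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q) / np.log(base))

def kl_divergence(p, q, base=2):
    """D_KL(P || Q) = sum_y P(y) log(P(y) / Q(y)) = H(P, Q) - H(P)."""
    return cross_entropy(p, q, base) - entropy(p, base)

# Two-coin-toss example from the tables above (outcomes HH, TT, HT, TH).
Q = [0.25, 0.25, 0.25, 0.25]     # uniform distribution
P = [0.5, 0.25, 0.125, 0.125]    # skewed distribution

print(entropy(Q))                # 2.0 bits
print(entropy(P))                # 1.75 bits
print(cross_entropy(P, Q))       # 2.0 bits
print(kl_divergence(P, Q))       # 0.25 bits
```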
Cross Entropy Loss: Softmax Classifier

The softmax output is the predicted distribution Q over classes; the label defines the true distribution P, which puts all of its probability on the correct class. The loss is the cross entropy between them:

L_i = H(P, Q) = -\sum_y P(y) \ln Q(y) = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

Example (same scores as before):

  Class   Score   exp      normalize -> Q (predicted)   P (true, one-hot)
  Cat      3.2     24.53   0.13                         1
  Car      5.1    164.02   0.87                         0
  Frog    -1.7      0.18   0.001                        0

Slide Credit: Stanford CS231n
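A quick NumPy check that the cross entropy H(P, Q) with a one-hot P reduces to the negative log-likelihood of the correct class used earlier (variable names are mine):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))    # shift for numerical stability
    return e / e.sum()

scores = np.array([3.2, 5.1, -1.7])        # Cat, Car, Frog
Q = softmax(scores)                        # predicted distribution ~[0.13, 0.87, 0.001]
P = np.array([1.0, 0.0, 0.0])              # true distribution: the image is a cat

# H(P, Q) = -sum_y P(y) ln Q(y); with a one-hot P this is exactly -ln Q(y_true).
H_PQ = -np.sum(P * np.log(Q))
print(H_PQ, -np.log(Q[0]))                 # both ≈ 2.04
```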
Softmax vs. SVM

Worked example for a linear classifier s = W x + b:

  W = [[ 0.01, -0.05,  0.10,  0.05],
       [ 0.70,  0.20,  0.05,  0.16],
       [ 0.00, -0.45, -0.20,  0.03]]
  x = [-15, 22, -44, 56]
  b = [0.0, 0.2, -0.3]

  s = W x + b = [-2.85, 0.86, 0.28]

Hinge Loss (SVM): computed directly from the score differences.

Cross Entropy Loss (Softmax): exp(s) = [0.058, 2.36, 1.32], normalized to probabilities [0.016, 0.631, 0.353].

Slide Credit: Stanford CS231n
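A sketch of both losses for this worked example. The slide does not label the correct class, so I assume it is the third one (index 2), as in the original CS231n figure; with that assumption the per-example losses come out to roughly 1.58 (hinge) and 1.04 (cross entropy):

```python
import numpy as np

W = np.array([[0.01, -0.05,  0.10, 0.05],
              [0.70,  0.20,  0.05, 0.16],
              [0.00, -0.45, -0.20, 0.03]])
x = np.array([-15.0, 22.0, -44.0, 56.0])
b = np.array([0.0, 0.2, -0.3])
y = 2                                      # assumed correct class (third row)

s = W @ x + b                              # [-2.85, 0.86, 0.28]

# Hinge loss (SVM)
margins = np.maximum(0.0, s - s[y] + 1.0)
margins[y] = 0.0
hinge = margins.sum()

# Cross entropy loss (Softmax)
e = np.exp(s - np.max(s))
probs = e / e.sum()                        # ~[0.016, 0.631, 0.353]
xent = -np.log(probs[y])

print(s, hinge, xent)
```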
SVM vs. Softmax

SVM (hinge loss):

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

• Zero loss if the correct class score is the largest among the class scores and is clearly separated from the others by the margin.
• Non-zero loss if the correct class score is not the largest, or is not clearly separated by the margin.

Softmax (cross entropy loss):

L_i = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

• Converts raw scores into probabilities: each between 0 and 1, and the probabilities of all classes sum to 1.
• The loss is the cross entropy between the predicted probability distribution and the true distribution, i.e. the negative log-likelihood of the true class.

Slide Credit: Stanford CS231n
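One way to see the difference numerically: once the correct class clears the margin, the hinge loss is exactly zero and stops changing, while the cross-entropy loss keeps shrinking (but never reaches zero) as the correct score grows. A small sketch with made-up scores:

```python
import numpy as np

def hinge(scores, y):
    m = np.maximum(0.0, scores - scores[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def xent(scores, y):
    e = np.exp(scores - np.max(scores))
    return -np.log(e[y] / e.sum())

for correct_score in [0.5, 1.0, 3.0, 10.0]:
    s = np.array([correct_score, 0.0, 0.0])   # correct class is index 0
    print(correct_score, hinge(s, 0), xent(s, 0))
# hinge is 0 as soon as the margin of 1 is met; cross entropy keeps decreasing.
```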
Recap

• Dataset of (x, y) pairs
• A score function: s = f(x_i, W)
• A loss function:

  Softmax: L_i = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}

  SVM: L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)

  Full loss: L = \frac{1}{N} \sum_{i=1}^{N} L_i + R(W)   (data loss + regularization loss)

Slide Credit: Stanford CS231n
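Putting the recap together, a compact sketch of the full pipeline for a linear classifier on a small batch; the shapes, the λ value, and the random data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 5, 4, 3                          # examples, input dim, classes (illustrative)
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)
W = 0.01 * rng.normal(size=(C, D))

scores = X @ W.T                           # score function f(x_i, W)

# Per-example SVM (hinge) losses
correct = scores[np.arange(N), y]
margins = np.maximum(0.0, scores - correct[:, None] + 1.0)
margins[np.arange(N), y] = 0.0
svm_losses = margins.sum(axis=1)

# Per-example softmax (cross-entropy) losses
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
softmax_losses = -np.log(probs[np.arange(N), y])

# Full loss: data loss + regularization loss (L2 regularizer assumed)
reg_lambda = 1e-3
full_svm = svm_losses.mean() + reg_lambda * np.sum(W * W)
full_softmax = softmax_losses.mean() + reg_lambda * np.sum(W * W)
print(full_svm, full_softmax)
```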
