Hinge Loss
$$L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases} = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$$

[Figure: example scores for one image whose correct class is cat (cat 3.2, car 5.1, frog −1.7); the correct-class score $s_{y_i}$ is compared against the score $s_j$ of every other class.]
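As a minimal numpy sketch of the formula above (the function and variable names are ours, not from the slides), using the cat/car/frog scores:

import numpy as np

def svm_loss(scores, y):
    # Multiclass SVM (hinge) loss for one example:
    # L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0  # the sum skips the correct class j = y_i
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])  # cat, car, frog (correct class: cat)
print(svm_loss(scores, y=0))  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9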
Softmax Function

The softmax function converts the raw scores $s_k$ into probabilities $\frac{e^{s_k}}{\sum_j e^{s_j}}$, and the loss is the negative log-likelihood of the correct class:

$$L_i = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$

Worked example (correct class: cat):

Class   Score $s_k$   exp($s_k$)   Normalized probability
Cat     3.2           24.53        0.13
Car     5.1           164.02       0.869
Frog    −1.7          0.18         0.001

Negative log-likelihood: $L_i = -\ln(0.13) = 2.04$
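The worked example can be checked with a short numpy snippet (a sketch; softmax_loss is an illustrative name):

import numpy as np

def softmax_loss(scores, y):
    # Cross-entropy loss for one example: -ln softmax(scores)[y]
    exp_s = np.exp(scores)        # unnormalized probabilities
    probs = exp_s / exp_s.sum()   # normalize so the probabilities sum to 1
    return -np.log(probs[y])

scores = np.array([3.2, 5.1, -1.7])    # cat, car, frog
exp_s = np.exp(scores)
print(exp_s.round(2))                  # [ 24.53 164.02   0.18]
print((exp_s / exp_s.sum()).round(3))  # [0.13  0.869 0.001]
print(softmax_loss(scores, y=0))       # -ln(0.13) = 2.04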
Entropy and Cross Entropy

Entropy: $H(P) = -\sum_y P(y) \ln P(y)$

Cross entropy: $H(P, Q) = -\sum_y P(y) \ln Q(y)$

Example: two fair coin flips give four equally likely outcomes, each encoded with 2 bits.

Outcome                        HH     TT     HT     TH
Predicted probability $Q(y)$   0.25   0.25   0.25   0.25
Bit representation             00     01     10     11
No. of bits, $-\log_2 Q$       2      2      2      2
$-P \log_2 Q$                  0.5    0.5    0.5    0.5

The expected number of bits (the cross entropy) is $-\sum_y P(y) \log_2 Q(y) = 2$; since the predicted distribution matches the true one here ($Q = P$), this also equals the entropy $H(P) = 2$ bits.
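A small sketch checking the table's arithmetic (the helper names entropy and cross_entropy are ours):

import numpy as np

def entropy(p, base=2):
    # H(P) = -sum_y P(y) log P(y), in bits when base=2
    p = np.asarray(p)
    return float(-(p * np.log(p) / np.log(base)).sum())

def cross_entropy(p, q, base=2):
    # H(P, Q) = -sum_y P(y) log Q(y)
    p, q = np.asarray(p), np.asarray(q)
    return float(-(p * np.log(q) / np.log(base)).sum())

p = [0.25, 0.25, 0.25, 0.25]  # true distribution over HH, TT, HT, TH
q = [0.25, 0.25, 0.25, 0.25]  # predicted distribution (here Q = P)
print(entropy(p))             # 2.0 bits, matching the 2-bit encoding
print(cross_entropy(p, q))    # 2.0 bits; H(P, Q) >= H(P), equal when Q = P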
Cross Entropy Loss: Softmax Classifier
Want to interpret raw classifier scores as probabilities
$$s = f(x_i, W), \qquad P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$

The loss is the negative log-likelihood of the true class:

$$L_i = -\ln P(Y = y_i \mid X = x_i) = -\ln \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$

This is the cross entropy $H(P, Q)$ between the true distribution $P$ (all probability mass on the correct class $y_i$) and the predicted distribution $Q$.

[Figure: the cat example mapped from raw scores through softmax to class probabilities.]
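In practice the exponentials can overflow for large scores, so implementations typically shift the scores by their maximum before exponentiating; the shift cancels in the softmax ratio. A sketch of this standard trick (not shown on the slides):

import numpy as np

def softmax_cross_entropy(scores, y):
    # Numerically stable -ln softmax(scores)[y]:
    # subtracting max(scores) leaves the softmax unchanged but keeps exp() finite
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[y]

print(softmax_cross_entropy(np.array([3.2, 5.1, -1.7]), y=0))  # = 2.04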
SVM (hinge loss):
• Zero loss if the correct class score is the largest among the class scores and is separated from the others by a large margin.
• Non-zero loss if the correct class score is not the largest, or is not clearly separated.

Softmax (cross-entropy loss):
• Converts raw scores into probabilities: each is between 0 and 1, and the probabilities of all classes sum up to 1.
• The loss is the cross entropy between the predicted and the true probability distribution, i.e. the negative log-likelihood of the true class prediction.

The difference in behavior is sketched below.
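A toy numerical contrast (all numbers here are ours): once the correct class beats every other score by more than the margin, the hinge loss is exactly zero and stops changing, while the cross-entropy loss stays positive and keeps shrinking as the correct score grows.

import numpy as np

def hinge(scores, y):
    m = np.maximum(0.0, scores - scores[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def xent(scores, y):
    e = np.exp(scores - scores.max())
    return -np.log(e[y] / e.sum())

for s_correct in [3.0, 5.0, 10.0]:            # correct score, already past the margin
    scores = np.array([s_correct, 1.0, 0.0])  # class 0 is the correct class
    print(s_correct, hinge(scores, 0), round(xent(scores, 0), 4))
# hinge: 0.0 in every row; cross entropy: 0.1698 -> 0.0247 -> 0.0002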
Slide Credit: Stanford CS231n
Recap
• A dataset of (x, y) pairs
• A score function: $s = f(x_i, W)$
• A loss function:

Softmax: $L_i = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

SVM: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Full loss: $L = \dfrac{1}{N} \sum_{i=1}^{N} L_i + R(W)$, the data loss plus the regularization loss

Slide Credit: Stanford CS231n
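A compact sketch of the full loss over a batch, assuming a softmax data loss and L2 regularization for $R(W)$ (the regularization strength lam and the array shapes are illustrative assumptions):

import numpy as np

def full_loss(W, X, y, lam=1e-3):
    # L = (1/N) sum_i L_i + R(W)
    scores = X @ W  # s = f(x_i, W) for the whole batch
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(len(y)), y].mean()  # (1/N) sum_i L_i
    reg_loss = lam * np.sum(W * W)                       # R(W): L2 regularization (assumed)
    return data_loss + reg_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))      # N=5 examples, D=4 features
y = rng.integers(0, 3, size=5)   # labels for 3 classes
W = rng.normal(size=(4, 3))
print(full_loss(W, X, y))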