
Classification: Probabilistic Generative Model
Slide credits
Most of these slides were adapted from:

• Hung-yi Lee (National Taiwan University)
• Kris Kitani (Carnegie Mellon University)
• Noah Snavely (Cornell University)
• Fei-Fei Li (Stanford University)


Classification

$x \rightarrow \text{Function} \rightarrow \text{Class } n$

• Credit scoring
  • Input: income, savings, profession, age, past financial history, ……
  • Output: accept or refuse
• Medical diagnosis
  • Input: current symptoms, age, gender, past medical history, ……
  • Output: which kind of disease
• Handwritten character recognition
  • Input: image of a character; output: which character
• Face recognition
  • Input: image of a face; output: person
Example Application

[Figure: $f(\text{image}) = $ type, shown for three example Pokémon]


Pokémon games (NOT Pokémon cards or Pokémon GO)
Example Application
• HP: hit points, or health; defines how much damage a Pokémon can withstand before fainting (35)
• Attack: the base modifier for normal attacks (e.g., Scratch, Punch) (55)
• Defense: the base damage resistance against normal attacks (40)
• SP Atk: special attack, the base modifier for special attacks (e.g., Fire Blast, Bubble Beam) (50)
• SP Def: the base damage resistance against special attacks (50)
• Speed: determines which Pokémon attacks first each round (90)

(The numbers in parentheses are the stats of one example Pokémon.)

Can we predict the "type" of a Pokémon based on this information?
How to do Classification
• Training data for classification:

$$(x^1, \hat{y}^1),\ (x^2, \hat{y}^2),\ \ldots,\ (x^N, \hat{y}^N)$$

Classification as Regression?
Binary classification as an example.
Training: Class 1 means the target is +1; Class 2 means the target is -1.
Testing: if $y = b + w_1 x_1 + w_2 x_2$ is closer to +1, output class 1; if closer to -1, output class 2.

[Figure: two scatter plots in the $(x_1, x_2)$ plane. Left: the line $b + w_1 x_1 + w_2 x_2 = 0$ separates Class 1 from Class 2. Right: Class 1 examples with $y \gg 1$ drag the boundary toward them, because regression shifts the line to decrease their error.]

Regression penalizes the examples that are "too correct" … (Bishop, p. 186)

• Multiple classes: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3, …… This is problematic: it imposes an ordering and spacing on the classes that usually does not exist.
Ideal Alternatives
• Function (model): $f(x)$:
  if $g(x) > 0$, output = class 1; else, output = class 2
• Loss function:

$$L(f) = \sum_n \delta\big(f(x^n) \neq \hat{y}^n\big)$$

  i.e., the number of times $f$ gets incorrect results on the training data.
• Find the best function:
  • Example: Perceptron, SVM (not today; see the sketch of the loss below)
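As a tiny illustration (the classifier here is hypothetical, not from the slides), the 0/1 loss just counts how many training examples $f$ gets wrong:

```python
# Minimal sketch of the 0/1 loss: count how often f is wrong on training data.
def zero_one_loss(f, xs, ys):
    return sum(1 for x, y_hat in zip(xs, ys) if f(x) != y_hat)

# Hypothetical 1-D classifier: g(x) > 0 -> class 1, else class 2.
g = lambda x: x - 3.0
f = lambda x: 1 if g(x) > 0 else 2
print(zero_one_loss(f, [2.0, 4.0, 5.0], [2, 1, 2]))  # -> 1 (only x = 5.0 is wrong)
```

This loss is not differentiable, which is one reason methods like the Perceptron and SVM optimize surrogate objectives instead.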
Two Boxes

Box 1: P(B1) = 2/3          Box 2: P(B2) = 1/3
P(Blue | B1) = 4/5          P(Blue | B2) = 2/5
P(Green | B1) = 1/5         P(Green | B2) = 3/5

A ball is drawn at random from one of the boxes. Given that it is blue, where does it come from?

$$P(B_1 \mid \text{Blue}) = \frac{P(\text{Blue} \mid B_1)\, P(B_1)}{P(\text{Blue} \mid B_1)\, P(B_1) + P(\text{Blue} \mid B_2)\, P(B_2)}$$
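Plugging in the numbers above gives a worked answer:

$$P(B_1 \mid \text{Blue}) = \frac{(4/5)(2/3)}{(4/5)(2/3) + (2/5)(1/3)} = \frac{8/15}{8/15 + 2/15} = 0.8$$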
Estimating the Probabilities
Estimate the two classes' probabilities from the training data:

Class 1: P(C1), P(x | C1)          Class 2: P(C2), P(x | C2)

Given an $x$, which class does it belong to?

$$P(C_1 \mid x) = \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)}$$

Generative Model: $P(x) = P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)$


Prior

Class 1: Water, P(C1)          Class 2: Normal, P(C2)

Water- and Normal-type Pokémon with ID < 400 are used for training; the rest are used for testing.
Training set: 79 Water, 61 Normal.
P(C1) = 79 / (79 + 61) = 0.56
P(C2) = 61 / (79 + 61) = 0.44
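A minimal sketch of this split and the priors, assuming the Kaggle CSV from the references with columns named '#' and 'Type 1' (the column names are an assumption about that dataset):

```python
# Sketch: train/test split by ID, then class priors from counts.
import pandas as pd

df = pd.read_csv("Pokemon.csv")                    # https://www.kaggle.com/abcsds/pokemon
df = df[df["Type 1"].isin(["Water", "Normal"])]    # keep only the two classes
train = df[df["#"] < 400]                          # ID < 400 for training
test = df[df["#"] >= 400]                          # the rest for testing

n1 = int((train["Type 1"] == "Water").sum())       # 79 in the lecture
n2 = int((train["Type 1"] == "Normal").sum())      # 61 in the lecture
p_c1, p_c2 = n1 / (n1 + n2), n2 / (n1 + n2)        # ≈ 0.56 and 0.44
```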
Probability from Class

P(x | C1) = ?   i.e., P(x | Water) = ?

Each Pokémon is represented as a vector of its attributes: its feature vector.
There are 79 Water-type Pokémon in total.
Probability from Class - Feature
• Consider only Defense and SP Defense:

$$\begin{bmatrix} 48 \\ 50 \end{bmatrix},\ \begin{bmatrix} 65 \\ 64 \end{bmatrix},\ \ldots,\ \begin{bmatrix} 103 \\ 45 \end{bmatrix}$$

P(x | Water) = ? For a new $x$ that never appears in the training set, a raw count would give probability 0.
Instead, assume the points are sampled from a Gaussian distribution.

[Figure: scatter plot of the 79 Water-type Pokémon in the (Defense, SP Defense) plane.]
Gaussian Distribution

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$

Input: vector $x$; output: probability density of sampling $x$.
The shape of the function is determined by the mean $\mu$ and the covariance matrix $\Sigma$.

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad\qquad \mu = \begin{bmatrix} 2 \\ 3 \end{bmatrix},\ \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$

[Figure: density contours in the $(x_1, x_2)$ plane for the two settings; changing $\mu$ shifts the center. Source: https://blog.slinuxer.com/tag/pca]
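A minimal sketch of evaluating this density in Python; scipy.stats.multivariate_normal implements the same $f_{\mu,\Sigma}$, and the direct formula is shown for comparison:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.0],
                  [0.0, 2.0]])
x = np.array([1.0, 1.0])

# Library evaluation of f_{mu,Sigma}(x) ...
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))

# ... and the same value straight from the formula above.
D = len(mu)
d = x - mu
f = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)
print(f)
```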
Gaussian Distribution

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix} \qquad\qquad \mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$

[Figure: with the same mean, changing $\Sigma$ changes the spread and orientation of the contours.]
Probability from Class
Assume the points are sampled from a Gaussian distribution.
Find the Gaussian distribution behind them; it then assigns a probability to any new point $x$ via $f_{\mu,\Sigma}(x)$.

For the Water-type points, the fitted Gaussian is

$$\mu = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$$

How to find $\mu$ and $\Sigma$?
Maximum Likelihood

A Gaussian with any mean and covariance matrix can generate these points, e.g.

$$\mu = \begin{bmatrix} 150 \\ 140 \end{bmatrix},\ \Sigma = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} \qquad\text{or}\qquad \mu = \begin{bmatrix} 75 \\ 75 \end{bmatrix},\ \Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$

but with different likelihood. The likelihood of a Gaussian with mean $\mu$ and covariance $\Sigma$ is the probability (density) of the Gaussian sampling the training points:

$$L(\mu, \Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79})$$
Maximum Likelihood
We have the 79 "Water"-type Pokémon $x^1, x^2, \ldots, x^{79}$.
We assume they are generated from the Gaussian $(\mu^*, \Sigma^*)$ with the maximum likelihood:

$$L(\mu, \Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79})$$

$$\mu^*, \Sigma^* = \arg\max_{\mu, \Sigma} L(\mu, \Sigma)$$

The solution has a closed form:

$$\mu^* = \frac{1}{79} \sum_{n=1}^{79} x^n \ \text{(the average)}, \qquad \Sigma^* = \frac{1}{79} \sum_{n=1}^{79} (x^n - \mu^*)(x^n - \mu^*)^T$$
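A minimal sketch of these closed-form estimates, with placeholder data standing in for the 79 (Defense, SP Defense) vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
water = rng.normal([75.0, 71.3], 25.0, size=(79, 2))  # placeholder training points

mu_star = water.mean(axis=0)                          # mu* = (1/79) sum_n x^n
centered = water - mu_star
Sigma_star = centered.T @ centered / len(water)       # Sigma* = (1/79) sum_n (x - mu*)(x - mu*)^T
```

Note that this is the maximum-likelihood estimate, which divides by $N$; `np.cov(water.T, bias=True)` gives the same matrix, while the default `np.cov` divides by $N - 1$.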
Maximum Likelihood

Class 1: Water
$$\mu^1 = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}, \qquad \Sigma^1 = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$$

Class 2: Normal
$$\mu^2 = \begin{bmatrix} 55.6 \\ 59.8 \end{bmatrix}, \qquad \Sigma^2 = \begin{bmatrix} 847 & 422 \\ 422 & 685 \end{bmatrix}$$
Now we can do classification 

$$P(C_1 \mid x) = \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)}$$

where

$$P(x \mid C_1) = f_{\mu^1,\Sigma^1}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^1|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu^1)^T (\Sigma^1)^{-1} (x - \mu^1) \right\}, \qquad P(C_1) = \frac{79}{79 + 61} = 0.56$$

with $\mu^1 = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}$, $\Sigma^1 = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$, and

$$P(x \mid C_2) = f_{\mu^2,\Sigma^2}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^2|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu^2)^T (\Sigma^2)^{-1} (x - \mu^2) \right\}, \qquad P(C_2) = \frac{61}{79 + 61} = 0.44$$

with $\mu^2 = \begin{bmatrix} 55.6 \\ 59.8 \end{bmatrix}$, $\Sigma^2 = \begin{bmatrix} 847 & 422 \\ 422 & 685 \end{bmatrix}$.

If $P(C_1 \mid x) > 0.5$, then $x$ belongs to class 1 (Water); a sketch of this classifier follows below.

[Figure: the (Defense, SP Defense) plane split into the region with $P(C_1 \mid x) > 0.5$ and the region with $P(C_1 \mid x) < 0.5$. Blue points: C1 (Water); red points: C2 (Normal).]
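A minimal sketch of this classifier with the fitted numbers copied from the slides (the test point is hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1 = np.array([75.0, 71.3]); Sigma1 = np.array([[874.0, 327.0], [327.0, 929.0]])
mu2 = np.array([55.6, 59.8]); Sigma2 = np.array([[847.0, 422.0], [422.0, 685.0]])
p_c1, p_c2 = 0.56, 0.44

def posterior_c1(x):
    """P(C1 | x) by Bayes' rule with the two class-conditional Gaussians."""
    a = multivariate_normal(mu1, Sigma1).pdf(x) * p_c1   # P(x|C1) P(C1)
    b = multivariate_normal(mu2, Sigma2).pdf(x) * p_c2   # P(x|C2) P(C2)
    return a / (a + b)

x = np.array([80.0, 70.0])                               # hypothetical (Defense, SP Defense)
print("Water" if posterior_c1(x) > 0.5 else "Normal")
```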
How's the result?
Testing data: 47% accuracy 
Using all six features (HP, Attack, SP Attack, Defense, SP Defense, Speed), i.e., 6-dim feature vectors and 6 × 6 covariance matrices: 64% accuracy …
Modifying Model

Class 1: Water
$$\mu^1 = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}, \qquad \Sigma^1 = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$$

Class 2: Normal
$$\mu^2 = \begin{bmatrix} 55.6 \\ 59.8 \end{bmatrix}, \qquad \Sigma^2 = \begin{bmatrix} 847 & 422 \\ 422 & 685 \end{bmatrix}$$

Force the two classes to share the same covariance matrix $\Sigma$. This gives fewer parameters (the covariance matrix grows with the square of the feature dimension), which helps against overfitting.

Ref: Bishop, Chapter 4.2.2

Modifying Model

• Maximum likelihood
"Water"-type Pokémon: $x^1, \ldots, x^{79}$; "Normal"-type Pokémon: $x^{80}, \ldots, x^{140}$.

Find $\mu^1$, $\mu^2$, and the shared $\Sigma$ maximizing the likelihood (a sketch follows below):

$$L(\mu^1, \mu^2, \Sigma) = f_{\mu^1,\Sigma}(x^1)\, f_{\mu^1,\Sigma}(x^2) \cdots f_{\mu^1,\Sigma}(x^{79}) \times f_{\mu^2,\Sigma}(x^{80})\, f_{\mu^2,\Sigma}(x^{81}) \cdots f_{\mu^2,\Sigma}(x^{140})$$

$\mu^1$ and $\mu^2$ are the same as before, and the maximizing shared covariance is the weighted average of the per-class estimates:

$$\Sigma = \frac{79}{140} \Sigma^1 + \frac{61}{140} \Sigma^2$$
Modifying Model
With the same covariance matrix, the decision boundary becomes linear.

[Figure: the two-feature decision boundary is now a straight line; 54% accuracy.]

All six features (HP, Attack, SP Attack, Defense, SP Defense, Speed): 73% accuracy.
Three Steps
• Function set (model):

$$P(C_1 \mid x) = \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)}$$

  If $P(C_1 \mid x) > 0.5$, output class 1; otherwise, output class 2.
• Goodness of a function:
  • The mean and covariance that maximize the likelihood (the probability of generating the data)
• Find the best function: easy (closed-form solution)
Probability Distribution
• You can always use the distribution you like 

For $x = [x_1, x_2, \ldots, x_k, \ldots, x_K]^T$, assume the dimensions are independent:

$$P(x \mid C_1) = P(x_1 \mid C_1)\, P(x_2 \mid C_1) \cdots P(x_k \mid C_1) \cdots P(x_K \mid C_1)$$

Each factor $P(x_k \mid C_1)$ can be a 1-D Gaussian. For binary features, you may assume they come from Bernoulli distributions.

If you assume all the dimensions are independent, then you are using a Naive Bayes classifier (a sketch follows below).
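A minimal Gaussian Naive Bayes sketch along these lines: one 1-D Gaussian per feature per class, multiplied together (in log space) with the prior. The class name and the placeholder data are mine, not the lecture's:

```python
import numpy as np

class GaussianNaiveBayes:
    """Each feature gets its own 1-D Gaussian per class (independence assumption)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = {c: np.mean(y == c) for c in self.classes}
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) for c in self.classes}
        return self

    def _log_joint(self, x, c):
        # log P(C) + sum_k log P(x_k | C), each factor a 1-D Gaussian density
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var[c])
                          + (x - self.mu[c]) ** 2 / self.var[c])
        return np.log(self.priors[c]) + log_pdf.sum()

    def predict(self, x):
        return max(self.classes, key=lambda c: self._log_joint(x, c))

# Hypothetical usage with random placeholder data:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(75, 25, (79, 2)), rng.normal(56, 25, (61, 2))])
y = np.array(["Water"] * 79 + ["Normal"] * 61)
print(GaussianNaiveBayes().fit(X, y).predict(np.array([80.0, 70.0])))
```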
Posterior Probability

$$P(C_1 \mid x) = \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)} = \frac{1}{1 + \dfrac{P(x \mid C_2)\, P(C_2)}{P(x \mid C_1)\, P(C_1)}} = \frac{1}{1 + \exp(-z)} = \sigma(z)$$

where

$$z = \ln \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_2)\, P(C_2)}$$

[Figure: the sigmoid function $\sigma(z)$ plotted against $z$.]
Warning of Math
Posterior Probability

$$P(C_1 \mid x) = \sigma(z), \qquad z = \ln \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_2)\, P(C_2)}$$

Split $z$ into a likelihood ratio and a prior ratio:

$$z = \ln \frac{P(x \mid C_1)}{P(x \mid C_2)} + \ln \frac{P(C_1)}{P(C_2)}, \qquad \frac{P(C_1)}{P(C_2)} = \frac{\frac{N_1}{N_1 + N_2}}{\frac{N_2}{N_1 + N_2}} = \frac{N_1}{N_2}$$

The class-conditional densities are Gaussians:

$$P(x \mid C_1) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^1|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu^1)^T (\Sigma^1)^{-1} (x - \mu^1) \right\}$$

$$P(x \mid C_2) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^2|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu^2)^T (\Sigma^2)^{-1} (x - \mu^2) \right\}$$

The $(2\pi)^{D/2}$ factors cancel in the ratio, leaving

$$\ln \frac{P(x \mid C_1)}{P(x \mid C_2)} = \ln \frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2} \left[ (x - \mu^1)^T (\Sigma^1)^{-1} (x - \mu^1) - (x - \mu^2)^T (\Sigma^2)^{-1} (x - \mu^2) \right]$$

Expand the two quadratic forms:

$$(x - \mu^1)^T (\Sigma^1)^{-1} (x - \mu^1) = x^T (\Sigma^1)^{-1} x - 2 (\mu^1)^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} \mu^1$$

$$(x - \mu^2)^T (\Sigma^2)^{-1} (x - \mu^2) = x^T (\Sigma^2)^{-1} x - 2 (\mu^2)^T (\Sigma^2)^{-1} x + (\mu^2)^T (\Sigma^2)^{-1} \mu^2$$

Putting everything together:

$$z = \ln \frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2} x^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} x - \frac{1}{2} (\mu^1)^T (\Sigma^1)^{-1} \mu^1 + \frac{1}{2} x^T (\Sigma^2)^{-1} x - (\mu^2)^T (\Sigma^2)^{-1} x + \frac{1}{2} (\mu^2)^T (\Sigma^2)^{-1} \mu^2 + \ln \frac{N_1}{N_2}$$
End of Warning
$$P(C_1 \mid x) = \sigma(z)$$

$$z = \ln \frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2} x^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} x - \frac{1}{2} (\mu^1)^T (\Sigma^1)^{-1} \mu^1 + \frac{1}{2} x^T (\Sigma^2)^{-1} x - (\mu^2)^T (\Sigma^2)^{-1} x + \frac{1}{2} (\mu^2)^T (\Sigma^2)^{-1} \mu^2 + \ln \frac{N_1}{N_2}$$

Now set $\Sigma^1 = \Sigma^2 = \Sigma$. The quadratic terms in $x$ cancel and the determinant ratio becomes 1, leaving

$$z = \underbrace{(\mu^1 - \mu^2)^T \Sigma^{-1}}_{w^T}\, x \underbrace{- \frac{1}{2} (\mu^1)^T \Sigma^{-1} \mu^1 + \frac{1}{2} (\mu^2)^T \Sigma^{-1} \mu^2 + \ln \frac{N_1}{N_2}}_{b}$$

$$P(C_1 \mid x) = \sigma(w \cdot x + b)$$

In the generative model, we estimate $N_1$, $N_2$, $\mu^1$, $\mu^2$, $\Sigma$; then we have $w$ and $b$.
How about directly finding $w$ and $b$?
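A minimal numeric check of this identity (the shared $\Sigma$ below is a hypothetical value; the means and counts are the lecture's):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = np.array([75.0, 71.3]), np.array([55.6, 59.8])
Sigma = np.array([[860.0, 370.0], [370.0, 820.0]])     # hypothetical shared covariance
n1, n2 = 79, 61

Si = np.linalg.inv(Sigma)
w = Si @ (mu1 - mu2)                                   # w^T = (mu1 - mu2)^T Sigma^{-1}
b = -0.5 * mu1 @ Si @ mu1 + 0.5 * mu2 @ Si @ mu2 + np.log(n1 / n2)

x = np.array([80.0, 70.0])
sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Direct Bayes computation of P(C1 | x); the common normalizer cancels.
g1 = multivariate_normal(mu1, Sigma).pdf(x) * n1
g2 = multivariate_normal(mu2, Sigma).pdf(x) * n2
print(sigmoid, g1 / (g1 + g2))                         # the two values agree
```

Directly learning $w$ and $b$, rather than estimating the Gaussian parameters first, is exactly what logistic regression does.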
Reference
• Bishop: Chapter 4.1 – 4.2
• Data: https://www.kaggle.com/abcsds/pokemon
• Useful posts:
  • https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type
  • https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion
  • https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion
