Material 06 - Naive Bayes 2021

Naive Bayes
Teaching Team
Machine Learning Course
Department of Information Technology, 2021
Disclaimer
▪ This presentation material, including examples, images, and references, is provided for informational and explanatory purposes only
▪ The names of actual products and companies mentioned herein, if any, may be the trademarks of their respective owners
▪ Credit is due to the sources of images taken from open-source material; these images may not be used for promotional activities
Machine Learning 2021 - Material 06 - Naive Bayes

Outline Naive Bayes
• Bayes’ Theorem
• Generative and Discriminative Models
• Naive Bayes Computation

Recalling Classifiers
• A machine learning model that is used to discriminate between different objects based on certain features.
• One such model is the Naive Bayes classifier.

What is Naive Bayes?
• A probabilistic machine learning model used for classification tasks.

• At its core, it is based on Bayes' theorem.

Bayes’ Theorem
• A formula for calculating the probability of an event using prior knowledge of related conditions.
• The theorem was formulated by Thomas Bayes, an English statistician and minister, in the 18th century.

$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$

• A and B are events.
• P(A) is the probability of observing event A.
• P(B) is the probability of observing event B.
• P(A|B) is the conditional probability of observing A given that B was observed.
• P(B|A) is the conditional probability of observing B given that A was observed.
• In classification tasks, the goal is to map the features (explanatory variables) to a discrete response variable.
• We must find the most likely label, A, given the features, B.
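As a quick numerical check of the formula, the minimal sketch below computes P(A|B). It also obtains P(B) from the law of total probability, P(B) = P(B|A)P(A) + P(B|~A)P(~A), because P(B) is often not given directly. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(B|A) P(A) + P(B|~A) P(~A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes_posterior(p_a, p_b_given_a, p_b):
    """Bayes' theorem: P(A|B) = P(A) P(B|A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a * p_b_given_a / p_b


# Hypothetical numbers, for illustration only
p_a = 0.01             # prior P(A)
p_b_given_a = 0.9      # P(B|A)
p_b_given_not_a = 0.05 # P(B|~A)

p_b = total_probability(p_a, p_b_given_a, p_b_given_not_a)
posterior = bayes_posterior(p_a, p_b_given_a, p_b)
print(f"P(B) = {p_b:.4f}, P(A|B) = {posterior:.4f}")
```

With these numbers, P(A|B) is only about 0.15 even though P(B|A) is 0.9. The small prior P(A) dominates the result.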

Discriminative and Generative Model
• Discriminative models learn a decision boundary that is used to discriminate between classes.
• They predict P(y|x), the probability of y given x, directly, without calculating P(x, y), the joint probability of x and y.
• Generative models model the joint probability distribution of the features and the classes, P(x, y).
• A discriminative model does not care how the data is generated; it only models P(y|x).
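To make the distinction concrete, the sketch below starts from a small hypothetical joint table P(x, y). The numbers are invented for illustration. It recovers P(y|x) the way a generative model can, by dividing the joint probability by P(x) = Σ_y P(x, y). A discriminative model would instead fit P(y|x) directly and never represent P(x, y). The name `conditional_from_joint` is hypothetical.

```python
# Hypothetical joint distribution P(x, y): one feature (size), two classes
joint = {
    ("small", "cat"): 0.35,
    ("small", "dog"): 0.10,
    ("large", "cat"): 0.05,
    ("large", "dog"): 0.50,
}


def conditional_from_joint(joint, x):
    """P(y | x) = P(x, y) / sum over y' of P(x, y')."""
    row = {y: p for (xi, y), p in joint.items() if xi == x}
    p_x = sum(row.values())  # marginal P(x)
    return {y: p / p_x for y, p in row.items()}


print(conditional_from_joint(joint, "small"))
```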

Discriminative vs. Generative Model
• Imagine we are trying to classify dogs and cats based on each animal's weight and height.
• Using a generative model:
– We would have to compute the following probabilities for each data point:
• P(cat, weight)
• P(cat, height)
• P(dog, weight)
• P(dog, height)
– If we have 1,000 data points to train our model, we need to compute at least 4,000 probabilities.
• Using a discriminative model:
– We just need to compute P(y|x) for each data point.
– That is 2,000 probabilities (P(cat|x) and P(dog|x) for each point) if the data set has 1,000 data points.


Generative Model
• Equivalent to modelling the probabilities of the classes and the probabilities of the features given the classes.
• They model how the classes generate features (or new examples of the data) through intermediate steps, but they can be more biased.
• They are more robust to noisy training data and may perform better when training data is scarce (difficult to obtain, or small compared to the amount needed).
• The intermediate step introduces more assumptions to the model.
• When these assumptions … The disadvantage is that these assumptions can prevent generative models from learning …

Naive Bayes Computation
• Rewrite Bayes' theorem for a classification task:

$$P(y \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid y)\, P(y)}{P(x_1, \dots, x_n)}$$

• y is the class (for example, the positive class), x_1 is the first feature of the instance, and n is the number of features.
• The denominator P(x_1, …, x_n) does not depend on y. For a given instance it is the same for every class, so it can be omitted when comparing classes.
• This leaves two terms: the prior class probability, P(y), and the conditional probability, P(x_1, …, x_n | y). Naive Bayes estimates these terms using maximum a posteriori (MAP) estimation.
• P(y) is simply the relative frequency of each class in the training set.
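Estimating the prior P(y) as a relative frequency takes only a few lines. The minimal sketch below uses exact fractions so the results read like the hand calculations in these slides. The labels and the helper name `class_priors` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def class_priors(labels):
    """Estimate P(y) as the relative frequency of each class label."""
    counts = Counter(labels)
    n = len(labels)
    return {label: Fraction(count, n) for label, count in counts.items()}


# Hypothetical training labels, for illustration only
priors = class_priors(["spam", "ham", "ham", "spam", "ham"])
print(priors)
```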

Naive Bayes Computation
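• Performing maximum a posteriori (MAP) estimation in Naive Bayes:

$$P(y \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid y)\, P(y)}{P(x_1, \dots, x_n)}$$

• Drop the constant denominator. Then apply the naive assumption that the features are conditionally independent given y, so that P(x_1, …, x_n | y) = P(x_1|y) P(x_2|y) ⋯ P(x_n|y):

$$P(y \mid x_1, \dots, x_n) \propto P(y)\, P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_n \mid y) = P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

• The predicted class is given by:

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$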
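The decision rule ŷ = argmax_y P(y) ∏ P(x_i|y) can be sketched as follows. This version works in log space, adding log-probabilities instead of multiplying probabilities. That is a common implementation choice to avoid numerical underflow when there are many features, and it does not change which class wins, because log is monotonic. A zero probability is treated as log 0 = -inf. The function name `map_predict` and the dictionary layout are hypothetical. The example numbers are the relative frequencies of the weather/wind-speed exercise table used later in this material.

```python
import math


def map_predict(priors, cond_probs, x):
    """Return argmax_y [log P(y) + sum_i log P(x_i | y)].

    priors:     {class: P(y)}
    cond_probs: {class: [ {value: P(x_i = value | y)} for each feature i ]}
    x:          tuple of feature values, one per feature
    Returns None if every class has probability 0 for this x.
    """
    def safe_log(p):
        return math.log(p) if p > 0 else -math.inf

    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = safe_log(prior)
        for i, value in enumerate(x):
            score += safe_log(cond_probs[y][i].get(value, 0.0))
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Features: (weather, wind speed)
priors = {"yes": 4 / 6, "no": 2 / 6}
cond_probs = {
    "yes": [{"sunny": 1.0, "rainy": 0.0}, {"strong": 0.25, "slow": 0.75}],
    "no": [{"sunny": 0.0, "rainy": 1.0}, {"strong": 0.5, "slow": 0.5}],
}
print(map_predict(priors, cond_probs, ("sunny", "strong")))
```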

Assumptions of NB
• The features are conditionally independent given the response variable.
• Training instances are independent and identically distributed (i.i.d.): they are independent of each other and are drawn from the same probability distribution.

Naive Bayes Types
• Multinomial Naive Bayes – Mostly used for document classification problems, e.g. deciding whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.
• Bernoulli Naive Bayes – Similar to the multinomial one, but the predictors are Boolean variables. The predictors used to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not.
• Gaussian Naive Bayes – Used when the predictors take continuous rather than discrete values; these values are assumed to be sampled from a Gaussian distribution.
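The three variants differ only in how they model P(x_i | y). The minimal sketch below writes each log-likelihood by hand. The function names are hypothetical, and this is not scikit-learn's code, although the library does provide `MultinomialNB`, `BernoulliNB` and `GaussianNB` classes built on the same event models.

- Multinomial: the multinomial coefficient depends only on the counts, not on the class, so it is dropped.
- Bernoulli: absent features contribute log(1 − p), unlike the multinomial case.
- Gaussian: uses the log of the normal density.

```python
import math


def multinomial_log_likelihood(counts, word_probs):
    """log P(x | y) up to a class-independent constant: sum_i count_i * log(theta_yi)."""
    return sum(c * math.log(p) for c, p in zip(counts, word_probs) if c > 0)


def bernoulli_log_likelihood(present, word_probs):
    """log P(x | y) = sum_i [b_i log p_i + (1 - b_i) log(1 - p_i)]."""
    return sum(math.log(p) if b else math.log(1 - p)
               for b, p in zip(present, word_probs))


def gaussian_log_likelihood(values, means, variances):
    """log P(x | y) = sum_i log N(x_i; mu_i, sigma_i^2)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(values, means, variances))


# Hypothetical per-class parameters, for illustration only
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_log_likelihood([1, 0], [0.8, 0.3]))
print(gaussian_log_likelihood([0.0], [0.0], [1.0]))
```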

Let’s Watch a Film
https://youtu.be/O2L2Uv9pdDA

EXAMPLE
• A survey conducted by a health institute states that 90% of the world's population suffers from lung disease. Of the 90% of the population with lung disease, 60% are smokers, and of the population without lung disease, 20% are smokers.
• These facts can be defined with: X = has lung disease and Y = smoker.
• Then:
– P(X) = 0.9
– P(~X) = 0.1
– P(Y|X) = 0.6 → P(~Y|X) = 0.4
– P(Y|~X) = 0.2 → P(~Y|~X) = 0.8

EXAMPLE
• Using Bayes' method, we can compute (up to the common factor 1/P(Y)):
– P(X|Y) ∝ P(Y|X) · P(X) = (0.6) · (0.9) = 0.54
– P(~X|Y) ∝ P(Y|~X) · P(~X) = (0.2) · (0.1) = 0.02
• If a person is known to smoke, we conclude that they more likely have lung disease, because P(Y|X) · P(X) is greater than P(Y|~X) · P(~X).
• Recall the definition of conditional probability:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
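The arithmetic of this example can be checked in a few lines. The final normalization step is an addition to what the slide shows. It divides each score by their sum, which is the total probability P(Y), and turns the two scores into proper posterior probabilities instead of only comparing them.

```python
p_x = 0.9              # P(X): has lung disease
p_not_x = 1 - p_x      # P(~X)
p_y_given_x = 0.6      # P(Y|X): smoker, given lung disease
p_y_given_not_x = 0.2  # P(Y|~X): smoker, given no lung disease

# Unnormalised scores, proportional to P(X|Y) and P(~X|Y)
score_x = p_y_given_x * p_x
score_not_x = p_y_given_not_x * p_not_x

# Dividing by P(Y) = score_x + score_not_x gives the actual posteriors
p_y = score_x + score_not_x
p_x_given_y = score_x / p_y
p_not_x_given_y = score_not_x / p_y
print(f"P(X|Y) = {p_x_given_y:.3f}, P(~X|Y) = {p_not_x_given_y:.3f}")
```

Normalizing gives P(X|Y) = 0.54 / 0.56 ≈ 0.964 for a smoker.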

EXAMPLE
• Assumptions:
– Y = does exercise,
– X1 = weather,
– X2 = temperature,
– X3 = wind speed

# | Weather | Temperature | Wind Speed | Exercises
1 | Sunny | Normal | Slow | Yes
2 | Sunny | Normal | Slow | Yes
3 | Rainy | High | Slow | No
4 | Sunny | Normal | Strong | Yes
5 | Rainy | High | Strong | No
6 | Sunny | Normal | Slow | Yes

• Based on the data:
– P(Y=yes) = 4/6 → P(Y=no) = 2/6
– P(X1=sunny|Y=yes) = 4/4 = 1, P(X1=sunny|Y=no) = 0
– P(X3=strong|Y=yes) = 1/4, P(X3=strong|Y=no) = 1/2
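The probabilities listed above can be recomputed directly by counting rows. The sketch below transcribes the six-row table with its values written in English (Cerah = Sunny, Hujan = Rainy, Tinggi = High, Pelan = Slow, Kencang = Strong, Ya/Tidak = Yes/No). The helper names `prior` and `conditional` are hypothetical.

```python
from fractions import Fraction

# Rows: (weather, temperature, wind speed, exercises)
rows = [
    ("Sunny", "Normal", "Slow", "Yes"),
    ("Sunny", "Normal", "Slow", "Yes"),
    ("Rainy", "High", "Slow", "No"),
    ("Sunny", "Normal", "Strong", "Yes"),
    ("Rainy", "High", "Strong", "No"),
    ("Sunny", "Normal", "Slow", "Yes"),
]


def prior(label):
    """P(Y = label): fraction of rows with that label."""
    return Fraction(sum(r[-1] == label for r in rows), len(rows))


def conditional(feature_idx, value, label):
    """P(X_feature = value | Y = label), counted within the rows of that class."""
    in_class = [r for r in rows if r[-1] == label]
    return Fraction(sum(r[feature_idx] == value for r in in_class), len(in_class))


print(prior("Yes"), conditional(0, "Sunny", "Yes"), conditional(2, "Strong", "No"))
```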

EXAMPLE
• If the weather is sunny and the wind is strong, will a person exercise? (Using the same data table.)
• Probability score for yes:
– P(Y=yes | X1=sunny, X3=strong) ∝ P(X1=sunny|Y=yes) · P(X3=strong|Y=yes) · P(Y=yes)
– = {(1) · (1/4)} · (4/6) = 1/6
• Probability score for no:
– P(Y=no | X1=sunny, X3=strong) ∝ P(X1=sunny|Y=no) · P(X3=strong|Y=no) · P(Y=no)
– = {(0) · (1/2)} · (2/6) = 0
• Applying the decision rule

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

gives 1/6 > 0, so the prediction is Y = yes: the person will exercise.
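The two scores can be checked with exact fractions using the conditional probabilities and priors from the previous slide. A single zero factor, P(X1=sunny|Y=no) = 0, forces the whole "no" score to 0, whatever the other features say. Laplace (add-one) smoothing is a common remedy, but the slides do not use it, so it is not applied here.

```python
from fractions import Fraction

p_yes, p_no = Fraction(4, 6), Fraction(2, 6)

# P(sunny|yes) * P(strong|yes) * P(yes)
score_yes = Fraction(1) * Fraction(1, 4) * p_yes
# P(sunny|no) * P(strong|no) * P(no)
score_no = Fraction(0) * Fraction(1, 2) * p_no

prediction = "Yes" if score_yes > score_no else "No"
print(score_yes, score_no, prediction)
```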
Gaussian Naive Bayes
• The Naive Bayes classifier can also handle continuous attributes.
• One way to do so is to use the Gaussian distribution.
• This distribution is characterized by two parameters: the mean (μ) and the variance (σ²).
• For each class Y_j, the class-conditional probability for attribute X_i is expressed by the Gaussian distribution equation.
• The density function expresses relative probability.
• For data with mean μ and standard deviation σ, the probability density function is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

• μ and σ can be estimated from the data for each class, and are used to compute the likelihood P(X | C):

$$P(X_i = x_i \mid Y = y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\!\left(-\frac{(x_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$
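A minimal sketch of the two steps: estimate (μ, σ²) for one attribute within one class, then evaluate the density at a query value. The slides do not say whether σ² is the sample variance (divide by n − 1) or the population variance (divide by n). This sketch assumes the sample variance via `statistics.variance`; `statistics.pvariance` gives the population version. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def fit_gaussian(values):
    """Estimate (mu, sigma^2) for one attribute within one class.

    statistics.variance is the sample variance (divides by n - 1);
    statistics.pvariance would give the population variance instead.
    """
    return mean(values), variance(values)


def gaussian_pdf(x, mu, sigma2):
    """Gaussian density N(x; mu, sigma^2), used as P(X_i = x | Y = y_j)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Hypothetical attribute values for one class
mu, sigma2 = fit_gaussian([1, 2, 3])
print(mu, sigma2, gaussian_pdf(2.5, mu, sigma2))
```

A density can exceed 1 when the variance is small; for example, `gaussian_pdf(0, 0, 0.01)` is about 3.99. This is why the value is only a relative measure, not a probability.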

• Probability of occurrence of each value:
– for the attribute Land Price (C1)
– for the attribute Distance from City Center (C2)
– for the attribute Public Transport (C3)
– for the attribute Selected for Housing (C4)

• Given C1 = 300, C2 = 17, C3 = no:
• P(C3=no|C4=yes) = 4/5, P(C3=no|C4=no) = 2/5

• Probability score (likelihood × prior) for yes:
– P(C4=yes | C1=300, C2=17, C3=no) ∝ P(C1=300|C4=yes) · P(C2=17|C4=yes) · P(C3=no|C4=yes) · P(C4=yes)
– = 0.0021 × 0.0009 × 4/5 × 1/2
– = 0.000000756
• Probability score (likelihood × prior) for no:
– P(C4=no | C1=300, C2=17, C3=no) ∝ P(C1=300|C4=no) · P(C2=17|C4=no) · P(C3=no|C4=no) · P(C4=no)
– = 0.0013 × 0.0633 × 2/5 × 1/2
– = 0.000016458
• Applying the decision rule

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

gives C4 = no, since 0.000016458 > 0.000000756.
• The probability values can be obtained by normalizing these likelihoods so that they sum to 1.
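The scoring and the normalization step described above can be sketched as follows. The inputs are the values given on this slide for the query C1 = 300, C2 = 17, C3 = no. The C1 and C2 factors are Gaussian densities, not probabilities. `math.prod` requires Python 3.8 or later.

```python
import math

likelihoods = {
    "yes": [0.0021, 0.0009, 4 / 5],  # P(C1=300|yes), P(C2=17|yes), P(C3=no|yes)
    "no": [0.0013, 0.0633, 2 / 5],   # P(C1=300|no),  P(C2=17|no),  P(C3=no|no)
}
priors = {"yes": 1 / 2, "no": 1 / 2}  # P(C4)

# Unnormalised scores: P(C4) * prod_i P(x_i | C4)
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# Normalise so that the values sum to 1
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)
print(scores, posteriors, prediction)
```

After normalization, the posterior is about 0.956 for "no" and about 0.044 for "yes".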

Exercise
• How would we predict “land price EXPENSIVE, distance from the city center MEDIUM, and public transport AVAILABLE”?
• C1
• P(C1=cheap|C4=yes) = 2/5
• P(C1=cheap|C4=no) = 1/5
• P(C1=medium|C4=yes) = 2/5
• P(C1=medium|C4=no) = 1/5
• P(C1=expensive|C4=yes) = 1/5
• P(C1=expensive|C4=no) = 3/5

Exercise
• Probability of occurrence of each value for the attribute Distance from City Center (C2)

Exercise
• Probability of occurrence of each value for the attribute Public Transport (C3)

Exercise
• Probability of occurrence of each value for the attribute Selected for Housing (C4)

Exercise
• Predict “land price EXPENSIVE, distance from the city center MEDIUM, and public transport AVAILABLE”.
• Probability score (likelihood × prior) for yes:
YES = P(Land=EXPENSIVE|Yes) · P(Distance=MEDIUM|Yes) · P(Transport=AVAILABLE|Yes) · P(Yes)
= 1/5 × 2/5 × 1/5 × 5/10 = 1/125 = 0.008
• Probability score (likelihood × prior) for no:
NO = P(Land=EXPENSIVE|No) · P(Distance=MEDIUM|No) · P(Transport=AVAILABLE|No) · P(No)
= 3/5 × 1/5 × 3/5 × 5/10 = 9/250 = 0.036
• Applying the decision rule

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

gives NO (not selected for housing), since 0.036 > 0.008.
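Multiplying the factors as exact fractions makes the hand calculation easy to verify. The factors are the conditional probabilities and priors listed in the exercise slides.

```python
from fractions import Fraction as F

# P(C1=expensive|c), P(C2=medium|c), P(C3=available|c), P(c)
factors = {
    "yes": [F(1, 5), F(2, 5), F(1, 5), F(5, 10)],
    "no": [F(3, 5), F(1, 5), F(3, 5), F(5, 10)],
}

scores = {}
for c, fs in factors.items():
    score = F(1)
    for f in fs:
        score *= f  # naive product of prior and per-feature likelihoods
    scores[c] = score

prediction = max(scores, key=scores.get)
print({c: (str(s), float(s)) for c, s in scores.items()}, prediction)
```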
Homework
• Classify whether the day is suitable for playing golf, given the features of the day.
• The columns represent these features and the rows represent individual entries.
• Example: if we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy.
• How do we predict “Play Golf or Not” from the information given by the features?
• (today = Sunny, Hot, Normal, False)?

Homework
1. Create Python code to make a prediction for the case on the left.
2. Write the manual computation in a spreadsheet application to compare the results with the automatic computation in point 1.
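As a possible starting point for item 1, here is a minimal sketch of a count-based categorical Naive Bayes in plain Python. It is one way to do it, not the required solution. The golf table itself is not reproduced in this text, so the sketch is run on the six-row exercise table from the EXAMPLE slides, with values written in English. With `alpha=0` it reproduces the plain relative frequencies used in these slides. With `alpha=1` it applies Laplace (add-one) smoothing, which the slides do not use. All function names are hypothetical. scikit-learn's `CategoricalNB` could also be used, after encoding the categories as integers (for example with `OrdinalEncoder`).

```python
from collections import Counter
from fractions import Fraction


def fit_categorical_nb(rows, labels, alpha=0):
    """Count class and (feature, value) occurrences; alpha must be an integer."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[y][i][v]: number of class-y rows whose feature i equals v
    value_counts = {y: [Counter() for _ in range(n_features)] for y in class_counts}
    seen_values = [set() for _ in range(n_features)]
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[y][i][v] += 1
            seen_values[i].add(v)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "seen_values": seen_values, "alpha": alpha}


def predict_categorical_nb(model, x):
    """Return (predicted class, {class: P(y) * prod_i P(x_i | y)})."""
    class_counts, alpha = model["class_counts"], model["alpha"]
    n = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = Fraction(n_y, n)  # prior P(y)
        for i, v in enumerate(x):
            k = len(model["seen_values"][i])  # distinct values of feature i
            score *= Fraction(model["value_counts"][y][i][v] + alpha, n_y + alpha * k)
        scores[y] = score
    return max(scores, key=scores.get), scores


# Exercise table: (weather, temperature, wind speed) -> exercises
rows = [
    ("Sunny", "Normal", "Slow"), ("Sunny", "Normal", "Slow"),
    ("Rainy", "High", "Slow"), ("Sunny", "Normal", "Strong"),
    ("Rainy", "High", "Strong"), ("Sunny", "Normal", "Slow"),
]
labels = ["Yes", "Yes", "No", "Yes", "No", "Yes"]

model = fit_categorical_nb(rows, labels)
prediction, scores = predict_categorical_nb(model, ("Sunny", "Normal", "Strong"))
print(prediction, scores)
```

For the golf case, replace `rows` and `labels` with the golf table and query `("Sunny", "Hot", "Normal", "False")`. The fractional scores can then be compared directly with the spreadsheet computation in item 2.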

