
Department of Computing

CS471: Machine Learning (Spring 2021)


Class: BESE-9AB

Lab 02: Naïve Bayes

CLO1: Understand machine learning algorithms, tools and techniques.

CLO3: Use modern tools to solve practical problems.

Date: March 5th, 2021

Time: 9:00AM – 12:00PM & 2:00PM – 5:00PM

Instructor: Dr. Omar Arif

CS471: Machine Learning Page 1


Lab 02: Naïve Bayes

Introduction

In this lab you will implement and test the Naïve Bayes algorithm. You may find the following reference helpful:

https://scikit-learn.org/stable/modules/naive_bayes.html

Objectives

Implement Naïve Bayes.

Tools/Software Requirement

Python 3, Scikit-learn, Pandas, Matplotlib, NumPy

Task Description

Task 1 Categorical Naïve Bayes

Let us suppose you are given the following dataset.

[Dataset table: 14 examples with features Weather ∈ {Sunny, Overcast, Rainy} and Temperature ∈ {Hot, Mild, Cool}, and a class Play ∈ {Yes, No}; its counts appear in the prior/likelihood table in step 1.]
You are interested in the posterior probability p(play | weather, temperature). In order to find the posterior probability, you need to know the likelihood and the prior.

In categorical Naïve Bayes (studied in the class), the likelihood probability of an attribute/feature taking a particular value t given the class c is computed using

p(x_i = t | y = c) = (N_tic + λ) / (N_c + λ n_i)

where N_tic is the number of times category t appears in the samples x_i which belong to class y = c, N_c is the number of samples/examples with y = c, λ is the smoothing parameter, and n_i is the number of available categories of feature x_i.



1. Use the above dataset to learn the Bayes net and fill in the following table.

Prior:

  Play   P(play)
  Yes    9/14
  No     5/14

Likelihood:

  Play   Weather    p(weather | play)
  Yes    Sunny      2/9
  Yes    Overcast   4/9
  Yes    Rainy      3/9
  No     Sunny      3/5
  No     Overcast   0
  No     Rainy      2/5

  Play   Temp   p(temp | play)
  Yes    Mild   4/9
  Yes    Cool   3/9
  Yes    Hot    2/9
  No     Mild   2/5
  No     Cool   1/5
  No     Hot    2/5

2. Now using the above prior and likelihood values, compute

   p(play = Yes | weather = Sunny, temp = Hot)
     = [(2/9 · 2/9) · 9/14] / [(2/9 · 2/9) · 9/14 + (3/5 · 2/5) · 5/14]
     ≈ 0.27

3. Compute the posterior p(play | weather = Sunny, temp = Hot) using Laplace smoothing with λ = 1 (each feature has n_i = 3 categories), i.e. evaluate

   [((3+1)/(5+3)) · ((2+1)/(5+3)) · ((5+3)/(14+6))]
   / [((2+1)/(9+3)) · ((2+1)/(9+3)) · ((9+3)/(14+6)) + ((3+1)/(5+3)) · ((2+1)/(5+3)) · ((5+3)/(14+6))]
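As a quick sanity check, both fractions can be evaluated with plain arithmetic (no ML library needed). Note that the smoothed fraction, as written above, has the "No"-class term in its numerator, so it yields the posterior of play = No:

```python
# Unsmoothed posterior p(play = Yes | weather = Sunny, temp = Hot)
yes = (2/9) * (2/9) * (9/14)     # likelihoods times prior, class Yes
no  = (3/5) * (2/5) * (5/14)     # likelihoods times prior, class No
p_yes = yes / (yes + no)
print(round(p_yes, 4))           # 0.2703

# Laplace-smoothed terms as written above (lambda = 1, n_i = 3 per feature)
yes_s = ((2+1)/(9+3)) * ((2+1)/(9+3)) * ((9+3)/(14+6))
no_s  = ((3+1)/(5+3)) * ((2+1)/(5+3)) * ((5+3)/(14+6))
print(round(no_s / (yes_s + no_s), 4))   # 0.6667 = p(play = No | ...)
```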
4. Now use scikit-learn's CategoricalNB (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) to compute the posterior p(play | weather = Sunny, temp = Hot) with λ = {0, 1}. CategoricalNB assumes that the input data is encoded such that all categories for each feature i are represented with numbers 0 … n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html). This will convert the features in your dataset to ordinal integers.
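A minimal sketch of step 4. The 14 rows below are an assumption: they are the classic play-tennis examples, chosen so that their counts reproduce the prior/likelihood table above (9 Yes / 5 No, p(Sunny | Yes) = 2/9, and so on), since the original dataset table is not legible here.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# 14 examples whose per-class counts match the table above.
X = [["Sunny", "Hot"], ["Sunny", "Hot"], ["Overcast", "Hot"], ["Rainy", "Mild"],
     ["Rainy", "Cool"], ["Rainy", "Cool"], ["Overcast", "Cool"], ["Sunny", "Mild"],
     ["Sunny", "Cool"], ["Rainy", "Mild"], ["Sunny", "Mild"], ["Overcast", "Mild"],
     ["Overcast", "Hot"], ["Rainy", "Mild"]]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
     "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

enc = OrdinalEncoder()                     # maps each category to 0 .. n_i - 1
X_enc = enc.fit_transform(X)
query = enc.transform([["Sunny", "Hot"]])

for alpha in (1e-10, 1.0):                 # alpha ~ 0 (unsmoothed) and lambda = 1
    clf = CategoricalNB(alpha=alpha).fit(X_enc, y)
    p_yes = clf.predict_proba(query)[0, list(clf.classes_).index("Yes")]
    print(f"alpha={alpha:g}: p(play=Yes | Sunny, Hot) = {p_yes:.4f}")
```

With alpha near 0 this reproduces the unsmoothed hand computation (about 0.2703). Note that CategoricalNB smooths only the likelihood counts, not the class prior, so its λ = 1 answer can differ from a hand computation that also smooths the prior.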



Task 2 Categorical Naïve Bayes on UCI dataset

1. Use pandas.read_csv to read the csv file corresponding to the UCI dataset selected in
the last lab.
2. Partition the dataset into two parts, i.e. training and testing. You can divide your dataset as 80% training and 20% testing. Use train_test_split for splitting the data.
3. CategoricalNB assumes that the input data is encoded such that all categories for each feature i are represented with numbers 0 … n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html). This will convert the features in your dataset to ordinal integers.
4. Apply a Categorical Naïve Bayes classifier (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) using scikit-learn on your dataset.
   a. To analyze the performance on the train and test splits, use sklearn.metrics.accuracy_score.
   b. Apply Laplace smoothing by varying the parameter "alpha" from 0 to 50. Plot the results using Matplotlib.
   c. What is the best alpha value for your dataset?
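The steps above can be sketched as follows. Since the UCI dataset chosen in the last lab is not specified here, the sketch synthesizes a small stand-in frame with hypothetical column names ("f1", "f2", "class"); replace it with pd.read_csv on your own file and your own column names.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")                      # headless backend; drop in a notebook
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score

# Stand-in for your UCI data: replace with df = pd.read_csv("your_dataset.csv").
rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=300)
noisy = np.where(rng.random(300) < 0.3, rng.integers(0, 3, size=300), label)
df = pd.DataFrame({"f1": np.array(["a", "b", "c"])[noisy],
                   "f2": np.array(["x", "y", "z"])[rng.integers(0, 3, size=300)],
                   "class": label})

# Encode categories as 0 .. n_i - 1, then make an 80/20 split.
X = OrdinalEncoder().fit_transform(df[["f1", "f2"]])
X_tr, X_te, y_tr, y_te = train_test_split(X, df["class"],
                                          test_size=0.2, random_state=0)

# Sweep sklearn's smoothing parameter "alpha" (the lambda of the formula).
alphas = np.linspace(0.01, 50, 50)
train_acc, test_acc = [], []
for a in alphas:
    clf = CategoricalNB(alpha=a).fit(X_tr, y_tr)
    train_acc.append(accuracy_score(y_tr, clf.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, clf.predict(X_te)))

plt.plot(alphas, train_acc, label="train")
plt.plot(alphas, test_acc, label="test")
plt.xlabel("alpha"); plt.ylabel("accuracy"); plt.legend()
plt.savefig("alpha_sweep.png")
print("best alpha on the test split:", alphas[int(np.argmax(test_acc))])
```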

Task 3 Gaussian Naïve Bayes

If the feature values are not categorical but continuous, then Gaussian Naïve Bayes is used, which assumes that the likelihood of each feature given a class is Gaussian:

p(x_i | y = c) = (1 / √(2π σ_c²)) · exp(−(x_i − μ_c)² / (2σ_c²))

where μ_c and σ_c² are the mean and variance of feature x_i over the samples of class c.

1. Apply Gaussian Naïve Bayes on the breast cancer dataset (sklearn.datasets.load_breast_cancer()). The dataset consists of 30 features. Split the dataset into train and test sets and use GaussianNB to train a classifier. Report the train and test errors.
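A minimal sketch for this task; the 80/20 split proportion and random seed are assumptions, so adjust them to your needs:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 continuous features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GaussianNB().fit(X_tr, y_tr)
train_err = 1 - clf.score(X_tr, y_tr)        # error rate = 1 - accuracy
test_err = 1 - clf.score(X_te, y_te)
print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```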

Deliverables

• Students are required to upload the lab task solution in ipynb format on LMS.



• The file name must contain your name and CMS ID in the following format: <Lab_your CMS ID_your name.ipynb>

