
Department of Computing

CS471: Machine Learning (Spring 2021)


Class: BESE-9AB

Lab 02: Naïve Bayes

CLO1: Understand machine learning algorithms, tools and techniques.

CLO3: Use modern tools to solve practical problems.

Date: March 5th, 2021

Time: 9:00AM – 12:00PM & 2:00PM – 5:00PM

Instructor: Dr. Omar Arif

CS471: Machine Learning Page 1


Lab 02: Naïve Bayes

Introduction

In this lab you will implement and test the Naïve Bayes algorithm. You may find the following reference helpful:

https://scikit-learn.org/stable/modules/naive_bayes.html

Objectives

Implement Naïve Bayes.

Tools/Software Requirement

Python 3, Scikit-learn, Pandas, Matplotlib, NumPy

Task Description

Task 1 Categorical Naïve Bayes

Let us suppose you are given the following dataset.

[Dataset table: 14 examples with features Weather ∈ {Sunny, Overcast, Rainy} and Temperature ∈ {Hot, Mild, Cool}, and a class Play ∈ {Yes, No}; its counts appear in the prior/likelihood table in step 1.]
You are interested in the posterior probability p(play | weather, temperature). In order to find the posterior probability, you need to know the likelihood and the prior.

In categorical Naïve Bayes (studied in the class), the likelihood probability of an attribute/feature taking a particular value t given the class c is computed using

p(x_i = t | y = c) = (N_tic + λ) / (N_c + λ n_i)

where N_tic is the number of times category t appears in the samples x_i which belong to class y = c, N_c is the number of samples/examples with y = c, λ is the smoothing parameter, and n_i is the number of available categories of feature x_i.



1. Use the above dataset to learn the Bayes net and fill in the following table.

Prior:

  Play   P(play)
  Yes    9/14
  No     5/14

Likelihood:

  Play   Weather    p(weather | play)
  Yes    Sunny      2/9
  Yes    Overcast   4/9
  Yes    Rainy      3/9
  No     Sunny      3/5
  No     Overcast   0
  No     Rainy      2/5

  Play   Temp   p(temp | play)
  Yes    Mild   4/9
  Yes    Cool   3/9
  Yes    Hot    2/9
  No     Mild   2/5
  No     Cool   1/5
  No     Hot    2/5

2. Now using the above prior and likelihood values, compute

   p(play = Yes | weather = Sunny, temp = Hot)
     = [(2/9 · 2/9) · 9/14] / [(2/9 · 2/9) · 9/14 + (3/5 · 2/5) · 5/14]
     ≈ 0.27

3. Compute the posterior p(play | weather = Sunny, temp = Hot) using Laplace smoothing with λ = 1 (each feature has n_i = 3 categories), i.e. evaluate

   [((3+1)/(5+3)) · ((2+1)/(5+3)) · ((5+3)/(14+6))]
   / [((2+1)/(9+3)) · ((2+1)/(9+3)) · ((9+3)/(14+6)) + ((3+1)/(5+3)) · ((2+1)/(5+3)) · ((5+3)/(14+6))]
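As a quick sanity check, both fractions can be evaluated with plain arithmetic (no ML library needed). Note that the smoothed fraction, as written above, has the "No"-class term in its numerator, so it yields the posterior of play = No:

```python
# Unsmoothed posterior p(play = Yes | weather = Sunny, temp = Hot)
yes = (2/9) * (2/9) * (9/14)     # likelihoods times prior, class Yes
no  = (3/5) * (2/5) * (5/14)     # likelihoods times prior, class No
p_yes = yes / (yes + no)
print(round(p_yes, 4))           # 0.2703

# Laplace-smoothed terms as written above (lambda = 1, n_i = 3 per feature)
yes_s = ((2+1)/(9+3)) * ((2+1)/(9+3)) * ((9+3)/(14+6))
no_s  = ((3+1)/(5+3)) * ((2+1)/(5+3)) * ((5+3)/(14+6))
print(round(no_s / (yes_s + no_s), 4))   # 0.6667 = p(play = No | ...)
```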
4. Now use scikit-learn's CategoricalNB (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) to compute the posterior p(play | weather = Sunny, temp = Hot) with λ = {0, 1}. CategoricalNB assumes that the input data is encoded such that all categories for each feature i are represented with numbers 0 … n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html). This will convert the features in your dataset to ordinal integers.
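A minimal sketch of step 4. The 14 rows below are an assumption: they are the classic play-tennis examples, chosen so that their counts reproduce the prior/likelihood table above (9 Yes / 5 No, p(Sunny | Yes) = 2/9, and so on), since the original dataset table is not legible here.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# 14 examples whose per-class counts match the table above.
X = [["Sunny", "Hot"], ["Sunny", "Hot"], ["Overcast", "Hot"], ["Rainy", "Mild"],
     ["Rainy", "Cool"], ["Rainy", "Cool"], ["Overcast", "Cool"], ["Sunny", "Mild"],
     ["Sunny", "Cool"], ["Rainy", "Mild"], ["Sunny", "Mild"], ["Overcast", "Mild"],
     ["Overcast", "Hot"], ["Rainy", "Mild"]]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
     "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

enc = OrdinalEncoder()                     # maps each category to 0 .. n_i - 1
X_enc = enc.fit_transform(X)
query = enc.transform([["Sunny", "Hot"]])

for alpha in (1e-10, 1.0):                 # alpha ~ 0 (unsmoothed) and lambda = 1
    clf = CategoricalNB(alpha=alpha).fit(X_enc, y)
    p_yes = clf.predict_proba(query)[0, list(clf.classes_).index("Yes")]
    print(f"alpha={alpha:g}: p(play=Yes | Sunny, Hot) = {p_yes:.4f}")
```

With alpha near 0 this reproduces the unsmoothed hand computation (about 0.2703). Note that CategoricalNB smooths only the likelihood counts, not the class prior, so its λ = 1 answer can differ from a hand computation that also smooths the prior.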



Task 2 Categorical Naïve Bayes on UCI dataset

1. Use pandas.read_csv to read the csv file corresponding to the UCI dataset selected in
the last lab.
2. Partition the dataset into two parts, i.e. training and testing. You can divide your dataset as 80% training and 20% testing. Use train_test_split for splitting the data.
3. CategoricalNB assumes that the input data is encoded such that all categories for each feature i are represented with numbers 0 … n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html). This will convert the features in your dataset to ordinal integers.
4. Apply a Categorical Naïve Bayes classifier (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) using scikit-learn on your dataset.
   a. To analyze the performance on the train and test splits, use sklearn.metrics.accuracy_score.
   b. Apply Laplace smoothing by varying the parameter "alpha" from 0 to 50. Plot the results using Matplotlib.
   c. What is the best alpha value for your dataset?
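The steps above can be sketched as follows. Since the UCI dataset chosen in the last lab is not specified here, the sketch synthesizes a small stand-in frame with hypothetical column names ("f1", "f2", "class"); replace it with pd.read_csv on your own file and your own column names.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")                      # headless backend; drop in a notebook
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score

# Stand-in for your UCI data: replace with df = pd.read_csv("your_dataset.csv").
rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=300)
noisy = np.where(rng.random(300) < 0.3, rng.integers(0, 3, size=300), label)
df = pd.DataFrame({"f1": np.array(["a", "b", "c"])[noisy],
                   "f2": np.array(["x", "y", "z"])[rng.integers(0, 3, size=300)],
                   "class": label})

# Encode categories as 0 .. n_i - 1, then make an 80/20 split.
X = OrdinalEncoder().fit_transform(df[["f1", "f2"]])
X_tr, X_te, y_tr, y_te = train_test_split(X, df["class"],
                                          test_size=0.2, random_state=0)

# Sweep sklearn's smoothing parameter "alpha" (the lambda of the formula).
alphas = np.linspace(0.01, 50, 50)
train_acc, test_acc = [], []
for a in alphas:
    clf = CategoricalNB(alpha=a).fit(X_tr, y_tr)
    train_acc.append(accuracy_score(y_tr, clf.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, clf.predict(X_te)))

plt.plot(alphas, train_acc, label="train")
plt.plot(alphas, test_acc, label="test")
plt.xlabel("alpha"); plt.ylabel("accuracy"); plt.legend()
plt.savefig("alpha_sweep.png")
print("best alpha on the test split:", alphas[int(np.argmax(test_acc))])
```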

Task 3 Gaussian Naïve Bayes

If the feature values are not categorical but continuous, then Gaussian Naïve Bayes is used, which assumes that the likelihood of each feature given a class is Gaussian:

p(x_i | y = c) = (1 / √(2π σ_c²)) · exp(−(x_i − μ_c)² / (2σ_c²))

where μ_c and σ_c² are the mean and variance of feature x_i over the samples of class c.

1. Apply Gaussian Naïve Bayes on the breast cancer dataset (sklearn.datasets.load_breast_cancer()). The dataset consists of 30 features. Split the dataset into train and test sets and use GaussianNB to train a classifier. Report the train and test errors.
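A minimal sketch for this task; the 80/20 split proportion and random seed are assumptions, so adjust them to your needs:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 continuous features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GaussianNB().fit(X_tr, y_tr)
train_err = 1 - clf.score(X_tr, y_tr)        # error rate = 1 - accuracy
test_err = 1 - clf.score(X_te, y_te)
print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```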

Deliverables

• Students are required to upload the lab task solution in ipynb format on LMS.



• The file name must contain your name and CMS ID in the following format: <Lab_your CMS ID_your name.ipynb>

