Introduction
In this lab you will implement and test the Naïve Bayes algorithm. You may find the scikit-learn user guide below helpful:
https://scikit-learn.org/stable/modules/naive_bayes.html
Objectives
$$ p(x_i = t \mid y = c) = \frac{N_{tic} + \lambda}{N_c + \lambda n_i} $$
where $N_{tic}$ is the number of times category $t$ appears in feature $x_i$ among the samples that belong to class $y = c$, $N_c$ is the number of samples with $y = c$, $\lambda$ is the smoothing parameter, and $n_i$ is the number of available categories of feature $x_i$.
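The smoothed estimate above can be sketched in a few lines of Python. The function name `smoothed_likelihood` and the counts in the usage lines are illustrative assumptions, not part of the lab handout.

```python
def smoothed_likelihood(n_tic, n_c, n_i, lam=1.0):
    """Estimate p(x_i = t | y = c) with additive smoothing.

    n_tic: times category t appears in feature x_i among samples of class c
    n_c:   number of samples of class c
    n_i:   number of available categories of feature x_i
    lam:   smoothing parameter (lam = 1 is Laplace smoothing)
    """
    return (n_tic + lam) / (n_c + lam * n_i)


# Hypothetical counts: a feature with 3 categories, observed 3, 2 and 0 times
# among the 5 samples of one class.
counts = [3, 2, 0]
probs = [smoothed_likelihood(n, n_c=5, n_i=3, lam=1.0) for n in counts]
print(probs)       # [0.5, 0.375, 0.125]
print(sum(probs))  # 1.0
```

With `lam = 0` the unseen category would get probability 0, which zeroes the whole product of likelihoods for that class; any `lam > 0` avoids this while keeping the estimates normalised over the categories.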
Prior table columns: Play | P(play)
Likelihood table columns: Play | Weather | p(weather | play)
Likelihood table columns: Play | Temp | p(temp | play)
$$
\frac{\left(\frac{3+1}{5+3}\cdot\frac{2+1}{5+3}\right)\cdot\frac{5+3}{14+6}}
{\left(\frac{2+1}{9+3}\cdot\frac{2+1}{9+3}\right)\cdot\frac{9+3}{14+6}
+\left(\frac{3+1}{5+3}\cdot\frac{2+1}{5+3}\right)\cdot\frac{5+3}{14+6}}
$$
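As a quick check, the expression above can be evaluated exactly with Python's `fractions` module. The names `term_a` and `term_b` are illustrative; the numbers are copied from the expression as written, which has the form a / (b + a).

```python
from fractions import Fraction as F

# Numerator term, which also appears as the second term of the denominator.
term_a = F(3 + 1, 5 + 3) * F(2 + 1, 5 + 3) * F(5 + 3, 14 + 6)
# First term of the denominator.
term_b = F(2 + 1, 9 + 3) * F(2 + 1, 9 + 3) * F(9 + 3, 14 + 6)

posterior = term_a / (term_b + term_a)
print(term_a, term_b, posterior)  # 3/40 3/80 2/3
```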
3. Compute the posterior p(play | weather = Sunny, temp = Hot) using Laplace smoothing with λ = 1.
4. Now use scikit-learn's CategoricalNB (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) to compute the posterior p(play | weather = Sunny, temp = Hot) with λ ∈ {0, 1}. CategoricalNB assumes that the input data is encoded such that all categories of each feature i are represented by the numbers 0, …, n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html), which converts the features in your dataset to ordinal integers.
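A minimal sketch of step 4 follows. It assumes the lab's table is the classic 14-row play-tennis data restricted to the weather and temperature columns; the handout's table did not survive in this copy, so treat the rows below as an assumption (their class counts, 9 Yes and 5 No, agree with the worked expression). The column names and the value `1e-10`, used as a stand-in for λ = 0, are also illustrative choices.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# ASSUMPTION: classic play-tennis rows (weather, temp, play); not from the handout.
rows = [
    ("Sunny", "Hot", "No"), ("Sunny", "Hot", "No"), ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Rainy", "Cool", "Yes"), ("Rainy", "Cool", "No"),
    ("Overcast", "Cool", "Yes"), ("Sunny", "Mild", "No"), ("Sunny", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Sunny", "Mild", "Yes"), ("Overcast", "Mild", "Yes"),
    ("Overcast", "Hot", "Yes"), ("Rainy", "Mild", "No"),
]
df = pd.DataFrame(rows, columns=["weather", "temp", "play"])

# Encode each feature's categories as integers 0 .. n_i - 1.
encoder = OrdinalEncoder(dtype=int)
X = encoder.fit_transform(df[["weather", "temp"]])
y = df["play"]

# Encode the query point with the same fitted encoder.
query = pd.DataFrame([["Sunny", "Hot"]], columns=["weather", "temp"])
X_query = encoder.transform(query)

posteriors = {}
for alpha in (1e-10, 1.0):  # 1e-10 stands in for lambda = 0
    clf = CategoricalNB(alpha=alpha).fit(X, y)
    proba = clf.predict_proba(X_query)[0]
    posteriors[alpha] = {str(c): float(p) for c, p in zip(clf.classes_, proba)}
    print(alpha, posteriors[alpha])
```

Note that CategoricalNB applies `alpha` only to the feature likelihoods; with the default `fit_prior=True` the class priors are the empirical class frequencies. A hand computation that also smooths the prior can therefore give a slightly different posterior.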
1. Use pandas.read_csv to read the CSV file corresponding to the UCI dataset selected in the last lab.
2. Partition the dataset into two parts, i.e. training and testing. You can split your dataset into 80% training and 20% testing. Use train_test_split to split the data.
3. CategoricalNB assumes that the input data is encoded such that all categories of each feature i are represented by the numbers 0, …, n_i − 1, where n_i is the number of available categories of feature i. Therefore, preprocess the dataset using OrdinalEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html), which converts the features in your dataset to ordinal integers.
4. Apply a Categorical Naïve Bayes classifier (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) using scikit-learn on your dataset.
a. To analyze the performance on train and test split, use
sklearn.metrics.accuracy_score
b. Apply lambda smoothing by varying the parameter “alpha” from 0 to 50. Plot
the result using Matplotlib.
c. What is the best alpha value for your dataset?
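The steps above can be sketched end to end as follows. Because the UCI dataset is whichever one you selected, the sketch builds a small synthetic categorical table as a placeholder; the column names, the file name in the commented `read_csv` line, and the value `1e-10` standing in for alpha = 0 are all assumptions. The encoder is fitted on the full feature table, and `min_categories` is passed so that a category missing from the training split does not cause an indexing error at prediction time.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# df = pd.read_csv("your_dataset.csv")  # hypothetical file name; use your own
# Placeholder data so the sketch runs on its own (synthetic, not a UCI dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "f1": rng.choice(["a", "b", "c"], size=n),
    "f2": rng.choice(["low", "mid", "high"], size=n),
})
df["label"] = np.where((df["f1"] == "a") | (rng.random(n) < 0.2), "pos", "neg")

X_raw, y = df.drop(columns="label"), df["label"]

# 80% training, 20% testing.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, random_state=0
)

# Map every category of every feature to an integer 0 .. n_i - 1.
encoder = OrdinalEncoder(dtype=int).fit(X_raw)
X_train = encoder.transform(X_train_raw)
X_test = encoder.transform(X_test_raw)
n_categories = [len(cats) for cats in encoder.categories_]

# Sweep alpha; 1e-10 stands in for 0.
alphas = [1e-10] + [float(a) for a in range(1, 51)]
train_acc, test_acc = [], []
for alpha in alphas:
    clf = CategoricalNB(alpha=alpha, min_categories=n_categories)
    clf.fit(X_train, y_train)
    train_acc.append(accuracy_score(y_train, clf.predict(X_train)))
    test_acc.append(accuracy_score(y_test, clf.predict(X_test)))

best_alpha = alphas[int(np.argmax(test_acc))]
print("best alpha on the test split:", best_alpha)

# Accuracy as a function of alpha.
fig, ax = plt.subplots()
ax.plot(alphas, train_acc, label="train")
ax.plot(alphas, test_acc, label="test")
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.legend()
```

Picking alpha by test accuracy answers the lab question directly, but it tunes on the test data; a separate validation split or cross-validation would give a less optimistic estimate.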
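If the feature values are continuous rather than categorical, Gaussian Naïve Bayes is used, which assumes that the likelihood of each feature is Gaussian:
$$ p(x_i \mid y = c) = \frac{1}{\sqrt{2\pi\sigma_{ic}^2}} \exp\left(-\frac{(x_i - \mu_{ic})^2}{2\sigma_{ic}^2}\right) $$
where $\mu_{ic}$ and $\sigma_{ic}^2$ are the mean and variance of feature $x_i$ among the samples with $y = c$.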
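As a minimal sketch, the Gaussian likelihood of a single feature value, given the mean and variance of that feature within one class, can be evaluated as below. The function and argument names are illustrative assumptions.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# The density peaks at the class mean and falls off on either side.
print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(gaussian_likelihood(2.0, mean=0.0, var=1.0))
```

In practice the per-class mean and variance are estimated from the training samples of each class, and the per-feature likelihoods are multiplied together with the class prior, exactly as in the categorical case.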
Deliverables
• Students are required to upload the lab task solution in .ipynb format to the LMS.