import numpy as np
import math
import pandas as pd
# Drop missing values and duplicate rows from the training data
trainy.dropna(inplace=True)
trainy.drop_duplicates(inplace=True)
Computing Priors for the two tumor types: Benign "B" and Malignant "M"
# prior_B = fraction of training rows labelled 'B' (its computation is not shown in this extract)
prior_M = 1 - prior_B
priors = [prior_B, prior_M]
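For reference, the priors can be obtained directly from the class frequencies in the training labels. This is a minimal sketch with invented labels; the notebook's actual label column and variable names are not shown in this extract:

```python
import pandas as pd

# Hypothetical training labels; in the notebook these would come from the training set.
labels = pd.Series(['B', 'B', 'B', 'M', 'M'])

# Prior of a class = its relative frequency in the training data.
prior_B = (labels == 'B').mean()
prior_M = 1 - prior_B
priors = [prior_B, prior_M]
```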
Obtaining mean and standard deviation associated with each class for the following attributes
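The per-class statistics described above can be computed with a single `groupby`. A sketch with an illustrative DataFrame (the feature and label column names here are assumptions, not the notebook's actual columns):

```python
import pandas as pd

# Illustrative training data: one column per attribute, class labels in 'label'.
train = pd.DataFrame({
    'radius_mean':  [12.0, 13.5, 11.8, 20.1, 19.4],
    'texture_mean': [14.0, 15.2, 13.1, 22.0, 24.3],
    'label':        ['B', 'B', 'B', 'M', 'M'],
})

# Gaussian Naive Bayes only needs, per class, the mean and standard
# deviation of each attribute.
stats = train.groupby('label').agg(['mean', 'std'])
print(stats)
```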
testy.dropna(inplace=True)
testy.drop_duplicates(inplace=True)
In [203]: prediction = []
for post_B, post_M in zip(posterior_B, posterior_M):  # per-sample class posteriors (computed in a cell not shown in this extract)
    if post_B > post_M:
        prediction.append('B')
    else:
        prediction.append('M')
In [207]: prediction_scaled = []
for post_B, post_M in zip(posterior_B_scaled, posterior_M_scaled):  # posteriors on scaled features (cell not shown in this extract)
    if post_B > post_M:
        prediction_scaled.append('B')
    else:
        prediction_scaled.append('M')
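The two loops above compare class posteriors sample by sample. Here is a self-contained sketch of that Gaussian Naive Bayes decision rule; all variable names, parameters, and data below are illustrative, not the notebook's actual values:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Likelihood of x under a normal distribution with mean mu and std sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def predict(sample, priors, stats):
    """Pick the class maximising prior * product of per-feature likelihoods.

    stats maps class -> list of (mean, std) pairs, one per feature.
    """
    best_class, best_score = None, -1.0
    for cls, prior in priors.items():
        score = prior
        for x, (mu, sigma) in zip(sample, stats[cls]):
            score *= gaussian_pdf(x, mu, sigma)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Toy parameters: benign tumors assumed to have smaller mean radius/texture.
priors = {'B': 0.6245, 'M': 0.3755}
stats = {'B': [(12.0, 1.5), (14.0, 2.0)],
         'M': [(20.0, 2.5), (22.0, 3.0)]}

prediction = [predict(s, priors, stats) for s in [(11.5, 13.0), (21.0, 23.5)]]
print(prediction)  # ['B', 'M']
```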
Q3.c)
The priors of the two classes "B" (Benign) and "M" (Malignant) clearly show the class imbalance in the given dataset: roughly 62.45% of the samples are labelled "B" while only 37.55% are labelled "M". Although this is not a severe imbalance, it does have a significant impact on the performance of the Naive Bayes classifier, for the following reason. The class priors (as is evident from the mathematical derivation in Q3.a) play a crucial role in prediction, because we assume that the observed priors represent the natural frequencies at which the classes of interest occur. Priors estimated from an imbalanced dataset introduce bias into the prediction: a large prior can mask the effect of even larger likelihood terms P(x_i | T_t), pushing the classifier towards the majority class. So when there is class imbalance, the reliability of the classifier's predictions decreases.
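To make the masking argument concrete, here is a small numeric illustration (all numbers invented): a test sample whose likelihoods mildly favour "M" is still classified "B" under the dataset's imbalanced priors, whereas uniform priors would pick "M".

```python
# Per-class likelihood P(x | class) for one hypothetical test sample:
likelihood = {'B': 0.010, 'M': 0.015}  # the evidence mildly favours 'M'

def decide(priors):
    """Return the class maximising prior * likelihood."""
    return max(priors, key=lambda c: priors[c] * likelihood[c])

balanced = {'B': 0.5, 'M': 0.5}
imbalanced = {'B': 0.6245, 'M': 0.3755}  # priors from this dataset

print(decide(balanced))    # 'M' -- the likelihood decides
print(decide(imbalanced))  # 'B' -- the larger prior masks the likelihood
```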