1.5. Apply KNN Model and Naive Bayes Model. Interpret the inferences of each model.

K-Nearest Neighbours (KNN) Model

KNN is a distance-based supervised machine learning algorithm that can be used to solve both classification and regression problems. Its main disadvantage is that it becomes very slow when the volume of data is large, which makes it an impractical choice where inferences need to be drawn quickly.

Before fitting the model, it is important to understand the hyperparameters involved in model building:
• n_neighbors
• weights
• algorithm

After performing GridSearchCV, the best parameters obtained are:
• n_neighbors = 5
• weights = 'uniform'
• algorithm = 'auto'

Results for unscaled data:
Train Accuracy: 0.8369259606373008
Test Accuracy: 0.8165938864628821

Probabilities on the test set (classes 0 and 1, rows 2-7):

     0    1
2  0.0  1.0
3  0.0  1.0
4  0.0  1.0
5  0.2  0.8
6  0.2  0.8
7  0.0  1.0

Results for scaled data:
Train Accuracy: 0.8603561387066542
Test Accuracy: 0.8384279475982532

Probabilities on the test set (classes 0 and 1, first rows):

   0    1
 0.2  0.8
 0.0  1.0
 0.0  1.0
 0.0  1.0
 0.0  1.0
 0.0  1.0

Inference:
The model performed better with the scaled data. Overall the model performed well, but there may be slight overfitting, as accuracy is higher on the train set than on the test set.

Naive Bayes Model

Naive Bayes classifiers apply Bayes' theorem with strong (naive) independence assumptions between the features. These assumptions, however, may not hold perfectly in real life.

Bayes' Theorem:

posterior = (prior × likelihood) / evidence

The method we are going to use here is GaussianNB(). Its general assumption is that each feature follows a normal (Gaussian) distribution, so it suits continuous features. Unlike the other models, there are no specific hyperparameters to tune here, so we will simply fit the model with default parameters.
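The KNN tuning described above can be sketched as follows. This is an illustrative sketch only: the synthetic dataset, split, and grid values are assumptions, not the report's actual data or settings.

```python
# Sketch of tuning a KNN classifier with GridSearchCV.
# make_classification stands in for the report's dataset (an assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# KNN is distance based, so features are scaled before fitting.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Grid over the three hyperparameters named in the text;
# the candidate values here are illustrative.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "algorithm": ["auto", "ball_tree", "kd_tree"],
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train_s, y_train)

best_knn = grid.best_estimator_
print("Best parameters:", grid.best_params_)
print("Train accuracy:", best_knn.score(X_train_s, y_train))
print("Test accuracy:", best_knn.score(X_test_s, y_test))
print(best_knn.predict_proba(X_test_s)[:5])  # class probabilities
```

Comparing `best_knn.score` on the train and test sets, as above, is how the overfitting check in the inference is made.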
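A minimal GaussianNB fit with default parameters can be sketched as below (toy data is an assumption standing in for the report's dataset). The sketch also refits on standardised features to show that per-feature scaling leaves GaussianNB's predictions effectively unchanged, since the per-class mean and variance estimates rescale along with the data.

```python
# Sketch: GaussianNB with default parameters, on toy data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

nb = GaussianNB()          # no hyperparameter grid: defaults are used
nb.fit(X_train, y_train)
print("Train accuracy:", nb.score(X_train, y_train))
print("Test accuracy:", nb.score(X_test, y_test))

# Refit on scaled data: predictions match the unscaled fit.
scaler = StandardScaler().fit(X_train)
nb_s = GaussianNB().fit(scaler.transform(X_train), y_train)
same = (nb.predict(X_test) == nb_s.predict(scaler.transform(X_test))).all()
print("Same predictions on scaled data:", same)
```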
Results for unscaled data:
Train Accuracy: 0.8219306466729147
Test Accuracy: 0.8471615720524017

Probabilities on the test set (classes 0 and 1):

          0         1
0  0.240951  0.759049
1  0.075278  0.924722
2  0.007475  0.992525
3  0.161693  0.838307
4  0.000622  0.999378
5  0.009636  0.990364
6  0.032205  0.967795
7  0.002369  0.997631
8  0.004579  0.995421
9  0.295323  0.704677

Results for scaled data:
Train Accuracy: 0.8219306466729147
Test Accuracy: 0.8471615720524017

Probabilities on the test set: identical to the unscaled results above.

Inference:
The model performed exactly the same on both unscaled and scaled data. It performed well on the data, with no overfitting or underfitting present.

1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting.
