Abstract:-
Problem Statement:-
Description of Dataset:-
The dataset contains records from a study of different patients, each
labelled as either diabetic or non-diabetic.
For this coursework I will use this data and apply a KNN (k-nearest
neighbours) algorithm to classify held-out patient records as either
diabetic or non-diabetic.
The dataset contains 768 patient records in total, which we will clean
and preprocess before using them in our KNN predictive model.
The dataset consists of several medical predictor variables and one target
variable, Outcome.
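As a rough illustration of what a KNN classifier does, the sketch below classifies a query point by majority vote among its k nearest training points, using Euclidean distance. It is a minimal standard-library sketch, not the scikit-learn implementation used in this coursework; the function name `knn_predict` and the toy data are assumptions made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points and count the votes
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up data: two features already scaled to [0, 1];
# label 1 = diabetic, label 0 = non-diabetic
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
                (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.75, 0.85), k=3))  # 1
print(knn_predict(train_points, train_labels, (0.2, 0.2), k=3))    # 0
```

Because the vote depends on raw distances, features with large numeric ranges would dominate the result, which is why the features are rescaled before fitting.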
Implementation of Code:-
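import pandas as pd
# Load the diabetes dataset (768 rows)
data = pd.read_csv("/content/diabetes.csv")
data
# Features: every column except the target
x = data.drop(['Outcome'], axis = 1)
x.head()
# Target: 1 = diabetic, 0 = non-diabetic
y = data['Outcome']
y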
0 1
1 0
2 1
3 0
4 1
..
763 0
764 0
765 0
766 1
767 0
Name: Outcome, Length: 768, dtype: int64
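from sklearn.preprocessing import MinMaxScaler
# Rescale every feature to the [0, 1] range so that no feature dominates the distances
scaler = MinMaxScaler()
x = scaler.fit_transform(x)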
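With its default settings, MinMaxScaler rescales each column to the range [0, 1] using (value - min) / (max - min). The sketch below reproduces that arithmetic for a single column with the standard library only; the helper name `min_max_scale` and the sample readings are made up for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps it to 0.0
        # to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up readings for one feature, for illustration only
readings = [80, 110, 140, 200]
print(min_max_scale(readings))  # [0.0, 0.25, 0.5, 1.0]
```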
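from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# The split step is missing from the source; test_size=0.3 is an assumption
# that matches the 231-row test set shown below
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3)
# Start with the simplest model: a single nearest neighbour
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(xtrain, ytrain)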
KNeighborsClassifier(n_neighbors=1)
ypred = knn.predict(xtest)
ypred
array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0,
1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
ytest
285 0
101 0
581 0
352 0
726 0
..
241 0
599 0
650 0
11 1
214 1
Name: Outcome, Length: 231, dtype: int64
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(ytest, ypred))
print(classification_report(ytest, ypred))
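import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
error_rate = []
# The loop over K is missing from the source; the range 1 to 39 is an assumption
for i in range(1, 40):
    pred_i = KNeighborsClassifier(n_neighbors=i).fit(xtrain, ytrain).predict(xtest)
    error_rate.append(np.mean(pred_i != ytest))
plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, marker='o')
plt.xlabel('K')
plt.ylabel('Error rate')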
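Once the error rate has been collected for each K, the best K can be read off the list rather than judged by eye from the plot. The sketch below shows one way to do that; the helper name `best_k` and the error values are hypothetical and are not results from the diabetes data.

```python
def best_k(error_rate, first_k=1):
    """Return the K with the lowest error; ties go to the smallest K."""
    lowest = min(error_rate)
    # list.index returns the first position of the minimum
    return first_k + error_rate.index(lowest)

# Hypothetical error rates for K = 1..6, for illustration only
example_errors = [0.29, 0.27, 0.25, 0.26, 0.24, 0.24]
chosen_k = best_k(example_errors)
print(chosen_k)  # 5
```

Breaking ties toward the smaller K is a design choice made for this sketch; a larger K gives a smoother decision boundary, so the opposite choice can also be defended.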
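# Refit with K=13 and evaluate the new predictions
knn = KNeighborsClassifier(n_neighbors=13)
knn.fit(xtrain, ytrain)
predictions = knn.predict(xtest)
print(confusion_matrix(ytest, predictions))
print(classification_report(ytest, predictions))
Output recorded in the source (its 72 predicted positives match the count of
1s in the K=1 ypred array above, so it appears to come from ypred rather than
from the K=13 predictions):
[[119  27]
 [ 40  45]]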
precision recall f1-score support
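The values of the classification report are not preserved here, but the per-class precision, recall and F1 score follow directly from the confusion matrix printed above. The sketch below recomputes them with the standard library only, assuming scikit-learn's convention that rows are actual classes and columns are predicted classes; the helper name `per_class_metrics` is made up for this example.

```python
# Confusion matrix as printed above: rows = actual class, columns = predicted class
matrix = [[119, 27],
          [40, 45]]

def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1) for class index `cls`."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column total
    actual = sum(matrix[cls])                    # row total
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total

for cls in (0, 1):
    p, r, f = per_class_metrics(matrix, cls)
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f} on {total} test rows")
```

For this matrix the accuracy is 164/231, about 0.71, and the diabetic class (class 1) has a recall of 45/85, about 0.53, so close to half of the diabetic patients in the test set are missed.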
p = data.hist(figsize = (20,20))
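import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
# Confusion matrix of the current knn model, drawn as an annotated heatmap
y_pred = knn.predict(xtest)
cnf_matrix = confusion_matrix(ytest, y_pred)
p = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True,
                cmap="YlGnBu", fmt='g')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')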
References:-
https://colab.research.google.com/drive/1vEet9M4-0shXSlqTlhFmoh0LTthYtf7M?usp=sharing
https://youtu.be/DzWE7xIlkPM?si=E-mlb1fLtwKsZi9S