
Import libraries

import numpy as np
import pandas as pd

Load the dataset


data = pd.read_csv("C:/Users/User/Downloads/diabetes2.csv")
data.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

   DiabetesPedigreeFunction  Age  Outcome
0                     0.627   50        1
1                     0.351   31        0
2                     0.672   32        1
3                     0.167   21        0
4                     2.288   33        1

OneR Classifier
OneR considers each attribute in turn. It builds a set of rules for each attribute and selects the
rule set that gives the highest accuracy. The algorithm is applied below to the Age attribute:

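# Bin each age into one of five categories, stored as string labels
column_age = []

for age in data['Age']:
    if 20.9 <= age < 33:
        column_age.append("0")
    elif age < 45:
        column_age.append("1")
    elif age < 57:
        column_age.append("2")
    elif age < 69:
        column_age.append("3")
    elif age < 81.1:
        column_age.append("4")
    else:
        # age >= 81.1 has no bin; append None so the list length matches the DataFrame
        column_age.append(None)

# adding a new column
data["Age_Categorical"] = column_age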

data.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

   DiabetesPedigreeFunction  Age  Outcome Age_Categorical
0                     0.627   50        1               2
1                     0.351   31        0               0
2                     0.672   32        1               0
3                     0.167   21        0               0
4                     2.288   33        1               1
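
The if/elif binning above can also be sketched with `pd.cut`. This is a minimal illustration, not the notebook's own method: it assumes the same bin edges as the loop and uses only the five ages visible in the `data.head()` output, held in a stand-in Series named `ages`.

```python
import pandas as pd

# Ages of the five rows shown in the data.head() output (stand-in for data['Age']).
ages = pd.Series([50, 31, 32, 21, 33], name="Age")

# Same bin edges as the if/elif chain. right=False makes each bin closed on
# the left and open on the right, e.g. [33, 45), matching `age < 45`.
bin_edges = [20.9, 33, 45, 57, 69, 81.1]
labels = ["0", "1", "2", "3", "4"]

# pd.cut returns a categorical Series; astype(str) gives plain string labels.
age_categorical = pd.cut(ages, bins=bin_edges, right=False, labels=labels).astype(str)
print(age_categorical.tolist())
```

For these five rows the labels agree with the Age_Categorical column shown above. One difference to keep in mind: `pd.cut` leaves ages outside the outer edges as missing values, whereas the loop's `elif` chain would place an age below 20.9 in category "1".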

# Count the rows with each outcome inside every age category
for i in range(0, 5):
    print("If Age Category: ", i, " , number of outcomes(0): ",
          len(data[(data['Age_Categorical'] == str(i)) & (data['Outcome'] == 0)]))
    print("If Age Category: ", i, " , number of outcomes(1): ",
          len(data[(data['Age_Categorical'] == str(i)) & (data['Outcome'] == 1)]), "\n")

If Age Category: 0 , number of outcomes(0): 345
If Age Category: 0 , number of outcomes(1): 112

If Age Category: 1 , number of outcomes(0): 88
If Age Category: 1 , number of outcomes(1): 90

If Age Category: 2 , number of outcomes(0): 35
If Age Category: 2 , number of outcomes(1): 51

If Age Category: 3 , number of outcomes(0): 28
If Age Category: 3 , number of outcomes(1): 14

If Age Category: 4 , number of outcomes(0): 4
If Age Category: 4 , number of outcomes(1): 1
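
The same per-category counts can be obtained in one call with `pd.crosstab`. The sketch below is only an illustration: since the CSV is not available here, it rebuilds a stand-in frame (the hypothetical name `demo`) from the counts printed above, rather than reading the real `data`.

```python
import pandas as pd

# (age category, outcome, row count), copied from the printed output above.
printed_counts = [
    ("0", 0, 345), ("0", 1, 112),
    ("1", 0, 88), ("1", 1, 90),
    ("2", 0, 35), ("2", 1, 51),
    ("3", 0, 28), ("3", 1, 14),
    ("4", 0, 4), ("4", 1, 1),
]

# Stand-in for `data`: expand each count into that many (category, outcome) rows.
rows = [(cat, outcome) for cat, outcome, n in printed_counts for _ in range(n)]
demo = pd.DataFrame(rows, columns=["Age_Categorical", "Outcome"])

# Rows are age categories, columns are outcomes, cells are row counts.
counts = pd.crosstab(demo["Age_Categorical"], demo["Outcome"])
print(counts)
```

With the real dataset, passing `data["Age_Categorical"]` and `data["Outcome"]` to `pd.crosstab` should give the same table without the explicit loop.
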

So, if we build a set of rules using only the attribute Age_Categorical, predicting the majority
outcome in each category, the rules are:
if Age_Categorical = 0 (20.9 <= Age < 33) then Outcome = 0
if Age_Categorical = 1 (33 <= Age < 45) then Outcome = 1
if Age_Categorical = 2 (45 <= Age < 57) then Outcome = 1
if Age_Categorical = 3 (57 <= Age < 69) then Outcome = 0
if Age_Categorical = 4 (69 <= Age < 81.1) then Outcome = 0
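
The rules above follow from picking the more frequent outcome in each category. A minimal sketch of that step, assuming the counts are typed in by hand from the printed output (the `counts` table here is a stand-in, not something the notebook builds):

```python
import pandas as pd

# Outcome counts per age category, copied from the printed output above.
counts = pd.DataFrame(
    {0: [345, 88, 35, 28, 4], 1: [112, 90, 51, 14, 1]},
    index=pd.Index(["0", "1", "2", "3", "4"], name="Age_Categorical"),
)

# For each category, predict whichever outcome occurs most often:
# idxmax(axis=1) returns the column label holding the row maximum.
rules = counts.idxmax(axis=1).to_dict()
print(rules)
```

The resulting mapping assigns Outcome 0 to categories 0, 3 and 4 and Outcome 1 to categories 1 and 2, matching the rule list.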

The accuracy of the model will be:


(345+90+51+28+4)/(345+112+88+90+35+51+28+14+4+1)*100

67.44791666666666
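
The arithmetic above adds up the majority count of every category and divides by the total number of rows. As a sketch under the same assumption as before (counts copied by hand from the printed output into a stand-in `counts` table), it can be computed like this:

```python
import pandas as pd

# Outcome counts per age category, copied from the printed output above.
counts = pd.DataFrame(
    {0: [345, 88, 35, 28, 4], 1: [112, 90, 51, 14, 1]},
    index=pd.Index(["0", "1", "2", "3", "4"], name="Age_Categorical"),
)

# A majority rule classifies the larger count in each row correctly.
correct = counts.max(axis=1).sum()
total = counts.to_numpy().sum()
accuracy = correct / total * 100
print(f"{correct} of {total} correct: {accuracy:.2f}%")
```

Repeating the binning, counting and accuracy steps for each of the other attributes and keeping the attribute with the highest accuracy would complete the OneR selection described at the start of this section.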
