8/31/2021 Untitled6.ipynb - Colaboratory

Name : YANDRAPU MANOJ NAIDU

Roll no: 20MDT1017

from google.colab import files 
uploaded = files.upload()

Diabetes.csv(application/vnd.ms-excel) - 23873 bytes, last modified: 8/31/2021 - 100% done
Saving Diabetes.csv to Diabetes (1).csv

import pandas as pd
import io
df = pd.read_csv(io.BytesIO(uploaded['Diabetes.csv']))
print(df)

     Pregnancies  Glucose  ...  Age  Outcome
0              6      148  ...   50        1
1              1       85  ...   31        0
2              8      183  ...   32        1
3              1       89  ...   21        0
4              0      137  ...   33        1
..           ...      ...  ...  ...      ...
763           10      101  ...   63        0
764            2      122  ...   27        0
765            5      121  ...   30        0
766            1      126  ...   47        1
767            1       93  ...   23        0

[768 rows x 9 columns]
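The upload log above reports that the file was saved as "Diabetes (1).csv", so the key under which the bytes are stored may not always be the original file name. The following is a minimal sketch, assuming `files.upload()` returns a dict that maps stored file names to raw bytes; the `uploaded` dict and the helper `read_first_upload` here are hypothetical stand-ins so the example runs outside Colab.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload(): {stored_name: raw_bytes}.
# The contents are toy values, not the real Diabetes.csv.
uploaded = {"Diabetes (1).csv": b"Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n"}

def read_first_upload(uploaded_files):
    """Read the first uploaded file into a DataFrame, whatever name it is stored under."""
    name = next(iter(uploaded_files))
    return name, pd.read_csv(io.BytesIO(uploaded_files[name]))

name, sample_df = read_first_upload(uploaded)
print(name, sample_df.shape)
```

Looking the key up instead of hard-coding it avoids a KeyError if the stored name differs from the one typed in the cell.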

df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigre
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

The Diabetes dataset has 9 columns in total.

df.dtypes

Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64

https://colab.research.google.com/drive/1IAYPbv5kKKV04u8-wWsy6TvzPIXQplNv#scrollTo=FctYEwzZ-0zM&printMode=true 1/7
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object

Data type of each column.

correlation=df.corr()
correlation.style.background_gradient(cmap='coolwarm')

                          Pregnancies    Glucose  BloodPressure  SkinThickness    Insulin  BM
Pregnancies                  1.000000   0.129459       0.141282      -0.081672  -0.073535  0.0176
Glucose                      0.129459   1.000000       0.152590       0.057328   0.331357  0.2210
BloodPressure                0.141282   0.152590       1.000000       0.207371   0.088933  0.2818
SkinThickness               -0.081672   0.057328       0.207371       1.000000   0.436783  0.3925
Insulin                     -0.073535   0.331357       0.088933       0.436783   1.000000  0.1978
BMI                          0.017683   0.221071       0.281805       0.392573   0.197859  1.0000
DiabetesPedigreeFunction    -0.033523   0.137337       0.041265       0.183928   0.185071  0.1406
Age                          0.544341   0.263514       0.239528      -0.113970  -0.042163  0.0362
Outcome                      0.221898   0.466581       0.065068       0.074752   0.130548  0.2926

Correlation matrix for the Diabetes dataset.
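By default `df.corr()` computes the Pearson correlation coefficient for every pair of columns. The sketch below recomputes one coefficient by hand and compares it with pandas; the toy frame and the helper `pearson` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

def pearson(a, b):
    """Pearson r: co-variation of a and b divided by the product of their spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

# Toy values, not taken from Diabetes.csv.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 5.0, 9.0]})

r_manual = pearson(toy["x"], toy["y"])
r_pandas = toy.corr().loc["x", "y"]
print(round(r_manual, 4), round(r_pandas, 4))
```

Both numbers should agree, which confirms that each cell of the matrix is a pairwise Pearson coefficient between two columns.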

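# Styler.set_precision was deprecated in pandas 1.3 and removed in 2.0;
# a format string gives the same two-decimal display on old and new versions
correlation.style.background_gradient(cmap='coolwarm').format('{:.2f}')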

                          Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin    BMI   Diab
Pregnancies                      1.00     0.13           0.14          -0.08    -0.07   0.02  -0.03
Glucose                          0.13     1.00           0.15           0.06     0.33   0.22   0.14
BloodPressure                    0.14     0.15           1.00           0.21     0.09   0.28   0.04
SkinThickness                   -0.08     0.06           0.21           1.00     0.44   0.39   0.18
Insulin                         -0.07     0.33           0.09           0.44     1.00   0.20   0.19
BMI                              0.02     0.22           0.28           0.39     0.20   1.00   0.14
DiabetesPedigreeFunction        -0.03     0.14           0.04           0.18     0.19   0.14   1.00
Age                              0.54     0.26           0.24          -0.11    -0.04   0.04   0.03
Outcome                          0.22     0.47           0.07           0.07     0.13   0.29   0.17

Correlation values rounded to two decimal places.
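To read the matrix for modelling purposes, it can help to rank the features by their correlation with Outcome. The sketch below uses only the Outcome row as far as it is visible in the rounded table above (the Age column is cut off in the printout, so it is left out rather than guessed).

```python
import pandas as pd

# Outcome row of the rounded correlation matrix, limited to the visible columns.
outcome_corr = pd.Series({
    "Pregnancies": 0.22,
    "Glucose": 0.47,
    "BloodPressure": 0.07,
    "SkinThickness": 0.07,
    "Insulin": 0.13,
    "BMI": 0.29,
    "DiabetesPedigreeFunction": 0.17,
})

# Strongest linear association with Outcome first.
ranked = outcome_corr.sort_values(ascending=False)
print(ranked)
```

On the full DataFrame the same ranking could be obtained with `correlation['Outcome'].drop('Outcome').sort_values(ascending=False)`. Among the visible columns, Glucose and BMI show the strongest linear association with Outcome.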

import matplotlib.pyplot as plt
plt.matshow(df.corr())
plt.show()

age=df['Age']
out=df['Outcome']

import matplotlib.pyplot as plt
plt.bar(age,out)
plt.show()

Bar chart of Age vs. Outcome.
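Because Outcome only takes the values 0 and 1, a bar per individual age is hard to read. One possible alternative, sketched here on toy data, is to bin ages and compute the share of positive outcomes per bin. The bin edges and the `AgeGroup` column are assumptions chosen for illustration.

```python
import pandas as pd

# Toy rows, not taken from Diabetes.csv.
toy = pd.DataFrame({
    "Age": [21, 25, 33, 47, 50, 63],
    "Outcome": [0, 0, 1, 1, 1, 0],
})

# Hypothetical bin edges; intervals are right-inclusive, e.g. (20, 30].
bins = [20, 30, 40, 50, 70]
toy["AgeGroup"] = pd.cut(toy["Age"], bins=bins)

# Mean of a 0/1 column equals the fraction of positive cases in the group.
rate = toy.groupby("AgeGroup", observed=False)["Outcome"].mean()
print(rate)
```

Calling `rate.plot.bar()` would then draw one bar per age group, with the bar height equal to the positive rate.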

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd06376050>

# count the zero entries in each column
for i in df.columns:
  print(i, ":", (df[i] == 0).sum())

Pregnancies : 111
Glucose : 5
BloodPressure : 35
SkinThickness : 227
Insulin : 374
BMI : 11
DiabetesPedigreeFunction : 0
Age : 0
Outcome : 500

Number of zeros in each column.
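The same per-column count can be obtained without an explicit loop. A minimal sketch on a toy frame (values are illustrative, not from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "Glucose": [148, 0, 183],
    "Insulin": [0, 0, 94],
    "Outcome": [1, 0, 1],
})

# (toy == 0) is a boolean frame; summing each column counts the True entries.
zero_counts = (toy == 0).sum()
print(zero_counts)
```

The result is a Series indexed by column name, which is convenient for sorting or for filtering the columns that need cleaning.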

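# replace zeros with the column mean
# note: the loop covers every column, so legitimate zeros (Pregnancies)
# and the 0 labels in Outcome are replaced as well, and each mean is
# computed with the zeros still included
for col in df.columns:
  val=df[col].mean()
  df[col]=df[col].replace(0,val)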

Zeros replaced with the column mean.
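The loop above treats every zero as missing. A more selective variant, which is not what the notebook does and would give different numbers from the outputs shown here, limits the replacement to columns where zero is not a plausible measurement and uses the mean of the non-zero values. The column list `cols` is an assumption for illustration.

```python
import pandas as pd

# Toy rows, not taken from Diabetes.csv.
toy = pd.DataFrame({
    "Pregnancies": [0, 2, 4],      # zero is a valid value here
    "Glucose": [100, 0, 140],      # zero stands for a missing reading
    "BMI": [30.0, 0.0, 34.0],      # zero stands for a missing reading
    "Outcome": [0, 1, 0],          # class label, must stay untouched
})

cols = ["Glucose", "BMI"]  # assumed set of columns where 0 means "missing"

cleaned = toy.copy()
for c in cols:
    s = toy[c].astype(float)
    s = s.mask(s == 0)                 # turn zeros into NaN
    cleaned[c] = s.fillna(s.mean())    # mean() skips NaN, so zeros do not bias it

print(cleaned)
```

Keeping Outcome out of the loop also means the labels remain exactly 0 and 1.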

df.head(10)

   Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  Diabete
0     6.000000    148.0      72.000000      35.000000   79.799479  33.600000
1     1.000000     85.0      66.000000      29.000000   79.799479  26.600000
2     8.000000    183.0      64.000000      20.536458   79.799479  23.300000
3     1.000000     89.0      66.000000      23.000000   94.000000  28.100000
4     3.845052    137.0      40.000000      35.000000  168.000000  43.100000
5     5.000000    116.0      74.000000      20.536458   79.799479  25.600000
6     3.000000     78.0      50.000000      32.000000   88.000000  31.000000
7    10.000000    115.0      69.105469      20.536458   79.799479  35.300000
8     2.000000    197.0      70.000000      45.000000  543.000000  30.500000
9     8.000000    125.0      96.000000      20.536458   79.799479  31.992578

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd085f80d0>
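The box plot itself is not reproduced in this printout. The quantities a box plot draws (quartiles and median per group) can also be inspected as numbers. A minimal sketch on toy data, assuming the same column names as the notebook:

```python
import pandas as pd

# Toy rows, not taken from Diabetes.csv.
toy = pd.DataFrame({
    "Insulin": [10, 20, 30, 40, 100, 200, 300, 400],
    "Outcome": [0, 0, 0, 0, 1, 1, 1, 1],
})

# One row per Outcome class, one column per quantile (box edges and median).
summary = toy.groupby("Outcome")["Insulin"].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)
```

Comparing these tables before and after the zero replacement shows how the imputation shifts the boxes for each class.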

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]

type(X)

numpy.ndarray

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# ensure all data are floating point values
X = X.astype('float32')
# encode the outcome values as integer class labels (0 and 1)
y = LabelEncoder().fit_transform(y)

# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(514, 8) (254, 8) (514,) (254,)
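The split above is random on every run and does not control the class balance. A possible refinement, sketched on toy arrays, is to pass `random_state` for repeatability and `stratify` to keep the class proportions equal in both parts. The array names and the seed value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows with 65 negatives and 35 positives (not the real dataset).
X_demo = np.arange(200, dtype="float32").reshape(100, 2)
y_demo = np.array([0] * 65 + [1] * 35)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,     # keep the 65/35 class ratio in both parts
    random_state=42,     # arbitrary seed so the split is repeatable
)
print(X_tr.shape, X_te.shape, y_te.mean())
```

With a fixed seed, results such as the test accuracy become comparable between runs.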

# determine the number of input features
n_features = X_train.shape[1]

# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

<keras.callbacks.History at 0x7fdd06210110>
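The network is fit on raw feature values whose ranges differ widely (Insulin reaches the hundreds, DiabetesPedigreeFunction stays near 1). Standardizing the inputs is a common step before training a dense network. The sketch below is an addition, not part of the notebook, and uses synthetic arrays; the statistics are taken from the training part only and reused for the test part.

```python
import numpy as np

# Synthetic stand-ins for X_train / X_test with two features on different scales.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=[120.0, 30.0], scale=[30.0, 6.0], size=(100, 2)).astype("float32")
X_test_demo = rng.normal(loc=[120.0, 30.0], scale=[30.0, 6.0], size=(40, 2)).astype("float32")

# Fit the scaling on the training data only, to avoid leaking test statistics.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)

X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

After scaling, each training feature has mean close to 0 and standard deviation close to 1, while the test features are only approximately so.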

# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)

Test Accuracy: 0.661
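To put this accuracy in context, it can be compared with a baseline that always predicts the majority class. The counts below come from the outputs above: 768 rows in total, 500 of them with Outcome 0. The test split is random, so the class balance among the 254 test rows may differ slightly from the full dataset.

```python
n_rows = 768       # rows in the dataset, from the DataFrame shape
n_negative = 500   # rows with Outcome == 0, from the zero count
n_positive = n_rows - n_negative

# Accuracy obtained by always predicting the more frequent class.
baseline = max(n_negative, n_positive) / n_rows
print(round(baseline, 3))  # 0.651
```

A test accuracy of 0.661 is therefore only about one percentage point above always predicting "no diabetes".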

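# make a prediction
import numpy as np
row = np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
# row already has shape (1, 8), so it is passed directly;
# the result has shape (1, 1) and is indexed to get a scalar
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0, 0])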

Predicted: 0.204

import numpy as np
row1=np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
row1.shape

(1, 8)

yhat = model.predict(row1)
print('Predicted: %.3f' % yhat[0, 0])

Predicted: 0.204
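The sigmoid output is a probability-like score for class 1, not a class label. A minimal sketch of turning it into a label, assuming the conventional 0.5 cut-off (the threshold is an assumption and can be tuned):

```python
import numpy as np

# Model output copied from the prediction above; shape (1, 1) like model.predict.
probs = np.array([[0.204]])

threshold = 0.5  # assumed decision threshold
labels = (probs >= threshold).astype(int).ravel()
print(labels)
```

A score of 0.204 falls below the threshold, so this row would be assigned to class 0.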

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 10)                90
_________________________________________________________________
dense_4 (Dense)              (None, 8)                 88
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 9
=================================================================
Total params: 187
Trainable params: 187
Non-trainable params: 0
_________________________________________________________________
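The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 8 input features, followed by the three Dense layers of the model.
layer_sizes = [8, 10, 8, 1]
counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [90, 88, 9] 187
```

The result matches the 90, 88 and 9 parameters listed per layer and the total of 187.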

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd086bc450>
