Name: Yandrapu Manoj Naidu, Roll No: 20MDT1017

8/31/2021 Untitled6.ipynb - Colaboratory
Name : YANDRAPU MANOJ NAIDU
Roll no: 20MDT1017
from google.colab import files
uploaded = files.upload()
Diabetes.csv(application/vnd.ms-excel) - 23873 bytes, last modified: 8/31/2021 - 100% done
Saving Diabetes.csv to Diabetes (1).csv
import pandas as pd
import io
df = pd.read_csv(io.BytesIO(uploaded['Diabetes.csv']))
print(df)
     Pregnancies  Glucose  ...  Age  Outcome
0              6      148  ...   50        1
1              1       85  ...   31        0
2              8      183  ...   32        1
3              1       89  ...   21        0
4              0      137  ...   33        1
..           ...      ...  ...  ...      ...
763           10      101  ...   63        0
764            2      122  ...   27        0
765            5      121  ...   30        0
766            1      126  ...   47        1
767            1       93  ...   23        0

[768 rows x 9 columns]
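files.upload() returns a dictionary that maps each uploaded file name to its contents as bytes, which is why the bytes are wrapped in io.BytesIO before being passed to pd.read_csv. A minimal sketch of the same pattern outside Colab, assuming a hypothetical in-memory `uploaded` dictionary that holds three of the rows printed above, restricted to four columns:

```python
import io
import pandas as pd

# Hypothetical stand-in for the dictionary returned by files.upload():
# file name -> raw file contents as bytes.
uploaded = {
    "Diabetes.csv": b"Pregnancies,Glucose,Age,Outcome\n6,148,50,1\n1,85,31,0\n8,183,32,1\n"
}

# Wrap the bytes in a file-like object so read_csv can parse them.
sample = pd.read_csv(io.BytesIO(uploaded["Diabetes.csv"]))
print(sample.shape)
```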
df.head()
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigre
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1
The Diabetes dataset has 768 rows and 9 columns.
df.dtypes
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
Data type of each column.
correlation=df.corr()
correlation.style.background_gradient(cmap='coolwarm')
                          Pregnancies   Glucose BloodPressure SkinThickness   Insulin      BM
Pregnancies                  1.000000  0.129459      0.141282     -0.081672 -0.073535  0.0176
Glucose                      0.129459  1.000000      0.152590      0.057328  0.331357  0.2210
BloodPressure                0.141282  0.152590      1.000000      0.207371  0.088933  0.2818
SkinThickness               -0.081672  0.057328      0.207371      1.000000  0.436783  0.3925
Insulin                     -0.073535  0.331357      0.088933      0.436783  1.000000  0.1978
BMI                          0.017683  0.221071      0.281805      0.392573  0.197859  1.0000
DiabetesPedigreeFunction    -0.033523  0.137337      0.041265      0.183928  0.185071  0.1406
Age                          0.544341  0.263514      0.239528     -0.113970 -0.042163  0.0362
Outcome                      0.221898  0.466581      0.065068      0.074752  0.130548  0.2926
Correlation matrix for the Diabetes dataset.
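df.corr() computes the Pearson correlation coefficient for every pair of columns by default. A small sketch of that calculation for a single pair, using the Glucose and Outcome values of the first five rows printed earlier. With only five rows the result is not expected to match the full-dataset value of 0.466581.

```python
import numpy as np

# Glucose and Outcome for the first five rows of the dataset.
glucose = np.array([148, 85, 183, 89, 137], dtype=float)
outcome = np.array([1, 0, 1, 0, 1], dtype=float)

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

r = pearson(glucose, outcome)
print(round(r, 3))
```

The value agrees with np.corrcoef(glucose, outcome)[0, 1], which is a quick way to check the hand-written formula.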
# Styler.set_precision was removed in pandas 2.0; format(precision=2) replaces it
correlation.style.background_gradient(cmap='coolwarm').format(precision=2)
                          Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin    BMI   Diab
Pregnancies                      1.00     0.13           0.14          -0.08    -0.07   0.02  -0.03
Glucose                          0.13     1.00           0.15           0.06     0.33   0.22   0.14
BloodPressure                    0.14     0.15           1.00           0.21     0.09   0.28   0.04
SkinThickness                   -0.08     0.06           0.21           1.00     0.44   0.39   0.18
Insulin                         -0.07     0.33           0.09           0.44     1.00   0.20   0.19
BMI                              0.02     0.22           0.28           0.39     0.20   1.00   0.14
DiabetesPedigreeFunction        -0.03     0.14           0.04           0.18     0.19   0.14   1.00
Age                              0.54     0.26           0.24          -0.11    -0.04   0.04   0.03
Outcome                          0.22     0.47           0.07           0.07     0.13   0.29   0.17
Correlation values rounded to two decimal places.
import matplotlib.pyplot as plt
plt.matshow(df.corr())
plt.show()
age = df['Age']
out = df['Outcome']
# matplotlib.pyplot is already imported as plt above
plt.bar(age, out)
plt.show()
Bar chart of Age versus Outcome.
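Outcome is binary, so a bar per row mostly shows which ages have at least one positive case. One alternative, sketched here under the assumption that the share of positive cases per age group is what the chart is meant to convey, aggregates before plotting. The ten rows are the ones visible in the print(df) output; the age bands are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Age and Outcome for the ten rows visible in the print(df) output.
sample = pd.DataFrame({
    "Age":     [50, 31, 32, 21, 33, 63, 27, 30, 47, 23],
    "Outcome": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Hypothetical age bands; the edges are an arbitrary choice.
bands = pd.cut(sample["Age"], bins=[20, 30, 40, 50, 70])

# The mean of a 0/1 column is the share of positive cases in each band.
rate = sample.groupby(bands, observed=False)["Outcome"].mean()
print(rate)
# rate.plot.bar() would then draw one bar per age band.
```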
df.boxplot(by='Outcome', column=['Insulin'], grid=False)
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)
<matplotlib.axes._subplots.AxesSubplot at 0x7fdd06376050>
for i in df.columns:
    print(i, ":", df[i][df[i] == 0].count())
Pregnancies : 111
Glucose : 5
BloodPressure : 35
SkinThickness : 227
Insulin : 374
BMI : 11
DiabetesPedigreeFunction : 0
Age : 0
Outcome : 500
Number of zeros in each column.
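The same counts can be obtained without an explicit loop. A minimal sketch on five rows copied from the df.head() output (three columns only):

```python
import pandas as pd

# Five rows copied from the df.head() output, three columns only.
sample = pd.DataFrame({
    "Glucose":       [148, 85, 183, 89, 137],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin":       [0, 0, 0, 94, 168],
})

# Comparing the frame with 0 gives booleans; summing counts the True values per column.
zero_counts = (sample == 0).sum()
print(zero_counts)
```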
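# Replace zeros with the column mean (the mean is computed with the zeros included).
# The loop covers every column, so zeros in Pregnancies and Outcome are replaced too.
for col in df.columns:
    val = df[col].mean()
    df[col] = df[col].replace(0, val)
Zeros replaced with the column means.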
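Zero is a valid value for Pregnancies and is one of the two class labels in Outcome, so replacing it there changes real data. A sketch of a narrower variant follows. It assumes (a modelling assumption, not something the notebook states) that zeros mark missing measurements only in Glucose, BloodPressure, SkinThickness, Insulin and BMI, and that the mean should be taken over the non-zero entries. The rows are the first five of the dataset.

```python
import numpy as np
import pandas as pd

# First five rows of the dataset, as printed earlier.
sample = pd.DataFrame({
    "Pregnancies":   [6, 1, 8, 1, 0],
    "Glucose":       [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin":       [0, 0, 0, 94, 168],
    "BMI":           [33.6, 26.6, 23.3, 28.1, 43.1],
    "Outcome":       [1, 0, 1, 0, 1],
})

# Assumption: zero means "not measured" only in these columns.
cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

imputed = sample.copy()
# Mark zeros as missing, then fill them with the mean of the remaining values.
masked = imputed[cols].replace(0, np.nan)
imputed[cols] = masked.fillna(masked.mean())
print(imputed)
```

Pregnancies and Outcome are left untouched, and the fill value is no longer pulled toward zero by the missing entries.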
df.head(10)
   Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  Diabete
0     6.000000    148.0      72.000000      35.000000   79.799479  33.600000
1     1.000000     85.0      66.000000      29.000000   79.799479  26.600000
2     8.000000    183.0      64.000000      20.536458   79.799479  23.300000
3     1.000000     89.0      66.000000      23.000000   94.000000  28.100000
4     3.845052    137.0      40.000000      35.000000  168.000000  43.100000
5     5.000000    116.0      74.000000      20.536458   79.799479  25.600000
6     3.000000     78.0      50.000000      32.000000   88.000000  31.000000
7    10.000000    115.0      69.105469      20.536458   79.799479  35.300000
8     2.000000    197.0      70.000000      45.000000  543.000000  30.500000
9     8.000000    125.0      96.000000      20.536458   79.799479  31.992578
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
type(X)
numpy.ndarray
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# ensure all data are floating point values
X = X.astype('float32')
# encode the two class labels as the integers 0 and 1
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(514, 8) (254, 8) (514,) (254,)
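The printed shapes follow from test_size=0.33 applied to 768 rows: scikit-learn rounds the test share up and gives the remaining rows to the training set. The arithmetic, as a short sketch:

```python
import math

n_samples = 768   # rows in the dataset
test_size = 0.33  # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # the test share is rounded up
n_train = n_samples - n_test               # the rest goes to training
print(n_train, n_test)  # 514 254
```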
# determine the number of input features
n_features = X_train.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
<keras.callbacks.History at 0x7fdd06210110>
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)
Test Accuracy: 0.661
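A useful reference point for the test accuracy is the share of the majority class. The zero counts reported earlier give 500 rows with Outcome equal to 0 out of 768, so a model that always predicts 0 would score about 0.651 on the full dataset. The random test split may contain a slightly different share.

```python
n_total = 768     # rows in the dataset
n_negative = 500  # rows with Outcome == 0, from the zero counts

baseline = n_negative / n_total
print('Majority-class baseline: %.3f' % baseline)  # 0.651
```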
# make a prediction
import numpy as np
row = np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
yhat = model.predict(row)
# predict returns an array of shape (1, 1); take the single value for formatting
print('Predicted: %.3f' % yhat[0, 0])
Predicted: 0.204
import numpy as np
row1 = np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
row1.shape
(1, 8)
yhat = model.predict(row1)
print('Predicted: %.3f' % yhat[0, 0])
Predicted: 0.204
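The sigmoid output is a value between 0 and 1 that the model assigns to class 1. A sketch of turning it into a class label, assuming the common 0.5 cut-off (the notebook itself does not choose a threshold):

```python
import numpy as np

# Predicted value printed above, in the (1, 1) shape that model.predict returns.
yhat = np.array([[0.204]])

threshold = 0.5  # assumed cut-off, not taken from the notebook
label = (yhat >= threshold).astype(int)
print(label[0, 0])  # 0
```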
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 10)                90
_________________________________________________________________
dense_4 (Dense)              (None, 8)                 88
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 9
=================================================================
Total params: 187
Trainable params: 187
Non-trainable params: 0
_________________________________________________________________
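The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. A sketch using the layer sizes from the model definition (8 input features, then 10, 8 and 1 units):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [8, 10, 8, 1]  # input features, then units per Dense layer
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [90, 88, 9] 187
```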