
EX. NO 3 R.

SNEHAL NIKAM - 8296


IMPLEMENT NAÏVE BAYES MODELS

import pandas as pd
# Load the Titanic passenger data (the path points to the author's local copy of the file)
titanic = pd.read_csv(r"C:\Users\nikam\OneDrive\Desktop\lab\Titanic.csv")
print(titanic)
# Keep only Survived, Pclass, Sex, Age and Fare; the other columns are not used in this exercise
titanic.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked'], axis='columns', inplace=True)
print(titanic.head(20))
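# Separate the label (Survived) from the input features
target = titanic.Survived
inputs = titanic.drop('Survived', axis='columns')
# One-hot encode Sex into 'female' and 'male' indicator columns
# (dtype=int gives 0/1 as in the output below; pandas 2.x otherwise returns True/False)
dummies = pd.get_dummies(inputs.Sex, dtype=int)
print(dummies.head(10))
inputs = pd.concat([inputs, dummies], axis='columns')
print(inputs.head())
# The original text column is now redundant
inputs.drop('Sex', axis='columns', inplace=True)
print("After removing Sex column")
print(inputs.head())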
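# Columns that still contain missing values. As a bare expression this only
# displays as the last line of a notebook cell; in a script, wrap it in print()
inputs.columns[inputs.isna().any()]
print(inputs.head(20))
# Replace each missing age with the mean of the known ages
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
print("Filling Null Values")
print(inputs.head(20))
print(inputs.columns[inputs.isna().any()])
# Fare has no missing values in this file, but filling it the same way is a harmless safeguard
inputs.Fare = inputs.Fare.fillna(inputs.Fare.mean())
print(inputs.columns[inputs.isna().any()])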
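from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing. No random_state is set, so the split
# (and the scores printed below) will differ from run to run
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)
print("Length of Training", len(X_train))
print("Length of Test", len(X_test))
print("Length of Dataset", len(inputs))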
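from sklearn.naive_bayes import GaussianNB
# Gaussian Naive Bayes: within each class, every feature is modelled as an
# independent normal distribution
model = GaussianNB()
print(model.fit(X_train, y_train))
# Mean accuracy on the held-out test set
print(model.score(X_test, y_test))
# First ten test rows, their true labels and the predicted labels
print(X_test[:10])
print(y_test[:10])
print(model.predict(X_test[:10]))
# Class probabilities for the same rows; columns follow model.classes_ (0 = died, 1 = survived)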
print(model.predict_proba(X_test[:10]))
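To show what GaussianNB computes, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation: the helper names (fit_gaussian_nb, log_gaussian, predict_proba, predict), the toy data and the fixed variance floor var_eps are hypothetical choices for illustration (scikit-learn instead adds var_smoothing times the largest feature variance). The rule it implements is naive Bayes with Gaussian likelihoods: P(class | x) is proportional to P(class) multiplied, for every feature x_i, by a normal density using that class's mean and variance for feature i, with the features treated as independent given the class.

```python
import math
from collections import defaultdict

# Hypothetical helpers for illustration; not part of scikit-learn.

def fit_gaussian_nb(X, y, var_eps=1e-9):
    """Estimate class priors and per-class feature means and variances.

    var_eps is an assumed small constant added to every variance so that a
    feature with zero spread inside a class cannot cause division by zero.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    params = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(X)
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + var_eps
                     for col, m in zip(columns, means)]
        params[label] = (prior, means, variances)
    return params

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_proba(params, row):
    """Return {label: P(label | row)} under the naive independence assumption."""
    log_scores = {}
    for label, (prior, means, variances) in params.items():
        log_scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    # Normalise with the log-sum-exp trick to avoid underflow
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / total for label, s in log_scores.items()}

def predict(params, row):
    """Return the label with the highest posterior probability."""
    probs = predict_proba(params, row)
    return max(probs, key=probs.get)

# Toy two-feature data (not the Titanic set): class 0 near (1, 2), class 1 near (5, 8)
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 8.0], [5.2, 7.9], [4.8, 8.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))  # 0
print(predict(model, [5.1, 8.0]))  # 1
```

Working in log space keeps the product of many small densities from underflowing to zero before the probabilities are normalised.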

OUTPUT:

PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q

Survived Pclass Sex Age Fare
0 0 3 male 22.0 7.2500
1 1 1 female 38.0 71.2833
2 1 3 female 26.0 7.9250
3 1 1 female 35.0 53.1000
4 0 3 male 35.0 8.0500
5 0 3 male NaN 8.4583
6 0 1 male 54.0 51.8625
7 0 3 male 2.0 21.0750
8 1 3 female 27.0 11.1333
9 1 2 female 14.0 30.0708
10 1 3 female 4.0 16.7000
11 1 1 female 58.0 26.5500
12 0 3 male 20.0 8.0500
13 0 3 male 39.0 31.2750
14 0 3 female 14.0 7.8542
15 1 2 female 55.0 16.0000
16 0 3 male 2.0 29.1250
17 1 2 male NaN 13.0000
18 0 3 female 31.0 18.0000
19 1 3 female NaN 7.2250
female male
0 0 1
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
7 0 1
8 1 0
9 1 0
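pd.get_dummies turns each distinct value of Sex into its own 0/1 column. A plain-Python sketch of the same one-hot idea, using a hypothetical one_hot helper and the first five Sex values from the output above, reproduces the first five rows of the female and male columns. Sorting the categories is an assumption of the sketch that happens to match the column order shown.

```python
def one_hot(values):
    """Hypothetical helper: map each category to a list of 0/1 indicators.

    Categories are sorted, which gives the 'female', 'male' order seen above.
    """
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

sex = ["male", "female", "female", "female", "male"]  # rows 0-4 of the Sex column
dummies = one_hot(sex)
print(dummies["female"])  # [0, 1, 1, 1, 0]
print(dummies["male"])    # [1, 0, 0, 0, 1]
```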

Pclass Sex Age Fare female male
0 3 male 22.0 7.2500 0 1
1 1 female 38.0 71.2833 1 0
2 3 female 26.0 7.9250 1 0
3 1 female 35.0 53.1000 1 0
4 3 male 35.0 8.0500 0 1
After removing Sex column
Pclass Age Fare female male
0 3 22.0 7.2500 0 1
1 1 38.0 71.2833 1 0
2 3 26.0 7.9250 1 0
3 1 35.0 53.1000 1 0
4 3 35.0 8.0500 0 1
Pclass Age Fare female male
0 3 22.0 7.2500 0 1
1 1 38.0 71.2833 1 0
2 3 26.0 7.9250 1 0
3 1 35.0 53.1000 1 0
4 3 35.0 8.0500 0 1
5 3 NaN 8.4583 0 1
6 1 54.0 51.8625 0 1
7 3 2.0 21.0750 0 1
8 3 27.0 11.1333 1 0
9 2 14.0 30.0708 1 0
10 3 4.0 16.7000 1 0
11 1 58.0 26.5500 1 0
12 3 20.0 8.0500 0 1
13 3 39.0 31.2750 0 1
14 3 14.0 7.8542 1 0
15 2 55.0 16.0000 1 0
16 3 2.0 29.1250 0 1
17 2 NaN 13.0000 0 1
18 3 31.0 18.0000 1 0
19 3 NaN 7.2250 1 0
Filling Null Values
Pclass Age Fare female male
0 3 22.000000 7.2500 0 1
1 1 38.000000 71.2833 1 0
2 3 26.000000 7.9250 1 0
3 1 35.000000 53.1000 1 0
4 3 35.000000 8.0500 0 1
5 3 29.699118 8.4583 0 1
6 1 54.000000 51.8625 0 1
7 3 2.000000 21.0750 0 1
8 3 27.000000 11.1333 1 0
9 2 14.000000 30.0708 1 0
10 3 4.000000 16.7000 1 0
11 1 58.000000 26.5500 1 0
12 3 20.000000 8.0500 0 1
13 3 39.000000 31.2750 0 1
14 3 14.000000 7.8542 1 0
15 2 55.000000 16.0000 1 0
16 3 2.000000 29.1250 0 1
17 2 29.699118 13.0000 0 1
18 3 31.000000 18.0000 1 0
19 3 29.699118 7.2250 1 0
Index([], dtype='object')
Index([], dtype='object')
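The value 29.699118 that now appears in rows 5, 17 and 19 is the mean of the Age column, which Series.mean() computes over the non-missing ages of all 891 passengers (NaN values are skipped by default), not just the 20 rows printed. A hypothetical fill_with_mean helper sketches the same mean imputation on a toy list, with None standing in for NaN:

```python
def fill_with_mean(values):
    """Hypothetical helper: replace None entries with the mean of the others.

    Returns the filled list and the mean that was used.
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values], mean

ages = [22.0, 38.0, 26.0, None, 34.0]  # toy values, not the real Age column
filled, mean_age = fill_with_mean(ages)
print(mean_age)  # 30.0
print(filled)    # [22.0, 38.0, 26.0, 30.0, 34.0]
```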

Length of Training 712
Length of Test 179
Length of Dataset 891
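With test_size=0.2 the 891 rows are split 712/179. Assuming scikit-learn's rule of rounding a fractional test count up (0.2 × 891 = 178.2 becomes 179) and giving the remaining rows to the training set, a hypothetical split_sizes helper reproduces those lengths:

```python
import math

def split_sizes(n_samples, test_size):
    """Hypothetical helper: (train, test) row counts for a fractional test_size.

    The test count is rounded up and the training set gets the remainder.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(891, 0.2))  # (712, 179)
```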
GaussianNB()
0.7932960893854749
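model.score returns the mean accuracy, i.e. the fraction of test rows whose predicted label equals the true label. Inverting that fraction with the test length printed above shows that 0.7932960893854749 corresponds to 142 of the 179 test passengers being classified correctly:

```python
n_test = 179                    # "Length of Test" above
score = 0.7932960893854749      # value printed by model.score(X_test, y_test)

# accuracy = correct / total, so correct = accuracy * total (rounded to a whole row count)
n_correct = round(score * n_test)
print(n_correct)           # 142
print(n_test - n_correct)  # 37 misclassified passengers
```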
Pclass Age Fare female male
143 3 19.000000 6.7500 0 1
367 3 29.699118 7.2292 1 0
145 2 19.000000 36.7500 0 1
24 3 8.000000 21.0750 1 0
398 2 23.000000 10.5000 0 1
60 3 22.000000 7.2292 0 1
493 1 71.000000 49.5042 0 1
85 3 33.000000 15.8500 1 0
466 2 29.699118 0.0000 0 1
87 3 29.699118 8.0500 0 1
143 0
367 1
145 0
24 0
398 0
60 0
493 0
85 1
466 0
87 0
Name: Survived, dtype: int64
[0 1 0 1 0 0 0 1 0 0]
[[0.98877268 0.01122732]
[0.08141863 0.91858137]
[0.9681346 0.0318654 ]
[0.05322299 0.94677701]
[0.97543675 0.02456325]
[0.98937843 0.01062157]
[0.70315246 0.29684754]
[0.08256126 0.91743874]
[0.97554233 0.02445767]
[0.9901925 0.0098075 ]]
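model.predict picks, for each row, the class with the larger probability in predict_proba; the columns follow model.classes_, which is [0, 1] here, so the column index equals the label. Re-deriving the predictions from the printed probabilities (copied from the output above, so rounded as printed) recovers [0 1 0 1 0 0 0 1 0 0]. Comparing with y_test shows 9 of these 10 are correct: passenger 24 (female, third class, aged 8) is predicted to survive with probability about 0.947 but did not.

```python
# First ten rows of predict_proba and y_test, copied from the output above
proba = [
    [0.98877268, 0.01122732],
    [0.08141863, 0.91858137],
    [0.9681346, 0.0318654],
    [0.05322299, 0.94677701],
    [0.97543675, 0.02456325],
    [0.98937843, 0.01062157],
    [0.70315246, 0.29684754],
    [0.08256126, 0.91743874],
    [0.97554233, 0.02445767],
    [0.9901925, 0.0098075],
]
y_true = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]

# Column index of the larger probability; equals the label because classes_ is [0, 1]
pred = [row.index(max(row)) for row in proba]
print(pred)  # [0, 1, 0, 1, 0, 0, 0, 1, 0, 0]

# Each row is a probability distribution over the two classes
print(all(abs(sum(row) - 1) < 1e-6 for row in proba))  # True

correct = sum(p == t for p, t in zip(pred, y_true))
print(correct)  # 9
```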
