Roll no 22527
IT- 34 Knowledge Representation and Artificial Intelligence: ML, DL
Rubrics:
Parameters | Satisfactory | Good | Very Good | Score
Perform minimum three steps of data pre-processing (1 mark for each step) | Any 1 of the steps is performed - 1 Mark | 2 steps are performed - 2 Marks | Minimum 3 steps are performed - 3 Marks | CO3: 3 Marks
Perform Exploratory Data Analysis: Summary Statistics, Data Visualization, Correlation analysis (2 marks each) | Any 1 of the 3 mentioned steps performed - 2 Marks | 2 of the 3 mentioned steps performed - 4 Marks | All 3 mentioned steps performed - 6 Marks | CO4: 6 Marks
The dataset comprises responses to a survey conducted among residents of different cities.
It includes the following features (column names as listed by df.info() below): infoavail,
housecost, schoolquality, policetrust, streetquality, ëvents, and the target column happy.
We will explore the relationships between these features and happiness, perform data analysis
and visualization, and build classification models to predict happiness from the given
attributes.
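One common way to look at the relationship between each feature and happiness is a correlation table. The following is only a sketch: `demo` is a random stand-in frame that borrows the column names, and the 1 to 5 rating range is an assumption, so the printed numbers say nothing about the real survey. On the actual data, `demo` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Random stand-in data (NOT the survey responses): assumed 1-5 ratings and a 0/1 label.
rng = np.random.default_rng(0)
columns = ["infoavail", "housecost", "schoolquality",
           "policetrust", "streetquality", "ëvents"]
demo = pd.DataFrame(rng.integers(1, 6, size=(143, 6)), columns=columns)
demo["happy"] = rng.integers(0, 2, size=143)

# Pearson correlation of every feature with the target, strongest first.
corr_with_happy = demo.corr()["happy"].drop("happy")
print(corr_with_happy.sort_values(key=abs, ascending=False))
```

Features whose correlation with happy is close to zero are unlikely to help a linear model much, although tree-based models can still pick up non-linear effects.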
df.info()
Output>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 143 entries, 0 to 142
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 infoavail 143 non-null int64
1 housecost 143 non-null int64
2 schoolquality 143 non-null int64
3 policetrust 143 non-null int64
4 streetquality 143 non-null int64
5 ëvents 143 non-null int64
6 happy 143 non-null int64
dtypes: int64(7)
memory usage: 7.9 KB
>>df.describe()
Output>
      infoavail  housecost  schoolquality  policetrust  streetquality  ëvents    happy
mean  4.314685   2.538462   3.265734       3.699301     3.615385       4.216783  0.538462
25%   4.000000   2.000000   3.000000       3.000000     3.000000       4.000000  0.000000
50%   5.000000   3.000000   3.000000       4.000000     4.000000       4.000000  1.000000
75%   5.000000   3.000000   4.000000       4.000000     4.000000       5.000000  1.000000
>>df['happy'].value_counts()
1 77
0 66
Name: happy, dtype: int64
The dataset has 77 rows labelled happy (1) and 66 rows labelled unhappy (0), so the two
classes are roughly balanced.
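These counts also give a reference point for the classifiers trained later: the accuracy of always predicting the majority class. A minimal sketch, rebuilding only the label column from the counts reported above (the `labels` Series is a stand-in, not the survey data itself):

```python
import pandas as pd

# Stand-in label column rebuilt from the reported counts (77 happy, 66 unhappy).
labels = pd.Series([1] * 77 + [0] * 66, name="happy")

# Share of each class; normalize=True returns proportions instead of counts.
proportions = labels.value_counts(normalize=True)
print(proportions.round(3))

# Accuracy of a model that always predicts the most frequent class.
baseline_accuracy = proportions.max()
print(round(baseline_accuracy, 3))  # 0.538
```

The 0.538 share of happy rows agrees with the mean of the happy column in df.describe(). The baseline on a particular test split can differ slightly from this overall figure.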
df.isna().sum()
infoavail 0
housecost 0
schoolquality 0
policetrust 0
streetquality 0
ëvents 0
happy 0
dtype: int64
Observation: no column contains missing values, so no imputation is needed.
>>import seaborn as sns
>>sns.displot(df['infoavail'],kde=True)
Output> <seaborn.axisgrid.FacetGrid at 0x798ec19f6e90>
>>sns.displot(df['housecost'],kde=True)
Output>
>>sns.displot(df['schoolquality'],kde=True)
Output> <seaborn.axisgrid.FacetGrid at 0x798ebf94cd30>
>>sns.displot(df['policetrust'],kde=True)
Output> <seaborn.axisgrid.FacetGrid at 0x798ec1bc3280>
>>sns.displot(df['streetquality'],kde=True)
Output> <seaborn.axisgrid.FacetGrid at 0x798ebd81fd30>
>>sns.displot(df['ëvents'],kde=True)
Output> <seaborn.axisgrid.FacetGrid at 0x798ebd7297e0>
Algorithms: Train-Test Split
x = df.iloc[:,:6]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x , df["happy"],
test_size=0.2,
random_state=0)
x_train.shape, x_test.shape
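The split sizes this call produces can be checked without the survey data. A minimal sketch, assuming placeholder arrays with the same shape as the data (143 rows, 6 features); the `stratify` argument in the second call is an optional addition for illustration, not something the original code uses:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the survey (values are not real).
x_demo = np.zeros((143, 6))
y_demo = np.array([1] * 77 + [0] * 66)

# Same arguments as the split above.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)
print(x_tr.shape, x_te.shape)  # (114, 6) (29, 6)

# Optional variant: stratify keeps the happy/unhappy ratio similar in both parts.
x_tr_s, x_te_s, y_tr_s, y_te_s = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)
print(y_tr_s.mean(), y_te_s.mean())
```

With only 29 test rows, a single prediction moves the accuracy by about 0.034, so small differences between models should be read with care.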
Decision Trees
#importing library
from sklearn import tree
#implementing decision trees
dtr = tree.DecisionTreeClassifier()
dtr.fit(x_train,y_train)
Output >>DecisionTreeClassifier()
SVM
# importing library
from sklearn import svm
#implementing SVM
sv = svm.SVC()
sv.fit(x_train,y_train)
Output >>SVC()
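#predicting values and testing accuracy
from sklearn.metrics import accuracy_score
spred = sv.predict(x_test)
#accuracy_score expects the true labels first, then the predictions
accuracy_score(y_test, spred)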
Output>>0.56
Random Forest
#importing library
from sklearn.ensemble import RandomForestClassifier
#implementing random forests
rfr = RandomForestClassifier()
rfr.fit(x_train,y_train)
Output >>RandomForestClassifier()
#implementing logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
lreg = LogisticRegression()
lreg.fit(x_train,y_train)
#predicting values and testing accuracy (true labels first, then predictions)
lpred = lreg.predict(x_test)
accuracy_score(y_test, lpred)
#implementing decision trees
from sklearn import tree
dtr = tree.DecisionTreeClassifier()
dtr.fit(x_train,y_train)
dpred = dtr.predict(x_test)
accuracy_score(y_test, dpred)
#implementing KNN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(x_train,y_train)
kpred = knn.predict(x_test)
accuracy_score(y_test, kpred)
#implementing SVM
from sklearn import svm
sv = svm.SVC()
sv.fit(x_train,y_train)
spred = sv.predict(x_test)
accuracy_score(y_test, spred)
#implementing random forests
from sklearn.ensemble import RandomForestClassifier
rfr = RandomForestClassifier()
rfr.fit(x_train,y_train)
#predicting values and testing accuracy (true labels first, then predictions)
fpred = rfr.predict(x_test)
accuracy_score(y_test, fpred)
Model Accuracy
KNN 0.6
SVM 0.56
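The fit, predict and score steps repeated for each classifier can be collected into one loop. The sketch below is illustrative only: `compare_models` is a hypothetical helper, the data comes from `make_classification` as a synthetic stand-in for the survey, and `random_state=0` on the tree-based models is an added assumption so that repeated runs agree. The printed scores therefore do not reproduce the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(x_train, x_test, y_train, y_test):
    """Fit each classifier with default settings and return test accuracies."""
    models = {
        "Logistic Regression": LogisticRegression(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "Random Forest": RandomForestClassifier(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores


# Synthetic stand-in data (NOT the survey): 143 rows, 6 features, 2 classes.
x_demo, y_demo = make_classification(
    n_samples=143, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)
scores = compare_models(x_tr, x_te, y_tr, y_te)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On the real data the call would be compare_models(x_train, x_test, y_train, y_test), which gives all five accuracies in one table-like printout.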