Model to predict whether an employee will stay at or leave the company, using a Random Forest Classifier

Import the libraries

In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        %matplotlib inline
        import seaborn as sns
        import warnings
        warnings.filterwarnings("ignore")

Import the Dataset

In [2]: df = pd.read_csv("Employee_Attrition.csv")
        df.head()

Out[2]:
   Age Attrition     BusinessTravel  DailyRate              Department  DistanceFromHome  Education EducationField  EmployeeCount  EmployeeNumber ...  RelationshipSatisfaction  StandardHours ...
0   41       Yes      Travel_Rarely       1102                   Sales                 1          2  Life Sciences              1               1 ...                         1             80 ...
1   49        No  Travel_Frequently        279  Research & Development                 8          1  Life Sciences              1               2 ...                         4             80 ...
2   37       Yes      Travel_Rarely       1373  Research & Development                 2          2          Other              1               4 ...                         2             80 ...
3   33        No  Travel_Frequently       1392  Research & Development                 3          4  Life Sciences              1               5 ...                         3             80 ...
4   27        No      Travel_Rarely        591  Research & Development                 2          1        Medical              1               7 ...                         4             80 ...

5 rows × 35 columns


In [3]: # Last 5 records
        df.tail()

Out[3]:
      Age Attrition     BusinessTravel  DailyRate              Department  DistanceFromHome  Education EducationField  EmployeeCount  EmployeeNumber ...  RelationshipSatisfaction  StandardHours ...
1465   36        No  Travel_Frequently        884  Research & Development                23          2        Medical              1            2061 ...                         3             80 ...
1466   39        No      Travel_Rarely        613  Research & Development                 6          1        Medical              1            2062 ...                         1             80 ...
1467   27        No      Travel_Rarely        155  Research & Development                 4          3  Life Sciences              1            2064 ...                         2             80 ...
1468   49        No  Travel_Frequently       1023                   Sales                 2          3        Medical              1            2065 ...                         4             80 ...
1469   34        No      Travel_Rarely        628  Research & Development                 8          3        Medical              1            2068 ...                         1             80 ...

5 rows × 35 columns


In [4]: # Summary of dataset
        df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 1470 non-null int64
1 Attrition 1470 non-null object
2 BusinessTravel 1470 non-null object
3 DailyRate 1470 non-null int64
4 Department 1470 non-null object
5 DistanceFromHome 1470 non-null int64
6 Education 1470 non-null int64
7 EducationField 1470 non-null object
8 EmployeeCount 1470 non-null int64
9 EmployeeNumber 1470 non-null int64
10 EnvironmentSatisfaction 1470 non-null int64
11 Gender 1470 non-null object
12 HourlyRate 1470 non-null int64
13 JobInvolvement 1470 non-null int64
14 JobLevel 1470 non-null int64
15 JobRole 1470 non-null object
16 JobSatisfaction 1470 non-null int64
17 MaritalStatus 1470 non-null object
18 MonthlyIncome 1470 non-null int64
19 MonthlyRate 1470 non-null int64
20 NumCompaniesWorked 1470 non-null int64
21 Over18 1470 non-null object
22 OverTime 1470 non-null object
23 PercentSalaryHike 1470 non-null int64
24 PerformanceRating 1470 non-null int64
25 RelationshipSatisfaction 1470 non-null int64
26 StandardHours 1470 non-null int64
27 StockOptionLevel 1470 non-null int64
28 TotalWorkingYears 1470 non-null int64
29 TrainingTimesLastYear 1470 non-null int64
30 WorkLifeBalance 1470 non-null int64
31 YearsAtCompany 1470 non-null int64
32 YearsInCurrentRole 1470 non-null int64
33 YearsSinceLastPromotion 1470 non-null int64
34 YearsWithCurrManager 1470 non-null int64
dtypes: int64(26), object(9)
memory usage: 402.1+ KB

We can observe that the dataset contains 1470 records in total, with both numerical and categorical features and no null values.


In [5]: # Checking null values
        df.isnull().sum().sum()

Out[5]: 0

In [8]: # Checking duplicate records
        df[df.duplicated()]

Out[8]:
   Age  Attrition  BusinessTravel  DailyRate  Department  DistanceFromHome  Education  EducationField  EmployeeCount  EmployeeNumber ...  RelationshipSatisfaction  StandardHours  StockOptionLevel

0 rows × 35 columns

We can see there are no duplicate records in the dataset.

In [11]: # Statistical summary of the dataset
         df.describe(include='all')

Out[11]:
               Age  Attrition  BusinessTravel    DailyRate              Department  DistanceFromHome    Education EducationField  EmployeeCount  EmployeeNumber ...  RelationshipSatisfaction  StandardHours
count  1470.000000       1470            1470  1470.000000                    1470       1470.000000  1470.000000           1470         1470.0     1470.000000 ...               1470.000000         1470.0
unique         NaN          2               3          NaN                       3               NaN          NaN              6            NaN             NaN ...                       NaN            NaN
top            NaN         No   Travel_Rarely          NaN  Research & Development               NaN          NaN  Life Sciences            NaN             NaN ...                       NaN            NaN
freq           NaN       1233            1043          NaN                     961               NaN          NaN            606            NaN             NaN ...                       NaN            NaN
mean     36.923810        NaN             NaN   802.485714                     NaN          9.192517     2.912925            NaN            1.0     1024.865306 ...                  2.712245           80.0
std       9.135373        NaN             NaN   403.509100                     NaN          8.106864     1.024165            NaN            0.0      602.024335 ...                  1.081209            0.0
min      18.000000        NaN             NaN   102.000000                     NaN          1.000000     1.000000            NaN            1.0        1.000000 ...                  1.000000           80.0
25%      30.000000        NaN             NaN   465.000000                     NaN          2.000000     2.000000            NaN            1.0      491.250000 ...                  2.000000           80.0
50%      36.000000        NaN             NaN   802.000000                     NaN          7.000000     3.000000            NaN            1.0     1020.500000 ...                  3.000000           80.0
75%      43.000000        NaN             NaN  1157.000000                     NaN         14.000000     4.000000            NaN            1.0     1555.750000 ...                  4.000000           80.0
max      60.000000        NaN             NaN  1499.000000                     NaN         29.000000     5.000000            NaN            1.0     2068.000000 ...                  4.000000           80.0

11 rows × 35 columns


In [13]: # Correlation matrix of the numeric features
         df.corr()

Out[13]:
                               Age  DailyRate  DistanceFromHome  Education  EmployeeCount  EmployeeNumber  EnvironmentSatisfaction  HourlyRate  JobInvolvement  JobLevel ...
Age                       1.000000   0.010661         -0.001686   0.208034            NaN       -0.010145                 0.010146    0.024287        0.029820  0.509604 ...
DailyRate                 0.010661   1.000000         -0.004985  -0.016806            NaN       -0.050990                 0.018355    0.023381        0.046135  0.002966 ...
DistanceFromHome         -0.001686  -0.004985          1.000000   0.021042            NaN        0.032916                -0.016075    0.031131        0.008783  0.005303 ...
Education                 0.208034  -0.016806          0.021042   1.000000            NaN        0.042070                -0.027128    0.016775        0.042438  0.101589 ...
EmployeeCount                  NaN        NaN               NaN        NaN            NaN             NaN                      NaN         NaN             NaN       NaN ...
EmployeeNumber           -0.010145  -0.050990          0.032916   0.042070            NaN        1.000000                 0.017621    0.035179       -0.006888 -0.018519 ...
EnvironmentSatisfaction   0.010146   0.018355         -0.016075  -0.027128            NaN        0.017621                 1.000000   -0.049857       -0.008278  0.001212 ...
HourlyRate                0.024287   0.023381          0.031131   0.016775            NaN        0.035179                -0.049857    1.000000        0.042861 -0.027853 ...
JobInvolvement            0.029820   0.046135          0.008783   0.042438            NaN       -0.006888                -0.008278    0.042861        1.000000 -0.012630 ...
JobLevel                  0.509604   0.002966          0.005303   0.101589            NaN       -0.018519                 0.001212   -0.027853       -0.012630  1.000000 ...
JobSatisfaction          -0.004892   0.030571         -0.003669  -0.011296            NaN       -0.046247                -0.006784   -0.071335       -0.021476 -0.001944 ...
MonthlyIncome             0.497855   0.007707         -0.017014   0.094961            NaN       -0.014829                -0.006259   -0.015794       -0.015271  0.950300 ...
MonthlyRate               0.028051  -0.032182          0.027473  -0.026084            NaN        0.012648                 0.037600   -0.015297       -0.016322  0.039563 ...
NumCompaniesWorked        0.299635   0.038153         -0.029251   0.126317            NaN       -0.001251                 0.012594    0.022157        0.015012  0.142501 ...
PercentSalaryHike         0.003634   0.022704          0.040235  -0.011111            NaN       -0.012944                -0.031701   -0.009062       -0.017205 -0.034730 ...
PerformanceRating         0.001904   0.000473          0.027110  -0.024539            NaN       -0.020359                -0.029548   -0.002172       -0.029071 -0.021222 ...
RelationshipSatisfaction  0.053535   0.007846          0.006557  -0.009118            NaN       -0.069861                 0.007665    0.001330        0.034297  0.021642 ...
StandardHours                  NaN        NaN               NaN        NaN            NaN             NaN                      NaN         NaN             NaN       NaN ...
StockOptionLevel          0.037510   0.042143          0.044872   0.018422            NaN        0.062227                 0.003432    0.050263        0.021523  0.013984 ...
TotalWorkingYears         0.680381   0.014515          0.004628   0.148280            NaN       -0.014365                -0.002693   -0.002334       -0.005533  0.782208 ...
TrainingTimesLastYear    -0.019621   0.002453         -0.036942  -0.025100            NaN        0.023603                -0.019359   -0.008548       -0.015338 -0.018191 ...
WorkLifeBalance          -0.021490  -0.037848         -0.026556   0.009819            NaN        0.010309                 0.027627   -0.004607       -0.014617  0.037818 ...
YearsAtCompany            0.311309  -0.034055          0.009508   0.069114            NaN       -0.011240                 0.001458   -0.019582       -0.021355  0.534739 ...
YearsInCurrentRole        0.212901   0.009932          0.018845   0.060236            NaN       -0.008416                 0.018007   -0.024106        0.008717  0.389447 ...
YearsSinceLastPromotion   0.216513  -0.033229          0.010029   0.054254            NaN       -0.009019                 0.016194   -0.026716       -0.024184  0.353885 ...
YearsWithCurrManager      0.202089  -0.026363          0.014406   0.069065            NaN       -0.009197                -0.004999   -0.020123        0.025976  0.375281 ...

26 rows × 26 columns


In [14]: plt.subplots(figsize=(20, 15))
         sns.heatmap(df.corr())

Out[14]: <AxesSubplot: >

[Figure: correlation heatmap of all numeric features]
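With 26 numeric features, an unlabelled heatmap is hard to read. As a sketch (assuming the same df and imports as above), annotating the cells and masking the mirrored upper triangle makes it easier to spot the strong pairs, such as JobLevel and MonthlyIncome (0.95):

    corr = df.corr()
    mask = np.triu(np.ones_like(corr, dtype=bool))   # hide the mirrored upper triangle
    plt.subplots(figsize=(20, 15))
    sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='coolwarm')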

Numerical and Categorical Features

In [15]: num_col = []
         cat_col = []
         for i in df.columns:
             if df[i].dtype == 'O':
                 cat_col.append(i)
             else:
                 num_col.append(i)
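An equivalent, more concise alternative is pandas' select_dtypes, shown here as a sketch:

    # Same split, without the explicit loop
    num_col = df.select_dtypes(exclude='object').columns.tolist()
    cat_col = df.select_dtypes(include='object').columns.tolist()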

In [16]: # Numerical features
         num_col

Out[16]: ['Age',
'DailyRate',
'DistanceFromHome',
'Education',
'EmployeeCount',
'EmployeeNumber',
'EnvironmentSatisfaction',
'HourlyRate',
'JobInvolvement',
'JobLevel',
'JobSatisfaction',
'MonthlyIncome',
'MonthlyRate',
'NumCompaniesWorked',
'PercentSalaryHike',
'PerformanceRating',
'RelationshipSatisfaction',
'StandardHours',
'StockOptionLevel',
'TotalWorkingYears',
'TrainingTimesLastYear',
'WorkLifeBalance',
'YearsAtCompany',
'YearsInCurrentRole',
'YearsSinceLastPromotion',
'YearsWithCurrManager']


In [17]: # Categorical features
         cat_col

Out[17]: ['Attrition',
'BusinessTravel',
'Department',
'EducationField',
'Gender',
'JobRole',
'MaritalStatus',
'Over18',
'OverTime']

Distribution of Data

In [18]: df['Attrition'].value_counts()

Out[18]: No 1233
Yes 237
Name: Attrition, dtype: int64

We can observe that 237 out of 1470 employees left the company (Attrition = Yes).
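That is roughly a 16% positive class, so the target is imbalanced; the proportions can be checked directly, as a sketch:

    df['Attrition'].value_counts(normalize=True)   # No ≈ 0.84, Yes ≈ 0.16

This imbalance matters later when judging accuracy scores.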

In [19]: df['Gender'].value_counts()

Out[19]: Male      882
         Female    588
         Name: Gender, dtype: int64

In [30]: df1 = df.groupby(['Attrition', 'Gender']).size().reset_index().rename(columns={0: 'Count'})

In [31]: df1

Out[31]:
   Attrition  Gender  Count
0         No  Female    501
1         No    Male    732
2        Yes  Female     87
3        Yes    Male    150

In absolute terms, more men (150) than women (87) left the company, though there are also more men in the dataset.
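Raw counts are skewed by group size, so attrition rates per group make the comparison fairer. A minimal sketch (the same pattern applies to the MaritalStatus, JobRole, BusinessTravel, and Department groupings below):

    # Share of leavers within each gender
    df.assign(left=df['Attrition'].eq('Yes')).groupby('Gender')['left'].mean()
    # Female ≈ 0.148, Male ≈ 0.170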


In [37]: df2 = df.groupby(['Attrition', 'MaritalStatus']).size().reset_index().rename(columns={0: 'Count'})

In [33]: df2

Out[33]:
   Attrition MaritalStatus  Count
0         No      Divorced    294
1         No       Married    589
2         No        Single    350
3        Yes      Divorced     33
4        Yes       Married     84
5        Yes        Single    120

We can see that single employees leave at a noticeably higher rate than married or divorced employees.

In [41]: df3 = df.groupby(['Attrition', 'JobRole']).size().reset_index().rename(columns={0: 'Count'})


In [36]: df3

Out[36]:
    Attrition                    JobRole  Count
0          No  Healthcare Representative    122
1          No            Human Resources     40
2          No      Laboratory Technician    197
3          No                    Manager     97
4          No     Manufacturing Director    135
5          No          Research Director     78
6          No         Research Scientist    245
7          No            Sales Executive    269
8          No       Sales Representative     50
9         Yes  Healthcare Representative      9
10        Yes            Human Resources     12
11        Yes      Laboratory Technician     62
12        Yes                    Manager      5
13        Yes     Manufacturing Director     10
14        Yes          Research Director      2
15        Yes         Research Scientist     47
16        Yes            Sales Executive     57
17        Yes       Sales Representative     33

Attrition is concentrated among Laboratory Technicians, Research Scientists, Sales Executives, and Sales Representatives.

In [39]: df4 = df.groupby(['Attrition', 'BusinessTravel']).size().reset_index().rename(columns={0: 'Count'})

In [40]: df4

Out[40]:
   Attrition     BusinessTravel  Count
0         No         Non-Travel    138
1         No  Travel_Frequently    208
2         No      Travel_Rarely    887
3        Yes         Non-Travel     12
4        Yes  Travel_Frequently     69
5        Yes      Travel_Rarely    156


Employees who travel account for most of the attrition; relative to group size, frequent travellers leave at the highest rate and non-travellers at the lowest.

In [42]: df5 = df.groupby(['Attrition', 'Department']).size().reset_index().rename(columns={0: 'Count'})

In [43]: df5

Out[43]:
   Attrition              Department  Count
0         No         Human Resources     51
1         No  Research & Development    828
2         No                   Sales    354
3        Yes         Human Resources     12
4        Yes  Research & Development    133
5        Yes                   Sales     92

In absolute numbers, most attrition comes from the Research & Development and Sales departments.

In [44]: # Copy the data
         df_copy = df.copy()

Label Encoding of Categorical Features

In [45]: from sklearn.preprocessing import LabelEncoder

         le = LabelEncoder()
         for i in cat_col:
             df_copy[i] = le.fit_transform(df_copy[i])

In [46]: df_copy.head()

Out[46]:
   Age  Attrition  BusinessTravel  DailyRate  Department  DistanceFromHome  Education  EducationField  EmployeeCount  EmployeeNumber ...  RelationshipSatisfaction  StandardHours  StockOptionLevel
0   41          1               2       1102           2                 1          2               1              1               1 ...                         1             80                 0
1   49          0               1        279           1                 8          1               1              1               2 ...                         4             80                 1
2   37          1               2       1373           1                 2          2               4              1               4 ...                         2             80                 0
3   33          0               1       1392           1                 3          4               1              1               5 ...                         3             80                 0
4   27          0               2        591           1                 2          1               3              1               7 ...                         4             80                 1

5 rows × 35 columns
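One caveat: LabelEncoder assigns arbitrary integer codes, which implies an ordering that nominal features such as Department or JobRole do not have. Tree-based models are fairly tolerant of this, but one-hot encoding is the safer general choice. A sketch of the alternative (df_onehot is an illustrative name, not used elsewhere in this notebook):

    # One-hot encode the nominal features; map the target to 0/1 separately
    df_onehot = pd.get_dummies(df, columns=[c for c in cat_col if c != 'Attrition'],
                               drop_first=True)
    df_onehot['Attrition'] = df_onehot['Attrition'].map({'Yes': 1, 'No': 0})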


In [47]: X = df_copy.drop(['Attrition', 'EmployeeCount', 'EmployeeNumber'], axis=1)
         y = df_copy['Attrition']
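EmployeeCount and EmployeeNumber are dropped because the first is constant and the second is just an identifier. The describe() and correlation outputs above suggest StandardHours (always 80) and Over18 are constant as well (constant columns produce NaN correlations), so they could be dropped the same way. A quick check, as a sketch:

    # Columns with a single unique value carry no information for the model
    constant_cols = [c for c in X.columns if X[c].nunique() == 1]
    print(constant_cols)   # expected: ['Over18', 'StandardHours']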

In [48]: X.head()

Out[48]:
   Age  BusinessTravel  DailyRate  Department  DistanceFromHome  Education  EducationField  EnvironmentSatisfaction  Gender  HourlyRate ...  RelationshipSatisfaction  StandardHours ...
0   41               2       1102           2                 1          2               1                        2       0          94 ...                         1             80 ...
1   49               1        279           1                 8          1               1                        3       1          61 ...                         4             80 ...
2   37               2       1373           1                 2          2               4                        4       1          92 ...                         2             80 ...
3   33               1       1392           1                 3          4               1                        4       0          56 ...                         3             80 ...
4   27               2        591           1                 2          1               3                        1       1          40 ...                         4             80 ...

5 rows × 32 columns

In [49]: y.head()

Out[49]: 0    1
         1    0
         2    1
         3    0
         4    0
         Name: Attrition, dtype: int32

In [50]: X.shape, y.shape

Out[50]: ((1470, 32), (1470,))

Train Test Split

In [53]: from sklearn.model_selection import train_test_split

         X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

In [54]: X_train.shape, y_train.shape, X_test.shape, y_test.shape

Out[54]: ((1102, 32), (1102,), (368, 32), (368,))
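With only ~16% positive labels, a purely random split can leave the test set with a different class balance than the training set. Passing stratify=y preserves the Yes/No ratio in both parts; a minimal variation on the split above:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=101, stratify=y)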


Training the Model


In [55]: from sklearn.ensemble import RandomForestClassifier
         Rf_model = RandomForestClassifier()

In [56]: Rf_model.fit(X_train, y_train)

Out[56]: RandomForestClassifier()

Training Accuracy

In [57]: Rf_model.score(X_train, y_train)

Out[57]: 1.0

Testing Accuracy

In [58]: y_pred_rf = Rf_model.predict(X_test)

In [59]: from sklearn.metrics import accuracy_score

In [60]: accuracy_score(y_test, y_pred_rf)

Out[60]: 0.8586956521739131
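A perfect training score against ~0.86 on the test set is a sign of overfitting, and with an imbalanced target, accuracy alone flatters the model (always predicting "No" would already score roughly 0.84). Per-class metrics give a clearer picture; a minimal sketch:

    from sklearn.metrics import classification_report, confusion_matrix

    print(confusion_matrix(y_test, y_pred_rf))
    print(classification_report(y_test, y_pred_rf))   # watch recall on the minority 'Yes' class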

Hyperparameter Tuning

In [61]: # We are tuning six hyperparameters here, passing several candidate values for each
         grid_param = {
             'n_estimators': [90, 100, 115, 130],
             'criterion': ['gini', 'entropy'],
             'max_depth': range(2, 20, 1),
             'min_samples_leaf': range(1, 10, 1),
             'min_samples_split': range(2, 10, 1),
             'max_features': ['auto', 'log2']
         }


In [62]: from sklearn.model_selection import GridSearchCV

         grid_search = GridSearchCV(estimator=Rf_model, param_grid=grid_param, cv=3, verbose=2, n_jobs=-1)

In [63]: grid_search.fit(X_train, y_train)

Fitting 3 folds for each of 20736 candidates, totalling 62208 fits

Out[63]: GridSearchCV(cv=3, estimator=RandomForestClassifier(), n_jobs=-1,
                      param_grid={'criterion': ['gini', 'entropy'],
                                  'max_depth': range(2, 20),
                                  'max_features': ['auto', 'log2'],
                                  'min_samples_leaf': range(1, 10),
                                  'min_samples_split': range(2, 10),
                                  'n_estimators': [90, 100, 115, 130]},
                      verbose=2)

In [64]: grid_search.best_params_

Out[64]: {'criterion': 'gini',
          'max_depth': 15,
          'max_features': 'log2',
          'min_samples_leaf': 2,
          'min_samples_split': 7,
          'n_estimators': 100}
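Note that 20,736 candidates × 3 folds means 62,208 fits, which is slow. RandomizedSearchCV samples a fixed number of candidates from the same space and usually lands near the same optimum at a fraction of the cost; a minimal sketch:

    from sklearn.model_selection import RandomizedSearchCV

    random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                       param_distributions=grid_param,
                                       n_iter=100, cv=3, random_state=101, n_jobs=-1)
    random_search.fit(X_train, y_train)
    random_search.best_params_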

In [65]: Rf_model_with_best_params = RandomForestClassifier(criterion='gini', max_depth=15,
                                                            max_features='log2', min_samples_leaf=2,
                                                            min_samples_split=7, n_estimators=100)

In [66]: Rf_model_with_best_params.fit(X_train, y_train)

Out[66]: RandomForestClassifier(max_depth=15, max_features='log2', min_samples_leaf=2,
                                min_samples_split=7)

In [67]: y_predict_rf_bp = Rf_model_with_best_params.predict(X_test)

In [68]: accuracy_score(y_test, y_predict_rf_bp)

Out[68]: 0.8478260869565217
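The tuned model scores slightly below the default forest on this particular 25% hold-out; a single split has enough variance that cross-validated scores are the fairer comparison. A fitted forest also exposes feature_importances_, which shows which features drive the attrition prediction; a minimal sketch:

    importances = pd.Series(Rf_model_with_best_params.feature_importances_, index=X.columns)
    importances.sort_values(ascending=False).head(10)   # top 10 drivers of attrition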


Bagging Classifier

In [72]: from sklearn.svm import SVC
         from sklearn.ensemble import BaggingClassifier

         model_bagging_svc = BaggingClassifier(base_estimator=SVC(), n_estimators=50, random_state=0).fit(X_train, y_train)

In [73]: y_predict_bagging = model_bagging_svc.predict(X_test)

In [74]: accuracy_score(y_test, y_predict_bagging)

Out[74]: 0.842391304347826
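SVC is sensitive to feature scale, and these features range from 1-5 ordinal codes to monthly incomes in the thousands, so the unscaled bagged SVC is likely handicapped here. Wrapping the base estimator in a pipeline with StandardScaler is the usual fix; a sketch (base_estimator matches the older scikit-learn used above; newer versions rename it to estimator):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    model_bagging_scaled = BaggingClassifier(
        base_estimator=make_pipeline(StandardScaler(), SVC()),  # scale inside each bag
        n_estimators=50, random_state=0).fit(X_train, y_train)
    accuracy_score(y_test, model_bagging_scaled.predict(X_test))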

Decision Tree Classifier


In [75]: from sklearn.tree import DecisionTreeClassifier
         model = DecisionTreeClassifier()

In [76]: model.fit(X_train, y_train)

Out[76]: DecisionTreeClassifier()

In [77]: model.score(X_train, y_train)

Out[77]: 1.0


In [78]: from sklearn import tree

         fig = plt.figure(figsize=(25, 15))
         tree.plot_tree(model, filled=True)

In [79]: model.get_depth()

Out[79]: 15

In [80]: y_predict = model.predict(X_test)

In [81]: from sklearn.metrics import accuracy_score

In [82]: accuracy_score(y_test, y_predict)

Out[82]: 0.8070652173913043
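The unpruned tree grows to depth 15 and memorises the training set (training accuracy 1.0), which explains the weaker test score. Constraining depth is the simplest regularisation; a sketch with an illustrative, untuned max_depth:

    pruned = DecisionTreeClassifier(max_depth=5, random_state=101)
    pruned.fit(X_train, y_train)
    pruned.score(X_train, y_train), accuracy_score(y_test, pruned.predict(X_test))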

In this case, the ensemble classifiers (the random forest and the bagged SVC) clearly outperform the single Decision Tree classifier.
