MaWinPaPaMayPhyoAung - First Seminar

Prediction of Employee Attrition using
Bayes Risk Post-Pruning in Decision Tree
Supervised by : Dr. Nilar Aye

Presented by : Ma Win Pa Pa May Phyo Aung
Roll No : 6CS - 86, SE - 14
Seminar : First Seminar
Batch : 26th batch
Date : 21 - 7 - 2022
Outline
• Abstract
• Introduction
• Objectives
• Related Works
• Background Theory
• System Block Diagram
• System Flow Chart
• Experiment
• Model Performance Evaluation
• Conclusion
• Time Schedule
• References
Abstract
• Attrition is widely understood to be one of the major problems affecting
organizations today.
• Losing employees has many direct and indirect impacts across a company.
• Attrition occurs when an employee leaves and is not replaced at all, or not for a
significant amount of time, resulting in a reduction of the workforce.
• In this system, a Decision Tree classifier is used to predict whether an employee is
likely to quit.
• The Bayes Risk Post-Pruning (PBMR) technique is applied to reduce overfitting in
the decision tree.
• The performance of the proposed system is evaluated using precision, sensitivity,
and F1 score on the IBM Human Resource Analytics Employee Attrition and
Performance dataset from the Kaggle site.
Introduction
• Employee attrition has increased in recent years. It is one of the major
problems that every company faces, irrespective of the work sector.
• Proper strategies and ideas are required to control the growing employee
attrition rate.
• In the proposed system, a Decision Tree classification model is applied to
predict employee attrition.
• The Bayes Risk Post-Pruning method is applied to overcome the issue of
overfitting during the modeling phase.
• The IBM Human Resource Analytics Employee Attrition and Performance dataset
from the Kaggle site is used to generate the prediction model and evaluate the
system accuracy.
Objectives
• To analyze employee attrition using a Decision Tree classifier.
• To comprehend the Bayes Risk Post-Pruning algorithm.
• To reduce recruitment, hiring, and training costs.
• To control the growing employee attrition rate.
• To maintain qualified and strong Human Resource processes.
Related Work (1)
Bayes Risk Post-Pruning in Decision Tree to Overcome Overfitting
Problem on Customer Churn Classification
Devina Christianti, Sarini Abdullah, Siti Nurrohmah, Conference paper. January 2020,
DOI: 10.4108/eai.2-8-2019.2290487, ICSA 2019, August 02-03, Bogor, Indonesia
• This paper aims to avoid the overfitting problem using the Bayes Risk
Post-Pruning method.
• The paper reports that Bayes Risk Post-Pruning can improve decision tree
performance and yield higher accuracy.
• The researchers applied the method to two customer churn classification
datasets, from the Kaggle site and IBM datasets.
Related Work (2)
A novel decision tree classification based on post-pruning with Bayes minimum risk
Ahmed Mohamed Ahmed, Ahmet Rizaner, Ali Hakan Ulusoy, April 4, 2018, PLoS ONE
13(4): e0194168
• This paper proposes a post-pruning method that considers various evaluation
standards.
• The paper shows that the proposed method produces better classification
accuracy than Reduced-Error Pruning (REP) and Minimum-Error Pruning
(MEP).
• The study uses five different datasets: Zoo, Iris, Diabetes, Labor, and
Blogger.
Related Work (3)
Machine Learning for predicting Employee Attrition
Norsuhada Mansor, Nor Samsiah Sani and Mohd Aliff, (IJACSA) International Journal of
Advanced Computer Science and Applications, Vol. 12, No. 11, 2021
• This paper studies the use of machine learning classification models to
predict employee attrition.
• The paper compares the performance of three machine learning techniques:
Decision Tree, Support Vector Machine, and Artificial Neural Network
classifiers.
• In this study, the authors use the IBM Human Resource Analytics Employee
Attrition and Performance dataset.
• They also applied parameter tuning and regularization techniques for
optimization purposes.
Background Theory
• A Decision Tree is a supervised machine learning algorithm used to build
classification and regression models in the form of a tree structure.
• The ID3 algorithm builds a decision tree for classification by selecting, at
each node, the attribute that yields the maximum Information Gain (IG), that
is, the minimum remaining Entropy (H).
• The Bayes minimum risk classifier is a decision model based on quantifying
trade-offs between various decisions using the probabilities and costs that
accompany such decisions.
• It is important here because, in this application, the costs correspond to
misclassification errors.
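The ID3 attribute-selection step mentioned under Background Theory can be illustrated with a minimal Python sketch. It assumes categorical attributes stored as dictionaries. The function names (`entropy`, `information_gain`, `best_attribute`) and the toy records are hypothetical and are not taken from the IBM dataset or from the proposed system.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 selection step: pick the attribute with maximum information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records (illustration only).
rows = [
    {"OverTime": "Yes", "Travel": "Rarely"},
    {"OverTime": "Yes", "Travel": "Often"},
    {"OverTime": "No", "Travel": "Rarely"},
    {"OverTime": "No", "Travel": "Often"},
]
labels = ["Yes", "Yes", "No", "No"]

print(best_attribute(rows, labels, ["OverTime", "Travel"]))
```

In this toy example, splitting on "OverTime" separates the two classes completely, so it has the larger gain and is chosen over "Travel".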
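Structure of Bayes Risk Post-Pruning Algorithm
The risk corresponds to the cost of classifying data with attributes $\mathbf{x}$ into class $C_i$
when the correct class is $C_j$, $j = 1, 2, \dots, m$. The conditional risk of classifying $\mathbf{x}$ into
class $C_i$ is defined in equation (1).
$$R_i(\mathbf{x}) = \sum_{j=1}^{m} \lambda_{j,i} \Pr(C_j \mid \mathbf{x}) \tag{1}$$
where: $\lambda_{j,i}$ = cost of classifying the data into class $C_i$ when the true class is $C_j$
$C_j$ = true class
$\Pr(C_j \mid \mathbf{x})$ = probability that a subject with attributes $\mathbf{x}$ belongs to class $C_j$
$\Pr(C_j \mid \mathbf{x})$ is calculated using Bayes' Theorem, given in equation (2).
$$\Pr(C_j \mid \mathbf{x}) = \frac{\Pr(\mathbf{x}, C_j)}{\Pr(\mathbf{x})} = \frac{\Pr(\mathbf{x} \mid C_j)\,\Pr(C_j)}{\Pr(\mathbf{x})} = \frac{\Pr(\mathbf{x} \mid C_j)\,\Pr(C_j)}{\sum_{k=1}^{m} \Pr(\mathbf{x} \mid C_k)\,\Pr(C_k)} \tag{2}$$
One special case of the risk matrix is zero-one loss, which assigns the same cost to every
misclassification (classifying a subject of class $i$ as class $j$ or vice versa), as in equation (3).
$$\lambda_{j,i} = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases} \tag{3}$$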
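As a rough illustration of equations (1) to (3), the sketch below computes posteriors with Bayes' Theorem, evaluates the conditional risk for each class, and picks the class with minimum risk. The priors, likelihoods, and the asymmetric cost matrix are hypothetical values chosen for the example. The function names are likewise assumptions and not part of the proposed system.

```python
def posterior(priors, likelihoods):
    """Pr(C_j | x) from Pr(C_j) and Pr(x | C_j) via Bayes' Theorem."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    evidence = sum(joint)  # Pr(x), the normalising constant
    return [v / evidence for v in joint]


def conditional_risk(posteriors, cost, i):
    """Risk of deciding class i: sum over true classes j of cost[j][i] * Pr(C_j | x)."""
    return sum(cost[j][i] * p for j, p in enumerate(posteriors))


def zero_one_cost(m):
    """m x m cost matrix: 0 for a correct decision, 1 for any misclassification."""
    return [[0 if i == j else 1 for i in range(m)] for j in range(m)]


def min_risk_class(posteriors, cost):
    """Index of the class with the smallest conditional risk."""
    return min(range(len(posteriors)),
               key=lambda i: conditional_risk(posteriors, cost, i))


# Hypothetical numbers: class 0 = 'No' (stays), class 1 = 'Yes' (leaves).
post = posterior(priors=[0.8, 0.2], likelihoods=[0.3, 0.5])

# Zero-one loss: the decision follows the larger posterior.
print(min_risk_class(post, zero_one_cost(2)))

# Hypothetical asymmetric costs: missing a leaver (true 1, predicted 0) costs 5.
asymmetric = [[0, 1],
              [5, 0]]
print(min_risk_class(post, asymmetric))
```

With zero-one loss, the risk of deciding class i reduces to one minus the posterior of class i. With the asymmetric matrix, the same posteriors lead to the opposite decision, which is the trade-off between probabilities and costs that the Bayes minimum risk classifier quantifies.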
Cont’d
The post-pruning algorithm runs bottom-up, from the leaf nodes to the root node, evaluating the
risk of each subtree based on Bayes risk. Under zero-one loss, the risk associated with each
parent node $t$ is shown in equation (4).
$$R_{t_i}(\mathbf{x}) = \sum_{j=1,\, j \neq i}^{m} \Pr(C_j \mid \mathbf{x}) \tag{4}$$
where: $R_{t_i}(\mathbf{x})$ = risk associated with node $t$ when classifying a subject with attributes $\mathbf{x}$
into class $C_i$
$\Pr(C_j \mid \mathbf{x})$ = probability that a subject with attributes $\mathbf{x}$ belongs to class $C_j$
The risk associated with the leaf nodes of parent node $t$ is shown in equation (5).
$$R_{l_i}(\mathbf{x}) = \sum_{l=1}^{t_l} \sum_{j=1,\, j \neq i}^{m} \Pr(C_j \mid \mathbf{x}) \tag{5}$$
where: $R_{l_i}(\mathbf{x})$ = risk associated with leaf node $l$ when classifying a subject with attributes
$\mathbf{x}$ into class $C_i$
$t_l$ = total number of leaf nodes in the subtree
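The bottom-up pass described above might be sketched as follows. This is one possible reading, not the authors' implementation. It rests on several assumptions: the probabilities are estimated from the class counts of training records at each node, every node is labelled with its majority class, and a subtree is collapsed when the parent's zero-one risk is no greater than the summed risk of its leaves. The slides do not state the comparison rule, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    counts: list                                   # training records per class at this node
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def zero_one_risk(counts):
    """Zero-one risk of labelling a node with its majority class.

    Class probabilities are estimated by class proportions (an assumption).
    """
    return 1.0 - max(counts) / sum(counts)


def leaves(node):
    """All leaf nodes below (or equal to) the given node."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def prune(node):
    """Bottom-up pruning: children are processed before their parent."""
    for child in node.children:
        prune(child)
    if node.is_leaf():
        return node
    leaf_risk = sum(zero_one_risk(leaf.counts) for leaf in leaves(node))
    # Assumed rule: collapse when the parent is no riskier than its leaves.
    if zero_one_risk(node.counts) <= leaf_risk:
        node.children = []                         # the parent becomes a leaf
    return node


# Hypothetical two-class tree: counts are [No, Yes].
left = Node([8, 2], [Node([5, 0]), Node([3, 2])])
right = Node([4, 6], [Node([4, 0]), Node([0, 6])])
root = prune(Node([12, 8], [left, right]))
print(len(leaves(root)))
```

In this toy tree the left subtree is collapsed, because its split leaves one impure leaf whose risk exceeds the parent's, while the right subtree is kept because its leaves are pure.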
System Block Diagram
[Block diagram: Dataset → ID3 Decision Tree (Classification Rules) → Bayes Risk Post-Pruning → Pruned Tree]
System Flow Chart
[Flow chart: Start → Input Data (IBM Dataset) → Data Training / Data Testing → Decision Tree →
Bayes Risk Post-Pruning → Fitting Decision Tree Model → Evaluate Model Performance →
Display Performance → End]
• Stage 1: Load the input data from the IBM Human Resource Analytics Employee
Attrition and Performance dataset, which is available from the Kaggle
Dataset Repository.
• Stage 2: Preprocess the data.
• Stage 3: Divide the input data into two parts, Data Training and Data
Testing.
• Stage 4: Apply the ID3 algorithm to build a decision tree on the training data.
• Stage 5: Some branches of the resulting decision tree may reflect noise or
outliers, so the system uses the Bayes Risk Post-Pruning method to remove
unnecessary branches or nodes.
• Stage 6: Obtain the fitted decision tree.
• Stage 7: Use the testing dataset to evaluate the performance of the
prediction model.
• Stage 8: Finally, display the model performance.
Experiment
• The proposed system uses the IBM Human Resource Analytics Employee Attrition
and Performance dataset from the Kaggle Dataset Repository.
• The dataset includes four major components: employee satisfaction, income,
seniority, and demographic data.
• The dataset contains 1470 instances and 35 attributes.
• The class label is ‘Attrition’, with 237 instances of ‘Yes’ and
1233 instances of ‘No’.
• The dataset is split into two parts: a training dataset to build the decision
tree model and a testing dataset to measure the model performance.
• Dataset Details
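Because the class counts above are imbalanced (237 ‘Yes’ against 1233 ‘No’), one reasonable way to form the training and testing parts is a stratified split that keeps the class ratio in both. The sketch below is an illustration under assumptions: the slides do not state the split ratio, so the 30% test fraction and the seed are hypothetical, and placeholder labels stand in for the real dataset.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx) that keep each class's share in both parts."""
    rng = random.Random(seed)
    # Collect the row indices of each class separately.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Placeholder labels with the class counts reported for the dataset.
labels = ["Yes"] * 237 + ["No"] * 1233
train_idx, test_idx = stratified_split(labels)

yes_share_train = sum(labels[i] == "Yes" for i in train_idx) / len(train_idx)
yes_share_test = sum(labels[i] == "Yes" for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(yes_share_train, 3), round(yes_share_test, 3))
```

Splitting each class separately keeps the share of ‘Yes’ cases close to 16% in both parts, so the testing dataset reflects the same imbalance as the training dataset.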
Model Performance Evaluation
• Model evaluation is an important step in the creation of a predictive model. It
helps identify the model that best fits the available data.
• Model performance is measured by calculating precision, sensitivity, and
F1 score.
• The confusion matrix can be used to analyze the performance of a classifier. It
tabulates the actual values against the predicted values after the
classification process, as shown in Table 1.
Table 1. Confusion matrix
                     Predicted class
Actual class     Positive               Negative
Positive         True Positive (TP)     False Negative (FN)
Negative         False Positive (FP)    True Negative (TN)

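Cont’d
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
• where,
Precision, also known as positive predictive value, is the number of correctly predicted
positive observations divided by the total number of predicted positive observations.
Sensitivity, or recall, is the number of correctly predicted positive observations divided by
the total number of actual positive observations.
The F1 score (also called F score or F measure) is the harmonic mean of precision and sensitivity.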
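The three measures can be computed directly from the confusion matrix counts. The following is a minimal sketch: the labels in the example are hypothetical and are not results from this study, and ‘Yes’ (the employee leaves) is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Correct positive predictions over all actual positives (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p, s = precision(tp, fp), sensitivity(tp, fn)
    return 2 * p * s / (p + s) if (p + s) else 0.0


# Hypothetical labels for eight employees (illustration only).
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(precision(tp, fp), sensitivity(tp, fn), f1_score(tp, fp, fn))
```

The guards against a zero denominator return 0.0 when a model predicts no positives at all, which can happen on an imbalanced dataset such as this one.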
Conclusion
• Attrition is an inevitable part of any business. Attrition can involve the loss of
employees or the loss of customers.
• Employee attrition is important because it can decrease labor costs without
incorporating staff departures.
• In this system, a Decision Tree classifier is used to predict whether an employee
will leave the company, given an employee attrition dataset.
• The proposed model uses the IBM Human Resource Analytics Employee Attrition
and Performance dataset.
• This study uses Bayes Risk Post-Pruning to reduce the size of the decision tree
and overcome overfitting problems.
• The system compares model performance on the testing dataset before and after
Bayes Risk Post-Pruning is applied.
• Bayes Risk Post-Pruning is expected to improve decision tree performance, with
higher precision, sensitivity, and F1 score for the model.
Time Schedule
Year: 2022
Months: Feb, July, August, September, November
Milestones: Pre-Seminar, 1st Seminar, 2nd Seminar, 3rd Seminar, Defense
References
[1] Devina Christianti, Sarini Abdullah, Siti Nurrohmah, “Bayes Risk Post-Pruning
in Decision Tree to Overcome Overfitting Problem on Customer Churn
Classification”, Conference paper, January 2020,
DOI: 10.4108/eai.2-8-2019.2290487, ICSA 2019, August 02-03, Bogor, Indonesia.
[2] Ahmed Mohamed Ahmed, Ahmet Rizaner, Ali Hakan Ulusoy, “A novel
decision tree classification based on post-pruning with Bayes minimum risk”,
April 4, 2018, PLoS ONE 13(4): e0194168.
[3] Norsuhada Mansor, Nor Samsiah Sani and Mohd Aliff, “Machine Learning
for predicting Employee Attrition”, (IJACSA) International Journal of
Advanced Computer Science and Applications, Vol. 12, No. 11, 2021.
[4] L. Alaskar, M. Crane and M. Alduailij, “Employee Turnover Prediction
Using Machine Learning”, In International Conference on Computing, pp. 301-316,
2020.
[5] F. Ozdemir, M. Coskun, C. Gezer and V.C Gungor, “Assessing Employee
Attrition Using Classifications Algorithms”, In Proceedings of the 2020 the
4th International Conference on Information System and Data Mining,
pp. 118-122, May 2020.
Thank You!