Introduction
Employee attrition has risen in recent years. Irrespective of the work sector,
employee attrition is one of the major problems that every company faces.
Proper strategies and ideas are required to control the growing employee
attrition rate.
In the proposed system, a Decision Tree classification model is applied to
predict employee attrition.
The Bayes Risk Post-Pruning method is applied to overcome overfitting during
the modeling phase.
The IBM Human Resource Analytic Employee Attrition and Performance dataset
from the Kaggle site is used to build the prediction model and evaluate the
system's accuracy.
Objectives
Related Work (1)
Bayes Risk Post-pruning in Decision Tree to Overcome overfitting
Problem on Customer churn classification
Devina Christianti, Sarini Abdullah, Siti Nurrohmah, Conference paper. January 2020,
DOI: 10.4108/eai.2-8-2019.2290487, ICSA 2019, August 02-03, Bogor, Indonesia
This paper aims to avoid the overfitting problem by using the Bayes Risk
post-pruning method.
The paper reports that Bayes Risk post-pruning can improve decision tree
performance and yield higher accuracy.
The researchers applied the method to two customer churn classification
datasets, one from the Kaggle site and one from IBM.
Related Work (2)
Structure of Bayes Risk Post-Pruning Algorithm
Let $\lambda_{j,i}$ be the cost of classifying data with attributes $\boldsymbol{x}$ into class $C_i$ when
the correct class is $C_j$, $j = 1, 2, \dots, m$. The conditional risk $R(C_i \mid \boldsymbol{x})$ of classifying
$\boldsymbol{x}$ into class $C_i$ is defined in equation (1).
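$$ R(C_i \mid \boldsymbol{x}) = \sum_{j=1}^{m} \lambda_{j,i} \, \Pr(C_j \mid \boldsymbol{x}) \tag{1} $$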
where: $\lambda_{j,i}$ = cost of classifying the data into class $C_i$ when the true class is $C_j$,
$C_j$ = true class,
$\Pr(C_j \mid \boldsymbol{x})$ = probability that a subject with attributes $\boldsymbol{x}$ belongs to class $C_j$
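As an illustration of equation (1), the sketch below computes the conditional risk of every class and picks the class with the smallest risk (the Bayes decision rule). It is a minimal sketch, not code from the cited papers: the function names `conditional_risk` and `bayes_decision` are hypothetical, and the cost values and posterior probabilities in the example are made up.

```python
from typing import List, Sequence


def conditional_risk(cost: Sequence[Sequence[float]], posterior: Sequence[float], i: int) -> float:
    """Equation (1): R(C_i | x) = sum over j of lambda_{j,i} * Pr(C_j | x).

    cost[j][i] holds lambda_{j,i}, the cost of predicting class C_i when the
    true class is C_j; posterior[j] holds Pr(C_j | x).
    """
    return sum(cost[j][i] * posterior[j] for j in range(len(posterior)))


def bayes_decision(cost: Sequence[Sequence[float]], posterior: Sequence[float]) -> int:
    """Return the index of the class with the smallest conditional risk."""
    risks: List[float] = [conditional_risk(cost, posterior, i) for i in range(len(posterior))]
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical two-class example: index 0 = Attrition "No", index 1 = Attrition "Yes".
cost = [[0.0, 1.0],   # true "No":  predicting "Yes" costs 1 (false alarm)
        [5.0, 0.0]]   # true "Yes": predicting "No" costs 5 (missed leaver)
posterior = [0.7, 0.3]
print(bayes_decision(cost, posterior))  # 1, i.e. predict "Yes"
```

With these assumed costs, the risk of predicting "No" is 5 × 0.3 = 1.5 while the risk of predicting "Yes" is 1 × 0.7 = 0.7, so the rule predicts "Yes" even though "No" is more probable.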
One special case of the risk matrix is the zero-one loss, which assigns the same cost to every
misclassification (classifying a subject of class i as class j or vice versa) and zero cost to a
correct classification, as in equation (3).
$$ \lambda_{j,i} = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases} \tag{3} $$
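A quick numerical check of what the zero-one loss implies: substituting it into equation (1) keeps only the misclassification terms, so the risk of choosing $C_i$ is $1 - \Pr(C_i \mid \boldsymbol{x})$, and minimizing risk is the same as picking the most probable class. The function names below are hypothetical and the posterior values are made up.

```python
from typing import List, Sequence


def zero_one_cost(m: int) -> List[List[int]]:
    """Zero-one loss matrix: lambda_{j,i} = 0 if i == j, otherwise 1."""
    return [[0 if i == j else 1 for i in range(m)] for j in range(m)]


def zero_one_risk(posterior: Sequence[float], i: int) -> float:
    """Conditional risk of choosing class C_i under zero-one loss.

    Only the j != i terms survive, so the result equals 1 - Pr(C_i | x)
    whenever the posteriors sum to one.
    """
    cost = zero_one_cost(len(posterior))
    return sum(cost[j][i] * posterior[j] for j in range(len(posterior)))


posterior = [0.2, 0.5, 0.3]  # made-up posteriors for three classes
risks = [zero_one_risk(posterior, i) for i in range(3)]
# The minimum-risk class is the maximum-posterior class (index 1 here).
print(min(range(3), key=risks.__getitem__))
```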
Cont’d
The post-pruning algorithm runs bottom-up, from the leaf nodes to the root node, evaluating the
risk of each subtree based on Bayes risk. Under zero-one loss, the risk associated with each
parent node t is given in equation (4).
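$$ R_t^i(\boldsymbol{x}) = \sum_{j=1,\, j \neq i}^{m} \Pr(C_j \mid \boldsymbol{x}) = 1 - \Pr(C_i \mid \boldsymbol{x}) \tag{4} $$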
where: $R_t^i(\boldsymbol{x})$ = risk associated with node $t$ when classifying a subject with attributes $\boldsymbol{x}$
into class $C_i$,
$\Pr(C_j \mid \boldsymbol{x})$ = probability that a subject with attributes $\boldsymbol{x}$ belongs to class $C_j$
The risk associated with the leaf nodes of parent node $t$ is given in equation (5).
(5)
where: $R_l^i(\boldsymbol{x})$ = risk associated with leaf node $l$ when classifying a subject with attributes
$\boldsymbol{x}$ into class $C_i$,
$t_l$ = total number of leaf nodes in the subtree
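The bottom-up procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact algorithm of the cited papers: class posteriors at each node are estimated from training class counts with Laplace smoothing (assumed), each node predicts its minimum-risk class so its risk under equation (4) is 1 − max Pr(C_i | t), the leaf risks of a subtree are combined as a sample-weighted average (an assumed form of equation (5)), and a subtree is collapsed into a leaf when the parent's risk is not larger than that combined leaf risk. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    counts: List[int]                      # training samples of each class reaching the node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def node_risk(counts: List[int]) -> float:
    """Zero-one-loss risk of the best class at a node: 1 - max_i Pr(C_i | t).

    Pr(C_i | t) is estimated with Laplace smoothing (an assumption of this sketch).
    """
    n, m = sum(counts), len(counts)
    return 1.0 - max((c + 1) / (n + m) for c in counts)


def leaves(node: Node) -> List[Node]:
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def subtree_leaf_risk(node: Node) -> float:
    """Leaf risks weighted by each leaf's share of the node's samples (assumed form of eq. (5))."""
    total = sum(node.counts)
    if total == 0:
        return node_risk(node.counts)
    return sum(sum(leaf.counts) / total * node_risk(leaf.counts) for leaf in leaves(node))


def bayes_risk_prune(node: Node) -> Node:
    """Visit children first (bottom-up), then collapse this node if pruning does not raise risk."""
    for child in node.children:
        bayes_risk_prune(child)
    if not node.is_leaf() and node_risk(node.counts) <= subtree_leaf_risk(node):
        node.children = []  # the subtree is replaced by a single leaf
    return node


# Made-up class counts [No, Yes] for two small subtrees.
weak_split = Node([5, 1], [Node([3, 0]), Node([2, 1])])
useful_split = Node([8, 2], [Node([7, 0]), Node([1, 2])])
print(bayes_risk_prune(weak_split).is_leaf())    # True: parent risk 0.25 <= leaf risk 0.30
print(bayes_risk_prune(useful_split).is_leaf())  # False: parent risk 0.25 > leaf risk about 0.198
```

The smoothing matters in this sketch: with raw count estimates, the weighted leaf risk can never exceed the parent's training risk, so almost nothing would be pruned; smoothing penalizes small, noisy leaves.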
System Block Diagram
System Flow chart
Start → Input Data (IBM Dataset) → Data Training / Data Testing → Decision Tree → Display performance → End
Stage 1: The input data are taken from the IBM Human Resource Analytic Employee
Attrition and Performance dataset, which is available from the Kaggle
Dataset Repository.
Stage 2: The input data are preprocessed.
Stage 3: The input data are divided into two parts: training data and testing
data.
Stage 4: The ID3 algorithm is applied to build a decision tree on the training data.
Stage 5: Some branches of the resulting decision tree may reflect noise or
outliers in the training data, so the system uses the Bayes Risk
Post-Pruning method to remove unnecessary branches or nodes.
Stage 6: The result is a pruned, better-fitting decision tree.
Stage 7: The testing dataset is used to evaluate the performance of the
prediction model.
Stage 8: Finally, the system displays the model performance.
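Stage 4 can be illustrated with a compact ID3 sketch: at each node it picks the attribute with the highest information gain (entropy reduction) and splits on that attribute's values. It assumes categorical attributes, so numeric attributes such as income would first need to be discretized during preprocessing. The toy rows are invented and only borrow attribute names in the style of the dataset; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, dict]  # a leaf is a class label; an inner node is a dict


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[str], attr: str) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows: List[Row], labels: List[str], attrs: List[str]) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node or no attributes left: make a leaf
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [k for k, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[k] for k in idx], [labels[k] for k in idx],
                              [a for a in attrs if a != best])
    # "default" is used for attribute values never seen during training.
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree: Tree, row: Row) -> str:
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Invented toy rows (not taken from the IBM dataset).
rows = [
    {"OverTime": "Yes", "JobSatisfaction": "Low"},
    {"OverTime": "Yes", "JobSatisfaction": "High"},
    {"OverTime": "No", "JobSatisfaction": "Low"},
    {"OverTime": "No", "JobSatisfaction": "High"},
    {"OverTime": "Yes", "JobSatisfaction": "Low"},
]
labels = ["Yes", "No", "No", "No", "Yes"]
tree = id3(rows, labels, ["OverTime", "JobSatisfaction"])
print(predict(tree, {"OverTime": "Yes", "JobSatisfaction": "Low"}))  # Yes
```

An unpruned ID3 tree like this one grows until every leaf is pure, which is exactly why a post-pruning step is applied afterwards.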
Experiment
The proposed system uses the IBM Human Resource Analytic Employee Attrition
and Performance dataset from the Kaggle Dataset Repository.
The dataset includes four major components: employee satisfaction, income,
seniority and demographics data.
The dataset contains 1470 instances and 35 attributes.
The class label is ‘Attrition’, with 237 instances of ‘Yes’ and 1233 instances
of ‘No’.
The dataset is split into two parts: a training dataset used to build the decision
tree model and a testing dataset used to measure the model performance.
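The source does not state the split ratio or whether the split is stratified. A stratified split is one reasonable choice because the class is imbalanced: only 237 of 1470 instances (about 16.1%) are ‘Yes’, so a model that always predicts ‘No’ would already reach about 83.9% accuracy. The sketch below assumes a hypothetical 70/30 train/test ratio and a fixed random seed; `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test) so each class keeps roughly its share in both parts."""
    rng = random.Random(seed)            # fixed seed for a reproducible split
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class counts taken from the dataset description: 237 "Yes" and 1233 "No".
labels = ["Yes"] * 237 + ["No"] * 1233
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print(len(train_idx), len(test_idx))   # 1029 441
```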
Dataset Details
Model Performance Evaluation
Model evaluation is an important step in the creation of a predictive model. It
helps identify the model that best fits the available data.
Model performance is measured by calculating precision, sensitivity and
F1 score.
A confusion matrix can be used to analyze the performance of a classifier. It
cross-tabulates the actual and predicted class values after the classification
process, as shown in Table 1.
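Table 1. Confusion matrix
                      Predicted class
Actual class          Positive    Negative
Positive              TP          FN
Negative              FP          TN
where TP = true positives, FN = false negatives, FP = false positives and TN = true negatives.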
Precision, also known as positive predictive value, is the number of correctly
predicted positive observations divided by the total number of predicted positive observations.
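Sensitivity, also known as recall, is the number of correctly predicted positive observations
divided by the total number of actual positive observations. The F1 score is the harmonic mean
of precision and sensitivity.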
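The definitions above translate directly into code, with TP, FP, FN and TN counting true positives, false positives, false negatives and true negatives: Precision = TP / (TP + FP), Sensitivity = TP / (TP + FN), and F1 = 2 × Precision × Sensitivity / (Precision + Sensitivity). The sketch assumes ‘Yes’ (the employee leaves) is the positive class and returns 0.0 when a denominator is zero; the function names and the six-employee example are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "Yes") -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_sensitivity_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return precision, sensitivity, f1


# Hypothetical labels for six employees.
actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No"]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)                          # 2 1 1 2
print(precision_sensitivity_f1(tp, fp, fn))
```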
Conclusion
Attrition is an inevitable part of any business. Attrition can involve the loss of
employees or the loss of customers.
Employee attrition is important because it can decrease labor costs without
requiring layoffs.
In this system, a Decision Tree classifier is used to predict whether an employee
will leave the company, given an employee attrition dataset.
The proposed model is built and evaluated using the IBM Human Resource Analytic
Employee Attrition and Performance dataset.
This study uses Bayes Risk Post-Pruning to reduce the size of the decision tree and
overcome overfitting.
The system compares the model performance on the testing dataset before and after
Bayes Risk Post-Pruning is applied.
Applying the Bayes Risk Post-Pruning method is expected to improve decision tree
performance, giving higher precision, sensitivity and F1 score.
Time Schedule (Year 2022)
Months: February, July, August, September, November
Milestones: Pre-Seminar, 1st Seminar, 2nd Seminar, 3rd Seminar, Defense
References
[1] Devina Christianti, Sarini Abdullah, Siti Nurrohmah, “Bayes Risk post-
pruning in Decision tree to overcome overfitting problem on Customer churn
classification”, Conference paper. January 2020, DOI: 10.4108/eai.2-8-
2019.2290487, ICSA 2019, August 02-03, Bogor, Indonesia.
[2] Ahmed Mohamed Ahmed, Ahmet Rizaner, Ali Hakan Ulusoy, “A novel
decision tree classification based on post-pruning with Bayes minimum risk”,
April 4, 2018, PLoS ONE 13(4): e0194168.
[3] Norsuhada Mansor, Nor Samsiah Sani and Mohd Aliff, “Machine Learning
for predicting Employee Attrition”, (IJACSA) International Journal of
Advanced Computer Science and Applications, Vol. 12, No. 11, 2021.
[4] L. Alaskar, M. Crane and M. Alduailij, “Employee Turnover Prediction
Using Machine Learning,” In International Conference on Computing, pp. 301-316,
2020.
[5] F. Ozdemir, M. Coskun, C. Gezer and V. C. Gungor, “Assessing Employee
Attrition Using Classifications Algorithms,” In Proceedings of the 2020 the
4th International Conference on Information System and Data Mining, pp. 118-122,
May 2020.
Thank You!