You are on page 1of 3

© 2020 IJRAR March 2020, Volume 7, Issue 1 www.ijrar.

org (E-ISSN 2348-1269, P- ISSN 2349-5138)

People Leaving the Job – An Approach for Prediction


Using Machine Learning
Ishaan Ballal1, Shlok Kavathekar2, Shubham Janwe3, Pratik Shete4, Prof. Nivedita Bhirud5
1234(Students, Computer Engineering Department, Vishwakarma Institute of Information Technology, Pune)
5(Professor, Computer Engineering Department, Vishwakarma Institute of Information Technology, Pune)

Abstract: Organizations face huge costs resulting from (HRIS) of a global retailer is used to compare XGBoost against six
employee turnover. It may incur a high cost such as training historically used supervised classifiers and demonstrate its
expenses and the time it takes from when an employee starts to significantly higher accuracy for predicting employee turnover.
when they become a productive member. When a productive
employee quits the organization lost new product ideas, great Jeel Sukhadiya et al.[3] presented a detailed comparison and
project management, or customer relationships. analysis of different methodologies used for forecasting employee
So there is a need for the system that understands the key attrition rates. The methods such as Random Forest, Support
variables that influence the employee attrition rate using data Vector Machines (SVM), Gradient Boosted Classifier and Logistic
mining. Regression is used for predicting and classifying the attrition is a
This paper discusses different approaches used by the fictional dataset created by IBM data scientists.
researchers for the prediction of employee attrition rate in
addition to an abstract view of the system that we are going to S. Saranya and J. Sharmila Dev [4] predict the reason for employee
implement for Employee Attrition Rate Prediction using data attrition using three techniques like tree Learning Algorithm,
mining. Naïve’s Bayes Algorithm, and Logistic Algorithm.
Keywords—Data Mining, Employee Attrition Rate Prediction,
Identifying Reason, Employee Turnover. Yue Zhao et al.[5] demonstrated and assessed Supervised machine
learning methods for the prediction of employee turnover within an
I. INTRODUCTION organization. The experiments are performed on simulated human
resources datasets using a decision tree method; a random forest
method; a gradient boosting trees method; an extreme gradient
Information technology (IT) industry in India has been facing a boosting method; a logistic regression method; support vector
universal issue of high attrition in the past few years in HR machines; neural networks; linear discriminant analysis; a Naïve
Bayes method; and a K-nearest neighbour method.
Analytics [1][6], resulting in monetary and knowledge-based loses
to the companies. HR Analytics is the need of the company so that Dilip Singh Sisodia et al.[6] build a model that will predict
they can spend the money on the right employees rather than employee churn rate based on the HR analytics dataset obtained
spending on the wrong ones. With the help of HR Analytics, HR from the Kaggle website. The author generated a heat map to show
Managers can take numerous decisions on investment in the relation between attributes, and the correlation matrix. In the
employees to get excellent outcomes that benefit the stakeholders experimental part, the histogram is generated, which shows the
contrast between left employees vs. salary, department, satisfaction
and customers. Human Resource departments of the organizations
level, etc. For prediction purposes, five different machine learning
can predict the employee attrition [1][3][4] and the reason for an algorithms such as linear support vector machine, C 5.0 Decision
employee leaving an organization Tree classifier, Random Forest, k-nearest neighbor, and Naïve
Bayes classifier are used. This paper proposes the reasons which
Many factors lead to dissatisfaction in an employee, like long optimize employee attrition in any organization.
working hours, peer pressure, job location, job role, traveling time,
office space, amenities in the office, perks, and many more reasons Heng Zhang et al.[7] adopt the supervised learning technique for
could be a factor for employee attrition. employee turnover prediction. The approach from [8] helps human
resource personnel to understand psychological climate and
Every year a lot of companies hire several employees. The supports decisions in resolving employment turnover by selecting
companies invest time and money in training those employees, not desirable applicants who have a high probability of staying longer
just this but there are training programs within the companies for in an organization. Using the 12 dimensions of retention applied in
generating association rules and naive Bayes classifiers, the
their existing employees as well. These programs aim to increase
researchers develop a custom application that can support the
the effectiveness of their employees. organization's decision in relation to the hiring process.
Sometimes the employee may not have any problem in the Amir Mohammad Esmaieeli Sikaroudi, et al, [9] implemented
company but others may offer a better profile with a better pay knowledge discovery steps on real data of a manufacturing plant.
package. So, the employee may be willing to leave. So the HR They chew over many characteristics of employees such as age,
department needs to understand the employee satisfaction level [6]. technical skills, and work experience. They used to find out the
importance of data features is measured by the Pearson Chi-Square
II. LITERATURE SURVEY test.

John M. Kirimi and Christopher Moturi et al, [10] proposed


Rohit Punnoose et al.[2] explore the application of Extreme infrastructure for employee performance forecasting that enables
Gradient Boosting (XGBoost) [3][5] technique which is more the human resource professionals to refocus on human capability
robust for prediction of Employee turnover [2][8] because of its criteria and thereby enhance the performance appraisal process of
regularization formulation. The data from HR Information Systems its human capital.
IJRAR2001987 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 891
© 2020 IJRAR March 2020, Volume 7, Issue 1 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)
between the hyperplane and the support vectors is as large as
Ancheta et al.[12] make use of rule-based classification Data possible to reduce the error in classification. Support Vector
Mining technique to extract knowledge significant for predicting Machine.
training needs of newly-hired faculty members in order to devise
the necessary development programs. They used the Cross- III. PROPOSED SYSTEM
Industry Standard Process for Data Mining (CRISP-DM) is
discovering significant models needed for predictive analysis and We are developing a system that helps to predict employee
demonstrated the required professional training to prepare faculty attrition rate and understand the key variables that influence the
members to perform their tasks effectively employee attrition rate using data mining.

A. Data mining Technique Used for Prediction


of Employee Rate

a. Naïve Bayes Algorithm [4][5][6][8]

Naïve’s Bayes is a probability-based classification theorem which


is based on the Bayes Theorem. It can be used to predict the
outcome of an occurring event with independent conditions. Naïve
Bayes is an eager classifier and fast executing algorithm used for
real-time predictions and also has a higher success rate compared
to other algorithms.

b. Tree Learning Algorithm [4][5]

The decision tree [6] is a conventional algorithm used for


performing classifications based on the decisions made in one Figure 1: System Architecture
stage. This provides a tree-structured representation of the decision
sets. This classification based on the decision tree let us predict the
qualitative response without creating the dummy variables also the Here we are using random forest algorithm to build a prediction
class proportion among the training region which falls into a model for identifying the various reason for employee turnover.
particular region can be predicted using this algorithm. The data mining algorithm is performing well in predicting the
employees; those are likely to quit the respective organization
c. Logistic Regression [4] based on their working details and environments.

IV. CONCLUSION
This is a regression analysis that is used to set a statistical process
for estimating the relationship between the dependent variable and
The growth level, as well as market perception, mainly depends on
one or more independent variables. We can analyze how the
dependent variable is affected when one of the independent the strength of the employees. Now a day due to increased
variables is changed and by fixing the other independent variables. population and people with high capacity makes great success for
This technique is used to predict the qualitative response. any firm. But the prime issues which are normally addressed in any
organization are only the attrition.
d. Extreme Gradient Boosting (XGB)[3][5]
This paper finds a different data mining algorithm used for
Extreme Gradient Boosting is a tree-based method that was
performing well in predicting the employees, those are likely to
introduced in 2014 by Chen. It is also commonly referred to as
XGBoost. It is a scalable and accurate implementation of gradient quit the respective organization. The paper contains the survey of
boosted trees, explicitly designed for optimizing the computational various classification algorithms like logistic regression, LDA,
speed and model performance. Compared to gradient boosting, SVM, KNN, Random Forests to predict the probability of attrition
XGBoost utilizes a regularization term to reduce the overfitting of any new employee.
effect, yielding a better prediction and much faster computational
run times.

e. Random Forest [6]

Random Forest, it is one of the ensemble learning technique which


consists of several decision trees rather than a single decision tree
for classification. While classifying all the trees in the random
forest gives a class to an unknown example and the class having
maximum votes will be assigned to the unknown example.

f. Support Vector Machine [3][5]

A Support Vector Machine is a kind of classification technique,


where the data points are separated by a line in case of linear SVM,
and a hyperplane in case of non-linear SVM. The separation is
chosen in such a way that; the two sides of the hyperplane
categorize the data set into two classes. When an unknown data
comes it predicts which side/class it belongs to. The margin
IJRAR2001987 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 892
© 2020 IJRAR March 2020, Volume 7, Issue 1 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)
REFERENCES

[1] Kashyap Bhuva1*, Kriti Srivastava2,"Comparative


Study of the Machine Learning Techniques for
Predicting the Employee Attrition",2018
IJRARJuly2018, Volume 5, Issue 3.
[2] Rohit Punnoose, PhD candidate,Pankaj Ajit,
"Prediction of Employee Turnover in
Organizationsusing Machine Learning
Algorithms",(IJARAI) International Journal of
Advanced Research in Artificial Intelligence,Vol. 5,
No.9, 2016.
[3] Jeel Sukhadiya, 1Harshal Kapadia, 2Prof. Mitchell
D’silva "Employee Attrition Prediction using Data
Mining Techniques",International Journal of
Management, Technology And Engineering Volume 8,
Issue X, OCTOBER/2018
[4] S. Saranya *, J. Sharmila Dev,"PREDICTING
EMPLOYEE ATTRITION USING MACHINE
LEARNING ALGORITHMS AND ANALYZING
REASONS FOR ATTRITION", International Journal
of Advanced Engineering Research and Technology
(IJAERT) Volume 6 Issue 9, September 2018, ISSN
No.: 2348 – 8190
[5] Yue Zhao1(&), Maciej K. Hryniewicki2, Francesca
Cheng,"Employee Turnover Prediction with
MachineLearning: A Reliable Approach",Springer
Nature Switzerland AG 2019.
[6] Dilip Singh Sisodia ; Somdutta Vishwakarma ;
Abinash Pujahari ,” Evaluation of machine learning
models for employee churn prediction ”, 2017
International Conference on Inventive Computing and
Informatics (ICICI).
[7] Heng Zhang ; Lexi Xu ; Xinzhou Cheng ; Kun Chao ;
Xueqing Zhao ,“Analysis and Prediction of Employee
Turnover Characteristics based on Machine Learning ”,
2018 18th International Symposium on
Communications and Information Technologies
(ISCIT)
[8] Eduardo B. Santiago; Glenn Paul P. Gara, “A Model
Based Prediction of Desirable Applicants through
Employee’s Perception of Retention and Performance”,
2018 IEEE 10th International Conference on
Humanoid, Nanotechnology, Information Technology,
Communication and Control, Environment and
Management (HNICEM).
[9] Amir Mohammad Esmaieeli Sikaroudi ,
Rouzbehghousi and Ali Esmaieelisikaroudi,
2015 “A Data Mining Approach To
Employee Turnover Prediction” (Case Study:
Arak Automotive Parts Manufacturing),
Journal Of Industrial And Systems
Engineering Volume. 8, No. 4.
[10] JohnM.Kirimi, ChristopherA.
Moturi ,“Application of Data Mining
Classification in Employee Performance
Prediction”, International Journal of
Computer Applications (0975 –8887)Volume
146 –No.7, July 2016.
[11] Ancheta, R.A, Cabauatan, R.J.M., Lorena,
B.T.T., Rabago, W., Predicting faculty
development trainings and performance using
rule-based classification algorithm, Asian

IJRAR2001987 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 893

You might also like