girishchadha.gc@gmail.com
JV65UCK2AH
Interview Candidate Attendance
Proprietary content.
©Great Learning.
All Rights Reserved. Unauthorized use or distribution prohibited.
This file is meant for personal use by girishchadha.gc@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Problem Statement
A recruitment agency holds the key responsibility of enrolling the right
people, with the required skill levels and at competitive compensation,
into the client organization.
Candidate profiles are screened for the job opportunity, and shortlisted
candidates are called for interview. It is observed that not all who are
About Data
The data pertains to the recruitment industry in India for the years
2014-2016 and deals with candidate interview attendance for various clients.
A set of questions is asked by the recruiter while scheduling the
candidate. The answers to these determine whether the expected
attendance is yes, no or uncertain.
Data Dictionary
Exploratory Data Analysis - Univariate
Field Name:
Observed.Attendance (response field)
Observation:
70% of the candidates turned up for the
interview. The data is imbalanced, but
sufficient to proceed with the analysis and
to build a prediction model.
Exploratory Data Analysis - Univariate
Field Name:
Gender (categorical)
Observation:
22% of the candidates are female.
Exploratory Data Analysis - Univariate
Field Name:
Location (categorical)
Observation:
Most candidates are from the Chennai
and Bangalore locations.
Exploratory Data Analysis - Univariate
Field Name:
Skillset (categorical)
Observation:
Most job opportunities are for the
Java/J2EE/Struts/Hibernate skillset.
There also seems to be good demand
for freshers.
Exploratory Data Analysis - Bivariate
Field Name:
Observed Attendance & Expected Attendance
Observation:
• 45-50% of the ‘Uncertain’ candidates turned up for the interview.
• 25% of the candidates who were expected to attend did not come for
the interview.
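A bivariate observation like the one above comes from grouping candidates by one field and computing the attendance share within each group. The following is a minimal sketch of that calculation, not the analysis actually run on the data. The records are invented toy values, and the dotted field name `Expected.Attendance` is an assumption modelled on `Observed.Attendance`.

```python
from collections import defaultdict


def attendance_rate_by_group(records, group_field, outcome_field, positive="Yes"):
    """Return {group value: share of records whose outcome equals `positive`}."""
    totals = defaultdict(int)
    attended = defaultdict(int)
    for record in records:
        group = record[group_field]
        totals[group] += 1
        if record[outcome_field] == positive:
            attended[group] += 1
    return {group: attended[group] / totals[group] for group in totals}


# Invented toy records for illustration only; not taken from the real data set.
records = [
    {"Expected.Attendance": "Yes", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "Yes", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "Yes", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "Yes", "Observed.Attendance": "No"},
    {"Expected.Attendance": "Uncertain", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "Uncertain", "Observed.Attendance": "No"},
    {"Expected.Attendance": "No", "Observed.Attendance": "No"},
]

rates = attendance_rate_by_group(records, "Expected.Attendance", "Observed.Attendance")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} attended")
```

The same helper can be pointed at any other categorical field, such as Position to be Closed, to produce the corresponding bivariate view.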
Exploratory Data Analysis - Bivariate
Field Name:
Observed Attendance & Position to be Closed
Observation:
80% of the candidates for Niche-skill positions turned up for
the interview.
Feature Importance
Filtering Method of Feature Selection
• The technique of extracting a
subset of relevant features is called
feature selection.
• Feature selection can enhance the
interpretability of the model,
speed up the learning process and
improve the learner performance.
• Filter methods assign an
importance value to each feature.
• Based on these values the features
can be ranked and a feature subset
can be selected.
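The filter approach described in the bullets above can be sketched as follows. The slides do not say which importance score was used, so information gain for categorical features is an assumption chosen for illustration. The function names and the toy rows are hypothetical as well.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, feature_names, target, top_k=None):
    """Score every feature independently of any model, then sort by score."""
    labels = [row[target] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Invented toy rows: one perfectly informative feature, one uninformative one.
rows = [
    {"Expected.Attendance": "Yes", "Gender": "Male", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "Yes", "Gender": "Female", "Observed.Attendance": "Yes"},
    {"Expected.Attendance": "No", "Gender": "Male", "Observed.Attendance": "No"},
    {"Expected.Attendance": "No", "Gender": "Female", "Observed.Attendance": "No"},
]

ranking = rank_features(rows, ["Gender", "Expected.Attendance"], "Observed.Attendance")
selected = [name for name, _ in ranking[:1]]  # keep the top-ranked subset
```

Because each feature is scored on its own, a filter of this kind is cheap to run, but it cannot detect features that are only useful in combination with others.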
Decision Tree
Input: 21 Features, 677 Observations
Model: Decision Tree (minsplit=14, minbucket=20)
Confusion Matrix (rows: Prediction, columns: Reference)
Prediction    No    Yes
No            10      1
Yes           77    200
The recruitment agency is interested in the right prediction of ‘Yes’, i.e.
how many candidates will come for the interview (Sensitivity).
Random Forest
Input: 21 Features, 677 Observations
Model: Random Forest (ntree=50, mtry=9, nodesize=50)
Confusion Matrix (rows: Prediction, columns: Reference)
Prediction    No    Yes
No            21      8
Yes           66    193
Accuracy    Sensitivity    Specificity
0.7431      0.9602         0.2414
The recruitment agency is interested in the right prediction of ‘Yes’, i.e.
how many candidates will come for the interview (Sensitivity).
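The Accuracy, Sensitivity and Specificity figures follow directly from the confusion matrix counts. The sketch below recomputes them for the Random Forest counts, assuming ‘Yes’ is the positive class and that rows are predictions and columns are reference values. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from the four confusion matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # share of actual 'Yes' predicted as 'Yes'
        "specificity": tn / (tn + fp),   # share of actual 'No' predicted as 'No'
    }


# Random Forest counts from the slide, with 'Yes' as the positive class:
#   predicted Yes / actual Yes = 193 (tp), predicted No / actual Yes = 8 (fn)
#   predicted Yes / actual No  = 66 (fp),  predicted No / actual No  = 21 (tn)
rf = binary_metrics(tp=193, fn=8, fp=66, tn=21)
rf_rounded = {name: round(value, 4) for name, value in rf.items()}
print(rf_rounded)
```

The low specificity shows that most actual no-shows are still predicted as ‘Yes’, which is the cost of the high sensitivity the agency is aiming for.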
XG Boost
Input: Dummies of 21 Features, 677 Observations
Model: XG Boost
Booster parameters: eta=0.119, lambda=0.563, max_depth=19
Confusion Matrix (rows: Prediction, columns: Reference)
Prediction    No    Yes
No            17     15
Yes           70    186
Accuracy    Sensitivity    Specificity
0.7049      0.9254         0.1954
The recruitment agency is interested in the right prediction of ‘Yes’, i.e.
how many candidates will come for the interview (Sensitivity).
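The input to this model is described as dummies of the 21 features, which suggests the categorical fields were expanded into 0/1 indicator columns before boosting. The sketch below shows one plain-Python way to build such dummies. The helper name, the column naming scheme and the toy rows are all assumptions for illustration, not the preprocessing actually used.

```python
def one_hot_encode(rows, categorical_fields):
    """Expand each categorical field into 0/1 columns, one per observed level."""
    levels = {
        field: sorted({row[field] for row in rows}) for field in categorical_fields
    }
    encoded = []
    for row in rows:
        out = {}
        for field in categorical_fields:
            for level in levels[field]:
                # Exactly one indicator per field is 1 for any given row.
                out[f"{field}={level}"] = 1 if row[field] == level else 0
        encoded.append(out)
    return encoded


# Invented toy rows for illustration only.
rows = [
    {"Gender": "Male", "Location": "Chennai"},
    {"Gender": "Female", "Location": "Bangalore"},
    {"Gender": "Male", "Location": "Bangalore"},
]

encoded = one_hot_encode(rows, ["Gender", "Location"])
print(encoded[0])
```

Each categorical field contributes as many columns as it has levels, so the encoded matrix is wider than the original 21 features.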
Next Steps based on Insights
• ‘Expected Attendance’, which the recruitment team derives after talking to candidates,
is the most important feature and provides the most information for the prediction.
• Considering ‘Accuracy’ and ‘Sensitivity’ as the two evaluation metrics, the Random
Forest model gives the best overall results, with the highest accuracy (74.31%) and
high sensitivity (96.02%), and hence is the chosen model.
Metric         Decision Tree    Random Forest    XG Boost
Accuracy       72.92%           74.31%           70.49%
Sensitivity    99.50%           96.02%           92.54%
• Some critical factors that determine a candidate's willingness to take up a new job, such as
age, expected compensation or expected increment, number of dependents and notice period,
have not been given in the data. These additional features, if included, are likely to improve
the accuracy of the prediction model.