Analytics Use Case

girishchadha.gc@gmail.com
JV65UCK2AH
Interview Candidate Attendance

Proprietary content.
©Great Learning.
All Rights Reserved. Unauthorized use or distribution prohibited.
This file is meant for personal use by girishchadha.gc@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Problem Statement
A recruitment agency holds the key responsibility of enrolling the right people, with the required skill levels and competitive compensation, into the client organization.
Candidate profiles are screened for the job opportunity, and shortlisted candidates are called for interview. It is observed that not all who are called turn up for the interview.

This leads to rework of the entire process, including rework on logistic arrangements, which increases cost and delays recruiting the required professionals into the organization.
The recruitment agency may also lose the business on hand.
About Data
The data pertains to the recruitment industry in India for the years 2014-2016 and deals with candidate interview attendance for various clients.

A recruiter asks a set of questions while scheduling the candidate. The answers to these determine whether the expected attendance is yes, no or uncertain.
Data Dictionary

965 observations with 23 features available in source data


Source: https://www.kaggle.com/vishnusraghavan/the-interview-attendance-problem

Exploratory Data Analysis - Univariate
Field Name:
Observed.Attendance (response field)

Observation:
70% of the candidates turned up for the interview. The classes are not balanced, but the data is sufficient to proceed with the analysis and to build a prediction model.

Exploratory Data Analysis - Univariate
Field Name:
Gender (categorical)

Observation:
22% of the candidates are Female

Exploratory Data Analysis - Univariate
Field Name:
Location (categorical)

Observation:
Most candidates are from the Chennai and Bangalore locations.

Exploratory Data Analysis - Univariate
Field Name:
Skillset (categorical)

Observation:
Most job opportunities are for the Java/J2EE/Struts/Hibernate skillset. There also seems to be good demand for freshers.

Exploratory Data Analysis - Bivariate
Field Name:
Observed Attendance & Expected Attendance

Observation:
45-50% of the ‘Uncertain’ candidates turned up for the interview.
25% of the candidates who were expected to attend did not come for the interview.
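A bivariate observation of this kind can be read off a row-percentage cross-tabulation of expected versus observed attendance. The sketch below is only an illustration: the `row_percentages` helper and the `toy_pairs` records are hypothetical, and the toy counts are not taken from the interview dataset.

```python
from collections import Counter

def row_percentages(pairs):
    """Map each expected-attendance level to the % of each observed outcome."""
    counts = Counter(pairs)
    row_totals = Counter(expected for expected, _ in pairs)
    table = {}
    for (expected, observed), n in counts.items():
        table.setdefault(expected, {})[observed] = 100.0 * n / row_totals[expected]
    return table

# Hypothetical (expected, observed) pairs, NOT the real dataset.
toy_pairs = (
    [("Yes", "Yes")] * 3 + [("Yes", "No")] * 1
    + [("Uncertain", "Yes")] * 1 + [("Uncertain", "No")] * 1
    + [("No", "No")] * 2
)

table = row_percentages(toy_pairs)
for expected, shares in table.items():
    print(expected, {outcome: f"{pct:.0f}%" for outcome, pct in shares.items()})
```

Each row sums to 100%, so a value such as the share of ‘Uncertain’ candidates who attended can be compared directly across expectation levels.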

Exploratory Data Analysis - Bivariate
Field Name:
Observed Attendance & Position to be Closed

Observation:
80% of the candidates for Niche skills positions turned up for the interview.

Feature Importance
Filtering Method of Feature Selection
• The technique of extracting a subset of relevant features is called feature selection.
• Feature selection can enhance the interpretability of the model, speed up the learning process and improve the learner performance.
• Filter methods assign an importance value to each feature.
• Based on these values, the features can be ranked and a feature subset can be selected.
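The ranking step described above can be sketched as follows. The slides do not say which importance measure was used, so information gain is an assumption made here for illustration; the helper names and the `toy_rows` records are hypothetical and are not taken from the interview dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, target):
    """Score every non-target column and return (name, score) pairs, best first."""
    labels = [row[target] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0] if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy records, NOT taken from the interview dataset.
toy_rows = [
    {"expected": "Yes", "gender": "Male", "attended": "Yes"},
    {"expected": "Yes", "gender": "Female", "attended": "Yes"},
    {"expected": "Yes", "gender": "Male", "attended": "Yes"},
    {"expected": "No", "gender": "Female", "attended": "No"},
    {"expected": "No", "gender": "Male", "attended": "No"},
    {"expected": "Uncertain", "gender": "Male", "attended": "Yes"},
    {"expected": "Uncertain", "gender": "Female", "attended": "No"},
    {"expected": "Yes", "gender": "Female", "attended": "Yes"},
]

ranking = rank_features(toy_rows, target="attended")
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Keep only the top-ranked feature as the selected subset.
top_subset = [name for name, _ in ranking[:1]]
```

Because the score is computed per feature and independently of any model, the same ranking could feed any of the learners that follow.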

Decision Tree

Model inputs: 21 features, 677 observations
Parameters: minsplit=14, minbucket=20, cp=0.0673

The recruitment agency is interested in the right prediction of ‘Yes’, i.e. how many candidates will come for the interview, so Sensitivity is the metric of interest.

Confusion Matrix
                Reference
Prediction      No      Yes
No              10      1
Yes             77      200

Accuracy    Sensitivity    Specificity
0.7292      0.9950         0.1149
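The three reported metrics follow directly from the four cells of the confusion matrix. As a minimal sketch, assuming ‘Yes’ is the positive class and that rows are predictions and columns are reference values (the `confusion_metrics` helper is a hypothetical name, not part of the original analysis):

```python
def confusion_metrics(tn, fn, fp, tp):
    """Return accuracy, sensitivity and specificity, with 'Yes' as the positive class."""
    total = tn + fn + fp + tp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of actual 'Yes' predicted as 'Yes'
    specificity = tn / (tn + fp)   # share of actual 'No' predicted as 'No'
    return accuracy, sensitivity, specificity

# Decision Tree confusion matrix from the slide (rows = prediction, columns = reference):
#   Prediction No : 10 (reference No),   1 (reference Yes)
#   Prediction Yes: 77 (reference No), 200 (reference Yes)
acc, sens, spec = confusion_metrics(tn=10, fn=1, fp=77, tp=200)
print(f"Accuracy={acc:.4f} Sensitivity={sens:.4f} Specificity={spec:.4f}")
# prints "Accuracy=0.7292 Sensitivity=0.9950 Specificity=0.1149"
```

The low specificity shows that the tree predicts ‘Yes’ for most of the candidates who did not attend, which is why accuracy stays close to the share of attendees.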

Random Forest

Model inputs: 21 features, 677 observations
Parameters: ntree=50, mtry=9, nodesize=50

The recruitment agency is interested in the right prediction of ‘Yes’, i.e. how many candidates will come for the interview, so Sensitivity is the metric of interest.

Confusion Matrix
                Reference
Prediction      No      Yes
No              21      8
Yes             66      193

Accuracy    Sensitivity    Specificity
0.7431      0.9602         0.2414

XG Boost

Model inputs: dummies of 21 features, 677 observations
Booster parameters: eta=0.119, lambda=0.563, max_depth=19

The recruitment agency is interested in the right prediction of ‘Yes’, i.e. how many candidates will come for the interview, so Sensitivity is the metric of interest.

Confusion Matrix
                Reference
Prediction      No      Yes
No              17      15
Yes             70      186

Accuracy    Sensitivity    Specificity
0.7049      0.9254         0.1954
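The XG Boost model is fed dummies of the features rather than the raw categorical columns, presumably because the booster was trained on a numeric matrix. A minimal sketch of dummy (one-hot) encoding is shown below; the `dummy_encode` helper, the column names and the `toy_rows` records are hypothetical and only illustrate the idea.

```python
def dummy_encode(rows, columns):
    """One-hot encode the given categorical columns into 0/1 indicator columns."""
    # Collect the distinct levels of each column in a stable (sorted) order.
    levels = {col: sorted({row[col] for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        out = {}
        for col in columns:
            for level in levels[col]:
                out[f"{col}_{level}"] = 1 if row[col] == level else 0
        encoded.append(out)
    return encoded

# Hypothetical toy records, NOT taken from the interview dataset.
toy_rows = [
    {"gender": "Male", "location": "Chennai"},
    {"gender": "Female", "location": "Bangalore"},
    {"gender": "Male", "location": "Bangalore"},
]

encoded = dummy_encode(toy_rows, columns=["gender", "location"])
for row in encoded:
    print(row)
```

Each categorical column expands into one indicator column per level, so the encoded matrix is wider than the original 21 features.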

Next Steps based on Insights
• ‘Expected Attendance’, which the recruitment team derives after talking to candidates, is the most important feature and provides the maximum information for the prediction.
• Considering ‘Accuracy’ and ‘Sensitivity’ as the 2 metrics for evaluation, the Random Forest model gives the highest accuracy while keeping sensitivity above 96%, and hence is the chosen model.

Metric         Decision Tree    Random Forest    XG Boost
Accuracy       72.92%           74.31%           70.49%
Sensitivity    99.50%           96.02%           92.54%

• Some critical factors that determine a candidate's willingness to take up a new job, such as age, expected compensation or expected increment, number of dependents and notice period, have not been given in the data. These additional features, if included, would likely improve the accuracy of the prediction model.
