
HR Analytics
Vishal Verma

Do We Need HR Analytics?
HR doesn’t get the respect it deserves because it is seen as “soft” and we don’t have the measures that accounting, marketing and other areas have.

The problem is not the measurement.

The paradox is that even when HR measurement systems are implemented, organizations typically hit a wall (Boudreau & Ramstad, 2006).
What is Analytics
Analytics is an integration of statistics, technology and business context that assists data-driven decision making.

It uses advanced statistical and machine learning algorithms to derive insights from raw data.

(Diagram: Business Context (problem / business scenarios), Technology (computational intensity) and Statistics combine to turn raw data into insights for decision making.)
The Basic Framework
John W. Boudreau & Peter M. Ramstad
Center for Effective Organizations, Marshall School of Business, University of Southern California

The Basic Framework
HCM:21 (Human Capital Management for the Twenty-First Century) – Jac Fitz-Enz, 2010
The HCM:21 model consists of four phases (Fitz-Enz, 2010):
1. Scanning – the assessment of all the internal factors that might have an influence on human, structural and relational capital;
2. Planning – the creation of a system that provides an alternative to the structured system by relying on sustainable human capability rather than on just filling positions;
3. Producing – HR is viewed as a set of processes with inputs and outputs, and statistical analysis is used to reveal the most suitable combination of inputs that drive the desired outputs;
4. Predicting – the system analyses strategic, operational and leading indicators.
Before We Start: Components of Analytical Capability

Increasing the strategic impact of analytics requires the development of increasing analytical capabilities in your organization. These must comprise strategy, people, process, technology and data, and are best deployed within an over-arching analytical strategy and capability development plan.

Strategy: the degree to which analytics is integrated into strategy development, decision-making and execution.

People: the extent to which there is a critical mass of personnel recruited, trained and incentivized to apply analytic techniques.

Process: the level to which analytics and analytic approaches are embedded in core business processes.

Technology: the sophistication and proliferation of analytic tools and technology.

Data: the richness, availability, quantity and governance of data across business functions.
What is HR Analytics
HR Analytics is the systematic identification and quantification of People drivers of Business
outcomes.
Heuvel & Bondarouk, 2016

People analytics is the application of math, statistics and modeling to worker-related data to
see and predict patterns. In particular, people analytics, also known as HR analytics and
talent analytics, is analysis used to make better decisions about all aspects of HR strategy
with the goal of improving business performance.
https://searchhrsoftware.techtarget.com/definition/human-resources-analytics-talent-analytics

In short, HR analytics demonstrates the causal relationship between the activities undertaken by an HR department and the business outcomes that result from this activity.
https://www.humanresourcesmba.net/faq/what-is-human-resources-analytics/
Before We Dig Deep into HR Analytics

Analytics career options:
• Business Analyst
• Data Analyst
• Data Engineer
• Data Scientist
Potential Starting Points (HR Business Processes, Insights, Results)

Workforce Planning
Insights: 1. Resource Needs (Current & Forecasted); 2. Workforce-Related Actions/Programs
Result: Avoid workforce shortfalls

Acquisition & Movement
Insights: 1. Effectiveness of Recruiting Efforts; 2. Workforce Migration
Result: Reduce cost and improve quality

Workforce Performance
Insights: 1. Focused Use of Top Performers; 2. Impact to Retention; 3. Effective Management Structure
Result: Increased productivity

Demographic & Diversity
Insights: 1. Effectiveness of Inclusion Programs; 2. Early Identification of Gaps in Diversity
Result: Improved culture and talent management

Learning & Development
Insights: 1. Effectiveness of Development Programs; 2. Alignment of Progression to Development
Result: A truly competitive workforce

Retention
Insights: 1. Turnover Issues; 2. Loss of Development Investments
Result: Reduce “talent churn”
Stages of Workforce Analytics

1. Descriptive Analytics – What happened?
2. Diagnostic Analytics – Why did it happen?
3. Predictive Analytics – What will happen if...?
4. Prescriptive Analytics – What should we do?

The Four Stages of Workforce Analytics: What Level Are You? – by Charles Coy
The Process in Brief

Data sources:
• Demographics
• Compensation
• Engagement
• Performance
• ATS/ETS/ESS

Data is exported from these existing systems and combined and cleaned into one dataset, which is then used to analyze the data. By analyzing patterns and causal relations one can build predictive models.
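To make the "combine & clean" step concrete, here is a minimal Python/pandas sketch; the employee IDs, column names and values are hypothetical, not taken from any real system.

```python
# A minimal sketch of the "combine & clean" step; all IDs, columns and values are hypothetical.
import pandas as pd

demographics = pd.DataFrame({"emp_id": [1, 2, 3], "age": [29, 41, 35]})
compensation = pd.DataFrame({"emp_id": [1, 2, 3], "salary": [52000, 78000, 61000]})
engagement = pd.DataFrame({"emp_id": [1, 2, 3], "engagement_score": [3.8, 4.5, 2.9]})

# Merge the separate exports on the employee ID into one dataset...
dataset = (demographics
           .merge(compensation, on="emp_id", how="left")
           .merge(engagement, on="emp_id", how="left"))

# ...and apply basic cleaning before any analysis or modeling.
dataset = dataset.drop_duplicates().dropna()
print(dataset)
```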
Descriptive & Diagnostic Analytics
Typically speaking, descriptive and diagnostic analytics are based on employee performance metrics and HR metrics.

Employee Performance Metrics
• Work quality metrics
• Work quantity metrics
• Work efficiency metrics
• Organizational performance metrics

HR Metrics
• HR metrics in recruitment / operations
• HR metrics related to revenue
• HR metrics related to organizational performance
• HR metrics related to process optimization
Predictive Analytics
While other departments in an organization deal with profits, sales growth, and
strategic planning, Human Resources (HR) is responsible for employee well-being,
engagement, and staff motivation. Even though it may not be immediately
obvious, the management of these duties often requires a great deal of
measurement and technical skill. Predictive HR Analytics provides a clear and
accessible framework for understanding and learning to work with HR analytics at
an advanced level, using examples of particular predictive models, such as
diversity analysis, predicting turnover, evaluating interventions, and predicting
performance.
Understanding Data

Data types:
• Categorical: Nominal, Ordinal
• Numerical: Discrete, Continuous
Categorical Data
Categorical data represents characteristics. The objects being studied are grouped into categories based on some qualitative trait. The resulting data are merely labels or categories.

• It can therefore represent things like a person’s gender, language, etc.

• Categorical data can also take on numerical values (example: 1 for female and 0 for male).

• Note that those numbers don’t have mathematical meaning.
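A small sketch of the numeric coding described above, using pandas on hypothetical data; the 1/0 codes remain labels, not quantities.

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({"gender": ["female", "male", "female", "male"]})

# Encode the labels as 1/0 for analysis; the numbers are still just labels,
# so arithmetic on them has no inherent meaning.
df["gender_code"] = df["gender"].map({"female": 1, "male": 0})
print(df)
```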
Categorical Data

Nominal Data: a type of categorical data in which objects fall into unordered categories.

Ordinal Data: a type of categorical data in which order is important.

Both nominal and ordinal data can be binary (two categories) or non-binary.
Examples of Nominal & Ordinal Data
Examples: Nominal Data
• Hair color – blonde, brown, red, black, etc.
• Race – Caucasian, African
• Smoking status – smoker, non-smoker

Examples: Ordinal Data
• Class – freshman, sophomore, junior, senior, super senior
• Degree of illness – none, mild, moderate, severe, …, going, gone
• Opinion of students about riots – ticked off, neutral, happy
Numerical Data

Discrete Data: only certain values are possible (there are gaps between the possible values).

Continuous Data: theoretically, any value within an interval is possible with a fine enough measuring device.
Examples of Discrete & Continuous
Data
Discrete Data
SAT scores ; Number of students late for class; Number of crimes
reported to police; Number of times the word number is used
Generally, discrete data are counts.

Continuous Data
Cholesterol level; Height; Age; Time to complete a homework
assignment
Generally, continuous data come from measurements.
Types of Data
• Quantitative Data
• Qualitative Data
• Mixed Methods Data
Quantitative Data
 Requires use of statistical analysis
 Variables can be identified and relationships measured
 Counted or expressed numerically
 Often perceived as a more objective method of data analysis
 Typically collected with surveys or questionnaires
 Often represented visually using graphs or charts

Example: An evaluator may wish to measure the knowledge of social skills amongst program participants. He/she may administer surveys to participants to test their knowledge of these social skills.
Qualitative Data
 Examines non-numerical data for patterns and meanings
 Often described as being more “rich” than quantitative data
 Because it is gathered and analyzed by an individual, it can be more subjective
 Can be collected through methods such as observation
techniques, focus groups, interviews, and case studies

Example: Evaluators may wish to look at the level of engagement of afterschool staff in program trainings. He/she might conduct interviews of these staff members to capture the level of engagement that each staff member feels they have during the trainings.
Mixed Methods Data
 May increase the validity of your evaluation
 May explain unexpected results obtained using only one approach
(quantitative or qualitative)
 Help you capture both process and outcome results
 May strengthen your analysis

Example: You may administer a survey to participants which solicits answers that are eligible for statistical analysis, as well as conduct a focus group with a sampling of participants to capture any nuances the survey may have missed.
DATA SOURCES

Primary Data Sources
• Data that are not pre-existing and are collected by the evaluator using methods such as observations, surveys or interviews
• Provide information if existing data on your topic/project is not current or directly applicable to your evaluation questions
• Can be more expensive and time-consuming, but enable you to collect data that is specific to the purpose of your evaluation

Secondary Data Sources
• Information that has already been collected, processed and reported out by another researcher/entity
• Offer an opportunity to review any and all secondary data available for your project before collecting primary data
• Will tell you what questions still need to be addressed and what data you should collect yourself
Data Collection Techniques
 Interviews
 Questionnaires and Surveys
 Observations
 Focus Groups
 Ethnographies, Oral History, and Case Studies
 Documents and Records
Interviews
 Interviews can be conducted in person or over the telephone
 Interviews can be done formally (structured), semi-structured, or
informally
 Questions should be focused, clear, and encourage open-ended
responses
 Interviews are mainly qualitative in nature

Example: One-on-one conversation with a parent of at-risk youth who can help you understand the issue.
Questionnaires and Surveys
 Responses can be analyzed with quantitative methods by
assigning numerical values to Likert-type scales
 Results are generally easier (than qualitative techniques) to
analyze
 Pretest/Posttest can be compared and analyzed

Example: Results of a satisfaction survey or opinion survey.
Observations
 Allows for the study of the dynamics of a situation, frequency
counts of target behaviors, or other behaviors as indicated by
needs of the evaluation
 Good source for providing additional information about a
particular group, can use video to provide documentation
 Can produce qualitative (e.g., narrative data) and quantitative
data (e.g., frequency counts, mean length of interactions, and
instructional time)

Example: Site visits to an after-school program to document the interaction between youth and staff within the program.
Focus Groups
 A facilitated group interview with individuals that have something
in common
 Gathers information about combined perspectives and opinions
 Responses are often coded into categories and analyzed
thematically

Example: A group of parents of teenagers in an after-school program are invited to informally discuss programs that might benefit and help their children succeed.
Ethnographies, Oral History, and Case Studies
 Involves studying a single phenomenon
 Examines people in their natural settings
 Uses a combination of techniques such as observation, interviews,
and surveys
 Ethnography is a more holistic approach to evaluation
 Researcher can become a confounding variable

Example: Shadowing a family while recording extensive field notes to study the experience and issues associated with youth who have a parent or guardian that has been deployed.
Documents and Records
 Consists of examining existing data in the form of databases,
meeting minutes, reports, attendance logs, financial records,
newsletters, etc.
 This can be an inexpensive way to gather information but may be
an incomplete data source

Example: To understand the primary reasons students miss school, records on student absences are collected and analyzed.
Some Basic Statistical Tools
 Normal Distribution
 Measures of Central Tendency
• Mean
• Median
• Mode
 Measures of Variability
• Range, Interquartile range (IQR)
 Variance and Standard Deviation
 Modality
Normal Distribution
The normal distribution is one of the most important concepts in statistics since nearly all statistical tests require normally distributed data. It basically describes what large samples of data look like when they are plotted. It is sometimes called the “bell curve“ or the “Gaussian curve“.
Inferential statistics and the calculation of probabilities require that a normal distribution is given. This basically means that if your data is not normally distributed, you need to be very careful what statistical tests you apply to it, since they could lead to wrong conclusions.
A normal distribution is given if your data is symmetrical and bell-shaped around the mean.
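As an illustration, a short Python sketch (simulated data; the Shapiro-Wilk test in scipy is one common choice among several) of checking whether a sample looks normally distributed before applying parametric tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=500)  # simulated "bell curve" data

# Shapiro-Wilk test: one common normality check before using parametric tests
stat, p_value = stats.shapiro(sample)
print(f"W = {stat:.3f}, p = {p_value:.3f}")  # a large p-value gives no evidence against normality
```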
Measures of Central Tendency
 Mean
 Median
 Mode
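A minimal illustration of the three measures with Python's built-in statistics module; the ratings below are hypothetical.

```python
import statistics

ratings = [3, 4, 4, 5, 7, 8, 8, 8, 10]  # hypothetical performance ratings

print(statistics.mean(ratings))    # arithmetic average
print(statistics.median(ratings))  # middle value when sorted
print(statistics.mode(ratings))    # most frequent value
```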
Measures of Variability
The most popular variability measures are the range, interquartile range (IQR),
variance, and standard deviation. These are used to measure the amount of spread
or variability within your data.
The range describes the difference between the largest and the smallest points in
your data.
The interquartile range (IQR) is a measure of statistical dispersion between upper
(75th) and lower (25th) quartiles.

While the range measures where the beginning and end of your data points are, the interquartile range is a measure of where the majority of the values lie.
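A small numpy sketch with hypothetical salary figures (in thousands): the single outlier stretches the range, while the IQR still describes where the middle 50% of values lie.

```python
import numpy as np

salaries = np.array([42, 45, 47, 50, 52, 55, 60, 75, 120])  # hypothetical, in thousands

data_range = salaries.max() - salaries.min()   # largest minus smallest value
q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1                                  # spread of the middle 50% of values

print(data_range, iqr)
```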
Variance and Standard Deviation
The standard deviation (σ) and the variance, like the range and IQR, measure how spread apart our data is (i.e. the dispersion). Both are derived from the mean.

The variance is computed by finding the difference between every data point and the mean, squaring those differences, summing them up and then taking the average of those squared differences.

The standard deviation (σ) measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance. If the data points are further from the mean, there is higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.
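Continuing the same hypothetical salary data, a numpy sketch of variance and standard deviation; numpy's defaults compute the population versions described above.

```python
import numpy as np

salaries = np.array([42, 45, 47, 50, 52, 55, 60, 75, 120])  # hypothetical, in thousands

variance = salaries.var()   # average squared distance from the mean (population form)
std_dev = salaries.std()    # square root of the variance, in the original units

print(round(variance, 1), round(std_dev, 1))  # use var(ddof=1)/std(ddof=1) for the sample versions
```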
Standard Deviation
Modality
The modality of a distribution is determined by the number of peaks
it contains. Most distributions have only one peak but it is possible
that you encounter distributions with two or more peaks.
Some Important Statistical Tools for Analytics
 Elementary Probability
 Contingency Table, Scatterplot, Pearson’s r
 Basics of Regression
 Random Variables and Probability Distributions
 Normal Distribution, Binomial Distribution & Poisson Distribution
Some Important Statistical Tools for Analytics
Contingency Table
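As a sketch of what a contingency table shows, here is a hypothetical cross-tabulation of department against whether an employee left, built with pandas.crosstab; all values are invented.

```python
import pandas as pd

# Hypothetical HR data: department vs. whether the employee left
df = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "HR", "Sales", "IT", "HR"],
    "left_company": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
})

# Cross-tabulate the two categorical variables, with row/column totals
table = pd.crosstab(df["department"], df["left_company"], margins=True)
print(table)
```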
Some Important Statistical Tools for Analytics
Scatterplot
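A matplotlib sketch of a scatterplot on simulated data (hypothetical engagement scores against performance ratings).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
engagement = rng.uniform(1, 5, 100)                           # hypothetical engagement scores
performance = 2 + 0.8 * engagement + rng.normal(0, 0.5, 100)  # loosely related performance ratings

plt.scatter(engagement, performance)
plt.xlabel("Engagement score")
plt.ylabel("Performance rating")
plt.title("Scatterplot of two employee variables")
plt.show()
```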
Some Important Statistical Tools for Analytics
Pearson’s r (Correlation)
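A short scipy sketch computing Pearson's r on the same kind of simulated engagement/performance data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
engagement = rng.uniform(1, 5, 100)                           # hypothetical engagement scores
performance = 2 + 0.8 * engagement + rng.normal(0, 0.5, 100)  # loosely related performance ratings

r, p_value = stats.pearsonr(engagement, performance)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # r near +1 or -1 indicates a strong linear relationship
```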
Some Important Statistical Tools for Analytics
Basics of Regression
Understanding Solver Tool
Excel has the capability to solve linear (and often nonlinear) programming
problems with the SOLVER tool, which:
– May be used to solve linear and nonlinear optimization problems
– Allows integer or binary restrictions to be placed on decision variables
– Can be used to solve problems with up to 200 decision variables

Use Solver to find an optimal (maximum or minimum) value for a formula in one cell (called the objective cell), subject to constraints, or limits, on the values of other formula cells on a worksheet.
What is Solver
Excel Solver belongs to a special set of commands often referred to
as What-if Analysis Tools. It is primarily purposed for simulation and
optimization of various business and engineering models.

While Solver can't crack every possible problem, it is really helpful when dealing with all kinds of optimization problems where you need to make the best decision. For example, it can help you maximize the return on investment, choose the optimal budget for your advertising campaign, make the best work schedule for your employees, minimize the delivery costs, and so on.
Solver Dialog Box

Solver Engine
Selecting the Solving Method
GRG Nonlinear
 GRG stands for “Generalized Reduced Gradient”. In its most basic form, this
solver method looks at the gradient or slope of the objective function as the
input values (or decision variables) change and determines that it has reached
an optimum solution when the partial derivatives equal zero.
 Of the two nonlinear solving methods, GRG Nonlinear is the fastest. That speed
comes with a compromise though.
 The downside is that the solution you obtain with this algorithm is highly
dependent on the initial conditions and may not be the global optimum solution.
The solver will most likely stop at the local optimum value nearest to the initial
conditions, giving you a solution that may or may not be optimized globally.
 Another requirement for the GRG nonlinear solver to obtain a good solution is
for the function to be smooth. Any discontinuities caused by IF, VLOOKUP, or
ABS functions, for example, will cause problems for this algorithm.
GRG Nonlinear
Evolutionary
 The Evolutionary algorithm is more robust than GRG Nonlinear
because it is more likely to find a globally optimum solution.
However, this solver method is also VERY slow.
 The Evolutionary method is based on the Theory of Natural
Selection – which works well in this case because the optimum
outcome has been defined beforehand.
How Evolutionary Engine works
 In simple terms, the solver starts with a random “population” of
sets of input values. These sets of input values are plugged into
the model and the results are evaluated relative to the target
value.
 The sets of input values that result in a solution that’s closest to
the target value are selected to create a second population of
“offspring”. The offspring are a “mutation” of that best set of
input values from the first population.
 The second population is then evaluated and a winner is chosen
to create the third population.
 This goes on until there is very little change in the objective
function from one population to the next.
How Evolutionary Engine works
 What makes this process so time-consuming is that each member
of the population must be evaluated individually. Also,
subsequent “generations” are populated randomly instead of
using derivatives and the slope of the objective function to find
the next best set of values.
 Now Excel gives you some control over the algorithm through the
Solver options window. For instance, you can choose the Mutation
Rate and Population Size to potentially shorten the solution.
 However, this has diminishing returns because reducing the
population size and/or increasing the mutation rate may require
even more populations to achieve convergence.
How Evolutionary Engine works
GRG Multistart
 A nice compromise between the speed of the
GRG Nonlinear algorithm and the robustness of
the Evolutionary algorithm is GRG Nonlinear
Multistart. You can enable this option through
the Solver Options window, under the GRG
Nonlinear tab.
 The algorithm creates a randomly distributed
population of initial values that are each
evaluated using the traditional GRG Nonlinear
algorithm.
 By starting multiple times from different initial
conditions, there is a much greater chance that
the solution found is the global optimum.
Simplex LP
It’s limited in its application because it can be applied to problems
containing linear functions only.

However, it is very robust: if the problem you are solving is linear, you can be assured that the solution obtained by the Simplex LP method is always a globally optimum solution.
To summarize:
If your objective and constraints are linear functions of the decision variables, you can be
confident of finding a globally optimal solution reasonably quickly, given the size of your
model. This is a linear programming problem; it is also a convex optimization problem (since
all linear functions are convex). The Simplex LP Solving method is designed for these
problems.
If your objective and constraints are smooth nonlinear functions of the decision variables,
solution times will be longer. If the problem is convex, you can be confident of finding a
globally optimal solution, but if it is non-convex, you can only expect a locally optimal solution
– and even this may be hard to find. The GRG Nonlinear Solving method is designed for these
problems.
If your objective and constraints are non-smooth and non-convex functions of the decision
variables (for example if you use IF, CHOOSE and LOOKUP functions whose arguments depend
on decision variables), the best you can hope for is a “good” solution (better than the initial
values of the variables), not a locally or globally optimal solution. The Evolutionary Solving
method is designed for these problems.
You can use integer, binary, and all different constraints on variables with all three Solving
methods. However, these constraints make the problem non-convex and much harder to
solve.
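Solver itself lives inside Excel, but the linear-programming case that Simplex LP handles can be sketched outside it. The snippet below is an illustrative analogue (not part of Solver), using scipy's linprog to maximize a small linear objective under linear constraints; the numbers are made up.

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
# linprog minimizes by convention, so the objective coefficients are negated.
result = linprog(c=[-3, -2],
                 A_ub=[[1, 1], [1, 3]],
                 b_ub=[4, 6],
                 bounds=[(0, None), (0, None)],
                 method="highs")

print(result.x, -result.fun)  # optimal decision variables and objective value
```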
Set Objective Box
If you want the value of the objective cell to be as large as possible, click Max.

If you want the value of the objective cell to be as small as possible, click Min.

If you want the objective cell to be a certain value, click Value Of, and then type the value in the box.

In the By Changing Variable Cells box, enter a name or reference for each decision variable cell range. Separate the non-adjacent references with commas.
Something to keep in mind
Solver’s basic purpose is to find a solution – that is, values for the decision variables in your
model – that satisfies all of the constraints and maximizes or minimizes the objective cell
value (if there is one). The kind of solution you can expect, and how much computing time
may be needed to find a solution, depends primarily on three characteristics of your model:
a) Your model size (number of decision variables and constraints, total number of formulas)
b) The mathematical relationships (e.g. linear or nonlinear) between the objective and
constraints and the decision variables
c) The use of integer constraints on variables in your model
Other issues, such as poor scaling, can also affect solution time and quality, but the above
characteristics affect the intrinsic solvability of your model. Although faster algorithms and
faster processors can help, some non-convex or non-smooth models could take years or
decades to solve to optimality on the fastest imaginable computers.
Your model’s total size and the use of integer constraints are both relatively easy to assess
when you examine your model. The mathematical relationships, which are determined by the
formulas in your model, may be harder to assess, but they often have a decisive impact on
solution time and quality.
Types of Regression
Linear Regression
A linear regression refers to a regression model that is completely
made up of linear variables. Beginning with the simple case, Single
Variable Linear Regression is a technique used to model the
relationship between a single input independent variable (feature variable) and an output dependent variable using a linear model, i.e. a straight line.
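A minimal scikit-learn sketch of single-variable linear regression on hypothetical data (years of experience vs. salary in thousands).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience vs. salary (in thousands)
X = np.array([[1], [2], [3], [5], [7], [10]])
y = np.array([40, 45, 50, 62, 71, 90])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])  # fitted line: salary = b0 + b1 * experience
print(model.predict([[6]]))              # predicted salary at 6 years of experience
```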
Types of Regression
Logistic Regression
Logistic regression is used to find the probability of event=Success
and event=Failure. We should use logistic regression when the
dependent variable is binary (0/ 1, True/ False, Yes/ No) in nature.
Here the value of Y ranges from 0 to 1 and it can be represented by the following equations:

odds = p / (1 - p) = probability of event occurrence / probability of event non-occurrence
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
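A scikit-learn sketch of logistic regression on hypothetical data: a single engagement score used to estimate the probability that an employee leaves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: engagement score vs. whether the employee left (1) or stayed (0)
X = np.array([[1.2], [1.8], [2.5], [3.0], [3.6], [4.1], [4.5], [4.9]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[2.0]])[0, 1])  # estimated probability of leaving at score 2.0
```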
Types of Regression
Polynomial Regression
When we want to create a model that is suitable for handling non-
linearly separable data, we will need to use a polynomial
regression. In this regression technique, the best fit line is not a
straight line. It is rather a curve that fits into the data points. For a
polynomial regression, the power of some independent variables is
more than 1. For example, we can have something like:

Y = a_1*X_1 + a_2*(X_2)² + a_3*(X_3)⁴ + ... + a_n*X_n + b
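A short scikit-learn sketch of polynomial regression: PolynomialFeatures adds the higher-power terms and an ordinary linear regression is then fitted on them (the data below are hypothetical and roughly quadratic).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.1, 4.8, 9.3, 16.2, 24.9, 36.1])  # roughly quadratic pattern

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)                   # adds X and X**2 as features

model = LinearRegression().fit(X_poly, y)
print(model.predict(poly.transform([[7]])))      # prediction follows the curve, not a straight line
```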
Types of Regression
Ridge Regression
A standard linear or polynomial regression will fail in the case where there is high collinearity among the feature variables. Collinearity is the existence of near-linear relationships among the independent variables. Ridge regression addresses this by adding a squared (L2) bias term to the regression optimization function, which reduces the effect of collinearity and the model variance.
Types of Regression
Lasso Regression
Lasso Regression is quite similar to Ridge Regression in that both
techniques have the same premise. We are again adding a biasing
term to the regression optimization function in order to reduce the
effect of collinearity and thus the model variance.
Types of Regression
ElasticNet Regression

ElasticNet is a hybrid of the Lasso and Ridge regression techniques. It uses both the L1 and L2 regularization terms, taking on the effects of both techniques:

min ||Xw - y||² + z_1*||w||_1 + z_2*||w||²

A practical advantage of trading off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge's stability under rotation.
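To see how the three penalized regressions behave, a small scikit-learn sketch on simulated data with two nearly collinear features; the alpha and l1_ratio values are arbitrary illustrations, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)

# Arbitrary illustrative penalties; in practice these are tuned, e.g. by cross-validation
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```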
Developing Model for Employee
Retention
Employee turnover is a big problem for some companies – especially when it
comes to the company’s high potentials. The so-called ‘war on talent’ hits
everyone.

When an employee leaves the organization, the organization loses money. There are a number of additional things that may happen as well:
Knowledge and contacts are lost.
Negative impact on colleagues.
Onboarding of new hires.
Hiring is expensive.
General Perception for Attrition
Better rewards.

Actively promote internally.

Better leadership.

Engage your workforce.

But that’s not all…


Turnover Predictor: Demographic
Marital status
Kinship responsibilities
Children
Age
Tenure

How to measure: Demographic information is in general easily accessible through an organization’s HR Information System.
Turnover Predictor: Stress
Role clarity
Role conflict
Role overload

How to measure stress: Job descriptions may give information about role clarity. In addition, when someone temporarily takes over the position of his/her manager, it is likely that there will be a role conflict. This information is difficult to measure, but it is not impossible. Role overload and overall stress are particularly hard to observe, even more so since stress is often subjective. In order to measure them you need to use surveys.
Turnover Predictor: Job content
Job content is all about how people experience their job.

Routinization

Promotional chances

Instrumental communication

How to measure job content: There is no system that offers the possibility of entering job content or level of routinization, so surveys are the way to go. Instrumental communication can be analyzed through certain text mining techniques, which analyze written internal communication.
Turnover Predictor: External
environment
People constantly compare their situation with that of others. This also holds true for people’s jobs.
Alternative job opportunities

How to measure the external environment: Some of the specialized HR consultancy firms have access to large amounts of detailed function data. This data gives a relatively accurate description of the demand for a certain job. People with popular jobs can more easily find an alternative job, have more alternatives and are more likely to be approached by recruiters. By matching the job functions at your company with their respective databases you can estimate which employees will be most tempted to switch.
Turnover Predictor: Work and job
satisfaction
Job satisfaction
Job met expectations
Job involvement
Work satisfaction

How to measure work and job satisfaction: Satisfaction is subjective. In order to accurately measure work and job satisfaction, you can use surveys. Alternatively, these variables are often also measured in engagement surveys.
Turnover Predictor: Compensation
Pay satisfaction

Distributive justice

How to measure compensation: You can benchmark payment data with market data to make a pay comparison. When someone is underpaid, he/she will be more likely to be dissatisfied and leave. Benchmark data may give you an indication about whether you under- or overpay your employees.
Turnover Predictor: Leadership
Supervisory satisfaction

Leader-member exchange (LMX)

How to measure leadership: Unfortunately, the leadership variables are hard to measure. However, when you have large teams you can use the team as a control variable in your analysis. If one team loses employees much more rapidly than other teams, it might indicate that there’s something about the team manager’s leadership style. A closer analysis is always needed to justify this conclusion, of course.
Turnover Predictor: Co-Workers
Work group cohesion

Co-worker satisfaction

How to measure co-workers: Attitudes towards co-workers can only be measured through surveys.
Turnover Predictor:
Indicators
Lateness

Absenteeism

Performance

How to measure these indicators: Absenteeism data is usually already recorded by organizations. This data can be used to predict turnover. In addition, performance data is also easy to obtain and often available through the company’s performance management system.
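Putting several of the predictors above together, here is a hedged end-to-end sketch of a turnover model: a logistic regression on a tiny hypothetical dataset (all column names and values are invented), which is the kind of model the exercise below asks you to build on real data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Tiny hypothetical dataset; each column corresponds to one of the predictor groups above
df = pd.DataFrame({
    "age":                [25, 41, 30, 52, 28, 36, 45, 23, 39, 31],
    "tenure_years":       [1, 10, 3, 20, 2, 6, 12, 1, 8, 4],
    "pay_satisfaction":   [2, 4, 3, 5, 2, 3, 4, 1, 4, 3],      # 1-5 survey scale
    "absences_last_year": [8, 1, 5, 0, 9, 3, 2, 12, 1, 4],
    "left_company":       [1, 0, 1, 0, 1, 0, 0, 1, 0, 0],      # 1 = left, 0 = stayed
})

X = df.drop(columns="left_company")
y = df["left_company"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(model.score(X_test, y_test))                    # share of correct predictions on held-out data
print(dict(zip(X.columns, model.coef_[0].round(2))))  # direction and strength of each predictor
```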
Let’s Check If You Have Got the Hang of Predicting
Data in the HR Analysis (original)
Day 2 Schedule
Understanding Data Visualization
Using Tableau
Using Power BI
