Quality of Hire Machine Learning

Quality of Hire: Expanding the multi level fit employee selection using Machine Learning
Preprint in International Journal of Organizational Analysis · January 2022

DOI: 10.1108/IJOA-06-2021-2843
1 author:
Sateesh Shet
Narsee Monjee Institute of Management Studies
All content following this page was uploaded by Sateesh Shet on 23 February 2022.

The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1934-8835.htm
Quality of hire
Quality of hire: expanding the
multi-level fit employee selection
using machine learning
Sateesh Shet and Binesh Nair
School of Business Management, NMIMS University, Mumbai, India
Received 29 June 2021
Revised 8 January 2022
Accepted 9 January 2022
Abstract
Purpose – Organizational psychologists and human resource management (HRM) practitioners often have
to select the “right fit” candidate by manually scouting data from various sources including job portals and
social media. Given the constant pressure to lower the recruitment costs and the time taken to extend an offer
to the right talent, the HR function has to inevitably adopt data analytics and machine learning for employee
selection. This paper aims to propose the “Quality of Hire” concept for employee selection using the person-
environment (P-E) fit theory and machine learning.
Design/methodology/approach – The authors demonstrate the aforementioned concept using a
clustering algorithm, namely, partitioning around medoids (PAM). Based on a curated data set published by
IBM, the authors examine the dimensions of different P-E fits and determine how these dimensions can lead to
selection of the “right fit” candidate by evaluating the outcome of PAM.
Findings – The authors propose a multi-level fit model rooted in the P-E theory, which can improve the
quality of hire for an organization.
Research limitations/implications – Theoretically, the authors contribute to the domain of quality of
hire using a multi-level fit approach based on the P-E theory. Methodologically, the authors contribute to
expanding the HR analytics landscape by implementing the PAM algorithm in employee selection.
Originality/value – The proposed work is expected to present a useful case on the application of machine
learning for practitioners in organizational psychology, HRM and data science.
Keywords Multi level fit, Quality of hire, Employee selection, Person–environment fit,
Machine learning, Partitioning around medoids (PAM), HR analytics, Clustering, Algorithm,
Unsupervised learning
Paper type Research paper
1. Introduction
Internet, social media, mobile devices and wireless connections facilitate a continuous
flow of information created by users and devices in real time worldwide. This data
can provide several insights in human resource management (HRM). From basic
metrics to organized data, human resources analytics (HRA) has gradually altered the
use of data in HRM (El-Rayes et al., 2020; Minbaeva, 2018; Ore and Sposato, 2021). As
the number of applicants who upload their data on job portals has increased, it has
become difficult for organizations to manage the data despite having e-HRM
processes. HR practitioners find it taxing to analyze various forms of data, including
text, audio and video, from sources such as social media, external job portals, vendor
portals and university campus portals to determine the “right fit” candidate for a job
role. Scrutinizing such a huge volume and variety of data manually is not
an option. We see this as evidence of a gap in addressing the problem of
“Quality of Hire”; hence, we propose a multi-level fit approach for employee selection
to boost the quality of hiring.
International Journal of Organizational Analysis, © Emerald Publishing Limited, 1934-8835, DOI 10.1108/IJOA-06-2021-2843
The concept of quality of hire has captured the interest of scholars and management
practitioners in the form of person-environment (P-E) fit, which refers to “compatibility
between an individual and a work environment that occurs when their characteristics are
similar” (Kristof-Brown et al., 2005; van Vianen, 2018). Nevertheless, the P-E fit can take
place in several forms comprising person-supervisor (P-S) fit, person-group (P-G) fit, person-
job (P-J) fit, person-organization (P-O) fit and person-vocation fit. Such fits are of basic
concern for both organizations as well as employees, as a good fit can reduce job stress,
enhance job performance and increase job satisfaction (Arthur et al., 2006; Kristof-Brown
et al., 2005; Oh et al., 2014).
The attraction-selection-attrition (ASA) model suggests that the natural tendency of self-
selection and attraction toward similarity forces organizations and individuals toward
homogeneity and fit (Schneider, 1987). The P-E fit is a helpful practical parameter, as it
highlights most practices of HRM. Familiarity with the P-E fit can assist managers to form,
evaluate and deliver the HRM practices in an effective manner (Chuang et al., 2015). Though
employee selection is a major practice of an organization, existing research does not focus on
selection decisions that take place in reality (Bolander and Sandberg, 2013; Zysberg and
Nevo, 2004). Consequently, there is limited knowledge on how to make selection decisions,
which can cause difficulties in assessing how the selection tools and related procedures will
strengthen the employee selection process (Anderson et al., 2001).
HR Analytics is a modern concept that facilitates data-driven decision-making. Insights
from large volumes of data can enable predictions on employee selection, performance,
participation in social networks, interest in training, etc. In HRM, the novelty of multiple
source data is an important resource because it offers information (as knowledge) that
makes decision-making a data-driven process rather than a process based on general
wisdom or intuition (Roberts, 2013). The paradigm shift toward proactive decision-making
based on data and machine learning (ML) algorithms is based on continuous flow of current
data compared with the conventional decision-making driven by intuitions or instincts
(Chamorro-Premuzic et al., 2017). Researchers such as Nikolaou (2019) have proposed to
evaluate the influence of technology on recruitment, selection and assessment processes
with emphasis on big data. In view of these issues, scholars are increasingly employing
multi-level approaches rather than linear models used for the analysis of a single
phenomenon (van der Laken et al., 2018). Thus, we envisage the need to establish an ML
process for selecting a quality hire by applying the P-E fit approach on big data.
Although there has been substantial research on P-E fit, there are several unaddressed
areas from the research perspective. First, research in this area has mainly
emphasized the association between a single fit and multiple fits (for example, P-O fit,
P-V fit or P-J fit). Researchers have recently called for more integrative research on different
types of fit as a “holistic” or “nested” view in fit research (Abdalla et al., 2018; Edwards,
2008; van Vianen, 2018). Second, while studying the P-E fit with multiple dimensions such
as P-S, P-G, P-O and P-J, researchers should focus on multiple content dimensions, such as
interests, personality, goals and values, of each individual dimension of the P-E fit (van
Vianen, 2018). Third, the latest developments in the theory of fit have revealed that the most
productive experiences are those wherein different types of fit exist concurrently, indicating
the P-E fit as a construct with multiple dimensions. These shortcomings have collectively
led to failure in identifying recruitment as a process of decision-making (Swider et al., 2015).
Further, researchers have advocated addressing this gap by expanding the dimensions of a
sub fit with the models of new fit development using technology. Finally, there have been
several research opportunities concerning the application of ML algorithms in the context of
human resources to convert the underused HR data into valuable insights (Angrave et al.,
2016). These gaps include the analysis of a wide range of new forms of data such as voice,
images, videos, spatial data and text. Based on the research gaps evidenced above, we seek
answers to the following research questions:
RQ1. What are the dimensions of “Quality of Hire” in the overall P-E fit domain of
employee selection?
RQ2. How can the ML algorithm partitioning around medoids (PAM) be used to address
“Quality of Hire” in employee selection?
With the above research questions, our objective is to propose a model for “Quality of Hire”
in an organization and a technique for selecting the right fit candidate. In addressing the
above research questions, we attempt to make the following contributions in this work.
First, identification of multiple dimensions of each fit applicable to P-S, P-G, P-O and P-J to
bring all dimensions under the P-E fit and concurrent examination of the multiple variables.
Second, proposal of an end-to-end ML framework to address the challenge of identifying the
right candidate as quality of hire with multiple dimensions of the P-E fit. Third, application
of the clustering technique in the HRM domain of recruitment and selection for P-E fit using
the PAM algorithm. This research will be beneficial for practitioners, academicians and
researchers in addressing the perennial issues and challenges of identifying the right talent
for an organization. We contribute to the debates on quality of hire in an organization
besides developing a mechanism to use big data in the form of both structured and
unstructured data from multiple sources and deriving insights using ML algorithms.
2. Theoretical development
2.1 Machine learning in employee selection
ML, a branch of artificial intelligence (AI), is defined as a phenomenon wherein a machine
can learn by itself without being explicitly programmed. As workforce digitization is
generating “ever increasing volumes of data,” algorithms have become essential to the
interpretation of data because they can add value. Of all the business
functions, HR is known as a laggard in data-driven decision making, even though it has
huge potential in this field (Angrave et al., 2016). The availability of big data has led to
increased adoption of ML for the optimization and improvement of processes within the HR
function, with analytics being applied at every stage of HRM – from hiring to exiting the
organization.
ML assists in assessing talent management and identifying opportunities for a more
effective management of human capital. For instance, a few innovative applications have
been developed to assess talent via ML algorithms that translate the digital records of a
person (for example, their footprints on social media) into personality dimensions, observe
future performance and counterproductive behaviors and predict the leadership potential
(Chamorro-Premuzic et al., 2017). ML has also been used to identify fake responses by analyzing item response
patterns in Big Five personality tests (Calanna et al., 2019). The volume,
velocity and variety (three V’s) of big data expand the talent pool of future employees and
significantly minimize the costs related to talent selection (Landers and Schmidt, 2016;
Illingworth et al., 2015; Morrison and Abraham, 2015). The importance of social networking
sites such as LinkedIn, Facebook and Twitter has increased in recruitment and selection, as
these sites provide important information that is generally not captured in resumes.
The digital interview is another creative application that converts non-verbal and
verbal images of interviewees’ behavior into psychological profiles, which help in
making predictions, making the process less expensive and minimizing the partiality
of inexperienced interviewers (Chamorro-Premuzic et al., 2017). For instance, Unilever
uses AI in its hiring process to improve diversity in their campus hiring (Feloni, 2017)
using HireVue, a video interviewing platform. Similarly, Shell applies analytics to its
recruitment process, leading to efficient assessment, improved candidate quality,
improved candidate experience, increased flexibility and reduced cost. The HR
function, however, can capitalize on various data sources instead of focusing on a
specific data source to answer intricate questions. For example, data from Human
Resource Information System (HRIS) can be used for clustering employees into
distinct groups using various clustering algorithms to improve hiring efficiency and
employee engagement and to retain talent.
2.2 Quality of hire using Person-Environment fit theory

During the past few decades, several scholars and researchers have identified the “fit at
work” concept as a significant variable at the workplace (Cooman et al., 2015; Iyer et al.,
2019). We discuss the P-E fit dimensions in addressing the “quality of hire” and attempt to
question how P-E dimensions influence the “quality of hire” (Table 1). In the business world,
the P-E fit has drawn the attention of incumbent workers, job searchers and recruiters (van
Vianen, 2018). Kristof-Brown and Guay have described four types of fit that have emerged
with P-E fit research – P-S fit, P-G fit, P-O fit and P-J fit. Deng and Yao (2020) and Kristof-
Brown et al. (2005) have asserted that such P-E fit dimensions have contributed to the
literature on job search, managerial selection decisions, performance, turnover and work
attitudes.
2.2.1 Person-Group fit. P-G fit is one of the most under-researched areas in P-E fit
(Christina et al., 2018; Seong et al., 2015). It can be described as congruence between a
person and other work group members. Both the group and individual performances
are influenced by the attainment of P-G fit, which ultimately affects organizational
effectiveness. Further, complementary P-G fit takes place when the weakness of a
group is compensated by the strength of an individual’s ability and vice versa, whereas
supplementary P-G fit takes place when a person shares attributes with other persons
of a group (Seong et al., 2015). The complementary fit is suitable for most of the
employment selection decisions.

Table 1. Indicative fit variables of person–environment fit
Person characteristics: Age; Gender; Nationality; Religion; Physical Appearance; Personality; Value Orientation; Language; Compensation; Qualification; Last Organization; Place; Institute/Alumni Connect; Competencies; Experience
Job fit: Job Knowledge; Job Skills; Job Abilities; Job Competence; Job Complexity; Job Qualification
Group fit: Group Congruence; Group Values; Group Culture; Complementary Skills; Supplementary Skills
Organization fit: Organization Culture; Organization Values
Supervisor fit: Leadership Congruence; Supervisor Workstyle; Supervisor Values; Supervisor Goal Orientation; Supervisor Personality
2.2.2 Person-Job fit. P-J fit emphasizes assessment at the individual level and ensures
that employees are technical experts in executing their assigned work and contribute
significantly to the organization (Kim et al., 2018; Kuntz and Abbott, 2017; Thompson et al.,
2015). The term “P-J fit” indicates similarity between the requirements of a job (for instance,
abilities, skills and knowledge) and qualifications of candidates. Two different
types of P-J fit have been identified. First, needs-supplies fit indicates the congruity between
the needs of people and the supplies that they receive from their job. Second, demands-abilities
fit indicates the match between job demands and individuals’ knowledge, skills and abilities
(Cable and DeRue, 2002; Huang et al., 2019).
2.2.3 Person-Organization fit. P-O fit means an agreement between an individual and
organization that takes place when one party offers what the other party requires, with both
parties sharing similar basic traits (Dahleez et al., 2021; Ismail et al., 2019; Kristof-Brown
et al., 2005). It is considered a major influencer in the P-E fit. Thus, Abdalla et al. (2018)
proclaimed that researchers should investigate the similarity between people and
organizations while understanding and forecasting their behaviors and attitudes. The work
adjustment theory is arguably the most famous theoretical approach of P-O fit (Dawis and
Lofquist, 1984). As per this theory, fit can be explained as a reciprocal association wherein
people and work environments are equally responsive.
2.2.4 Person-Supervisor fit. P-S fit indicates the similarity between an individual and the
supervisor at the workplace (Astakhova, 2018; Kristof-Brown et al., 2005). The theory of
interpersonal attraction describes that a person is attracted toward another person because
of similar attributes such as personality, values, activity preferences and life goals. If an
employee and a supervisor are attracted to each other because of such similarities, then it
can be said that they are “fit” to each other. In organizational psychology, interpersonal
attraction is considered one of the most important research topics. Some studies have
investigated the role of leadership style, work style, personality and values, employee’s well-
being and leadership and employee commitment (van Vianen, 2018) as some dimensions of
the P-S fit.
The above discussions provide evidence for the absence of a “quality of hire” framework
and applications with ML algorithms for analyzing the best fit from the P-E fit perspective.
Based on the above theoretical understanding of different fits under the P-E fit environment,
we first identify the dimensions of each fit and create a combined list of dimensions
(Table 1) that affect quality of hire.
3. Research method
Data aggregated from multiple sources aid in answering various critical questions that arise
in hiring decisions to identify the “best fit” candidates, such as “Is the job function most
appropriate for this candidate?”, “How likely is a candidate to complete probation?” and
“Will a candidate stay longer if required opportunities are given or will they hop to another
pasture?” Answering these questions requires an understanding of the interactions of
various features in the underlying data and their influence on each other. However, because
the underlying data will have numerous features, manual identification of distinguishing
features is not feasible. We thus propose the adoption of an unsupervised learning approach
in the form of cluster analysis. Clustering is the process of grouping objects into clusters of
similar objects (Han et al., 2011). Clustering is useful especially when the probable
relationships between various features in the data are unclear. A clustering algorithm can
provide a structure to the underlying data by identifying features that distinguish different
clusters.
While no clustering model can respond to any of the above questions with complete
certainty, it can provide some vital cues to the patterns in the underlying data. This can be
achieved in two stages. First, build a clustering model using the PAM algorithm, a
k-medoids clustering method, for clustering employees. Second, map the candidates’ profiles to
the clustering model. The PAM algorithm has been applied to identify potential hotspots of
passengers and assist taxi companies in managing their fleets (Bhat, 2014; Han et al., 2011;
Li et al., 2017; Verma and Baliyan, 2017).
To represent clusters, k-medoids uses one representative object (the medoid) in every cluster
rather than the mean value of the cluster’s objects. This makes k-medoids more robust to
outliers and noise than the k-means clustering algorithm (Jin and Han, 2017). K-medoids
also inherits many of the advantages of the k-means algorithm, namely, easy implementation
and scalability compared with other clustering approaches such as model-based clustering
and hierarchical clustering. While k-medoids may not be as optimized as some of the model-
based algorithms such as Gaussian Mixture Modelling (GMM), our decision to consider the
former has stemmed from two critical parameters in ML – ability to handle mixed types of
data and easy interpretation of models. Communicating insights derived from an ML model
is as important as building a model and this begins through interpretation of the results of a
model. The results of PAM are easier to interpret even for non-experts in ML. Figure 1
presents the PAM algorithm.
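The robustness argument above can be illustrated with a small Python sketch. It is not part of the authors' R analysis; the tenure values and the helper name `medoid` are hypothetical and chosen only to show how a single outlier shifts a mean but not a medoid.

```python
from statistics import mean

def medoid(points):
    """Return the observed point with the smallest total absolute distance to all others."""
    return min(points, key=lambda p: sum(abs(p - q) for q in points))

# Hypothetical tenure values (years) for one cluster; 40 is an outlier.
tenure = [2, 3, 3, 4, 40]

cluster_mean = mean(tenure)      # pulled toward the outlier
cluster_medoid = medoid(tenure)  # remains an actual, typical observation

print(cluster_mean, cluster_medoid)
```

Because the medoid is always one of the observed records, it can also be read directly as a "typical employee" of the cluster, which is part of the interpretability argument made above.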
3.1 Data
Figure 2 presents an analytical framework for making hiring decisions based on the data.
Given the paucity of real-life HR data sets for research purposes (fueled by data privacy and
confidentiality norms), we chose the data set curated by IBM and available in the public
domain. Moreover, this data set (published on Kaggle) is extensively used by researchers
working in the realm of HR analytics (Aswale and Mukul, 2020; Brockett et al., 2019).
Interestingly, the IBM HR data set also has features that can be mapped to the proposed
multi-level fit model. The data set has information of 1470 employees across 35 features.
While it was primarily curated to develop attrition models, we used this data set because it
has various features that represent “fit”; for instance, age (person-fit); education qualification
(academic-fit); job level, total working years and total years in current role (job-fit); and
number of past employers, number of years in the organization and years since last
promotion (organization-fit). For an effective demonstration of the proposed multi-level fit
model, we restricted the features to the abovementioned eight features.

Figure 1. PAM, a k-medoids partitioning algorithm (Han et al., 2011)
Input:
  k: number of clusters
  D: data set containing n objects
Output:
  A set of k clusters
Method:
  1. arbitrarily choose k objects in D as the initial representative objects or seeds;
  2. repeat
  3.   assign each remaining object to the cluster with the nearest representative object;
  4.   randomly select a nonrepresentative object, o_random;
  5.   compute the total cost, S, of swapping representative object, o_j, with o_random;
  6.   if S < 0 then swap o_j with o_random to form the new set of k representative objects;
  7. until no change;
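The steps of Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation. It works on a precomputed dissimilarity matrix, uses the first k objects as seeds, and tries every medoid/non-medoid swap in turn instead of sampling o_random (a deterministic variant of steps 4 to 6). The names `pam` and `total_cost` and the toy data are hypothetical.

```python
def total_cost(medoids, dist, n):
    """Sum of each object's dissimilarity to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(n))

def pam(dist, k):
    """Partition n objects into k clusters given an n x n dissimilarity matrix."""
    n = len(dist)
    medoids = list(range(k))              # step 1: arbitrary initial seeds
    cost = total_cost(medoids, dist, n)
    improved = True
    while improved:                       # steps 2-7: repeat until no change
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                trial_cost = total_cost(trial, dist, n)
                if trial_cost < cost:     # S < 0: the swap lowers the total cost
                    medoids, cost = trial, trial_cost
                    improved = True
    # Final assignment of every object to its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Toy one-dimensional data with two obvious groups (illustrative only).
points = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, k=2)
print([points[m] for m in medoids], labels)
```

Because the routine only needs a dissimilarity matrix, it can work with any distance measure, including one for mixed data types.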
The HR data aggregated from multiple data sources consists of numerous features
across various aspects, namely, demographics, performance, education, experience,
psychometric assessments and so on. Each employee is one of the n data points in the
aggregated data set D. PAM partitions these n data points into k clusters.
3.2 Analytical procedure

Figure 2. Framework for making data-driven hiring decisions. Box labels: Employees Data; HRIS; Candidates’ Data; Job Application; Psychometric Tests; Candidates’ Social Profiles; Social Media; Data Aggregation; Data Cleaning and Transformation; Identify P-E Fit Features; Develop Clustering Model; Discover ‘Quality Fit’ Candidates.

We used the “cluster”, “dplyr” and “ggplot2” libraries within RStudio for the analysis. The data
was loaded into RStudio and the data set was sliced to include only the aforementioned
eight attributes. Clustering identifies similar and dissimilar objects by computing the
distance between these objects. We used the Gower coefficient to compute the distance between
employees (records) because of its ability to handle various data types. The “daisy” function
of the “cluster” package, used for this purpose, returns a dissimilarity matrix containing the
dissimilarities among the records of this HR data set. The dissimilarity matrix is computed
as given in equation (1):
$$
d_{ij} = d(i, j) = \frac{\sum_{k=1}^{p} w_k \, \delta_{ij}^{(k)} \, d_{ij}^{(k)}}{\sum_{k=1}^{p} w_k \, \delta_{ij}^{(k)}} \qquad (1)
$$
where $d_{ij}$ is the weighted mean of $d_{ij}^{(k)}$ with weights $w_k \delta_{ij}^{(k)}$, $w_k$ = weights[k], $\delta_{ij}^{(k)}$ is 0 or 1 and
$d_{ij}^{(k)}$, the contribution of the kth variable to the total distance, is a distance between x[i, k] and x[j,
k]. Note that, as individual contributions $d_{ij}^{(k)}$ are in the range [0, 1], every entry $d_{ij}$ of the
dissimilarity matrix will remain in this range.
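Equation (1) can be made concrete with a short Python sketch. This is an illustration, not the `daisy` source. It assumes the usual Gower conventions: numeric variables contribute a range-scaled absolute difference, categorical variables contribute 0 for a match and 1 for a mismatch, and a missing value makes the 0/1 indicator zero so that the variable is skipped. The function name `gower`, the field names and the ranges are hypothetical.

```python
def gower(a, b, ranges, weights=None):
    """Gower dissimilarity between two records given as dicts.

    ranges maps each numeric variable to (max - min) over the data set;
    variables absent from ranges are treated as categorical.
    Assumes at least one comparable variable and non-zero ranges.
    """
    weights = weights or {}
    num = den = 0.0
    for var in a:
        if a[var] is None or b[var] is None:
            continue                       # indicator is 0: variable not comparable
        w = weights.get(var, 1.0)          # w_k, defaulting to equal weights
        if var in ranges:                  # numeric: range-scaled absolute difference
            d = abs(a[var] - b[var]) / ranges[var]
        else:                              # categorical: simple mismatch
            d = 0.0 if a[var] == b[var] else 1.0
        num += w * d
        den += w
    return num / den

# Two hypothetical employee records with mixed data types.
emp_a = {"age": 30, "years_at_company": 5, "education": "Bachelor"}
emp_b = {"age": 40, "years_at_company": 7, "education": "Master"}
ranges = {"age": 40, "years_at_company": 20}   # assumed data set ranges

d_ab = gower(emp_a, emp_b, ranges)
print(round(d_ab, 3))
```

Here the three contributions are 10/40, 2/20 and 1 (education mismatch), so the result is their equally weighted mean and lies in [0, 1], as noted above.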
Further, we used the silhouette method to determine the optimal number of clusters, which is
required as an input to PAM. The silhouette value is a measure of similarity of an object to
its own cluster compared with that to other clusters. Equation (2) provides the mathematical
representation:
$$
\text{Silhouette coefficient} = \frac{x - y}{\max(x, y)} \qquad (2)
$$
where x is the mean nearest cluster distance and y is the mean intra cluster distance.
Figure 3 depicts the graphical plot used to determine the optimal number of clusters (eight
clusters) required to represent the underlying data points.
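A minimal Python sketch of equation (2), averaged over all objects, is given below. It is not the authors' R code. It assumes a precomputed dissimilarity matrix, at least two clusters, and the common convention that an object alone in its cluster scores 0. The function name `silhouette` and the toy data are hypothetical.

```python
def silhouette(dist, labels):
    """Mean silhouette coefficient of a labelling, given a dissimilarity matrix."""
    n = len(labels)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i in range(n):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:                        # singleton cluster: score taken as 0
            scores.append(0.0)
            continue
        # y: mean intra-cluster distance
        y = sum(dist[i][j] for j in own) / len(own)
        # x: mean distance to the nearest other cluster
        x = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items() if lab != labels[i]
        )
        scores.append((x - y) / max(x, y))
    return sum(scores) / n

# Toy data: a sensible labelling should score higher than a scrambled one.
points = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
good = silhouette(dist, [0, 0, 0, 1, 1, 1])
bad = silhouette(dist, [0, 1, 0, 1, 0, 1])
print(good > bad)
```

To choose the number of clusters, the same score would be computed for each candidate value of k and the value with the highest mean silhouette retained.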
4. Findings and analysis

The results of cluster analysis can be evaluated objectively, as demonstrated in Figure 4, by
determining both intra-cluster similarity and inter-cluster dissimilarity. Additionally, they
can be subjectively evaluated by domain experts (hiring specialists in this case) to
understand the business relevance of the clusters and ensure that none of the features
promotes discrimination. Once the evaluations are satisfactory, these clusters can be used
for determining the “fit” of a job applicant based on the relevant features across the
proposed multi-level fit model. Figure 4 depicts a scatter plot representation of the eight
clusters derived from the PAM algorithm. The axes represent the cohesiveness between
similar objects (employees) and separateness between distinct objects (employees). The
clusters are distinctly visible except for some overlap, which supports the validity
of the clustering process in increasing both intra-cluster similarity and inter-cluster
dissimilarity.
Figure 3. Identifying optimal number of clusters based on silhouette method
Each cluster is defined by specific features that distinguish it from the rest of the
clusters. For instance, Figure 5 provides a summary of the employees in the first
cluster. This cluster represents “high-performing experienced employees who have
been to college”. It contains employees who are at job level “2” (lower in the job
hierarchy), are in their late 30s and have around 12 years of experience on average,
of which they have spent an average of 7 years with the current organization. On
average, these employees have been promoted every year. Similarly, the fourth cluster
represents “high-performing young professionals who hold a bachelor’s degree.” The
employees in this cluster have a mean age of 30, have 5 years of work experience and
have a bachelor’s degree. These employees also have been promoted every year on
average. The summary of the fourth cluster is given in Figure 6.
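Cluster summaries of this kind can be produced by averaging each feature within a cluster. The Python sketch below shows the idea. The records, field names and cluster labels are hypothetical and are not taken from the IBM data set.

```python
from statistics import mean

def summarize(records, labels):
    """Mean of every numeric feature per cluster label."""
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(lab, []).append(rec)
    return {
        lab: {feature: mean(r[feature] for r in recs) for feature in recs[0]}
        for lab, recs in groups.items()
    }

# Hypothetical employees already assigned to clusters 1 and 4.
records = [
    {"age": 38, "total_working_years": 12},
    {"age": 40, "total_working_years": 14},
    {"age": 30, "total_working_years": 5},
]
labels = [1, 1, 4]

summary = summarize(records, labels)
print(summary)
```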
Figure 4. Clusters of employees derived from PAM cluster analysis
Figure 5. Summary of first cluster
5. Discussion
With the influx of big data in HRM, the HR domain requires methodological contributions to
address the HRM challenges, as HR is lagging in promoting HR analytics (Angrave et al.,
2016), and there is a growing need for such contributions to justify and improve data-driven
HRM decision-making with new forms of data emerging from different sources. Our
proposed methodology applies the clustering method through PAM technique to examine
the multi-level mechanisms of employee fit at job, supervisor, group and organization levels
under the umbrella of P-E theory. This clustering technique can take inputs of different fit
variables concurrently and propose the right fit candidate, thus improving the quality of
hire process in HRM.
This research on employee selection has been conducted to illustrate the efficacy of this
technique over the existing techniques in HRM research. While PAM has been applied in
other domains as previously discussed, its application in the domain of HR is novel. The
solution provided here can save an organization’s investment in recruitment and selection
by avoiding a wrong hire, thus improving the retention rate of employees. Overall, the
contribution of individual employee efficiency to team effectiveness will have a positive
reflection on the organizational level indicators. Our studies support the need for parametric
models in HR analytics for the classification of the talent pool to predict the P-J fit, as this
will yield a high return on investment for the recruitment and selection of talent using HR
analytics (Chalutz Ben-Gal, 2019). Theoretically, we contributed to the amalgamation of the
different fits and concurrent determination of the “quality of hire” using algorithmic tools on
a real-time basis. Given the availability of diverse data sources and commoditization of ML
in recent years, it is now technically feasible for the HR function to adopt ML for optimizing
its processes, including hiring and talent acquisition.
5.1 Implications
Data alone will not suffice in successfully deploying PAM in HRM. It will require a
robust strategy backed up with a buy-in from key stakeholders. We recommend starting
with small data science projects, which require fewer resources but can deliver insights to
improve some of the key performance indicators (KPIs). Further, it is imperative to cultivate
a data-driven culture within the HR function and identify various processes, including
hiring ML professionals in HRM. They can collaborate with HR specialists on various tasks,
including formulating the business problem, identifying KPIs for evaluating the ML
models and identifying data sources. To implement this technique in HRM practice, we
invite the active collaboration of both HRBP and HR analytics in data sourcing, data
management and data processing as well as in interpreting outcomes. By adopting this
model, HR resources can improve the quality of hire in a reduced timeframe. This model will
be highly effective, especially in mass recruitments from university campuses. It can also be
employed in delivering other HR objectives. For instance, by analyzing the cluster of “highly
engaged” employees, it will be possible to identify features that led to high engagement.
Figure 6. Summary of fourth cluster
PAM algorithm is quite suitable for making informed hiring decisions, as there is no
single “perfect candidate profile.” The hiring managers can use the PAM outcomes as
described in the methodology section to determine the “fit” of the candidate. This can be
achieved by mapping the characteristics of a candidate to the features of each of the clusters.
Each cluster in this case represents a profile cutting across multiple “fit” features. Once the
managers have identified one or two clusters representing the candidate’s profile, they can
use this mapping to answer key questions such as, “How long is this candidate likely to stay in the
organization?” or “How well will they fit in the team?” Such insights provide an opportunity
for hiring managers to ask these questions before making a hiring decision. This is a leap
compared with the conventional hiring practice, wherein whether a hire is “right” or
“wrong” is often left to the test of time. The practitioners however should note that the hiring
decision remains the prerogative of the hiring manager and insights of this nature only aid
the decision-making and hence, cannot replace the hiring manager.
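The mapping step described above can be sketched in Python as ranking clusters by a candidate's distance to each cluster's medoid. This is an illustration only. The medoid profiles loosely echo the ages and experience levels reported for the first and fourth clusters, but the cluster names, feature ranges and candidate values are hypothetical, and the distance is a simple range-scaled mean absolute difference over numeric features.

```python
def nearest_clusters(candidate, medoid_profiles, ranges, top=2):
    """Return the names of the `top` clusters whose medoids are closest to the candidate."""
    def distance(a, b):
        # Range-scaled mean absolute difference over the numeric features in `ranges`.
        return sum(abs(a[f] - b[f]) / ranges[f] for f in ranges) / len(ranges)

    ranked = sorted(
        medoid_profiles,
        key=lambda name: distance(candidate, medoid_profiles[name]),
    )
    return ranked[:top]

# Hypothetical medoid profiles and feature ranges (illustrative values only).
medoid_profiles = {
    "experienced": {"age": 38, "total_working_years": 12},
    "young_professional": {"age": 30, "total_working_years": 5},
}
ranges = {"age": 42, "total_working_years": 40}

candidate = {"age": 29, "total_working_years": 4}
best = nearest_clusters(candidate, medoid_profiles, ranges, top=1)
print(best)
```

The ranked list is only a prompt for the hiring manager's questions. As noted above, the decision itself remains with the manager.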
5.2 Limitations and future research

Clustering is one of the most complex tasks in ML. We applied PAM to a curated
data set from IBM which, as described, did not include all the required features across the various
“fits.” For instance, we had no data pertaining to the influence of social media or the results of
psychometric tests. These types of challenges will be encountered in practice as well; for
instance, identifying the required data sources can itself be a daunting exercise. Further, the
in-house HRIS may not be capturing all the data, or it may be capturing the wrong data. In
addition, some data points may have to be sourced from a third party, such as social media
platforms, while strictly adhering to data privacy guidelines. Hence, the HR function will
have to prioritize data management before adopting ML algorithms such as PAM.
Evaluating the outcome of PAM is not trivial because it learns patterns on its own (without
supervision). Hence, interpreting these patterns can be difficult for hiring managers.
Additionally, depending on the selection of features, PAM may produce indistinguishable
clusters. However, this is true of any other clustering algorithm. Although PAM may be
less precise than some of the model-based clustering algorithms, its outcome is
easier to interpret. Hence, there is a tradeoff between interpretability
and precision, as with any other ML algorithm. Finally, the outcome of PAM needs to be
evaluated by hiring specialists to determine the business relevance of the clusters. Moreover,
both data scientists and hiring managers need to take additional caution to avoid
using any discriminatory features such as ethnicity, gender or any other variable that is
sensitive in the national context.
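One way to check whether a feature selection has produced indistinguishable clusters is the silhouette coefficient. The minimal sketch below uses the standard definition s = (b - a) / max(a, b), where a is an object's mean distance to the other members of its own cluster and b is its lowest mean distance to the members of any other cluster; this notation may differ from the symbols used elsewhere in the paper. The data points, the labels and the Manhattan distance are assumptions chosen only to contrast a well-separated labeling with a mixed one.

```python
def manhattan(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def silhouette_scores(points, labels, dist=manhattan):
    """Per-object silhouette values in [-1, 1]; values near 1 mean well separated."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from object i to every other object, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean(values):
    return sum(values) / len(values)


# Hypothetical two-feature data: two tight groups that lie far apart.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean(silhouette_scores(points, [0, 0, 1, 1]))  # labels follow the groups
bad = mean(silhouette_scores(points, [0, 1, 0, 1]))   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

A mean value close to 1 suggests distinct clusters, while values near or below 0 suggest that the chosen features do not separate the profiles; the business relevance of the clusters still has to be assessed by hiring specialists.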
Future research can include profiling talent for high performance; such a clustering
exercise would differentiate existing employees based on their talent fit features.
Organizations can group their employees into three clusters, namely, high-performing,
medium-performing and low-performing employees, and accordingly create appropriate
HRM interventions for each cluster. Organizations can also
explore the features that facilitate employee engagement in the cohort of highly
engaged employees and replicate those practices among the non-engaged workforce to create a
productive workforce. Team effectiveness can be another area of future research. With IT/ITES
organizations working in large teams, PAM can be used to examine the factors leading to
effective team functioning in an organization. Organizations can also discover the degree of
inclusiveness experienced by employees from varied backgrounds, which can help in
evaluating diversity in an organization. Lastly, PAM can also be used to examine which HRM
practices influence talent development in organizations.
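The three-cluster grouping of employees suggested above can be sketched with a simplified PAM-style procedure: start from arbitrary seeds, then repeatedly swap a medoid with a non-medoid whenever the swap lowers the total distance of all objects to their nearest medoid. This is a minimal sketch and not the implementation used in the study; the two performance features, their values and the Manhattan distance are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoid_idx):
    """Total distance of every object to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoid_idx) for p in points)


def pam(points, k):
    """Simplified PAM: arbitrary seeds, then first-improvement medoid swaps."""
    medoids = list(range(k))  # arbitrary initial seeds: the first k objects
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            # Try replacing the medoid at position pos with the non-medoid cand.
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(points, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda c: manhattan(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical (appraisal rating, goal attainment) scores for nine employees.
employees = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # lower scores
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),   # middle scores
    (9.0, 9.0), (9.1, 8.8), (8.9, 9.2),   # higher scores
]
medoids, labels = pam(employees, k=3)
print(medoids, labels)
```

The cluster numbers returned are arbitrary; naming a cluster "high performing" or "low performing" requires inspecting its medoid and remains a judgment for HR specialists.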
5.3 Ethical issues

The use of personal data by ML algorithms creates privacy concerns. Privacy regulations
such as the General Data Protection Regulation aim to safeguard the citizens of the
European Union, empowering them to control their privacy by controlling the accessibility
and usage of their personal data, for instance, during hiring (Tankard, 2016; Wachter, 2018).
This regulation requires developers to provide information about the algorithms used
for making decisions (Kean, 2018). Regulations of a similar nature are being adopted by
various countries; hence, the HR function will be obliged to restrict the data
collected for hiring to that purpose and to inform candidates about the terms of
usage and the duration for which the data will be retained by the organization.
5.4 Conclusion
Multi-level modeling of quality of hire using ML can add value to several HRM practices.
We demonstrated the use of this technique in employee selection for identifying right-fit
candidates under the P-E fit theory, including job-fit, group-fit, supervisor-fit and organization-fit.
The availability of a variety of data from diverse sources, along with the accessibility of ML
algorithms in many data science software packages, has given the HR function an opportunity to
adopt data-driven decision-making. Hence, our methodological contributions to HRM are
highly relevant in this data-driven age of decision-making.
References
Abdalla, A., Elsetouhi, A., Negm, A. and Abdou, H. (2018), “Perceived person-organization fit and
turnover intention in medical centers: the mediating roles of person-group fit and person-job fit
perceptions”, Personnel Review, Vol. 47 No. 4, pp. 863-881.
Anderson, N., Herriot, P. and Hodgkinson, G.P. (2001), “The practitioner–researcher divide in
industrial, work and organizational (IWO) psychology: where are we now and where do we go
from here?”, Journal of Occupational and Organizational Psychology, Vol. 74 No. 4, pp. 391-411.
Angrave, D., Charlwood, A., Kirkpatrick, I., Lawrence, M. and Stuart, M. (2016), “HR and analytics: why
HR is set to fail the big data challenge”, Human Resource Management Journal, Vol. 26 No. 1,
pp. 1-11.
Arthur, W., Jr, Bell, S.T., Villado, A.J. and Doverspike, D. (2006), “The use of person-organization fit in
employment decision making: an assessment of its criterion-related validity”, The Journal of
Applied Psychology, Vol. 91 No. 4, pp. 786-801.
Aswale, N. and Mukul, K. (2020), “Role of data analytics in human resource management for prediction
of attrition using job satisfaction”, Data Management, Analytics and Innovation, Springer,
Singapore, pp. 57-67.
Bhat, A. (2014), “K-medoids clustering using partitioning around medoids for performing face
recognition”, International Journal of Soft Computing, Mathematics and Control, Vol. 3 No. 3,
pp. 1-12.
Bolander, P. and Sandberg, J. (2013), “How employee selection decisions are made in practice”,
Organization Studies, Vol. 34 No. 3, pp. 285-311.
Brockett, N., Clarke, C., Berlingerio, M. and Dutta, S. (2019), “A system for analysis and remediation of
attrition”, 2019 IEEE International Conference on Big Data (Big Data), IEEE, pp. 2016-2019.
Cable, D.M. and DeRue, D.S. (2002), “The convergent and discriminant validity of subjective fit
perceptions”, Journal of Applied Psychology, Vol. 87 No. 5, pp. 875-884.
Calanna, P., Lauriola, M., Saggino, A., Tommasi, M. and Furlan, S. (2019), “Using a supervised machine
learning algorithm for detecting faking good in a personality self-report”, International Journal
of Selection and Assessment, Vol. 28 No. 2, pp. 176-185.
Chalutz Ben-Gal, H. (2019), “An ROI-based review of HR analytics: practical implementation tools”,
Personnel Review, Vol. 48 No. 6, pp. 1429-1448.
Christina, S., Li, A.L., Kristof-Brown, J. and Nielsen, D. (2018), “Fitting in a group: theoretical
development and validation of the multidimensional perceived person–group fit scale”,
Personnel Psychology, Vol. 72 No. 1, pp. 139-171.
Chuang, A., Hsu, R.S., Wang, A.C. and Judge, T.A. (2015), “Does west ‘fit’ with east? In search of a
Chinese model of person–environment fit”, Academy of Management Journal, Vol. 58 No. 2,
pp. 480-510.
Dahleez, K.A., Aboramadan, M. and Bansal, A. (2021), “Servant leadership and affective commitment:
the role of psychological ownership and person–organization fit”, International Journal of
Organizational Analysis, Vol. 29 No. 2, pp. 493-511.
Dawis, R.V. and Lofquist, L.H. (1984), A Psychological Theory of Work Adjustment, University of MN
Press, Minneapolis.
Deng, Y. and Yao, X. (2020), “Person-environment fit and proactive socialization: reciprocal
relationships in an academic environment”, Journal of Vocational Behavior, Vol. 120, p. 103446.
Edwards, J.R. (2008), “Person–environment fit in organizations: an assessment of theoretical progress”,
The Academy of Management Annals, Vol. 2 No. 1, pp. 167-230.
El-Rayes, N., Fang, M., Smith, M. and Taylor, S.M. (2020), “Predicting employee attrition using
tree-based models”, International Journal of Organizational Analysis, Vol. 28 No. 6,
pp. 1273-1291.
Feloni, R. (2017), “Consumer goods giant Unilever has been hiring employees using brain games and
artificial intelligence - and it’s a huge success”, Business Insider, available at:
www.businessinsider.in/retail/consumer-goods-giant-unilever-has-been-hiring-employees-using-brain-games-and-artificial-intelligence-and-its-a-huge-success/articleshow/59356757.cms
(accessed 4 May 2020).
Han, J., Pei, J. and Kamber, M. (2011), Data Mining: Concepts and Techniques, Elsevier.
Huang, W., Yuan, C. and Li, M. (2019), “Person–job fit and innovation behavior: roles of job
involvement and career commitment”, Frontiers in Psychology, Vol. 10, p. 1134.
Illingworth, A.J., Lippstreu, M. and Deprez-Sims, A.S. (2015), “Big data in talent selection and
assessment”, in Tonidandel, S., King, E. and Cortina, J. (Eds), Big Data at Work. The Data
Science Revolution and Organizational Psychology, Routledge, New York, NY, p. 368.
Ismail, H.N., Karkoulian, S. and Kertechian, S.K. (2019), “Which personal values matter most? Job
performance and job satisfaction across job categories”, International Journal of Organizational
Analysis, Vol. 27 No. 1, pp. 109-124.
Iyer, P.V., Oostrom, J.K., Serlie, A.W., Van Dam, A. and Born, M.P. (2019), “The criterion-related validity
of a short commensurate measure of personality-based person-organization fit”, International
Journal of Selection and Assessment, Vol. 28 No. 2, pp. 143-162.
Jin, X. and Han, J. (2017), “K-medoids clustering”, Encyclopedia of Machine Learning and Data Mining.
Kim, T.Y., Schuh, S.C. and Cai, Y. (2018), “Person or job? Change in person-job fit and its impact on
employee work attitudes over time”, Journal of Management Studies, Vol. 57 No. 2,
pp. 287-313.
Kristof-Brown, A.L., Zimmerman, R.D. and Johnson, E.C. (2005), “Consequences of individual’s fit at
work: a meta-analysis of person-job, person-organization, person-group and person-supervisor
fit”, Personnel Psychology, Vol. 58 No. 2, pp. 281-342.
Kuntz, J.R.C. and Abbott, M. (2017), “Authenticity at work: a moderated mediation analysis”,
International Journal of Organizational Analysis, Vol. 25 No. 5, pp. 789-803.
Landers, R.N. and Schmidt, G.B. (2016), “Social media in employee selection and recruitment: an
overview”, in Landers, R.N. and Schmidt, G.B. (Eds), Social Media in Employee Selection and
Recruitment: Theory, Practice and Current Challenges, Springer, London, pp. 3-14.
Li, Z., Wang, G. and He, G. (2017), “Milling tool wear state recognition based on partitioning around
medoids (PAM) clustering”, The International Journal of Advanced Manufacturing Technology,
Vol. 88 Nos 5/8, pp. 1203-1213.
Minbaeva, D.B. (2018), “Building credible human capital analytics for organizational competitive
advantage”, Human Resource Management, Vol. 57 No. 3, pp. 1-13, doi: 10.1002/hrm.21848.
Morrison, J.D. and Abraham, J.D. (2015), “Reasons for enthusiasm and caution regarding big data
in applied selection research”, The Industrial–Organizational Psychologist, Vol. 52 No. 3,
pp. 134-139.
Nikolaou, I. (2019), “Editorial-current state and the future of international journal of selection and
assessment”, International Journal of Selection and Assessment, Vol. 27 No. 4, pp. 297-298.
Oh, I.S., Guay, R.P., Kim, K., Harold, C.M., Lee, J.H., Heo, C.G. and Shin, K.H. (2014), “Fit happens
globally: a meta-analytic comparison of the relationships of person-environment fit dimensions
with work attitudes and performance across East Asia, Europe and North America”, Personnel
Psychology, Vol. 67 No. 1, pp. 99-152.
Ore, O. and Sposato, M. (2021), “Opportunities and risks of artificial intelligence in recruitment and
selection”, International Journal of Organizational Analysis, available at:
https://doi.org/10.1108/IJOA-07-2020-2291.
Schneider, B. (1987), “The people make the place”, Personnel Psychology, Vol. 40 No. 3, pp. 437-454.
Seong, J.Y., Kristof-Brown, A.L., Park, W.-W., Hong, D.-S. and Shin, Y. (2015), “Person-group fit:
diversity antecedents, proximal outcomes and performance at the group level”, Journal of
Management, Vol. 41 No. 4, pp. 1184-1213.
Swider, B.W., Zimmerman, R.D. and Barrick, M.R. (2015), “Searching for the right fit: development of
applicant person-organization fit perceptions during the recruitment process”, Journal of Applied
Psychology, Vol. 100 No. 3, pp. 880-893.
Thompson, K.W., Sikora, D.M., Perrewé, P.L. and Ferris, G.R. (2015), “Employment qualifications,
person-job fit, underemployment attributions and hiring recommendations: a three-study
investigation”, International Journal of Selection and Assessment, Vol. 23 No. 3, pp. 247-262.
Uggerslev, K., Fassina, N. and Kraichy, D. (2012), “Recruiting through the stages: a meta-analytic test of
predictors of applicant attraction at different stages of the recruiting process”, Personnel
Psychology, Vol. 65 No. 3.
Van der Laken, P., Bakk, Z., Giagkoulas, V., van Leeuwen, L. and Bongenaar, E. (2018), “Expanding the
methodological toolbox of HRM researchers: the added value of latent bathtub models and
optimal matching analysis”, Human Resource Management, Vol. 57 No. 3, pp. 751-760.
Verma, N. and Baliyan, N. (2017), “PAM clustering based taxi hotspot detection for informed
driving”, 2017 8th International Conference on Computing, Communication and Networking
Technologies (ICCCNT), IEEE, pp. 1-7.
Further reading
Boon, C. and Biron, M. (2016), “Temporal issues in person-organization fit, person-job fit and turnover:
the role of leader-member exchange”, Human Relations; Studies towards the Integration of the
Social Sciences, Vol. 69 No. 12, pp. 2177-2200.
Data Aggregation (2021), IBM Knowledge Center, available at:
www.ibm.com/support/knowledgecenter/en/SSBNJ7_1.4.2/dataView/Concepts/ctnpm_dv_use_data_aggreg.html
(accessed 30 April 2020).
de Jager, W., Kelliher, C., Peters, P., Blomme, R. and Sakamoto, Y. (2016), “Fit for self-employment? An
extended person–environment fit approach to understand the work–life interface of
self-employed workers”, Journal of Management and Organization, Vol. 22 No. 6, pp. 797-816.
Dulebohn, J.H. and Johnson, R.D. (2013), “Human resource metrics and decision support: a classification
framework”, Human Resource Management Review, Vol. 23 No. 1, pp. 71-83.
Gabriel, A.S., Diefendorff, J.M., Chandler, M.M., Moran, C.M. and Greguras, G.J. (2014), “The dynamic
relationships of work affect and job satisfaction with perceptions of fit”, Personnel Psychology,
Vol. 67 No. 2, pp. 389-420.
Zhang, S., Liang, J. and Zhang, J. (2019), “The relationship between person–team fit with
supervisor–subordinate guanxi and organizational justice in a Chinese state-owned enterprise”,
International Journal of Selection and Assessment, Vol. 27 No. 1, pp. 31-42.
Corresponding author
Sateesh Shet can be contacted at: svshet@hotmail.com