
Detecting the Employee Satisfaction in Retail: A Latent Dirichlet Allocation and Machine Learning Approach
Meenu Chaudhary
Amity International Business School, Amity University, Noida, India
Email: chaudharymeenu1991@gmail.com

Loveleen Gaur
Amity International Business School, Amity University, Noida, India
Email: lgaur@amity.edu, gaurloveleen@yahoo.com

Amlan Chakrabarti
A.K. Choudhury School of IT, University of Calcutta, West Bengal, India
Email: acakcs@caluniv.ac.in

Abstract: Human resource management is one of the most critical functions but is often overlooked, which can harm the organisation. Employee engagement, retention, job satisfaction, values, and motivation are some of its important aspects. Understanding and implementing them properly in the workplace can enable human resource managers and decision-makers to make effective decisions regarding retention strategies, recruitment and selection techniques, employer branding, and budget allocation. Traditionally, surveys, exit interviews, annual evaluations, and informal interactions were the primary sources of data for identifying what employees feel about the organisation. With the increasing use of platforms for posting online reviews, researchers and decision-makers are using structured and unstructured text responses to identify and resolve issues. In this paper, we explore electronic word of mouth in the form of employee reviews of five companies in the retail sector using Latent Dirichlet Allocation (LDA) in the first phase. In the second phase, we collected data from employees to predict employee satisfaction using IBM SPSS Modeler 18.2.1. The top twelve topics are identified using LDA, and a word cloud is also formed. We also use predictive analytics to predict satisfaction among employees, which can benefit organisations in various ways. The methodology followed in this paper addresses the gap of using employee reviews and responses to evaluate employee satisfaction, as researchers have focused on using LDA for customer reviews, and predictive analytics is mostly used for employee churn. LDA is often used in hospitality; however, no relevant literature is available for retail employees.

Keywords: Employee satisfaction, Latent Dirichlet Allocation (LDA), word cloud, predictive analytics, machine learning

I. INTRODUCTION

Employee turnover is the outcome of job dissatisfaction and can cost up to twenty per cent of the annual salary, counting costs from hiring to exit [1]. The retail industry is labour intensive and experiences high employee turnover [2]. Frontline employees in retail are the representatives of the employer, and their attitude can make or break the brand. Hence, employee satisfaction cannot be overlooked, as it is the foundation of employees' perceptions of and attitudes towards the job. Extensive research has been conducted on job satisfaction, as it is one of the factors that strongly affect the effectiveness and financial performance of the organisation. Employees with a low level of satisfaction have less incentive to excel and, as such, may deliver lower service quality [3], affecting, in turn, corporate performance through the service satisfaction-profitability link [4]. This increases the need to understand the factors that affect the degree of job satisfaction among employees. The literature contains significant studies that evaluate the determinants of job satisfaction and its impact on a company's profitability [5].

Although job satisfaction has a long history, its measurement is limited to employee surveys and interviews at entry and exit. The major drawbacks of these traditional methods are biased feedback and sample sizes that may be insufficient to draw the right inference. With the development of information and communication technologies (ICTs) and the increasing prevalence of Web 2.0 applications, people's interactions and sharing of opinions on social media platforms have gradually changed.

Approximately eight per cent of online data is in text form, and a large amount of data is produced every day in the form of user-generated content (UGC), which adds to the volume of unstructured data [6]. Online reviews are often used to increase the scalability of the business by identifying gaps and dissatisfaction among customers. This study adds to our understanding of the aspects and dynamics of job determinants, as well as
their relationship with job happiness, by taking into account this previously untapped informational cue, which comes directly from current and past employees. Job listing sites like Glassdoor, Indeed, and AmbitionBox are helpful for job seekers, particularly when they lack internal connections to learn about a company's working conditions [7].

This paper also addresses the employer's interest in anticipating whether employees are satisfied, using predictive analytics. Academicians and organisations are focusing on using machine learning and artificial intelligence primarily for employee churn analysis. However, if leaders and policymakers focus on determining employee satisfaction, employee churn can be reduced to a large extent [8].

The aims of this study are to:

• Use e-word of mouth in the form of employee reviews to analyse employee satisfaction using Latent Dirichlet Allocation.
• Predict employee satisfaction using machine learning algorithms.
• Contribute to the existing literature on the usage of online reviews and machine learning for exploring job satisfaction among retail employees, a topic that has received very little emphasis to date.

II. RELATED WORKS

The OECD [9] defines employee satisfaction as the sense of accomplishment and gratification while working in a specific profession. Job listing services like Glassdoor and Indeed are helpful for job seekers, especially if they do not have inside connections to learn about a company's working conditions [7]. In the context of retail literature, online employee reviews present new prospects for research on employee satisfaction and performance; yet this informational cue has remained untapped until now [10]. An examination of 400,000 reviews of over 2,200 companies on Glassdoor found that organisations with high employee ratings performed 1.35% higher than the market average in terms of returns [11]. Another study analysed 38,000 reviews from former, current, and potential employees using IBM Watson to understand which factors are used to evaluate employers [12]. Through topic modelling using LDA on employee reviews, Lee and Kang [13] discovered that work satisfaction components were positively associated with retention. One of the topic modelling techniques, Latent Dirichlet Allocation (LDA), posits that each document contains a probability distribution over topics and that each topic can be represented by a probability distribution over words [14]. Academicians have focused on customer reviews to understand the dynamics that increase the profitability of the organisation [15]. Extensive literature on employee reviews is not available; hence, we contribute to the existing literature with a more detailed study.

Multi-criteria decision-making techniques have been used to study satisfaction and e-satisfaction [16] among customers. The satisfaction level of grocery e-retailers has been evaluated using fuzzy TOPSIS and the ECCSI model [17]. However, extensive research concerning employee reviews is missing.

The use of text classification has been increasing in health, e-commerce, social media sentiment, and product sentiment among customers [18]. Machine learning (ML) has been widely used in healthcare [19] [20], employee churn analysis [21], keyword extraction [22], and customer relationship management [23]. Predictive analytics has been widely used in finance, marketing, and supply chain management and has gradually been adopted in human resource management. The trend shows that ML algorithms are used to predict employee churn; however, job satisfaction, which is one of the prime causes of employee churn, can also be predicted. ML algorithms are more versatile than structural equation modelling (SEM) and hierarchical linear modelling (HLM) and have the advantage of handling hundreds of variables in one statistical model [24]. Predictions not only serve to reduce staff attrition but also have an impact on a company's goodwill and reputation. One study used four ML algorithms to predict job satisfaction levels [25]. LDA has also been used to analyse product returns for improving customer experience.

Based on the existing literature, the authors have found that academicians and researchers have given little emphasis to using machine learning algorithms to predict job satisfaction among employees in the retail sector. This research gap has prompted the authors to include predictive analytics as one of the objectives of this study.

III. RESEARCH METHODOLOGY

This paper is divided into two phases to analyse satisfaction among retail employees. In the first phase, the authors used online employee reviews extracted through web scraping for Latent Dirichlet Allocation. The online reviews were analysed using the Python programming language. In the second phase of the study, the authors used the employee dataset to predict satisfaction at the workplace using IBM SPSS Modeler for predictive analytics. The research workflow is shown diagrammatically in Fig. 1.
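The LDA assumption that documents are mixtures of topics and topics are distributions over words can be written as a generative process. The block below is the standard textbook formulation, not something stated in this paper; the symbols K (number of topics), D (number of reviews), N_d (number of words in review d), and the Dirichlet hyperparameters \alpha and \beta are notation introduced here for illustration.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K
  && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D
  && \text{(topic distribution of review } d\text{)} \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d
  && \text{(topic of the } n\text{-th word)} \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
  &&&& \text{(observed word)}
\end{align*}
```

Fitting LDA means inferring \theta and \phi from the observed words w alone; the top keywords of a topic are the words with the largest entries in \phi_k.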
Fig. 1: Methodology workflow

A. Data Collection and Pre-processing

The authors extracted 66,723 online reviews of five retail chains through web scraping, as shown in Table 1. The Selenium library in Python was used to extract the review pages (unstructured data), and the BeautifulSoup library was then used to gather the reviews from the unstructured HTML data. The entire dataset was saved in a .csv file. The next step was data pre-processing, which involved data cleaning to remove unnecessary clutter from the dataset.

TABLE 1: WEB-SCRAPED DATA

Record 1
  _Occupation Role _Rank: Sales
  Author: Retail Assistant (Former Employee) - City Hall - November 23,
  Company: Charles-&-Keith
  Employment: Former Employee
  Occupation: Retail Assistant
  Place: City Hall
  Review Raw: Eazy cash for student who want to earn extra pocket money. Flexible schedule. Required to work at least one weekend per week. Nice people to work with.
  Rating: 3

Record 2
  _Occupation Role _Rank: Sales
  Author: Fashion Advisor (Former Employee) - Jurong point - July 7, 2020
  Company: Charles-&-Keith
  Employment: Former Employee
  Occupation: Fashion Advisor
  Place: Jurong point
  Review Raw: Working in Charles and Keith is definitely a good opportunity to meet with different kinds of people from different profile , they provide proper customer service trainings for the staff and the best part is the working
  Rating: 5

Record 3
  _Occupation Role _Rank: Manager
  Author: Boutique Manager (Current Employee) - Singapore - April 1, 2020
  Company: Charles-&-Keith
  Employment: Current Employee
  Occupation: Boutique Manager
  Place: Singapore
  Review Raw: Charles and Keith is a well known local brand in Singapore and Asian countries. The working culture is fast-paced and a quite stressful though since we are popular to tourist. Everyday is a busy day at work but,
  Rating: 3

The questionnaire was sent to 479 people, of whom 398 responded. However, only 367 responses were used, as thirty-one were removed due to incomplete information. The employee statistics by gender, age, and job level are given in Table 2.

TABLE 2: EMPLOYEE STATISTICS

Gender   Number of employees   Age range   Number of employees   Job level     Number of employees
Female   176                   23-30       44                    Job level 1   68
Male     191                   31-40       43                    Job level 2   73
                               41-54       62                    Job level 3   70
                                                                 Job level 4   80
                                                                 Job level 5   76

The dataset was divided into training and testing sets in the ratio 80:20 using the Partition node in IBM SPSS Modeler version 18.2.1. XGradient Boost was selected from among support vector machine, linear regression, CatBoost, and random forest based on the overall accuracy of the classifiers.

IV. DATA ANALYSIS

A. Latent Dirichlet Allocation

LDA provides the relevant topics having the maximum weights. The topics are presented by grouping the keywords with the maximum possible occurrence, distinguishing all topics from one another. The prominence of a topic is indicated by the size of its bubble. Further, the overlap of bubbles depicts the presence of common keywords in the overlapping topics. The bar chart attached to the map in the LDA visualisation illustrates the local frequency of keywords found in topic 1.
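The LDA step on cleaned review text can be sketched as follows. The paper does not say which LDA implementation or inference method was used, so this is only an illustration: it swaps in a small collapsed Gibbs sampler written in pure Python so that it runs without third-party libraries. The toy reviews, the stop-word list, the function names, and the hyperparameters (`num_topics`, `alpha`, `beta`, `iterations`) are all hypothetical and not taken from the study.

```python
import random
import re

# Hypothetical toy corpus standing in for the scraped employee reviews.
REVIEWS = [
    "Flexible schedule and nice people to work with",
    "Management is poor and the pay is low",
    "Good customer service training for staff",
    "Low pay, long hours, stressful management",
    "Friendly staff, flexible hours, nice customers",
    "Stressful work, poor management, low salary",
]
STOPWORDS = {"and", "the", "is", "to", "for", "with", "a", "of"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # total tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jv + beta) / (n_j + V * beta)
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta) / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=5):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


docs = [tokenize(r) for r in REVIEWS]
vocab, doc_topic, topic_word = fit_lda(docs)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"Topic {k}: {', '.join(words)}")
```

On a corpus of tens of thousands of reviews, a library implementation such as the ones in gensim or scikit-learn would be the practical choice; the sampler above is meant only to show how topic-word counts, and hence top keywords per topic, arise from token-level topic assignments.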
Fig. 2: LDA visualisation of top 6 topics

Fig. 2 depicts the results of LDA using employee reviews to evaluate satisfaction. It illustrates the topic distribution on one side and a bar chart of the key phrases that reflect the selected topic on the other. The overall frequency of each word is represented by the blue bar, while the local frequency is represented by the red bar.

B. Word Cloud

A word cloud depicts the most frequently used words in the dataset and is a data visualisation method that represents the data according to word frequency [26]. A word cloud helps identify the top words at a glance more effectively than spotting keywords in a table. Fig. 3 shows the word cloud formed for the selected dataset. It illustrates that words like 'work', 'job', 'customer', and 'management' have the highest frequency. On the other hand, words like 'hard', 'typical day', 'enjoyable part', and 'fun' have a lower frequency. The word cloud indicates that the emphasis is on work and management and that customers are given priority.

Fig. 3: Word cloud

C. Machine learning algorithm

The employee dataset on job satisfaction was analysed using five algorithms: support vector machine, linear regression, CatBoost, random forest, and XGradient Boost. The results of these five algorithms are shown in Table 3.

TABLE 3: COMPARATIVE RESULT OF MACHINE LEARNING ALGORITHMS

Performance metric   XGradient Boost   Random Forest   Support Vector Machine   Linear Regression   CatBoost
Overall accuracy     98.95%            93.05%          94.56%                   82.78%              93.12%
AUC                  0.991             0.953           0.976                    0.867               0.934

We selected XGradient Boost for further analysis based on the highest score (98.95%) of overall accuracy. We further analysed the factors that contribute to satisfaction among employees.

Fig. 4: Analysis of XGradient Boost for the used dataset

The XGradient Boost algorithm predicted the target variable 'satisfied' using ten input variables, i.e., job level, salary, recruitment type, department, work location, education, awards, age, and rating. Fig. 4 depicts that 95.97% of predictions are made correctly. The area under the curve (AUC) measures the ability of the classifier to differentiate between classes. The higher the AUC, the better the classifier distinguishes between negative and positive classes. The Gini coefficient measures the performance of binary classifiers, and its value varies between 0 and 1. The higher the Gini coefficient, the better the model. The AUC and Gini values for XGradient Boost are 0.991 and 0.982, respectively.

Fig. 5: Graphical visualisation of predictor importance
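The two metrics reported for XGradient Boost, AUC and Gini, are closely related, and the link can be checked numerically. The sketch below assumes the common convention Gini = 2 x AUC - 1; the paper does not state how IBM SPSS Modeler computes the coefficient, although the reported values 0.991 and 0.982 are consistent with this formula. The labels and scores are hypothetical toy values, not the study's data, and the function names are illustrative.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini_from_auc(auc):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical example: 1 = satisfied, 0 = not satisfied, with predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = auc_score(labels, scores)
print(f"AUC  = {auc:.3f}")
print(f"Gini = {gini_from_auc(auc):.3f}")

# Applying the same formula to the AUC reported for XGradient Boost.
print(f"Gini for AUC 0.991 = {gini_from_auc(0.991):.3f}")
```

Under this convention a classifier no better than chance (AUC 0.5) has a Gini of 0 and a perfect classifier has a Gini of 1, which matches the 0 to 1 range described in the text.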
Fig. 5 shows that 'job level' contributes most to satisfaction, followed by the 'salary' of an employee.

V. CONCLUSION

Although online customer reviews have been extensively used by researchers, employee reviews have not been explored much. LDA is often used in hospitality; however, no relevant literature is available for retail employees. Studies reveal that employee satisfaction in retail is low, which results in high employee churn. This research gap is addressed in this paper by analysing online employee reviews using Latent Dirichlet Allocation to identify the top 12 topics in the corpus. Based on the word cloud analysis, it can be concluded that employees in retail are not very satisfied with their work, and organisations should focus on employee satisfaction to decrease employee turnover. The non-monetary factor 'job level' has a greater impact on employee satisfaction than the monetary factor 'salary'.

Data collected through a questionnaire was analysed using XGradient Boost to predict employee satisfaction. The advantage of using ML algorithms is that they provide a prediction for each employee. This model can be implemented in organisations to find the employees who are not satisfied, so that strategies can be formulated accordingly.

Our paper provides better insight into satisfaction among retail employees using LDA and machine learning, which is missing in the existing literature. The reasons are that employee satisfaction has been analysed using LDA only for hospitality employees, and machine learning algorithms have not been used effectively for predicting employee satisfaction.

VI. LIMITATIONS AND DIRECTIONS FOR FUTURE RESEARCH

We have used employee reviews of five companies; however, the scope can be widened in future studies. Also, more data can be processed using the ML algorithm to predict employee satisfaction in the IT sector, as it experiences the highest employee churn.

REFERENCES

[1] A. Garmendia, U. Elorza, A. Aritzeta and D. Madinabeitia-Olabarria, “High-involvement HRM, job satisfaction and productivity: A two wave longitudinal study of a Spanish retail company,” Human Resource Management Journal, vol. 31, no. 1, pp. 341-357, 2021.

[2] A. Lindblom, S. Kajalo and L. Mitronen, “Does a retailer’s charisma matter? A study of frontline employee perceptions of charisma in the retail setting,” Journal of Services Marketing, vol. 30, no. 3, pp. 266-276, 2016.

[3] R. McPhail, A. Patiar, C. Herington, P. Creed and M. Davidson, “Development and initial validation of a hospitality employees' job satisfaction index: Evidence from Australia,” International Journal of Contemporary Hospitality Management, vol. 27, no. 8, pp. 1814-1838, 2015.

[4] T. R. De Mendonca and Y. Zhou, “Environmental performance, customer satisfaction, and profitability: A study among large US companies,” Sustainability, vol. 11, no. 19, pp. 5418, 2019.

[5] A. Panagiotakopoulos, “Exploring the link between management training and organizational performance in the small business context,” Journal of Workplace Learning, vol. 27, no. 8, pp. 1814-1838, 2020.

[6] P. Alarcón-Urbistondo, M. M. Rojas-de-Gracia and A. Casado-Molina, “Proposal for Employing User-Generated Content as a Data Source for Measuring Tourism Destination Image,” Journal of Hospitality & Tourism Research, 2021, doi: 10.1177/10963480211012756.

[7] A. Ladkin and D. Buhalis, “Online and social media recruitment: Hospitality employer and prospective employee considerations,” International Journal of Contemporary Hospitality Management, vol. 28, no. 2, pp. 327-345, 2016.

[8] M. Chaudhary, L. Gaur, N. Z. Jhanjhi, M. Masud and S. Aljahdali, “Envisaging Employee Churn Using MCDM and Machine Learning,” Intelligent Automation & Soft Computing, vol. 33, no. 2, pp. 1009-1024, 2022.

[9] OECD, TALIS 2013 Results: An International Perspective on Teaching and Learning. OECD: Paris, 2014.

[10] Y. Jung and Y. Suh, “Mining the voice of employees: A text mining approach to identifying and analyzing job satisfaction factors from online employee reviews,” Decision Support Systems, vol. 123, 113074, 2019.

[11] A. Moniz, “Inferring Employees’ Social Media Perceptions of Corporate Culture and the Link to Firm Value,” 2017.

[12] A. Dabirian, J. Kietzmann and H. Diba, “A great place to work!? Understanding crowdsourced employer branding,” Business Horizons, vol. 60, no. 2, pp. 197-205, 2017.

[13] J. Lee and J. Kang, “A study on job satisfaction factors in retention and turnover groups using dominance analysis and LDA topic modeling with employee reviews on Glassdoor.com,” in Proceedings of the 38th International Conference on Information Systems, 2017.

[14] U. Chauhan and A. Shah, “Topic modeling using latent Dirichlet allocation: A survey,” ACM Computing Surveys (CSUR), vol. 54, no. 7, pp. 1-35, 2021.

[15] S. Zoghbi, I. Vulić and M. F. Moens, (2016). “Latent Dirichlet allocation for linking user-generated content and e-commerce data,” Information Sciences, vol. 367, pp. 573-599, 2017.

[16] K. Anshu and L. Gaur, “E-Satisfaction estimation: a comparative analysis using AHP and intuitionistic fuzzy TOPSIS,” Journal of Cases on Information Technology (JCIT), vol. 21, no. 2, pp. 65-87, 2019.

[17] K. Anshu, L. Gaur and D. Khazanchi, “Evaluating satisfaction level of grocery E-retailers using intuitionistic fuzzy TOPSIS and ECCSI model,” in 2017 international conference on infocom
technologies and unmanned systems (Trends and future directions) (ICTUS), 2018, pp. 276-284.

[18] G. Singh, B. Kumar, L. Gaur and A. Tyagi, “Comparison between multinomial and Bernoulli naïve Bayes for text classification,” in 2019 International Conference on Automation, Computational and Technology Management (ICACTM), 2019, pp. 593-596.

[19] L. Gaur, U. Bhatia, N. Z. Jhanjhi, G. Muhammad and M. Masud, “Medical image-based detection of COVID-19 using deep convolution neural networks,” Multimedia Systems, pp. 1-10, 2021.

[20] K. C. Santosh and L. Gaur, “Artificial Intelligence and Machine Learning in Public Healthcare: Opportunities and Societal Impact,” Springer Nature, 2022.

[21] M. Chaudhary, L. Gaur, N. Z. Jhanjhi, M. Masud and S. Aljahdali, “Envisaging Employee Churn Using MCDM and Machine Learning,” Intelligent Automation & Soft Computing, vol. 33, no. 2, pp. 1009-1024, 2022.

[22] L. Gaur, N. Z. Jhanjhi, S. Bakshi and P. Gupta, “Analyzing Consequences of Artificial Intelligence on Jobs using Topic Modeling and Keyword Extraction,” in 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), 2022, vol. 2, pp. 435-440.

[23] D. Arora and L. Gaur, “Data mining-An emerging technique for customer relationship management (CRM),” Indian Journal of Marketing, vol. 36, no. 3, 2006.

[24] J. E. Yoo and M. Rho, “Exploration of predictors for Korean teacher job satisfaction via a machine learning technique, Group Mnet,” Frontiers in Psychology, vol. 11, pp. 441, 2020.

[25] F. Rustam et al., “Review prognosis system to predict employees job satisfaction using deep neural network,” Computational Intelligence, vol. 37, no. 2, pp. 924-950, 2021.

[26] A. I. Kabir, K. Ahmed and R. Karim, “Word Cloud and Sentiment Analysis of Amazon Earphones Reviews with R Programming Language,” Informatica Economica, vol. 24, no. 4, pp. 55-71, 2020.
