Conference Paper · September 2021
DOI: 10.1109/ICIIS53135.2021.9660757


AI-based Behavioural Analyser for Interviews/Viva

Dulmini Yashodha Dissanayake1, Venuri Amalya2, Raveen Dissanayaka3, Lahiru Lakshan4, Pradeepa Samarasinghe5, Madhuka Nadeeshani6, Prasad Samarasinghe7
1Faculty of Computing, Sri Lanka Institute of Information Technology, Sri Lanka
1dulminid1107@gmail.com, 2venurihettiarachchi@gmail.com, 3raveend1997@gmail.com, 4lahirulakshan780@gmail.com, 5pradeepa.s@sliit.lk, 6madhuka.n@sliit.lk, 7prasad.s@lankabell.com

Abstract— Globalization and technology have made virtual interviews the choice of recruitment. Even though online interviews/viva have eliminated time, budgetary, and geographical barriers, the lack of comprehension regarding the interviewee's behavioural aspects is yet to be overcome. Therefore, a machine-based approach is proposed in this research for detecting and assessing changes in interviewees' behaviour and personality traits based on nonverbal cues. Additionally, a group analysis of other applicants, as well as a comparison of the interview environment with the non-interview environment, is also obtained. To achieve this, we focus on the candidate's emotion, eye movement, smile, and head movements. The system was implemented using deep learning and machine learning models which achieved accuracies over 85% for all smile, eye gaze, emotion, and head pose analysis. Furthermore, several machine learning models were developed based on the analysed behavioural outcomes of the interviewee to identify the Big Five personality traits, with the Random Forest model yielding the highest accuracy rate of over 75%. Our findings indicate that nonverbal behavioural cues can be utilized to determine personality traits.

Keywords—Deep learning, Personality traits, Emotion analysis, Head movement, Eye gaze, Smile analysis

I. INTRODUCTION

Traditional methods of conducting viva/interviews have been transformed into a virtual format in the recent past. When compared to other traditional approaches, the virtual style of interviewing presents additional challenges for both the interviewer and the interviewee. Understanding the interviewee's psychological state during the viva/interview is one of the most complicated tasks. Virtual interviews, unlike face-to-face interviews, fail to provide a complete description of the interviewee, as they limit the interviewer from observing the interviewee's subtle behavioural changes due to the restricted appearance and low-quality video over poor network connections. However, Artificial Intelligence can overcome the above issues and even eliminate interviewer bias. According to the survey mentioned in Section III-A, the majority of respondents stated that they had low confidence in tracking the interviewee's behaviour throughout a virtual interview in terms of nonverbal behaviours.

Most of the previous research has considered one or two nonverbal features at a time to address the problem raised in this study. Prosodic, lexical, and facial aspects have been gathered to develop systems for predicting and analysing job interviews using interview videos of students at MIT university [1, 2]. Prosodic features reflect the speaking style and rhythm of the speech, lexical features provide data on the counts of specific words, and facial features capture expressions such as smiles and head gestures. The three types of features were then concatenated and used to train regression and classification models that predict the overall interview scores along with other interview-specific traits such as excitement, friendliness, engagement, and awkwardness. Support Vector Regression (SVR), Lasso, and Random Forest (RF) models produced the best regression and classification results.

The Interviewee Performance Analyzer is a suggested system that analyses an interviewee's performance by combining emotion detection with speech fluency recognition [3]. A Convolutional Neural Network (CNN) model that uses the HaarCascade classifier and Gabor filters to recognize seven primary emotions, and another model employing Mel Frequency Cepstral Coefficient (MFCC) characteristics and logistic regression to classify speech into four categories (Fluent, Stuttering, Cluttering, and Pauses), have been developed. Predictions from both models were combined to give the interviewee a performance rating. They utilized the FER2013 and CK+ datasets for speech fluency and facial expression detection.

Studies have looked at the importance of both verbal and nonverbal cues through three different methods, namely audio, video, and a questionnaire, in determining hirability for marketing and business analyst job positions [4, 5]. The dataset was manually labeled, and a questionnaire was used to classify personality traits along with several classification models. Nonverbal behaviour along with prosodic features was found to have predictive validity for hirability and stress resistance in the job positions.

According to current systems, the "Fetcher" AI recruitment platform employs AI to monitor the database and delivers a supply of diverse and qualified candidates based on the filled-out description [6]. The system only displays the contact percentage, good fit, and bad fit percentage along with the number of views on the job post.

"MyInterview" is another AI-based solution that assists in getting to know candidates and identifying the best fit for the job role [7]. It builds a personality profile of the candidates based on vocals and behaviour, where their algorithm assesses their matchability. It uses the Big Five personality model to provide personality insights about the individual.

These studies have primarily relied on audio analysis and emotions to identify various nonverbal signs, with time as a variable. They have been able to predict the best fit using personality features. Furthermore, most existing systems have targeted candidate resume filtering, screening, scanning, and reference checks, as well as automating Human Resource (HR) management procedures. For instance, Paradox [8] uses AI-powered processes to arrange interviews with reminders, and systems like XOR [9] and Humanly [10] use chatbots as a modern communication tool.

978-1-6654-2637-4/21/$31.00©2021 IEEE
Furthermore, technologies such as Loxo [11] and Seekout [12] aid in the initial screening and evaluation of applicant resumes, consequently falling outside the scope of our system.

Currently, none of the existing research focuses on a question-based approach to analysing personality traits based on the interviewee's behaviour. Recent studies have been limited to analysing interviewees through verbal qualities; in addition, they have focused solely on whether the candidate is a good fit for the organization rather than on a detailed behaviour analysis. The main challenge is identifying personal traits, which are regarded as the most significant aspects when evaluating a person. Our method identifies personal traits and provides a fair evaluation of the individual by combining behaviours gathered through emotional state, smile analysis, eye gazing, and head movement analysis, which are then employed to detect personal attributes and make an unbiased assessment of the candidate.

Furthermore, the system also focuses on providing a group analysis where the performance of an individual candidate can be compared with the average performance. Also, the proposed system provides a comparison of the candidate's performance in an interview environment with a non-interview (normal) environment. Apart from HR recruitment interviews, which are currently the main focus of existing systems, our approach can be used to evaluate student performance in viva examinations as well. The system may also be used as a substitute for, or in conjunction with, physical interviews. A more detailed comparison of the existing systems and research is tabulated in Table I and Table II respectively.

TABLE I. EXISTING VS PROPOSED SYSTEM COMPARISON

Features                                  Fetcher [6]   MyInterview [7]   Proposed System
Role fit                                      ✓               ✓                 ✓
Personality Traits                            ✗               ✓                 ✓
Behavioural Analysis Description              ✗               ✗                 ✓
Comparison with the average behaviour         ✗               ✗                 ✓
Comparison with the normal environment        ✗               ✗                 ✓
Used for viva analysis                        ✗               ✗                 ✓

TABLE II. LITERATURE REVIEW COMPARISON

Features                                    Automated       Leveraging      Interviewee     Proposed
                                            Prediction      Multimodal      Performance     System
                                            Framework       Analysis [2]    Analyzer [3]
                                            MIT [1]
Speaker diarization through audio               ✓               ✓               ✗               ✓
Smile analysis                                  ✓               ✗               ✗               ✓
Eye gaze direction detection                    ✗               ✗               ✗               ✓
Eye blink details detection                     ✗               ✗               ✗               ✓
Emotion detection through video                 ✗               ✗               ✓               ✓
Head nodding and shaking detection              ✓               ✗               ✗               ✓
Head pose (Pitch, Roll, Yaw) detection          ✗               ✗               ✗               ✓
Used for viva analysis                          ✗               ✗               ✗               ✓

II. METHODOLOGY

A. Datasets

Since the proposed system consists of four main parts, namely smile, emotion, eye, and head analysis, each segment was developed and trained using its own dataset. The 'SMILEs' dataset, which included 13665 images of smiles and non-smiles, was used to train the CNN model for smile detection [13]. The 'SPOS' dataset, comprising 84 posed and 147 spontaneous facial expression clips, and the USTC-NVIE dataset were used to train the CNN model for detecting the genuineness of the smile [14-16].

Kayvan Shah's 'Eye-Dataset', with 14500 photos divided into forward-look, left-look, right-look, and down-look categories, was used for the eye gaze direction CNN [17]. The 'Eye Aspect Ratio' dataset, which contained 771 records, was used to detect eye blinks [18]. The eye blink statuses for eye open and eye blink were designated as 0 and 1. The 'Cohn-Kanade' dataset, which contains 5876 images classified into Happy, Angry, Contempt, Disgust, Fear, Sadness, and Surprise categories, was used for the emotion detection CNN [19]. The 'Biwi' and 'Helen' datasets were used for the head movement component, with 10,000 images randomly selected from each dataset [20]. On yaw, the head pose range is +-75 degrees, and on pitch, it is +-60 degrees.

1) Interview Dataset

Mock interviews were conducted as online virtual interviews over the 'Zoom' platform. Initially, 10 participants took part in the mock interviews. Each participant was asked an identical set of 26 questions during the session by the interviewer. The recorded videos were analysed and labelled by experienced HR personnel and split at an 8:2 ratio into training and testing data. The personality traits of the interviewees were ranked on a five-point scale. The personality traits which were ranked included agreeableness, openness, neuroticism, conscientiousness, and extraversion [21].

B. Preliminary Phase

The Speech-To-Text API offered by the Google Cloud Platform was used for speaker diarization to differentiate the speakers and separate the questions in the video [22], which were then used to calculate the time it took the interviewee to respond. Afterwards, the video frames were extracted for the exact time period given by each question's start and finish timings.
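The timestamp-to-frame bookkeeping described above can be sketched as follows. This is an illustrative sketch only: the function names, the fixed frame rate, and the timestamp format are assumptions, not the authors' implementation.

```python
# Sketch: mapping diarized question timestamps to video frame ranges and
# response times, as described in the Preliminary Phase. All names and the
# fixed-fps assumption are illustrative, not taken from the paper's code.

def frames_for_question(start_s: float, end_s: float, fps: float = 30.0):
    """Return (first, last) frame indices covering one question's time window."""
    first = int(start_s * fps)
    last = int(end_s * fps)
    return first, last

def response_time(question_end_s: float, answer_start_s: float) -> float:
    """Seconds the interviewee took to begin responding (never negative)."""
    return max(0.0, answer_start_s - question_end_s)
```

Each per-question frame range would then be passed to the dlib-based landmark stage, so every subcomponent receives frames already tagged with a question number.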
Subsequently, the dlib face detector was used to identify the landmarks of the interviewee's face [23]. All the identified frames, with the associated question number and face landmark details, were sent to all other subcomponents.

C. Subsystem

Fig. 1. System Diagram

S0 - Smiling percentage and smile genuineness level
E0 - Eye blinking rate, attention and drowsiness
Em0 - Prominent Emotion
H0 - Head nodding and head shaking count

1) Smile Analysis:

Detecting the smile and identifying its genuineness were completed using two CNN models. The smile detection model was built on the LeNet architecture, and binary classification was used to predict results as 'smiling' or 'non-smiling' [24]. Another CNN model was implemented using ResNet-18 and convLSTM networks in order to identify discriminative smiling features and predict whether the observed smile was 'spontaneous' or 'posed'. If the output was predicted to be 'smiling', the frame associated with the prediction, as well as its previous and next frames, was scaled to 48x48 pixels and sent to the smile category detection CNN model. Consecutive frames were employed to aid in the identification of discriminative features of the smile. Average percentages of smiles and non-smiles from the frames were calculated by the system depending on the duration of each question after the model predictions were completed.

2) Eye Gaze Analysis and Blink Detection:

The three subcategories of the eye component comprise determining the interviewee's eye gaze direction, ascertaining the average eye blinking rate, and finally identifying the interviewee's attention and drowsiness. Initially, the points of the right and left eyes were detected using dlib. The Eye Aspect Ratio (EAR) was calculated (1) using six points: starting from the left corner of the eye, with two points in the upper lid, to the right corner of the eye, and two points in the lower lid (p1-p6), in both the left and right eyes. The EAR was then sent to the blink detection model to determine whether a blink occurred in that frame according to the EAR value.

EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||)    (1)

Subsequently, the image was cropped by identifying the top, left, right, and bottom points of both the left and right eyes separately, which was then passed to a CNN model to determine gaze direction. The model identifies whether the interviewee is looking at the screen, down, left, or right.

The average person blinks between 12 and 15 times per minute, but this can vary depending on the individual [25]. Therefore, attention was calculated by storing the blinks per minute for each question in an array and utilizing it to determine the interviewee's quartile one and quartile three blinks-per-minute values. The blinks per minute obtained for the subsequent questions are compared with the values obtained for quartiles one and three. If the blinks per minute are less than the quartile one value, it indicates increased attention, and if the blinks per minute are greater than the quartile three value, it shows decreased attention, while the remaining values are within the normal range [26-28].

3) Emotion Analysis:

In order to detect the emotion, a CNN model with multiclass classification as happy, sad, angry, contempt, surprise, disgust and fear was implemented. Since the emotion does not change within 30 seconds, the frames were grouped into one-second intervals and analysed to determine the emotion. Subsequently, the average predictions were used by the system to determine the most prominent emotion of the interviewee in a question-based manner by identifying the variation of the emotion.

4) Head Movement Analysis:

The CNN regression model serves as the foundation for the head movement analysis component. The head motions were not precise to 30 milliseconds, and to minimize the error rate of the prediction, the component evaluates at a rate of 18 frames per second. The processed image was scaled to 224x224 pixels as the deep learning model's input size. The scaling was done using the bounding box coordinates provided by the dlib face detector. The model was then used to determine the yaw, pitch, and roll angles for that video frame.

Head nodding and head shaking gestures were computed based on the data obtained from the preceding phase. The pitch and yaw values represent the rotation around the X-axis and Y-axis respectively. According to functional testing, a 5-degree threshold difference over three consecutive series of pitch and yaw values was considered a head-nodding movement and a head-shaking action respectively. The yaw, pitch, and roll data were utilized to calculate the average head rotation in response to a specific question. The average values were determined by dividing the sum of the individual displacement magnitudes by the whole frame count.

5) The Personality Model:

The interviewee's personality traits can be determined to assess the candidate's strengths, weaknesses, and adaptability, consequently enabling the interviewers to identify the most suitable candidate. The system follows the Big Five traits, namely neuroticism, extraversion, openness, agreeableness, and conscientiousness, when considering the interviewee's personality characteristics [21].

An RF model per personality trait was utilized to provide a rating for the interviewee due to its capacity to forecast on high-dimensionality datasets.
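The eye-blink computations described in the eye analysis above, i.e. Eq. (1) and the quartile-based attention rule, can be sketched in pure Python. The 0.2 blink threshold and the helper names are assumptions for illustration; the paper does not state the threshold it used.

```python
# Sketch of the eye-analysis pipeline: Eq. (1) for the Eye Aspect Ratio,
# a threshold-based blink test, and the quartile rule used to label
# attention from blinks-per-minute. The 0.2 EAR threshold and all names
# are illustrative assumptions, not values reported in the paper.
from math import dist
from statistics import quantiles

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (||p2 - p6|| + ||p3 - p5||) / (2 * ||p1 - p4||)   (Eq. 1)"""
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def is_blink(ear, threshold=0.2):
    # A closed eye collapses the vertical distances, driving EAR toward zero.
    return ear < threshold

def attention_label(bpm_history, bpm):
    """Compare a question's blinks-per-minute against Q1/Q3 of earlier questions."""
    q1, _, q3 = quantiles(bpm_history, n=4)
    if bpm < q1:
        return "increased attention"
    if bpm > q3:
        return "decreased attention"
    return "normal"
```

A per-question blink rate would be accumulated from `is_blink` over that question's frames and then classified against the interviewee's own history with `attention_label`.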
The inputs to the prediction model are determined by the outputs of the four major components, including the smile percentage, the genuineness of the smile, the most prominent and second most prominent gaze direction and emotion, as well as the blinking rate and head gesture motions.

6) Group Behavioural Analysis and Comparison with the Normal Environment

The interviewer is presented with an overall analysis that compares all of the interviews in a candidate group. Average group values are calculated by adding all the results obtained from the subcomponents of each interview conducted under the given group and comparing them to the individual findings obtained.

Interviewers have the option of adding a video of the interviewee in a non-interview context to compare with the interview environment, where a non-interview context is an environment with no interview questions. The video is broken down into frames, which are provided to the normal environment component, where it identifies the average values for the nonverbal cues.

III. RESULT AND DISCUSSION

A. Survey

A total of 30 professionals, comprising interviewers from major companies and lecturers who conduct viva, participated in the study. The survey included a questionnaire that compared physical and virtual interviews, as well as their preferences, as shown in TABLE III.

TABLE III. SURVEY QUESTIONS

Question                                                          Yes (%)   No (%)
Do you prefer a traditional interview over a virtual interview      70        30
Can you determine the interviewee's genuineness through a
virtual interview                                                   10        90
Can you monitor the interviewee's eye gaze and head movement
throughout the interview                                           23.5      76.5
Can you track the smile and emotion of the interviewee             71.5      28.5

B. Result

1) Smile Analysis

The smile has been analysed using separate CNN models. The smile detection CNN model was able to attain its best performance when it was trained for 15 epochs with the 'Adam' optimizer. The model was able to obtain an overall accuracy of 93%, along with precision, recall, and F1-scores ranging from 80% to 90%.

The second CNN model, for identifying the genuineness of a smile, was first trained using the SPOS dataset. Due to the small size of the smile data in the SPOS dataset, the model only achieved a low training accuracy rate of 72%. The size of the training dataset was then increased by combining the USTC-NVIE dataset, resulting in a training accuracy of 88% and a testing accuracy of 85%. Because both CNN models performed binary classification, binary cross-entropy was used while training the models.

2) Eye Gaze Analysis

The base model used for the CNN model was VGG16, and the model obtained by training for 21 epochs was identified as the best performer. When this model was trained with the 'Adam' optimizer, the training accuracy obtained was 95%. However, the precision, recall, and F1-score gained for this model ranged from 25% to 45%.

However, when the 'Adadelta' optimizer was used, the training accuracy increased to 97.51%. Despite the fact that the training and validation accuracies of the models trained with both optimizers were nearly identical, the model's testing accuracy was found to be 85%. This model also has superior precision, recall, and F1-score.

The machine learning model developed to determine the blinking status, given the EAR value as the input feature, was compared with 4 other models to determine which model provided the best accuracy. The Support Vector Machine (SVM) was built using the linear kernel. 2 neighbours were used for the K-Nearest Neighbour, which was identified to be the least effective. The RF model with a maximum depth of 2 obtained the best accuracy; as the depth was raised, the accuracy dropped to 93%, but the random state had no effect on the accuracy. With high sensitivity and specificity, the RF classifier model outperforms all others.

TABLE IV. EYE BLINKING DETECTION MODEL

Model           Accuracy (%)   Sensitivity (%)   Specificity (%)
SVM                 90             91.3              89.6
Decision Tree       93             98.4              87.7
Naïve Bayes         90             91.2              89.6
RF                  94             98.4              89.6
KNN                 79             69.8              91.5

3) Emotion Analysis

The model was initially trained using the SPOS dataset. Despite the fact that the SPOS dataset yielded a high accuracy, the results were limited to six output labels. Therefore, the CK+ and FER datasets were used to train the model [29]. Since the data labelling for the datasets differs, both datasets were trained separately to acquire the most suitable training dataset for classifying the emotion. TABLE V summarizes the model results obtained for the datasets, and the augmented CK+ was chosen as the most suitable dataset for training the model.

TABLE V. EMOTION DETECTION MODEL RESULTS

Dataset Name              Number of Emotion   Training       Validation
                          Categories          Accuracy (%)   Accuracy (%)
SPOS                            6                 96             94
FER                             7                 80             80
Original CK+                    7                 65             62
Cropped & Augmented CK+         7                 90             90

4) Head Analysis

The images which were used to train the model were pre-processed and resized using YOLO head object detection. The backbone of the model is the EfficientNetB0 architecture. The evaluation of the model was done using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) to identify the error of the predicted values. The Mean Absolute Error Percentage (MAEP) was measured to compare the models which were derived from the BIWI dataset.
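The 5-degree, three-consecutive-reading rule from the head movement analysis, together with the average-rotation calculation, can be sketched as follows. The exact windowing is an assumption, since the paper states only the threshold and the use of three consecutive values.

```python
# Sketch of the nod/shake rule: a change of more than 5 degrees sustained
# across three consecutive pitch (or yaw) readings counts as one nod (or
# shake). Windowing details and names are illustrative assumptions.

def count_gestures(angles, threshold=5.0, window=3):
    """Count gesture events in a per-frame angle series (degrees)."""
    events = 0
    i = 0
    while i + window <= len(angles):
        segment = angles[i:i + window]
        # Every consecutive step in the window must move by > threshold.
        if all(abs(b - a) > threshold for a, b in zip(segment, segment[1:])):
            events += 1
            i += window  # skip past the detected gesture
        else:
            i += 1
    return events

def average_rotation(angles):
    """Average rotation: sum of per-frame displacement magnitudes / frame count."""
    if len(angles) < 2:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(angles, angles[1:]))
    return total / len(angles)
```

Applied to the pitch series this would count nods, and applied to the yaw series it would count shakes; `average_rotation` mirrors the per-question average head rotation described in the methodology.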
The results were derived from a set of models which were trained on random data selection. The MAEP of the model was identified to be 0.15%. In addition, the WHENET model was tested using transfer learning methods to predict head movement angles; since its weights are trained with a larger number of image sets, the model provides high accuracy [30]. TABLE VI summarizes the methods used to determine the optimal CNN model strategy.

TABLE VI. REGRESSION CNN MODEL EVALUATION - MSE FOR YAW, PITCH AND ROLL

Method                               Yaw MSE   Pitch MSE   Roll MSE   Average MSE
Using load weights of WHENET           4.39       4.45       3.46        4.1
Trained model using 300W_LP, BIWI      5.05       6.2        4.8         5.35

The model includes three output layers to measure the yaw, pitch and roll values. The 'Adam' optimizer was used to minimize the cost function of the model during training. The MSE approach was utilized as a step in the custom-implemented loss function.

TABLE VII shows the model performances that were used to assess interviewee behavioural analysis. All datasets used to train the models in the subsystems were divided into subsets for training, validation, and testing at a ratio of 3:1:1.

TABLE VII. CNN MODEL PERFORMANCE SUMMARY

CNN Model Name       Validation Accuracy (%)   Precision   Recall   F1 score
Smile Detection              93                  0.86       0.88      0.87
Smile Category               86                  0.79       0.88      0.83
Eye Gaze Detection           97.41               0.95       0.94      0.95
Emotion Detection            90                  0.85       0.90      0.87

5) Personality trait model

The personality analysis component was built using five RF models, one for each of the five previously stated personality characteristics. A certain trait was assigned a score between 1 and 5 for each personality type. In order to perform an extensive evaluation and identify the best performing model, five different classifiers were trained to predict the Big Five personality traits. TABLE VIII summarizes the classification accuracies obtained using SVM, RF, KNN, and Multi-layer Perceptron (MLP) models. Accordingly, the RF model surpasses all other classifiers except for neuroticism and conscientiousness.

TABLE VIII. TEST ACCURACY COMPARISON OF PERSONALITY ANALYSIS METHODS

Personality trait    SVM (%)   RF (%)   KNN (%)   MLP (%)
Neuroticism           86.05     86.05    86.05     86.6
Extraversion          65.12     74.42    69.77     68.1
Openness              72.09     74.42    72.09     70.77
Agreeableness         76.64     79.07    76.74     69.77
Conscientiousness     72.79     72.09    65.12     78.01

A comparative analysis of existing and proposed approaches for personality trait detection is presented in TABLE IX. Two studies which obtain the personality traits through text [31] and facial images [32] were utilized.

TABLE IX. COMPARATIVE ANALYSIS WITH EXISTING SYSTEMS

Method                      Through text [31]   Through facial images [32]   Proposed system
Accuracy
  Extraversion                  77.18%               73.23%                     74.42%
  Neuroticism                   61.47%               64.35%                     86.05%
  Agreeableness                 75.51%               60.68%                     79.07%
  Conscientiousness             70.34%               69.56%                     72.09%
  Openness                      80.38%               61.48%                     74.42%
Complexity of the system        Low                  High                       Fair
Ease of use                     High                 Low                        High
Input features                  Sentences            Facial images              Videos
Input dataset                   Several essay        ChaLearn First             Interview
                                datasets             Impression [33]            dataset

6) Sample test results

Fig. 2 illustrates images acquired from the extracted test results through each of the smile detection and categorization, eye gaze, emotion, and head movement detection models. A set of frames per specific duration was analysed and utilized to obtain predictions for each personality trait. The results of the personality trait analysis were stated as a rating obtained from a five-point scale.

Fig. 2. Sample test results for an interview video

IV. CONCLUSION AND FUTURE WORK

While the importance of virtual interviews has grown with time and technological advancements, they continue to fall short of providing a comprehensive evaluation of the interviewee's behaviour. Current systems merely determine if an applicant is suitable for the position without demonstrating a behavioural analysis. Also, the contribution of nonverbal cues to personality assessments in such systems is minimal. Therefore, through this study we identified that personality traits can be determined through facial and head movements rather than just lexical and prosodic characteristics, subsequently allowing the interviewers to recognize nonverbal characteristics through an online interaction. Additionally, the system provides a continuous behavioural analysis along with a group analysis and a normal-environment comparison, which are not offered by any other existing system.
Through the proposed system, several CNN and machine learning models were used, covering smile detection and categorization, eye gaze direction and eye blinking, emotion detection, and head movement, where accuracies over 85% were achieved. Furthermore, the RF model was identified to be the best classification model for determining the Big Five personality traits, as it performs well with the high-dimensionality input features produced by a variety of nonverbal cues.

Our immediate next stage will be to upgrade the system to suggest questions for the interviewer by analysing the behavioural state of the candidate. Additionally, the system should be developed for the personalization of the required personality attributes for a specific job position. This approach has the potential to be expanded to provide significant insights into employment interviews, online viva examinations, and human behaviour in general.

REFERENCES

[1] I. Naim, M. I. Tanveer, D. Gildea, and M. E. Hoque, "Automated prediction and analysis of job interview performance: The role of what you say and how you say it," in 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 1. IEEE, 2015, pp. 1–6.
[2] A. Agrawal, R. A. George, S. S. Ravi et al., "Leveraging multimodal behavioral analytics for automated job interview performance assessment and feedback," arXiv preprint arXiv:2006.07909, 2020.
[3] Y. Adepu, V. R. Boga, and U. Sairam, "Interviewee performance analyzer using facial emotion recognition and speech fluency recognition," in 2020 IEEE International Conference for Innovation in Technology (INOCON). IEEE, 2020, pp. 1–5.
[4] L. S. Nguyen, D. Frauendorfer, M. S. Mast, and D. Gatica-Perez, "Hire me: Computational inference of hirability in employment interviews based on nonverbal behavior," IEEE Transactions on Multimedia, vol. 16, no. 4, pp. 1018–1031, 2014.
[5] A. T. Rupasinghe, N. L. Gunawardena, S. Shujan, and D. Atukorale, "Scaling personality traits of interviewees in an online job interview by vocal spectrum and facial cue analysis," in 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2016, pp. 288–295.
[6] Fetcher, "Fetcher," Accessed on: August 15, 2021. [Online]. Available: https://fetcher.ai/
[7] "Intelligent candidate video screening," Accessed on: August 15, 2021. [Online]. Available: https://www.myinterview.com/
[8] "The AI assistant for recruiting, Olivia," Accessed on: August 15, 2021. [Online]. Available: https://www.paradox.ai/
[9] "AI recruiting software and platform," Accessed on: August 15, 2021. [Online]. Available: https://www.xor.ai/
[10] "Conversational AI for recruiting platform," Accessed on: August 15, 2021. [Online]. Available: https://humanly.io/
[11] "Loxo - recruiting automation," Accessed on: August 15, 2021. [Online]. Available: https://www.loxo.co/
[12] "AI-powered recruiting software - Talent 360," Accessed on: August 15, 2021. [Online]. Available: https://seekout.com/
[13] hromi, "hromi/SMILEsmileD: Open source smile detector haarcascade and associated positive & negative image datasets," Accessed on: August 15, 2021. [Online]. Available: https://github.com/hromi/SMILEsmileD
[14] T. Pfister, X. Li, G. Zhao, and M. Pietikäinen, "Differentiating spontaneous from posed facial expressions within a generic facial expression recognition framework," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 868–875.
[15] S. Wang, Z. Liu, S. Lv, Y. Lv, G. Wu, P. Peng, F. Chen, and X. Wang, "A natural visible and infrared facial expression database for expression recognition and emotion inference," IEEE Transactions on Multimedia, vol. 12, no. 7, pp. 682–691, 2010.
[16] S. Wang, Z. Liu, Z. Wang, G. Wu, P. Shen, S. He, and X. Wang, "Analyses of a multimodal spontaneous facial expression database," IEEE Transactions on Affective Computing, vol. 4, no. 1, pp. 34–46, 2012.
[17] K. Shah, "Eye-dataset," Apr 2020, Accessed on: August 15, 2021. [Online]. Available: https://www.kaggle.com/kayvanshah/eye-dataset
[18] K. N. Bhavana, "Eye aspect ratio dataset," Sep 2020, Accessed on: August 15, 2021. [Online]. Available: https://www.kaggle.com/knavyabhavana/eye-aspect-ratio-dataset
[19] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops. IEEE, 2010, pp. 94–101.
[20] K. S. Mader, "Biwi kinect head pose database," May 2019, Accessed on: August 15, 2021. [Online]. Available: https://www.kaggle.com/kmader/biwi-kinect-head-pose-database
[21] P. T. Costa Jr and R. R. McCrae, "Personality stability and its implications for clinical psychology," Clinical Psychology Review, vol. 6, no. 5, pp. 407–423, 1986.
[22] "Release notes | Cloud Speech-to-Text documentation | Google Cloud," Accessed on: August 15, 2021. [Online]. Available: https://cloud.google.com/speech-to-text/docs/release-notes
[23] D. E. King, "Dlib-ml: A machine learning toolkit," The Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[25] A. A. Abusharha, "Changes in blink rate and ocular symptoms during different reading tasks," Clinical Optometry, vol. 9, p. 133, 2017.
[26] T. Sakai, H. Tamaki, Y. Ota, R. Egusa, S. Inagaki, F. Kusunoki, M. Sugimoto, H. Mizoguchi et al., "EDA-based estimation of visual attention by observation of eye blink frequency," International Journal on Smart Sensing and Intelligent Systems, vol. 10, no. 2, pp. 296–307, 2017.
[27] A. Maffei and A. Angrilli, "Spontaneous eye blink rate: An index of dopaminergic component of sustained attention and fatigue," International Journal of Psychophysiology, vol. 123, pp. 58–63, 2018.
[28] J. Oh, S.-Y. Jeong, and J. Jeong, "The timing and temporal patterns of eye blinking are dynamically modulated by attention," Human Movement Science, vol. 31, no. 6, pp. 1353–1365, 2012.
[29] M. Sambare, "FER-2013," Jul 2020, Accessed on: August 15, 2021. [Online]. Available: https://www.kaggle.com/msambare/fer2013
[30] Y. Zhou and J. Gregson, "WHENet: Real-time fine-grained estimation for wide range head pose," arXiv preprint arXiv:2005.10353, 2020.
[31] jkwieser, "jkwieser/personality-detection-text: Predicting big five personality traits from a given text," Accessed on: August 15, 2021. [Online]. Available: https://github.com/jkwieser/personality-detection-text
[32] miguelmore, "miguelmore/personality," Accessed on: August 15, 2021. [Online]. Available: https://github.com/miguelmore/personality
[33] "Competition," Accessed on: August 15, 2021. [Online]. Available: https://competitions.codalab.org/competitions/9181
