Abstract- Thanks to advances in computer vision and pattern recognition, machine learning models can effectively analyze human nonverbal signs and infer personality qualities with the use of a camera. In this study, an AI-based interview agent system was developed using asynchronous video interview (AVI) processing and a convolutional neural network trained on features extracted from the applicants' facial expressions in the AVIs and on true personality scores from the self-reported questionnaires of 120 real job applicants. The trial findings reveal that our AI-based interview agent can correctly distinguish an interviewee's "big five" characteristics with an accuracy of 90.9 percent to 97.4 percent. Our research also shows that, despite the lack of large-scale data and the absence of time-consuming human annotation and labelling, the semi-supervised approach can still achieve high accuracy. Existing self-reported personality evaluation techniques, which job candidates may falsify to obtain a position, could thus be supplemented by our interview agent.
Index Terms- Big five, convolutional neural network (CNN), personality computing,
TensorFlow
I. Introduction
The personality of job applicants has long been a concern of industrial and organizational (I/O) psychologists [1]. Some businesses utilize self-reported surveys to
assess job candidates' personalities; nevertheless, job seekers may lie about their personality
qualities in order to increase their chances of landing a job [2]. Because candidates have a
hard time faking nonverbal signs, some employers utilize their facial expressions and other
nonverbal indicators during job interviews to assess candidates' personalities [3]. Due to cost
and time constraints, it is not feasible for every job candidate to attend a live job interview in
person or participate in interviews done through telephone or web conferences [4]. One-way
asynchronous video interview (AVI) software may be used to interview job candidates
automatically at a single point in time. Employers may evaluate the audio-visual recordings at
a later date using this method [5]. Human raters find it difficult to appropriately analyze
applicants' personality attributes based on video pictures when utilizing AVI [6]. Human
raters were unable to effectively identify an applicant's personality just by viewing recorded videos. Because algorithms may offer greater consistency and predictive power than human raters, both I/O psychology and computer science scholars
have suggested that artificial intelligence (AI) may surpass humans in recognising or
predicting an applicant's personality for screening job applicants [8]–[11]. “AI is a discipline
of computer science that tries to create intelligent computers that act in a way comparable to
human intelligence” [12], and it “aims to expand and enhance human capability and
efficiency in activities of remaking nature” [13]. Machine learning (ML) is a key component
of AI, since it “allows computers to learn without having to be explicitly programmed” [14].
Deep learning (DL) is an ML implementation methodology that can “mimic the human brain
process to comprehend input including pictures, sounds, and texts" [15]. DL feature learning can be carried out through three types of ML/DL [12]: unsupervised, supervised, and semi-supervised learning. Unsupervised learning may automatically learn the right answers from a huge quantity of data without needing predetermined labels [10], [16]. Supervised learning, in contrast, requires labelled training data (called "ground truth"). Semi-supervised learning combines the two techniques by employing a small amount of labelled data along with a larger quantity of unlabelled data for pattern recognition; as a result, this strategy may minimize labelling efforts while maintaining high accuracy.
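To make the distinction concrete, below is a minimal sketch of one common semi-supervised strategy, pseudo-labelling, in Python. The helper name and the scikit-learn classifier are illustrative assumptions rather than the method used in this study: a model trained on the small labelled set assigns labels to unlabelled samples, and its most confident predictions are folded back into training.

import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_fit(X_lab, y_lab, X_unlab, threshold=0.9):
    """Train on labelled data, then retrain with confident pseudo-labels."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_lab, y_lab)                      # supervised step on the labelled set
    proba = model.predict_proba(X_unlab)         # predictions for the unlabelled pool
    keep = proba.max(axis=1) >= threshold        # keep only high-confidence samples
    y_pseudo = model.classes_[proba[keep].argmax(axis=1)]
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, y_pseudo])
    return model.fit(X_aug, y_aug)               # retrain on the augmented set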
Semi-supervised learning thereby avoids the main burden of fully supervised machine learning (ML), which requires human labelling and is time intensive [17]. Because convolutional neural networks (CNNs) have been shown to be high-performing models for automatically processing images and inferring first impressions from camera images, this study adopted a CNN to build an interview agent that can automatically recognize a job applicant's personality based on relatively smaller datasets. The following is how the rest of the article is organized: The background of
APR from audio-visual data is discussed in Section 2. Our data processing strategy is
described in Section 3. Section 4 contains a thorough model and its outcomes. Finally, in
Section 5, we discuss and summarize our results as well as our future work.
II. Background
A. Personality Taxonomy
An individual's characteristic patterns of thinking, feeling, and behaving are collectively referred to as personality [19]. This concept is often used to predict whether a job seeker will
do well in a particular work function and participate effectively in a potential cultural
environment [20]. The "big five" features, also known as the five-factor model (FFM) or
OCEAN model, provide academics and practitioners with a well-defined taxonomy for choosing
job candidates [20]. The big five fundamental characteristics of openness, conscientiousness,
extraversion, agreeableness, and neuroticism (poor emotional stability) are classified and defined as follows:
Openness: the degree to which a person is imaginative and creative.
Conscientiousness: the degree to which a person displays self-discipline, dependability, and thoughtfulness.
Extraversion: the degree to which a person is sociable, talkative, and assertive.
Agreeableness: the degree to which a person shows warmth, kindness, and fondness.
Neuroticism: a person's stress, moodiness, and anxiety are reflected in this trait.
B. Personality Computing
From the actor's point of view, personality refers to the reasons, intentions, emotions, and actions stated by a person. From the observer's point of view, personality includes information on the social reputation of an individual, and close contacts such as partners, friends, or colleagues should ideally be able to provide reliable observer ratings [19]. In I/O psychology, self-evaluations are the basic information utilized to anticipate individual workplace behaviour and performance when it is difficult for an observer to determine an applicant's valid big five traits. In a zero-acquaintance setting, such as a job interview [19], self-evaluations may also be used to forecast whether a job applicant is a good match for the job criteria and the business culture.
The respondents use distal signals to externalize their apparent personality (i.e., any observable behaviour that can be perceived by the interviewer, such as facial expression, gaze, posture, body movement, speaking, and prosody). In turn, the interviewer uses proximal cues (i.e., any interviewee behaviours that are actually perceived by the interviewer, including indirectly observable cues) to attribute the unobservable personality traits of the interviewee; these cues translate into perceptions formed by the interviewer [6], [23].
Despite the fact that only the candidates' heads and torsos are visible in AVI,
personality psychology researchers have discovered that an interviewer or rater may still
employ nonverbal clues to determine the candidates' personality qualities [24]. Individuals may accurately infer the genuine personality attributes of zero-acquaintance targets based on short video snippets, according to several experimental investigations [23]. The lens model has been used to automatically recognize, perceive, and synthesize human behavioural signals and personality [6]. Automatic personality recognition (APR) infers a target's self-assessed personality from distal signals by extracting characteristics from the audio-visual data of AVI [9], [25]–[27]. Automatic personality perception (APP) infers the personality that observers attribute to a target from proximate clues [28]. Because investigating proximal cues is difficult, APP relies on distal cues as a proxy, as discussed in [8], [10], [17], [29]–[33]. Artificial agents, avatars, or robots are employed in automatic personality synthesis (APS) to exhibit human-like distal signals, with the purpose of people recognizing and inferring these signals as preset attributes [34], [35]. AVI may be modified and integrated with APR to auto-recognize interviewees' "actual scores" as given by their self-reported personality assessments.
C. Mel-Frequency Cepstrum
The Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound. To get it, a linear cosine transform is applied to a log power spectrum on a nonlinear Mel frequency scale, yielding Mel-frequency cepstral coefficients (MFCCs). MFCCs are derived from a type of cepstral depiction of the audio clip (a nonlinear "spectrum of a spectrum"). In the MFC, the frequency bands are evenly spaced on the Mel scale, which resembles the human auditory system's response more closely than the linearly spaced frequency bands of the normal cepstrum. MFCCs are commonly derived as follows:
1. Take the Fourier transform of a windowed excerpt of the signal.
2. Use triangular overlapping windows to map the spectrum's powers onto the Mel scale.
3. Take the logarithm of the power at each Mel frequency.
4. Take the discrete cosine transform of the Mel log powers from the list above.
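As a concrete illustration, the following minimal sketch extracts MFCCs with the librosa library. The library choice, sampling rate, and file name are assumptions; the paper does not name its audio toolchain.

import librosa

y, sr = librosa.load("response.wav", sr=16000)        # hypothetical AVI audio clip
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # steps 1-4 above, 13 coefficients
print(mfccs.shape)                                    # (13, number_of_frames)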
Successful interview performance depends on both communication skills and personal qualities. For information to be passed accurately, both verbal and nonverbal means of communication are used, including gestures, posture, and tone. To this end, this study proposes a novel technique that can automatically recognize a job applicant's personality from AVI recordings.
A. Data Collection
We created cloud-based AVI software, similar to the work in [42], to build our dataset
in a genuine job interview setting. The AVI server may accept recorded video prompts,
construct interview scripts, send video prompts from the interview, and receive video answers
using Google Cloud Storage. The content of the video replies may subsequently be utilized for algorithmic studies, such as audio and visual data analytics. Interview replies may thus be recorded at one moment in time but assessed afterwards by an algorithm. Applicants were recruited with the assistance of a human resource (HR) organization: our sponsor made a job description available to its members on a website for recruiting 2–3 HR experts from a connected firm in Ahmedabad, Rajkot, and Baroda. Members who were interested sent their resumes to the researchers, who vetted the
resumes based on the job description. A total of 120 candidates were asked to log into the AVI system, which they could access from mobile or computer devices at any time of day through a browser. The replies of the respondents were captured for examination, including both audio and visual information. Our algorithms would record and evaluate the entirety of the candidates' interview procedures and replies, including audio and video, and utilize it as a reference for hiring recommendations.
During the AVI, the interview questions were arranged in a regular fashion. The same
five questions were given to all candidates, and they were behaviorally geared to evaluate the
candidates' communication abilities in light of the job description. Each question was shown
on a separate screen, and as the candidates entered the screen, the audio for the text questions
began immediately. The candidates were allowed a maximum of 3 minutes to answer each
question; the questions were displayed on-screen one at a time in order. Within the 3-minute time limit, candidates could opt to skip to the following question. After 3 minutes, a new screen
with the following question opened automatically. The whole video interview procedure took
B. Data Labelling
We used the International Personality Item Pool (IPIP) inventory [43] to evaluate the applicants' self-rated big five qualities and thereby acquire genuine ratings for the individual big five features [6]. All candidates were requested to take the IPIP survey online before participating in the AVI and were told that the survey data would be sent to the researchers alone and would have no bearing on the hiring recommendation. This procedure was carried out to lessen the impact of social desirability, which may bias self-rated personality
C. Feature Extraction
We began with the pretrained Inception-v3 model, trained on the ImageNet dataset of over 14 million photos organized into 1,000 classes, to capture the candidates' facial expressions. In addition, as shown in Figure-2, we trained our face recognition model using OpenCV and Dlib while tracking 86 facial landmark points in every frame. Furthermore, since the anchor landmark point is unaffected by facial emotions, we utilized it as the anchor point location during feature extraction to limit background noise and decrease errors such as head motion [45].
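A rough sketch of using a pretrained Inception-v3 as a frozen feature extractor in tf.keras follows; the transfer-learning details (global average pooling, 299 × 299 input, frozen weights) are assumptions, since the paper does not specify them.

import tensorflow as tf

# Load Inception-v3 with ImageNet weights, dropping the 1,000-class head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))
base.trainable = False                                  # freeze the pretrained weights

frames = tf.random.uniform((8, 299, 299, 3), 0, 255)    # stand-in batch of face crops
x = tf.keras.applications.inception_v3.preprocess_input(frames)
features = base(x)                                      # (8, 2048) feature vector per frame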
The pretrained network and the landmark tracker together served to create the feature extractor. All of the pictures were normalized to 640 pixels in width, with the height of each picture set by the vision device's pixel ratio. As shown in Figure-3, we retrieved the characteristics of the 86 landmark points from each frame during a 5-second period from all of the AVI data for each applicant. We converted all the photos to grayscale to enhance picture categorization and eliminate background interference from hair and cosmetics. More than 10,000 photos were utilized in this experiment's test scenarios.
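A minimal sketch of this per-frame preprocessing and landmark extraction with OpenCV and Dlib is given below. Note that Dlib's publicly distributed predictor provides 68 landmarks; the 86-point model reported here would be a custom predictor file, so the model path is an assumption.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Public Dlib model: 68 points; the study's 86-point predictor would be custom.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.png")                           # one sampled AVI frame
scale = 640.0 / frame.shape[1]                            # normalize width to 640 px
frame = cv2.resize(frame, (640, int(frame.shape[0] * scale)))
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)            # grayscale, as described above

for face in detector(gray):
    shape = predictor(gray, face)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
    # 'points' holds the per-frame landmark coordinates used as features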
A. Model Building
To train our APR model, which was developed using a sophisticated CNN implemented in Python with the TensorFlow DL engine [46], we integrated the 120 applicants' personality-labeled data with their extracted characteristics. We normalized the features by rescaling the range of feature values to [0, 1] before feeding them into the neural network model. The initial few layers of the neural network model, which had over 72 thousand neurons, were designed to automatically extract information from photos that had been preprocessed as stated in the preceding section. The retrieved features were then concatenated with additional features before being sent to the output layer for classification. Our CNN structure included four convolutional layers, three pooling layers, ten mixed layers, a fully connected layer, and a softmax output layer.
The neural network's inputs were the extracted elements of the applicants' facial expressions, and the output was their self-assessed big five personality trait scores. As illustrated in Figure-6, the fully connected network was formed utilizing the relationships between distinct nodes. A grayscale picture was used to represent the input. Three filter functions were utilized in each of the four convolutional layers. Convolutional layer 2 had 64 convolution filters, up from 32 in convolutional layer 1. A pooling layer was added after each convolutional layer. The dropout rate was set at 0.1, and the pooling layers (one average-pooling and two max-pooling) had a stride of 2 × 2. There were 2,048 neurons in the last fully connected layer, with dropouts of 0.4 and 0.5. The suggested CNN's last layer was a softmax layer with 50 potential outputs (10 interval-scale classes from 1.1 to 6.0 that reflected the big five personality classifications). To avoid overfitting, we included dropout after the fully connected layers at the end of our convolutional network, beginning with a probability of 0.5 and progressively decreasing the dropout rate until the performance was optimized.
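The following tf.keras sketch is roughly consistent with this description (four convolutional layers, interleaved stride-2 pooling, dropout, a 2,048-neuron fully connected layer, and a 50-way softmax). The input size, kernel sizes, and filter counts beyond those stated in the text are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_apr_model(input_shape=(299, 299, 1)):     # grayscale input; size assumed
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),    # convolutional layer 1: 32 filters
        layers.MaxPooling2D(2),                     # stride-2 pooling
        layers.Conv2D(64, 3, activation="relu"),    # convolutional layer 2: 64 filters
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),    # filter counts hereafter assumed
        layers.AveragePooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Dropout(0.1),
        layers.Flatten(),
        layers.Dense(2048, activation="relu"),      # 2,048-neuron fully connected layer
        layers.Dropout(0.5),
        layers.Dense(50, activation="softmax"),     # 10 score bins x 5 traits = 50 outputs
    ])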
We divided the data in these studies as follows: the test set was 50% and the validation set was 50%, based on the same sampling. In this dataset, each candidate possessed five distinctive attributes (the big five qualities). A total of 4,000 training iterations were completed, with a training batch size of 256 and a learning rate of 0.01.
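Continuing the sketch above, training with the stated hyperparameters might look as follows; the optimizer and loss function are assumptions chosen to fit the 50-way softmax output.

model = build_apr_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# x: preprocessed frames rescaled to [0, 1]; y: integer class bins in [0, 50)
# model.fit(x, y, batch_size=256, ...)   # about 4,000 iterations in total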
B. Model Evaluation
To analyze the performance of our APR, we employed concurrent validity to see how
well the novel measuring technique (the APR) corresponded with the well-established
measuring process (the self-reported inventory). In this experiment, we utilized the Pearson
correlation coefficient (r) to quantify concurrent validity, as described in [6] and [47]. We
also followed [8] to calculate the coefficient of determination (R2) and mean square error (MSE). The R2 reflects the amount of variation in the dependent variable (y) that the predictor can predict or explain: the better the model, the higher the R2. The MSE quantifies the regression model's quality of fit: the higher the value, the bigger the error.
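These three statistics can be computed directly; the sketch below uses SciPy and scikit-learn with hypothetical score vectors.

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.2, 4.1, 2.8, 3.9, 4.5])    # hypothetical self-reported scores
y_pred = np.array([3.0, 4.2, 2.9, 3.7, 4.4])    # hypothetical APR predictions

r, p = pearsonr(y_true, y_pred)                  # concurrent validity
print(f"r = {r:.3f} (p = {p:.4f})")
print(f"R2 = {r2_score(y_true, y_pred):.3f}")
print(f"MSE = {mean_squared_error(y_true, y_pred):.3f}")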
C. Results
Prior to evaluating our APR's performance, we tested the construct validity and reliability of the self-report measures using the Statistical Package for the Social Sciences (SPSS v23). A confirmatory factor analysis revealed that each factor loading was larger than 0.6, while the Kaiser-Meyer-Olkin (KMO) value was larger than 0.8, indicating that the construct validity was adequate in this research. The internal consistency reliability was good because the Cronbach's alpha (α) values were all larger than 0.7, for example, openness to experience (α = .75) and conscientiousness (α = .83).
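For reference, Cronbach's alpha is a standard formula and can be reproduced outside SPSS; a NumPy version is sketched below.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x questionnaire-items matrix of scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - item_variances / total_variance)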
The AI TensorFlow engine effectively learnt and predicted all of the dimensions of the big five qualities, as shown in Table I. The APR could predict all of the genuine big five personality self-assessment scores. Each dimension's Pearson correlation was between 0.966 and 0.976. Each dimension's R2 ranged from 0.933 to 0.952. The MSE for each dimension was between 0.053 and 0.120, and all correlations were determined to be significant (p < 0.01). The greater the R2, the better the estimator (100 percent is excellent); conversely, the lower the MSE, the smaller the estimator error (0 is ideal). Furthermore, the average accuracy of the classifiers (ACC) was 95.36 percent, according to the classification accuracy data.
VI. Discussions
This research responds to recent calls for personality computing research [40], [48], [49]. Validating APR using manually labelled features from any discernible distal cues was fairly difficult in classical personality computing [6]. As a result, several recent studies have used deep learning architectures to predict personality using third-party datasets like Amazon's Mechanical Turk or ChaLearn's First Impressions dataset [40]. However, most of these experiments employed APP, in which the DL engines acted as observers detecting nonverbal clues from interviews and making inferences about the interviewees' personality traits.
Previous research has shown that deep neural networks that learn multimodal features (image frames and audio) perform better than those using unimodal features in predicting the big five qualities. To learn how to better discern an interviewee's personality, we may combine our
visual technique with prosodic elements in future research. Furthermore, the participants in
this research were a specialized sort of professional, which may restrict the generalizability of
the findings. A more varied participant pool should be included in future studies.
VII. Conclusion
Based on just 120 genuine samples of job candidates, this article created an AVI agent for automatic personality recognition. Our APR technique produced an accuracy of over 90%, exceeding earlier relevant laboratory experiments with accuracies ranging from 61% to 75% [6]. The high-performing APR could supplement existing self-report approaches, which may be skewed by job candidates owing to societal pressure to be hired.
References
[1] A. M. Ryan et al., "Culture and testing practices: is the world flat?" Appl.
[3] T. DeGroot and J. Gooty, "Can nonverbal cues be used to make meaningful judgments of job applicant personality traits," Personnel Psychology, vol. 53, no. 4, pp. 925–951, Dec. 2000.
[4] I. Nikolaou and K. Foti, "Personnel selection and personality," in The SAGE
prediction of job interview performance," IEEE Trans. Affective Comput., vol. 9, no. 2,
[11] M. Langer, C. J. König, and K. Krause, "Examining digital interviews for personnel
[12] Y. Xin et al., "Machine learning and deep learning methods for cybersecurity," IEEE
[13] J. Liu et al., "Artificial intelligence in the 21st century," IEEE Access, vol. 6, pp.
prospects," Science, vol. 349, no. 6245, pp. 255–260, Jul. 2015.
classifiers for human posture recognition," in Int. Conf. Comput. Commun. Eng.
Association, 2015.
[20] L. M. Hough and S. Dilchert, "Personality: its measurement and validity employee
distribution of big five personality traits," J. Cross-Cultural Psychology, vol. 38, no. 2,
[23] S. Nestler and M. D. Back, "Applications and extensions of the lens model to
[24] C. A. Gorman, J. Robinson, and J. S. Gamble, "An investigation into the validity of
traits in human-computer collaborative tasks," in Proc. 14th ACM Int. Conf.
Multimodal Interaction, Santa Monica, CA, 2012, pp. 39–46.
[26] L. M. Batrinca, N. Mana, B. Lepri, F. Pianesi, and N. Sebe, "Please, tell me about
13th Int. Conf. Multimodal Interfaces, Alicante, Spain, 2011, pp. 255–262.
traits from human spoken conversations," in Proc. 12th Annu. Conf. Int. Speech
[28] Y. Güçlütürk et al., "Multimodal first impression analysis with deep residual
networks," IEEE Trans. Affective Comput., vol. 9, no. 3, pp. 316–329, Sep. 2018.
Conf. Affective Comput. Intelligent Interaction Workshops and Demos (ACIIW), San
[31] L. Chen et al., "Automatic scoring of monologue video interviews using multimodal
cues," in Proc. INTERSPEECH 2016, the 17th Annu. Conf. Int. Speech Commun.
conversational video resumes," IEEE Trans. Multimedia, vol. 18, no. 7, pp. 1422–
facial cue analysis," in Sixteenth Int. Conf. Advances in ICT for Emerging Regions
(ICTer), Negombo, Sri Lanka, 2016, pp. 288–295.
[34] M. McRorie et al., "Evaluation of four designed virtual agent personalities," IEEE
based on physical appearance," Personality and Social Psychology Bulletin, vol. 35,
[38] R. Petrican, A. Todorov, and C. Grady, "Personality at face value: facial appearance
predicts self and other personality judgments among strangers and spouses," J.
[39] A. Basu et al., "A portable personality recognizer based on affective state
classification using spectral fusion of features," IEEE Trans. Affective Comput., vol.
personality analysis," IEEE Trans. Affective Comput., vol. 9, no. 3, pp. 299–302, Jul.
2018.
automated facial expression analysis," IEEE Trans. Affective Comput., vol. 9, no. 3,
377.
[43] L. R. Goldberg, "The development of markers for the big-five factor structure,"
[44] B. W. Swider, M. R. Barrick, and T. B. Harris, "Initial impressions: what they are,
what they are not, and how they influence structured interview outcomes," J. Appl.
[45] S. Yang and B. Bhanu, "Understanding discrete facial expressions in video using an
emotion avatar image," IEEE Trans. Syst., Man, Cybern. Part B, vol. 42, no. 4, pp.
neural networks on computationally restricted devices," IEEE Access, vol. 6, pp.
[47] L. Zheng et al., "Reliability and concurrent validation of the IPIP big-five factor
[48] D. Xue et al., "Personality recognition on social media with label distribution
[49] M. M. Tadesse, H. Lin, B. Xu, and L. Yang, "Personality predictions based on user
behavior on the facebook social media platform," IEEE Access, vol. 6, pp. 61959–