Abstract- Thanks to advances in computer vision and pattern recognition, machine learning models can effectively analyze human nonverbal signs and infer personality qualities with the use of a camera. In this study, an AI-based interview agent system was developed using asynchronous video interview (AVI) processing and a convolutional neural network trained on features extracted from the applicants' facial expressions in the AVIs and on true personality scores from the self-reported questionnaires of 120 real job applicants. The trial findings reveal that our AI-based interview agent can correctly distinguish an interviewee's "big five" characteristics with an accuracy of 90.9 percent to 97.4 percent. Our research also shows that, despite the lack of large-scale data and the absence of time-consuming human annotation and labelling, the semi-supervised approach can still achieve high accuracy. Existing self-reported personality evaluation techniques, which job candidates may falsify to obtain a position, could thus be supplemented by our interview agent.
Index Terms- Big five, convolutional neural network (CNN), personality computing,
TensorFlow
I. Introduction
The personality of job applicants has long been a concern of industrial and organizational (I/O) psychologists [1]. Some businesses utilize self-reported surveys to
assess job candidates' personalities; nevertheless, job seekers may lie about their personality
qualities in order to increase their chances of landing a job [2]. Because candidates have a
hard time faking nonverbal signs, some employers utilize their facial expressions and other
nonverbal indicators during job interviews to assess candidates' personalities [3]. Due to cost
and time constraints, it is not feasible for every job candidate to attend a live job interview in
person or participate in interviews done through telephone or web conferences [4]. One-way
asynchronous video interview (AVI) software may be used to interview job candidates
automatically at a single point in time. Employers may evaluate the audio-visual recordings at
a later date using this method [5]. Human raters find it difficult to appropriately analyze
applicants' personality attributes based on video pictures when utilizing AVI [6]. Human
raters were unable to effectively identify an applicant's personality just by viewing recorded videos. Because algorithms may offer greater consistency and predictive power than human raters, both I/O psychology and computer science scholars
have suggested that artificial intelligence (AI) may surpass humans in recognising or
predicting an applicant's personality for screening job applicants [8]–[11]. “AI is a discipline
of computer science that tries to create intelligent computers that act in a way comparable to
human intelligence” [12], and it “aims to expand and enhance human capability and
efficiency in activities of remaking nature” [13]. Machine learning (ML) is a key component
of AI, since it “allows computers to learn without having to be explicitly programmed” [14].
Deep learning (DL) is an ML implementation methodology that can “mimic the human brain
process to comprehend input including pictures, sounds, and texts" [15]. DL feature learning can be carried out through three types of ML/DL [12]: unsupervised, supervised, and semi-supervised learning. Unsupervised learning may automatically learn the right answers from a huge quantity of data without needing predetermined labels [10], [16]. Supervised learning, in contrast, requires labelled training data (called "ground truth"). Semi-supervised learning combines the two techniques by employing a small amount of labelled data along with a larger quantity of unlabelled data for pattern recognition; as a result, this strategy may minimize labelling efforts while maintaining high accuracy.
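To make the distinction concrete, below is a minimal sketch of one common semi-supervised strategy, pseudo-labelling, in Python. The helper name and the scikit-learn classifier are illustrative assumptions rather than the method used in this study: a model trained on the small labelled set assigns labels to unlabelled samples, and its most confident predictions are folded back into training.

import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_fit(X_lab, y_lab, X_unlab, threshold=0.9):
    """Train on labelled data, then retrain with confident pseudo-labels."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_lab, y_lab)                      # supervised step on the labelled set
    proba = model.predict_proba(X_unlab)         # predictions for the unlabelled pool
    keep = proba.max(axis=1) >= threshold        # keep only high-confidence samples
    y_pseudo = model.classes_[proba[keep].argmax(axis=1)]
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, y_pseudo])
    return model.fit(X_aug, y_aug)               # retrain on the augmented set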
Semi-supervised learning thereby avoids the main burden of fully supervised machine learning (ML), which requires human labelling and is time intensive [17]. Because convolutional neural networks (CNNs) have been shown to be high-performing models for automatically processing images and inferring first impressions from camera images, this study adopted a CNN to build an interview agent that can automatically recognize a job applicant's personality based on relatively smaller datasets. The following is how the rest of the article is organized: The background of
APR from audio-visual data is discussed in Section 2. Our data processing strategy is
described in Section 3. Section 4 contains a thorough model and its outcomes. Finally, in
Section 5, we discuss and summarize our results as well as our future work.
II. Background
A. Personality Taxonomy
An individual's characteristic patterns of thinking, feeling, and behaving are collectively referred to as personality [19]. This concept is often used to predict whether a job seeker will
do well in a particular work function and participate effectively in a potential cultural
environment [20]. The "big five" features, also known as the five-factor model (FFM) or
OCEAN model, provide academics and practitioners with a well-defined taxonomy for choosing
job candidates [20]. The big five fundamental characteristics of openness, conscientiousness,
extraversion, agreeableness, and neuroticism (poor emotional stability) are classified and defined as follows:
Openness: the degree to which a person is imaginative and creative.
Conscientiousness: the degree to which a person displays self-discipline, dependability, and thoughtfulness.
Extraversion: the degree to which a person is sociable, talkative, and assertive.
Agreeableness: the degree to which a person shows warmth, kindness, and fondness.
Neuroticism: a person's stress, moodiness, and anxiety are reflected in this trait.
B. Personality Computing
From the actor's point of view, personality refers to the reasons, intentions, emotions, and actions stated by a person. From the observer's point of view, personality includes information on the social reputation of an individual, and close contacts such as partners, friends, or colleagues should ideally be able to provide reliable observer ratings [19]. In I/O psychology, self-evaluations are the basic information utilized to anticipate individual workplace behaviour and performance when it is difficult for an observer to determine an applicant's valid big five traits. In a zero-acquaintance setting, such as a job interview [19], self-evaluations may also be used to forecast whether a job applicant is a good match for the job criteria and the business culture.
The respondents use distal signals to externalize their apparent personality (i.e., any observable behaviour that can be perceived by the interviewer, such as facial expression, gaze, posture, body movement, speaking, and prosody). In turn, the interviewer uses proximal cues (i.e., any interviewee behaviours that are actually perceived by the interviewer, including indirectly observable cues) to attribute the unobservable personality traits of the interviewee; these cues translate into perceptions formed by the interviewer [6], [23].
Despite the fact that only the candidates' heads and torsos are visible in AVI,
personality psychology researchers have discovered that an interviewer or rater may still
employ nonverbal clues to determine the candidates' personality qualities [24]. Individuals may accurately infer the genuine personality attributes of zero-acquaintance targets based on short video snippets, according to several experimental investigations [23]. The lens model has been used to automatically recognize, perceive, and synthesize human behavioural signals and personality [6]. Automatic personality recognition (APR) infers a target's self-assessed personality from distal signals by extracting characteristics from the audio-visual data of AVI [9], [25]–[27]. Automatic personality perception (APP) infers the personality that observers attribute to a target from proximate clues [28]. Because investigating proximal cues is difficult, APP relies on distal cues as a proxy, as discussed in [8], [10], [17], [29]–[33]. Artificial agents, avatars, or robots are employed in automatic personality synthesis (APS) to exhibit human-like distal signals, with the purpose of people recognizing and inferring these signals as preset attributes [34], [35]. AVI may be modified and integrated with APR to auto-recognize interviewees' "actual scores" as given by their self-reported personality assessments.
C. Mel-Frequency Cepstrum
The Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound. To get it, a linear cosine transform is applied to a log power spectrum on a nonlinear Mel frequency scale, yielding Mel-frequency cepstral coefficients (MFCCs). MFCCs are derived from a type of cepstral depiction of the audio clip (a nonlinear "spectrum of a spectrum"). In the MFC, the frequency bands are evenly spaced on the Mel scale, which resembles the human auditory system's response more closely than the linearly spaced frequency bands of the normal cepstrum. MFCCs are commonly derived as follows:
1. Take the Fourier transform of a windowed excerpt of the signal.
2. Use triangular overlapping windows to map the spectrum's powers onto the Mel scale.
3. Take the logarithm of the power at each Mel frequency.
4. Take the discrete cosine transform of the Mel log powers from the list above.
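As a concrete illustration, the following minimal sketch extracts MFCCs with the librosa library. The library choice, sampling rate, and file name are assumptions; the paper does not name its audio toolchain.

import librosa

y, sr = librosa.load("response.wav", sr=16000)        # hypothetical AVI audio clip
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # steps 1-4 above, 13 coefficients
print(mfccs.shape)                                    # (13, number_of_frames)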
Successful interview performance depends on both communication skills and personal qualities. For information to be passed accurately, both verbal and nonverbal means of communication are used, including gestures, posture, and tone. To this end, this study proposes a novel technique that can automatically recognize a job applicant's personality from AVI recordings.
A. Data Collection
We created cloud-based AVI software, similar to the work in [42], to build our dataset
in a genuine job interview setting. The AVI server may accept recorded video prompts,
construct interview scripts, send video prompts from the interview, and receive video answers
using Google Cloud Storage. The content of the video replies may subsequently be utilized for algorithmic studies, such as audio and visual data analytics. Interview replies may thus be recorded at one moment in time but assessed afterwards by an algorithm. Applicants were recruited with the assistance of a human resource (HR) organization: our sponsor made a job description available to its members on a website for recruiting 2–3 HR experts from a connected firm in Ahmedabad, Rajkot, and Baroda. Members who were interested sent their resumes to the researchers, who vetted the
resumes based on the job description. A total of 120 candidates were asked to log into the AVI system, which they could access from mobile or computer devices at any time of day through a browser. The replies of the respondents were captured for examination, including both audio and visual information. Our algorithms would record and evaluate the entirety of the candidates' interview procedures and replies, including audio and video, and utilize it as a reference for hiring recommendations.
During the AVI, the interview questions were arranged in a regular fashion. The same
five questions were given to all candidates, and they were behaviorally geared to evaluate the
candidates' communication abilities in light of the job description. Each question was shown
on a separate screen, and as the candidates entered the screen, the audio for the text questions
began immediately. The candidates were allowed a maximum of 3 minutes to answer each
question; the questions were displayed on-screen one at a time in order. Within the 3-minute time limit, candidates could opt to skip to the following question. After 3 minutes, a new screen
with the following question opened automatically. The whole video interview procedure took
B. Data Labelling
We used the International Personality Item Pool (IPIP) inventory [43] to evaluate the applicants' self-rated big five qualities and thereby acquire genuine ratings for the individual big five features [6]. All candidates were requested to take the IPIP survey online before participating in the AVI and were told that the survey data would be sent to the researchers alone and would have no bearing on the hiring recommendation. This procedure was carried out to lessen the impact of social desirability, which may bias self-rated personality
C. Feature Extraction
We began with the pretrained Inception-v3 model, trained on the ImageNet dataset of over 14 million photos organized into 1,000 classes, to capture the candidates' facial expressions. In addition, as shown in Figure-2, we trained our face recognition model using OpenCV and Dlib while tracking 86 facial landmark points in every frame. Furthermore, since the anchor landmark point is unaffected by facial emotions, we utilized it as the anchor point location during feature extraction to limit background noise and decrease errors such as head motion [45].
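A rough sketch of using a pretrained Inception-v3 as a frozen feature extractor in tf.keras follows; the transfer-learning details (global average pooling, 299 × 299 input, frozen weights) are assumptions, since the paper does not specify them.

import tensorflow as tf

# Load Inception-v3 with ImageNet weights, dropping the 1,000-class head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))
base.trainable = False                                  # freeze the pretrained weights

frames = tf.random.uniform((8, 299, 299, 3), 0, 255)    # stand-in batch of face crops
x = tf.keras.applications.inception_v3.preprocess_input(frames)
features = base(x)                                      # (8, 2048) feature vector per frame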
The pretrained network and the landmark tracker together served to create the feature extractor. All of the pictures were normalized to 640 pixels in width, with the height of each picture set by the vision device's pixel ratio. As shown in Figure-3, we retrieved the characteristics of the 86 landmark points from each frame during a 5-second period from all of the AVI data for each applicant. We converted all the photos to grayscale to enhance picture categorization and eliminate background interference from hair and cosmetics. More than 10,000 photos were utilized in this experiment's test scenarios.
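A minimal sketch of this per-frame preprocessing and landmark extraction with OpenCV and Dlib is given below. Note that Dlib's publicly distributed predictor provides 68 landmarks; the 86-point model reported here would be a custom predictor file, so the model path is an assumption.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Public Dlib model: 68 points; the study's 86-point predictor would be custom.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.png")                           # one sampled AVI frame
scale = 640.0 / frame.shape[1]                            # normalize width to 640 px
frame = cv2.resize(frame, (640, int(frame.shape[0] * scale)))
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)            # grayscale, as described above

for face in detector(gray):
    shape = predictor(gray, face)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
    # 'points' holds the per-frame landmark coordinates used as features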
A. Model Building
To train our APR model, which was developed using a sophisticated CNN implemented in Python with the TensorFlow DL engine [46], we integrated the 120 applicants' personality-labeled data with their extracted characteristics. We normalized the features by rescaling the range of feature values to [0, 1] before feeding them into the neural network model. The initial few layers of the neural network model, which had over 72 thousand neurons, were designed to automatically extract information from photos that had been preprocessed as stated in the preceding section. The retrieved features were then concatenated with additional features before being sent to the output layer for classification. Our CNN structure included four convolutional layers, three pooling layers, ten mixed layers, a fully connected layer, and a softmax output layer.
The neural network's inputs were the extracted elements of the applicants' facial expressions, and the output was their self-assessed big five personality trait scores. As illustrated in Figure-6, the fully connected network was formed utilizing the relationships between distinct nodes. A grayscale picture was used to represent the input. Three filter functions were utilized in each of the four convolutional layers. Convolutional layer 2 had 64 convolution filters, up from 32 in convolutional layer 1. A pooling layer was added after each convolutional layer. The dropout rate was set at 0.1, and the pooling layers (one average-pooling and two max-pooling) had a stride of 2 × 2. There were 2,048 neurons in the last fully connected layer, with dropouts of 0.4 and 0.5. The suggested CNN's last layer was a softmax layer with 50 potential outputs (10 interval-scale classes from 1.1 to 6.0 that reflected the big five personality classifications). To avoid overfitting, we included dropout after the fully connected layers at the end of our convolutional network, beginning with a probability of 0.5 and progressively decreasing the dropout rate until the performance was optimized.
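The following tf.keras sketch is roughly consistent with this description (four convolutional layers, interleaved stride-2 pooling, dropout, a 2,048-neuron fully connected layer, and a 50-way softmax). The input size, kernel sizes, and filter counts beyond those stated in the text are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_apr_model(input_shape=(299, 299, 1)):     # grayscale input; size assumed
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),    # convolutional layer 1: 32 filters
        layers.MaxPooling2D(2),                     # stride-2 pooling
        layers.Conv2D(64, 3, activation="relu"),    # convolutional layer 2: 64 filters
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),    # filter counts hereafter assumed
        layers.AveragePooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Dropout(0.1),
        layers.Flatten(),
        layers.Dense(2048, activation="relu"),      # 2,048-neuron fully connected layer
        layers.Dropout(0.5),
        layers.Dense(50, activation="softmax"),     # 10 score bins x 5 traits = 50 outputs
    ])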
We divided the data in these studies as follows: the test set was 50% and the validation set was 50%, based on the same sampling. In this dataset, each candidate possessed five distinctive attributes (the big five qualities). A total of 4,000 training iterations were completed, with a training batch size of 256 and a learning rate of 0.01.
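Continuing the sketch above, training with the stated hyperparameters might look as follows; the optimizer and loss function are assumptions chosen to fit the 50-way softmax output.

model = build_apr_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# x: preprocessed frames rescaled to [0, 1]; y: integer class bins in [0, 50)
# model.fit(x, y, batch_size=256, ...)   # about 4,000 iterations in total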
B. Model Evaluation
To analyze the performance of our APR, we employed concurrent validity to see how
well the novel measuring technique (the APR) corresponded with the well-established
measuring process (the self-reported inventory). In this experiment, we utilized the Pearson
correlation coefficient (r) to quantify concurrent validity, as described in [6] and [47]. We
also followed [8] to calculate the coefficient of determination (R2) and mean square error (MSE). The R2 reflects the amount of variation in the dependent variable (y) that the predictor can predict or explain: the better the model, the higher the R2. The MSE quantifies the regression model's quality of fit: the higher the value, the bigger the error.
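These three statistics can be computed directly; the sketch below uses SciPy and scikit-learn with hypothetical score vectors.

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.2, 4.1, 2.8, 3.9, 4.5])    # hypothetical self-reported scores
y_pred = np.array([3.0, 4.2, 2.9, 3.7, 4.4])    # hypothetical APR predictions

r, p = pearsonr(y_true, y_pred)                  # concurrent validity
print(f"r = {r:.3f} (p = {p:.4f})")
print(f"R2 = {r2_score(y_true, y_pred):.3f}")
print(f"MSE = {mean_squared_error(y_true, y_pred):.3f}")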
C. Results
Prior to evaluating our APR's performance, we tested the construct validity and reliability of the self-report measures using the Statistical Package for the Social Sciences (SPSS v23). A confirmatory factor analysis revealed that each factor loading was larger than 0.6, while the Kaiser-Meyer-Olkin (KMO) value was larger than 0.8, indicating that the construct validity was adequate in this research. The internal consistency reliability was good because the Cronbach's alpha (α) values were all larger than 0.7, for example, openness to experience (α = .75) and conscientiousness (α = .83).
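For reference, Cronbach's alpha is a standard formula and can be reproduced outside SPSS; a NumPy version is sketched below.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x questionnaire-items matrix of scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - item_variances / total_variance)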
The AI TensorFlow engine effectively learnt and predicted all of the dimensions of the big five qualities, as shown in Table I. The APR could predict all of the genuine big five personality self-assessment scores. Each dimension's Pearson correlation was between 0.966 and 0.976. Each dimension's R2 ranged from 0.933 to 0.952. The MSE for each dimension was between 0.053 and 0.120, and all correlations were determined to be significant (p < 0.01). The greater the R2, the better the estimator (100 percent is excellent); conversely, the lower the MSE, the smaller the estimator error (0 is ideal). Furthermore, the average accuracy of the classifiers (ACC) was 95.36 percent, according to the classification accuracy data.
VI. Discussions
This research responds to recent calls for personality computing research [40], [48], [49]. Validating APR using manually labelled features from any discernible distal cues was fairly difficult in classical personality computing [6]. As a result, several recent studies have used deep learning architectures to predict personality using third-party datasets like Amazon's Mechanical Turk or ChaLearn's First Impressions dataset [40]. However, most of these experiments employed APP, in which the DL engines acted as observers detecting nonverbal clues from interviews and making inferences about the interviewees' personality traits.
Previous research has shown that deep neural networks that learn multimodal features (image frames and audio) perform better than those using unimodal features in predicting the big five qualities. To learn how to better discern an interviewee's personality, we may combine our
visual technique with prosodic elements in future research. Furthermore, the participants in
this research were a specialized sort of professional, which may restrict the generalizability of
the findings. A more varied participant pool should be included in future studies.
VII. Conclusion
Based on just 120 genuine samples of job candidates, this article created an AVI agent for automatic personality recognition. Our APR technique produced an accuracy of over 90%, exceeding earlier relevant laboratory experiments with accuracies ranging from 61% to 75% [6]. The high-performing APR could supplement existing self-report approaches, which may be skewed by job candidates owing to societal pressure to be hired.
References
[1] A. M. Ryan et al., "Culture and testing practices: is the world flat?" Appl.
[3] T. DeGroot and J. Gooty, "Can nonverbal cues be used to make meaningful judgments of job applicant personality traits," Personnel Psychology, vol. 53, no. 4, pp. 925–951, Dec. 2000.
[4] I. Nikolaou and K. Foti, "Personnel selection and personality," in The SAGE
prediction of job interview performance," IEEE Trans. Affective Comput., vol. 9, no. 2,
[11] M. Langer, C. J. König, and K. Krause, "Examining digital interviews for personnel
[12] Y. Xin et al., "Machine learning and deep learning methods for cybersecurity," IEEE
[13] J. Liu et al., "Artificial intelligence in the 21st century," IEEE Access, vol. 6, pp.
prospects," Science, vol. 349, no. 6245, pp. 255–260, Jul. 2015.
classifiers for human posture recognition," in Int. Conf. Comput. Commun. Eng.
Association, 2015.
[20] L. M. Hough and S. Dilchert, "Personality: its measurement and validity employee
distribution of big five personality traits," J. Cross-Cultural Psychology, vol. 38, no. 2,
[23] S. Nestler and M. D. Back, "Applications and extensions of the lens model to
[24] C. A. Gorman, J. Robinson, and J. S. Gamble, "An investigation into the validity of
traits in human-computer collaborative tasks," in Proc. 14th ACM Int. Conf.
Multimodal Interaction, Santa Monica, CA, 2012, pp. 39–46.
[26] L. M. Batrinca, N. Mana, B. Lepri, F. Pianesi, and N. Sebe, "Please, tell me about
13th Int. Conf. Multimodal Interfaces, Alicante, Spain, 2011, pp. 255–262.
traits from human spoken conversations," in Proc. 12th Annu. Conf. Int. Speech
[28] Y. Güçlütürk et al., "Multimodal first impression analysis with deep residual
networks," IEEE Trans. Affective Comput., vol. 9, no. 3, pp. 316–329, Sep. 2018.
Conf. Affective Comput. Intelligent Interaction Workshops and Demos (ACIIW), San
[31] L. Chen et al., "Automatic scoring of monologue video interviews using multimodal
cues," in Proc. INTERSPEECH 2016, the 17th Annu. Conf. Int. Speech Commun.
conversational video resumes," IEEE Trans. Multimedia, vol. 18, no. 7, pp. 1422–
facial cue analysis," in Sixteenth Int. Conf. Advances in ICT for Emerging Regions
(ICTer), Negombo, Sri Lanka, 2016, pp. 288–295.
[34] M. McRorie et al., "Evaluation of four designed virtual agent personalities," IEEE
based on physical appearance," Personality and Social Psychology Bulletin, vol. 35,
[38] R. Petrican, A. Todorov, and C. Grady, "Personality at face value: facial appearance
predicts self and other personality judgments among strangers and spouses," J.
[39] A. Basu et al., "A portable personality recognizer based on affective state
classification using spectral fusion of features," IEEE Trans. Affective Comput., vol.
personality analysis," IEEE Trans. Affective Comput., vol. 9, no. 3, pp. 299–302, Jul.
2018.
automated facial expression analysis," IEEE Trans. Affective Comput., vol. 9, no. 3,
377.
[43] L. R. Goldberg, "The development of markers for the big-five factor structure,"
[44] B. W. Swider, M. R. Barrick, and T. B. Harris, "Initial impressions: what they are,
what they are not, and how they influence structured interview outcomes," J. Appl.
[45] S. Yang and B. Bhanu, "Understanding discrete facial expressions in video using an
emotion avatar image," IEEE Trans. Syst., Man, Cybern. Part B, vol. 42, no. 4, pp.
neural networks on computationally restricted devices," IEEE Access, vol. 6, pp.
[47] L. Zheng et al., "Reliability and concurrent validation of the IPIP big-five factor
[48] D. Xue et al., "Personality recognition on social media with label distribution
[49] M. M. Tadesse, H. Lin, B. Xu, and L. Yang, "Personality predictions based on user
behavior on the facebook social media platform," IEEE Access, vol. 6, pp. 61959–