Book 1

Sn No
4
5
9
10
11
12
3
13
14
15
16
17
25
19
20
23
18
21
22
24
26
27
28
29
30
31
32
33
34
36
37
38
39
40
41
43
44
45
46
47
49
51
50
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
72
73
74
75
76
77
78
79
80
81
85
82
83
84
86
87
Methodology/ Contribution
Lexical Hypothesis
Buchanan and Smith proposed the use of internet-based data for personality research. Their reported results provide evidence in
support of web-based personality assessment.
Pennebaker and King [11] established linguistic style as an independent and meaningful basis for automatic personality recognition.
They developed the ESSAYS dataset, consisting of 2400 stream-of-consciousness texts labelled with personality, which remains a standard
dataset for text-based personality computation to date.
Gill and Oberlander studied the role of lexical features in predicting Extroversion. They report that LIWC and MRC features represent
Psychoticism and Neuroticism better than Extroversion. They also conclude that Extrovert language is less formal, more positive, and longer
than Introvert language. However, both are closed-vocabulary feature sets and do not take context into account.
Gosling et al proposed two mechanisms for personality expression in the environment: identity claims and behavioral residue.
Ashton, Lee and Goldberg provide a hierarchical examination of 1710 English adjectives to provide evidence for the Five
Factor Model.
Vazire and Gosling [114] established personality identification from personal websites, which mainly consist of identity
claims. The results suggest Openness is the easiest trait to judge. Further, the scores for Extraversion and
Agreeableness were found to be slightly enhanced, yet accurate. The correlations provide strong evidence that identity claims alone reveal
the personality traits of a person.
One of the first works to use machine learning for personality profiling from text was Argamon et al [12] (Arg05). A support
vector machine with a linear kernel (SMO) was used to learn linear separators for the high and low classes of neuroticism and extraversion
in authors of informal text. The corpus was based on essays written by psychology students at the University of Texas from 1997 to
2003. The classes were formed by division into thirds: the upper third is high, the lower third is low, and the middle third is
ignored. The reported accuracy for binary classification was around 58 percent, an absolute improvement of 8 percent over
their baseline.
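The division by thirds described for Arg05 can be illustrated with a minimal sketch. The function name, the author identifiers and the scores below are hypothetical and serve only to show the labelling rule; they are not taken from the original work.

```python
def split_by_thirds(scores):
    """Label the lowest third 'low', the highest third 'high', and drop the middle.

    `scores` maps an author id to a trait score (hypothetical data layout).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    third = len(ranked) // 3
    if third == 0:
        return {}  # too few authors to form thirds
    labels = {}
    for author, _ in ranked[:third]:
        labels[author] = "low"
    for author, _ in ranked[-third:]:
        labels[author] = "high"
    return labels


# Illustrative neuroticism scores for six authors (made-up values).
scores = {"a1": 1.2, "a2": 4.8, "a3": 2.9, "a4": 3.1, "a5": 4.1, "a6": 1.9}
labels = split_by_thirds(scores)
print(labels)
```

Only the labelled authors would then be passed to the binary classifier; the middle third never appears in training or testing.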
Mairesse and Walker modelled personality traits using a combination of lexical features and speech acts. They performed
regression as well as binary classification on the Essays and EAR Corpus datasets, and their results suggest improvements over the
baseline. The improvements, however, appear larger for the informal spoken language of the EAR Corpus than for the formal texts of
the Essays.
Mehl, Gosling and Pennebaker evaluated the association between the Big 5 traits and informal language used in daily contexts. They
collected language usage and several other features using an Electronically Activated Recorder and correlated them with the Big 5 personality
traits. The results suggested that time spent talking was indicative of students' Extroversion, class attendance was positively
correlated with Conscientiousness, and swear word use was negatively correlated with Agreeableness. It was also observed that the
correlation between language and traits differs markedly across genders.
Oberlander et al [16] (Ob06), in an attempt at personality profiling with weblogs [18], used a Naïve Bayes (NB) model for binary as
well as multi-class classification with n-grams as features. Due to a skewed distribution over the yes and no classes, the Openness trait
was excluded from this work. For their hardest multi-class classification task, they improve the classification accuracy of Extraversion
by 10.9 percent (32.2% relative) and Agreeableness by 30.4 percent (77.2% relative) over the Arg05 baseline. In [24] Estival et al (Est07) used
email data from 1033 respondents with a combined total of 9836 email messages. They used Part of Speech (POS) tags and a custom
named entity organizer (NE). Five demographic traits were distinguished along with the Big 5 traits. The best reported accuracy
for the psychometric traits is 56.7 percent, obtained with LibSVM.
A seminal work in text-based personality computation by Mairesse et al [17] (Mai07) extracted 88 LIWC [13] word categories and
14 MRC Psycholinguistic database [14] categories from two datasets, ESSAYS [11] and conversation extracts from the EAR Corpus
[15]. Taking the Big 5 traits as reference, they tackle personality recognition from text as classification, regression and ranking
problems. For classification, six different algorithms were used: C4.5 decision tree learning (J48), Nearest Neighbour (NN), NB, Ripper
(JRip), AdaBoost, and SMO. SMO gave the best results for ESSAYS with 62.11 percent accuracy for Openness to
experience, whereas for the EAR Corpus NB and AdaBoost performed best with an accuracy of 73 percent for Extraversion. For
regression, a linear regression model, an M5’ regression model tree, an M5’ regression model tree returning a linear model, a
REPTree decision tree, and a model based on support vector machines with linear kernels (SMOreg) were used. The M5’ model
tree performs best, with an error decrease of 6.7 percent from the baseline for the ESSAYS dataset, whereas the models perform badly
on the small EAR Corpus. For the ranking problem, RankBoost [17] was used. For the ESSAYS data, Openness to
experience produced the best ranking model with a loss of 0.39 and Agreeableness produced the worst with a loss of 0.46, whereas for
the EAR Corpus Agreeableness and Conscientiousness produced losses of 0.31 and 0.33, outperforming the rest.
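To put the ranking losses above in context, the sketch below computes a pairwise ranking loss. It assumes the loss is the fraction of comparable pairs that the model orders incorrectly, which is a common way to score a ranker; the text here does not spell out the exact definition used in Mai07, so treat this as an illustration only. All names and numbers are hypothetical.

```python
from itertools import combinations


def ranking_loss(true_scores, predicted_scores):
    """Fraction of pairs with different true scores that the prediction misorders.

    Pairs that the prediction ties are counted as misordered (an assumption).
    """
    misordered = 0
    comparable = 0
    for i, j in combinations(range(len(true_scores)), 2):
        if true_scores[i] == true_scores[j]:
            continue  # a tie in the reference carries no ordering information
        comparable += 1
        true_diff = true_scores[i] - true_scores[j]
        pred_diff = predicted_scores[i] - predicted_scores[j]
        if true_diff * pred_diff <= 0:
            misordered += 1
    return misordered / comparable if comparable else 0.0


# Four hypothetical authors: self-reported trait scores and model outputs.
true = [1.0, 2.0, 3.0, 4.0]
pred = [0.2, 0.9, 0.5, 1.5]
loss = ranking_loss(true, pred)
print(round(loss, 3))
```

Under this definition a perfect ranking scores 0 and a fully reversed ranking scores 1, so lower values indicate a better ranking model.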
Gosling et al in [] established the examination of Facebook profiles as a valid means of personality identification. Their findings suggest
moderate to strong consensus in identifying all personality traits from Facebook profiles. However, they also suggest self-enhancement
for Emotional Stability and Openness to Experience.
Pen - LIWC
Golbeck et al in [22] (Gol11a) worked on predicting personality from social media. They extracted at least 2000 tweets
of 279 users along with their self-reported personality assessments using BFI. LIWC and MRC features were obtained using
a text analysis tool along with a word-by-word sentiment analyser of the tweets. Gaussian process (GP) and ZeroR (ZR)
algorithms were used for regression analysis of the features to predict personality information. Openness was the best
predicted trait and Neuroticism the worst, with mean absolute errors (MAE) of 12 and 18 percent respectively on a
normalized 0-1 scale.
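Since several entries in this table report MAE or RMSE on normalized scores, a minimal sketch of both metrics may help when comparing them. The 1 to 5 questionnaire range, the sample scores and the function names are assumptions made for illustration; the constant predictor mimics the kind of baseline ZeroR produces for a numeric target (the mean).

```python
import math


def normalise(score, low=1.0, high=5.0):
    """Map a raw trait score onto a 0-1 scale (the 1-5 range is an assumption)."""
    return (score - low) / (high - low)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical self-reported scores for four users.
actual = [normalise(s) for s in [3.0, 4.0, 2.0, 5.0]]

# ZeroR-style baseline: always predict the mean of the observed scores.
mean_score = sum(actual) / len(actual)
baseline = [mean_score] * len(actual)

mae = mean_absolute_error(actual, baseline)
rmse = root_mean_square_error(actual, baseline)
print(mae, rmse)
```

An MAE of 0.12 on this scale corresponds to the "12 percent" style of reporting used above. RMSE penalises large errors more heavily, so it is never smaller than MAE on the same predictions.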
In [23] (Gol11b) they used 161 statistical features from the Facebook profile pages of the 279 people for personality analysis.
LIWC and certain hand-crafted features were used for regression with M5’Rules (M5R) and GP. The MAE for each
personality trait was around 11 percent for both algorithms.
Iacobelli et al (Ia11) estimated the personality of bloggers using a variety of features involving bi-grams, n-grams,
stop-words and LIWC categories. They used SVM for classification, with the best accuracy of 84.4 percent for Openness
and the worst accuracy of 70.5 percent for Neuroticism.
Quercia et al in [26, 2011] (Qu11) also worked on personality prediction from social media. They tried to map the influence
scores of 335 Twitter users to personality traits. Regression analysis was done using the M5R algorithm with 10-fold cross-validation,
achieving around 0.88 root mean square error (RMSE). They analysed the correlations between personality traits
and user behaviour, classifying the users as listeners, popular, hi-read and influential.
Chittaranjan et al [20] (Chi11) studied smartphone usage statistics to derive personality information. They used a subset of
the Lausanne data collection campaign (LDCC) dataset, a passive, continuous, non-intrusive collection of smartphone
usage logs. Binary classifiers were trained with two models, C4.5 and SVM with an RBF kernel, and compared against a baseline that always
chooses the majority class. The reported average accuracies for C4.5 and SVM are 64.8 and 72.1 percent respectively.
Golbeck et al [78] (Gol12) analysed behaviour on social media to predict people's personality traits. They used GP and
ZeroR algorithms on Twitter-based data and the reported MAE for the Big 5 traits was in the range of 11 to 19 percent.
An attempt to predict personality from Facebook was made by Bachrach et al in [27] (Bac12). They extracted statistical
profile features from 5000 Facebook accounts and, after correlating them with personality traits, performed a linear
multivariate regression using 10-fold cross-validation. Their results, at around 0.28 RMSE, provide significant evidence that
personality traits can be extracted from social media profile statistics.
Bai et al also predicted personality from user behaviour on social networks in [28, 2012] (Bai12). They used the Chinese
social network platform Renren. A custom set of 41 features, including basic information, emotion and time distribution,
was used on 209 individuals. The C4.5 classification algorithm with gain ratio was used and achieved accuracy of around 0.8 for
at least four of the five traits.
Kermanidis, L in [30] (Ker12) used SVM with Bagging to recognise personality traits from Greek essays using linguistic
features combined from LIWC and MRC. The reported accuracies range from 64 to 86 percent.
Jacopo et al in [21] (Ja12) explored the use of social network structure for inferring personality traits. They used a custom
dataset consisting of the cell phone usage of 53 subjects over a period of 3 months. Random Forest ensemble classifiers were used
for binary classification of each personality trait, and the average reported accuracy for the Big 5 traits using call features was
68.96. Extroversion and Agreeableness were the best predicted traits (72.9, 71.2), and Conscientiousness and Neuroticism the worst
(65.4, 66.1).
Wald et al [66] (Wal12) used several machine learning models, RepTree, Dtable and Linear Regression for ranking individuals
according to their personality traits using their own data collected from Facebook.
In an attempt at unsupervised personality recognition from social networks, Celli used correlations from Mairesse et al to create a
personality model for a social network based dataset. The reported accuracy is in line with that of Mairesse et al. This suggests it is
possible to exploit existing correlations between linguistic markers and personality traits to develop comprehensive personality models
which do not require subject details.
In the same year, in [29], Celli and Rossi (Cel12) used a hybrid approach combining the features used by the earlier works Mai07
and Qu11. They developed an unsupervised personality recognition system that exploits correlations and does
not require previously annotated data.
Wald et al [66] (Wal12) used REPTree for ranking individuals according to their personality traits using their own data
collected from Facebook.
Qiu et al examined the associations between personality traits and linguistic cues in microblogs. Due to insignificant
correlation, self-reports and single-observer reports are not sufficient. They also calculated inter-observer agreement,
showing moderate to strong agreement on all the Big 5 traits.
Explored use of crowdsourcing in obtaining personality impressions,
Shen, Brdiczka and Liu used emails for personality prediction with three generative models: Joint, Sequential and Survival.
NB, DT and SVM techniques were used for training and the reported results suggest that the Survival model, in which the features are
selected independently for each label, performs best. The Joint model has the worst performance as it assumes features are selected
jointly by all labels. In the Sequential model, labels are sequentially selected and they in turn select the features. They also
measured the trait-specific predictive power of their extracted email features using Information Gain.
Firoj et al, in their effort to recognise personality traits in [38] (Fir13), followed a bag-of-words approach using unigrams as features.
They trained SMO, Bayesian Logistic Regression (BLR), and Multinomial Naïve Bayes (mNB) classifiers on the MyPersonality
dataset.
Verhoeven et al [31] (Vdd13) used SVM and ensemble learning to train classifiers for each of the five traits. SMO based meta-
learning experiments using most frequent character trigrams (MFT) were used for Facebook as well as Essays datasets
independently. The reported results have an average f-score of 0.72 over a 50 percent baseline.
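As an illustration of most-frequent character trigram (MFT) features of the kind mentioned for Vdd13, the sketch below builds a trigram vocabulary from a corpus and turns a text into relative frequencies over it. The function names, the lower-casing step and the toy corpus are assumptions; the original feature pipeline may differ in its details.

```python
from collections import Counter


def char_trigrams(text):
    """Return every overlapping three-character window of `text`."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def most_frequent_trigrams(documents, k):
    """Pick the k trigrams that occur most often across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(char_trigrams(doc.lower()))
    return [gram for gram, _ in counts.most_common(k)]


def featurise(text, vocabulary):
    """Relative frequency of each vocabulary trigram within one text."""
    counts = Counter(char_trigrams(text.lower()))
    total = sum(counts.values()) or 1  # guard against texts shorter than 3 chars
    return [counts[gram] / total for gram in vocabulary]


# Toy corpus standing in for status updates or essays (made-up strings).
corpus = ["aaaa", "aaab"]
vocab = most_frequent_trigrams(corpus, 2)
vector = featurise("aaab", vocab)
print(vocab, vector)
```

The resulting vectors could then be passed to any classifier, one model per trait.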
In [32] Farnadi et al (Fa13) predicted personality traits for both the datasets using four sets of features including LIWC categories,
temporal features like frequency and number of status updates, social network features and other features like number of statuses
per user, number of capitalised words/letters/urls/proper names and words used more than once. They used SVM, kNN and NB
algorithms with a weighted average of precision, recall and f-measure based on class average as evaluation measures. The reported
results suggest that time features perform poorly with NB, whereas other feature/algorithm combinations work quite well. Cross-domain
generalisation from myPersonality to Essays gave poorer results than the other way around, while still outperforming the
baselines. This suggests that training set size matters when performing personality prediction across domains.
Tomlinson et al [33, 2013] (To13) sought to predict conscientiousness from Facebook statuses. They relied on identifying the
specificity and objectivity of the verbs used in order to predict personality traits. It was found that conscientiousness
is normally distributed within the dataset, so better results may be attained by an approach that predicts only the outliers.
A linear regression model was used to analyse the features and the best reported RMSE was 0.63.
Facebook data was also used by Markovikj et al in [34] (Mar13) for personality trait prediction. They used a variety of lexical and
affective features as well as the social and demographic information from the MyPersonality dataset. SVM with Boosting was used
for predicting personality traits and the reported accuracy was almost 90 percent.
Iacobelli et al in [35] (Iac13) attempted personality trait recognition on both datasets. The posts in the MyPersonality corpus were
combined by author, and unigrams, bigrams and n-grams were combined as features for each document; the
same was done for Essays. Conditional Random Fields were used for structured classification in addition to NB, SMO and Logistic
Regression (LR). The reported accuracies fall between 44.8 and 66 percent for MyPersonality and between 51.62 and 61.81 percent for Essays.
Speech act annotation was used for MyPersonality corpus by Appling et al [36] (App13). They reported the correlations to
personality traits, which suggested some of the traits positively being predicted with speech acts and others negatively related. As
per their work Openness does not predict any speech act though.
Saif et al in [37] (Sai13) predicted personality traits using different lexical features such as those used by Mai07, token unigrams,
Average Information Content (AIC), and the NRC and Turney lexicons. They trained an SVM classifier on the Essays dataset and the
reported accuracy ranges from 55 to 60 percent.
Poria et al [40] (Por13) combined common-sense-based features with psycholinguistic features and frequency-based
features for classification using SMO-based supervised classifiers. The Essays dataset was used and the reported
results are in the range of 63 to 66 percent.
In [41] Zuo et al used a weighted multi-label k-nearest neighbour model (ML-kNN) to predict personality traits from the Essays
dataset. They used custom linguistic and emotion-based features and the reported accuracy is about 65 percent.
Montjoye et al [82] (Mont13) used smartphone-based data for personality prediction. SVM with custom call record and
location-based features resulted in a prediction accuracy for the Big 5 traits in the range of 29 to 56 percent.
Schwartz et al [83] (Shwa13) used Facebook-based data for open-vocabulary personality analysis and obtained significant
correlations for the Big 5 traits, with an R score of up to 42 percent compared to the LIWC baseline score of 29 percent.
Shared challenge for computational Personality recognition 2
Farnadi et al used variations of multivariate regression for personality analysis of YouTube vlogs using a combination of audio-visual
and text features. However, the multi-target regression techniques did not clearly outperform a single-target approach, even with
a promising correlation between the traits.
Park, Kosinski et al proposed a Language-Based Assessment (LBA) for modelling personality traits using Facebook statuses from the
extended MyPersonality dataset. They trained a Ridge regressor on a late fusion of word/phrase relative frequencies with topic usage,
reduced using univariate feature selection and randomised PCA. The reported results suggest significant correlation between the LBA
results and self and observer personality reports.
In [42], Gou et al (Gou14) performed automatic recognition of personality from Twitter data. They used LIWC word
categories for Big 5 trait profiling of 256 IBM employees who use Twitter. Lima and Castro in [81] (Lim14) used social
media data for personality prediction. Supervised learning for binary classification with NB, SVM and MLP was applied to
three available datasets based on Twitter data. The reported prediction accuracy is about 83 percent.
One of the first works using neural networks in text-based personality computation is Kalghati et al [46] (Kal15). They used
a Twitter API to fetch tweets which, after pre-processing, are fed to multi-layer perceptrons (MLP), one for each trait among
the Big 5.
Chorley [80] (Cho15) used location based social network data for analysing personality. They tried to correlate the big 5
personality traits of 174 Foursquare users and got significant results.
In Pratama et al [85] (Prat15), a modified version of the MyPersonality dataset was used for personality analysis using MNB, KNN,
SVM and a combination of the three. The reported results are around 60 percent. They also performed respondent testing of the
automatic personality prediction system using Twitter data, with an accuracy of around 65 percent.
Shared Task for Author profiling from Twitter data.
Arroju, Hassan and Farnadi proposed Ensemble Regression Chains Corrected (ERCC), a multivariate regression model for
predicting personality traits using n-grams and extended LIWC categories. They used multilingual Twitter data from PAN 2015.
Poddar et al in [43] (Pod15) introduced an adjectival marker technique for personality recognition from ONSW. They used
biographical data of 574 personalities, extracting information from websites using a Python web crawler. Regression using
LASSO resulted in personality recognition accuracy ranging from 81 to 92 percent.
Liu and Zhu proposed the use of stacked autoencoders for unsupervised learning of a Linguistic Representation Feature Vector (LRFV)
based on SLIWC and FFT from Sina microblog. The features obtained were used to train a linear regression model and the results
outperform the selected baselines.
Tighe et al used feature reduction techniques for personality trait classification on Essays. Features extracted from LIWC categories
were reduced with Information Gain and PCA. Linear Logistic Regression, LibSVM and SMO were trained, and the reported
results were comparable to those of their predecessors with a far lower number of features.
Lukito [44] (Luk16) used feature reduction on the Essays dataset. Information gain and PCA resulted in a 30 to 95 percent
reduction in the original feature size of the LIWC categories. The algorithms used were SMO, LibSVM and Linear Logistic Regression
(LLR) against a ZeroR baseline, and the reported results are in the range of 51 to 62 percent.
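Information gain, used above for feature reduction, can be sketched for discrete features as follows. This is a generic illustration, assuming each feature has already been discretised (for example, above or below the median for a LIWC category); the feature names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical binarised features for four essays and their trait labels.
labels = ["high", "high", "low", "low"]
features = {
    "pronoun_high": [1, 1, 0, 0],  # perfectly separates the classes
    "article_high": [1, 0, 1, 0],  # carries no information about the label
}
ranked = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranked)
```

Keeping only the top-ranked features is one way to obtain the kind of reduction in feature count reported above.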
Pramodh et al [45] (Pra16) used the Essays and MyPersonality datasets for personality recognition of authors. They employed
polarity-based matching, POS tagging and parsing for personality score computation and the reported accuracies range
from 62 to 66 percent.
In [47] Su et al (Su16) used Chinese Linguistic Inquiry and Word Count (CLIWC) and Chinese Knowledge Information
Processing (CKIP) POS tags to extract linguistic features from the MCDC dataset. Ten RNNs were independently constructed to
model the items of the BFI inventory and then a Hidden Markov Model (C-HMM) was used to predict the personality traits
in a dyadic conversation, with reported results of up to about 88 percent.
Liu et al [48] (Liu16) used RNN based character to word to sentence feature methodology for prediction of personality traits
from a multi lingual dataset from PAN-AM-2015. This model is feature engineering free and language independent and
personality traits can be predicted purely based on text without any additional features.
Xianyu et al in [65] used a Heterogeneity-Entropy Neural Network (HENN) based on a Deep Boltzmann Network (DBN) and autoencoders
for prediction of personality traits from data collected from the Renren and Sina websites. Bag-of-Textual-Words was
used to extract a 1000-dimensional feature vector and support vector regression achieved a Mean Absolute Error (MAE) in
the range of 0.12 to 0.72 for the text modality.
Skowron et al in [67] (Sko16) used textual data from Twitter and Instagram for personality recognition. LIWC categories
combined with network-based features were used with a Random Forest regressor, with an average reported RMSE of 0.73.
In [49] Majumder et al (Maj17) used CNN models for deep semantic feature extraction from the Essays dataset. They also
used word2vec with the Mairesse baseline features, and SVM, MLP and sMLP/FC algorithms were used for classification, with
reported results ranging from 51 to 63 percent.
Yu and Markov in [86, 2017] (Yu17) used CNN and RNN for the shared task of WPCR-13 and the results are significantly
better, with an F1 score in the range of 60 to 65 percent.
Ong et al proposed a personality prediction system using n-grams of Twitter information. They used XGBoost and SVM on custom
Twitter data and the best reported results are 76.23 and 97.99 percent.
Tandera et al [89] (Tan17) used both shallow and deep learning algorithms for personality classification on MyPersonality and a
custom Facebook-based dataset. The traditional algorithms used are NB, SVM, LR, GB and LDA with LIWC, SPLICE and SNA
feature sets, with average accuracies from 61 to 67 percent. They also used the deep learning algorithms MLP, LSTM, GRU and CNN
with word embedding features obtained with GloVe and reported average accuracies from 59 to 83 percent.
Zaaba et al used Facebook data for personality prediction of Malaysian users. Linguistic features from the LIWC categories
along with activity and structural features were analysed for correlation with personality to find the optimal sets per trait. The
reported results suggest that
Giménez et al in [93] (Gim17) used a CNN with word embeddings from GloVe for personality analysis on the PAN-AP-2015
corpus. They trained five models varying the number and height of kernels, and the reported error is in the range of 0.144
to 0.227 RMSE.
Alsadhan and Skillicorn devised a word-count-based technique for personality prediction from small texts. They use singular
value decomposition of document-word matrices for classification by a Frobenius norm. Evaluations on Big 5 as well as
MBTI across a varied range of datasets suggest that the proposed technique can also be used for low-resource languages due
to its language-independent nature.
Tighe and Cheng experimented with about 600 models using various combinations of features and outlier focus for personality
recognition from Twitter-based data. They used Linear, Ridge and Logistic regressors and SVM for regression as well as classification
tasks. The reported results suggest that focusing on outliers improves classification results, whereas feature reduction
reduced accuracy.
Celli and Lepri performed a comparative personality prediction using separate Twitter-based multilingual datasets with Big 5
and MBTI labels. The results suggest algorithms trained on MBTI could have better performance.
Mao et al proposed a particle swarm optimisation technique for identifying the best features for Personality trait
estimation. They extract a combination of Stylometric and TF-IDF features, which are subjected to PSO to identify the best
set of features. Three classification algorithms kNN, NB and DT were trained with these features and the reported results
suggest significant improvements with the use of style features and PSO.
Xue et al. [50] presented a textual-context personality recognition method based on deep learning. They suggested a
hierarchical structure based on AttRCNN for this purpose, which can learn deep semantic features from user posts. The
promising results demonstrate that the suggested deep semantic features surpass the baseline features.
Adi et al in [61] (Adi18) used Stochastic Gradient Descent (SGD), XGBoost (XGB) and Super Learner (SL) algorithms with
hyperparameter tuning and feature selection for personality detection from tweets in the Bahasa language. The results
achieved show significant improvement over their predecessors.
Sun et al [87] (Sun18) proposed a bidirectional LSTM model concatenated with a CNN, called 2CLSTM, for personality
prediction on the Essays and YouTube datasets. GloVe was used to obtain word embeddings. A latent sentence group (LSG) was
introduced for modelling structural features at the sentence level and a CNN was used to learn the features. The average reported
results for Essays and YouTube are 55 and 60 percent respectively.
Vu et al explored various lexical-semantic features to improve personality prediction on four diverse text datasets.
They utilised combinations of WordNet-based semantic knowledge with word sense disambiguation to train SVM binary
classifiers for personality trait estimation. The reported results show improvement over the Mairesse baseline for the Essays
dataset.
An and Levitan in [88] (AnLe18) proposed a model that learns a linear combination of predictions made by MLP and LSTM to
produce a final fused prediction. They used the MyPersonality dataset, combining LIWC categories with the word
embeddings generated by DAL and word vectors from Skip-gram and GloVe. The reported results have an average prediction
accuracy of 67 percent.
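The idea of learning a linear combination of two models' predictions can be sketched in a simplified form. The example below fits a single mixing weight by least squares and clips it to the range 0 to 1; this is an assumption made to keep the sketch short, and the fusion layer in AnLe18 may well use unconstrained weights and a bias term. All names and numbers are hypothetical.

```python
def fit_fusion_weight(pred_a, pred_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b, clipped to [0, 1]."""
    numerator = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, targets))
    denominator = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if denominator == 0:
        return 0.5  # the two models agree everywhere, so any weight is equivalent
    return min(1.0, max(0.0, numerator / denominator))


def fuse(pred_a, pred_b, weight):
    """Blend the two prediction lists with the learned weight."""
    return [weight * a + (1 - weight) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical held-out predictions from two models and the true trait scores.
mlp_pred = [0.8, 0.2, 0.6]
lstm_pred = [0.4, 0.6, 0.2]
targets = [0.7, 0.3, 0.5]

weight = fit_fusion_weight(mlp_pred, lstm_pred, targets)
fused = fuse(mlp_pred, lstm_pred, weight)
print(weight, fused)
```

In practice the weight would be fitted on a validation split rather than on the data used to train the two base models.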
Carducci et al used pretrained embeddings with transfer learning for personality prediction from Twitter-based text. They trained an
SVM classifier with pretrained fastText embeddings on the MyPersonality data, and custom Twitter data was used for testing. The
reported result outperformed the baseline.
Alamsyah et al proposed an ontology-based approach for personality prediction from Twitter data.
Santos et al compared psycholinguistic, data-driven and facet-based approaches for personality prediction in
Brazilian Portuguese, a low-resource language. The reported results suggest that translations of the low-resource
language texts can be used with English-based psycholinguistic categories. However, the accuracy of using TF-IDF features
and the facet-based approach exceeds that of using either a translated version of LIWC or LIWC on a translation of the
original text.
Tadesse et al used Pearson correlation to select optimal features for personality prediction. They used a
combination of LIWC, SPLICE and SNA features to train an XGBoost classifier. The results suggest improved
performance over the selected baseline models.
In [91, 2018] Carducci et al (Car18) used MyPersonality and a Twitter-based dataset for personality analysis using SVR,
LASSO and LR. Distributed word embeddings were used with transfer learning and the reported predictions had an MSE of
33 to 70 percent for the MyPersonality dataset and 13 to 38 percent for the custom Twitter data.
In [68] Yilmaz et al (Yil19) used an LSTM-based model for personality detection from Turkish texts. They translated an existing
dataset into Turkish and then an RNN-based model was used to predict the Big 5 traits from the Turkish texts. The test accuracies
reported are in the range of 55 to 68 percent.
Rahman et al [79] (Rah19) used a deep CNN for personality classification on the Essays dataset. They compared three activation
functions: leaky ReLU performed best for four traits and tanh performed best for CON, both ahead of the sigmoid function, with average F1
scores of 33.11 %, 47.25 % and 49.07 % for sigmoid, tanh and ReLU respectively.
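For reference, the three activation functions compared in Rah19 can be written as below. These are the standard textbook definitions; the negative slope of 0.01 for leaky ReLU is a common default and an assumption here, since the value used in the original work is not stated.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, with outputs in the range -1 to 1."""
    return math.tanh(x)


def leaky_relu(x, negative_slope=0.01):
    """Identity for positive inputs, small linear slope for negative ones."""
    return x if x > 0 else negative_slope * x


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), tanh(value), leaky_relu(value))
```

Sigmoid and tanh saturate for large inputs, whereas leaky ReLU keeps a non-zero gradient on both sides of zero, which is one commonly cited reason it can train deeper networks more easily.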
Akrami et al [84] (Akr19) used Support Vector Regression (SVR) and language embedding (ULMFiT) algorithms for
personality classification from text drawn from four different discussion forums and news sites in Swedish. Two datasets were
assembled, one large with lower reliability (DLR) and another smaller with high reliability (DHR).
Zheng et al [90] (Zhe19) used Pseudo Multi-view Co-training (PMC) as a semi-supervised technique for personality classification,
using MyPersonality as the labelled dataset and a custom Facebook-based unlabelled dataset. The feature vectors were obtained
using LIWC, TF-IDF of n-grams and a word embedding using PV-DM. The reported F1 score was in the range of 62 to 71 percent.
Alamsyah et al built an ontology model to analyse personality traits from Twitter-based text data.
Darlinsyah et al proposed SENTIPEDE, a sentiment-based model combining CNN and LSTM for personality prediction from short
texts. They extracted sentiment features from Twitter-based data using VADER, and then utilised pretrained GloVe embeddings for
training the combined model. The reported results suggest improved accuracy over the chosen baselines.
Artissa, Asror and Faraby tried variations of preprocessing techniques such as stemming and lemmatisation using an MNB
classifier on the MyPersonality dataset.
Santos and Paraboni proposed machine learning models for predicting subtraits in place of the actual Big 5 personality traits. The authors
suggest that developing supervised learning models at more specific levels of abstraction might improve prediction accuracy.
Ergu et al. [51] used Turkish tweets for personality computation. A variety of machine learning models (KNN, Decision Tree
(DT), RF, AdaBoost, Stochastic Gradient Descent (SGD), Gradient Boosting (GB), and SVM) were tested to achieve this goal. The
best accuracy was reported with models trained on users' most recent 50 tweets, ranging from 0.76 to 0.97.
Fernandes et al used adjective selection from Saucier's test to predict personality. Gradient boosted regression and classification
trees were used on a custom dataset obtained by crowdsourcing. The reported results suggest this is a significant method for
personality trait estimation using the adjectives best suited to a particular trait.
Rohit et al. [52] conducted research to determine a user's personality based on their social media profile status information.
They then categorized the users’ personalities into one of the OCEAN model's categories based on the analysis results, and their
model achieved good accuracy.
Tighe et al. explored the use of neural networks for personality prediction of Filipino Twitter users. They used various combinations of
MLP with pretrained as well as self-trained embeddings. They reported that neural networks using crafted features like TF-IDF
performed better than those using word embeddings. Further, the performance was very poor for Openness models.
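As a rough illustration of the kind of crafted TF-IDF feature mentioned in this entry, the sketch below computes one common TF-IDF variant in plain Python. The whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the example posts are all assumptions made for illustration; the cited work may use a different variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    posts = ["i love parties", "i love books"]  # hypothetical user posts
    for vector in tf_idf(posts):
        print(vector)
```

Terms that occur in every document receive a weight of zero, so only the terms that distinguish one user's text from another's contribute to the feature vector.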
Mehta et al. [53] developed a deep learning-based approach to assess personality from the Essays dataset for OCEAN traits
and the Kaggle dataset for MBTI, using language modelling features in conjunction with conventional psycholinguistic features. For
both datasets, the results demonstrated that language model embeddings frequently outperformed traditional psycholinguistic
features, and the suggested models outperformed the state of the art.
Kazameini et al. [54] created an efficient and robust text-based deep learning model for predicting personality. They fed context
independent embeddings from word2vec and BERT, along with psycholinguistic features, to a Bagged SVM (BB-SVM) classifier,
reporting accuracy of about 59 percent.
Jayaratne et al. [55] used open-vocabulary natural language processing techniques combined with machine learning to
deduce someone's personality based on their use of words during a job interview. For this purpose, they employed five alternative
text representation approaches to generate regression models for each HEXACO trait. The accuracy of text representation based
on terms and topics was the best. They next tested their model on a sample of 117 volunteers, who scored the individual
trait descriptors created from the model's outputs using a yes/no/maybe agreement scale, and each of the six
personality traits received an average of 87.83% agreement from the participants.
Leonardi et al. proposed sentence embeddings for personality prediction from text. They use an optimised BERT architecture for
extracting sentence embeddings from text and also propose a stacked neural network. In three different experiments, they used
sentence embeddings to train an SVM regressor, FastText word embeddings to train their stacked NN and finally sentence
embeddings to train their stacked NN. The sentence embeddings with the stacked NN outperformed the other two as well
as other existing works on the MyPersonality dataset.
Guan et al. experimented with Network Representation Learning for prediction of personality traits from text data. The proposed
Personality2Vec model uses a biased walk, a modified Huffman tree and a Skip-Gram model to attribute a personality vector to each
user in the data. A Support Vector Regression model was trained using these vectors to predict personality traits, and the reported
results outperform the baseline methods chosen by the authors.
Wang et al. propose a Graph Convolutional Network for personality prediction from text. They constructed a heterogeneous graph of
user documents using TF-IDF to capture the correlations between the text and the personality traits. The reported results show
improvement over the chosen baselines for the Essays and MyPersonality datasets.
Jiang et al. [92] (Jian20) used several deep learning algorithms for personality prediction on the ESSAYS dataset. The algorithms
used were Attention-based CNN (ABCNN), Attention-based LSTM (ABLSTM), Hierarchical Attention Network (HAN), BERT and
RoBERTa. They also adapted the models for a newly developed corpus called FriendsPersona. The reported accuracies range from
55 to 67 percent for ESSAYS and 53 to 65 percent for the FriendsPersona dataset.
Başaran et al. [59] (Bas21) used a Back Propagation Neural Network (BPNN) on the MyPersonality dataset for computation of
personality. They utilised the Facebook network features for personality trait estimation, with reported accuracy of about 85
percent.
Xue et al. [56] (Xue21) created a semantic-enhanced personality recognition neural network (SEPRNN) that can identify
multiple personality traits. They employed context-learning-based word-level semantic representation followed by a fully
connected layer to acquire the text's higher-level semantics. Compared to different baselines, their suggested strategy improves
accuracy significantly.
Pabon and Arroyave experimented with different word-level embeddings for predicting personality traits. Word2Vec, GloVe and
BERT embeddings from the YouTube dataset were used for training SVMs for three different tasks: regression, binary classification
and a tri-class manifestation of personality traits (High, Medium and Low). The reported results suggest that Word2Vec and
GloVe may be combined to provide better results.
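The binary and tri-class task formulations in this entry require turning continuous trait scores into discrete labels. The sketch below shows one plausible way to do this, using a median split and a tertile split. The cut points and the function names `to_binary` and `to_tri_class` are assumptions made for illustration, not the thresholds used by the cited authors.

```python
import statistics


def to_binary(scores):
    """Label each score High or Low using a median split (assumed rule)."""
    median = statistics.median(scores)
    return ["High" if s >= median else "Low" for s in scores]


def to_tri_class(scores):
    """Label each score Low, Medium or High using tertile cut points (assumed rule)."""
    ordered = sorted(scores)
    n = len(ordered)
    low_cut = ordered[n // 3]         # scores below this are Low
    high_cut = ordered[(2 * n) // 3]  # scores at or above this are High
    labels = []
    for s in scores:
        if s < low_cut:
            labels.append("Low")
        elif s < high_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels


if __name__ == "__main__":
    extraversion = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical trait scores
    print(to_binary(extraversion))
    print(to_tri_class(extraversion))
```

The regression task uses the raw scores directly, so no discretisation step is needed there.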
Demerdash et al. [57] (Dem-21) presented a deep learning approach for personality evaluation using fusion approaches with pre-trained
language models for transfer learning, and their proposed model outperforms the baseline results.
Hassanein et al. used Twitter data to interpolate the relationship between Big 5 traits, Human Needs and Personal
Values. Pearson correlation was used to measure the correlation between the attributes and the traits.
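Pearson correlation, as used in this entry, can be computed with a few lines of plain Python. The sketch below is a generic implementation of the coefficient; the example values are hypothetical and are not data from the cited study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-user scores for one trait and one personal value.
    openness = [0.2, 0.4, 0.6, 0.8]
    self_expression = [0.1, 0.5, 0.5, 0.9]
    print(pearson_r(openness, self_expression))
```

The coefficient lies between -1 and 1, with values near zero indicating no linear relationship between the attribute and the trait.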
Christian et al. [58, 2021] (Chr21) proposed a new feature extraction strategy for various social networking data sources based on a
multi-model deep learning architecture paired with several pre-trained language models such as BERT, RoBERTa, and XLNet
to develop personality prediction systems, and their model achieved good accuracy.
Jeremy et al. [107] (Jer21) used LSTM, Bi-LSTM and GRU models for personality prediction of Indonesian Twitter users. The
highest reported F-score is 83 percent.
Alamsyah et al. improved their previously developed ontology model by using N-Grams in place of a radix tree.
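Word N-Grams of the kind mentioned in this entry are contiguous runs of N tokens. The sketch below extracts them with a simple lowercase whitespace tokenizer, which is an assumption made for illustration; it does not reproduce the ontology matching of the cited work.

```python
def word_ngrams(text, n):
    """Return all contiguous word n-grams of the text as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.lower().split()
    # A text shorter than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tweet = "I am very happy today"  # hypothetical tweet
    print(word_ngrams(tweet, 1))
    print(word_ngrams(tweet, 2))
```

Bigrams and trigrams keep short phrases such as "very happy" together, which single-word lookups would split apart.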
Ong et al. tried personality prediction for Indonesian users with Twitter information. They used XGBoost and SVM on custom
Twitter data, and the reported results show good performance for the Agreeableness and Openness traits.
Mavis, Torslu and Karagoz used Turkish Twitter data for personality trait prediction. They experimented with several
classical models like SVC, DT and kNN using representations based on TF-IDF and word2vec. Comparative analysis of the results
with the MyPersonality dataset on the same models suggests that the approach can be used on other languages as well. They
also trained an LSTM network with the Turkish data, but no significant improvement was achieved except for the Agreeableness
trait.
Wang et al. used an XLNet-Capsule model for time series based prediction of personality from text data. The proposed model is used
separately for extracting deep-level features from the text data and then for classification of personality traits. The reported results
suggest improved performance over a wide range of baselines selected by the authors.
Stajner and Yenikent studied why it is difficult to model personality based on MBTI scores as compared to Big 5.
Kosan [108] (Kos22) used LSTM and Bi-LSTM models for personality prediction from a Twitter-based dataset. They
experimented with FastText embeddings and varying pre-processing techniques to arrive at their final model, which uses specific
pre-processing methods like stemming and lemmatization with FastText and Bi-LSTM. The average reported result across all the Big 5
traits is an RMSE of 0.1681.
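The RMSE figure in this entry is an average over the five traits. The sketch below shows how such a figure could be computed, assuming one RMSE per trait followed by an unweighted mean. The helper `mean_rmse_over_traits` and the example scores are hypothetical; the cited work may aggregate differently.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse_over_traits(true_by_trait, pred_by_trait):
    """Unweighted mean of the per-trait RMSE values (assumed aggregation)."""
    per_trait = [
        rmse(true_by_trait[trait], pred_by_trait[trait])
        for trait in true_by_trait
    ]
    return sum(per_trait) / len(per_trait)


if __name__ == "__main__":
    # Hypothetical normalised scores for two users on two traits.
    truth = {"EXT": [0.4, 0.6], "NEU": [0.5, 0.7]}
    preds = {"EXT": [0.5, 0.5], "NEU": [0.5, 0.9]}
    print(mean_rmse_over_traits(truth, preds))
```

Because RMSE is expressed in the units of the score, a value such as 0.1681 is only interpretable relative to the score range used for the traits.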
Giorgi et al. established regional differences in personality traits using social media language. A language-based assessment was
performed on over 6 million Twitter users from different counties of the United States. The results suggest that language use can help
identify personality trait differences between people across regions.
Shah et al. used logistic regression for personality trait estimation from Twitter data.
Ramezani et al. [20] propose graph-enabled, text-based automatic personality prediction. They built a knowledge graph of low-level
text features to be used for training CNN, LSTM and Bi-LSTM neural networks, achieving accuracies of up to 77 percent.
Safitri and Setiawan experimented with various kernels using LIWC and TF-IDF features to predict personality traits
from custom Twitter data. The results suggested the RBF kernel with combined LIWC and TF-IDF features as the best performer.
Li et al. propose multi-task classification for personality and emotion detection using deep learning models. They experimented
with different sharing gates for information exchange between the LSTM-based models. It was found that the model using SoG with
Model Agnostic Meta Learning outperformed all the other variants and selected baselines.
Deliami, Sadr and Nazari used AdaBoost on top of a CNN for personality prediction from text. They proposed the use of separate pooling
and classification layers to extract relevant features, and AdaBoost for increasing the learning rate.
Deliami, Sadr and Tarkhan used a combination of CNN and AdaBoost with varying filters for personality prediction from the Essays
and YouTube datasets. The reported results outperform the selected baselines. It was concluded that using AdaBoost efficiently
increases classification accuracy.
Journal/Conference  Year  Feature Mode  NRL  Text Features
1981
Journal,
1999
BJP
1999
LIWC/MRC
2002 Hybrid
Bi-Grams
2002
2004
Journal
2004
APA PsychNet
Systemic Functional
2005 Top-Down
Grammar
LIWC/MRC
Conference,
2006 Top-Down
AMCSS
Speech Acts
2006
2006
LIWC
MRC
2007 Top-Down
2007
ICWSM
2007
Conference
2007
2009
2010
2010
2011
2011
2011
LIWC
2011 Hybrid
N-Grams
2011
2011
2012
2012
2012
2012
2012
Conference, LIWC,
2012 Hybrid
IRI Handcrafted
Conference, ICDS 2012 Mairesse Baseline
2012
2012
2012
Journal, IEEE Transactions on

2012 Audio-Visual
Multimedia
BoW
Statistical Features
Conference, UMAP 2013 Bottom-Up Upitt
Writing Style
Speech Acts
Conference,
2013 Top-Down Mairesse Baseline
AAAI
Conference,
2013 Bottom-Up TF-IDF
AAAI
Conference,
2013 Bottom-Up N-Grams
AAAI
Conference,
2013
AAAI
Conference,
2013
AAAI
Conference,
2013
AAAI
Conference,
2013
AAAI
Conference,
2013
AAAI
Conference,
2013
AAAI
2013 Common Sense
2013
2013
Differential Language
2013 Bottom-Up
Analysis
2013
2014
LIWC,
NRC,
MRC,
2014
SentiStrength,
SPLICE
Normalised relative
Journal, 2014 Bottom-Up frequency, Word
PSP representation, Topic
usage
2014
2014
2014
2014
2014
2015
2015
Conference,
2015 Bottom-Up TF-IDF
ICoDSE
Conference,
2015
CLEF
N-Grams,
Conference,
2015 Hybrid LIWC,
CLEF
2015
2015
2016
Journal,
2016 Bottom-Up FFT-SCLIWC
PeerJ
Conference,
2016 Top-Down LIWC
IJCAI
Conference,
2016
ICITEE
2016
2016
2016
2016
2016
2017 Mairesse Baseline
2017
2017
2017
2017
2017 Bottom-Up N.A N-Grams
Conference,
2017 N.A
NCTM
GloVe
Bottom-Up,
2017 N.A
LIWC
Top-Down
SPLICE
Journal
2017 Top-Down N.A LIWC
ASL
Conference,
2017 N.A
AAAI
2017
Conference,
2017 Bottom-Up N.A Word Count
ICDMW
(Label Distribution
2017 Top-Down Learning)
TextMind
N.A
TF-IDF,
2018 Bottom-Up
TO
N.A
N-Grams,
LIWC,
2018 Hybrid N.A
Metadata
TF-IDF
2018 Bottom-Up N.A
Style
2018
2018
2018
2018
WordNet
2018 Top-Down N.A N-Grams
SentiWordNet
2018
2018 Bottom-Up N.A FastText
Conference,
2018 Bottom-Up N.A Ontology
ICoICT
Top-Down LIWC-B
2018 N.A
Bottom-Up TF-IDF
Splice
2018 Hybrid N.A
LIWC
2018
2019
2019
2019
2019
Conference,
2019 Bottom-Up N.A Ontology
ICIMTech
Journal, VADER
2019 Bottom-Up N.A
UCS GloVe
Journal,
2019 Bottom-Up N.A TF-IDF
JOP
BoW,
2019 Bottom-Up N.A SkipGram,
CBoW
2019
2019
2019
2020 Bottom-Up
2020
TF-IDF
TO
Conference,
2020 Bottom-Up GloVe
PCSC
FastText
Self Trained*
2020
2020
2020
Journal, Sentence Embedding,

2020 Bottom-Up N.A
Information FastText
2020
2020
Conference, LIWC,
2020 Hybrid
DSC Handcrafted
2020
Users,
Journal,
2020 Hybrid Documents, TF-IDF
Applied Sciences
Words
2020
2020
2021
Journal,
2021 Bottom-Up N.A GloVe
Applied Intelligence
Word2Vec
Journal, Top-Down
2021 N.A GloVe
SCE Bottom-Up
BERT
Journal,
2021
IEEE Access
2021
2021
2021
2021
2021
2021
2021
2021
2021
N-Grams
2021 Bottom-Up
Journal,
2021 Bottom-Up Handcrafted
IES
2021
TF-IDF
2021 Bottom-Up
Word Embedding
Journal,
2021 Bottom-Up CapsuleNet
Electronics
2021
2022
Journal of Personality 2022
2022
Dbpedia.
Journal,
2022 Hybrid Yes NRC,
CIN
MRC
LIWC,
Jurnal RESTI 2022 Hybrid N.A
TF-IDF
GloVe
2022 Bottom-Up N.A
2022 Bottom-Up Skip-Gram
2022 Bottom-Up Skip-Gram

Feature Reduction,
Semantics Sharing Gate Model
Selection
N.A N.A
N.A N.A
N.A N.A
N.A N.A
N.A N.A
N.A N.A
N.A N.A
SMO
LR, M5R, M5D , REPT
SMO, NB, NN , J48, JRip,

AdaBoost
DNN, CNN
N.A
LinR,
N.A REPTree,
Dtable
NB
N.A Decision Tree
SVM
N.A
SMO.
N.A BLR,
MNB
N.A SMO
N.A
ST,
MTS,
MTSC,
ERC,
ERCC,
MORF
Univariate Feature Selection, Ridge Regressor

PCA
MNB,
N.A kNN,
SVM
ERCC
Stacked AE LinR
libSVM
Information Gain
SMO
PCA
Linear Logistic Regression
ANN
LDA SVM
NB,
SVM,
N.A LR,
GB,
LDA
Hoeffding Tree
Yes
Naïve Bayes
Gaussian Process
Regression
N.A
LR
Ridge
Yes
SVM
LogR
N.A
N.A N.A SVM
kNN
N.A PSO NB
DT
WORD
SENSE
N.A Selective WSD+ Chi Square SVM
S_SENSE
SENTI
N.A SVM (RBF)

Best first
SVM
GB
Correlation
LR
XGBoost
OM
N.A N.A N.A
N.A N.A MNB
N.A N.A
GradientBoosted Tree
Classifier
GradientBoosted Tree
Regressor
No MLP
N.A SVM
N.A SVR
N.A N.A
N.A
BPNN
N.A N.A N.A
N.A N.A SVM

N.A XGBoost
SVC
DT
kNN
CNN,
RNN,
RDF2Vec
LSTM,
BiLSTM
CHI SVM (L,R,P)
SiG
CAG N.A MAML
SiL
N.A N.A
N.A
Balancing
Filter Deep Model Ensemble
Oversampling
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A N.A
N.A N.A
Yes N.A
Yes
N.A
Yes
Yes
XGBoost
MLP,
LSTM,
N.A SMOTE
GRU,
CNN 1 D
N.A
N.A N.A N.A
N.A N.A N.A

MLP
N.A N.A N.A

LSTM + CNN N.A N.A
N.A N.A N.A
N.A N.A
Stacked NN N.A N.A
Personality2Vec N.A N.A
GCN N.A N.A
ABCNN,
ABLSTM,
HAN,
BERT,
RoBERTa
SEPRNN N.A N.A
N.A N.A N.A

N.A N.A
LSTM
XL-Net
N.A N.A
ROS,
N.A RUS,
SVM-SMOTE
CNN
CNN AdaBoost N.A
CNN AdaBoost
Ref REFERENCES Type
Goldberg, L. R. (1981). Language and individual differences: The search for universals
Correlation
in personality lexicons. Review of personality and social psychology, 2(1), 141-165.
Buchanan, T. (1999). Using the Internet for Psychological Research: Personality Testing
on the World Wide Web. Correlation
1. Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an

11 individual difference. Journal of Personality and Social Psychology, 77(6), 1296–1312. Correlation
https://doi.org/10.1037/0022-3514.77.6.1296
Gill, A. J., & Oberlander, J. (2002). Taking Care of the Linguistic Features of
110 Extraversion. In Proceedings of the Annual Meeting of the Cognitive Science Correlation
Society (Vol. 24, No. 24).
Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue:
personality judgments based on offices and bedrooms. Journal of personality and social Correlation
psychology, 82(3), 379.
Ashton, M. C., Lee, K., & Goldberg, L. R. (2004). A hierarchical analysis of 1,710
English personality-descriptive adjectives. Journal of Personality and Social Correlation
Psychology, 87(5), 707.
Vazire, S., & Gosling, S. D. (2004). e-Perceptions: personality impressions based on

Correlation
personal websites. Journal of personality and social psychology, 87(1), 123.
Argamon, S., Dhawle, S., Koppel, M., & Pennebaker, J. W. (2005, June). Lexical
12 predictors of personality type. In Proceedings of the 2005 Joint Annual Meeting of the Prediction
Interface and the Classification Society of North America (pp. 1-16).
Correlation
Mairesse, F., & Walker, M. (2006). Words mark the nerds: Computational models of
personality recognition through language. In Proceedings of the Annual Meeting of the
Cognitive Science Society (Vol. 28, No. 28).
Prediction
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural
15 habitat: manifestations and implicit folk theories of personality in daily life. Journal of Correlation
personality and social psychology, 90(5), 862.
1. Oberlander, J., & Nowson, S. (2006, July). Whose thumb is it anyway? Classifying
16 author personality from weblog text. In Proceedings of the COLING/ACL 2006 Main
Conference Poster Sessions (pp. 627-634).
1. Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using linguistic
Correlation
17 cues for the automatic recognition of personality in conversation and text. Journal of
Prediction
artificial intelligence research, 30, 457-500.
6. Estival, D., Gaustad, T., Pham, S. B., Radford, W., & Hutchinson, B. (2007,
24 September). Author profiling for English emails. In Proceedings of the 10th Conference
of the Pacific Association for Computational Linguistics (Vol. 263, p. 272).
Gosling, S. D., Gaddis, S., & Vazire, S. (2007). Personality impressions based on
Correlation
facebook profiles. Icwsm, 7, 1-4.
Nowson, S., & Oberlander, J. (2007). Identifying more bloggers: Towards large
scale personality classification of personal weblogs. Retrieved from
https://www.researchgate.net/publication/239542111
Gill, A., Nowson, S., & Oberlander, J. (2009, March). What are they blogging
about? Personality, topic and motivation in blogs. In Proceedings of the
International AAAI Conference on Web and Social Media (Vol. 3, No. 1, pp. 18-25).
1. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words:
LIWC and computerized text analysis methods. Journal of language and social Correlation
psychology, 29(1), 24-54.
Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality

Correlation
and word use among bloggers. Journal of research in personality, 44(3), 363-373.
Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of
research in personality, 45(1), 92-99.
4. Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011, October).
Predicting personality from twitter. In 2011 IEEE third international conference on
22
privacy, security, risk and trust and 2011 IEEE third international conference on
social computing (pp. 149-156). IEEE.
5. Golbeck, J., Robles, C., & Turner, K. (2011). Predicting personality with social
23 media. In CHI'11 extended abstracts on human factors in computing systems (pp.
253-262).
7. Iacobelli, F., Gill, A. J., Nowson, S., & Oberlander, J. (2011, October). Large
scale personality classification of bloggers. In international conference on
25
affective computing and intelligent interaction (pp. 568-577). Springer, Berlin,
Heidelberg.
8. Quercia, D., Kosinski, M., Stillwell, D., & Crowcroft, J. (2011, October). Our
twitter profiles, our selves: Predicting personality with twitter. In 2011 IEEE third
26 Prediction
international conference on privacy, security, risk and trust and 2011 IEEE third
international conference on social computing (pp. 180-185). IEEE.
2. Chittaranjan, G., Blom, J., & Gatica-Perez, D. (2011, June). Who's who with
20 big-five: Analyzing and classifying personality traits with smartphones. In 2011
15th Annual international symposium on wearable computers (pp. 29-36). IEEE.
60. Adali, S., & Golbeck, J. (2012, August). Predicting personality with social
78 behavior. In 2012 IEEE/ACM International Conference on Advances in Social
Networks Analysis and Mining (pp. 302-309). IEEE.
9. Bachrach, Y., Kosinski, M., Graepel, T., Kohli, P., & Stillwell, D. (2012, June).
27 Personality and patterns of Facebook usage. In Proceedings of the 4th annual
ACM web science conference (pp. 24-32).
10. Bai, S., Zhu, T., & Cheng, L. (2012). Big-five personality prediction based on
28
user behaviours at social network sites. arXiv preprint arXiv:1204.4809.
12. Kermanidis, K. L. (2012, May). Mining authors’ personality traits from Modern
30 Greek spontaneous text. In Proc. of Workshop on Corpora for Research on
Emotion Sentiment & Social Signals, in conjunction with LREC (pp. 90-93).
3. Staiano, J., Lepri, B., Aharony, N., Pianesi, F., Sebe, N., & Pentland, A. (2012,
September). Friends don't lie: inferring personality traits from social network
21
structure. In Proceedings of the 2012 ACM conference on ubiquitous computing
(pp. 321-330).
Wald, R., Khoshgoftaar, T., & Sumner, C. (2012, August). Machine prediction of
personality from Facebook profiles. In 2012 IEEE 13th International Conference on Prediction
Information Reuse & Integration (IRI) (pp. 109-115). IEEE.
Celli, F. (2012, March). Unsupervised personality recognition for social network sites.
In Proc. of sixth international conference on digital society (pp. 59-62).
11. Celli, F., & Rossi, L. (2012, April). The role of emotional stability in Twitter
29 conversations. In Proceedings of the workshop on semantic analysis in social
media (pp. 10-17).
48. Wald, R., Khoshgoftaar, T., & Sumner, C. (2012, August). Machine prediction
66 of personality from Facebook profiles. In 2012 IEEE 13th International Conference
on Information Reuse & Integration (IRI) (pp. 109-115). IEEE.
Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you tweet: Personality
expression and perception on Twitter. Journal of Research in Personality, 46, 710–
718. doi:10.1016/j.jrp.2012.08.008
87. Biel, J. I., & Gatica-Perez, D. (2012). The youtube lens: Crowdsourced
105 personality impressions and audiovisual analysis of vlogs. IEEE Transactions on
Multimedia, 15(1), 41-55.
Shen, J., Brdiczka, O., & Liu, J. (2013, June). Understanding email writers: Personality
prediction from email messages. In International conference on user modeling, Prediction
adaptation, and personalization (pp. 318-330). Springer, Berlin, Heidelberg.
Mohammad, S., & Kiritchenko, S. (2013). Using nuances of emotion to identify
personality. In Proceedings of the International AAAI Conference on Web and Social
Media (Vol. 7, No. 2, pp. 27-30).
Alam, F., Stepanov, E. A., & Riccardi, G. (2013). Personality traits recognition on social
network-facebook. In Proceedings of the International AAAI Conference on Web and Prediction
Social Media (Vol. 7, No. 2, pp. 6-9).
13. Verhoeven, B., Daelemans, W., & De Smedt, T. (2013, June). Ensemble methods for
31 personality recognition. In Seventh International AAAI Conference on Weblogs and Prediction
Social Media.
14. Farnadi, G., Zoghbi, S., Moens, M. F., & De Cock, M. (2013, June). Recognising
32 personality traits using facebook status updates. In Proceedings of the International
AAAI Conference on Web and Social Media (Vol. 7, No. 1).
15. Tomlinson, M. T., Hinote, D., & Bracewell, D. B. (2013, June). Predicting
33 conscientiousness through semantic analysis of facebook posts. In Seventh International
AAAI Conference on Weblogs and Social Media.
16. Markovikj, D., Gievska, S., Kosinski, M., & Stillwell, D. J. (2013, June). Mining
34 facebook data for predictive personality modeling. In Seventh International AAAI
Conference on weblogs and social media.
17. Iacobelli, F., & Culotta, A. (2013, June). Too neurotic, not too friendly: structured
35 personality classification on textual data. In Seventh International AAAI Conference on
Weblogs and Social Media.
18. Appling, D., Briscoe, E., Hayes, H., & Mappus, R. (2013, June). Towards automated
36 personality identification using speech acts. In Proceedings of the International AAAI
Conference on Web and Social Media (Vol. 7, No. 1).
19. Mohammad, S., & Kiritchenko, S. (2013, June). Using nuances of emotion to
37 identify personality. In Seventh International AAAI Conference on Weblogs and Social
Media.
22. Poria, S., Gelbukh, A., Agarwal, B., Cambria, E., & Howard, N. (2013,
November). Common sense knowledge based personality recognition from text.
40
In Mexican International Conference on Artificial Intelligence (pp. 484-496).
Springer, Berlin, Heidelberg.
23. Zuo, X., Feng, B., Yao, Y., Zhang, T., Zhang, Q., Wang, M., & Zuo, W. (2013,
41 September). A weighted ML-KNN model for predicting users’ personality traits. In
Proc. Int. Conf. Inf. Sci. Comput. Appl.(ISCA) (pp. 345-350).
64. Montjoye, Y. A. D., Quoidbach, J., Robic, F., & Pentland, A. S. (2013, April).
Predicting personality using novel mobile phone-based metrics. In International
82
conference on social computing, behavioral-cultural modeling, and prediction (pp.
48-55). Springer, Berlin, Heidelberg.
65. Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M.,
83 Agrawal, M., ... & Ungar, L. H. (2013). Personality, gender, and age in the language
of social media: The open-vocabulary approach. PloS one, 8(9), e73791.
Biel, J. I., & Gatica-Perez, D. (2013). The youtube lens: Crowdsourced personality
impressions and audiovisual analysis of vlogs. IEEE Transactions on Multimedia, 15,
41–55. doi:10.1109/TMM.2012.2225032
21. Celli, F., Lepri, B., Biel, J. I., Gatica-Perez, D., Riccardi, G., & Pianesi, F. (2014,
39 November). The workshop on computational personality recognition 2014. In
Proceedings of the 22nd ACM international conference on Multimedia (pp. 1245-1246).
Farnadi, G., Sushmita, S., Sitaraman, G., Ton, N., De Cock, M., & Davalos, S. (2014,
November). A multivariate regression approach to personality impression recognition
Prediction
of vloggers. In Proceedings of the 2014 ACM Multi Media on Workshop on
Computational Personality Recognition (pp. 1-6).
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., ... & Correlation
Seligman, M. E. (2015). Automatic personality assessment through social media
language. Journal of personality and social psychology, 108(6), 934.
Prediction
24. Gou, L., Zhou, M. X., & Yang, H. (2014, April). KnowMe and ShareMe:
understanding automatically discovered personality traits from social media and
42
user sharing preferences. In Proceedings of the SIGCHI conference on human
factors in computing systems (pp. 955-964).
Sarkar, C., Bhatia, S., Agarwal, A., & Li, J. (11 2014). Feature analysis for
computational personality recognition using YouTube personality data set. 11–14.
doi:10.1145/2659522.2659528
Solinger, C., Hirshfield, L., Hirshfield, S., Friendman, R., & Lepre, C. (2014). Beyond
Facebook Personality Prediction: A Multidisciplinary Approach to Predicting Social
Media Users’ Personality (Vol. 8531, pp. 486–493).
Gievska, S., & Koroveshovski, K. (11 2014). The impact of affective verbal content on
predicting personality impressions in YouTube videos. 19–22.
doi:10.1145/2659522.2659529
63. Lima, A. C. E., & De Castro, L. N. (2014). A multi-label, semi-supervised

81 classification approach applied to personality prediction in social media. Neural
Networks, 58, 122-130
28. Kalghatgi, M. P., Ramannavar, M., & Sidnal, N. S. (2015). A neural network
46 approach to personality prediction based on the big-five model. International
Journal of Innovative Research in Advanced Engineering (IJIRAE), 2(8), 56-63.
62. Chorley, M. J., Whitaker, R. M., & Allen, S. M. (2015). Personality and
80
location-based social networks. Computers in Human Behavior, 46, 45-56.
67. Pratama, B. Y., & Sarno, R. (2015, November). Personality classification based on
85 Twitter text using Naive Bayes, KNN and SVM. In 2015 International Conference on Prediction
Data and Software Engineering (ICoDSE) (pp. 170-174). IEEE.
76. Rangel Pardo, F. M., Celli, F., Rosso, P., Potthast, M., Stein, B., & Daelemans, W.
94 (2015). Overview of the 3rd Author Profiling Task at PAN 2015. In CLEF 2015
Evaluation Labs and Workshop Working Notes Papers (pp. 1-8).
Arroju, M., Hassan, A., & Farnadi, G. (2015). Age, gender and personality recognition
using tweets in a multilingual setting. In 6th Conference and Labs of the Evaluation
Prediction
Forum (CLEF 2015): Experimental IR meets multilinguality, multimodality, and
interaction (Vol. 23, p. 31).
Pla, F., Giménez, M., & Hernández, D. I. (n.d.). Segmenting Target Audiences: Automatic
Author Profiling Using Tweets. Notebook for PAN at CLEF 2015. Retrieved
from https://www.researchgate.net/publication/281240866
Peng, K. H., Liou, L. H., Chang, C. S., & Lee, D. S. (2015, October). Predicting
personality traits of Chinese users based on Facebook wall posts. In 2015 24th Wireless
and Optical Communication Conference (WOCC) (pp. 9-14). IEEE.
25. Poddar, S., Kattagoni, V., & Singh, N. (2015). Personality Mining from
43
Biographical Data with the" Adjectival Marker" Technique. In BD (pp. 39-47).
Liu, X., & Zhu, T. (2016). Deep learning for constructing microblog behavior
representation to identify social media user’s personality. PeerJ Computer Science, 2, Prediction
e81.
Tighe, E. P., Ureta, J. C., Pollo, B. A. L., Cheng, C. K., & de Dios Bulos, R. (2016, July).
Personality Trait Classification of Essays with the Application of Feature Reduction. Prediction
In SAAIP@ IJCAI (pp. 22-28).
26. Lukito, L. C., Erwin, A., Purnama, J., & Danoekoesoemo, W. (2016, October).
Social media user personality classification using computational linguistic. In 2016
44
8th International Conference on Information Technology and Electrical
Engineering (ICITEE) (pp. 1-6). IEEE.
27. Pramodh, K. C., & Vijayalata, Y. (2016, October). Automatic personality
45 recognition of authors using big five factor model. In 2016 IEEE International
Conference on Advances in Computer Applications (ICACA) (pp. 32-37). IEEE.
29. Su, M. H., Wu, C. H., & Zheng, Y. T. (2016). Exploiting turn-taking temporal
47 evolution for personality trait perception in dyadic conversations. IEEE/ACM
Transactions on Audio, Speech, and Language Processing, 24(4), 733-744.
30. Liu, F., Perez, J., & Nowson, S. (2016). A language-independent and
48 compositional model for personality trait recognition from short texts. arXiv
preprint arXiv:1610.04345.
47. Xianyu, H., Xu, M., Wu, Z., & Cai, L. (2016, July). Heterogeneity-entropy based
65 unsupervised feature learning for personality prediction with cross-media data. In
2016 IEEE international conference on multimedia and Expo (ICME) (pp. 1-6). IEEE.
49. Skowron, M., Tkalčič, M., Ferwerda, B., & Schedl, M. (2016, April). Fusing social
67 media cues: personality prediction from twitter and instagram. In Proceedings of
the 25th international conference companion on world wide web (pp. 107-108).
31. Majumder, N., Poria, S., Gelbukh, A., & Cambria, E. (2017). Deep learning-
49 based document modeling for personality detection from text. IEEE Intelligent Prediction
Systems, 32(2), 74-79.
68. Yu, J., & Markov, K. (2017, November). Deep learning based personality
86 recognition from facebook status updates. In 2017 IEEE 8th International
Conference on Awareness Science and Technology (iCAST) (pp. 383-387). IEEE.
POLITECNICO DI TORINO Semantic Analysis to Compute Personality Traits from

Social Media Posts. (2017).
Alsadhan, N., & Skillicorn, D. (12 2017). Estimating Personality from Social Media
Posts. 2017-November, 350–356. doi:10.1109/ICDMW.2017.51
Ahmad, Z., Lutfi, S. L., Kushan, A. L., & Yixing, R. T. (8 2017). Personality prediction
of malaysian facebook users: Cultural preferences and features variation. Advanced
Science Letters, 23, 7900–7903. doi:10.1166/asl.2017.9604
Ong, V., Rahmanto, A. D. S., Williem, W., Suhartono, D., Nugroho, A. E., Andangsari,
E. W., & Suprayogi, M. N. (11 2017). Personality prediction based on Twitter Prediction
information in Bahasa Indonesia. 367–372. doi:10.15439/2017F359
Sewwandi, D., Perera, K., Sandaruwan, S., Lakchani, O., Nugaliyadde, A., &
Thelijjagoda, S. (3 2017). Linguistic features based personality recognition using social
media data. 63–68. doi:10.1109/NCTM.2017.7872829
71. Tandera, T., Suhartono, D., Wongso, R., & Prasetio, Y. L. (2017). Personality
89 Prediction
prediction system from facebook users. Procedia computer science, 116, 604-611.
Ahmad, Z., Lutfi, S. L., Kushan, A. L., & Yixing, R. T. (2017). Personality Prediction of
Malaysian Facebook Users: Cultural Preferences and Features Variation. Advanced Prediction
Science Letters, 23(8), 7900-7903.
Arnoux, P. H., Xu, A., Boyette, N., Mahmud, J., Akkiraju, R., & Sinha, V. (2017, May).
25 tweets to know you: A new model to predict personality with social media.
In Eleventh international AAAI conference on web and social media.
75. Giménez, M., Paredes, R., & Rosso, P. (2017, April). Personality recognition
using convolutional neural networks. In International Conference on
Computational Linguistics and Intelligent Text Processing (pp. 313-323). Springer,
Cham.
Alsadhan, N., & Skillicorn, D. (2017, November). Estimating personality from social
media posts. In 2017 IEEE international conference on data mining workshops
(ICDMW) (pp. 350-356). IEEE.
Xue, D., Hong, Z., Guo, S., Gao, L., Wu, L., Zheng, J., & Zhao, N. (2017). Personality
recognition on social media with label distribution learning. IEEE access, 5, 13478-
13488.
Tighe, E., & Cheng, C. (2018, June). Modeling personality traits of filipino twitter users.
In Proceedings of the Second Workshop on Computational Modeling of People’s
Opinions, Personality, and Emotions in Social Media (pp. 112-122).
Celli, F., & Lepri, B. (2018). Is big five better than MBTI? A personality computing
challenge using Twitter data. Computational Linguistics CLiC-it 2018, 93.
Mao, Y., Zhang, D., Wu, C., Zheng, K., & Wang, X. (2018, December). Feature
analysis and optimisation for computational personality recognition. In 2018 IEEE
4th International Conference on Computer and Communications (ICCC) (pp. 2410-2414).
IEEE.
32. Xue, D., Wu, L., Hong, Z., Guo, S., Gao, L., Wu, Z., & Sun, J. (2018). Deep
learning-based personality recognition from text posts of online social networks.
Applied Intelligence, 48(11), 4232-4246.
43. Adi, G. Y. N., Tandio, M. H., Ong, V., & Suhartono, D. (2018). Optimization for
automatic personality recognition on Twitter in Bahasa Indonesia. Procedia
Computer Science, 135, 473-480.
44. Costantini, G., & Perugini, M. (2018). A Framework for Testing Causality in
Personality Research. European Journal of Personality, 32(3), 254–268.
https://doi.org/10.1002/per.2150
69. Sun, X., Liu, B., Cao, J., Luo, J., & Shen, X. (2018, May). Who am I? Personality
detection based on deep learning for texts. In 2018 IEEE International Conference
on Communications (ICC) (pp. 1-6). IEEE.
Vu, X. S., Flekova, L., Jiang, L., & Gurevych, I. (2018, January). Lexical-semantic
resources: yet powerful resources for automatic personality classification.
In Proceedings of the 9th Global Wordnet Conference (pp. 172-181).
70. An, G., & Levitan, R. (2018, February). Lexical and Acoustic Deep Learning
Model for Personality Recognition. In INTERSPEECH (pp. 1761-1765).
Carducci, G., Rizzo, G., Monti, D., Palumbo, E., & Morisio, M. (5 2018).
TwitPersonality: Computing personality traits from tweets using word embeddings and
supervised learning. Information (Switzerland), 9. doi:10.3390/info9050127
Alamsyah, A., Putra, M. R. D., Fadhilah, D. D., Nurwianti, F., & Ningsih, E. (2018,
May). Ontology modelling approach for personality measurement based on social
media activity. In 2018 6th International Conference on Information and
Communication Technology (ICoICT) (pp. 507-513). IEEE.
dos Santos, W. R., Ramos, R. M., & Paraboni, I. (2019). Computational personality
recognition from facebook text: psycholinguistic features, words and facets. New
Review of Hypermedia and Multimedia, 25(4), 268-287.
Tadesse, M. M., Lin, H., Xu, B., & Yang, L. (2018). Personality Predictions Based on
User Behavior on the Facebook Social Media Platform. IEEE Access, 6, 61959–61969.
doi:10.1109/ACCESS.2018.2876502
73. Carducci, G., Rizzo, G., Monti, D., Palumbo, E., & Morisio, M. (2018).
Twitpersonality: Computing personality traits from tweets using word
embeddings and supervised learning. Information, 9(5), 127.
50. Yılmaz, T., Ergil, A., & İlgen, B. (2019, October). Deep learning-based document
modelling for personality detection from Turkish Texts. In Proceedings of the Future
Technologies Conference (pp. 729-736). Springer, Cham.
61. Rahman, M. A., Al Faisal, A., Khanam, T., Amjad, M., & Siddik, M. S. (2019,
May). Personality detection from text using convolutional neural network. In 2019 1st
International Conference on Advances in Science, Engineering and Robotics Technology
(ICASERT) (pp. 1-6). IEEE.
66. Akrami, N., Fernquist, J., Isbister, T., Kaati, L., & Pelzer, B. (2019, December).
Automatic extraction of personality from text: Challenges and opportunities. In 2019
IEEE International Conference on Big Data (Big Data) (pp. 3156-3164). IEEE.
72. Zheng, H., & Wu, C. (2019, February). Predicting personality using facebook status
based on semi-supervised learning. In Proceedings of the 2019 11th International
Conference on Machine Learning and Computing (pp. 59-64).
Alamsyah, A., Nurwiant, F., Rachman, M. F., Hudaya, C. S., Putra, R. P., Rifkyano, A.
I., & Nurwianti, F. (2019a). A Progress on the Personality Measurement Model using
Ontology based on Social Media Text.
Darliansyah, A., Naeem, M. A., Mirza, F., & Pears, R. (2019). SENTIPEDE: A Smart
System for Sentiment-based Personality Detection from Short Texts.
Journal of Universal Computer Science, 25, 1323–1352.
doi:10.3217/jucs-025-10-1323
Artissa, Y. B. N. D., Asror, I., & Faraby, S. A. (5 2019). Personality Classification
based on Facebook status text using Multinomial Naïve Bayes method. 1192.
doi:10.1088/1742-6596/1192/1/012003
Santos, W. R., Ramos, R. M. S., & Paraboni, I. (10 2019). Computational personality
recognition from Facebook text: psycholinguistic features, words and facets. New
Review of Hypermedia and Multimedia, vol. 25, pp. 268–287.
doi:10.1080/13614568.2020.1722761
Montag, C., & Elhai, J. D. (9 2019). A new agenda for personality psychology in the
digital age? Personality and Individual Differences, 147, 128–134.
doi:10.1016/j.paid.2019.03.045
33. Ergu, İ., Işık, Z., & Yankayış, İ. Predicting Personality with Twitter Data and
Machine Learning Models. In 2019 Innovations in Intelligent Systems and Applications
Conference (ASYU) (pp. 1-5). IEEE.
1. Rahman, M. A., Al Faisal, A., Khanam, T., Amjad, M., & Siddik, M. S. (2019,
May). Personality detection from text using convolutional neural network. In 2019 1st
International Conference on Advances in Science, Engineering and Robotics Technology
(ICASERT) (pp. 1-6). IEEE.
Fernandes, B., González-Briones, A., Novais, P., Calafate, M., Analide, C., & Neves, J.
(2020). An adjective selection personality assessment method using gradient boosting
machine learning. Processes, 8(5), 618.
34. Rohit, G. V., Bharadwaj, K. R., Hemanth, R., Pruthvi, B., & Kumar, M. (2020,
August). Machine intelligence based personality prediction using social profile data. In
2020 Third International Conference on Smart Systems and Inventive Technology
(ICSSIT) (pp. 1003-1008). IEEE.
Tighe, E., Aran, O., & Cheng, C. (2020). Exploring Neural Network Approaches in
Automatic Personality Recognition of Filipino Twitter Users.
35. Mehta, Y., Fatehi, S., Kazameini, A., Stachl, C., Cambria, E., & Eetemadi, S. (2020,
November). Bottom-up and top-down: Predicting personality with psycholinguistic and
language model features. In 2020 IEEE International Conference on Data Mining
(ICDM) (pp. 1184-1189). IEEE.
36. Kazameini, A., Fatehi, S., Mehta, Y., Eetemadi, S., & Cambria, E. (2020).
Personality trait detection using bagged svm over bert word embedding ensembles.
arXiv preprint arXiv:2010.01309.
37. Jayaratne, M., & Jayatilleke, B. (2020). Predicting personality using answers to
open-ended interview questions. IEEE Access, 8, 115345-115355.
Leonardi, S., Monti, D., Rizzo, G., & Morisio, M. (2020). Multilingual transformer-based
personality traits estimation. Information, 11(4), 179.
Khan, A. S., Ahmad, H., Asghar, M. Z., Saddozai, F. K., Arif, A., & Khalid, H. A.
(2020). Personality Classification from Online Text using Machine Learning
Approach. International Journal of Advanced Computer Science and
Applications (IJACSA), vol. 11. Retrieved from www.ijacsa.thesai.org
Iskandar, A. F., Utami, E., & Prasetio, A. B. (11 2020). Impact of Feature Extraction
and Feature Selection on Indonesian Personality Trait Classification. 206–211.
doi:10.1109/ICOIACT50329.2020.9332107
Guan, Z., Wu, B., Wang, B., & Liu, H. (2020, July). Personality2vec: Network
representation learning for personality. In 2020 IEEE Fifth International Conference
on Data Science in Cyberspace (DSC) (pp. 30-37). IEEE.
Camati, R. S., & Enembreck, F. (10 2020). Text-Based Automatic Personality
Recognition: A Projective Approach. 2020-October, 218–225.
doi:10.1109/SMC42975.2020.9282859
Wang, Z., Wu, C. H., Li, Q. B., Yan, B., & Zheng, K. F. (2020). Encoding text
information with graph convolutional networks for personality recognition. Applied
sciences, 10(12), 4081.
46. Mõttus, R., Wood, D., Condon, D. M., Back, M. D., Baumert, A., Costantini, G., ...
& Zimmermann, J. (2020). Descriptive, predictive and explanatory personality
research: Different goals, different approaches, but a shared need to move beyond the
Big Few traits. European Journal of Personality, 34(6), 1175-1201.
74. Jiang, H., Zhang, X., & Choi, J. D. (2020, April). Automatic text-based personality
recognition on monologues and multiparty dialogues using attentive networks and
contextual embeddings (student abstract). In Proceedings of the AAAI Conference on
Artificial Intelligence (Vol. 34, No. 10, pp. 13821-13822).
41. Başaran, S., & Ejimogu, O. H. (2021). A Neural Network Approach for Predicting
Personality From Facebook Data. SAGE Open, 11(3), 21582440211032156.
Xue, X., Feng, J., & Sun, X. (2021). Semantic-enhanced sequential modeling for
personality trait recognition from texts. Applied Intelligence, 51(11), 7705-7717.
Pabón, F. O. L., & Arroyave, J. R. O. (12 2021). Automatic Personality Evaluation from
Transliterations of YouTube Vlogs Using Classical and State of the art Word
Embeddings. Ingeniería e Investigación, vol. 42, p. e93803.
doi:10.15446/ing.investig.93803
Ahmad, H., Asghar, M. U., Asghar, M. Z., Khan, A., & Mosavi, A. H. (2021). A
hybrid deep learning technique for personality trait classification from text. IEEE
Access, 9, 146214-146232.
Camati, R. S., Scaduto, A. A., & Enembreck, F. (2021, October). Using the
Projective Thematic Apperception Test for Automatic Personality Recognition in
Texts. In 2021 IEEE International Conference on Systems, Man, and Cybernetics
(SMC) (pp. 78-85). IEEE.
39. El-Demerdash, K., El-Khoribi, R. A., Shoman, M. A. I., & Abdou, S. (2021). Deep
learning based fusion strategies for personality prediction. Egyptian Informatics
Journal.
Hassanein, M. M., Rady, S., Hussein, W., & Gharib, T. (2021). Extracting
Relationships between Big Five Model and Personality characteristics in Social
Networks. International Journal of Intelligent Computing and Information
Sciences, 21(2), 41-49.
40. Christian, H., Suhartono, D., Chowanda, A., & Zamli, K. Z. (2021). Text based
personality prediction from multiple social media data sources using pre-trained
language model and model averaging. Journal of Big Data, 8(1), 1-20.
89. Jeremy, N. H., & Suhartono, D. (2021). Automatic personality prediction from
Indonesian user on twitter using word embedding and neural networks. Procedia
Computer Science, 179, 416-422.
Hernández, Y., Martínez, A., Estrada, H., Ortiz, J., & Acevedo, C. (3 2022). Machine
Learning Approach for Personality Recognition in Spanish Texts. Applied Sciences
(Switzerland), vol. 12. doi:10.3390/app12062985
Christian, H., Suhartono, D., Chowanda, A., & Zamli, K. Z. (12 2021). Text based
personality prediction from multiple social media data sources using pre-trained
language model and model averaging. Journal of Big Data, 8. doi:10.1186/s40537-021-
00459-1
Hickman, L., Bosch, N., Ng, V., Saef, R., Tay, L., & Woo, S. E. (2021). Automated video
interview personality assessments: Reliability, validity, and generalizability
investigations. Journal of Applied Psychology. doi:10.1037/apl0000695
Yang, F., Quan, X., Yang, Y., & Yu, J. (2021a). Multi-Document Transformer for
Personality Detection. Retrieved from www.aaai.org
Alamsyah, A., Dudija, N., & Widiyanesti, S. (10 2021). New approach of measuring
human personality traits using ontology-based model from social media
data. Information (Switzerland), 12. doi:10.3390/info12100413
Ong, V., Rahmanto, A. D. S., Williem, W., Jeremy, N. H., Suhartono, D., & Andangsari,
E. W. (2021). Personality Modelling of Indonesian Twitter Users with XGBoost Based
on the Five Factor Model. International Journal of Intelligent Engineering and
Systems, 14, 248–261. doi:10.22266/ijies2021.0430.22
Hassanein, M., Rady, S., Hussein, W., & Gharib, T. (7 2021). EXTRACTING
RELATIONSHIPS BETWEEN BIG FIVE MODEL AND PERSONALITY
CHARACTERISTICS IN SOCIAL NETWORKS. International Journal of Intelligent
Computing and Information Sciences, 21, 41–49. doi:10.21608/ijicis.2021.77015.1092
Mavis, G., Toroslu, I. H., & Karagoz, P. (10 2021). Personality Analysis Using
Classification on Turkish Tweets. International Journal of Cognitive Informatics
and Natural Intelligence, 15, 1–18. doi:10.4018/ijcini.287596
Wang, Y., Zheng, J., Li, Q., Wang, C., Zhang, H., & Gong, J. (6 2021). Xlnet-caps:
Personality classification from textual posts. Electronics (Switzerland), 10.
doi:10.3390/electronics10111360
Štajner, S., & Yenikent, S. (2021, April). Why Is MBTI Personality Detection from
Texts a Difficult Task?. In Proceedings of the 16th Conference of the European Chapter
of the Association for Computational Linguistics: Main Volume (pp. 3580-3589).
90. Kosan, M. A., Karacan, H., & Urgen, B. A. (2022). Predicting personality traits
with semantic structures and LSTM-based neural networks. Alexandria Engineering
Journal, 61(10), 8007-8025.
Giorgi, S., Nguyen, K. L., Eichstaedt, J. C., Kern, M. L., Yaden, D. B., Kosinski, M., ... & Park,
G. (2022). Regional personality assessment through social media language. Journal of
personality, 90(3), 405-425.
Shah, C., Thacker, C., & Patel, Y. (2022, March). Multi-Label Personality Prediction on
Twitter data using Machine Learning. In 2022 International Mobile and Embedded
Technology Conference (MECON) (pp. 140-144). IEEE.
1. Ramezani, M., Feizi-Derakhshi, M. R., & Balafar, M. A. (2022). Knowledge
Graph-Enabled Text-Based Automatic Personality Prediction. arXiv preprint
arXiv:2203.09103.
Safitri, G., & Setiawan, E. B. (2 2022). Optimization Prediction of Big Five Personality
in Twitter Users. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 6, pp.
85–91. doi:10.29207/resti.v6i1.3529
Li, Y., Kazemeini, A., Mehta, Y., & Cambria, E. (7 2022). Multitask learning for
emotion and personality traits detection. Neurocomputing, 493, 340–350.
doi:10.1016/j.neucom.2022.04.049
Deilami, F. M., Sadr, H., & Nazari, M. (2022). Using Machine Learning Based Models
for Personality Recognition. arXiv preprint arXiv:2201.06248.
Deilami, F. M., Sadr, H., & Tarkhan, M. (2022). Contextualized Multidimensional
Personality Recognition using Combination of Deep Neural Network and Ensemble
Learning. Neural Processing Letters. doi:10.1007/s11063-022-10787-9
Prediction Methods Multi-Label Nature Prediction Metric
O |
Statistical N.A N.A N.A N.A

Classification
Regression
RE
N.A
Binary Classification Acc

Classification
Regression
Ranking
Statistical
Regression N.A Supervised MSE 0.4761

Ranking N.A Supervised
N.A Unsupervised
Binary Classification Supervised Accuracy

Binary Classification N.A Supervised Accuracy 0.6912
Binary Classification N.A Supervised F1 0.73

Multivariate Regression N.A Supervised RMSE 0.77
Regression N.A Supervised N.A N.A

Multi Label Classification
Multi Label Classification Binary Relevance Unsupervised Accuracy N.A
Multivariate Regression Supervised MAE 0.0811
Regression N.A Supervised RMSE 5.1587
Binary Classification Supervised F1

Binary Classification
Binary Classification N.A Supervised Accuracy
Binary Classification N.A Supervised
Binary Classification Supervised F1 N.A

Regression N.A MAE 0.349
Supervised
N.A Accuracy 0.553

Binary Classification
Binary Classification N.A Supervised F1

Regression N.A Transfer Learning MSE 0.3316

Multi-Label
Yes
Classification
Multiclass F1
Classification ME
MAE
Regression
RMSE
RMSE 0.342
Regression N.A Supervised MSE 0.1759
Multivariate Regression
0.4852
Regression N.A Supervised MAE 0.5824
0.1097
0.67
Binary Classification N.A Supervised F1
0.85

Multi-Label Classification Yes
0.6316
Multi-Label Classification Yes Supervised Accuracy
0.6385
Regression,
Binary Classification, N.A Supervised
Multi Classification
Binary Classification N.A Supervised AUROC 0.6275
Classification Supervised Recall

Multi Label Classification Yes Supervised F1 0.7364
F1

Prediction Accuracy
Language
O | C | E | A | N | AVG
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A
N.A N.A N.A N.A N.A

0.99
0.76
42-73 %
0.5776 0.7744 0.6241 0.7225 0.63494
0.72
0.6071 0.5974 0.5933 0.6323 0.62426
0.58 0.7 0.62 0.63 0.652

0.64 0.91 0.72 0.7 0.748
N.A N.A N.A N.A N.A

N.A N.A N.A N.A 0.65
0.1085 0.1337 0.1834 0.2013

5.339 4.8043 4.7753 5.6188 5.13922
0.56
N.A 0.676 N.A 0.586 N.A
0.457 0.619 0.537 0.606 0.5136
Filipino
0.776 0.716 0.676 0.57 0.6582
TBA Bahasa
Bahasa
0.53 0.7084 0.4477 0.5572 0.51498
Brazilian Portuguese
TBA
.776
.165
3.979
5.512
0.334 N.A N.A N.A 0.338

0.3045 0.475 0.2667 0.2911 0.30264
0.5713 0.5971 0.5148 0.5328
0.5746 0.7213 0.6675 0.6028
0.1007 0.1036 0.0997 0.1425
0.69 0.69 0.68 0.67 0.68

0.79 0.7 0.75 0.78 0.768
0.5855 0.6062 0.5972 0.6107 0.61164

0.5749 0.5891 0.5749 0.5951
0.6585 0.7073 0.7317 0.7805
TBA
0.5 0.591 0.713 0.4865 0.5836
Turkish
0.7568 0.7772 0.7178 0.6834 0.73432
0.949 0.846 0.872 0.77 0.836
0.7245
0.7393
0.6463 0.6125 0.5902 0.6193 0.61398
Concept
Language use as an individual difference
How does personality influence language
production?
Personality estimation from personal
websites
Modelling Personality Traits
Personality expression in natural habitat
Can Personality be communicated by a
social network, in this case Facebook?
Unsupervised Personality recognition
from social networks
Trait estimation from EMAILS

Shared Challenge WPCR13


MultiVariate regression for personality
classification of Vloggers
Validating Language based Personality
Assessments
Multi label personality classification
using Twitter
Multi Lingual Personality modelling

Linguistic Representation Feature
Vector (LRFV)
Feature reduction for trait classification

Comparative evaluation of Classical
versus Deep architectures
Personality in culturally diverse
environment
Modelling personality traits of Filipino
Twitter users
Is Big five better than MBTI?
Personality Prediction of Malaysian
Facebook Users: Cultural Preferences and
Features Variation
WordNet based lexical features
Using Transfer learning for personality
prediction in social media
Linguistic markers for low resource
languages
Preprocessing variation
Adjective selection to assess personality
Neural networks in APR of Filipino
Twitter users
Sentence level attention for personality
prediction
Network Representation Learning for
Personality Prediction
Personality GCN for trait prediction
Attention and BERT for Personality
Prediction. Dialogue based Personality
prediction
Multi label Classification by ANN for
personality analysis
Semantic Enhanced Personality
modelling
Comparison of classical and word
embeddings
Personality classification for Indonesian
tweets
Region based Personality prediction using
linguistic features
Use of Knowledge graphs for personality
prediction from text.
Different Kernels for SVM,
Use of different filter sizes to improve
Personality prediction
Limitations Traits
OCEAN
Only Extroversion has been studied in detail PEN

Top Down only OCEAN
No specific cues are provided which could be useful for identification of
personality.
OCEAN
Only one email has been used per user. Multiple emails could enhance the prediction
accuracy per user.
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
Only Extroversion and Neuroticism have been used EN

OCEAN
OCEAN
OCEAN
MBTI
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
MBTI
OCEAN
OCEAN
MBTI
OCEAN
OCEAN
OCEAN
OCEAN
OCEAN
Observer/Self Report Datasets Public Repository
N.A
CUSTOM
N.A
(Student Texts)
N.A
N.A
ESSAYS
Self
EAR Corpus
N.A
Observer EAR Corpus
EAR Corpus
N.A
MyPersonality
Italian FriendFeed
Custom
(Email)
YouTube N.A
MyPersonality* N.A
ESSAYS
ESSAYS
MyPersonality
Custom
(Facebook)
Custom
N.A
(Facebook)
N.A
N.A
N.A
N.A
N.A
Custom
(Twitter)
N.A
Custom
N.A
(Twitter)
MyPersonality N.A
PAN AM
MyPersonality
Essays
YouTube
Facebook
(Custom)
MyPersonality
Custom
(Twitter)
MyPersonality
Essays
YouTube
PAN
Essays,
MyPersonality
Essays,
FriendsPersona
Essays,
YouTube
YouTube
Custom
(Twitter)
Essays
Custom,
Twitter
MyPersonality
ISEAR 1
TEC 2
Essays
Essays
YouTube
Feature
LIWC
NRC
MRC
SentiStrength
SPLICE
Description
NRC is a lexicon that contains more than 14,000 distinct English words annotated
with 8 emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and
disgust), and 2 sentiments (negative, positive)
MRC is a psycholinguistic database which contains psychological and
distributional information about words. The MRC database contains 150,837 entries
with information about 26 properties (e.g., the number of syllables in the
word, the number of letters, etc.), although not all properties are available for
every word.
SentiStrength assigns to each text a positive, negative and neutral sentiment
score on a scale of 1 (no sentiment) to 5 (very strong sentiment). Texts may be
simultaneously positive, negative and neutral.
SPLICE (Structured Programming for Linguistic Cue Extraction)
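As a rough illustration of how a closed-vocabulary resource such as the NRC lexicon described above is typically turned into features, the sketch below counts emotion and sentiment labels per token and normalises by text length. The tiny `TOY_LEXICON`, the function name `emotion_features`, and the tokenisation rule are all assumptions made for this example; they are not part of the NRC distribution or of any surveyed system.

```python
import re
from collections import Counter

# Hypothetical four-word stand-in for an NRC-style lexicon (word -> labels).
# The real lexicon covers more than 14,000 words; these entries are illustrative.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "afraid": {"fear", "negative"},
    "furious": {"anger", "negative"},
    "trust": {"trust", "positive"},
}


def emotion_features(text, lexicon=TOY_LEXICON):
    """Return the share of tokens carrying each emotion/sentiment label."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simple assumed tokeniser
    counts = Counter()
    for token in tokens:
        for label in lexicon.get(token, ()):
            counts[label] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {label: n / total for label, n in counts.items()}


features = emotion_features("I am happy but a little afraid")
print(features)
```

Because every word outside the lexicon contributes nothing, features of this kind ignore context, which is the limitation of closed-vocabulary approaches noted elsewhere in the survey.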

