Predicting Users' Eat-Out Preference From Big5 Personality Traits
4 authors:
All content following this page was uploaded by Md. Adnanul Islam on 31 May 2023.
Abstract Social Networking Sites (SNS) such as Facebook and Twitter have become important places for sharing one's views, beliefs, and ideas and for communicating with family members and friends. These virtual places can capture a wide range of details about each user, which may reflect behavioral traits such as the user's preferences in daily life. In this study, we build a machine learning (ML) model to predict a user's eat-out preference from her Big5 personality traits derived from tweets. To this end, we collect users' check-ins from a location-aware social network, Foursquare. We then build an ML-based model based on the content of users' tweets and Foursquare check-ins, which allows us to predict users' eat-out preferences for various types of restaurants from their personality traits. We conduct an experiment with a total of 731 Twitter and Foursquare users, and the results show that users' Big5 personality traits have a strong association with their eat-out preference. Our model achieves an average AUC score of 84% for all categories of restaurants.
1 Introduction
Today, Twitter has become a significant online communication tool. Based on the textual content of these interactions, i.e., tweets, researchers can infer aspects of human behavior, such as the personality [12] and preferences [6] of the users involved. Additionally,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 511
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks
and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_37
512 Md. S. H. Mukta et al.
1 https://personality-insights-livedemo.mybluemix.net/.
R² scores for the highest- and the lowest-performing models, respectively, on the test dataset, outperforming the scores of LRM. To measure the strength of the best-performing BRM, we further evaluate the model with different binary classifiers by measuring the area under the receiver operating characteristic curve (AUC-ROC). It obtains an average AUC-ROC of 83.25%, with a maximum score of 93.1% (Moderate category) and a minimum score of 73.2% (Expensive category).
In summary, we have the following contributions:
• We are the first to integrate Twitter and Foursquare data to predict users' eat-out patterns from their Big5 personality traits.
• We identify a strong relationship between restaurant categories and personality traits.
• We develop a Bi-LSTM-based regression model to predict users' price-based eat-out preferences from their Twitter data.
• We present a comparative performance analysis of linguistic and personality features for predicting eat-out preference.
Predicting the eat-out pattern of a user has numerous applications. For example, by knowing the eat-out preferences of Twitter users, restaurant owners can launch personalized advertisements. A food chain service provider can decide on a new business location after investigating the eat-out patterns of the users in that locality. Moreover, the approach could also be used to estimate the economic profile of a region.
2 Related Work
3 Methodology
In this study, we predict users' eat-out preference scores for restaurants of different price ranges from their tweets. Figure 1 illustrates the research framework, which consists of the following steps. First, we collect the users' eat-out preferences in four categories (Cheap, Moderate, Expensive, and Very Expensive) and calculate the category-based relative frequency from the users' tweets. Second, we extract two types of features from the users' tweets: (a) linguistic feature vectors using BERT and (b) Big5 personality traits using the IBM Personality API. Finally, we develop two Bi-LSTM regression models, compare their performance, and select the best-performing model.
One of the most difficult aspects of our work is extracting information about consumers' psychological characteristics and how frequently they visit a particular type of restaurant, because there is no single source from which we can obtain all of this data.
Therefore, we combine the Twitter and Foursquare datasets in order to gather psy-
2 http://www.tweepy.org.
total of 72,662 Foursquare links to various eateries within the tweets. Each link leads to a different page on the Foursquare website. In some cases, the venue's type and other details are displayed directly on that page; otherwise, they can be found on the venue page, which includes the restaurant's price tier and is reached by clicking the name of the place. We parse the HTML of these pages to learn more about the places the Foursquare links point to. Not all links point to eateries or other food-service businesses, so we ignore any link unrelated to restaurants or similar establishments. When a link points to a restaurant, Foursquare usually classifies the eatery according to the cost of the food it serves.
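The parsing step above can be sketched with Python's standard-library `html.parser`. The markup is entirely hypothetical: for illustration only, we assume a venue page exposes its category in an element with `class="venue-category"` and its price tier as a run of dollar signs in an element with `class="price-tier"`. Real Foursquare pages may be structured differently, and the keyword list used to decide whether a venue is food-related is likewise an assumption, not the authors' rule.

```python
from html.parser import HTMLParser

# Hypothetical keywords used to decide whether a venue is food-related.
FOOD_KEYWORDS = ("restaurant", "café", "cafe", "diner", "bistro", "pizza", "food")


class VenueParser(HTMLParser):
    """Collects text from elements whose class is one of the hypothetical
    names 'venue-category' or 'price-tier'."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the element whose text is being captured
        self.fields = {"venue-category": "", "price-tier": ""}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if cls in self.fields:
            self._current = cls

    def handle_endtag(self, tag):
        # Stop capturing at the next closing tag (sufficient for flat markup).
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data.strip()


def parse_venue(html):
    """Return (category, tier) for a food venue, or None if the page is not
    food-related. `tier` is the number of '$' signs (0 if none is shown)."""
    parser = VenueParser()
    parser.feed(html)
    category = parser.fields["venue-category"]
    if not any(k in category.lower() for k in FOOD_KEYWORDS):
        return None  # ignore links unrelated to restaurants
    tier = parser.fields["price-tier"].count("$")
    return category, tier
```

For example, a page fragment with an "Italian Restaurant" category and a "$$" price tier yields `("Italian Restaurant", 2)`, while a gym page yields `None` and is discarded.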
Based on pricing, we categorize the restaurant links as cheap, moderate, expensive, and very expensive. If a link points to a restaurant listed in Foursquare's restaurant category, the Foursquare page shows a price tier in dollar signs ($). One ($), two ($$), three ($$$), and four ($$$$) dollar signs denote the cheap, moderate, expensive, and very expensive restaurant categories, respectively. Users with fewer than 50 restaurant-related Foursquare check-ins are excluded. We then use Eq. 1 to compute the relative frequency of each user's visits to each restaurant category and use these values as ground-truth data. Overall, the percentages of identified links belonging to the cheap, moderate, expensive, and very expensive restaurant categories are 33.01%, 48.80%, 14.22%, and 2.96%, respectively.
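Eq. 1 itself is not reproduced in this section. Assuming it is the share of a user's restaurant check-ins that fall into each price category, the ground-truth computation, including the dollar-sign mapping and the 50-check-in filter, can be sketched as follows. The function name and the input format (one price tier from 1 to 4 per check-in) are hypothetical.

```python
from collections import Counter

# Foursquare price tiers ($ to $$$$) mapped to the four categories.
TIER_TO_CATEGORY = {1: "Cheap", 2: "Moderate", 3: "Expensive", 4: "Very Expensive"}
MIN_CHECKINS = 50  # users with fewer restaurant check-ins are excluded


def relative_frequencies(tiers):
    """Given one user's restaurant check-ins as price tiers (1-4), return the
    relative visit frequency per category, or None if the user has fewer than
    MIN_CHECKINS check-ins. Assumes Eq. 1 is count(category) / total."""
    # Assumption: check-ins without a valid price tier are ignored.
    tiers = [t for t in tiers if t in TIER_TO_CATEGORY]
    if len(tiers) < MIN_CHECKINS:
        return None
    counts = Counter(TIER_TO_CATEGORY[t] for t in tiers)
    total = len(tiers)
    return {cat: counts[cat] / total for cat in TIER_TO_CATEGORY.values()}
```

A user with 20 cheap, 25 moderate, and 5 expensive check-ins gets frequencies of 0.4, 0.5, 0.1, and 0.0, while a user with only 49 check-ins is dropped.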
In this section, we extract the features relevant to forecasting users' eat-out preferences from both linguistic and personality perspectives. Since both our independent variables (personality scores) and our dependent variable (frequency of visits to the various restaurant categories) are continuous, we also apply Pearson's correlation (ρ) analysis to find meaningful correlations between the Big5 personality traits and the visit frequencies.
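For reference, Pearson's correlation between a trait vector x and a visit-frequency vector y is ρ = cov(x, y)/(σ_x σ_y), and the usual test of ρ = 0 uses t = ρ√((n − 2)/(1 − ρ²)) with n − 2 degrees of freedom. The standard-library sketch below computes both; the two-tailed p-value uses a normal approximation to Student's t, which we assume is adequate for n in the hundreds (a library such as SciPy would give exact values).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_tailed_p(t):
    """Two-tailed p-value from the normal approximation to Student's t
    (reasonable only for large n)."""
    return math.erfc(abs(t) / math.sqrt(2))
```

For example, with n = 731, a correlation of 0.104 gives t ≈ 2.82 and an approximate two-tailed p of about 0.005.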
Linguistic feature extraction using BERT: Context-based vector representations of a specific word within a text are known as (contextual) word embeddings. They capture a word's relationship to other words in a document, including its semantic and syntactic similarity. To create word embeddings, we use a pretrained BERT model. Using information from the full tweet, the BERT embedding layer creates token-level representations. The input features are organized as $M^0 = \{m_1, \ldots, m_N\}$, where $m_n$ ($n \in [1, N]$) is the combination of the token, position, and segment embeddings corresponding to the input token $x_n$. The representations $M^k = \{m^k_1, \ldots, m^k_N\}$ at the $k$th transformer layer ($1 \le k \le K$) are computed as

$$M^k = \mathrm{Transformer}_k\left(M^{k-1}\right).$$

$M^k$ thus holds the contextualized representations of the input tokens. The final-layer representations $M^K$ are supplied as input to the regression block:

$$M^K = \{m^K_1, \ldots, m^K_N\} \in \mathbb{R}^{N \times \dim_m} \qquad (2)$$
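To make the shapes in Eq. 2 concrete, the following toy NumPy sketch builds $M^0$ as the sum of token, position, and segment embeddings and passes it through K stacked layers. Each layer is only single-head scaled dot-product self-attention with a residual connection and layer normalization, using random weights (no feed-forward sublayer, no multi-head split); it is an illustrative stand-in, not BERT, and all sizes are hypothetical. In practice, $M^K$ would come from a pretrained BERT checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, N_SEG, DIM_M, K = 100, 32, 2, 16, 3  # hypothetical sizes

# Randomly initialised embedding tables (a real model loads pretrained ones).
tok_emb = rng.normal(size=(VOCAB, DIM_M))
pos_emb = rng.normal(size=(MAX_LEN, DIM_M))
seg_emb = rng.normal(size=(N_SEG, DIM_M))
# One (query, key, value) weight triple per layer.
layers = [rng.normal(scale=DIM_M ** -0.5, size=(3, DIM_M, DIM_M)) for _ in range(K)]


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def toy_layer(m, w):
    """Simplified transformer layer: single-head self-attention + residual + norm."""
    wq, wk, wv = w
    q, k, v = m @ wq, m @ wk, m @ wv
    scores = q @ k.T / np.sqrt(DIM_M)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
    return layer_norm(m + attn @ v)


def encode(token_ids, segment_ids):
    """Return M^K with shape (N, DIM_M) for N input tokens."""
    n = len(token_ids)
    m = tok_emb[token_ids] + pos_emb[np.arange(n)] + seg_emb[segment_ids]  # M^0
    for w in layers:  # M^k = Layer_k(M^{k-1}), k = 1..K
        m = toy_layer(m, w)
    return m
```

Encoding four token ids, e.g. `encode([5, 17, 42, 8], [0, 0, 0, 0])`, returns an array of shape (4, 16), i.e. N × dim_m.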
Personality feature extraction using IBM Watson API: The Big5 model [10] is one of the most widely studied models in personality research. It comprises five personality traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Using the IBM Watson API, we extract the scores of these five traits from each user's collected tweets. The scores vary between users based on their writing because, as Pennebaker et al. [7] observe, what people say and write reveals their behavior and personality.
Although one may believe that personality influences a person's decision to dine out, it can be quite challenging to pinpoint which traits specifically affect the choice of restaurant category. Since both our independent variables (personality traits) and dependent variables (visiting frequencies of the restaurant categories) are continuous, we use Pearson's correlation coefficient to determine the relationship between users' Big5 personality traits and their visiting frequencies for the various restaurant categories. Table 1 displays the Pearson correlations between personality traits and restaurant categories, where N = 731 and the critical value is p < 0.10. The table shows that several personality traits are associated with particular restaurant categories.
Table 1 Pearson's correlations between Big5 personality traits and visiting frequencies of different categories of restaurants
Trait               Cheap      Moderate   Expensive   Very Expensive
Openness            −0.087     0.104**    0.003       −0.034
Conscientiousness   −0.109**   0.020      0.098*      0.075*
Extraversion        −0.060     −0.007     0.075*      0.055
Agreeableness       −0.109**   0.012      0.107**     0.080*
Neuroticism         −0.016     0.018      0.031       0.039
* p < 0.05, ** p < 0.10
Table 2 Performance of the developed models (R² score, in %) on the train and test datasets
        Train dataset                                      Test dataset
Model   Cheap   Moderate   Expensive   V. Exp.   Avg       Cheap   Moderate   Expensive   V. Exp.   Avg
BRM     34.3    39.8       27          37.12     34.55     31      34.5       25.1        33.12     30.93
LRM     31.8    33.4       23.1        29.17     29.37     25.8    29.4       19.1        24.17     24.62
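As a reminder of what the scores in Table 2 measure, R² = 1 − SS_res/SS_tot compares a model's squared errors with those of always predicting the mean (1 is perfect, 0 matches the mean predictor, and negative values are worse). The minimal sketch below implements that definition and checks that the test-set Avg columns are the means of the four category scores.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Test-set scores from Table 2 (Cheap, Moderate, Expensive, Very Expensive).
BRM_TEST = [31, 34.5, 25.1, 33.12]
LRM_TEST = [25.8, 29.4, 19.1, 24.17]
brm_avg = sum(BRM_TEST) / len(BRM_TEST)  # ≈ 30.93, as reported
lrm_avg = sum(LRM_TEST) / len(LRM_TEST)  # ≈ 24.62 after rounding
```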
Fig. 2 Comparison of performance between BRM and LRM (test dataset)
classifier, together with its TPR, FPR, and AUC, is presented in Table 3 for determining the frequency of visits to each restaurant category [5]. As a baseline, we employ the ZeroR classifier, which obtains an average AUC score of 0.793. Among our classifiers, the lowest AUC score (0.732) is obtained for the Expensive category and the highest (0.931) for the Moderate category. We additionally obtain AUC values of 0.864 and 0.803 for the Cheap and Very Expensive categories, respectively. The top classifier outperforms the baseline average in three of the four categories, all except Expensive (Fig. 3).
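AUC-ROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below implements that pairwise (Mann-Whitney) definition, plus TPR and FPR at a fixed threshold; it is O(n_pos · n_neg), which is fine for a few hundred users. How the continuous visit frequencies are binarized into positive and negative classes is not described in this section, so the labels are assumed to be given. The last lines check that the four best AUCs reported above average to the 83.25% stated earlier.

```python
def auc(labels, scores):
    """AUC-ROC via the pairwise (Mann-Whitney) definition; labels are 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def tpr_fpr(labels, scores, threshold):
    """True- and false-positive rates when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    return tp / n_pos, fp / (len(labels) - n_pos)


# Best AUC per category, as reported above.
best = {"Cheap": 0.864, "Moderate": 0.931, "Expensive": 0.732, "Very Expensive": 0.803}
mean_auc = sum(best.values()) / len(best)  # ≈ 0.8325
```

For instance, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.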
5 Discussion
Table 3 Best-performing classifier for predicting each restaurant category from Big5 personality traits
Restaurant type   Best classifier (by AUC)   AUC    TPR    FPR
Cheap             Naive Bayes                0.86   0.81   0.092
Moderate          REPTree                    0.93   0.91   0.046
Expensive         REPTree                    0.73   0.63   0.18
Very Expensive    Naive Bayes                0.80   0.72   0.13
because neurotic persons might not be interested in sharing their Foursquare check-ins about eating out, we may not have been able to establish any correlation between the neuroticism trait and eat-out activity. In our study, we also observed fewer check-ins from neurotic users, which limits the prediction of their eat-out choices. This is consistent with another study [14] on Facebook, which demonstrates that neurotic people tend to share less information with their friends.
We note that, compared to the other restaurant categories, prediction performance is lower for the costlier restaurants. Our training dataset contains only a small number of examples for the costlier categories (14.22% and 2.96% of the identified restaurant links are Expensive and Very Expensive, respectively), which results in weaker prediction accuracy for these price ranges. A recent well-cited psycholinguistic study of social media [12] uses datasets with sizes between 250 and 300; we therefore consider our dataset (N = 731) sufficiently large for using psychological traits to predict consumers' eat-out preferences.
6 Conclusion
References
1. Aletras N, Chamberlain BP (2018) Predicting twitter user socioeconomic attributes with network and language information. In: Proceedings of the 29th on hypertext and social media, pp 20–24
2. Álvarez-Carmona MÁ, Villatoro-Tello E, Villaseñor-Pineda L, Montes-y Gómez M (2022) Classifying the social media author profile through a multimodal representation. In: Intelligent technologies: concepts, applications, and future directions. Springer, pp 57–81
3. Ansari MZ, Aziz M, Siddiqui M, Singh K (2020) Analysis of political sentiment orientations on twitter. Procedia Comput Sci 167:1821–1828
4. Bartkiene E et al (2019) Factors affecting consumer food preferences: food taste and depression-based evoked emotional expressions with the use of face reading technology. BioMed Res Int 2019
5. Cantarino I, Carrion MA, Goerlich F, Martinez Ibañez V (2019) A roc analysis-based classification method for landslide susceptibility maps. Landslides 16(2):265–282
27. Wang Z, Hale S, Adelani DI, Grabowicz P, Hartman T, Flöck F, Jurgens D (2019) Demographic inference and representative population estimates from multilingual social media data. In: The world wide web conference, pp 2056–2067
28. Xing W, Gao F (2018) Exploring the relationship between online discourse and commitment in twitter professional learning communities. Comput Educ 126:388–398
29. Yang D, Qu B, Yang J, Cudre-Mauroux P (2019) Revisiting user mobility and social relationships in lbsns: a hypergraph embedding approach. In: The world wide web conference, pp 2147–2157