2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) | 978-0-7381-1180-3/20/$31.00 ©2020 IEEE | DOI: 10.1109/SNAMS52053.2020.9336534
Authorized licensed use limited to: Univ Politecnica de Madrid. Downloaded on September 07,2022 at 12:57:16 UTC from IEEE Xplore. Restrictions apply.
A major axis sandwiching remorse is loathing, which is very close to hate. Furthermore, in real life and in the emotion classification literature [2], love and hate are often considered contradictory. Therefore, we include Love vs. Hate as the fourth problem.

We train five supervised machine learning models using a combination of linguistic features and a subset of discriminatory metadata features identified by the χ2 test. The models were evaluated using sensitivity, specificity, accuracy, and AUC. Finally, we conduct an importance analysis to determine the relative contribution of linguistic and metadata features.
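As an illustration of the selection step, χ2-based filtering of discriminatory features can be sketched with scikit-learn's SelectKBest. This is a hypothetical sketch on synthetic data, not the paper's actual pipeline; note that chi2 requires non-negative feature values, so signed features such as sentiment scores would need shifting first.

```python
# Sketch: select the 10 most discriminatory of 17 metadata features
# using Pearson's chi-squared scores (synthetic, illustrative data).
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 17)).astype(float)  # 17 non-negative features
y = rng.integers(0, 2, size=200)                       # binary labels, e.g. Love vs. Hate

selector = SelectKBest(chi2, k=10).fit(X, y)
top10 = selector.get_support(indices=True)  # indices of the 10 highest-scoring features
scores = selector.scores_                   # per-feature chi-squared statistics
```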
The rest of the paper is organized as follows: Section II describes the model of emotions. Section III outlines preprocessing and feature extraction. Section IV provides an overview of models and performance metrics. Section V analyzes the results. Section VI compares and contrasts related research. Section VII offers concluding remarks and directions for future research.
“Frown”. The smiley face is the most common emoticon regardless of whether the labeled emotion is positive or negative. Overall, the tendency to use emoticons is higher for positive emotions such as happiness, surprise, love, fun and relief compared to negative emotions such as boredom, empty, hate, and sadness. Worry is the only negative emotion that stands out in its high use of the smiley face; here the smiley may have been used to express sarcasm. Emoticons were replaced with emotion words using the emoticon replacement tool emoji [11]. The final step is data correction, employed after observing that some words (not stop-words) occur with nearly equal frequency in both classes. These words are removed because they may not contribute to the classification.

Emotion       ;)   :3   :'(  (Y)  (y)  =[
Worry         28   26   10   1    0    0
Boredom       2    4    0    1    0    0
Anger         19   15   15   1    0    0
Sadness       0    0    0    0    0    1
Hate          2    4    3    0    0    1
Love          25   7    0    4    1    0
Surprise      12   4    2    0    0    0
Happiness     53   8    0    1    1    0
Relief        11   5    0    1    1    0
Fun           19   5    0    1    0    0
Enthusiasm    3    3    1    0    0    1
Neutral       21   17   1    3    1    0
Empty         3    3    0    0    0    0

Table III: Frequency of Emoticons Per Emotion Label

A. Linguistic Features

We considered two popular linguistic features: Term Frequency-Inverse Document Frequency (TF-IDF) vectorization and n-grams using the bag-of-words approach. These features were extracted using the TF-IDF vectorizer class of the Scikit-Learn library. TF-IDF is a scoring measure used in information retrieval and summarization. It measures how relevant a word is in a document by assigning an additional weight to words that occur more frequently throughout the data. TF-IDF was computed for 6148 unigrams. In general, unigrams that denote optimism such as good, love, like, and fun occur frequently in the classes Joy, Love, Trust and Anticipation. The opposing emotion in each problem shows a larger frequency of unigrams with negative connotations, including sad, oh, feel, don't, etc.

B. Metadata Features

Metadata features include sentiment scores of each original tweet (before preprocessing) computed using the TextBlob and Vader libraries. TextBlob calculates the sentiment polarity of each tweet, which ranges from −1 to +1, where −1, 0 and +1 indicate negative, neutral and positive sentiment respectively. Vader computes a normalized and weighted composite score obtained by analyzing each word in a tweet for its direction of sentiment, as a negative (positive) valency for negative (positive) sentiment. It therefore also ranges from −1 to +1 depending on the net sentiment of the tweet. We use both TextBlob and Vader because Vader may be more sensitive to sentiments than TextBlob, even though TextBlob may be better correlated with reviewer scores [1]. Additional metadata features include the numbers of words and unique words; the numbers of YouTube, image and other links; and the numbers of all-capital words, punctuation marks, and emoticons, because these may be used to emphasize or accentuate emotions, and often substitute for the visual clues gleaned from facial expressions and body language.

We used Pearson's χ2 test scores to select the top 10 out of the 17 features. The χ2 test evaluates the null hypothesis that the output label is independent of a given feature; a high score indicates a strong relationship between the two. Feature-wise χ2 scores for each problem are summarized in Table IV. The scores marked in red indicate that the particular feature is not discriminatory in that problem. The table indicates that four features, namely the TextBlob and Vader sentiment scores and the numbers of stop words and characters, are discriminatory for all problems, while the numbers of image and YouTube links, emoticons, and ellipsis endings are not discriminatory for any problem. The rest produce mixed results; for example, the numbers of words and unique words can discriminate between pairs of emotions in all problems except Joy vs. Sadness.

IV. MODELS & PERFORMANCE

We considered several popular supervised machine learning models including Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), Gradient Boosting (GB), and Multi-Layer Perceptron (MLP). To select model parameters, we plotted the error in the F-measure as a function of different parameter settings (such as the number of decision trees in RF, different kernel functions in SVM, and the number of layers or neurons per layer in MLP) [3].

In the RF model, the number of decision trees is set to 30, the number of features used by each tree is equal to the square root of the total number of features, and each tree was allowed to grow fully up to its leaves. We used SVMs with a linear kernel and otherwise default parameters. The MLP model was built with 3 hidden layers containing 10, 5, and 2 neurons respectively, and the rectifier linear
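The parameter settings stated above might be configured in scikit-learn roughly as follows. This is a sketch, not the authors' code: the Naive Bayes variant and any defaults the paper leaves unstated are assumptions.

```python
# Sketch of the five model configurations described in the text.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

models = {
    # 30 trees, sqrt(#features) per split, trees grown fully to their leaves
    "RF": RandomForestClassifier(n_estimators=30, max_features="sqrt",
                                 max_depth=None),
    # linear kernel, other parameters left at their defaults;
    # probability=True is an assumption, added so AUC can be computed
    "SVM": SVC(kernel="linear", probability=True),
    "NB": GaussianNB(),  # NB variant is an assumption; the paper does not say
    "GB": GradientBoostingClassifier(),
    # three hidden layers of 10, 5, and 2 neurons with ReLU activation
    "MLP": MLPClassifier(hidden_layer_sizes=(10, 5, 2), activation="relu"),
}
```

Each model would then be fit on the combined linguistic and metadata feature matrix with `models[name].fit(X_train, y_train)`.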
Feature             L vs H   J vs S   A vs S   T vs D
Vader score          406.7    739.3     41.3    468.3
Textblob score       262.5    303.9     17.9    246.3
# characters         213.4    161.9     38.2    115.4
# stopwords           99.3    129.5     24.8    274.0
# words               94.2     11.9      8.6    124.6
# unique words        81       4.2      8.7    105.3
# mentions            38.1     63.2      1.5     25.1
# ‘@’                 37.3     60.8      1.3     25.8
# all capitals        21.1     28.7      5.3      4.9
# ‘!’ sign            20.3    558.9     37.6    411.2
# punctuation          0.7     89.9     27.5     24.5
ellipsis ending?       0.8     52.2      2.9     23.4
# hashtags             0.1      0.2      3.3      1.6
# image link           9.7     16.0      2.6     23.4
# links                6.5     27.9      1.0     22.7
# YouTube link         1.0      6.1      1.0      2.3
# emoticons            0.1      0.5      0.1      0

Table IV: χ2 Test Scores for Metadata Features

Metric        ML Model   L vs H   J vs S   A vs S   T vs D
Accuracy      RF           0.87     0.78     0.73     0.76
              SVM          0.86     0.78     0.73     0.77
              MLP          0.83     0.75     0.65     0.71
              NB           0.86     0.76     0.71     0.72
              GB           0.85     0.74     0.70     0.76
Sensitivity   RF           0.94     0.86     0.88     0.62
              SVM          0.94     0.86     0.88     0.66
              MLP          0.90     0.82     0.71     0.54
              NB           0.96     0.83     0.91     0.56
              GB           0.91     0.87     0.90     0.63
Specificity   RF           0.70     0.64     0.41     0.85
              SVM          0.66     0.64     0.41     0.85
              MLP          0.66     0.63     0.50     0.81
              NB           0.60     0.58     0.26     0.81
              GB           0.69     0.63     0.26     0.84
AUC           RF           0.92     0.84     0.75     0.81
              SVM          0.91     0.84     0.74     0.82
              MLP          0.86     0.80     0.65     0.79
              NB           0.90     0.80     0.70     0.76
              GB           0.90     0.83     0.73     0.81

Table V: Performance of ML Models
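The four metrics reported in Table V can be computed from a confusion matrix and predicted class probabilities. The labels, predictions, and probabilities below are illustrative, not the paper's data.

```python
# Sketch: accuracy, sensitivity, specificity, and AUC for a binary problem,
# computed with scikit-learn from a model's predictions on a test set.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0])          # illustrative labels
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])          # illustrative predictions
y_prob = np.array([.9, .8, .4, .2, .7, .6, .1, .3])  # P(class 1), for AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)            # (tp + tn) / total
sensitivity = tp / (tp + fn)                         # true-positive rate
specificity = tn / (tn + fp)                         # true-negative rate
auc = roc_auc_score(y_true, y_prob)                  # area under the ROC curve
```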
sentiment scores plus the total number of words and characters in a tweet.

Feature           L vs H   J vs S   A vs S   T vs D
# total words       0.02      -      0.03     0.02
# unique words      0.02      -      0.03     0.03
TextBlob score      0.09     0.05    0.01     0.03
Vader score         0.16     0.12    0.02     0.06
# characters        0.03     0.05    0.06     0.05
# stop-words        0.02     0.03    0.03     0.03
# all capitals      0.01     0.02    0.02      -
# ‘@’ counts        0.01     0.01     -       0.01
# exclamatory       0.01     0.02    0.02     0.02
# punctuation        -       0.03    0.04     0.03
# mentions          0.01     0.01     -       0.01
Text Features       0.62     0.66    0.74     0.71

Table VI: Feature Importance

VI. RELATED RESEARCH

Most of the research on mining subjective knowledge from tweets is concerned with extracting sentiment or polarity; very few works are concerned with emotion mining. Although emotions surrounding specific events such as the presidential election [23] or the Brazilian soccer league [6], or natural disasters such as the California Camp Fire [10] and the MERS outbreak [5], have been mined, extracting them from a general corpus remains relatively unaddressed.

Many research works formulate multi-label classification problems over a set of emotions; the chosen set may be completely ad hoc, inspired by a psychological framework such as Ekman's emotions or Plutchik's wheel, or a combination of psychology and heuristics. For example, Wang et al. [22] annotated a data set of 2.5 million tweets based on hashtags related to emotion words, and classified them into seven emotions: six basic plus thankfulness. Their classification accuracy is around 60%, and this performance is further improved by about 5% in [9]. Ranganathan et al. [17] labeled tweets by combining the scores from the NRC word-level lexicon tool and emotion-based hashtags. Their problem considered the 8 basic emotions of Plutchik's wheel; however, their multi-label classifier was completely unable to detect surprise, and registered low scores for fear. A smaller set of 4 emotions is also used by some researchers [7, 19]. Although Jabreel and Moreno [8] formulate their problem based on Plutchik's wheel, they ultimately reduce it to binary classification using the one-vs.-other method.

Generally, multi-label emotion classification suffers from either low accuracy for all classes [22] or sacrifices the accuracy of some classes for the others [17, 7, 2]. We alleviate this issue by formulating multiple binary problems, each consisting of a pair of polar opposite emotions on the axes of Plutchik's wheel. Although each binary classification is similar in spirit to detecting positive/negative sentiment, this scheme allows us to learn, with high accuracy, about the specific emotions such as love, joy, anticipation and trust that render a tweet positive, or those such as hate, sadness, disgust and surprise that make a tweet negative. Moreover, many of the above works consider a single machine learning model, whereas we compare the performance of five common models. Finally, our feature importance analysis reveals how each group of features contributes to the prediction performance.

VII. CONCLUSIONS AND FUTURE RESEARCH

This paper proposes a supervised machine learning approach to detect four pairs of polar opposite emotions inspired by Plutchik's wheel. The methodology trains five supervised machine learning models on a combination of linguistic and metadata features, and compares their performance using sensitivity, specificity, accuracy, and AUC. Random Forest and Support Vector Machine classifiers offer the best AUC, with Random Forest providing a more reasonable balance between sensitivity and specificity. Although every pair of emotions is polar opposite on Plutchik's wheel, identifying the opposing emotions of Love and Hate is the most accurate, whereas differentiating between Anticipation and Surprise is the least accurate. Importance analysis indicates that linguistic and metadata features contribute 60% and 40% respectively to the prediction.

Our future research consists of mining emotions in content shared over other social media platforms such as Reddit, Facebook, and Instagram. One of the biggest challenges in emotion mining is the availability of high-quality labeled data sets for model training. Building automated methods for data labeling to alleviate this concern is also a topic for future work.

REFERENCES

[1] K. Arunachalam. “Evaluation of Python Packages for Sentiment Analysis”, 2019 (last accessed on October 19, 2019). https://www.linkedin.com/pulse/evaluation-python-packages-sentiment-analysis-karthikeyan-arunachalam/.
[2] M. Bouazizi and T. Ohtsuki. “Sentiment Analysis: From Binary to Multi-Class Classification: A Pattern-Based Approach for Multi-Class Sentiment
Analysis in Twitter”. In Proc. of IEEE Intl. Conf. on Communications, July 2016.
[3] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux. “API Design for Machine Learning Software: Experiences from the Scikit-learn Project”. In Proc. of ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[4] J. Clement. “Number of Monthly Active Twitter Users Worldwide from 1st Quarter 2010 to 1st Quarter 2019”, 2019 (last accessed on March 5, 2020). https://www.statista.com/statistics/282087/.
[5] H. J. Do, C. Lim, Y. J. Kim, and H. Choi. “Analyzing Emotions in Twitter during a Crisis: A Case Study of the 2015 Middle East Respiratory Syndrome Outbreak in Korea”. In Proc. of Intl. Conf. on Big Data and Smart Computing, pages 415–418, 2016.
[6] A. Esmin, R. De Oliveira Jr., and S. Matwin. “Hierarchical Classification Approach to Emotion Recognition in Twitter”. In Proc. of Intl. Conf. on Machine Learning and Applications, volume 2, pages 381–385, 2012.
[7] S. S. Ibraheim, S. S. Ismail, K. A. Bahansy, and M. M. Aref. “Multi-Emotion Classification Evaluation via Twitter”. In Proc. of Intl. Conf. on Intelligent Computing and Information Systems, pages 60–67, Cairo, Egypt, 2019.
[8] M. Jabreel and A. Moreno. “A Deep Learning-based Approach for Multi-label Emotion Classification in Tweets”. Applied Sciences, 9(6):1123, 2019.
[9] O. Janssens, M. Slembrouck, S. Verstockt, S. V. Hoecke, and R. V. de Walle. “Real-time Emotion Classification of Tweets”. In Proc. of Intl. Conf. on Advances in Social Network Analysis and Mining, pages 1430–1431, August 2013.
[10] N. H. Khun, T. T. Zin, M. Yokota, and H. Y. Thant. “Emotion Analysis of Twitter Users on Natural Disasters”. In Proc. of Global Conf. on Consumer Electronics, Osaka, Japan, October 2019.
[11] T. Kim and K. Wurster. “Emoji 0.5.4”, September 11, 2019 (last accessed on March 5, 2020). https://pypi.org/project/emoji/.
[12] R. Klinger et al. “An Analysis of Annotated Corpora for Emotion Classification in Text”. In Proc. of the Intl. Conf. on Computational Linguistics, pages 2104–2119, 2018.
[13] R. M. Ohashi. “From Sentiment Analysis to Emotion Recognition: A NLP Story”, July 2019 (last accessed on March 3, 2020). https://medium.com/neuronio/bcc9d6ff61ae.
[14] G. W. Parrott. Emotions in Social Psychology: Essential Readings. Psychology Press, 2001.
[15] A. Qadir and E. Riloff. “Bootstrapped Learning of Emotion Hashtags #hashtags4you”. In Proc. of Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 2–11, 2013.
[16] A. Qadir and E. Riloff. “Learning Emotion Indicators from Tweets: Hashtags, Hashtag Patterns, and Phrases”. In Proc. of the Conf. on Empirical Methods in Natural Language Processing, pages 1203–1209, 2014.
[17] J. Ranganathan, N. Hedge, A. S. Irudayaraj, and A. A. Tzacheva. “Automatic Detection of Emotions in Twitter Data: A Scalable Decision Tree Classification Method”. In Proc. of the RevOpID 2018 Workshop on Opinion Mining, Summarization and Diversification, 2018.
[18] R. Plutchik. Emotion: Theory, Research, and Experience. Vol. 1: Theories of Emotion, 1980.
[19] F. M. Shah, A. S. Reyadh, A. I. Shaafi, and F. T. Sithil. “Emotion Detection from Tweets using AIT-2018 Dataset”. In Proc. of Intl. Conf. on Advances in Electrical Engineering, Dhaka, Bangladesh, September 2019.
[20] J. Suttles and N. Ide. “Distant Supervision for Emotion Classification with Discrete Binary Values”. In Proc. of Intl. Conf. on Intelligent Text Processing and Computational Linguistics, pages 121–136. Springer, 2013.
[21] C. Van Pelt and A. Sorokin. “Designing a Scalable Crowdsourcing Platform”. In Proc. of the 2012 ACM SIGMOD Intl. Conf. on Management of Data, pages 765–766, 2012.
[22] W. Wang, L. Chen, K. Thirunarayan, and A. P. Sheth. “Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification”. In Proc. of Intl. Conf. on Privacy, Security, Risk and Trust and Intl. Conf. on Social Computing, pages 587–592, 2012.
[23] U. Yaqub, S. Chun, V. Atluri, and J. Vaidya. “Sentiment-based Analysis of Tweets during the US Presidential Election”, pages 1–10, June 2017.
[24] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin. “A Survey of Sentiment Analysis in Social Media”. Knowledge and Information Systems, (60):617–663, July 2018.