
2020 International Conference for Emerging Technology (INCET)

Belgaum, India. Jun 5-7, 2020

Sarcasm Detection using Genetic Optimization on LSTM with CNN
Darkunde Mayur Ashok1, Agrawal Nidhi Ghanshyam2, Sayed Saniya Salim3, Dungarpur Burhanuddin Mazahir4, Bhushan S. Thakare5
1,2,3,4,5 Computer Engineering, Sinhgad Academy of Engineering, Pune, India
1mayur08darkunde@gmail.com, 2agrawal.nidhi6@gmail.com, 3saniyasayed514@gmail.com, 4burhandungarpur10@gmail.com, 5bhushan.thakare86@gmail.com

Abstract—Detecting sarcasm in the diverse data available on a large scale is a challenging problem of the 21st century. Over 20 years of study in this field, the past 10 years have shown significant progress not only in semantic features but also in the various machine-learning approaches used to analyze and process the data. Theories of sarcasm, its syntactic and semantic properties, and lexical features, to list a few, have been an area of interest for almost all of these studies. In this paper, we propose a deep neural network model in which a bidirectional LSTM undergoes hyperparameter optimization using a genetic algorithm and is followed by a Convolutional Neural Network for sarcasm detection. We put forward the results in a robust way, which may lead to better future work in this field.

Keywords—Genetic; Optimization; Deep learning; Natural language processing; LSTM; CNN

I. INTRODUCTION

The 21st century, the Information Age, has seen a surge in the generation of enormous amounts of data, which directly affected the opinionated textual data shared over the Internet through various social media platforms. Sentiment analysis is used to analyze opinionated text for a better understanding of the actual motive of the person concerned. Sarcasm is identified as a sentiment in which the intended meaning differs from what is literally said. This challenging problem has therefore led to a rise in the study of understanding sarcasm and detecting it in text within the area of sentiment analysis.

The research is considered one of the newest in the area of natural language processing, spanning not more than 20 years, but the need for dynamic sarcasm detection and its application for the betterment of the digital society of the 21st century is already recognized. To begin with, there exist various viewpoints on the problem. The first is psycholinguistic: the study of the relationship between natural language and human perspective, e.g. the existence of context-based information in sarcastic utterances [1]. From the point of view of humor, sarcasm is categorized as a sub-form of humor [2]. The medical aspect examines the understanding of sarcasm by a person with physical brain damage that affects the thinking and uttering process [3]. Convolutional Neural Networks (CNNs) have shown good accuracy compared with state-of-the-art traditional models. Genetic algorithms have been used to construct a CNN architecture automatically [4], and [5] describes a real-time application for image classification with a CNN architecture generated automatically with their help. Embedding layers serve as a launch pad: an embedding layer converts words to vectors, and numbers are always easier to process from a machine’s perspective. This paper presents a unique approach of parameter optimization of an LSTM by a genetic algorithm, followed by a CNN model.

The dynamic detection of sarcasm is needed at present, as sarcasm plays an important part in everyday life. Several studies concentrate on detecting sarcasm using various data sets, features and algorithms. Much of the work done in this field has used Twitter as the dataset.

II. SARCASM: KEY ASPECTS

Sarcasm has been at the core of human emotions right from the beginning of mankind; humans have used it as a sly weapon to intimidate a person in a jolly way. What is said and what is really implied are not always the same, which changes the way an utterance is interpreted.

A. Sarcasm in Human Emotion

Sarcasm for humans is a form of humor. It has gradually come to be used by people to express something negative in a positive way about people who are generally of higher standing. In recent years, it became necessary to extract the actual emotion and opinion of people on social media platforms, as social media provides a uniform platform to all kinds of people. Sarcasm plays a crucial role in sentiment analysis, as sarcasm is basically an expression of emotion in a well-framed sentence. For example, “Cannibalism can solve both of the world hunger and overpopulation problem” generally implies that humans should kill each other. Thus, identifying the emotions behind such comments is a major task in sentiment analysis. In this digital world, social media is a kind of human voice on various human issues. Human emotions such as anger, fear, disgust, joy, surprise and sadness have been identified from textual data in research [6]. Researchers have done most of this work using a trained model, which is trained by annotating a large dataset with basic emotions. Approaches such as SenticNet and WordNet are commonly used to find emotion in web-based content. In recent research, millions of emoji data were used to train the model, and the detection of sentiment, emotion and sarcasm was done in a superior way [7].

B. Dataset Analysis

We observe that, across all the data sets taken into account by researchers, the most popular is Twitter. The reason lies in the properties of Twitter, such as the word limit, which is not available on any other social media platform.

One of the many features that draw researchers’ interest to Twitter rather than other social media platforms is the hashtag (#). A hashtag is a unique strategy for categorizing messages on Twitter. The string following the hashtag represents or depicts the central idea of the tweet.

978-1-7281-6221-8/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Middlesex University. Downloaded on August 04,2020 at 05:44:08 UTC from IEEE Xplore. Restrictions apply.
Therefore, users with similar ideas adopt the same hashtag. For instance, consider the recent “Plastic Ban Movement”, where the hashtag #saynotoplastic was adopted by millions across the globe. Amazon Customer Reviews have contributed over a hundred million reviews and experiences, which provide a wide spectrum of data for Natural Language Processing (NLP). Amazon Alexa, a popular home automation IoT device currently booming in the market, presents fascinating data to researchers. For example, a person saying “Alexa you would be great swimmer” generally implies sarcasm. Besides the plain textual data, human tonal clues are present in this kind of dataset, which could lead to better insight in NLP. Emoticons, which were neglected earlier, are now becoming a hot topic for recognizing the different sentiments associated with them.

III. RELATED WORK: SARCASM DETECTION

The techniques used in sarcasm detection can be considered the central part of the processing. Once the data has been established, it is the technique that has to be clever enough to derive a model that fits the data. Based on the type of approach, we can broadly classify techniques into two categories: (1) rule-based and (2) learning-algorithm and feature-based. The rule-based technique relies on evidence of sarcasm, whereas the other technique depends on the parameters used to classify sarcasm and on a learning algorithm that uses those parameters as features.

A. Rule-based approach

These rule-based approaches are commonly known as 9-rule based approaches. Hao [8] used this 9-rule approach for various expressions and emotions in sentences. Bharti et al. [9] applied an important rule of sarcasm: a negative statement spoken in a positive way has a high chance of being sarcastic. For example, in “I like the oil stains on your oversized shirt” the actual meaning of the sentence is negative, but it is spoken in a positive way. Bharti et al. [10] proposed a rule-based algorithm for sarcasm detection, PBLGA (Parsing Based Lexical Generation Algorithm), which takes a sentence as input and splits it into phrases such as noun, adjective, verb and adverb. After the phrases are processed, the output takes the form of lexicons, which are categorized into positive, negative, positive situation and negative situation using different classifiers, namely support vector machine, naïve Bayes, maximum entropy and decision tree. Reyes et al. [11] used parse trees to identify irony in a Twitter dataset, implementing a mainly statistical method.

B. Supervised Learning

Basak et al. [12] suggested the need to classify online shaming tweets not just into positive and negative but into Abusive, Comparison, Passing judgment, Religious, Sarcasm and Whataboutery. Six SVMs were therefore trained for this classification. Rajeswari and ShanthiBala [13] proposed a system that recognizes the sarcastic emotion of an individual using Multinomial Naïve Bayes (MNB) and identifies the type of sarcastic emotion using an SVM model. MNB basically models the word counts and adjusts the underlying calculation to deal with them. The SVM is further used to classify sarcastic tweets into different types such as Depression, Brooding, Polite and Maniac. Amazon Alexa presents a tough challenge for Natural Language Processing, as the data gathered varies in diversity. Pandey et al. [14] worked on an Amazon Alexa dataset, which was processed using SentiWordNet 3.0; further processing was done with the help of TextBlob to remove noise and unwanted data.

C. Semi-supervised

Rajeswari and ShanthiBala [15] came forward with SASI (Semi-supervised Algorithm for Sarcasm Identification), which implements both unsupervised and supervised approaches. The unsupervised approaches are used to find labeled quantities and cluster them into different classes. Supervised machine learning approaches use a large labeled data set to train the classifier. The classification of the various types of sarcastic emotions of individual users is performed with an SVM.

D. Hybrid Approach

A hybrid approach combines more than one machine learning classifier, bound together into a mixed classifier. Amir et al. [16] formed the CUECNN model for sarcasm detection using word and user embedding features. Hazarika et al. [17] came up with the hybrid approach CASCADE (ContextuAl SarCasm DEtector). This model adopts a hybrid approach of content-driven and context-driven sarcasm detection. It performs significantly better than the CUECNN model proposed by Amir et al. [16].

E. Neural Network

Neural networks form the base of deep learning, a subfield of machine learning in which algorithms are inspired by the structure of the human brain. A neural network takes input data to find patterns and then predicts the output for a new set of similar data. Neural units connect to many other units and form a network. Each unit may have a function that combines the values of all its inputs.

1) Artificial Neural Network (ANN):
An ANN is based on the biological neural network and replicates the processing of the human brain to build algorithms and process information. An ANN works by building directed graphs whose nodes are artificial neurons. These neurons are functions that take various weighted inputs, and the sum is passed through a non-linear function known as the activation function. The connections between neurons are represented by weighted edges in the graph, and the various layers of nodes feed information forward for processing. An ANN deals with a large number of associated processing units and produces meaningful results.

2) Convolution and Recurrent Neural Network:
Le Hoang Son et al. [18] employed a convolutional neural network and a soft attention-based bidirectional long short-term memory in their study. They used two Twitter datasets, one balanced and one unbalanced. The balanced dataset has 20,000 annotated tweets that include hashtags such as #sarcasm and #sarcastic, and the unbalanced dataset has 15,000 sarcastic and 25,000 non-sarcastic tweets. 39,000 tweets are used for the training dataset. Furthermore, the authors used eight layers for this hybrid approach. Humor, a form of amusement for human beings, is a complex aspect and

contains many forms. Jaiswal et al. [19] proposed a work for humor detection and classification of its types, such as Sarcasm, Pun, Irony and Exaggeration, using various classification algorithms.

P. K. Mandal and R. Mahto [20] worked on a news headline dataset for sarcasm detection. News headlines offered clear meaning, minimal grammatical errors, unique sentences and clean data. The Natural Language Toolkit (NLTK) was used for data pre-processing and to carry out operations such as tokenization and lemmatization. The dataset consisted of 26,709 news headlines, of which 11,725 were sarcastic records. The first layer of their proposed model is an embedding layer that encodes words into vectors, followed by a CNN-LSTM. Hiai and Shimada [21] proposed a method that classifies text as sarcastic or non-sarcastic using an RNN model. They used two types of features to train their RNN model: (1) the words in sentences and (2) the role pair relation vector. First, the input words of a sentence were fed into the RNN model. Next, role pairs in the sentence were identified using a list of role pairs obtained from a corpus. In the last phase, the role pair relation vector was constructed and fed to the RNN model. If role pairs were not present in the sentence, similar relation vectors were assigned to similar sentences.

IV. PROPOSED APPROACH

The deep learning architecture proposed in this paper consists of BERT as an embedding layer and an LSTM layer whose hyperparameters, window size and number of hidden units, have been optimized using a genetic algorithm, followed by a Convolutional Neural Network.

A. Data pre-processing

The data present on Twitter is as diverse as the human population, so we have limited ourselves to the English language only. The tweets are acquired using hashtags such as #Sarcasm, #sarcasm and #Humor. After acquisition, the data is pre-processed by eliminating URLs, hashtags, tagged persons and emojis and by converting the text to lowercase. NLTK is used to process the raw data and perform tokenization. The words are then reduced to their root words using the Porter stemmer; this operation is called stemming.

B. Input Layer

The input layer acts as the first layer in the architecture, where the processed tweets are fed to the network.

C. Embedding layer

The second layer in the architecture is the embedding layer, which converts the input text into a real-valued vector representation. Converting words into vectors is beneficial, as a computer understands numbers more clearly than a human language. BERT has been used as the embedding layer, as it uses the Transformer to understand the contextual relations between words in the text being processed. It has been trained on a large Wikipedia corpus, which results in better performance.

D. LSTM layer

The third layer consists of the interconnected cells of the LSTM, which act as a base architecture of the Recurrent Neural Network. The main function of the LSTM layer is to decide which data to remember and which data to forget via the forget gate, as the internal architecture of the LSTM is a gated architecture consisting of three gates: the input, output and forget gates.
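
For reference, the three gates named above are commonly written as shown below. This is the standard textbook LSTM formulation, not notation taken from this paper: the symbols x_t (input at step t), h_t (hidden state), c_t (cell state), the weight matrices W and U, and the biases b are all assumed here for illustration, with \sigma the logistic sigmoid and \odot the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

In this form the forget gate f_t scales the previous cell state, which is the "remember or forget" decision described in the text.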

E. Genetic Optimization

The LSTM cell has various parameters that need to be fine-tuned to obtain optimal results. A genetic algorithm is used to find the optimal hyperparameters of the LSTM layer, namely the window size and the number of hidden units. The overall optimization reduces the Root Mean Square Error (RMSE), which indirectly increases the accuracy of the model.
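
The search over window size and hidden units could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the value ranges, population size, mutation rate and selection scheme are all hypothetical, and `surrogate_rmse` is a toy stand-in for the expensive step of training the LSTM and measuring its validation RMSE.

```python
import random

# Hypothetical search space; the paper does not state the ranges it used.
WINDOW_SIZES = range(1, 11)      # candidate window sizes
HIDDEN_UNITS = range(8, 129, 8)  # candidate numbers of hidden units


def surrogate_rmse(window, units):
    """Toy stand-in for 'train the LSTM and return its validation RMSE'.

    A bowl-shaped function with its minimum at window=5, units=64, used
    only so that the sketch runs without a deep learning framework.
    """
    return ((window - 5) ** 2 + ((units - 64) / 8) ** 2) ** 0.5


def random_individual(rng):
    # An individual is a (window size, hidden units) pair.
    return (rng.choice(WINDOW_SIZES), rng.choice(HIDDEN_UNITS))


def crossover(a, b, rng):
    # Each gene is inherited from one of the two parents.
    return (rng.choice((a[0], b[0])), rng.choice((a[1], b[1])))


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, resample a gene from its allowed range.
    window, units = individual
    if rng.random() < rate:
        window = rng.choice(WINDOW_SIZES)
    if rng.random() < rate:
        units = rng.choice(HIDDEN_UNITS)
    return (window, units)


def genetic_search(fitness, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Lower RMSE is better: sort ascending and keep the best half.
        population.sort(key=lambda ind: fitness(*ind))
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda ind: fitness(*ind))
```

Calling `genetic_search(surrogate_rmse)` returns the best `(window, units)` pair found. Because the best half of each generation is carried over unchanged, the best RMSE never gets worse from one generation to the next; in a real setting the fitness call would dominate the running time, so caching the scores of already evaluated pairs would be worthwhile.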

Fig. 1. System architecture of the proposed model

Fig. 2. Genetic Algorithm

F. Convolutional Neural Network

The fourth layer in the architecture is the Convolutional Neural Network. CNNs have been used as a unique approach to solving real-world applications with deep learning. The idea behind combining the CNN with the LSTM is that a CNN has the capability to search over a large space, while an LSTM has the capability to search over time. The optimized LSTM layer provides a good set of inputs to the CNN, increasing the efficiency of the model, and the BERT embedding layer allows sarcasm to be detected in a contextual way.

V. DISCUSSION

The proposed approach is expected to perform better than the state-of-the-art models, as those rely mostly on lexical features. BERT provides a way to interpret sarcasm in a contextual way. Contextualized information provides better insight for an in-depth understanding of text. A CNN network provides an accuracy of 85-88%. An LSTM-CNN model with word embedding provides an accuracy of 90-92%. The proposed model is expected to give an accuracy of around 93-95% due to the parameter optimization, BERT as the embedding layer and the reduced RMSE.

VI. CONCLUSION

The field of sarcasm detection has seen significant growth in the past few years, due to the increase in the usage of social media sites and the big data boom. Sarcasm detection remains a problem where many ways have been explored to reach the goal state, but only partial fulfillment is achievable. Traditional techniques and algorithms perform well with small amounts of data but are not able to crunch large volumes of data. The work done provides a way to understand sarcasm using a deep learning model. The proposed deep learning model for sarcasm detection has been explained. Future work on the model would be to detect sarcasm in real-time data.

REFERENCES

[1] R. G. W., “On the Psycholinguistics of Sarcasm,” Journal of Experimental Psychology, Volume 115, pp. 315, 1986.
[2] R. A. Martin, P. Puhlik-Doris, G. Larsen, J. Gray and K. Weir, “Individual differences in uses of humor and their relation to psychological well-being: Development of the Humor Styles Questionnaire,” Journal of Research in Personality, Volume 37, no. 1, pp. 48-75, 2003.
[3] S. McDonald and S. Pearce, “Clinical insights into pragmatic theory: frontal lobe deficits and sarcasm,” Brain and Language, Volume 53, no. 1, pp. 81-104, 1996.
[4] Lingxi Xie, Alan Yuille, “Genetic CNN,” The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1379-1388.
[5] Yanan Sun, Bing Xue, Mengjie Zhang, Gary G. Yen, “Automatically designing CNN architectures using genetic algorithm for image classification,” in IEEE Transactions on Cybernetics, 2020.
[6] C. Strapparava and R. Mihalcea, “Learning to Identify Emotions in Text,” Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 1556-1560, 2008.
[7] Jayashree Subramanian, Varun Sridharan, Kai Shu, and Huan Liu, “Exploiting Emojis for Sarcasm Detection,” Springer Nature Switzerland AG 2019, pp. 70–80, 2019.
[8] Hao, T. V., “Detecting Ironic Intent in Creative Comparisons,” ECAI, Vol. 215, pp. 765–770, 2010.
[9] S. K. Bharti, K. S. Babu, S. K. Jena, “Parsing-based Sarcasm Sentiment Recognition in Twitter Data,” in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015.
[10] S. K. Bharti, R. Pradhan, K. S. Babu, S. K. Jena, “Trends in Social Network Analysis,” Lecture Notes in Social Networks, 2017, pp. 51-76.
[11] A. Reyes, P. Rosso and T. Veale, “A multidimensional approach for detecting irony in Twitter,” Language Resources and Evaluation, 2012.
[12] Rajesh Basak, Shamik Sural, Niloy Ganguly, and Soumya K. Ghosh, “Online Public Shaming on Twitter: Detection, Analysis, and Mitigation,” 2019.
[13] Rajeswari K and ShanthiBala P, “Sarcasm detection using machine learning techniques,” International Journal of Recent Scientific Research, Vol. 9, Issue 4(L), pp. 26368-26372, April 2018.
[14] Avinash Chandra Pandey, Saksham Raj Seth and Mahima Varshney, “Sarcasm Detection of Amazon Alexa Sample Set,” Advances in Signal Processing and Communication, Lecture Notes in Electrical Engineering 526, 2019.
[15] K. Rajeswari and P. ShanthiBala, “Recognization of Sarcastic Emotions of Individuals on Social Network,” International Journal of Pure and Applied Mathematics, Vol. 118 No. 7, pp. 253-25, 2018.
[16] S. Amir, B. C. Wallace, H. Lyu, P. Carvalho, M. J. Silva, “Modelling Context with User Embeddings for Sarcasm Detection in Social Media,” in Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016.
[17] D. Hazarika, S. Poria, S. Gorantla, E. Cambria, R. Zimmermann, R. Mihalcea, “CASCADE: Contextual Sarcasm Detection in Online Discussion Forum,” in Proceedings of the 27th International Conference on Computational Linguistics, pages 1837–1848, Santa Fe, New Mexico, USA, August 20-26, 2018.
[18] Le Hoang Son, Akshi Kumar, Saurabh Raj Sangwan, Anshika Arora, Anand Nayyar, and Mohamed Abdel-Basset, “Sarcasm Detection Using Soft Attention-Based Bidirectional Long Short-Term Memory Model With Convolution Network,” Special section on emerging trends, issues and challenges in underwater acoustic sensor networks, 2019.
[19] Arunima Jaiswal, Monika, Anshu Mathur, Prachi, Sheena Mattu, “Automatic Humour Detection in Tweets using Soft Computing Paradigms,” International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (Com-IT-Con), 2019.
[20] Paul K. Mandal and Rakeshkumar Mahto, “Deep CNN-LSTM with Word Embeddings for News Headline Sarcasm Detection,” 16th International Conference on Information Technology-New Generations (ITNG), 2019.
[21] S. Hiai and K. Shimada, “Sarcasm Detection Using RNN with Relation Vector,” International Journal of Data Warehousing and Mining, 15(4), pp. 66–78, 2019.
