Emotion Sentiment Analysis of Indian Twitter-Data of COVID-19 After Lockdown
Abstract— The COVID-19 pandemic has severely affected countries around the world, and its intensity in India has also increased: the number of new cases rises day by day, and in the past six months the total number of cases crossed 50 lakh while the total number of deaths is almost 1 lakh. It has been observed that the sudden outbreaks of such pandemics affect public mental states and emotions, and this pandemic also results in either constructive or destructive behavioural changes among people. Anger, sadness, and fear are the most common emotions witnessed among people during such pandemics. Social media platforms like Twitter are rich sources of information about how people feel. We therefore analyse people's emotions after the six-month phase of COVID-19. Such analysis can help detect how a person is feeling, help that person avoid negative feelings, and potentially stop them from taking a wrong decision.

Keywords— COVID-19, sentiment, emotion, classification, LSTM.

I. INTRODUCTION

Sentiment analysis is an important research topic for understanding people's reviews and opinions: whether a review is in favour of a product or against it, or whether tweets on Twitter are positive or negative, all of this matters when analysing a large amount of information from reviews or Twitter data [1]. A large number of people express their opinions and thoughts about current hot topics in the news, and we can detect how people react to a current event, topic, or news item on social platforms. Text sentiment analysis lets us find people's opinions and thoughts: whether they are thinking positively or negatively about that news, topic, or thing [10].

Active feedback from people is an essential part of any business, and public opinion matters whether it is a public voice for some cause or during election time; it also indicates many things about the situation [7]. So, in this paper we focus on people's emotions after six months of COVID-19: how people are thinking and what their mental condition is now, six months into the pandemic. We collected Twitter data worldwide for a one-week duration using a scraping method, and we use emotion sentiment analysis to find the emotions of the people in it.

Emotion sentiment from COVID-19 related tweets on Twitter: we have created a machine-learning model based on LSTM for emotion sentiment detection on the text of tweets with COVID-19 hashtags, and it gives good outcomes in comparison with various well-known machine learning methods. We discuss and discover meaningful insights about COVID-19 Twitter data. We find the emotion semantics of the text of COVID-19 tweets in terms of emotion sentiment and opinion analysis, with five emotion categories: Happy, Sad, Angry, Fear, and Shame [5][12]. We use deep learning methods to detect the emotion sentiment in the text.

II. EMOTION SENTIMENT ANALYSIS

Emotion sentiment analysis is like ordinary sentiment analysis except that it is a multiclass classification rather than a binary one: the classes are different human emotions in place of positive or negative polarity [5][11]. It focuses on classifying textual information into different emotions such as happy, sad, angry, shame, or fear: the whole sentence or paragraph is analysed for the emotion contained in the given text. Here we use a machine learning approach to detect the emotion in unlabelled data, training our model on standard labelled emotion-sentiment data such as the well-known SMILE dataset.

Emotion sentiment analysis results can be used to assess the mental health of a person, or to detect unusual activities a person is about to carry out: by knowing the emotion of a person from their last posts on a social network, we may be able to stop that person from taking a wrong step in life or from doing something shameful. We can also use it to predict how happy a customer is with a product: whether the customer is happy or sad about using that product, or feeling angry because of it [10]. So it can play a very significant role in business and in daily life. Applied to social media, it can benefit a person who is about to take a wrong step, since helping alerts can be generated to their friends based on the emotion in that person's text. Emotion sentiment analysis thus gives a very deep insight into a product, and a strong reading of a person's emotional opinion of it: how a person feels while using a product. It can therefore be used to improve productivity as well as the product that is in the market.

III. LITERATURE REVIEW

Emotion sentiment analysis is a computational way to predict the opinions or emotions expressed by people through text. Nowadays it is a common research area for predicting the sentiments of large crowds of people in different contexts, such as customer review comments on eCommerce or offline products. It can be used for prediction on the stock exchange, for news sentiment, and for political
analysis through political text articles, news, natural speeches, and many more [3], [8], [10].

Many machine learning and deep learning algorithms are used to analyse large amounts of data and to predict sentiment on many topics; using machine learning methods, the opinions of people can be predicted from social platforms. LSTM has shown very good accuracy in analysing people's opinions on Reddit, and Naive Bayes and KNN algorithms also achieve good accuracy in binary sentiment classification, for example for detecting influenza in the Arab region through social media [1], [2], [3], [4]. Twitter is a very common source for collecting data for research analysis purposes.

There are many works of literature concerning people's reactions to posted text on social platforms, which can be differentiated by the type of event being reacted to and by the target of the study [4][7]. These event types include criminal and terrorist activities, protests, health-associated events, and natural disasters. Studies have been conducted for many purposes, including analysis of how information spreads on the various social media platforms [14][15][16][17].

Sentiment analysis of Twitter data has been a hot research topic worldwide for more than 10 years, and researchers have carried out many analyses of Twitter data.

IV. METHODOLOGY

A. Data Gathering
We collected tweets with hashtags related to COVID-19 and locations in India using a scraping process. We also use the standard SMILE dataset as labelled training data, so we first used that data to train our model on the five categories Happy, Sad, Angry, Fear, and Shame [5], [11].

Fig. 1. Data collection and pre-processing pipeline: social media → web scraping → data gathering → pre-processing (drop NA records, drop duplicates, remove hyperlinks, remove punctuation, remove stop words, remove non-English words, remove single-character text, convert emoji to text, convert to lower case, tokenize, extract hashtags, stemming) → labelled dataset.
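Mapping the labelled data onto the paper's five categories can be sketched with pandas. This is a minimal illustration on an inline toy table; the real SMILE dataset has different columns and more fine-grained label names, so the column and label names below are assumptions:

```python
import pandas as pd

# Toy stand-in for a labelled tweet dataset; the real SMILE CSV has
# different columns and label names (assumed here for illustration).
raw = pd.DataFrame({
    "text": [
        "What a lovely day at the museum!",
        "This lockdown makes me so sad",
        "I cannot believe they did this, furious",
        "Scared about the rising case counts",
        "I feel terrible about what I said",
    ],
    "raw_label": ["happy", "sadness", "anger", "fear", "shame"],
})

# Map the raw labels onto the five categories used in the paper.
LABELS = {"happy": "Happy", "sadness": "Sad", "anger": "Angry",
          "fear": "Fear", "shame": "Shame"}
raw["emotion"] = raw["raw_label"].map(LABELS)

# Drop any rows whose labels fall outside the five categories
# (map() returns NaN for unmapped labels).
data = raw.dropna(subset=["emotion"])
print(data["emotion"].tolist())
```

Rows with labels outside the five categories are dropped rather than guessed, which keeps the training classes clean.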
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on September 04,2023 at 18:08:51 UTC from IEEE Xplore. Restrictions apply.
2021 Second International Conference on Secure Cyber Computing and Communication (ICSCCC)
B. Pre-processing
We apply the following cleaning steps to the scraped tweets:

• Drop Duplicate Records: we dropped duplicate records so that they neither waste processing power nor bias our analysis.

• Convert all text data to lower case: we converted all the data into the same case so that case differences do not introduce inconsistency into our analysis.

• Extract Hashtags: extracting hashtags is essential for analysing the tweets.

• Remove Single Characters: a single character carries no sentiment meaning and adds nothing to our sentiment analysis, so we simply removed single characters from the tweets.

• Remove Hyperlinks: hyperlinks are just URLs, strings in a specified format with no sentiment meaning, so we removed them using a regular expression.

• Remove Punctuation & Special Characters: punctuation and special characters carry no sentiment meaning in the text, so we removed them with a regular expression as well.

• Remove Stop Words and extra spaces: in natural language processing, stop words are words that do not affect the meaning of a sentence, so we filter them all out. Stop words are the most common words in a language; there is no single stop-word list used by all NLP tools, so we removed the stop words from all tweets using a list that we created ourselves.

• Remove Non-English words: we perform sentiment analysis on English, as English is the most preferred language on social media; words in other languages add no specific sentiment to our analysis, so we removed all non-English words.

• Tokenize Text: we tokenize every tweet in our dataset. Tokens are individual terms or words, and tokenization means splitting a text or string into multiple tokens.

• Stemming: rule-based splitting of suffixes ("ing", "ly", "es", "s", etc.) from a word is known as stemming. For example, "play", "player", "played", "plays", and "playing" are different variations of the same word, "play".

• Convert to Vectors: we convert each tweet into an equivalent vector, a numeric representation, so that it makes sense to our machine learning model.

C. Word Embedding
Word embedding provides us with a feasible and strong method in which matching words are encoded similarly; moreover, we do not need to apply it manually. The embedding is a dense vector of floating-point values. A basic embedding has 8 dimensions, and for a large dataset this number can grow to 1024. Word embeddings of high dimension give better results, but they need more data for learning. In embedding, words are converted into dense vectors within a continuous vector space, so the projection of each word is described by a vector; the model learns from the placement of words in the vector space, conditioned on the given words and the contexts in which they are used. In our process, we use a predefined function to convert the words into vector form using the sklearn library.

D. Proposed Model
We have used the LSTM model for our task. LSTM (Long Short-Term Memory) is a kind of artificial neural network that works well for text-type data [1] and works very efficiently on long-range time dependencies: the LSTM cell memory keeps track of information over long spans of time.

Here we created the LSTM model; a few CNN [12] tasks remain the same, with no change. We used 256 dimensions for the embedding layer with a vocabulary size of 2000. For non-linearity, we use a ReLU activation function [11]. We then introduced a hidden LSTM layer of size 128. To reduce overfitting, we also added dropout of 0.5. A fully connected layer connects the output of the previously made layers, and the final layer distinguishes the input sentiment into five classes: Happy, Sad, Angry, Fear, and Shame. Fig. 2 describes our proposed LSTM model.

Finally, we compile the created model using the Adam optimization algorithm as the optimizer and categorical cross-entropy as the loss function. After this, we start the training process. We achieved 94% accuracy after training the model.

    Embed input (Input Layer):  input (None, 32, 256), output (None, 32, 256)
    Lstm (LSTM):                input (None, 257),     output (None, 257)
    SoftMax (Dense):            input (None, 257),     output (None, 5)

Fig. 2. Applied LSTM model.

V. EXPERIMENT RESULTS

We need to set some hyperparameters to measure the performance of the LSTM model [1], [2]. We sometimes change these parameters to detect which values will yield better results in terms of accuracy on the given data.

TABLE I. LSTM MODEL HYPER-PARAMETERS

    Hyperparameter                Used Values
    LSTM model hidden layer size  256
    Size of Dropout               0.5
    Size of Batch                 512
    No. of Epochs                 50
    Rate of Learning              0.005

We compile this model after setting these hyperparameters, using the Adam optimization algorithm as our optimizer and categorical cross-entropy as the loss function. After this, we started the training process for our model. Our model achieves an accuracy of 94% after the
training process. The comparison of training vs validation accuracy and training vs validation loss is plotted for each epoch in the next part.

A. Model Performance Plot

In this section, the performance of our model on the training data and the validation data is plotted in Fig. 3 and Fig. 4. The graphs plot training loss against validation loss and training accuracy against validation accuracy for the LSTM model.

In Fig. 3, the graph plots the loss at training time against the loss at validation time. Three possible conditions tell us whether the model is overfitting, underfitting, or a perfect fit:

1. Overfitting: this condition arises when the training loss is much smaller than the validation loss (Training loss << Validation loss).

2. Underfitting: this condition arises when the training loss is much greater than the validation loss (Training loss >> Validation loss).

3. Perfect fit: this condition arises when the training loss and validation loss are nearly equal to each other, or they converge over time (Training loss == Validation loss); it means we are doing things right.

As our graph is nearly converging, there is no overfitting or underfitting in our model.

Fig. 4 compares the accuracy on the training data against the accuracy on the validation data, which is data not used for the training: it shows how the model performs when predicting data it was not fitted on. If the validation accuracy is about the same as the training accuracy, the model is predicting correctly; but if the validation accuracy is much lower, there is a problem in the model. As in our model the validation accuracy is near the training accuracy, our model is predicting correctly.

VI. EXPERIMENT RESULTS

We get a very interesting result after applying the model to our data: our model achieved an accuracy of 94%, which is very good. We obtained different tweet counts for the different emotions over the Indian tweets for the duration after the lockdown in India.
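The architecture described in Section IV-D can be assembled roughly as follows, a sketch using the Keras API. The layer ordering, the sequence length, and the size of the ReLU fully connected layer are assumptions (the paper names the pieces but not their exact wiring), and the LSTM size follows the 128 given in the text rather than the 256 in Table I:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

VOCAB_SIZE = 2000   # vocabulary size from the paper
EMBED_DIM = 256     # embedding dimensions from the paper
SEQ_LEN = 32        # padded tweet length (assumption)

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),   # word-embedding layer
    LSTM(128),                          # hidden LSTM layer of size 128
    Dropout(0.5),                       # dropout to reduce overfitting
    Dense(128, activation="relu"),      # fully connected layer, ReLU (size assumed)
    Dense(5, activation="softmax"),     # Happy / Sad / Angry / Fear / Shame
])

# Adam optimizer with categorical cross-entropy, as in the paper.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# A forward pass on dummy token ids shows the output shape:
# one probability per emotion class.
probs = model(np.zeros((2, SEQ_LEN), dtype="int32"))
print(probs.shape)  # (2, 5)
```

Training would then call `model.fit` on the vectorized tweets with one-hot emotion labels, using the batch size, epoch count, and learning rate listed in Table I.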
Fig. 10 shows the word cloud for the tweets expressing anger: words such as "blame", "cross", etc. represent the angry emotion in these tweets.