Sentiment Analysis of COVID Data Extracted Via Twitter: (Constituent College of Somaiya Vidyavihar University)

K J SOMAIYA COLLEGE OF ENGINEERING,
MUMBAI-77
(CONSTITUENT COLLEGE OF SOMAIYA VIDYAVIHAR UNIVERSITY)
Sentiment Analysis of COVID data extracted via Twitter
Presented by:
Rugved Sunil Mone
16030720007
Guide: Dr. Bhakti Palkar

7/21/2020 1
Contents of Presentation
● Introduction
● Scope
● Literature Survey
● Data extraction using Twitter Developer API
● Dataset and its breakdown
● Dataset cleaning
● Visualization
● Dataset preprocessing
● Training and Classification
● Results
● Conclusion and Inferences
● Future Scope (Stage 2)
● References
Introduction
• What is sentiment analysis?
• Why this topic?
• Which techniques have been developed?
Scope (Stage 1)
• To understand the volume of information that can be extracted from textual tweets.
• To study and analyze machine learning algorithms for sentiment analysis.
• To shortlist and develop one machine learning algorithm to perform sentiment analysis.
• To understand how to use the Twitter Developer API for tweet extraction.
Literature survey
Naive Bayes Classification
Cleaning & Preprocessing:
• Letter casing
• Tokenizing
• Noise removal
• Stop words removal
• Normalization
• Stemming
• Lemmatization
• Vectorization of data
Dataset sizes:
• 15000 tweets
• 20000 tweets
Techniques used for creating sparse matrix:
• TFIDF
• Count vectorization
Random Forests (Methodology)
Dataset size: 10000 tweets
• Based on CART
• TFIDF for nodes
Accuracy: 77%
KNN and other algorithms
Dataset size: 8000 tweets
Limitations:
• Distance computing
• Feature selection
Accuracy: 72%
Other algorithms:
• SVM
• Logistic regression
Extracting data using Twitter Developer API
Dataset
Number of tweets: 41158
• Compiled from 3 sources
• Part of the data was labelled manually
Dataset breakdown
• Positive : 17500
• Negative : 15000
• Neutral : 7500
Dataset cleaning
Items removed during dataset cleaning:
• null values
• http links
• punctuation
• digits
• stop words
• user tags
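The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the project: the function names `clean_tweet` and `clean_dataset`, the regular expressions, and the very short stop-word list are all assumptions made for the example.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; the slides do not say which list was used).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "and", "for", "on", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Return a cleaned, lower-cased tweet, or None for a null or blank input."""
    if text is None or not text.strip():
        return None
    text = URL_RE.sub(" ", text)         # http links
    text = MENTION_RE.sub(" ", text)     # user tags such as @name
    text = DIGIT_RE.sub(" ", text)       # digits
    text = text.translate(PUNCT_TABLE)   # ASCII punctuation (this also strips '#')
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)


def clean_dataset(tweets):
    """Clean every tweet and drop those that are null or empty after cleaning."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [c for c in cleaned if c]
```

Links and user tags are removed before punctuation so that the `@` and `://` markers are still present when the patterns are applied.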
Tokenization
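As a rough illustration of this step, a tweet can be split into word tokens with a regular expression. The pattern and the function name `tokenize` below are assumptions for the sketch; the slides do not say which tokenizer was used.

```python
import re

# Lower-case words, optionally with one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Split a tweet into lower-case word tokens."""
    return TOKEN_RE.findall(text.lower())
```

For example, `tokenize("Stay home, don't panic")` yields the four tokens `stay`, `home`, `don't` and `panic`.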
Visualization
[Figure: two panels, labelled Positive and Negative]
Dataset preprocessing (TFIDF & Count Vectorizer)
Sparse Matrix Generation
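To make count vectorization, TFIDF weighting and the resulting sparse matrix concrete, here is a small standard-library sketch. It is not the project's implementation: the helper names are hypothetical, each sparse row is stored as a `{column index: value}` dictionary, and the idf formula `ln(N / df) + 1` is one common variant chosen for the example.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}


def count_vectorize(docs, vocab):
    """Return one sparse row per document: {column index: raw term count}."""
    return [{vocab[t]: c for t, c in Counter(d).items() if t in vocab} for d in docs]


def tfidf_vectorize(docs, vocab):
    """Weight each count by idf = ln(N / df) + 1 (an assumed TF-IDF variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(t for d in docs for t in set(d) if t in vocab)
    idf = {vocab[t]: math.log(n_docs / df[t]) + 1.0 for t in df}
    counts = count_vectorize(docs, vocab)
    return [{j: c * idf[j] for j, c in row.items()} for row in counts]
```

Only non-zero entries are stored, which is what makes the matrix sparse: a tweet touches a handful of columns out of a vocabulary with thousands of tokens.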
Training and Classification
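The classifier used in this work is Naive Bayes. The sketch below shows how such a classifier can be trained and applied to tokenized tweets. It assumes the multinomial variant with add-one (Laplace) smoothing, which the slides do not specify, and the class name `MultinomialNB` and its methods are written for this example only.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        class_counts = Counter(labels)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(class_counts[c] / len(docs)) for c in self.classes}
        # Per-class token counts.
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Add-one smoothing keeps unseen (token, class) pairs from zeroing the product.
        return math.log((self.token_counts[c][token] + 1) / (self.total[c] + len(self.vocab)))

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Tokens outside the training vocabulary are ignored.
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(t, c) for t in doc if t in self.vocab
            )
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.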
Results
• Training split accuracy: 88%
• Testing split accuracy: 84%
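Accuracy here is the fraction of tweets in a split whose predicted label matches the true label. A minimal sketch of splitting a dataset and scoring predictions is shown below. The 80/20 split ratio, the seed, and the function names are assumptions for the example, since the slides do not state how the split was made.

```python
import random


def train_test_split(items, labels, test_fraction=0.2, seed=0):
    """Shuffle the data and return ((train_items, train_labels), (test_items, test_labels))."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(items) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [items[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A gap between training and testing accuracy, such as the 88% and 84% reported here, is measured by calling `accuracy` once on each split.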
Conclusion
• The testing accuracy of the Naïve Bayes classifier (84%) is comparable with that of the other machine learning algorithms covered in the literature survey.
• Classifier performance also varied with the location tagged on the tweet: in the testing split, 90% of the tweets tagged in the United Kingdom region were correctly classified.
• Sarcasm present in some tweets also reduced performance.
• The choice of pre-processing techniques matters in combination with the machine learning algorithm considered.
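A per-region figure such as the one quoted above can be obtained by grouping test predictions by the tweet's location tag. The sketch below shows one way to do this. The function name `accuracy_by_group` and the region labels in the usage example are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: fraction of correct predictions within that group}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}


# Usage with made-up labels, for illustration only.
regions = ["UK", "UK", "IN", "IN"]
truth = ["pos", "neg", "pos", "neg"]
preds = ["pos", "neg", "neg", "neg"]
per_region = accuracy_by_group(regions, truth, preds)
```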
Future Plan (Stage 2)
• As part of Stage 1, the literature survey covered only machine learning algorithms, and one of them was implemented based on that review.
• However, other and possibly more complex techniques are also available, such as Deep Learning, which employs Neural Networks.
• These techniques will be explored as part of Stage 2. After a thorough literature review, one of these Deep Learning techniques will be implemented and tested in the hope of better results.
References
[1] A. Radaideh, F. Dweiri and M. Obaidat, "A Novel Approach to Predict the Real Time Sentimental Analysis by Naive Bayes & RNN Algorithm during the COVID Pandemic in UAE," 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics
[2] H. Adamu, M. J. Bin Mat Jiran, K. H. Gan and N. -H. Samsudin, "Text Analytics on Twitter Text-based Public Sentiment for Covid-19 Vaccine: A Machine Learning Approach," 2021 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)
[3] Imamah and F. H. Rachman, "Twitter Sentiment Analysis of Covid-19 Using Term Weighting TF-IDF And Logistic Regresion," 2020 6th Information Technology International Seminar
[4] A. J. Nair, V. G and A. Vinayak, "Comparative study of Twitter Sentiment On COVID - 19 Tweets," 2021 5th International Conference on Computing Methodologies and Communication
[5] M. Wongkar and A. Angdresey, "Sentiment Analysis Using Naive Bayes Algorithm Of The Data Crawler: Twitter," 2019 Fourth International Conference on Informatics and Computing (ICIC), 2019
[6] N. Muhammad, S. Bukhori and P. Pandunata, "Sentiment Analysis of Positive and Negative of YouTube Comments Using Naïve Bayes – Support Vector Machine (NBSVM) Classifier," 2019 International Conference on Computer Science, Information Technology, and Electrical Engineering
THANK YOU
