
K J SOMAIYA COLLEGE OF ENGINEERING,

MUMBAI-77
(CONSTITUENT COLLEGE OF SOMAIYA VIDYAVIHAR
UNIVERSITY)

Sentiment Analysis of COVID data extracted via Twitter

Presented by:
Rugved Sunil Mone
16030720007

Guide: Dr. Bhakti Palkar


7/21/2020 1
Contents of Presentation
● Introduction
● Scope
● Literature Survey
● Data extraction using Twitter Developer API
● Dataset and its breakdown
● Dataset cleaning
● Visualization
● Dataset preprocessing
● Training and Classification
● Results
● Conclusion and Inferences
● Future Scope (Stage 2)
● References

Introduction

• What is sentiment analysis?
• Why this topic?
• Which techniques have been developed?

Scope (Stage 1)
• To understand the volume of information which can be extracted
from textual tweets.
• To study and analyze machine learning algorithms for sentiment
analysis.
• To shortlist and develop one machine learning algorithm to perform
sentiment analysis.
• To understand the implementation of the Twitter Developer API for
tweet extraction.

Literature survey
Naive Bayes Classification
Cleaning & Preprocessing:
• Letter casing
• Tokenizing
• Noise removal
• Stop words removal
• Normalization
• Stemming
• Lemmatization
• Vectorization of data

Dataset sizes:
• 15000 tweets
• 20000 tweets

Techniques used for creating sparse matrix:
• TFIDF
• Count vectorization

Random Forests (Methodology)
Dataset size: 10000 tweets

• Based on CART
• TFIDF for nodes

Accuracy: 77%

KNN and other algorithms
Dataset size: 8000 tweets

Limitations:
• Distance computation
• Feature selection

Accuracy: 72%

Other algorithms:
• SVM
• Logistic regression

Extracting data using Twitter Developer API
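
The extraction code itself is not preserved on this slide. As a rough illustration only, the sketch below builds (but does not send) a search request, assuming the Twitter API v2 recent search endpoint and bearer-token authentication. The helper name `build_search_request`, the query string, and the placeholder token are hypothetical and are not taken from the presentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Twitter API v2 recent search endpoint with a bearer token.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=100):
    """Build an authenticated GET request for tweets matching `query`."""
    params = urlencode({
        "query": query,
        "max_results": max_results,             # tweets per page
        "tweet.fields": "created_at,lang,geo",  # extra fields to return
    })
    return Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Hypothetical query: English COVID tweets, retweets excluded.
request = build_search_request("covid19 lang:en -is:retweet", "YOUR_BEARER_TOKEN")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; that step needs valid credentials and is left out here.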

Dataset

Number of tweets: 41158
• Compiled from 3 sources
• Part of the data was labelled manually

Dataset breakdown

• Positive : 17500
• Negative : 15000
• Neutral : 7500
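
The class balance can be worked out from the three counts above. Note that they sum to 40000, which is lower than the 41158 tweets stated on the Dataset slide; the slides do not explain the difference, so this sketch uses only the three listed counts.

```python
# Class counts as listed on the slide.
class_counts = {"positive": 17500, "negative": 15000, "neutral": 7500}

labelled_total = sum(class_counts.values())

# Share of each class among the listed tweets.
class_shares = {
    label: count / labelled_total for label, count in class_counts.items()
}

for label, share in class_shares.items():
    print(f"{label}: {share:.2%}")
```

The neutral class is the smallest by a clear margin, which is worth keeping in mind when reading a single overall accuracy figure.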

Dataset cleaning
Steps taken for dataset cleaning:

• null values
• http links
• punctuations
• digits
• stop words
• user tags
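
The cleaning steps listed above could be sketched as follows. This is an illustration using only the Python standard library, not the code shown in the presentation. The function names, the lowercasing step, and the tiny stop-word list are assumptions; a real run would use a fuller stop-word list.

```python
import re
import string

# Assumption: a tiny illustrative stop-word list, not the one used in the project.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and", "at", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_TAG_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Apply the listed cleaning steps to one tweet; None marks a null value."""
    if text is None:
        return None
    text = text.lower()                  # assumption: case folding
    text = URL_RE.sub(" ", text)         # http links (before punctuation is stripped)
    text = USER_TAG_RE.sub(" ", text)    # user tags such as @name
    text = DIGIT_RE.sub(" ", text)       # digits
    text = text.translate(PUNCT_TABLE)   # punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]  # stop words
    return " ".join(words)


def clean_dataset(tweets):
    """Clean every tweet and drop null or empty results."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if t]


print(clean_tweet("@WHO Cases are rising in 2020! https://t.co/abc #COVID19"))
```

Links and user tags are removed before punctuation so that their `:`, `/`, and `@` characters are still present for the patterns to match.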

Tokenization
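
The tokenization code on this slide is not preserved. A minimal sketch of the idea, assuming a simple regular-expression tokenizer that keeps lowercase words and in-word apostrophes (the pattern and function name are illustrative choices, not the presenter's):

```python
import re

# Assumption: a word is a run of letters, optionally followed by an
# apostrophe and more letters (so "don't" stays one token).
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


tokens = tokenize("Stay home, stay safe: don't panic")
print(tokens)
```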

Visualization

[Figure panels: Positive, Negative]

Dataset preprocessing (TFIDF & Count
Vectorizer)

Sparse Matrix Generation
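
To show what count vectorization and TFIDF weighting produce, here is a pure-Python sketch. Each row is stored sparsely as a `{column: value}` dictionary. It assumes the smoothed weighting idf = ln((1 + N) / (1 + df)) + 1 and applies no length normalization; the presentation does not state which variant or library was used, and the helper names and toy documents are illustrative.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct term to a column index (alphabetical order)."""
    terms = sorted({term for doc in docs for term in doc})
    return {term: index for index, term in enumerate(terms)}


def count_vectorize(docs, vocabulary):
    """Return one sparse row per document: {column index: term count}."""
    rows = []
    for doc in docs:
        counts = Counter(term for term in doc if term in vocabulary)
        rows.append({vocabulary[term]: n for term, n in counts.items()})
    return rows


def tfidf_transform(count_rows, n_terms):
    """Reweight counts by smoothed inverse document frequency."""
    n_docs = len(count_rows)
    doc_freq = [0] * n_terms
    for row in count_rows:
        for col in row:
            doc_freq[col] += 1
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return [{col: n * idf[col] for col, n in row.items()} for row in count_rows]


# Toy tokenized tweets (not from the dataset).
docs = [["stay", "home", "stay", "safe"], ["cases", "rising"], ["stay", "positive"]]
vocabulary = build_vocabulary(docs)
counts = count_vectorize(docs, vocabulary)
tfidf = tfidf_transform(counts, len(vocabulary))
print(counts[0])
print(tfidf[0])
```

Only non-zero entries are stored, which is what makes the matrix sparse: most tweets use a small fraction of the vocabulary.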

Training and Classification
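
The training code on this slide is not preserved. Since the conclusion names a Naïve Bayes classifier, the sketch below shows a multinomial Naive Bayes with Laplace smoothing in plain Python. The class name, the smoothing value `alpha=1.0`, and the toy training tweets are assumptions for illustration, not the presenter's implementation or data.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over tokenized documents, Laplace-smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumption: add-one)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocabulary = {term for doc in docs for term in doc}
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        class_doc_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_doc_counts[c] / len(labels)) for c in self.classes
        }
        self.total_terms = {
            c: sum(self.term_counts[c].values()) for c in self.classes
        }
        return self

    def _log_likelihood(self, term, c):
        numerator = self.term_counts[c][term] + self.alpha
        denominator = self.total_terms[c] + self.alpha * len(self.vocabulary)
        return math.log(numerator / denominator)

    def predict(self, doc):
        """Return the class with the highest log posterior; unseen terms are skipped."""
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(t, c) for t in doc if t in self.vocabulary)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy tokenized tweets and labels (hypothetical).
train_docs = [
    ["great", "news", "recovery"], ["vaccine", "hope", "great"],
    ["deaths", "rising", "fear"], ["lockdown", "fear", "panic"],
    ["cases", "reported", "today"], ["briefing", "today"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "hope"]))
print(model.predict(["fear", "rising"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that never appeared in a class from forcing that class's probability to zero.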

Results

Training split accuracy: 88%

Testing split accuracy: 84%
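
Accuracy here is the share of tweets whose predicted label matches the true label, computed separately on each split. A minimal sketch with hypothetical labels (not the project's data):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty split")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for a five-tweet split.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]
test_accuracy = accuracy(y_true, y_pred)
print(f"{test_accuracy:.0%}")
```

A training accuracy somewhat above the testing accuracy, as reported above, is the usual pattern because the model has already seen the training tweets.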

Conclusion
• The testing accuracy of the Naïve Bayes classifier (84%) is comparable with
that of the other machine learning algorithms covered in the literature survey.
• Classifier performance also varies with the location of the tweet: in the
testing split, 90% of the tweets tagged in the United Kingdom region were
correctly classified.
• Sarcasm present in some tweets also impacted the performance.
• The choice of pre-processing techniques matters in combination with the
machine learning algorithm used.

Future Plan (Stage 2)

• As a part of Stage 1, the literature survey covered only machine learning
algorithms, and one of them was implemented based on the review.
• However, other and possibly more complex techniques are also available,
such as Deep Learning, which employs Neural Networks.
• These techniques will be explored as a part of Stage 2. After a thorough
literature review, one of these Deep Learning techniques will be implemented
and tested in the hope of better results.

References
[1] A. Radaideh, F. Dweiri and M. Obaidat, "A Novel Approach to Predict the Real
Time Sentimental Analysis by Naive Bayes & RNN Algorithm during the COVID
Pandemic in UAE," 2020 International Conference on Communications, Computing,
Cybersecurity, and Informatics
[2] H. Adamu, M. J. Bin Mat Jiran, K. H. Gan and N. -H. Samsudin, "Text Analytics
on Twitter Text-based Public Sentiment for Covid-19 Vaccine: A Machine Learning
Approach," 2021 IEEE International Conference on Artificial Intelligence in
Engineering and Technology (IICAIET),
[3] Imamah and F. H. Rachman, "Twitter Sentiment Analysis of Covid-19 Using
Term Weighting TF-IDF And Logistic Regresion," 2020 6th Information Technology
International Seminar
[4] A. J. Nair, V. G and A. Vinayak, "Comparative study of Twitter Sentiment On
COVID - 19 Tweets," 2021 5th International Conference on Computing
Methodologies and Communication
[5] M. Wongkar and A. Angdresey, "Sentiment Analysis Using Naive Bayes
Algorithm Of The Data Crawler: Twitter," 2019 Fourth International Conference on
Informatics and Computing (ICIC), 2019
[6] N. Muhammad, S. Bukhori and P. Pandunata, "Sentiment Analysis of Positive
and Negative of YouTube Comments Using Naïve Bayes – Support Vector Machine
(NBSVM) Classifier," 2019 International Conference on Computer Science,
Information Technology, and Electrical Engineering

THANK YOU
