Sentiment Analysis of COVID Data Extracted Via Twitter: (Constituent College of Somaiya Vidyavihar University)
MUMBAI-77
(CONSTITUENT COLLEGE OF SOMAIYA VIDYAVIHAR UNIVERSITY)
Presented by:
Rugved Sunil Mone
16030720007
7/21/2020 3
Scope (Stage 1)
• To understand the volume of information which can be extracted
from textual tweets.
• To study and analyze machine learning algorithms for sentiment
analysis.
• To shortlist and develop one machine learning algorithm to perform
sentiment analysis.
• To understand how to use the Twitter Developer API for
tweet extraction.
Literature survey
Naive Bayes Classification
Cleaning & Preprocessing:
• Letter casing
• Tokenizing
• Noise removal
• Stop-word removal
• Normalization
• Stemming
• Lemmatization
• Vectorization of data
Dataset sizes:
• 15000 tweets
• 20000 tweets
Techniques used for creating the sparse matrix:
• TFIDF
• Count vectorization
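The surveyed Naive Bayes work is summarised here only by its pipeline. For reference, the decision rule of the multinomial Naive Bayes model commonly used for text (an assumption, since the slide does not name the variant), with Laplace (add-one) smoothing, is:

```latex
\hat{c} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right],
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{\sum_{w' \in V} \operatorname{count}(w', c) + |V|}
```

Here $P(c)$ is the fraction of training tweets labelled $c$, $w_1, \dots, w_n$ are the tokens of the tweet being classified, $\operatorname{count}(w, c)$ is how often token $w$ occurs in class-$c$ training tweets, and $V$ is the vocabulary. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long tweets.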
Random Forests (Methodology)
Dataset size: 10000 tweets
• Based on CART
• TFIDF features used for node splitting
Accuracy: 77 %
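CART grows each tree by choosing, at every node, the split that most reduces an impurity measure; for classification that measure is Gini impurity, $G = 1 - \sum_k p_k^2$. A random forest then combines many such trees, each trained on a bootstrap sample with random feature subsets. The sketch below shows only the split criterion; the labels and the idea of thresholding a TF-IDF weight are hypothetical illustrations, not the surveyed paper's setup.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left: Sequence[str], right: Sequence[str]) -> float:
    """Size-weighted Gini impurity of a candidate binary split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical node: tweets split on whether the TF-IDF weight of "vaccine" exceeds a threshold.
labels_left = ["positive", "positive", "positive", "negative"]
labels_right = ["negative", "negative", "neutral"]
print(round(gini(labels_left + labels_right), 4))       # impurity before the split
print(round(split_gini(labels_left, labels_right), 4))  # impurity after the split
```

CART would evaluate many candidate thresholds and features this way and keep the one with the lowest weighted impurity.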
KNN and other algorithms
Datasets size: 8000 tweets
Limitations:
• Cost of distance computation
• Feature selection
Accuracy: 72 %
Other algorithms:
• SVM
• Logistic regression
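To make the distance-computation limitation of KNN concrete, here is a minimal brute-force sketch over sparse term-weight vectors. Cosine similarity is assumed as the measure (a common choice for TF-IDF text; the survey slide does not state which distance was used), and the tiny training set is hypothetical. Every prediction compares the query against every training tweet, which is what makes KNN expensive on thousands of tweets.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query: Vector, train: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Majority label among the k most similar training tweets."""
    # Brute force: one similarity computation per training tweet, for every query.
    scored = sorted(train, key=lambda item: cosine_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made term-weight vectors.
train = [
    ({"great": 1.0, "recovery": 1.0}, "positive"),
    ({"great": 1.0, "news": 1.0}, "positive"),
    ({"lockdown": 1.0, "sad": 1.0}, "negative"),
    ({"deaths": 1.0, "sad": 1.0}, "negative"),
]
print(knn_predict({"great": 1.0, "news": 1.0, "today": 1.0}, train))
print(knn_predict({"sad": 1.0}, train))
```

Indexing structures or approximate nearest-neighbour search can reduce this cost, but they add complexity that simpler classifiers such as Naive Bayes avoid.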
Extracting data using Twitter Developer API
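The slide's screenshots of the extraction step did not survive. As a hedged sketch only: the code below assumes the Twitter (now X) API v2 recent-search endpoint and bearer-token authentication, which the deck does not confirm. It builds the authenticated request without sending it; `build_search_request` and the query string are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: API v2 "recent search" endpoint; the deck does not say which endpoint was used.
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(bearer_token: str, query: str, max_results: int = 100) -> urllib.request.Request:
    """Build (but do not send) an authenticated recent-search request."""
    if not 10 <= max_results <= 100:
        raise ValueError("recent search accepts max_results between 10 and 100")
    params = urllib.parse.urlencode({
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,lang",
    })
    return urllib.request.Request(
        f"{RECENT_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Hypothetical query: English COVID tweets, excluding retweets.
request = build_search_request("YOUR_BEARER_TOKEN", "covid -is:retweet lang:en")
print(request.full_url)
# Sending it (urllib.request.urlopen(request)) requires a valid token and API access tier;
# the JSON response carries the matching tweets in its "data" list.
```

Excluding retweets with `-is:retweet` avoids counting the same text many times, which would otherwise skew the class balance.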
Dataset
Number of tweets: 41158
• Compiled from 3 sources
• Part of the data labelled manually
Dataset breakdown
• Positive : 17500
• Negative : 15000
• Neutral : 7500
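These counts set the class proportions that a Naive Bayes classifier uses as its priors $P(c) = N_c / N$. Taking $N$ as the sum of the three listed counts (40000; the slide does not break down the remaining tweets of the 41158 total), the proportions are:

```latex
P(c) = \frac{N_c}{N}, \qquad N = 17500 + 15000 + 7500 = 40000
\\[4pt]
P(\text{positive}) = \frac{17500}{40000} = 0.4375, \quad
P(\text{negative}) = \frac{15000}{40000} = 0.375, \quad
P(\text{neutral}) = \frac{7500}{40000} = 0.1875
```

A classifier that always predicted "positive" would therefore already be right on 43.75% of these tweets, a useful baseline when reading accuracy figures.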
Dataset cleaning
Removed during dataset cleaning:
• null values
• HTTP links
• punctuation
• digits
• stop words
• user tags
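The cleaning steps listed above can be sketched with regular expressions. This is a minimal illustration, not the project's code: the abbreviated stop-word list and the extra lower-casing step (which simplifies the patterns) are assumptions, and a real pipeline would usually load a full stop-word list such as NLTK's.

```python
import re
from typing import Optional

# Hypothetical, abbreviated stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on", "this"}

URL_RE = re.compile(r"https?://\S+")    # http and https links
USER_TAG_RE = re.compile(r"@\w+")       # @user tags
NON_LETTER_RE = re.compile(r"[^a-z\s]") # punctuation and digits (applied after lower-casing)


def clean_tweet(text: Optional[str]) -> Optional[str]:
    """Apply the cleaning steps; returns None for null tweets so they can be dropped."""
    if text is None:
        return None
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = USER_TAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)


raw = [
    "Stay home!! 2 new cases in @CityCouncil area https://t.co/abc123",
    None,
]
cleaned = [c for c in (clean_tweet(t) for t in raw) if c is not None]
print(cleaned)
```

Links are removed before punctuation so that a URL is dropped as a whole instead of leaving fragments such as "https" and "co" behind.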
Tokenization
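The tokenization output on these slides was shown as images that did not survive extraction. A minimal sketch, assuming a simple regular-expression word tokenizer applied to already-cleaned tweets (the deck does not say which tokenizer was used; NLTK's `word_tokenize` is a common alternative):

```python
import re
from typing import List

TOKEN_RE = re.compile(r"[a-z]+")  # runs of lower-case letters count as one token


def tokenize(text: str) -> List[str]:
    """Split a cleaned tweet into lower-case word tokens."""
    return TOKEN_RE.findall(text.lower())


corpus = ["stay home new cases area", "vaccine rollout great news"]
tokenized = [tokenize(t) for t in corpus]
print(tokenized)
```

Each tweet becomes a list of tokens, which is the input the vectorizers and the classifier work on.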
Visualization
[Figure panels: Positive, Negative]
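The Positive and Negative visualization panels did not survive extraction. Per-class frequency plots such as word clouds are typically driven by token counts per sentiment label; a minimal sketch of those counts, with hypothetical tokens and labels:

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_words_by_class(
    token_lists: List[List[str]], labels: List[str], n: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent tokens for each sentiment label."""
    counts: Dict[str, Counter] = {}
    for tokens, label in zip(token_lists, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return {label: counter.most_common(n) for label, counter in counts.items()}


tokens = [["vaccine", "hope", "great"], ["hope", "recovery"], ["lockdown", "deaths"], ["deaths", "fear"]]
labels = ["positive", "positive", "negative", "negative"]
print(top_words_by_class(tokens, labels, n=2))
```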
Dataset preprocessing (TFIDF & Count Vectorizer)
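Count vectorization turns each tweet into a vector of raw term counts; TF-IDF then down-weights terms that appear in many tweets. The sketch below assumes the smoothed IDF and L2 normalisation that scikit-learn's `TfidfVectorizer` uses by default, $\operatorname{idf}(t) = \ln\frac{1+N}{1+\operatorname{df}(t)} + 1$; the deck does not state its exact settings, and the tiny corpus is hypothetical. In practice `CountVectorizer` and `TfidfVectorizer` do this directly.

```python
import math
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to a column index (sorted for reproducibility)."""
    return {term: i for i, term in enumerate(sorted({t for doc in docs for t in doc}))}


def count_vectors(docs: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Bag-of-words counts: one row per tweet, one column per vocabulary term."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, c in Counter(doc).items():
            if term in vocab:
                row[vocab[term]] = c
        rows.append(row)
    return rows


def tfidf_vectors(counts: List[List[int]]) -> List[List[float]]:
    """Weight counts by smoothed IDF, ln((1 + N) / (1 + df)) + 1, then L2-normalise each row."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    result = []
    for row in counts:
        weighted = [c * idf[j] for j, c in enumerate(row)]
        norm = math.sqrt(sum(w * w for w in weighted))
        result.append([w / norm for w in weighted] if norm else weighted)
    return result


docs = [["great", "news"], ["bad", "news"], ["great", "great", "recovery"]]
vocab = build_vocabulary(docs)
counts = count_vectors(docs, vocab)
tfidf = tfidf_vectors(counts)
print(vocab)
print(counts)
```

In the second tweet, "bad" ends up with a higher TF-IDF weight than "news" because "news" also occurs in another tweet.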
Sparse Matrix Generation
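Each tweet uses only a handful of vocabulary terms, so the document-term matrix is mostly zeros and is stored sparsely. scikit-learn's vectorizers return SciPy CSR matrices; assuming the project relied on that built-in output, this hypothetical helper builds the same three CSR arrays by hand from token lists to show the layout.

```python
from collections import Counter
from typing import Dict, List, Tuple


def csr_from_tokens(
    docs: List[List[str]], vocab: Dict[str, int]
) -> Tuple[List[int], List[int], List[int]]:
    """Build CSR arrays (data, indices, indptr) of term counts without a dense intermediate."""
    data: List[int] = []
    indices: List[int] = []
    indptr = [0]
    for doc in docs:
        counts = Counter(t for t in doc if t in vocab)
        # Store only non-zero entries, with column indices sorted within each row.
        for col, c in sorted((vocab[t], c) for t, c in counts.items()):
            data.append(c)
            indices.append(col)
        indptr.append(len(data))  # row i occupies data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


vocab = {"bad": 0, "great": 1, "news": 2, "recovery": 3}
docs = [["great", "news"], ["bad", "news"], ["great", "great", "recovery"]]
data, indices, indptr = csr_from_tokens(docs, vocab)
print(data, indices, indptr)
```

With SciPy installed, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(docs), len(vocab)))` wraps these arrays as a matrix.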
Training and Classification
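The deck does not say which Naive Bayes variant or library was used for training. As a hedged sketch, here is a from-scratch multinomial Naive Bayes with Laplace smoothing, computed in log space and trained on raw token counts (scikit-learn's `MultinomialNB` is the library equivalent and also accepts TF-IDF features). The training tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing, computed in log space."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.log_prior: Dict[str, float] = {}
        self.log_likelihood: Dict[str, Dict[str, float]] = {}
        self.vocab: set = set()

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNaiveBayes":
        self.vocab = {t for doc in docs for t in doc}
        class_counts = Counter(labels)
        word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            word_counts[label].update(doc)
        v = len(self.vocab)
        for label, n_c in class_counts.items():
            self.log_prior[label] = math.log(n_c / len(docs))
            total = sum(word_counts[label].values())
            self.log_likelihood[label] = {
                t: math.log((word_counts[label][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict(self, doc: List[str]) -> str:
        def score(label: str) -> float:
            # Tokens never seen in training are ignored.
            return self.log_prior[label] + sum(
                self.log_likelihood[label][t] for t in doc if t in self.vocab
            )
        return max(self.log_prior, key=score)


train_docs = [["great", "news", "vaccine"], ["hope", "recovery"], ["deaths", "lockdown"], ["fear", "deaths"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["vaccine", "hope"]), model.predict(["lockdown", "fear", "news"]))
```

Smoothing keeps a single unseen class-word pair from driving a class score to minus infinity, which matters for short, noisy tweets.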
Results
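The results figure did not survive extraction; the conclusion reports 84% for the Naive Bayes classifier. If that figure is an accuracy, it is the fraction of test tweets whose predicted label matches the true label, and a confusion matrix shows which classes the errors fall into. A sketch with made-up labels:

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_matrix(y_true: List[str], y_pred: List[str]) -> Dict[Tuple[str, str], int]:
    """Count (actual, predicted) pairs; pairs with actual != predicted are errors."""
    return dict(Counter(zip(y_true, y_pred)))


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "positive"]
print(accuracy(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

With an imbalanced dataset, the confusion matrix also reveals whether the smaller neutral class is being absorbed into the majority class.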
Conclusion
• The performance of the Naïve Bayes classifier (84%) is comparable with the other machine learning algorithms in the literature survey.
• Classifier performance also varied with the location tagged on the tweet: in the test split, 90% of the tweets tagged with the United Kingdom region were correctly classified.
• Sarcasm in some tweets also degraded performance.
• The choice of preprocessing techniques matters in combination with the machine learning algorithm used.
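The location finding is a per-group accuracy: the test split is partitioned by the location tagged on each tweet, and accuracy is computed within each partition. A sketch with made-up labels and locations; the deck does not say how locations were read from the tweets, so untagged tweets are simply skipped here as an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def accuracy_by_location(
    y_true: List[str], y_pred: List[str], locations: List[Optional[str]]
) -> Dict[str, float]:
    """Accuracy computed separately for each tagged location; untagged tweets are skipped."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for t, p, loc in zip(y_true, y_pred, locations):
        if loc is None:
            continue
        total[loc] += 1
        correct[loc] += int(t == p)
    return {loc: correct[loc] / total[loc] for loc in total}


y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "positive"]
locations = ["United Kingdom", "United Kingdom", "India", None, "India"]
print(accuracy_by_location(y_true, y_pred, locations))
```

Groups with few tagged tweets give noisy per-location figures, so the group size is worth reporting alongside each accuracy.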
Future Plan (Stage 2)
References
[1] A. Radaideh, F. Dweiri and M. Obaidat, "A Novel Approach to Predict the Real Time Sentimental Analysis by Naive Bayes & RNN Algorithm during the COVID Pandemic in UAE," 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics, 2020.
[2] H. Adamu, M. J. Bin Mat Jiran, K. H. Gan and N.-H. Samsudin, "Text Analytics on Twitter Text-based Public Sentiment for Covid-19 Vaccine: A Machine Learning Approach," 2021 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2021.
[3] Imamah and F. H. Rachman, "Twitter Sentiment Analysis of Covid-19 Using Term Weighting TF-IDF And Logistic Regresion," 2020 6th Information Technology International Seminar, 2020.
[4] A. J. Nair, V. G and A. Vinayak, "Comparative study of Twitter Sentiment On COVID - 19 Tweets," 2021 5th International Conference on Computing Methodologies and Communication, 2021.
[5] M. Wongkar and A. Angdresey, "Sentiment Analysis Using Naive Bayes Algorithm Of The Data Crawler: Twitter," 2019 Fourth International Conference on Informatics and Computing (ICIC), 2019.
[6] N. Muhammad, S. Bukhori and P. Pandunata, "Sentiment Analysis of Positive and Negative of YouTube Comments Using Naïve Bayes – Support Vector Machine (NBSVM) Classifier," 2019 International Conference on Computer Science, Information Technology, and Electrical Engineering, 2019.
THANK YOU