Analysis of Twitter Data to Evaluate Women’s Safety and Inclusion in Society

Mentor Name: Mr. Mohit Tiwari

Team Members:
Jasmeen Vohra (20311502720)
Sanya Bhanot (35211502720)
Shagun (36011502720)
Harshni Pandita (75111502720)
Department of Computer Science & Engineering, BVCOE, New Delhi
Table of Contents

● Introduction
● Literature Review
● Objectives
● Minor Outcome Results
● Research Methodology
● Result Analysis

Introduction
● Women Inclusion and Safety in Society
1. The inclusion and safety of women in modern society is a problem
that over the years has become increasingly important in all
countries, leading to a large number of awareness campaigns and
social movements.
2. Several studies have focused on interpreting people’s behavior,
including in relation to their geopolitical context, in order to
prevent and reduce discrimination against women.
3. Women often face aggressive harassment, which includes trolling,
stalking, and sexist remarks.
Introduction
● Role of Twitter
1. Twitter is the most popular of the many microblogging platforms
available on the internet, where people share their opinions, ideas, and
messages in posts of up to 280 characters called “tweets”.
2. Twitter provides a platform to spread information worldwide. Millions of
tweets are posted every day from all over the world.
3. Each tweet expresses an opinion on a topic. These tweets help raise social
awareness, promote products, and draw attention to social and health issues.
4. Therefore, Twitter conversations can be used to analyze people’s sentiments
about women’s issues, which are critical issues in our society.
Literature Review
Sr. No. | Author | Techniques / Methodology | Limitations / Comparison
1 | Mamgain et al. [1] | BERT models (SpanBERT, BETO, multilingual BERT) | BERT models are used to analyze a smaller dataset of Spanish tweets.
2 | B. Gupta et al. [2] | Logistic Regression algorithm was used. | Only one classification algorithm was used to find the results, attaining 80% accuracy.
3 | Reyes-Menendez et al. [3] | An ANN machine learning model was used. | The ANN attained 83% accuracy, and the dataset is also quite small.
4 | D. Kumar and S. Aggarwal [4] | Support Vector Machine (SVM) | Support Vector Machine is used for data classification. Data analyses were done on a smaller dataset.
5 | Sahayak et al. [5] | Python modules such as TextBlob and Tweepy were used. | No classifier was used to analyze the accuracy of the model.
Problem Statement

To apply data analytics to Twitter data to evaluate women’s safety and inclusion in
modern society through statistics, by finding correlations and relationships between
the most common hashtags and the most frequently occurring keywords, and through
time- and location-based sentiment analysis.

Objective
● To highlight the rising issue of the inclusion and safety of women in modern society by analyzing
people’s opinions, behavior, and sentiments through tweets.
● To investigate the various crimes against women using popular and influential social media data,
selected with the help of specific keywords and hashtags.
● To present accurate insights and educate people about the sharp rise in the number of crimes
against women in the form of a web application.

Minor Outcome Results
❖ Table 1 shows that women tend to use Twitter to report and share their stories
related to crimes they face.
❖ Table 2 shows the different hashtags used in the tweets, all related to women
wanting freedom of speech and the right to justice.
❖ The Me Too movement gave women a tremendous boost to raise issues related to
the topic of our project.

Table 1: Most frequent words
Table 2: Most common hashtags
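The frequency counts behind Table 1 and Table 2 can be sketched with plain counting. The snippet below is a minimal illustration, not the project's actual code: the sample tweets, the stopword list, and the function names `top_hashtags` and `top_words` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the project's real dataset is not reproduced here.
tweets = [
    "Proud of everyone speaking up #MeToo #WomenSafety",
    "We need safer streets for women #WomenSafety",
    "Sharing my story today #MeToo",
]

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = frozenset({"the", "a", "of", "for", "my", "we", "to"})

def top_hashtags(texts, n=10):
    """Count hashtags case-insensitively and return the n most common."""
    tags = Counter()
    for text in texts:
        tags.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return tags.most_common(n)

def top_words(texts, n=10):
    """Count non-stopword words, ignoring hashtags, and return the n most common."""
    words = Counter()
    for text in texts:
        without_tags = re.sub(r"#\w+", " ", text.lower())
        words.update(w for w in re.findall(r"[a-z']+", without_tags)
                     if w not in STOPWORDS)
    return words.most_common(n)

print(top_hashtags(tweets))
print(top_words(tweets, n=5))
```

On a real dataset the stopword list would normally come from an NLP library rather than being written by hand.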
Minor Outcome Results
According to VADER, there are 15,054 tweets, of which:

● 3,547 are positive
● 9,330 are negative
● 2,177 are neutral

● The pie chart shows the percentages of positive, negative, and neutral
tweets on women, obtained with TextBlob after removing the most frequent
words from the Twitter data.
● After this step, 24.4% of the comments on women were positive (23.6%
before), 20.3% were negative (62% before), and 55.2% were neutral (14.5%
before).
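The “before” percentages follow directly from the VADER counts listed above. A short check, using only the reported counts (the helper name `shares` is ours):

```python
# Label counts reported above for the VADER analysis.
counts = {"positive": 3547, "negative": 9330, "neutral": 2177}

def shares(label_counts):
    """Return each label's share of the total, as a percentage rounded to 0.1."""
    total = sum(label_counts.values())
    return {label: round(100 * n / total, 1) for label, n in label_counts.items()}

print(sum(counts.values()))  # 15054
print(shares(counts))        # {'positive': 23.6, 'negative': 62.0, 'neutral': 14.5}
```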
Minor Outcome Results
● We applied the different types of Naive Bayes classifier to the data and
compared the accuracy of the three types.
● Bernoulli Naive Bayes achieved the best accuracy of the three, 91.43%, so
it is our chosen type of classifier.
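For readers unfamiliar with the Bernoulli variant: it models each vocabulary word as simply present or absent in a tweet, and an absent word also contributes evidence. The following from-scratch sketch illustrates the idea under stated assumptions. It is not the project's model (which was presumably built with a library); the class name `TinyBernoulliNB`, the toy training tweets, and the Laplace smoothing value are hypothetical.

```python
import math
from collections import defaultdict

class TinyBernoulliNB:
    """Illustrative Bernoulli Naive Bayes over binary word-presence features."""

    def fit(self, docs, labels, alpha=1.0):
        self.vocab = sorted({w for d in docs for w in d.lower().split()})
        self.classes = sorted(set(labels))
        n_docs = defaultdict(int)                       # documents per class
        n_with = defaultdict(lambda: defaultdict(int))  # per class: docs containing word
        for doc, label in zip(docs, labels):
            n_docs[label] += 1
            for w in set(doc.lower().split()):
                n_with[label][w] += 1
        self.log_prior = {c: math.log(n_docs[c] / len(docs)) for c in self.classes}
        # Laplace-smoothed probability that a word appears in a document of class c.
        self.p = {c: {w: (n_with[c][w] + alpha) / (n_docs[c] + 2 * alpha)
                      for w in self.vocab}
                  for c in self.classes}
        return self

    def predict(self, doc):
        present = set(doc.lower().split())

        def score(c):
            s = self.log_prior[c]
            for w in self.vocab:  # absent words count too: the Bernoulli assumption
                p = self.p[c][w]
                s += math.log(p) if w in present else math.log(1.0 - p)
            return s

        return max(self.classes, key=score)

# Hypothetical toy training data.
train_docs = ["i feel safe and supported", "proud and happy today",
              "i feel unsafe and harassed", "harassed and scared today"]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyBernoulliNB().fit(train_docs, train_labels)
print(model.predict("feel harassed"))    # negative
print(model.predict("happy and proud"))  # positive
```

Words outside the training vocabulary are ignored at prediction time, which keeps the sketch short.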
Methodology
1. Data Collection: A Twitter scraper and the data world website are used to collect
the dataset of tweets on the topic of women’s inclusion and safety.
2. Preprocessing: Once the tweets are collected, they are pre-processed to remove
unnecessary noise.
3. Sentiment Analysis: Using TextBlob and VADER, each tweet is labeled Positive,
Negative, or Neutral based on its polarity. An in-depth sentiment analysis will
also account for the different emotions in the tweets using the text2emotion
library.
4. Classifiers: Several types of classifiers are trained to find the most accurate
model; a comparative analysis is done between Support Vector Machine (SVM),
Random Forest, Naive Bayes, and Logistic Regression models.
5. Visualization: The model is visualized using Matplotlib to show the confusion
matrix, ROC curve, and training and validation accuracy. An in-depth visualization
will be done with respect to geographical, longitudinal, and intersectional analysis
of the data.
6. Model Deployment: The model will be deployed into an existing production
environment as a web application built with Streamlit, to make it easier to
analyze and visualize.
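Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under assumptions: the cleaning rules, the function names, and the 0.05 threshold are ours and are not taken from the project. The threshold mirrors the convention commonly used with VADER's compound score; the polarity score itself would come from TextBlob or VADER and is passed in here as a plain number.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                   # user mentions
    text = text.replace("#", " ")                       # keep the hashtag word itself
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation, emoji
    return " ".join(text.split())                       # collapse whitespace

def label_from_polarity(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label (threshold is assumed)."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"

print(clean_tweet("RT @user: Stay SAFE!! #WomenSafety https://t.co/abc123"))
# rt stay safe womensafety
print(label_from_polarity(0.6), label_from_polarity(-0.3), label_from_polarity(0.0))
# Positive Negative Neutral
```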

Result Analysis
Geographical Analysis

❖ The map shows that the highest frequency of tweets comes from the North American continent.
❖ The two countries that are most vocal and promote freedom of speech are the United States and India.

Map and bar graph showing frequency of tweets


Result Analysis
Emotion Analysis

❖ The bar graph shows the frequency of the emotions present in the text such as Happy, Fear, Surprise,
Sad and Angry.
❖ The result was positive overall: Happy was the most frequent emotion with 4,947
occurrences, and Angry was the least frequent with 869 occurrences.

Bar graph showing emotion analysis using text2emotion
Result Analysis
Comparative Analysis of Different Classifiers

❖ The accuracy of Naive Bayes is 90.77%, Logistic Regression is 95.62%, Random Forest is 96.28%, and
Support Vector Machine is 96.41%.

❖ Therefore, the Support Vector Machine classifier achieves the best accuracy of the four.

Accuracy table of different classifiers on 15,054 tweets
Confusion matrix of Support Vector Machine
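As a reminder of how a confusion matrix relates to accuracy, here is a small self-contained sketch. The label lists are hypothetical toy data, not the project's predictions; only the `reported` accuracies are taken from the slide above.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, both ordered as `labels`."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy labels for illustration.
labels = ["Negative", "Neutral", "Positive"]
y_true = ["Positive", "Negative", "Neutral", "Negative", "Positive", "Negative"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Negative"]

print(confusion_matrix(y_true, y_pred, labels))  # [[3, 0, 0], [1, 0, 0], [0, 1, 1]]
print(round(accuracy(y_true, y_pred), 3))        # 0.667

# Accuracies reported above, in percent.
reported = {"Naive Bayes": 90.77, "Logistic Regression": 95.62,
            "Random Forest": 96.28, "Support Vector Machine": 96.41}
print(max(reported, key=reported.get))           # Support Vector Machine
```

Accuracy is the sum of the matrix diagonal divided by the total number of samples, which is why the two views always agree.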
Result Analysis
● We gathered a large number of tweets about women’s safety and inclusion in society from
the chosen dataset, then examined the most popular hashtags and keywords and the
relationships between them.
● We applied sentiment and emotion analysis to highlight the favorable or unfavorable
inclination of the tweets, with special attention to the time of year and the geopolitical
region of origin.
● We found that a significant number of the tweets are linked to public awareness campaigns and
movements such as Me Too, which promote women’s participation and promptly bring attention to
violent situations, in line with worldwide trends.
● We concentrated on the sentiment of the tweets and the reasons behind it, which
revealed distinct perspectives on the social and political landscape in various nations.
References
[1] N. Mamgain, E. Mehta, A. Mittal, and G. Bhatt, “Sentiment analysis of top colleges in India using Twitter data”, 2016
International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT),
2016.
[2] B. Gupta, M. Negi, K. Vishwakarma, G. Rawat, and P. Badhani, “Study of Twitter sentiment analysis using
machine learning algorithms on Python”, International Journal of Computer Applications, 2017.
[3] A. Reyes-Menendez, J. R. Saura, and C. Alvarez-Alonso, “Understanding World Environment Day user opinions in
Twitter: A topic-based sentiment analysis approach”, International Journal of Environmental Research and Public Health,
2018.
[4] D. Kumar and S. Aggarwal, “Analysis of Women Safety in Indian Cities Using Machine Learning on Tweets”, 2019
Amity International Conference on Artificial Intelligence (AICAI), 2019.
[5] V. Sahayak, V. Shete, and A. Pathan, “Sentiment analysis on Twitter data”, International Journal of Innovative
Research in Advanced Engineering (IJIRAE), 2015.
[6] D. Swapna, Jampana Ashrita, Karpe Ashwini, and Talasila Bindhu Bhargavi, “Analysis of Women Safety in Indian Cities
Using Twitter Data”, Journal of Composition Theory, 2021.
[7] Y. Md. Riyazuddin, G. Sriram, P. Vaibhav, and I. Vikranth, “Utilization of Support Vector Machine for
Analyzing Women Safety in Indian States”, International Journal of Grid and Distributed Computing, 2020.
[8] Raparthi Shravya and P. Neelakantan, “Women Protection Analysis Based on Twitter Data Using ML”, European
Journal of Molecular & Clinical Medicine, ISSN 2515-8260, 2020.
[9] B. Durga Bhavani, S. Vaishnavi, T. Akshara, S. Vaishnavi, and V. Harini, “Machine Learning Application: The Role of
Social Media in Promoting the Safety of Women in Indian Cities”, Journal of Cardiovascular Disease Research, 2023.
[10] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau, “Sentiment Analysis of Twitter Data”,
Department of Computer Science, Columbia University, New York, NY 10027 USA, 2022.
[11] Ranjitha, Pradeep Nayak, Vedanth M, Mahantesh G, and Namitha D, “Sentiment Classification on Women Safety across
Indian Cities Based on Twitter Data using NLP and Machine Learning”, Alvas Institute of Engineering and Technology,
Moodbidri, Karnataka, India, 2022.
[12] Rahul Pandya and Sujal Charak, “Polarity Testing and Analysis of Tweets in Twitter Using Tweepy”, International
Journal of Computational Research, 2021.