Predicting Cyberbullying in Social Media Using Machine Learning

PROJECT TITLE
PREDICTING CYBERBULLYING IN SOCIAL MEDIA USING MACHINE LEARNING
ZEROTH REVIEW
PROJECT MEMBERS:
ROSHINI T S
LALITHAVANI V
SERAPHIN JAYANTHI SANTHAMARY S
UNDER THE GUIDANCE OF:

Dr. JAYABHARATHY J
Associate Professor
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
PONDICHERRY ENGINEERING COLLEGE
PUDUCHERRY
INTRODUCTION:
Cyberbullying is a form of harassment carried out using electronic devices. It can occur
through SMS, text messaging, and apps, or online in social media, forums, or gaming where
people can view, participate in, or share content. Cyberbullying includes sending, posting, or
sharing negative, harmful, false, or mean content about someone else. It can include sharing
personal or private information about someone else, causing embarrassment or humiliation.
Cyberbullying causes significant emotional and psychological distress, and some cyberbullying
crosses the line into unlawful or criminal behaviour. Recent research studies focus on
predicting cyberbullying using various machine learning algorithms. These studies mainly
cover text, image, video and audio. Bullying through image, video and audio is predicted and
prevented through various means, but text bullying remains rampant on social media. Text
therefore needs to be predicted and classified as bullied or non-bullied with higher accuracy.
In addition, social media users are currently falling victim to phishing links: by providing
their personal details on such sites, users expose themselves to phishing attacks. Notifying
the user about malicious links, and offering the option to block the sender, can reduce
phishing attacks.
LITERATURE REVIEW:
Each entry below lists: SI NO, TITLE AND REFERENCE AUTHOR, ALGORITHM / TECHNIQUE, EXISTING
SYSTEM, DATASET, LIMITATIONS.

1. TITLE AND REFERENCE AUTHOR: Cyberbullying Detection in Social Networks Using Deep Learning
   Based Models; A Reproducibility Study
   ALGORITHM / TECHNIQUE: Deep Neural Networks
   EXISTING SYSTEM: Identification of bullied text
   DATASET: YouTube dataset (~54k posts by ~4k users)
   LIMITATIONS: Data and settings are not clear leading to inconsistencies in results.

2. TITLE AND REFERENCE AUTHOR: Unsupervised Cyber Bullying Detection in Social Networks
   ALGORITHM / TECHNIQUE: Natural Language Processing and Machine Learning
   EXISTING SYSTEM: Several hand crafted features that are used to catch semantic and
   syntactic communicational behavior of potential cyber bullies
   DATASET: Dataset from twitter tested against other social networks like youtube
   LIMITATIONS: Improvement on these techniques in order is required to achieve better
   results.

3. TITLE AND REFERENCE AUTHOR: Improving Cyberbullying Detection using Twitter Users’
   Psychological Features and Machine Learning (2019)
   ALGORITHM / TECHNIQUE: Naïve Bayes, Random Forest and J48
   EXISTING SYSTEM: To classify the tweets into one of four categories: bully, aggressor,
   spammer and normal.
   DATASET: Twitter dataset containing 5453 tweets gathered using the hashtag #Gamergate,
   and manually annotated by human experts
   LIMITATIONS: Inclusion of emotion however, had no positive impact on the detection
   model's performance

4. TITLE AND REFERENCE AUTHOR: A Comparative Analysis of Cyberbullying Detection in Social
   Media
   ALGORITHM / TECHNIQUE: Supervised ML and NLP Techniques
   EXISTING SYSTEM: To determine the efficiency of different labeling methods for detecting
   cyberbullying from social media
   DATASET: Raw dataset from Kaggle.com that contains a total of 12,744 data points in this
   dataset and these data were crawled from 50 IDs in the summer of 2017
   LIMITATIONS: Fails to identify the perpetrators of cyberbullying with optimal accuracy

5. TITLE AND REFERENCE AUTHOR: Prediction of Cyberbullying Incident on Social Media Network
   ALGORITHM / TECHNIQUE: Character recognition algorithm
   EXISTING SYSTEM: To automatically detect incidents of cyber bullying on social media by
   analyzing posts written by bullies and victims.
   DATASET: Tweets posted in Twitter
   LIMITATIONS: New algorithms should be considered, such as deep learning and neural
   networks to improve the classifiers for higher accuracy

6. TITLE AND REFERENCE AUTHOR: Predicting Cyberbullying on Social Media in the Big Data Era
   Using Machine Learning Algorithms: Review of Literature and Open Challenges (2019)
   ALGORITHM / TECHNIQUE: Support Vector Machine
   EXISTING SYSTEM: To classify the tweets as bullied or non-bullied
   DATASET: Data extracted from SM websites by using either keywords, that is, words,
   phrases, or hashtags or by using user profiles
   LIMITATIONS: Certain data collection methods limit the prediction model of cyberbullying
   to specified keywords
EXISTING SYSTEM:
• Input data: The dataset is fetched from various social networks.
• Data Preprocessing:
Tokenization: Dividing a string of text into its component words.
Stop word removal: Removing words like “of”, “by”, “on”.
Case conversion: Converting capital letters to small letters.
Special symbol removal: Removing characters like “!”, “@”.
• Feature Extraction: Obtaining the text features on which the classification is based.
• Output data: Assigning labels to the classified words.
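
The preprocessing steps listed above can be sketched in a few lines. This is a minimal illustration in Python, chosen only for brevity and not the project's ASP.NET platform; the stop-word list and the function name preprocess are assumptions made for the example. Case conversion and symbol removal are applied before tokenization here simply because that keeps the code short.

```python
import re
from typing import List

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"of", "by", "on", "the", "a", "an", "is", "to", "and"}


def preprocess(text: str) -> List[str]:
    """Apply case conversion, special symbol removal, tokenization and stop word removal."""
    lowered = text.lower()                             # case conversion
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)     # special symbol removal
    tokens = cleaned.split()                           # tokenization on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # stop word removal
```

For example, preprocess("You are SO dumb!!! Get out of @this group") yields the tokens you, are, so, dumb, get, out, this, group. Note that this sketch also strips emoji, so a system that classifies emoji would need to keep them as separate tokens.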
LIMITATIONS:
Though the existing system resolves some limitations of earlier work, it still falls short
in the following areas:
• Certain data collection methods limit the prediction model to specified keywords.
• The ratio of bullied to non-bullied posts is highly skewed, which leads to an imbalanced
class distribution in the dataset.
• New slang, including new words, spellings and acronyms, demands regular updating of the
training dataset.
• Social media platforms do not restrict bullying content before it is posted.
OBJECTIVES OF THE PROPOSED SYSTEM:
The objective of this project is to classify posts containing text and emoji as bullied or
non-bullied using a Support Vector Machine (SVM), and to detect phishing links shared with
social media users using a Convolutional Neural Network (CNN). Phishing links capture the
user’s identity; to prevent this, the user can choose to block the sender.
Example:
HASHTAGS: #Chatroom
LINKS: https://iplogger.org/
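
Before text and links can be classified separately, they have to be pulled apart. The sketch below shows one simple way to do that with regular expressions. It is written in Python for illustration only; the patterns and the function name extract_entities are assumptions, and the patterns are deliberately loose rather than a full URL grammar.

```python
import re
from typing import Dict, List

HASHTAG_RE = re.compile(r"#\w+")       # a '#' followed by word characters
URL_RE = re.compile(r"https?://\S+")   # http(s) scheme up to the next whitespace


def extract_entities(post: str) -> Dict[str, List[str]]:
    """Split a post into its hashtags, its links and the remaining text."""
    links = URL_RE.findall(post)
    # Remove links first so that a '#' fragment inside a URL is not read as a hashtag.
    without_links = URL_RE.sub(" ", post)
    hashtags = HASHTAG_RE.findall(without_links)
    remaining = " ".join(HASHTAG_RE.sub(" ", without_links).split())
    return {"hashtags": hashtags, "links": links, "text": [remaining]}
```

For the post "Join us in #Chatroom now https://iplogger.org/", this returns the hashtag #Chatroom, the link https://iplogger.org/ and the remaining text "Join us in now". The text would then go to the text classifier and the link to the link classifier.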
PROBLEM DEFINITION:
Cyberbullying can occur through SMS, text messaging, and apps, or online in social media,
forums, or gaming where people can view, participate in, or share content. Cyberbullying
includes sending, posting, or sharing negative, harmful, false, or mean content about someone
else. It can include sharing personal or private information about someone else, causing
embarrassment or humiliation. Some cyberbullying crosses the line into unlawful or criminal
behavior. Recent research studies have revealed that cyberbullying and online harassment are
considerable problems for users of social media platforms, especially young people. To
overcome these issues, bullied text must be identified and users must be warned early about
traps set through phishing links. To achieve this, text and links are classified using the
Support Vector Machine (SVM) and Convolutional Neural Network (CNN) algorithms
respectively.
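
To make the SVM step concrete, the sketch below trains a linear SVM on bag-of-words features by sub-gradient descent on the L2-regularised hinge loss. It is an illustrative stand-in in plain Python, not the project's implementation: the training posts, the labels (+1 bullied, -1 non-bullied), the hyperparameters and all function names are assumptions, and the source does not specify the kernel or feature set actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical, already-preprocessed posts: +1 = bullied, -1 = non-bullied.
TRAIN: List[Tuple[List[str], int]] = [
    (["you", "stupid", "loser"], 1),
    (["shut", "up", "loser"], 1),
    (["you", "are", "stupid"], 1),
    (["great", "game", "today"], -1),
    (["thanks", "for", "help"], -1),
    (["see", "you", "today"], -1),
]


def build_vocab(samples: List[Tuple[List[str], int]]) -> Dict[str, int]:
    """Assign each distinct training token a column index."""
    vocab: Dict[str, int] = {}
    for tokens, _ in samples:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(tokens: List[str], vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def train_linear_svm(X: List[List[float]], y: List[int], epochs: int = 100,
                     lr: float = 0.1, lam: float = 0.01) -> Tuple[List[float], float]:
    """Minimise the L2-regularised hinge loss with sub-gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj - lr * lam * wj for wj in w]   # regularisation shrinks the weights
            if margin < 1:                         # inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return +1 (bullied) or -1 (non-bullied) from the sign of the decision value."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

A typical use is to build the vocabulary from TRAIN, featurize each post, call train_linear_svm, and then call predict on the feature vector of a new post. In practice a library SVM and a much larger labelled dataset would replace this toy setup.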
INPUT:
The input to the system is a set of posts containing emoji, text and links obtained from
various social media.
OUTPUT:
The system labels each post as bullied or non-bullied and, for links, specifies whether the
link is malicious or not.
HARDWARE REQUIREMENTS:
• RAM: Minimum 2 GB to 4 GB
• Storage: 256 GB

SOFTWARE REQUIREMENTS:
• Operating System: Windows 7/8/10 (64-bit)
• Platform: ASP.NET
• Front End: Visual Studio (.NET)
• Back End: SQL Server
CONCLUSION:
This study presents the classification of text as bullied or non-bullied, followed by the
classification of links as malicious or non-malicious. The text classification is performed
by a Support Vector Machine (SVM). The link classification is performed by a Convolutional
Neural Network (CNN).
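
As a rough illustration of how a CNN can read a link, the sketch below encodes a URL as a fixed-length sequence of character indices and applies a single one-dimensional convolution filter with ReLU and global max pooling. It is written in plain Python for illustration; the alphabet, the sequence length, the function names and the hand-set filter are all assumptions. A real CNN would learn many filters from labelled URLs and add further layers to produce the malicious or non-malicious decision.

```python
from typing import List

# Assumed character set; index 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 1


def encode_url(url: str, max_len: int = 64) -> List[int]:
    """Map a URL to a fixed-length sequence of character indices (truncate or pad)."""
    ids = [CHAR_INDEX.get(ch, 0) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids: List[int], size: int = VOCAB_SIZE) -> List[List[float]]:
    """Turn each character index into a one-hot row of length `size`."""
    return [[1.0 if i == idx else 0.0 for i in range(size)] for idx in ids]


def conv1d_global_max(rows: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide one filter over the sequence, apply ReLU, then take the global maximum."""
    k = len(kernel)
    best = 0.0
    for start in range(len(rows) - k + 1):
        act = sum(rows[start + j][c] * kernel[j][c]
                  for j in range(k) for c in range(len(kernel[j])))
        best = max(best, max(0.0, act))
    return best


def bigram_filter(first: str, second: str) -> List[List[float]]:
    """Hand-set width-2 filter that responds most strongly to one character pair."""
    kernel = [[0.0] * VOCAB_SIZE for _ in range(2)]
    kernel[0][CHAR_INDEX[first]] = 1.0
    kernel[1][CHAR_INDEX[second]] = 1.0
    return kernel
```

With the filter bigram_filter("/", "/"), the pooled activation is 2.0 for "https://iplogger.org/" because the pair "//" occurs in it, and 0.0 for a string without any slash. Learned filters work the same way, except that their weights come from training rather than being set by hand.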
PLAN OF ACTION:
REFERENCES:
• M. A. Al-Garadi, M. R. Hussain, and N. Khan, “Predicting Cyberbullying on Social Media in
the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open
Challenges,” vol. 7, pp. 4-10, 2019.
• M. Dadvar, D. Trieschnigg, and F. de Jong, “Experts and Machines against Bullies: A
Hybrid Approach to Detect Cyberbullies,” in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 8436 LNAI, Springer International Publishing, 2017, pp. 275–281.
• D. Maher, “Cyberbullying: an Ethnographic Case Study of one Australian Upper
Primary School Class,” Youth Studies Australia, vol. 27, no. 4, pp. 50–57, 2018.
• M. Jiang, S. Kumar, V. S. Subrahmanian, and C. Faloutsos, “KDD 2017 tutorial:
Data-driven approaches towards malicious behavior modeling,” Dimensions, vol. 19,
p. 42, 2017.
• R. Slonje, P. K. Smith, and A. Frisén, “The nature of cyberbullying, and strategies for
prevention,” Comput. Hum. Behav., vol. 29, no. 1, pp. 26–32, 2018.
• K. Reynolds, A. Kontostathis, and L. Edwards, “Using machine learning to detect
cyberbullying,” in Proc. 10th Int. Conf. Mach. Learn. Appl. Workshops (ICMLA),
Dec. 2019, pp. 241–244.
