20011F0008 Samba PRC3

JNTUH College of Engineering Hyderabad
Spammer Detection in Twitter

Presented By: Goli Samba Kumar, 20011F0008, MCA 4th SEM, Computer Science Department
Guided By: I. LAKSHMI MANIKYAMBA, Associate Professor, CSE Department
May-2022

Contents
• Brief Introduction
• Proposed System
• Tools used
• Implementation
• Importing and Feature extraction
• Sentiment analysis
• Modelling and evaluation
• Results
• Conclusion
• References
BRIEF INTRODUCTION
Twitter has become a target platform for spammers to spread large
amounts of irrelevant and harmful information.
Spammers find such platforms easily accessible for trapping users in
malicious activities by posting spam tweets. The main purpose of this
work is to classify each tweet as ‘spam’ or ‘ham’ and to evaluate the
emotion (sentiment) of the tweet.
PROPOSED SYSTEM
The analysis shows that several machine learning-based techniques
can be effective for identifying spam on Twitter.
The classification phase of the model decides whether the test tweets are
spam or ham.
This makes it easy to predict whether a user is sending spam or quality
content on Twitter.
Tools used
i. Data manipulation packages (Pandas, NumPy, etc.)
ii. Noise removal libraries (regular expressions)
iii. Text preprocessing with NLTK (punctuation removal, stemming
and lemmatization)
iv. Sentiment analysis and classification using the TextBlob classifier
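A minimal sketch of the noise-removal and preprocessing steps listed above, using only the Python standard library. The function names, the regular expressions, and the sample tweet are assumptions for illustration; `crude_stem` is a toy suffix stripper standing in for an NLTK stemmer, not the project's actual code.

```python
import re
import string

# Patterns for common tweet noise (illustrative assumptions).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(word):
    """Toy suffix stripper; a stand-in for an NLTK stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, strip punctuation, stem tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [crude_stem(tok) for tok in text.split()]


tokens = preprocess("WIN a FREE prize!!! Click http://spam.example now @user")
print(tokens)
```

URLs and mentions are removed before punctuation so that their characters do not leave stray tokens behind.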
Implementation
• Fetching the dataset from Kaggle
• Importing libraries
• Preprocessing
• Feature extraction
• Feature matching
• Model construction
• Results evaluation
Importing & Feature extraction
Detection of spam words
Word frequencies
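As a rough illustration of spam-word detection through word frequencies, the sketch below counts words per class with `collections.Counter`. The four labelled tweets are made up for this example and are not taken from the Kaggle dataset used in the project.

```python
from collections import Counter

# Hypothetical labelled tweets (not from the project's dataset).
tweets = [
    ("win a free prize now", "spam"),
    ("free entry click now", "spam"),
    ("meeting friends for lunch", "ham"),
    ("see you at lunch", "ham"),
]


def word_frequencies(rows, label):
    """Count how often each word appears in tweets carrying the given label."""
    counts = Counter()
    for text, row_label in rows:
        if row_label == label:
            counts.update(text.split())
    return counts


spam_counts = word_frequencies(tweets, "spam")
ham_counts = word_frequencies(tweets, "ham")

# Words seen only in spam tweets are candidate spam words.
spam_only = sorted(w for w in spam_counts if w not in ham_counts)
print(spam_counts.most_common(3))
print(spam_only)
```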
Sentiment analysis
Determine whether the writer's attitude towards a particular topic,
product, etc. is positive, negative, or neutral using the TextBlob tool.
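TextBlob reports a polarity score between -1 and 1 for a piece of text. The sketch below shows how such a score can be mapped to the three labels. It does not call TextBlob: `polarity` is a toy lexicon-based stand-in, and the lexicon, its weights, and the 0.1 threshold are all assumptions for illustration.

```python
# Tiny hand-made lexicon; words and weights are illustrative assumptions.
LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "terrible": -1.0, "hate": -1.0,
}


def polarity(text):
    """Average lexicon score of the known words, in [-1, 1]; 0.0 if none match."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment_label(score, threshold=0.1):
    """Map a polarity score to 'positive', 'negative', or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for tweet in ["I love this great phone", "terrible service", "see you at lunch"]:
    print(tweet, "->", sentiment_label(polarity(tweet)))
```

With a real sentiment tool, only `polarity` would change; the thresholding step stays the same.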
Modelling and evaluation
Naïve Bayes:
It is an efficient classifier used to classify a text message as
spam or ham. The Naïve Bayes classifier is based on probability
theory, specifically Bayes' theorem. This model is used because it
performs well and requires little computational time for training.
It assumes that the probability of one attribute does not affect the
probability of another.
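The independence assumption above can be sketched as a small multinomial Naïve Bayes classifier: the log prior of each class plus the sum of per-word log likelihoods. This is a from-scratch illustration with add-one (Laplace) smoothing and made-up training rows, not the model used in the project.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows):
    """rows: list of (tokens, label). Returns (label counts, word counts, vocab)."""
    label_counts = Counter(label for _, label in rows)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in rows:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Each word contributes independently; add-one smoothing avoids log(0).
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows for illustration only.
train_rows = [
    (["win", "free", "prize"], "spam"),
    (["free", "click", "now"], "spam"),
    (["lunch", "with", "friends"], "ham"),
    (["see", "you", "at", "lunch"], "ham"),
]
model = train_nb(train_rows)
print(predict_nb(model, ["free", "prize", "now"]))
print(predict_nb(model, ["lunch", "friends"]))
```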
Decision tree
A decision tree builds a classification model from training data;
that model can then be used to predict the class label of a test
sample. The algorithm uses a tree structure to solve the
classification problem.
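One way to see how a tree is grown is to look at a single split. The sketch below picks the word whose presence best separates spam from ham, scored by weighted Gini impurity. Gini is one common split criterion and is an assumption here; the slides do not say which criterion was used, and the rows are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_word(rows):
    """rows: list of (set_of_tokens, label). Returns (word, weighted Gini)."""
    vocab = sorted(set().union(*(tokens for tokens, _ in rows)))
    best = (None, float("inf"))
    for word in vocab:
        with_word = [label for tokens, label in rows if word in tokens]
        without = [label for tokens, label in rows if word not in tokens]
        score = (len(with_word) * gini(with_word)
                 + len(without) * gini(without)) / len(rows)
        if score < best[1]:
            best = (word, score)
    return best


# Hypothetical rows for illustration only.
rows = [
    ({"win", "free", "prize"}, "spam"),
    ({"free", "click", "now"}, "spam"),
    ({"lunch", "friends"}, "ham"),
    ({"see", "you", "lunch"}, "ham"),
]
print(best_split_word(rows))
```

A full tree repeats this choice recursively on each branch until the nodes are pure or a depth limit is reached.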
KNN algorithm
To classify a test tweet, the KNN algorithm identifies the k training
samples closest to the test sample and assigns the majority class among
them. The k nearest neighbours are identified by the similarity of the
data samples, which is computed with a chosen similarity measure.
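A minimal sketch of the idea, assuming cosine similarity over bag-of-words counts as the similarity measure (the slides do not name the measure used). The training rows and the choice k=3 are made up for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, tokens, k=3):
    """Majority label among the k training rows most similar to the query."""
    query = Counter(tokens)
    ranked = sorted(train,
                    key=lambda row: cosine(Counter(row[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training rows for illustration only.
train = [
    (["win", "free", "prize"], "spam"),
    (["free", "click", "now"], "spam"),
    (["claim", "free", "prize", "now"], "spam"),
    (["lunch", "with", "friends"], "ham"),
    (["see", "you", "at", "lunch"], "ham"),
    (["friends", "at", "lunch", "today"], "ham"),
]
print(knn_predict(train, ["free", "prize", "click"]))
print(knn_predict(train, ["lunch", "friends", "today"]))
```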
Random forest
Random Forest is a flexible machine learning classifier that consists
of a collection of tree-structured classifiers. It randomly selects
features to construct each decision tree in the collection. Just as a
forest is composed of trees, the classifier combines the predictions of
many trees, and adding more trees generally makes the ensemble more robust.
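The two sources of randomness, bootstrap samples and random feature subsets, can be sketched with very small trees. In this illustration each "tree" is a one-split stump on a single word, which is a simplification: a real random forest grows deeper trees. All names, parameters, and rows below are assumptions.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or the default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(rows, words):
    """One-split 'tree': choose the word whose presence best separates labels."""
    def weighted_gini(word):
        yes = [lab for toks, lab in rows if word in toks]
        no = [lab for toks, lab in rows if word not in toks]
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)

    word = min(words, key=weighted_gini)
    overall = majority([lab for _, lab in rows], None)
    yes_label = majority([lab for toks, lab in rows if word in toks], overall)
    no_label = majority([lab for toks, lab in rows if word not in toks], overall)
    return word, yes_label, no_label


def fit_forest(rows, n_trees=25, n_words=3, seed=0):
    rng = random.Random(seed)
    vocab = sorted(set().union(*(toks for toks, _ in rows)))
    forest = []
    for _ in range(n_trees):
        sample = rng.choices(rows, k=len(rows))              # bootstrap sample
        words = rng.sample(vocab, min(n_words, len(vocab)))  # random feature subset
        forest.append(fit_stump(sample, words))
    return forest


def predict_forest(forest, tokens):
    """Majority vote over the stumps."""
    votes = Counter(yes if word in tokens else no for word, yes, no in forest)
    return votes.most_common(1)[0][0]


# Hypothetical rows for illustration only.
rows = [
    ({"win", "free", "prize"}, "spam"),
    ({"free", "click", "now"}, "spam"),
    ({"lunch", "friends"}, "ham"),
    ({"see", "you", "lunch"}, "ham"),
]
forest = fit_forest(rows)
print(predict_forest(forest, {"free", "prize"}))
```

Because each stump sees a different sample and a different set of candidate words, individual stumps can be wrong while the vote remains reasonable.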
Results
Metrics        Naïve Bayes   Random forest   KNN     Decision tree
Precision      0.92          0.84            0.72    0.82
Recall         0.72          0.79            0.67    0.78
F1 score       0.80          0.80            0.69    0.78
Accuracy (%)   72.10         82.00           70.60   80.64
Random forest gives the highest accuracy (82.00%) of the four classifiers,
although Naïve Bayes has the highest precision (0.92).
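For reference, the four metrics in the table can be computed from true and predicted labels as sketched below, treating 'spam' as the positive class. The label lists are made up for illustration and do not reproduce the reported numbers; the slides also do not state whether the reported scores are per-class or averaged.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Precision, recall, F1 for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels for illustration only.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```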

Conclusion
• Spam detection helps protect Twitter's reputation and keeps users
aware of, and safe from, spam tweets.
• This research studies spam classification techniques on Twitter.
• The highest accuracy, 82.0 percent, is obtained with the Random Forest
classifier, which makes it the most effective of the tested models for
detecting spam.
References
1. C. Grier, K. Thomas, V. Paxson, and M. Zhang, “@spam: The underground on 140 characters or
less,” in Proc. ACM Conf. Computer and Communications Security, 2010, pp. 27-37.
2. Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “Design and analysis of a social botnet,”
Computer Networks, vol. 57, no. 2, pp. 556-578, 2013.
3. Ashwini Bhangare, Smita Ghodke, Kamini Walunj, and Utkarsha Yewale, “Twitter spammer
detection,” International Research Journal of Engineering.
