
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Belagavi-590018

SYNOPSIS OF PROJECT
“Detection of Malicious Bots in Twitter Network”
Submitted in partial fulfillment of the requirements for the
award of

BACHELOR OF ENGINEERING DEGREE


in
COMPUTER SCIENCE & ENGINEERING
Submitted by

Akash J 4AD20CS007
Ravi Kumar 4AD20CS071
Shishir Somapur 4AD20CS082
Sumit Rathod 4AD20CS090

Under the guidance of


Mr. Sandesh R
Assistant Professor
Department of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ATME College of Engineering,


13th Kilometer, Mysore-Kanakapura-Bangalore Road
Mysore-570028
ABSTRACT

Malicious social bots generate fake tweets and automate their social relationships, either by posing as
followers or by creating multiple fake accounts used for malicious activities. Moreover, malicious social
bots post shortened malicious URLs in tweets in order to redirect online social networking participants
to malicious servers. Hence, distinguishing malicious social bots from legitimate users is one of the most
important tasks in the Twitter network. In this project we detect Twitter bots within the network using
machine learning algorithms, namely Naive Bayes, Random Forest, and Decision Trees. We collected
and preprocessed a comprehensive dataset of bot and genuine user profiles, extracting features such as
posting frequency, content patterns, and follower relationships for classification. Through extensive
experimentation, we compare the accuracy, precision, recall, and F1-score of these algorithms.
1. INTRODUCTION

Malicious social bots are computer programs that imitate real people on online social networks such as
Twitter. They carry out harmful activities such as sending spam in bulk, creating fake identities,
manipulating ratings, and stealing personal information through phishing. Because tweets were
historically limited to 140 characters, users rely on URL-shortening services to share long web links, and
malicious bots exploit this by posting shortened links that redirect to harmful websites. They also
generate fake tweets, pose as followers, or create many fake accounts to carry out their activities.
Distinguishing these malicious bots from real users is therefore crucial for Twitter's safety.
Identifying malicious social bots on Twitter is a significant challenge: these bots masquerade as ordinary
users while performing harmful actions that make the platform less safe, from spreading shortened links
to malicious websites to fabricating tweets, followers, and entire accounts. Finding and stopping these
bots is essential to keeping Twitter safe for everyone.
The project extends beyond academic exploration, as it directly addresses real-world concerns. The
presence of malicious bots on Twitter not only undermines user trust but also has broader implications
for the spread of misinformation and the security of online communities. By developing a reliable bot
detection system, this research contributes to enhancing the overall cybersecurity of the Twitter network
and other online social platforms. Moreover, it aligns with the broader industry and societal need to
combat the proliferation of harmful automated accounts.
The project involves a rigorous process of data collection, preprocessing, feature extraction, and the
implementation of machine learning algorithms. Through systematic experimentation, we intend to
assess the effectiveness of Naive Bayes, Random Forest, and Decision Trees in distinguishing between
malicious social bots and authentic Twitter users. Performance metrics such as accuracy, precision,
recall, and F1-score will serve as critical benchmarks to evaluate the algorithms' capabilities.
In conclusion, this project's scope encompasses the development of a bot detection system tailored to the
unique challenges posed by Twitter's network dynamics and user behaviors. The insights gained from
this research have far-reaching implications for online security and trustworthiness, offering practical
applications for social media platforms and cybersecurity agencies. As we delve deeper into the project,
we aim to contribute valuable solutions to the ongoing battle against malicious social bots, ultimately
fostering a safer and more reliable online environment for all users.

2. LITERATURE SURVEY
[1] “Detection of Malicious Social Bots using ML technique in Twitter Network” [2022]
Authors: V N P Sai Siri Dantu, Jhansi Devi Telu, Padma Sree Kuncham, Gurudatta Pilla

This paper designed a framework that evaluates a trust value for each online social network account,
based on a chosen feature set, in order to detect malicious social bots in the Twitter network effectively
and efficiently. The authors discuss the use of deep-learning algorithms to extract hidden patterns in text
for spam detection on Twitter. The paper focuses on detecting malicious social bots using the Learning
Automata (LA) model combined with the Naive Bayes algorithm and URL-based features. It presents a
framework that combines text-based and metadata-based features to detect malicious accounts on
Twitter. The proposed system uses the LA-MSBD (Learning Automata-based Malicious Social Bot
Detection) algorithm, integrating a Naive Bayes model with URL-based feature extraction to distinguish
between malicious bots and legitimate users. The work is an enhancement of an existing system based
on a trust computation model of direct and indirect trust. It also notes the prevalence of automated bot
accounts on Twitter and the potential benefits of legitimate Twitter bots for customer support.

[2] “Random Forest Twitter Bot Classifier” [2019]


Authors: James Schnebly and Shamik Sengupta

The paper addresses the detection of Twitter bot accounts: automated accounts that spread malicious
content or generate fraudulent popularity for political and social figures. The authors propose a set of
attributes for a Random Forest classifier that achieves high accuracy (90.25%) and good generalizability
in detecting bot accounts. Once trained, the classifier can be deployed directly to identify and label bot
accounts. By combining accessible features from the Twitter API with derived ratio features, it offers
valuable insights into the behavior and characteristics of bot accounts, while the ensemble nature of the
Random Forest algorithm helps prevent overfitting. The findings can be applied by Twitter and other
social media platforms to develop more effective strategies for identifying and removing bot accounts,
thereby improving the overall user experience and reducing the spread of false information.

[3] “Detection of Malicious Social Bots using Learning Automata with URL features in Twitter
Network” [2022]
Authors: M. Divya, D. Abhishek, K. Naresh, P. Sparsha

A Learning Automata-based Malicious Social Bot Detection (LA-MSBD) model is proposed by
integrating a trust computation model with URL-based features for identifying trustworthy participants
(users) in the Twitter network. The authors feed 13 parameters as input to an SVM algorithm, which
processes these inputs and returns a binary output (0 or 1). Experiments on two Twitter datasets illustrate
that the proposed algorithm improves precision, recall, F-measure, and accuracy compared with existing
approaches for malicious social bot detection (MSBD), achieving up to 7% higher accuracy than other
existing algorithms. The proposed LA-MSBD algorithm also offers the advantage of incremental
learning.

[4] “Twitter bot detection using supervised machine learning” [2021]


Authors: A. Ramalingaiah, S. Hussaini, S. Chaudhari

The paper introduces bots, which can be categorized as good or bad; bad bots are responsible for
malicious activities such as impersonating human behavior, attacking IoT devices, and degrading
performance. Detecting and preventing these bots is crucial, especially for social media users, who are
more vulnerable to data breaches and manipulation. Supervised machine learning techniques are used to
detect Twitter bots, comparing the accuracy of different algorithms against a classifier using a bag-of-
bots word model. The dataset used for identifying bots contains attributes such as URL, description,
friends, and followers. It is preprocessed by removing class imbalance and applying the Spearman
coefficient to test feature independence. Decision tree, logistic regression, K-nearest neighbors, and
Naive Bayes algorithms are implemented to identify bots, and the algorithm with the highest accuracy is
tested on real-time data. Among the train-test splits examined, the 90%-10% split yields the highest
accuracy and AUC scores, but the 80%-20% split is considered optimal because it achieves accuracy and
AUC close to the 90%-10% split with lower variance and better computational efficiency for training
moderately large datasets.
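The Spearman-based preprocessing step summarized above can be sketched in a few lines. This is an illustrative assumption of how such filtering might work, not code from [4]; the feature names, data values, and 0.9 threshold are invented for the example.

```python
# Hedged sketch: dropping redundant features whose Spearman rank
# correlation with an earlier feature exceeds a chosen threshold.
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Remove one feature from every pair with |Spearman rho| above threshold."""
    corr = df.corr(method="spearman").abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold and cols[j] not in to_drop:
                to_drop.add(cols[j])
    return df.drop(columns=sorted(to_drop))

# Toy profile features: "friends" rises monotonically with "followers",
# so Spearman's rho is 1 and one of the pair is removed.
profiles = pd.DataFrame({
    "followers": [10, 200, 3000, 45, 9],
    "friends":   [12, 210, 2900, 50, 11],
    "tweets":    [5, 40, 7, 900, 3],
})
reduced = drop_correlated_features(profiles)
print(list(reduced.columns))  # → ['followers', 'tweets']
```

Spearman correlation compares ranks rather than raw values, so it flags redundant features even when their relationship is nonlinear, which is why it suits heavy-tailed attributes like follower counts.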

[5] “Detection of Malicious Social Bots Using Learning Automata with URL Features
in Twitter Network” [2020]
Authors: Rashmi Ranjan Rout, Greeshma Lingam, and D. V. L. N. Somayajulu

Malicious social bots, which impersonate real users, engage in a range of harmful activities, including the
dissemination of spam content and phishing attacks through shortened URLs. To combat this threat, the paper
presents the Learning Automata-based Malicious Social Bot Detection (LA-MSBD) algorithm. This algorithm
integrates a trust computation model with URL-based features to identify trustworthy users on Twitter. The trust
model incorporates two key parameters: direct trust, calculated using Bayes' theorem, and indirect trust,
determined through the Dempster-Shafer theory (DST). The paper's experimental results, based on two Twitter
datasets, demonstrate the superior performance of the LA-MSBD algorithm compared to existing methods. This
approach not only enhances precision, recall, F-measure, and accuracy in malicious social bot detection but also
offers a promising solution to the challenge of identifying and mitigating these threats within the Twitter network.
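The direct-trust idea in [5], updating belief about an account with Bayes' theorem as evidence accumulates, can be illustrated numerically. The prior and likelihood values below are invented for the example and are not taken from the paper.

```python
# Hedged illustration of Bayesian direct-trust updating: each observed
# shortened malicious URL raises the posterior probability that the
# account is a bot.
def posterior_malicious(prior: float,
                        p_url_given_bot: float,
                        p_url_given_human: float) -> float:
    """P(bot | malicious URL observed), by Bayes' theorem."""
    evidence = p_url_given_bot * prior + p_url_given_human * (1 - prior)
    return p_url_given_bot * prior / evidence

p = 0.10                      # assumed prior belief the account is a bot
for _ in range(3):            # three suspicious URLs observed in a row
    p = posterior_malicious(p, p_url_given_bot=0.6, p_url_given_human=0.05)
    print(round(p, 3))        # → 0.571, then 0.941, then 0.995
```

A handful of repeated suspicious observations is enough to push the posterior close to 1, which matches the paper's intuition that URL behavior is a strong trust signal.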

3. OBJECTIVES

 Study and understand the existing systems for bot detection on Twitter.

 Identify and acquire structured datasets specifically tailored for training and evaluating machine
learning models to detect malicious bots on Twitter.

 Investigate the effectiveness of machine learning algorithms in identifying malicious bots on
Twitter.

 Explore feature engineering techniques to enhance model precision in bot classification.

 Contribute to a safer Twitter environment by developing robust Random Forest models for bot
detection.

4. SYSTEM REQUIREMENTS

4.1 SOFTWARE REQUIREMENTS:


Software required for development is listed below:
• Operating System: Windows XP or higher
• Libraries: pandas, seaborn, scikit-learn, NumPy, matplotlib, etc.
• Environment: Google Colab

4.2 HARDWARE REQUIREMENTS:


Hardware required to develop the software is listed below:

• Processor: Intel Core i5 or equivalent
• Processor Speed: 2 GHz
• RAM: 4 GB or more
• Hard disk space: 40 GB or more

5. PROBLEM STATEMENT

Malicious social bots generate fake tweets and automate their social relationships, either by posing as
followers or by creating multiple fake accounts used for malicious activities. Moreover, malicious social
bots post shortened malicious URLs in tweets in order to redirect online social networking participants
to malicious servers. Hence, distinguishing malicious social bots from legitimate users is one of the most
important tasks in the Twitter network.

6. EXISTING SYSTEM

Existing systems use a combination of text-based and metadata-based features, such as URL-based
attributes, user descriptions, friends, and followers, to distinguish between malicious bot accounts and
legitimate users. Supervised learning is the common approach, where models are trained on labeled
datasets to identify patterns and characteristics of bot behavior. Data preprocessing steps are employed
to address issues like data imbalance and feature independence. Evaluation metrics like accuracy,
precision, recall, F-measure, and AUC are used to assess the performance of these detection algorithms,
aiming for high accuracy and generalizability. Some models, like LA-MSBD, offer advantages in
incremental learning, potentially adapting to evolving bot behaviors. Real-time testing is also considered,
indicating practical applicability for ongoing bot detection on Twitter. The overarching goal is to enhance
social media security, particularly on Twitter, by effectively identifying and countering malicious bots
that engage in harmful activities, including impersonating human behavior and spreading false
information.
6.1 DISADVANTAGES OF EXISTING SYSTEM
 Existing systems often suffer from complexity and interpretability issues.
 Existing systems lack efficient methods for bot detection.

7. PROPOSED SYSTEM

We propose a Twitter bot detection system that leverages machine learning techniques and behavioral
analysis to effectively distinguish between legitimate and malicious accounts. Key steps include data
collection, preprocessing, feature extraction, and training machine learning models, including Random
Forest and Decision Tree. The system focuses on identifying bot-like behaviors such as spam retweets,
excessive link sharing, and rapid account following. Evaluation will be based on precision, recall, and
F1-score metrics. By integrating these components, our system aims to enhance Twitter's security by
identifying and mitigating malicious bot accounts, ultimately improving the user experience and
reducing the spread of false information.

7.1 ADVANTAGES OF PROPOSED SYSTEM


 High Accuracy: Random Forest is known for its high accuracy in classification tasks. It can
effectively distinguish between legitimate users and malicious bots.
 Robustness: Random Forest is less prone to overfitting compared to some other machine learning
algorithms. This makes it a robust choice for handling noisy and complex data.
 Feature Importance: Random Forest helps you understand which features contribute most to
the classification. This can be valuable for identifying the key characteristics of malicious bots.
 User Trust: Effective bot detection helps maintain user trust in the platform, as it reduces the
presence of spammy and harmful content.
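The "feature importance" advantage above can be demonstrated concretely: after fitting, a Random Forest exposes `feature_importances_`, which ranks how much each attribute drives the bot/legitimate decision. The synthetic profiles below are an assumption for illustration only, not our project dataset.

```python
# Hedged sketch: Random Forest feature importance on synthetic profiles
# where the bot label depends only on posting frequency and link ratio,
# so account age should receive the lowest importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400
posting_freq = rng.normal(5, 2, n)        # tweets per day
link_ratio = rng.uniform(0, 1, n)         # fraction of tweets with links
account_age = rng.uniform(0, 3000, n)     # days since creation (pure noise)
is_bot = ((posting_freq > 6) & (link_ratio > 0.5)).astype(int)

X = np.column_stack([posting_freq, link_ratio, account_age])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, is_bot)

for name, score in zip(["posting_freq", "link_ratio", "account_age"],
                       clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```

In a real deployment the same ranking would point the analyst at which profile attributes most distinguish bots, guiding further feature engineering.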

8. METHODOLOGY
Data Collection:
Gather a dataset of tweets and user profiles, including both legitimate and malicious accounts.
Data Preprocessing:
Clean and preprocess the collected data, including text and user profile information.
Feature Extraction:
Extract features that may be indicative of bot behavior, such as posting frequency, account creation date,
and content analysis.
Machine Learning Models:
Train machine learning models (e.g., Random Forest, Decision Tree, Naïve Bayes) using labeled data
to classify accounts as legitimate or malicious.
Behavior Analysis:
Analyze the behavioral patterns of accounts to identify bot-like actions, such as retweeting spam, posting
excessive links, and following a large number of accounts in a short time.
Evaluation:
Evaluate the performance of the bot detection system using metrics like precision, recall, and F1-score.
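The training and evaluation steps above can be sketched end to end with scikit-learn. This is a minimal sketch on synthetic data, assuming placeholder features and labels; the real project would substitute the extracted Twitter features.

```python
# Hedged sketch: train the three candidate models and report precision,
# recall, and F1 on a held-out split. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 4))                 # stand-ins for extracted features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic bot label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
results = {}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = (precision_score(y_te, y_pred),
                     recall_score(y_te, y_pred),
                     f1_score(y_te, y_pred))
    print(name, results[name])
```

Comparing the three scores side by side on the same split is exactly the experimental protocol the methodology describes.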

Fig 1: System Architecture of Detection of Malicious Bots in Twitter Network

9. EXPECTED OUTCOME

The expected outcome of the proposed project, focused on detecting and mitigating malicious social bots
in Twitter using the Random Forest algorithm, encompasses several key objectives. Firstly, it aims to
develop a robust and highly accurate bot detection system capable of distinguishing legitimate users from
malicious bots. Successful implementation should result in a marked reduction in malicious bot activities
on Twitter, including the dissemination of fake news and spam, ultimately fostering a safer and more
enjoyable user experience.
REFERENCES

[1] V N P Sai Siri Dantu, Jhansi Devi Telu, Padma Sree Kuncham, Gurudatta Pilla: “Detection of
Malicious Social Bots using ML technique in Twitter Network” [2022]
[2] James Schnebly and Shamik Sengupta: “Random Forest Twitter Bot Classifier” [2019]
[3] M. Divya, D. Abhishek, K. Naresh, P. Sparsha: “Detection of Malicious Social Bots using
Learning Automata with URL features in Twitter Network” [2022]
[4] A. Ramalingaiah, S. Hussaini, S. Chaudhari: “Twitter bot detection using supervised machine
learning” [2021]
[5] Rashmi Ranjan Rout, Greeshma Lingam, and D. V. L. N. Somayajulu: “Detection of Malicious
Social Bots Using Learning Automata with URL Features in Twitter Network” [2020]
[6] https://www.kaggle.com
[7] https://www.ieee.org/
[8] Sk-learn RandomForestClassifier https://scikit-learn.org/
[9] Decision Tree https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python
[10] Random Forest Twitter Bot Classifier https://ieeexplore.ieee.org/abstract/document/8666593
[11] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[12] Scikit-learn.org. (2018). 1.13. Feature selection — scikit-learn 0.20.2 documentation. [online]
Available at: http://scikit-learn.org/stable/modules/feature_selection.html [Accessed 20 Dec. 2018].
[13] The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race, S.
Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. WWW ’17 Proceedings of the 26th
International Conference on World Wide Web Companion, 963-972, 2017
[14] Techterms.com. (2018). Captcha Definition. [online] Available at: https://techterms.com/definition/
captcha [Accessed 20 Dec. 2018].

Signature of Guide                              Signature of Coordinator
Guide Name: Mr. Sandesh R                       Coordinator Name: Mr. Anil Kumar C. J.
Designation: Assistant Professor                Designation: Associate Professor
