SYNOPSIS OF PROJECT
“Detection of Malicious Bots in Twitter Network”
Submitted in partial fulfillment of the requirements for the
award of
Akash J 4AD20CS007
Ravi Kumar 4AD20CS071
Shishir Somapur 4AD20CS082
Sumit Rathod 4AD20CS090
Malicious social bots generate fake tweets and automate their social relationships, either by posing
as followers or by creating multiple fake accounts that carry out malicious activities. Moreover, malicious social
bots post shortened malicious URLs in tweets in order to redirect the requests of online social
networking participants to malicious servers. Hence, distinguishing malicious social bots from
legitimate users is one of the most important tasks in the Twitter network. In this project we detect
Twitter bots within the network using machine learning algorithms, namely Naive Bayes, Random
Forest, and Decision Tree. We collected and preprocessed a comprehensive dataset of bot and genuine
user profiles, extracting features such as posting frequency, content patterns, and follower relationships for
classification. Through extensive experimentation, we compared the accuracy, precision, recall, and F1-score
of these algorithms.
1. INTRODUCTION
Malicious social bots are computer programs that mimic real users on online social networks such as
Twitter. They engage in harmful activities such as sending large volumes of spam, creating fake identities,
manipulating ratings, and stealing personal information through phishing. Because tweets are limited to
140 characters, Twitter users commonly rely on URL-shortening services to share long web links. Malicious
bots exploit this by posting shortened links that redirect to harmful websites. They also generate fake tweets,
pose as followers, and create multiple fake accounts to carry out malicious activities. Distinguishing these
bots from real users is therefore crucial for Twitter's safety.
Identifying malicious social bots on Twitter is a significant challenge. These bots imitate regular users
while carrying out activities that undermine the safety of the platform, so detecting and stopping them
is essential to keeping Twitter safe for everyone.
The project extends beyond academic exploration, as it directly addresses real-world concerns. The
presence of malicious bots on Twitter not only undermines user trust but also has broader implications
for the spread of misinformation and the security of online communities. By developing a reliable bot
detection system, this research contributes to enhancing the overall cybersecurity of the Twitter network
and other online social platforms. Moreover, it aligns with the broader industry and societal need to
combat the proliferation of harmful automated accounts.
The project involves a rigorous process of data collection, preprocessing, feature extraction, and the
implementation of machine learning algorithms. Through systematic experimentation, we intend to
assess the effectiveness of Naive Bayes, Random Forest, and Decision Trees in distinguishing between
malicious social bots and authentic Twitter users. Performance metrics such as accuracy, precision,
recall, and F1-score will serve as critical benchmarks to evaluate the algorithms' capabilities.
In conclusion, this project's scope encompasses the development of a bot detection system tailored to the
unique challenges posed by Twitter's network dynamics and user behaviors. The insights gained from
this research have far-reaching implications for online security and trustworthiness, offering practical
applications for social media platforms and cybersecurity agencies. As we delve deeper into the project,
we aim to contribute valuable solutions to the ongoing battle against malicious social bots, ultimately
fostering a safer and more reliable online environment for all users.
2. LITERATURE SURVEY
[1] “Detection of Malicious Social Bots using ML technique in Twitter Network” [2022]
Authors: V N P Sai Siri Dantu, Jhansi Devi Telu, Padma Sree Kuncham, Gurudatta Pilla
This paper aims to design a framework that evaluates a set of features to compute the trust value of
each online social network account and to detect malicious social bots in the Twitter network
effectively and efficiently. The authors discuss the use of deep-learning algorithms to extract hidden
patterns in text for spam detection on Twitter. The paper focuses on detecting malicious social bots based
on the Learning Automata (LA) model combined with the Naive Bayes algorithm and URL-based features. It
presents a framework that combines text-based and metadata-based features to detect malicious accounts
on Twitter. The proposed system uses the LA-MSBD (Learning Automata-based Malicious Social Bot
Detection) algorithm, integrating a Naive Bayes model with URL-based feature extraction to distinguish
between malicious bots and legitimate users. This research work extends an existing system based on a
trust computation model of direct and indirect trust. The paper also notes the prevalence of automated bot
accounts on Twitter and the potential benefits of using Twitter bots for customer support.
[2] “Random Forest Twitter Bot Classifier” [2019]
Authors: James Schnebly, Shamik Sengupta
The paper addresses the issue of detecting Twitter bot accounts, which are automated accounts that
spread malicious content or generate fraudulent popularity for political and social figures. The authors
propose a set of attributes for a Random Forest classifier that achieves high accuracy (90.25%) and
generalizability in detecting bot accounts. The paper provides a practical solution for detecting Twitter
bot accounts, which can help in combating the spread of malicious content and fraudulent popularity on
social media platforms. The proposed Random Forest classifier, with its high accuracy and
generalizability, can be deployed directly after training to accurately identify and label bot accounts. By
using a combination of accessible features from the Twitter API and derivative ratio features, the
classifier offers valuable insights into the behavior and characteristics of bot accounts. The use of the
Random Forest algorithm helps prevent overfitting and creates a model that can be effectively used for
detecting bot accounts on Twitter. The findings of this paper can be applied by Twitter and other social
media platforms to develop more effective strategies for identifying and removing bot accounts, thereby
improving the overall user experience and reducing the spread of false information.
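The "derivative ratio features" this classifier relies on are not enumerated in the survey above; the following is a minimal sketch, assuming typical ratios derived from Twitter API profile counts, not the paper's actual feature set:

```python
def derive_ratio_features(profile):
    """Compute illustrative ratio features from raw Twitter API profile counts.

    The exact features used by the cited classifier are not given in the
    survey; these are common examples of the ratio style it describes.
    """
    followers = profile.get("followers_count", 0)
    friends = profile.get("friends_count", 0)
    tweets = profile.get("statuses_count", 0)
    listed = profile.get("listed_count", 0)
    return {
        # Accounts that follow many users but attract few followers are
        # a classic bot signal; +1 avoids division by zero.
        "followers_friends_ratio": followers / (friends + 1),
        # Heavy posting relative to audience size.
        "tweets_per_follower": tweets / (followers + 1),
        # Genuine accounts tend to appear on more public lists.
        "listed_per_follower": listed / (followers + 1),
    }

example = {"followers_count": 10, "friends_count": 990,
           "statuses_count": 5000, "listed_count": 0}
features = derive_ratio_features(example)
```

Ratios such as these are robust to raw-count scale, which is one reason ratio features generalize better than absolute counts across accounts of different sizes.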
[3] “Detection of Malicious Social Bots using Learning Automata with URL features in Twitter
Network” [2022]
Authors: M. Divya, D. Abhishek, K. Naresh, P. Sparsha
The paper introduces bots, which can be categorized as good or bad; bad bots are responsible for
malicious activities such as impersonating human behavior, attacking IoT devices, and degrading
performance. Detecting and preventing the activities of these bots is crucial, especially for social media
users, who are vulnerable to data breaches and manipulation. Supervised machine learning techniques
are used to detect Twitter bots, comparing the accuracy of different algorithms against a classifier that
uses a Bag-of-Bots word model. The dataset used for identifying bots contains attributes such as URL,
description, friends, and followers. It is preprocessed by removing class imbalance and applying the
Spearman coefficient to check feature independence. Decision tree, logistic regression, K-nearest
neighbors, and Naive Bayes algorithms are implemented to identify bots, and the algorithm with the
highest accuracy is tested on real-time data. Among the train-test splits considered, the 90%-10% split
yields the highest accuracy and AUC scores, while the 80%-20% split is considered optimal because its
accuracy and AUC are close to those of the 90%-10% split, with lower variance and better computational
efficiency for training moderately large datasets.
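The Spearman-based feature-independence step described above can be sketched as follows. This is an illustrative implementation assuming a 0.9 correlation cutoff (the surveyed paper does not state the cutoff it used) and no tied values:

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank-correlation matrix, assuming no tied values:
    rank each column, then take the Pearson correlation of the ranks."""
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def drop_correlated_features(X, names, threshold=0.9):
    """Keep a feature only if its absolute Spearman correlation with every
    already-kept feature is at most `threshold` (0.9 is an assumed cutoff)."""
    corr = spearman_matrix(X)
    keep = []
    for i in range(X.shape[1]):
        if all(abs(corr[i, j]) <= threshold for j in keep):
            keep.append(i)
    return X[:, keep], [names[i] for i in keep]

# Toy profile matrix: the second column is a monotonic copy of the first
# (perfect rank correlation), so it is dropped.
X = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, 3.0],
              [3.0, 6.0, 8.0],
              [4.0, 8.0, 1.0]])
X_kept, kept_names = drop_correlated_features(
    X, ["followers", "friends", "tweets"])
```

Spearman correlation is used here rather than Pearson because it captures any monotonic dependence between features, not only linear dependence.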
[5] “Detection of Malicious Social Bots Using Learning Automata with URL Features
in Twitter Network” [2020]
Authors: Rashmi Ranjan Rout, Greeshma Lingam, and D. V. L. N. Somayajulu
Malicious social bots, which impersonate real users, engage in a range of harmful activities, including the
dissemination of spam content and phishing attacks through shortened URLs. To combat this threat, the paper
presents the Learning Automata-based Malicious Social Bot Detection (LA-MSBD) algorithm. This algorithm
integrates a trust computation model with URL-based features to identify trustworthy users on Twitter. The trust
model incorporates two key parameters: direct trust, calculated using Bayes' theorem, and indirect trust,
determined through the Dempster-Shafer theory (DST). The paper's experimental results, based on two Twitter
datasets, demonstrate the superior performance of the LA-MSBD algorithm compared to existing methods. This
approach not only enhances precision, recall, F-measure, and accuracy in malicious social bot detection but also
offers a promising solution to the challenge of identifying and mitigating these threats within the Twitter network.
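The paper computes direct trust with Bayes' theorem; its exact formulation is not reproduced in this survey, but the idea can be illustrated with a hypothetical sequential Bayesian update over a user's posted URLs. The likelihood values below are assumptions for illustration, not the published model:

```python
def direct_trust(benign_urls, malicious_urls, prior_trust=0.5,
                 p_benign_given_trust=0.9, p_benign_given_bot=0.3):
    """Illustrative direct-trust update in the spirit of LA-MSBD: treat each
    posted URL as evidence and update the prior with Bayes' rule.
    All likelihood values here are assumed, not taken from the paper."""
    trust, bot = prior_trust, 1.0 - prior_trust
    for _ in range(benign_urls):
        # P(trust | benign URL) proportional to P(benign URL | trust) * P(trust)
        trust, bot = trust * p_benign_given_trust, bot * p_benign_given_bot
        total = trust + bot
        trust, bot = trust / total, bot / total
    for _ in range(malicious_urls):
        trust, bot = trust * (1 - p_benign_given_trust), bot * (1 - p_benign_given_bot)
        total = trust + bot
        trust, bot = trust / total, bot / total
    return trust
```

Under these assumed likelihoods, a user posting only benign URLs converges toward a trust value of 1, while a user posting malicious URLs converges toward 0; the paper then combines such direct trust with DST-based indirect trust.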
3. OBJECTIVES
• Study and understand the existing systems for bot detection on Twitter.
• Identify and acquire structured datasets specifically tailored for training and evaluating machine
learning models to detect malicious bots on Twitter.
• Contribute to a safer Twitter environment by developing robust Random Forest models for bot
detection.
4. SYSTEM REQUIREMENTS
• Processor: i5 or equivalent
• Processor Speed: 2GHz
• RAM: 4GB +
• Hard disk space: 40GB +
5. PROBLEM STATEMENT
Malicious social bots generate fake tweets and automate their social relationships, either by posing
as followers or by creating multiple fake accounts that carry out malicious activities. Moreover, malicious social
bots post shortened malicious URLs in tweets in order to redirect the requests of online social
networking participants to malicious servers. Hence, distinguishing malicious social bots from
legitimate users is one of the most important tasks in the Twitter network.
6. EXISTING SYSTEM
Existing systems use a combination of text-based and metadata-based features, such as URL-based
attributes, user descriptions, friends, and followers, to distinguish between malicious bot accounts and
legitimate users. Supervised learning is the common approach, where models are trained on labeled
datasets to identify patterns and characteristics of bot behavior. Data preprocessing steps are employed
to address issues like data imbalance and feature independence. Evaluation metrics like accuracy,
precision, recall, F-measure, and AUC are used to assess the performance of these detection algorithms,
aiming for high accuracy and generalizability. Some models, like LA-MSBD, offer advantages in
incremental learning, potentially adapting to evolving bot behaviors. Real-time testing is also considered,
indicating practical applicability for ongoing bot detection on Twitter. The overarching goal is to enhance
social media security, particularly on Twitter, by effectively identifying and countering malicious bots
that engage in harmful activities, including impersonating human behavior and spreading false
information.
6.1 DISADVANTAGES OF EXISTING SYSTEM
Existing systems often suffer from model complexity and poor interpretability.
Existing systems lack efficient methods for bot detection.
7. PROPOSED SYSTEM
We propose a Twitter bot detection system that leverages machine learning techniques and behavioral
analysis to effectively distinguish between legitimate and malicious accounts. Key steps include data
collection, preprocessing, feature extraction, and training machine learning models, including Random
Forest and Decision Tree. The system will focus on identifying bot-like behaviors, such as spam retweets,
excessive link sharing, and rapid account following. Evaluation will be based on precision, recall, and
F1-score metrics. By integrating these components, our system aims to enhance Twitter's security by
effectively identifying and mitigating malicious bot accounts, ultimately improving the user experience
and reducing the spread of false information.
8. METHODOLOGY
Data Collection:
Gather a dataset of tweets and user profiles, including both legitimate and malicious accounts.
Data Preprocessing:
Clean and preprocess the collected data, including text and user profile information.
Feature Extraction:
Extract features that may be indicative of bot behavior, such as posting frequency, account creation date,
and content analysis.
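The feature-extraction step can be sketched as follows; the two features shown and the parameter names are illustrative assumptions, not the project's final feature set:

```python
from datetime import datetime, timezone

def extract_profile_features(created_at, tweet_times, now=None):
    """Illustrative feature extraction from a user profile:
    account age in days and average posts per day.
    `now` can be fixed for reproducible results."""
    now = now or datetime.now(timezone.utc)
    # Guard against zero-day-old accounts when dividing.
    age_days = max((now - created_at).days, 1)
    return {
        "account_age_days": age_days,
        "posts_per_day": len(tweet_times) / age_days,
    }
```

Very young accounts with high posting rates score high on `posts_per_day`, which is one of the bot-like behaviors the later analysis step looks for.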
Machine Learning Models:
Train machine learning models (e.g., Random Forest, Decision Tree, Naïve Bayes) using labeled data
to classify accounts as legitimate or malicious.
Behavior Analysis:
Analyze the behavioral patterns of accounts to identify bot-like actions, such as retweeting spam, posting
excessive links, and following a large number of accounts in a short time.
Evaluation:
Evaluate the performance of the bot detection system using metrics like precision, recall, and F1-score.
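The steps above can be sketched end-to-end with scikit-learn (already cited in the references). The synthetic dataset and the three behavioral features stand in for the project's real Twitter data, so the resulting scores are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Synthetic stand-in for the labeled dataset: each row holds illustrative
# behavioral features (posting frequency, link ratio, follow rate);
# label 1 = bot, 0 = legitimate user.
rng = np.random.default_rng(42)
n = 400
bots = rng.normal([50, 0.8, 30], [10, 0.1, 8], size=(n // 2, 3))
humans = rng.normal([5, 0.2, 2], [3, 0.1, 2], size=(n // 2, 3))
X = np.vstack([bots, humans])
y = np.array([1] * (n // 2) + [0] * (n // 2))

# Hold out 20% of accounts for evaluation, preserving class balance.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
        "f1": f1_score(y_te, pred),
    }
```

On real data the three classifiers would be compared on these per-model precision, recall, and F1 scores, alongside overall accuracy.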
9. EXPECTED OUTCOME
The expected outcome of the proposed project, focused on detecting and mitigating malicious social bots
in Twitter using the Random Forest algorithm, encompasses several key objectives. Firstly, it aims to
develop a robust and highly accurate bot detection system capable of distinguishing legitimate users from
malicious bots. Successful implementation should result in a marked reduction in malicious bot activities
on Twitter, including the dissemination of fake news and spam, ultimately fostering a safer and more
enjoyable user experience.
REFERENCES
[1] V N P Sai Siri Dantu, Jhansi Devi Telu, Padma Sree Kuncham, Gurudatta Pilla: “Detection of
Malicious Social Bots using ML technique in Twitter Network” [2022]
[2] James Schnebly and Shamik Sengupta: “Random Forest Twitter Bot Classifier” [2019]
[3] M. Divya, D. Abhishek, K. Naresh, P. Sparsha: “Detection of Malicious Social Bots using
Learning Automata with URL features in Twitter Network” [2022]
[4] A. Ramalingaiah, S. Hussaini, S. Chaudhari: “Twitter bot detection using supervised machine
learning” [2021]
[5] Rashmi Ranjan Rout, Greeshma Lingam, and D. V. L. N. Somayajulu: “Detection of Malicious
Social Bots Using Learning Automata with URL Features in Twitter Network” [2020]
[6] https://www.kaggle.com
[7] https://www.ieee.org/
[8] Sk-learn RandomForestClassifier https://scikit-learn.org/
[9] Decision Tree: https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python
[10] Random Forest Twitter Bot Classifier: https://ieeexplore.ieee.org/abstract/document/8666593
[11] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[12] Scikit-learn.org (2018). 1.13. Feature selection, scikit-learn 0.20.2 documentation. [Online].
Available at: http://scikit-learn.org/stable/modules/feature_selection.html [Accessed 20 Dec.
2018].
[13] The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race, S.
Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. WWW ’17 Proceedings of the 26th
International Conference on World Wide Web Companion, 963-972, 2017
[14] Techterms.com (2018). Captcha Definition. [Online]. Available at:
https://techterms.com/definition/captcha [Accessed 20 Dec. 2018].
Signature of Guide                               Signature of Coordinator
Guide Name: Mr. Sandesh R                        Name of Coordinator: Mr. Anil Kumar C. J.
Designation: Assistant Professor                 Designation: Associate Professor