BELAGAVI- 590018
A Project Phase 1
Report on
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
Under the Guidance of
Mrs. SWETHA B R
Assistant Professor,
Department of Computer Science & Engineering
CERTIFICATE
Certified that the PROJECT PHASE I entitled “FAKE ACCOUNT DETECTION ON SOCIAL
MEDIA” has been carried out by Ms. SHRAVANI P H [4RA20CS090], Ms. SHRAVYA M KARLE
[4RA20CS091], Ms. SPOORTHI C D [4RA20CS097] and Ms. SPOORTHI S [4RA20CS098],
bonafide students of RAJEEV INSTITUTE OF TECHNOLOGY, Hassan, in partial fulfilment for the
award of BACHELOR OF ENGINEERING in COMPUTER SCIENCE AND ENGINEERING of
the Visvesvaraya Technological University, Belagavi during the year 2023-2024. The Project Phase 1 report
has been approved as it satisfies the academic requirements in respect of Project Phase 1 work prescribed for the
said Degree.
DECLARATION
SPOORTHI C D SPOORTHI S
4RA20CS097 4RA20CS098
Place:- Hassan
Date:- ……………
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without mentioning the people who made it possible, whose constant
guidance and encouragement crowned our efforts with success.
We would like to express our sincere thanks to our Principal, Dr. Mahesh P K, Rajeev
Institute of Technology, for his encouragement, which motivated us towards the successful
completion of Project Phase I.
We wish to express our gratitude to Dr. Shreeshail Matt, Head of the Department of
Information Science & Engineering, for providing a good working environment and for
his constant support and encouragement.
We would also like to thank all the staff of the Department of Information Science and
Engineering who have directly or indirectly helped us in the successful completion of
Project Phase I, and we also thank our parents.
SPOORTHI C D SPOORTHI S
4RA20CS097 4RA20CS098
ABSTRACT
In the current generation, online social networks (OSNs) have become increasingly popular,
and social media is increasingly associated with these sites. People use OSNs to
communicate with others, share news, organize events, and run their own e-businesses. The
rapid growth of OSNs and the large amount of personal information they hold about their
subscribers have led attackers and impostors to steal that information, share false news, and
carry out malicious activities. Fake profiles are created to spread rumors, commit identity
theft, and so on. In this project, we therefore propose a detection model that distinguishes
between fake and real profiles on Twitter based on profile features such as follower count,
friend count, status count and more, using various machine learning methods.
Key characteristics are chosen to assess a social media profile’s authenticity.
The hyperparameters and the architecture are also covered. Finally, results are produced after
training the models: the output is 0 for genuine profiles and 1 for fake profiles.
When a fake profile is discovered, it can be disabled or deleted so that cyber security
problems can be prevented. Python and the necessary libraries, such as scikit-learn (sklearn),
NumPy and Pandas, are used for the implementation. At the end of this study, we will conclude
that XGBoost is the best of the evaluated machine learning techniques for finding fake
profiles.
CONTENTS
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF FIGURES
1 INTRODUCTION
1.1 Motivation
1.2 Problem Statement
1.3 Objective of the project
1.4 Scope
1.5 Project Introduction
3.5 Methodology
LIST OF FIGURES
Fake Account Detection on Social Media 2023-2024
CHAPTER -01
INTRODUCTION
1.1 Motivation:
Detecting and preventing fake accounts on social media using machine learning is
motivated by several important factors that aim to enhance the overall user experience,
maintain the integrity of online platforms, and ensure the safety of users. Fake accounts
can distort user engagement metrics, artificially inflate follower counts, and manipulate
trending topics. Machine learning helps in identifying and mitigating such manipulation,
maintaining the integrity of the platform and ensuring fair and accurate representation of
user interactions. Leveraging machine learning for fake account detection aligns with the
broader goals of social media platforms to create a secure, trustworthy, and positive online
environment for users while adhering to regulatory requirements and protecting brand
reputation.
1.4 Scope:
The scope of this project involves creating and implementing a robust machine learning-
based system for the detection of fake accounts on social media platforms. The primary
focus is on developing accurate algorithms capable of analyzing user behavior,
engagement, and content patterns to identify and mitigate the presence of fraudulent
accounts. The integration of this system into existing social media platforms is essential,
ensuring a seamless and user-friendly experience. The project also addresses scalability,
aiming to handle a large volume of accounts while adapting to the evolving tactics used
by malicious actors. Real-time monitoring features will be implemented to promptly
detect and respond to potential threats, while optimization efforts will minimize false
positives and reduce the workload of human moderators. Compliance with relevant
regulations and the incorporation of educational measures for users further define the
scope, with the ultimate goal of fostering a secure, trustworthy, and positive online
environment.
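The scope pairs two goals that pull against each other: catching fake accounts and keeping false positives, and therefore moderator workload, low. A common way to balance them, assumed here rather than taken from this project, is to route each account by the model's predicted probability of being fake. The sketch below uses two hypothetical thresholds, 0.9 for automatic action and 0.6 for human review; `triage` and both values are illustrative only.

```python
from typing import Dict, List

# Hypothetical thresholds; in practice they would be tuned on validation data
AUTO_ACTION_THRESHOLD = 0.9  # very confident the account is fake
REVIEW_THRESHOLD = 0.6       # uncertain: worth a human moderator's time


def triage(scores: Dict[str, float],
           auto_threshold: float = AUTO_ACTION_THRESHOLD,
           review_threshold: float = REVIEW_THRESHOLD) -> Dict[str, List[str]]:
    """Route each account by its predicted probability of being fake (1 = fake)."""
    if not 0.0 <= review_threshold <= auto_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= review <= auto <= 1")
    routes: Dict[str, List[str]] = {"auto_action": [], "human_review": [], "no_action": []}
    for account_id, score in scores.items():
        if score >= auto_threshold:
            routes["auto_action"].append(account_id)   # e.g. restrict pending appeal
        elif score >= review_threshold:
            routes["human_review"].append(account_id)  # queue for a moderator
        else:
            routes["no_action"].append(account_id)     # treated as genuine
    return routes


if __name__ == "__main__":
    print(triage({"acct_a": 0.97, "acct_b": 0.72, "acct_c": 0.10}))
    # {'auto_action': ['acct_a'], 'human_review': ['acct_b'], 'no_action': ['acct_c']}
```

Raising `auto_threshold` means fewer genuine users are actioned automatically, at the cost of a longer review queue.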
There are 229 million daily active members of Twitter and 465.1 million monthly active users.
Furthermore, Facebook gains about six new users per second, a daily average of about
500,000 new users. Every day, a huge amount of information is posted on Twitter.
False profiles are frequently made under fictitious identities, and they spread defamatory
and abusive posts and images to influence society or advance anti-vaccine conspiracy
theories, among other things. Phony personas are an issue on all social media platforms
nowadays.
Most false profiles are made with spamming, phishing, and gaining more followers in
mind. Fraudulent accounts can be used to commit a range of online crimes.
Fake accounts pose serious risks, including identity theft and data breaches. When
users open URLs sent by these fake accounts, their personal information may be sent to
remote servers where it can be used against them. Furthermore, fake profiles
purportedly created on behalf of businesses or individuals can damage their reputation
and reduce the number of followers and likes they receive.
CHAPTER -02
LITERATURE SURVEY
Paper 1: Fraud Detection on Social Media using Data Analytics. Author: Ms. Archna
Goyal
This paper proposes an approach to the fake news problem that assesses the credibility of
news in two stages: detecting fake accounts and detecting fake news content. The first
stage identifies fake users and ignores the news they post. If the user is not fake, the
second stage assesses the credibility of the news content using similarity measures and
machine learning algorithms, which the author reports improve credibility assessment
compared with other algorithms.
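The two-stage control flow described for Paper 1 can be illustrated with a toy sketch. The paper's actual similarity measures and machine learning models are not given here, so this assumes simple stand-ins: stage 1 is any fake-account classifier passed in as a function, and stage 2 compares a post with trusted reports using cosine similarity over word counts, with a hypothetical threshold of 0.5. All function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def assess_post(author_id: str, text: str,
                is_fake_user: Callable[[str], bool],
                trusted_reports: Iterable[str],
                threshold: float = 0.5) -> str:
    # Stage 1: news from accounts classified as fake is ignored outright
    if is_fake_user(author_id):
        return "ignored"
    # Stage 2: content credibility = best similarity to any trusted report
    best = max((cosine_similarity(text, ref) for ref in trusted_reports), default=0.0)
    return "credible" if best >= threshold else "not credible"


if __name__ == "__main__":
    flagged = {"bot_01"}  # hypothetical output of a fake-account classifier
    reports = ["Heavy rain expected in Hassan on Monday"]
    for author, post in [("bot_01", "Heavy rain expected in Hassan on Monday"),
                         ("user_42", "heavy rain expected in Hassan on monday"),
                         ("user_42", "Celebrity spotted at the mall")]:
        print(author, assess_post(author, post, flagged.__contains__, reports))
```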
Paper 2: Machine Learning Framework for Detecting Spammer and Fake Users on
Twitter. Author: Akshatha T M
Although successful strategies have been developed for spam detection and fake user
recognition on Twitter, many problems remain open for further research. The
issues are highlighted as follows: False news identification on social media is a problem
that needs to be explored because of the serious repercussions of such news at the individual
as well as the collective level. Another related subject that is worth exploring is the discovery
of rumor sources on social media.
The model presented in this paper demonstrates that the decision tree is a
good and powerful method for binary classification on a large dataset. Despite the irregular
shape of the decision boundary, the decision tree classifier is able to distinguish between
fake and real profiles with high accuracy (> 97%). This method can be extended to any …
The proposed hybrid technique combines the two most successful classifiers, neural networks
and SVM. K-Medoids clustering is also used to improve accuracy and reduce the time
complexity of the algorithm. The proposed work uses a real-time dataset collected from
Facebook or Twitter users.
CHAPTER -03
SYSTEM ANALYSIS
3.1 Existing System:
Social media profiles and bots have been around since the advent of social media. Many of
them have a negative impact, as they are designed to jeopardize democracy, spread panic,
disclose confidential information, affect the stock market, and wreak
havoc on the world. However, bots can also serve useful purposes, such as
encouraging users to get a flu shot, giving earthquake warnings, sharing health tips, and
sharing automatically generated drawings. Identifying bad bots can help us understand their
behavior and determine which emotional traits distinguish them as bots. In addition, by
identifying Twitter accounts as bots, the public can be taught not to fall victim to bots or
malicious messages on Twitter. Moreover, when bots are detected early, their tweets
can be quickly prevented from spreading on the platform.
3.2 Disadvantages:
While fraud detection on social media using data analytics has its advantages, there are
also several disadvantages associated with existing systems. Here are some common
drawbacks:
3.3 Proposed System:
The proposed system aims to address the challenges associated with fake account
detection on social media platforms using machine learning. The key components and
features of the proposed system are outlined below:
3.4 Advantages
1. Enhanced Security
2. User Trust and Safety
3. Integrity of User Metrics
4. Mitigation of Misinformation
5. Fair Advertising Ecosystem
6. Resource Optimization
7. Improved User Experience
8. Compliance with Regulations
3.5 Methodology
The methodology for implementing a machine learning-based fake account detection
system on social media involves several key steps. Below is a generalized methodology
that can guide the development and deployment of such a system:
Problem Definition: Clearly define the problem by specifying the objectives, scope, and
desired outcomes of the fake account detection system. Understand the potential impact
on user safety, platform integrity, and overall user experience.
Data Preprocessing: Clean and preprocess the data to handle missing values, outliers,
and inconsistencies. Normalize numerical features, encode categorical variables, and
perform any necessary transformations to make the data suitable for machine learning
models.
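The preprocessing step can be sketched for Twitter-style profile data. The column names (`statuses_count`, `followers_count`, `friends_count`, `lang`) are hypothetical, and the choices (median imputation, min-max scaling, one-hot encoding) are common defaults rather than the project's confirmed pipeline. It is written in plain Python so the logic is visible; with the libraries the project uses, scikit-learn's `SimpleImputer(strategy="median")`, `MinMaxScaler` and `OneHotEncoder(handle_unknown="ignore")` provide the same operations.

```python
from statistics import median
from typing import Dict, List, Sequence

# Hypothetical column names for Twitter-style profile data
NUMERIC = ["statuses_count", "followers_count", "friends_count"]
CATEGORICAL = "lang"


def fit_preprocessor(rows: Sequence[Dict]) -> Dict:
    """Learn imputation medians, min/max ranges and categories from (non-empty) training rows."""
    params: Dict = {"median": {}, "min": {}, "max": {}}
    for col in NUMERIC:
        present = [r[col] for r in rows if r.get(col) is not None]
        fill = median(present) if present else 0.0
        filled = [r[col] if r.get(col) is not None else fill for r in rows]
        params["median"][col] = fill
        params["min"][col], params["max"][col] = min(filled), max(filled)
    params["categories"] = sorted({r.get(CATEGORICAL) or "unknown" for r in rows})
    return params


def transform(rows: Sequence[Dict], params: Dict) -> List[List[float]]:
    """Impute, min-max scale to [0, 1] and one-hot encode each row."""
    matrix = []
    for r in rows:
        vector = []
        for col in NUMERIC:
            value = r.get(col)
            if value is None:
                value = params["median"][col]  # missing value -> training median
            lo, hi = params["min"][col], params["max"][col]
            scaled = (value - lo) / (hi - lo) if hi > lo else 0.0
            vector.append(min(max(scaled, 0.0), 1.0))  # clip values outside the training range
        category = r.get(CATEGORICAL) or "unknown"
        # One-hot encoding; categories unseen during fitting map to all zeros
        vector.extend(1.0 if category == c else 0.0 for c in params["categories"])
        matrix.append(vector)
    return matrix


if __name__ == "__main__":
    train_rows = [
        {"statuses_count": 10, "followers_count": 5, "friends_count": 300, "lang": "en"},
        {"statuses_count": 2000, "followers_count": 800, "friends_count": 150, "lang": "hi"},
        {"statuses_count": None, "followers_count": 40, "friends_count": 90, "lang": "en"},
    ]
    params = fit_preprocessor(train_rows)
    for vector in transform(train_rows, params):
        print(vector)
```

All statistics are learned from the training rows only and then reused on validation and test rows, so no information from held-out data leaks into training. The median is used for imputation because count features such as followers tend to be heavily skewed.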
Model Selection: Choose appropriate machine learning algorithms based on the nature
of the problem. Common models for fake account detection include decision trees,
random forests, support vector machines, and deep learning models such as neural
networks.
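Decision trees, the first model family listed, grow by repeatedly choosing the split that leaves each side as pure as possible. The sketch below shows a single such split (a depth-one tree, or "decision stump") on one numeric feature, using Gini impurity as the criterion, which is also the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The feature (a followers-to-friends ratio) and its values are hypothetical toy data.

```python
from typing import Sequence, Tuple

Stump = Tuple[float, int, int]  # (threshold, label if value <= threshold, label otherwise)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels (0 = genuine, 1 = fake); 0.0 means perfectly pure."""
    if not labels:
        return 0.0
    p_fake = sum(labels) / len(labels)
    return 1.0 - p_fake ** 2 - (1.0 - p_fake) ** 2


def majority(labels: Sequence[int]) -> int:
    """Most common label; ties go to 0 (genuine)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(values: Sequence[float], labels: Sequence[int]) -> Stump:
    """Depth-1 decision tree: pick the threshold with the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    ordered = [y for _, y in pairs]
    n = len(pairs)
    best_impurity, best_i = gini(ordered), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        left, right = ordered[:i], ordered[i:]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity, best_i = impurity, i
    if best_i is None:  # no split reduces impurity: predict the majority everywhere
        label = majority(ordered)
        return float("inf"), label, label
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2  # midpoint between neighbours
    return threshold, majority(ordered[:best_i]), majority(ordered[best_i:])


def predict_stump(stump: Stump, value: float) -> int:
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


if __name__ == "__main__":
    # Hypothetical toy feature: followers-to-friends ratio (1 = fake, 0 = genuine)
    ratios = [0.01, 0.02, 0.05, 0.8, 1.5, 3.0]
    labels = [1, 1, 1, 0, 0, 0]
    stump = fit_stump(ratios, labels)
    print(stump, predict_stump(stump, 0.03), predict_stump(stump, 2.0))
```

A full decision tree applies the same search over every feature and recurses into each side; a random forest averages many such trees trained on resampled data and random feature subsets.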
Model Training: Train the selected model using the labeled dataset. Split the dataset into
training and validation sets to assess the model's performance during training. Adjust
hyperparameters and iterate on the model to improve accuracy.
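The split-then-train loop can be sketched as follows. Fake accounts are usually a minority class, so the split here is stratified to keep the fake/genuine ratio similar in both parts (scikit-learn's `train_test_split` offers this through its `stratify` argument). To keep the example short and dependency-free, a nearest-centroid classifier stands in for the models named above; it is chosen for brevity, not because the project uses it. The feature vectors are hypothetical and assumed to be already scaled.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def stratified_split(X: Sequence[Vector], y: Sequence[int], val_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[Vector], List[int], List[Vector], List[int]]:
    """Hold out about val_fraction of each class, so both parts keep the class balance."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx: List[int] = []
    val_idx: List[int] = []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        # Keep at least one example of every class in training; hold out one if possible
        n_val = min(len(idx) - 1, max(1, round(len(idx) * val_fraction)))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """'Training' a nearest-centroid classifier: average the vectors of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for xi, yi in zip(X, y):
        total = sums.setdefault(yi, [0.0] * len(xi))
        sums[yi] = [s + v for s, v in zip(total, xi)]
        counts[yi] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def accuracy(centroids: Dict[int, Vector], X: Sequence[Vector], y: Sequence[int]) -> float:
    return sum(predict(centroids, xi) == yi for xi, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical scaled features: fake profiles (1) cluster low, genuine ones (0) high
    X = [[0.05, 0.10], [0.10, 0.20], [0.15, 0.05], [0.20, 0.15],
         [0.80, 0.90], [0.85, 0.75], [0.90, 0.95], [0.95, 0.85]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    X_train, y_train, X_val, y_val = stratified_split(X, y)
    model = fit_centroids(X_train, y_train)
    print("validation accuracy:", accuracy(model, X_val, y_val))
```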
Evaluation Metrics: Define evaluation metrics based on the goals of the fake account
detection system. Common metrics include accuracy, precision, recall, F1 score, and area
under the receiver operating characteristic curve (AUC-ROC).
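The listed metrics can be computed directly from predictions, with 1 (fake) as the positive class, matching the project's 0/1 labelling. This plain-Python sketch mirrors what `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` in `sklearn.metrics` compute; AUC-ROC is taken from its pairwise definition, the chance that a randomly chosen fake account receives a higher score than a randomly chosen genuine one. The scores in the example are made up.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating 1 (fake) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine, left alone
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random fake account outscores a random genuine one (ties count half)."""
    fake = [s for t, s in zip(y_true, scores) if t == 1]
    genuine = [s for t, s in zip(y_true, scores) if t == 0]
    if not fake or not genuine:
        raise ValueError("AUC needs at least one fake and one genuine example")
    wins = sum(1.0 if f > g else 0.5 if f == g else 0.0 for f in fake for g in genuine)
    return wins / (len(fake) * len(genuine))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]  # hypothetical model outputs
    y_pred = [1 if s >= 0.5 else 0 for s in scores]     # 0.5 decision threshold
    print(classification_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

Accuracy alone can mislead on imbalanced data: with 19 genuine accounts and 1 fake, a model that predicts "genuine" for everyone gets `classification_metrics([0] * 19 + [1], [0] * 20)` an accuracy of 0.95 but a recall of 0.0.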
Validation and Hyperparameter Tuning: Fine-tune hyperparameters using the validation set
(or cross-validation on the training data), then evaluate the final model once on a separate
test dataset to check that it generalizes to unseen data.
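One way to tune a hyperparameter without touching the test data is k-fold cross-validation on the training data, followed by a single evaluation on the held-out test set. The sketch below tunes the neighbour count `k` of a k-nearest-neighbours classifier, used here only as an easy-to-read stand-in for the project's models (scikit-learn's `GridSearchCV` automates the same loop). The fold count, candidate values and data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

Vector = List[float]


def knn_predict(train_X: Sequence[Vector], train_y: Sequence[int], x: Vector, k: int) -> int:
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # With 0/1 labels, most_common keeps first-seen order, so a tie goes to the closest neighbour's label
    return votes.most_common(1)[0][0]


def k_fold_indices(n: int, n_folds: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once, then deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_accuracy(X: Sequence[Vector], y: Sequence[int], k: int,
                n_folds: int = 4, seed: int = 0) -> float:
    """Cross-validated accuracy: each fold is predicted from the remaining folds only."""
    correct = 0
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


def tune_k(X: Sequence[Vector], y: Sequence[int], candidates=(1, 3, 5),
           n_folds: int = 4, seed: int = 0) -> int:
    """Candidate with the best CV accuracy; max() keeps the first (smallest) one on ties."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, n_folds, seed))


if __name__ == "__main__":
    fake = [[0.05, 0.10], [0.10, 0.20], [0.15, 0.05], [0.20, 0.15], [0.10, 0.10], [0.20, 0.05]]
    genuine = [[0.80, 0.90], [0.85, 0.75], [0.90, 0.95], [0.95, 0.85], [0.90, 0.90], [0.80, 0.80]]
    X, y = fake + genuine, [1] * 6 + [0] * 6  # training + validation data only
    best_k = tune_k(X, y)
    X_test, y_test = [[0.12, 0.12], [0.88, 0.88]], [1, 0]  # used once, after tuning
    test_acc = sum(knn_predict(X, y, x, best_k) == t for x, t in zip(X_test, y_test)) / len(y_test)
    print("best k:", best_k, "test accuracy:", test_acc)
```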
User Education and Reporting: Implement educational features to inform users about
the risks associated with fake accounts. Encourage users to report suspicious activities
and provide a feedback loop for continuous improvement.
Regulatory Compliance: Ensure that the fake account detection system complies with
relevant regulations and standards governing online platforms, protecting user rights and
privacy.
CHAPTER -04
SYSTEM REQUIREMENTS
4.1 HARDWARE REQUIREMENTS
• Hard-Disk : 40 GB.
Functional requirements describe the specific features and capabilities that a system
must have to meet its intended purpose. In the context of a machine learning-based fake
account detection system on social media, the functional requirements may include:
Non-functional requirements define the characteristics and qualities that describe how a
system should behave or perform, rather than specifying specific features. In the context
of a machine learning-based fake account detection system on social media, non-
functional requirements may include:
CONCLUSION
Although successful strategies have been developed for spam detection and fake user
recognition on social media, many problems remain open for further research.
The issues are highlighted as follows: False news identification on social media is a
problem that needs to be explored because of the serious repercussions of such news at the
individual as well as the collective level. Another related subject that is worth exploring is the
discovery of rumor sources on social media. While a few experiments using different
techniques have already been performed to identify the origins of misinformation, more
advanced approaches, e.g., social network-based approaches, can be extended because of
their demonstrated efficacy. Regions in the model and data may be given different
degrees of prominence depending on their size or their particular significance in the
recognition process. For instance, this strategy would make it easier to pinpoint
regions where particularly complex problems occur, such as those that arise only
occasionally. Despite their complexity, such hybrid models are expected to yield better
results, although in some cases combining these approaches may not have a significant
impact on the outcome. The model can then be extended to other social media sites such as
LinkedIn, Snapchat, WeChat, and QQ.