DESERTATION

The research project from Malawi University of Science and Technology focuses on using machine learning to detect smishing attacks, a form of phishing via SMS. The project aims to develop a detection model that categorizes messages as legitimate or smishing, addressing the increasing prevalence of such attacks in Malawi, where low literacy rates hinder traditional awareness methods. The proposed solution includes a web-based interface for users to verify suspicious messages and an API for integration into other systems.

MALAWI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MALAWI INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND TECHNOLOGY
COMPUTER SYSTEMS AND SECURITY
RESEARCH PROJECT

Using Machine Learning to Detect Phishing Attacks (Smishing)

GROUP 12
GROUP MEMBERS:
SIMEON MATAKA CIS/028/19
CHINSISI KABUKONDE CIS/006/19
GABRIEL MTHUNZI CIS/029/19
LAWRENCE CHIKOPA CIS/018/19
BLESSINGS NYIRENDA CIS/034/19
THOKOZANI GEORGE CIS/021/19
ALEX IMANI CIS/023/19
JENIFFER BAKALI CIS/001/19

Supervisor: Ralph Tambala

October 3rd, 2023

Acknowledgments
I would like to thank my supervisor, my family, and my friends for their support and guidance throughout this
project.

Abstract
I would like to thank my supervisor, my family, and my friends for their support and guidance throughout this
project.

List of Contents

Acknowledgments
Abstract

1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Research aim
1.3.1 Aim
1.3.2 Objectives
1.3.3 Research questions
1.4 Methodology
1.4.1 Data Collection
1.4.2 Data Preprocessing
1.4.3 Model Selection
1.4.4 Model Training and Evaluation
1.5 Structure of dissertation
1.6 Conclusion

2 LITERATURE REVIEW
2.1 Introduction
2.2 Existing methods
2.3 Machine learning for smishing detection
2.4 Datasets and evaluation metrics
2.5 Challenges and open problems
2.6 Discussion and recent advances
2.7 Conclusion

3 SYSTEM DESIGN
3.1 System Architecture
3.2 Components
3.3 API Design
3.4 Web interface
3.5 Machine learning Model
3.6 Data flow
3.7 Integration
3.8 Scalability and performance
3.9 Security
3.10 Error handling and logging
3.11 Dep

4 IMPLEMENTATION AND TESTING
4.1 Smishing detection system in action

5 EVALUATION
5.1 Introduction
5.2 Methods
5.3 Results
5.4 Discussion
5.5 Conclusion

6 CONCLUSION
6.1 Summary of the research work
6.2 Research study contribution
6.3 Limitation of the research work
6.4 Future work

7 REFERENCES

8 APPENDIX

List of Tables

1 INTRODUCTION

1.1 Background
In a world of increasing interconnectivity and digitization, where personal, financial, and sensitive information
is routinely exchanged online, the pervasive threat of phishing attacks has emerged as a major concern
for individuals, organizations, and society at large (Kumar & Gouda, 2023). Phishing is a form of social
engineering in which malicious actors impersonate legitimate entities to lure victims into disclosing sensitive
information (Akerlof & Shiller, 2016). According to Hadnagy and Wozniak (2018), social engineering gives you
the inside information you need to mount an unshakable defense.

Smishing is a social engineering attack that uses fake mobile text messages to trick people into downloading
malware, sharing sensitive information, or sending money to cybercriminals. The term "smishing" is a
combination of "SMS" ("Short Message Service"), the technology behind text messages, and "phishing"
(IBM, n.d.). According to Forbes, smishing is "a malicious practice that aims to deceive people through
text messages which utilizes persuasive messages to trick recipients into revealing sensitive information or
downloading harmful content." Smishing is a significant cybersecurity threat because it can result in financial
losses, identity theft, and data breaches. Attackers use sensitive information obtained through smishing to
steal money from bank accounts or credit cards, or to trick victims into making fraudulent purchases or
transferring money to the attacker (Forbes, n.d.). In addition, smishing can be used to trick employees into
giving away sensitive corporate information, which can then be used to launch a large cyber attack, such as
ransomware or business email compromise.

Malawi has a literacy rate of less than seventy percent (MacroTrends, 2023). With such levels of literacy,
traditional methods of awareness have proven inadequate in the face of these dynamic threats.
Combating phishing requires innovative, adaptable, and proactive strategies that can keep pace with the
rapidly changing threat landscape. Machine learning, a subset of artificial intelligence, has emerged as a
promising technology in the ongoing battle against phishing attacks (Akanbi, Amiri & Fazeldehkordi, 2015).
Its ability to analyze vast datasets, detect patterns, and adapt to new attack vectors aligns with
the dynamic nature of phishing threats.

Recently, smishing attacks in Malawi have increased as more people use mobile phones and the internet.
Customers have given away personal information, such as the personal identification numbers (PINs) for
their bank and Mpamba accounts, to attackers posing as mobile network or service provider
officials (TNM, 2020). Because customers are often unaware of what information service providers would
legitimately request and which channels they use for communication, they have been tricked into giving out
their personal details.

1.2 Problem Statement

SMS spam has become increasingly prevalent in recent years. SMS spam is defined as any fictitious text
message that is distributed via a mobile network without the recipient's knowledge, and it is a source of
concern for users. According to a recent survey, 68 percent of mobile phone users have been affected by
SMS spam (Cooke, 2023). While most users know the dangers of clicking a link in an email, fewer
know the dangers of clicking links in text messages. Smishing statistics for 2023 show that around
378,509,197 spam texts were sent and received per day in April 2022.

According to recent news reports, many social media users have reported receiving text messages about a
purported money transfer; when they called the sender, they were told to send money to a particular mobile
number in order to redeem it ([Link], 2019). Criminals use identity theft, fake promotional SIM swap services,
SMS fraud, and impersonation, for example calling random numbers to inform people that they have property
at the border that requires a clearance fee ([Link], 2021). Malawi is losing large sums of money ($117K a
month) to mobile money fraud ([Link], 2023).

Although various datasets are available for testing email spam detection algorithms, the datasets available for
training and testing SMS spam detection techniques are still few and small. Moreover, unlike emails, text
messages are short and therefore carry less statistically differentiable information, so fewer features are
available for detecting spam SMS. Text messages are also heavily influenced by informal language such as
regional words, idioms, phrases, and abbreviations, which is why email spam filtering methods fail when
applied to SMS (Mehul Gupta, 2018).

The proposed solution is a smishing detector in the form of a website that categorizes an SMS as legitimate
or smishing based on the results of the different detection techniques applied. Users benefit from the site by
simply copying and pasting suspicious messages for detection before they reply or give in to demands from
cyber criminals. The model will also be made available, via an API, to users who want to integrate it into
their own systems.

1.3 Research aim

1.3.1 Aim

The research aimed to employ machine learning algorithms to detect phishing (smishing) attacks with as few
errors as possible.

1.3.2 Objectives

The research work had the following five specific objectives:

1. To conduct a comprehensive examination and assessment of the available body of literature pertaining
to smishing attacks and their detection techniques leveraging machine learning.

2. To collect, clean and preprocess a dataset of both legit and smishing messages for training and testing
the machine learning models.

3. To develop a machine learning model using suitable algorithms for smishing detection.

4. To evaluate the performance of the developed machine learning model using appropriate evaluation
metrics.

5. To assess the usability and user-friendliness of the smishing detector considering end user experience.

1.3.3 Research questions

With respect to the research objectives, the research work intended to provide answers to the following
questions:

1. What machine learning algorithms or techniques can be used to develop an effective smishing detection
model?

2. How accurate and reliable is the developed machine learning model in detecting smishing attacks?

3. What are the challenges of using machine learning for detecting smishing attacks and how can they be
addressed?

4. How does the developed smishing detector affect end-user satisfaction?

1.4 Methodology
This section presents the methodology employed in the research to achieve the stated objectives of developing
and evaluating a smishing detection system based on machine learning models. The methodology
encompasses data collection, preprocessing, model selection, and evaluation techniques.

1.4.1 Data Collection

The first step in building an effective smishing detection system is to collect a diverse and representative
dataset of text messages, including both legitimate and smishing messages. Data was collected from multiple
sources, including public smishing datasets and messages gathered from peers. The dataset includes
text messages in English and Chichewa, so that the system can detect smishing attempts
across different regions and languages.

1.4.2 Data Preprocessing

Prior to model training, the collected data underwent preprocessing steps to ensure data quality and suitability
for machine learning tasks. This preprocessing included the following steps:

1. Text Cleaning: Removal of special characters, punctuation, and extra white space.

2. Tokenization: Splitting text into individual words or tokens.

3. Stopword Removal: Elimination of common stopwords to reduce noise.

4. Stemming/Lemmatization: Reducing words to their root form for consistency.
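
The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the project's actual pipeline: the stopword list, the suffix rules, and the function names (`clean_text`, `tokenize`, `remove_stopwords`, `stem`, `preprocess`) are all assumptions made for the example, and the cleaning rule keeps only ASCII letters and digits, which would need adjusting for Chichewa text.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real system would use a
# fuller list, and a separate one for Chichewa.
STOPWORDS = {"a", "an", "the", "to", "is", "of", "and", "your", "you", "for", "in", "on"}

# Naive suffix rules standing in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "s")


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into tokens on whitespace."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message: str) -> list[str]:
    """Run cleaning, tokenization, stopword removal, and stemming in order."""
    tokens = tokenize(clean_text(message))
    return [stem(t) for t in remove_stopwords(tokens)]
```

For example, `preprocess("Send your PIN now!")` lowercases the message, strips the punctuation, drops the stopword "your", and returns the remaining tokens.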

1.4.3 Model Selection

To determine the most suitable machine learning models for smishing detection, a comprehensive evaluation
of multiple algorithms was conducted. The following machine learning algorithms were considered:

1. Logistic Regression

2. Random Forest

3. Support Vector Machine (SVM)

4. Naive Bayes

5. Neural Networks

The choice of these models was based on their suitability for text classification tasks and their previous
success in similar domains.
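
To make one of the listed algorithms concrete, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing in plain Python. It is an illustration under stated assumptions, not the model built in this project: the `NaiveBayes` class, the label names, and the four toy training messages are all hypothetical, and messages are assumed to be already tokenized.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Count how many documents and how many word occurrences each class has.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label) over known tokens.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # ignore words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, tokens):
        # Return the class with the highest log score.
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical toy data, for illustration only.
train_docs = [
    ["win", "cash", "prize", "click", "link"],
    ["send", "pin", "claim", "prize"],
    ["meeting", "moved", "tomorrow"],
    ["call", "me", "after", "class"],
]
train_labels = ["smishing", "smishing", "legit", "legit"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["claim", "cash", "prize"]))
print(model.predict(["meeting", "after", "class"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from forcing that class's probability to zero.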

1.4.4 Model Training and Evaluation

Each machine learning model was trained using the preprocessed dataset. The dataset was split into training
and testing sets (80% training, 20% testing) to assess model performance. The following evaluation metrics
were used:

1. Accuracy: To measure the overall correctness of the model’s predictions.

2. Precision: To quantify the model’s ability to correctly identify smishing messages without false positives.

3. Recall: To measure the model’s ability to detect all actual smishing messages.

4. F1-Score: A balance between precision and recall.
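
The 80/20 split and the four metrics can be sketched as follows. This is a minimal example under assumptions: the function names `split_80_20` and `classification_metrics`, the fixed random seed, and the choice of "smishing" as the positive class are illustrative and are not taken from the project's code.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of the items and split it into 80% training and 20% testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred, positive="smishing"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = ["smishing", "smishing", "smishing", "legit", "legit"]
y_pred = ["smishing", "smishing", "legit", "smishing", "legit"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and accuracy is 3/5. Precision and recall are reported alongside accuracy because smishing datasets are often imbalanced, and accuracy alone can look good even when most smishing messages are missed.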

Given the prevalence of smishing attacks and their potentially detrimental consequences, there is a compelling
need for a more robust and reliable smishing detection system. The current state-of-the-art methods and
their limitations are outlined in the problem statement and in the literature review in Chapter 2. To
address this issue, a smishing detection system was developed, incorporating multiple machine learning models
selected on the basis of accuracy and precision. Chapter 3 presents this solution, while Chapters
4 and 5 assess its performance. The contribution of this research work to the field is
elaborated upon in Chapter 6.

1.5 Structure of dissertation

The rest of the dissertation is organized as follows:

1. Chapter 2 provides a comprehensive review of the literature on smishing detection techniques
and related research.

2. Chapter 3 presents the system design, outlining the architecture and components of the smishing
detection system.

3. Chapter 4 covers the implementation and testing of the model, a key component of the
smishing detection system.

4. Chapter 5 is dedicated to the evaluation of the smishing detection system, discussing its effectiveness,
performance, and real-world applicability.

5. Chapter 6 concludes the dissertation, summarizing the findings and contributions of the research on
smishing detection and providing insights into future research directions.

1.6 Conclusion
In this chapter we have discussed the need for a smishing detection system and explained the approach taken
in developing it. The literature related to this project is discussed in the following
chapter.

2 LITERATURE REVIEW

2.1 Introduction
Smishing, a portmanteau of "SMS" and "phishing," refers to a deceptive cyber-attack that employs text
messages to trick recipients into divulging sensitive information or performing malicious actions. In an era of
increasing digital communication, the threat of smishing has become a pressing concern for individuals and
organizations alike. This literature review explores the landscape of smishing detection, focusing on the
utilization of machine learning methods.

2.2 Existing methods

Review current methods and technologies used for detecting smishing attacks. Discuss traditional methods,
such as rule-based and keyword-based approaches. Explore more advanced techniques, such as machine
learning, natural language processing (NLP), and anomaly detection.

2.3 Machine learning for smishing detection
Delve into the use of machine learning algorithms and models for smishing detection. Explain the features
and datasets commonly used for training and evaluating smishing detection systems. Discuss the strengths
and limitations of machine learning in this context.

2.4 Datasets and evaluation metrics

List and describe publicly available datasets that researchers use for smishing detection experiments.
Mention any challenges or limitations associated with these datasets. Explain the metrics used to assess the
performance of smishing detection systems, such as precision, recall, F1-score, and ROC curves.

2.5 Challenges and open problems

Identify the challenges and limitations of existing smishing detection methods. Highlight open research
questions and areas where improvements are needed.

2.6 Discussion and recent advances

Discuss recent research papers or developments in the field of smishing detection. Highlight innovative
approaches or technologies that show promise.

2.7 Conclusion

3 SYSTEM DESIGN

3.1 System Architecture

The figure below shows the architecture of the whole system. It has a web interface where users can paste in
their messages and an API endpoint to which client applications can connect. The API and the model reside on
the same server, and submitted messages are stored in a database for future training of the model.

Figure 1: System architecture

3.2 Components

3.3 API Design
The API is designed to accept JSON requests and return JSON responses.
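
A JSON-in, JSON-out exchange of this kind might look like the sketch below. The field names `message`, `label`, and `error`, the `handle_request` function, and the keyword-based `classify` placeholder are assumptions made for illustration; the source does not specify the actual request schema, and the real system would call the trained model instead of the placeholder.

```python
import json


def classify(message: str) -> str:
    """Placeholder for the trained model: flags messages that contain the word 'pin'."""
    return "smishing" if "pin" in message.lower().split() else "legit"


def handle_request(body: str) -> str:
    """Take a JSON request body and return a JSON response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"error": "request body must be valid JSON"})

    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, str) or not message.strip():
        return json.dumps({"error": "field 'message' must be a non-empty string"})

    return json.dumps({"message": message, "label": classify(message)})


print(handle_request('{"message": "Send your PIN now"}'))
print(handle_request("not json"))
```

Returning a structured error object rather than raising keeps the response format uniform for client applications, which can then check for an `error` key before reading `label`.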

Figure 2: API dataflow

3.4 Web interface

3.5 Machine learning Model

Figure 3: Model architecture

3.6 Data flow

3.7 Integration

3.8 Scalability and performance

3.9 Security

3.10 Error handling and logging

3.11 Dep

4 IMPLEMENTATION AND TESTING

Figure 4: Data visuals

Figure 5: Smishing and legit percentages

4.1 Smishing detection system in action

Figure 6: web interface

5 EVALUATION

5.1 Introduction

5.2 Methods

5.3 Results

5.4 Discussion

5.5 Conclusion

6 CONCLUSION

6.1 Summary of the research work

6.2 Research study contribution

6.3 Limitation of the research work

6.4 Future work

7 REFERENCES
Kumar, Mr & Gouda, Sandeepta. (2023). A comprehensive study of phishing attacks and their countermeasures. 10.13140/RG.2.2.36686.13120.

Akerlof, G. A., & Shiller, R. J. (2016). Phishing for phools: The economics of manipulation and deception. Princeton University Press.

Hadnagy, C., & Wozniak, S. (2018). Social engineering: The science of human hacking. John Wiley & Sons, Incorporated.

Malawi literacy rate 1987-2023. MacroTrends. (n.d.). [Link] literacy-rate

Akanbi, O. A., Amiri, I. S., & Fazeldehkordi, E. (2015). A machine learning approach to phishing detection and defense. Elsevier.

[Link]

8 APPENDIX
