DISSERTATION

The research project from Malawi University of Science and Technology focuses on using machine learning to detect smishing attacks, a form of phishing via SMS. The project aims to develop a detection model that categorizes messages as legitimate or smishing, addressing the increasing prevalence of such attacks in Malawi, where low literacy rates hinder traditional awareness methods. The proposed solution includes a web-based interface for users to verify suspicious messages and an API for integration into other systems.


MALAWI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MALAWI INSTITUTE OF TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND TECHNOLOGY
COMPUTER SYSTEMS AND SECURITY
RESEARCH PROJECT

Using Machine Learning to Detect Phishing Attacks (Smishing)

GROUP 12
GROUP MEMBERS:
SIMEON MATAKA CIS/028/19
CHINSISI KABUKONDE CIS/006/19
GABRIEL MTHUNZI CIS/029/19
LAWRENCE CHIKOPA CIS/018/19
BLESSINGS NYIRENDA CIS/034/19
THOKOZANI GEORGE CIS/021/19
ALEX IMANI CIS/023/19
JENIFFER BAKALI CIS/001/19

Supervisor: Ralph Tambala

October 3rd, 2023


Acknowledgments
I would like to thank my supervisor, my family, and my friends for their support and guidance throughout this
project.

Abstract

List of Contents

Acknowledgments

Abstract

1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Research aim
1.3.1 Aim
1.3.2 Objectives
1.3.3 Research questions
1.4 Methodology
1.4.1 Data Collection
1.4.2 Data Preprocessing
1.4.3 Model Selection
1.4.4 Model Training and Evaluation
1.5 Structure of dissertation
1.6 Conclusion

2 LITERATURE REVIEW
2.1 Introduction
2.2 Existing methods
2.3 Machine learning for smishing detection
2.4 Datasets and evaluation metrics
2.5 Challenges and open problems
2.6 Discussion and recent advances
2.7 Conclusion

3 SYSTEM DESIGN
3.1 System Architecture
3.2 Components
3.3 API Design
3.4 Web interface
3.5 Machine learning Model
3.6 Data flow
3.7 Integration
3.8 Scalability and performance
3.9 Security
3.10 Error handling and logging
3.11 Dep

4 IMPLEMENTATION AND TESTING
4.1 Smishing detection system in action

5 EVALUATION
5.1 Introduction
5.2 Methods
5.3 Results
5.4 Discussion
5.5 Conclusion

6 CONCLUSION
6.1 Summary of the research work
6.2 Research study contribution
6.3 Limitation of the research work
6.4 Future work

7 REFERENCES

8 APPENDIX

List of Tables

1 INTRODUCTION

1.1 Background
In a world of increasing interconnectivity and digitization, where personal, financial, and sensitive information
is routinely exchanged online, phishing attacks have emerged as a major concern for individuals,
organizations, and society at large (Kumar & Gouda, 2023). Phishing is a form of social engineering in which
malicious actors impersonate legitimate entities to lure victims into disclosing sensitive information
(Akerlof & Shiller, 2016). Hadnagy and Wozniak (2018) argue that understanding how social engineering
works provides the inside information needed to mount a strong defense against it.

Smishing is a social engineering attack that uses fake mobile text messages to trick people into downloading
malware, sharing sensitive information, or sending money to cybercriminals. The term "smishing" combines
"SMS" ("Short Message Service"), the technology behind text messages, and "phishing" (IBM, n.d.).
According to Forbes, smishing is "a malicious practice that aims to deceive people through
text messages which utilizes persuasive messages to trick recipients into revealing sensitive information or
downloading harmful content." Smishing is a significant cybersecurity threat because it can result in financial
losses, identity theft, and data breaches. Attackers use sensitive information obtained through smishing to
steal money from bank accounts or credit cards, or to trick victims into making fraudulent purchases or
transferring money to the attacker (Forbes, n.d.). Smishing can also be used to trick employees into
giving away sensitive corporate information, which can then be used to launch a larger cyber attack, such as
ransomware or business email compromise.

Malawi has a literacy rate of less than seventy percent (MacroTrends, 2023). With literacy, and by extension
computer literacy, at such levels, traditional awareness methods have proven inadequate in the face of these
dynamic threats. Combating phishing requires innovative, adaptable, and proactive strategies that can keep
pace with the rapidly changing threat landscape. Machine learning, a subset of artificial intelligence, has
emerged as a promising technology in the ongoing battle against phishing attacks (Akanbi, Amiri &
Fazeldehkordi, 2015). Its ability to analyze vast datasets, detect patterns, and adapt to new attack vectors
aligns well with the dynamic nature of phishing threats.

Recently, smishing attacks in Malawi have increased as more people use mobile phones and the internet.
Customers have given away personal information, such as the personal identification numbers (PINs) for
their bank and Mpamba accounts, to attackers posing as mobile network or service provider officials
(TNM, 2020). Because many customers are unaware of what information service providers would legitimately
request and which channels they use for communication, they have been tricked into giving out their personal
details.

1.2 Problem Statement


SMS spam has become increasingly prevalent in recent years. SMS spam is defined as any fictitious text
message that is distributed via a mobile network without the recipient's knowledge. Such messages are a
source of concern for users: according to a recent survey, 68 percent of mobile phone users have been
affected by SMS spam (Cooke, 2023). While most users know the dangers of clicking a link in an email,
fewer know the dangers of clicking links in text messages. Smishing statistics compiled in 2023 show that
around 378,509,197 spam texts were sent and received per day in April 2022.

According to recent news reports, many social media users have received text messages about a purported
money transfer and, on calling the sender, were told to send money to a particular mobile number in order to
redeem it ([Link], 2019). Criminals use identity theft, fake promotional SIM swap services, SMS fraud, and
impersonation, for example by calling random numbers and telling people that they have property at the
border that requires a clearance fee ([Link], 2021). Malawi is losing about $117K a month to mobile money
fraud ([Link], 2023).

Although various datasets are available for testing email spam detection algorithms, datasets for training and
testing SMS spam detection techniques are still few and small. Moreover, unlike emails, text messages are
short and therefore carry less statistically differentiable information, so fewer features are available for
detecting spam SMS. Text messages are also heavily influenced by informal language such as regional words,
idioms, phrases, and abbreviations, which causes email spam filtering methods to fail on SMS
(Mehul Gupta, 2018).

The proposed solution is a smishing detector in the form of a website that categorizes an SMS as legitimate
or smishing based on the results of the different detection techniques applied. Users benefit by simply
copying and pasting suspicious messages into the site for detection before they reply or give in to demands
from cyber criminals. The model will also be made available through an API to users who want to integrate it
into their own systems.

1.3 Research aim


1.3.1 Aim

The research aimed to employ machine learning algorithms to detect phishing (smishing) attacks with as few
errors as possible.

1.3.2 Objectives

The research work had the following five specific objectives:

1. To conduct a comprehensive examination and assessment of the available body of literature pertaining
to smishing attacks and their detection techniques leveraging machine learning.

2. To collect, clean and preprocess a dataset of both legitimate and smishing messages for training and
testing the machine learning models.

3. To develop a machine learning model using suitable algorithms for smishing detection.

4. To evaluate the performance of the developed machine learning model using appropriate evaluation
metrics.

5. To assess the usability and user-friendliness of the smishing detector considering end user experience.

1.3.3 Research questions

With respect to the research objectives, the research work intended to provide answers to the following
questions:

1. What machine learning algorithms or techniques can be used to develop an effective smishing detection
model?

2. How accurate and reliable is the developed machine learning model in detecting smishing attacks?

3. What are the challenges of using machine learning for detecting smishing attacks and how can they be
addressed?

4. How does the developed smishing detector affect end-user satisfaction?

1.4 Methodology
This section presents the methodology employed in the research to achieve the stated objectives of
developing and evaluating a smishing detection system based on machine learning models. The methodology
encompasses data collection, preprocessing, model selection, and evaluation techniques.

1.4.1 Data Collection

The first step in building an effective smishing detection system is to collect a diverse and representative
dataset of text messages, including both legitimate and smishing messages. Data was collected from multiple
sources, including public smishing datasets and messages gathered from peers. The dataset includes
text messages in English and Chichewa, so that the system can detect smishing attempts
across different regions and languages.

1.4.2 Data Preprocessing

Prior to model training, the collected data underwent preprocessing steps to ensure data quality and suitability
for machine learning tasks. This preprocessing included the following steps:

1. Text Cleaning: Removal of special characters, punctuation, and extra white space.

2. Tokenization: Splitting text into individual words or tokens.

3. Stopword Removal: Elimination of common stopwords to reduce noise.

4. Stemming/Lemmatization: Reducing words to their root form for consistency.
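
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual pipeline: the stopword list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer or lemmatizer. All function names are assumptions introduced for this example.

```python
import re

# Tiny illustrative stopword list (assumption); a real system would use a fuller
# list and would also need stopwords for Chichewa.
STOPWORDS = {"a", "an", "the", "to", "is", "are", "your", "you",
             "for", "of", "and", "has", "been"}


def clean_text(message: str) -> str:
    """Lowercase, replace punctuation/special characters with spaces, collapse whitespace."""
    lowered = message.lower()
    no_specials = re.sub(r"[^\w\s]|_", " ", lowered)
    return re.sub(r"\s+", " ", no_specials).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into tokens on whitespace."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message: str) -> list[str]:
    """Apply cleaning, tokenization, stopword removal, and stemming in order."""
    tokens = tokenize(clean_text(message))
    return [naive_stem(t) for t in remove_stopwords(tokens)]


tokens = preprocess("Your account has been BLOCKED!!! Click the link to verify.")
```

The regular expression keeps Unicode letters, so characters used in Chichewa text are not stripped during cleaning.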

1.4.3 Model Selection

To determine the most suitable machine learning models for smishing detection, a comprehensive evaluation
of multiple algorithms was conducted. The following machine learning algorithms were considered:

1. Logistic Regression

2. Random Forest

3. Support Vector Machine (SVM)

4. Naive Bayes

5. Neural Networks

The choice of these models was based on their suitability for text classification tasks and their previous
success in similar domains.
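
To make one of the listed candidates concrete, the sketch below implements a small multinomial Naive Bayes text classifier in pure Python with Laplace smoothing. It is an illustration under stated assumptions, not the model built in this project: the class name `NaiveBayesText`, the smoothing value, and the toy training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over token lists, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of messages per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, token_lists, labels):
        for tokens, label in zip(token_lists, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label) for known tokens
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # ignore tokens never seen in training
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Toy, hand-written training data (assumption), already preprocessed into tokens.
train_tokens = [
    ["win", "cash", "prize", "click", "link"],
    ["send", "pin", "verify", "account"],
    ["meet", "lunch", "tomorrow"],
    ["call", "me", "when", "home"],
]
train_labels = ["smishing", "smishing", "legitimate", "legitimate"]

model = NaiveBayesText().fit(train_tokens, train_labels)
prediction = model.predict(["verify", "account", "pin"])
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters little for short SMS texts but keeps the sketch faithful to the usual formulation.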

1.4.4 Model Training and Evaluation

Each machine learning model was trained using the preprocessed dataset. The dataset was split into training
and testing sets (80% training, 20% testing) to assess model performance. The following evaluation metrics
were used:

1. Accuracy: To measure the overall correctness of the model’s predictions.

2. Precision: To quantify the model’s ability to correctly identify smishing messages without false positives.

3. Recall: To measure the model’s ability to detect all actual smishing messages.

4. F1-Score: The harmonic mean of precision and recall, balancing the two.
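
The 80/20 split and the four metrics can be sketched in plain Python as below. This is a minimal illustration assuming "smishing" is treated as the positive class; the function names, the fixed random seed, and the example labels are assumptions made for this sketch.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of the items and return (train, test) using the given test ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred, positive="smishing"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 smishing and 7 legitimate messages,
# with one missed smishing message and one false alarm.
y_true = ["smishing"] * 3 + ["legitimate"] * 7
y_pred = ["smishing", "smishing", "legitimate", "smishing"] + ["legitimate"] * 6
scores = evaluate(y_true, y_pred)

train, test = train_test_split(range(10))
```

Because smishing messages are usually a minority of the data, accuracy alone can look high even when many attacks are missed, which is why precision, recall, and F1 are reported alongside it.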

Given the prevalence of smishing attacks and their potentially detrimental consequences, there is a compelling
need for a more robust and reliable smishing detection system. The current state-of-the-art methods and
their limitations are outlined in the problem statement and in the literature review in Chapter 2. To
address this issue, a smishing detection system was developed, incorporating multiple machine learning models
selected on the basis of accuracy and precision. Chapter 3 presents this solution, while Chapters
4 and 5 assess its performance. The contribution of this research work to the field is
elaborated upon in Chapter 6.

1.5 Structure of dissertation


The rest of the dissertation is organized as follows:

1. Chapter 2 provides a comprehensive review of the literature on smishing detection techniques
and related research.

2. Chapter 3 presents the system design, outlining the architecture and components of the smishing
detection system.

3. Chapter 4 covers the implementation and testing of the model, a key component of the
smishing detection system.

4. Chapter 5 is dedicated to the evaluation of the smishing detection system, discussing its effectiveness,
performance, and real-world applicability.

5. Chapter 6 concludes the dissertation, summarizing the findings and contributions of the research on
smishing detection and providing insights into future research directions.

1.6 Conclusion
This chapter has discussed the need for a smishing detection system and explained the approach taken to
develop it. A review of the literature related to this project is presented in the following
chapter.

2 LITERATURE REVIEW

2.1 Introduction
Smishing, a portmanteau of "SMS" and "phishing," refers to a deceptive cyber-attack that employs text
messages to trick recipients into divulging sensitive information or performing malicious actions. In an era of
increasing digital communication, the threat of smishing has become a pressing concern for individuals and
organizations alike. This literature review explores the landscape of smishing detection, focusing on the
utilization of machine learning methods.

2.2 Existing methods


Review current methods and technologies used for detecting smishing attacks. Discuss traditional methods,
such as rule-based and keyword-based approaches. Explore more advanced techniques, such as machine
learning, natural language processing (NLP), and anomaly detection.

2.3 Machine learning for smishing detection
Delve into the use of machine learning algorithms and models for smishing detection. Explain the features
and datasets commonly used for training and evaluating smishing detection systems. Discuss the strengths
and limitations of machine learning in this context.

2.4 Datasets and evaluation metrics


List and describe publicly available datasets that researchers use for smishing detection experiments.
Mention any challenges or limitations associated with these datasets. Explain the metrics used to assess the
performance of smishing detection systems, such as precision, recall, F1-score, and ROC curves.

2.5 Challenges and open problems


Identify the challenges and limitations of existing smishing detection methods. Highlight open research
questions and areas where improvements are needed.

2.6 Discussion and recent advances


Discuss recent research papers or developments in the field of smishing detection. Highlight innovative
approaches or technologies that show promise.

2.7 Conclusion

3 SYSTEM DESIGN

3.1 System Architecture


The figure below shows the architecture of the whole system. It has a web interface where users can paste in
their messages and an API endpoint to which client applications can also connect. The API and model reside on
the same server, and submitted messages are stored in a database for future training of the model.

Figure 1: System architecture
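
As an illustration of the storage step in this architecture, the sketch below saves each submitted message together with the model's label in an SQLite database so that it can be retrieved later for retraining. The design does not specify a database engine or schema, so SQLite, the `messages` table, and its column names are assumptions made purely for this example.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) messages table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "text TEXT NOT NULL, "
        "predicted_label TEXT NOT NULL, "
        "received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_message(conn: sqlite3.Connection, text: str, predicted_label: str) -> int:
    """Store one submitted message with the label the model assigned; return its row id."""
    cursor = conn.execute(
        "INSERT INTO messages (text, predicted_label) VALUES (?, ?)",
        (text, predicted_label),
    )
    conn.commit()
    return cursor.lastrowid


def load_stored_messages(conn: sqlite3.Connection):
    """Return all stored (text, predicted_label) pairs in insertion order."""
    return conn.execute(
        "SELECT text, predicted_label FROM messages ORDER BY id"
    ).fetchall()


conn = open_store()
first_id = save_message(conn, "Send your PIN to claim your prize", "smishing")
rows = load_stored_messages(conn)
```

Stored predictions are the model's own output rather than verified labels, so in practice they would likely need manual review before being used as training data.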

3.2 Components

3.3 API Design
The API is designed to accept requests in JSON and to return its responses in JSON as well.

Figure 2: API dataflow
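
A JSON-in, JSON-out exchange of this kind could look like the sketch below, which uses only the Python standard library and is independent of any web framework. The field names `message`, `label`, and `error`, the status codes, and the keyword-based `classify` placeholder are all assumptions for illustration; the source does not specify the request or response schema, and the real system would call the trained model instead.

```python
import json

# Placeholder keywords (assumption) used only so the sketch runs without a model.
SUSPICIOUS_WORDS = {"pin", "password", "verify", "winner"}


def classify(message: str) -> str:
    """Stand-in for the trained model: flags messages containing a suspicious word."""
    words = set(message.lower().split())
    return "smishing" if words & SUSPICIOUS_WORDS else "legitimate"


def handle_request(body: str) -> tuple[int, str]:
    """Validate a JSON request body and return (HTTP status code, JSON response body)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "request body must be valid JSON"})

    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, str) or not message.strip():
        return 400, json.dumps({"error": "field 'message' must be a non-empty string"})

    return 200, json.dumps({"message": message, "label": classify(message)})


status, response_body = handle_request('{"message": "Send your PIN to confirm"}')
response = json.loads(response_body)
```

Returning an explicit error object for malformed input lets client applications distinguish a bad request from a "legitimate" verdict.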

3.4 Web interface

3.5 Machine learning Model

Figure 3: Model architecture

3.6 Data flow

3.7 Integration

3.8 Scalability and performance

3.9 Security

3.10 Error handling and logging

3.11 Dep

4 IMPLEMENTATION AND TESTING

Figure 4: Data visuals

Figure 5: Smishing and legit percentages

4.1 Smishing detection system in action

Figure 6: Web interface

5 EVALUATION

5.1 Introduction

5.2 Methods

5.3 Results

5.4 Discussion

5.5 Conclusion

6 CONCLUSION

6.1 Summary of the research work

6.2 Research study contribution

6.3 Limitation of the research work

6.4 Future work

7 REFERENCES
Kumar, Mr & Gouda, Sandeepta. (2023). A comprehensive study of phishing attacks and their countermeasures. doi:10.13140/RG.2.2.36686.13120

Akerlof, G. A., & Shiller, R. J. (2016). Phishing for phools: The economics of manipulation and deception. Princeton University Press.

Hadnagy, C., & Wozniak, S. (2018). Social engineering: The science of human hacking. John Wiley & Sons, Incorporated.

MacroTrends. (n.d.). Malawi literacy rate 1987-2023. [Link] literacy-rate

Akanbi, O. A., Amiri, I. S., & Fazeldehkordi, E. (2015). A machine learning approach to phishing detection and defense. Elsevier.

[Link]

[Link]

8 APPENDIX
