Group 12 Final Report Seminar

A
SEMINAR BASED LEARNING REPORT
ON
“Phishing Attack Detection Using Artificial Intelligence”

SUBMITTED TO SAVITRIBAI PHULE PUNE UNIVERSITY
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR
THE AWARD OF THE
THIRD YEAR ENGINEERING (INFORMATION TECHNOLOGY)
BY
Ms. Mahale Gauri C (S190108552)
Ms. Kawar Rutuja G (S190108547)
Mr. Waghmode Sanjay B (S190108587)
Ms. Dhanapune Kirti S (S190108518)
Under Guidance of
Prof. Muneshwar R.N.
DEPARTMENT OF INFORMATION TECHNOLOGY
AMRUTVAHINI COLLEGE OF ENGINEERING, SANGAMNER

A/P: GHULEWADI, SANGAMNER, AHMEDNAGAR, PIN - 422608
YEAR 2021-22
PHISHING ATTACK DETECTION USING ARTIFICIAL INTELLIGENCE


CERTIFICATE
This is to certify that the Seminar Based Learning report entitled
“Phishing Attack Detection Using Artificial Intelligence”
is submitted in partial fulfilment of the curriculum of TE (Information Technology)
BY

Prof. Muneshwar R.N. (Seminar Guide)
Dr. Chaudhari M.A. (Seminar Coordinator)
Dr. Gunjal B.L. (HOD IT)

Savitribai Phule Pune University
CERTIFICATE
This is to certify that

Students of TE Information Technology were examined in the Seminar Based Learning report entitled
“Phishing Attack Detection Using Artificial Intelligence”
On
…/…/2021
At

YEAR 2021-22
-------------------- ----------------------
(Internal Examiner) (External Examiner)

Certificate by Guide
This is to certify that

has completed the Seminar Based Learning work under my guidance, and that I have verified the
work for originality in the documentation, problem statement, literature survey and conclusion
presented in the seminar work.
Place: Sangamner
Date:
Prof. Muneshwar R.N.
AVCOE, Sangamner

Acknowledgement
We would like to express our profound gratitude to Dr. Gunjal B.L. (HOD IT) for providing the
opportunity to complete our academics and present this technical seminar, and for invaluable
guidance throughout it. We would also like to show our greatest appreciation to Prof. Muneshwar
R.N. (Seminar Guide) and Dr. Chaudhari M.A. (Seminar Coordinator); we cannot thank them enough
for their tremendous support and help, and their contributions and technical support in preparing
this report are gratefully acknowledged. The guidance and support received from all the members
who contributed to this report was vital to the success of the project, and we are grateful for
their constant help. Working on “Phishing Attack Detection Using Artificial Intelligence” gave us
the necessary background information and the inspiration for choosing this topic. Last but not
least, we wish to thank our parents for financing our studies in this college and for constantly
encouraging us to learn engineering. Their personal sacrifice in providing this opportunity is
gratefully acknowledged.

Abstract
With the significant growth of internet usage, people increasingly share their personal information
online. As a result, an enormous amount of personal information and financial transactions become
vulnerable to cybercriminals. Phishing is a highly effective form of cybercrime that enables
criminals to deceive users and steal important data. Since the first reported phishing attack in
1990, it has evolved into a more sophisticated attack vector. At present, phishing is considered
one of the most frequent forms of fraud on the Internet. Phishing attacks can lead to severe losses
for their victims, including the loss of sensitive information, identity theft, and the exposure of
company and government secrets. This article aims to evaluate these attacks by identifying the
current state of phishing and reviewing existing phishing techniques. Previous studies have
classified phishing attacks according to fundamental phishing mechanisms and countermeasures,
overlooking the importance of the end-to-end lifecycle of phishing. This article proposes a new,
detailed anatomy of phishing that covers attack phases, attacker types, vulnerabilities, threats,
targets, attack mediums, and attacking techniques. The proposed anatomy will help readers
understand the lifecycle of a phishing attack, which in turn will increase awareness of these
attacks and the techniques being used; it also helps in developing a holistic anti-phishing system.

Table of Contents
TITLE PAGE NO.
Certificate 3
i) Acknowledgement 5
ii) Abstract 6
iii) Table of Contents 7
iv) List of Figures 8
v) List of Abbreviations 9
1. Introduction 10
2. Literature Survey 11
3. Proposed Work 13
4. Architecture and Working 16
5. Applications, Advantages, Disadvantages 21
6. Future Scope 22
7. Conclusion 23
References 24

List of Figures
Fig 1. Concept of phishing attack 10
Fig 2. Personal computer clients are victims of phishing attack 14
Fig 3. Architecture of phishing attack 16
Fig 4. Working of phishing attack detection 17
Fig 5. Pre-processing and Classification Process 18
Fig 6. Example of KNN 19

List of Abbreviations
ML Machine Learning
DL Deep Learning
ANN Artificial Neural Network
NN Neural Network
AI Artificial Intelligence
HT Hybrid Technique
KNN K-Nearest Neighbours

1. Introduction
• Phishing has become one of the most prominent attacks faced by internet users,
governments, and service-providing organizations.
• Phishing is one of the most widely used attacks for breaking into email accounts and web
services.
• A phishing attack uses fake websites to steal sensitive client data, such as account login
credentials and credit card numbers. Cybercriminals use this attack to break into the bank,
Facebook, and email accounts of unsuspecting people.[4]
• Many of the biggest cybercrime cases every year involve phishing, so it is important to know
what phishing is and how to protect accounts against it.
• For detecting phishing attacks we use Artificial Intelligence, which can detect spam phishing,
spear phishing, and other types of attacks.[3]
• Phishing is a type of social engineering attack often used to steal user data, including login
credentials and credit card numbers. It occurs when an attacker, masquerading as a trusted
entity, dupes a victim into opening an email, instant message, or text message.
Fig 1. Concept of phishing attack

2. Literature Survey
Phishing is a form of cybercrime in which an attacker imitates a real person or institution,
presenting themselves as an official person or entity through e-mail or other communication media.
In this type of cyber-attack, the attacker sends malicious links or attachments in phishing e-mails
that can perform various functions, including capturing the victim's login credentials or account
information. These e-mails harm victims through financial loss and identity theft.
Paper 1
Abdul Basit, Mahram Zafar, Xuan Liu, Abdul Rahman Javed, Zunera Jalil, Kashif Kifayat
A comprehensive survey of AI-enabled phishing attacks detection techniques (IEEE) - October 2020
The paper presents a detailed comparative study of previous works that use different approaches.
Machine learning based approaches, deep learning based approaches, scenario-based approaches, and
hybrid techniques have been deployed in the past to tackle this problem. The comparative analysis
revealed that machine learning methods are the most frequently used and most effective methods for
detecting phishing attacks. Classification methods such as SVM, RF, ANN, C4.5, k-NN and DT have
been used, and techniques with feature reduction give better performance. Classification using ELM,
SVM, LR, C4.5, LC-ELM, kNN and XGB combined with ANOVA feature selection detected phishing attacks
with 99.2% accuracy, the highest among all methods proposed so far, but with trade-offs in terms of
computational cost.
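ANOVA feature selection scores each feature by how well it separates the classes. As a minimal sketch in plain Python (not the surveyed authors' code), the snippet below computes the one-way ANOVA F-statistic of every feature between phishing and legitimate emails and ranks the features by it. The feature names and toy values are hypothetical.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across the classes in `labels`."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group sum of squares: distance of each class mean from the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(rows, labels, names):
    """Return (feature name, F) pairs sorted from most to least discriminative."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, anova_f(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical toy data: [number of links, IP address in URL (0/1), email length in KB]
names = ["num_links", "ip_in_url", "length_kb"]
rows = [[9, 1, 3.0], [7, 1, 2.5], [8, 0, 4.0], [1, 0, 3.5], [2, 0, 2.0], [0, 0, 4.5]]
labels = ["phish", "phish", "phish", "ham", "ham", "ham"]

for name, f in rank_features(rows, labels, names):
    print(f"{name}: F = {f:.2f}")
```

A large F means the class means differ strongly relative to the spread inside each class; a selector would keep only the top-ranked features before training a classifier.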
Paper 2
Muhammet Baykara, Zahit Ziya Gürel - Detection of phishing attacks - March 2018
In this study, a software tool called "Anti Phishing Simulator" was developed, which gives
information about the phishing detection problem and how to detect phishing emails. With this
software, phishing and spam mails are detected by examining mail contents, and spam words added to
the database are classified using a Bayesian algorithm.
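As a rough illustration of classifying mail by its words with a Bayesian algorithm (the general technique, not Baykara and Gürel's implementation), the sketch below is a multinomial naive Bayes filter with add-one (Laplace) smoothing. The class name and the training sentences are hypothetical.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Minimal multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        # log P(label) + sum over words of log P(word | label); assumes both classes were trained.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        v = len(self.vocab)
        for w in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the probability.
            score += math.log((self.word_counts[label][w] + 1) / (total_words + v))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda lbl: self.log_score(text, lbl))

nb = NaiveBayesFilter()
nb.train("verify your account password now", "spam")
nb.train("urgent claim your prize money", "spam")
nb.train("meeting notes attached for monday", "ham")
nb.train("lunch with the project team tomorrow", "ham")

print(nb.classify("verify your password to claim money"))
print(nb.classify("project meeting tomorrow"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.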

Paper 3
Ivan Ortiz-Garcés, Roberto O. Andrade, and Maria Cazares - Detection of Phishing
Attacks with Machine Learning Techniques in Cognitive Security Architecture - March 2019
The number of phishing attacks has increased in Latin America, exceeding the operational capacity
of cybersecurity analysts. Cognitive security proposes the use of big data, machine learning, and
data analytics to improve response times in attack detection. This paper investigates anomalous
behaviour related to phishing web attacks and how machine learning techniques can be an option to
address the problem. The analysis uses contaminated data sets and Python tools to build machine
learning models that detect phishing attacks by analysing URLs, classifying each URL as good or bad
based on its specific characteristics. The goal is to provide real-time information for making
proactive decisions that minimise the impact of an attack. AI is one of these possible solutions:
it can detect anomalous behaviour quickly and, even better, offer new possibilities for protecting
sensitive information, which is why it is so important in new cybersecurity approaches.
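Paper 3 classifies URLs as good or bad from specific characteristics of the URL. The sketch below extracts a few lexical features commonly used for this purpose; the exact feature set is an assumption for illustration, not the one used in the paper, and the helper name `url_features` is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url):
    """Extract a few lexical features often used to flag suspicious URLs (illustrative set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        uses_ip = True
    except ValueError:
        uses_ip = False
    return {
        "length": len(url),                    # phishing URLs are often unusually long
        "uses_ip_address": uses_ip,            # raw IP address instead of a domain name
        "has_at_symbol": "@" in url,           # text before '@' in the authority is treated as user info
        "num_dots_in_host": host.count("."),   # many subdomains can hide the real domain
        "has_hyphen_in_host": "-" in host,     # e.g. look-alike names with extra hyphens
        "uses_https": parsed.scheme == "https",
    }

print(url_features("http://192.168.10.5/paypal/login.php"))
print(url_features("https://www.example.com/about"))
```

Each dictionary can be turned into a numeric vector and fed to any of the classifiers discussed in this survey.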
Paper 4
Chaminda Hewage, Liqaa Nawaf, Imtiaz Ali Khan - Phishing Attacks: A Recent
Comprehensive Study and a New Anatomy - February 2021
Phishing attacks remain one of the major threats to individuals and organizations to date. As
highlighted in the article, this is mainly driven by human involvement in the phishing cycle. Often
phishers exploit human vulnerabilities in addition to favouring technological conditions (i.e.,
technical vulnerabilities). It has been identified that age, gender, internet addiction, user stress, and
many other attributes affect the susceptibility to phishing between people. In addition to traditional
phishing channels (e.g., email and web), new types of phishing mediums such as voice and SMS
phishing are on the increase. Furthermore, social media-based phishing has grown in parallel with
the growth of social media. Concomitantly, phishing has developed beyond obtaining sensitive
information and financial crimes to cyber terrorism, hacktivism, damaging reputations, espionage,
and nation-state attacks. Research has been conducted to identify the motivations, techniques, and
countermeasures for these new crimes; however, there is no single solution for the phishing
problem due to the heterogeneous nature of the attack vector. This article
has investigated problems presented by phishing and proposed a new anatomy, which describes
the complete life cycle of phishing attacks.

3. Proposed work
PROJECT TITLE
Phishing Attack Detection Using Artificial Intelligence
BACKGROUND
A cyberattack is any offensive manoeuvre that targets computer information systems, computer
networks, infrastructures, or personal computer devices. An attacker is a person or process that
attempts to access data, functions, or other restricted areas of the system without authorization,
potentially with malicious intent. Phishing has become one of the most prominent attacks faced by
internet users, governments, and service-providing organizations. Phishing attacks are usually
carried out through email, with the goal of stealing sensitive data such as credit card and login
information, or of installing malware on the victim's machine. Phishing is a form of identity theft
that occurs when a malicious website impersonates a legitimate one in order to acquire sensitive
information such as passwords, account details, or credit card numbers. Although several
anti-phishing tools and techniques exist for detecting potential phishing attempts in emails and
phishing content on websites, phishers keep developing new and hybrid techniques to circumvent them.
NEED OF STUDY
With the significant growth of internet usage, people increasingly share their personal information
online. As a result, an enormous amount of personal information and financial transactions become
vulnerable to cybercriminals. Phishing is an example of a highly effective form of cybercrime that
enables criminals to deceive users and steal important data. The study in [1] evaluates these
attacks by identifying the current state of phishing and reviewing existing phishing techniques.

Fig 2. Personal computer clients are victims of phishing attack
Personal computer users fall victim to phishing attacks for a few primary reasons:
(1) Users have little knowledge of Uniform Resource Locators (URLs).
(2) Users do not know exactly which pages can be trusted.
(3) Users cannot see the full location of a page because of redirection or hidden URLs.
(4) A URL can take many possible forms, and some pages are reached by accidentally mistyping an address.
(5) Users cannot differentiate a phishing web page from a legitimate one.
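Reasons (3) and (5) often come down to a link whose visible text shows one address while the underlying link points somewhere else. A minimal sketch of a heuristic check for this, using Python's standard `html.parser`, is shown below; the class and function names and the sample email body are hypothetical, and a real detector would combine this signal with others.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag in an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only record text inside an open <a> tag
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def mismatched_links(html):
    """Return links whose visible text looks like an address on a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    suspicious = []
    for href, text in parser.links:
        if not text or " " in text or "." not in text:
            continue  # visible text does not look like a domain or URL
        shown = text if "://" in text else "http://" + text
        if urlparse(shown).hostname != urlparse(href).hostname:
            suspicious.append((text, href))
    return suspicious

body = ('<p>Log in at <a href="http://203.0.113.7/login">www.mybank.com</a> '
        'or read <a href="https://www.mybank.com/help">our help page</a>.</p>')
print(mismatched_links(body))
```

Only the first link is reported: its text claims www.mybank.com while the real target is a bare IP address. The second link's text is ordinary prose, so it is not compared.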
OBJECTIVES OF STUDY
The objectives of this work, which aims to detect phishing attacks, are to:
• Define phishing and identify various types of phishing scams.
• Determine and evaluate the best set of features for phishing email detection, using manual
feature selection based on the email structure as well as automated selection techniques.
• Understand how users can protect themselves from being hooked by a phishing scam.
• Determine the best classification algorithm for phishing detection.

OUTCOME
The user is notified when a website is a phishing site, or made aware that the website is safe.
PREVENTION
1. Employ common sense before handing over sensitive information.
2. Avoid clicking embedded links.
3. Keep your software and operating system up to date.
DURATION OF PROJECT
This project will run for 1 year and 6 months.
Starting:
Ending:

4. Architecture and working
1. Phishing Attack Architecture:
Fig 3. Architecture of phishing attack

2. How do we detect phishing attacks?
Fig 4. Working of phishing attack detection
Step 1: Dataset:
The first step in building the proposed phishing email classifier is choosing the suitable training
data set which is a real sample of existing emails that consists of both phishing and legitimate
emails (also known as spam and ham emails). The training data set will be used to discover
potentially predictive relationships that will serve as building blocks in the classifier. Our training
data set consists of 10538 emails, including 5940 ham emails from the SpamAssassin project [5] and
4598 spam emails from the Nazario phishing corpus [5].
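The report does not state how the 10538 emails are divided for training and testing. Assuming a stratified 80/20 split (a common choice, not taken from the source), the sketch below keeps the ham-to-spam proportion the same in both parts; `stratified_split` and the placeholder email IDs are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, test_fraction=0.2, seed=42):
    """Split (sample, label) pairs into train/test lists, keeping each class's proportion."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for y, items in by_class.items():
        items = items[:]  # copy so the caller's data is not reordered
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(s, y) for s in items[:n_test]]
        train += [(s, y) for s in items[n_test:]]
    return train, test

# Placeholder IDs standing in for the 5940 ham and 4598 spam emails described above.
samples = [f"ham_{i}" for i in range(5940)] + [f"spam_{i}" for i in range(4598)]
labels = ["ham"] * 5940 + ["spam"] * 4598
train, test = stratified_split(samples, labels)
print(len(train), len(test))
```

Stratifying matters here because the classes are not balanced (5940 ham versus 4598 spam), and a purely random split could skew the test set toward one class.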
Step 2: Pre-processing:
This is the first stage executed whenever an incoming mail is received, and it consists of
tokenization. Tokenization is a process that extracts the words from the body of an email,
transforming the message into its meaningful parts: it takes the email and divides it into a
sequence of representative symbols called tokens.
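A minimal tokenizer along the lines described above might look like this; the regular expression (runs of lower-case letters, digits and apostrophes) is an assumption, not the pattern used in the original work.

```python
import re

def tokenize(body):
    """Split an email body into lower-case word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", body.lower())

tokens = tokenize("Dear user, your account is LOCKED. Click the link to verify!")
print(tokens)
```

Lower-casing first means "LOCKED" and "locked" become the same token, which keeps the word-frequency features used later compact.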

Step 3: Feature Extraction:
The datasets used here are Spambase, available at https://archive.ics.uci.edu/ml/datasets/Spambase,
and personal mail data. The datasets other than the personal mails are already feature-extracted
and need not be reprocessed. The personal mails are available in raw format and hence need header
feature extraction. The personal mails at https://www.cs.cmu.edu/~./Enron/ [21], which are large
in number (about 0.5M), are feature-extracted first, then normalized, and then fed to a Weka server
for classification. The subject words in the email header can be analysed to see whether all
letters are capitals; if so, the email is likely to be spam, since spammers try to attract
attention by putting every letter in capitals. Cleverly written words, such as Money written as
M0ney, money, m o n e y, mooney, M O N E Y etc., are other tricks used by spammers and are taken
care of during pre-processing, where a Python script is used to segregate such emails as spam.
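The report's pre-processing script is not shown. The sketch below is a hypothetical reconstruction of the two checks it describes: an all-capitals subject test and normalisation of obfuscated words such as M0ney, m o n e y and mooney. The look-alike character map and the watch-word list are illustrative assumptions.

```python
import re

# Hypothetical mapping of look-alike digits/symbols back to letters.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "$": "s", "@": "a"})
WATCH_WORDS = ["money", "cash", "free"]  # illustrative list only

def subject_all_caps(subject):
    """True if the subject has at least one letter and every letter is upper case."""
    letters = [c for c in subject if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def normalize(text):
    """Undo common obfuscations: look-alike characters, spaced-out letters, repeated letters."""
    text = text.lower().translate(LOOKALIKES)
    # Join runs of single characters separated by spaces: "m o n e y" -> "money".
    text = re.sub(r"\b(?:\w )+\w\b", lambda m: m.group(0).replace(" ", ""), text)
    # Collapse a character repeated two or more times: "mooney" -> "money".
    return re.sub(r"(\w)\1+", r"\1", text)

def obfuscated_hits(text):
    """Return the watch words whose normalized form appears in the normalized text."""
    tokens = set(normalize(text).split())
    return [w for w in WATCH_WORDS if normalize(w) in tokens]

print(subject_all_caps("URGENT: CLAIM YOUR PRIZE"))
print(obfuscated_hits("Get M0ney now, easy m o n e y and mooney"))
```

Collapsing repeated letters also changes legitimate words ("free" becomes "fre"), which is why the watch words pass through the same `normalize` function before comparison.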
Fig 5. Pre-processing and Classification Process

Step 4: Detection of mail:
KNN (K-Nearest Neighbours) is one of the most straightforward supervised learning algorithms.
However, unlike traditional supervised learning algorithms such as Multinomial Naive Bayes, KNN
does not have a separate training stage followed by a stage in which the labels for the test data
are predicted from the trained model. Instead, the features of every test data item are compared
with the features of every training data item in real time, the K nearest training data items are
selected, and the most frequent class among them is assigned to the test data item.
In the context of email classification (spam or ham), the features to be compared are the
frequencies of words in each email. The Euclidean distance is used to determine the similarity
between two emails: the smaller the distance, the more similar they are.
Once the Euclidean distance between a test email and each training email is calculated, the
distances are sorted in ascending order (nearest to farthest), and the K nearest neighbouring
emails are selected. If the majority is spam, the test email is labelled as spam; otherwise, it is
labelled as ham.
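The steps above can be sketched in plain Python as follows. Each email becomes a word-frequency vector, the distance between emails with counts x and y is sqrt(sum over words w of (x_w - y_w)^2), and the label is a majority vote among the K nearest training emails. This is a minimal illustration assuming whitespace tokenization and a tiny hypothetical training set, not the project's actual code.

```python
import math
from collections import Counter

def word_freqs(text):
    """Word-frequency vector of an email, stored sparsely as a Counter."""
    return Counter(text.lower().split())

def euclidean(a, b):
    """Euclidean distance between two sparse word-frequency vectors."""
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in words))  # missing words count as 0

def knn_classify(test_email, training, k=5):
    """Label test_email by majority vote among its k nearest training emails.

    `training` is a list of (text, label) pairs with label "spam" or "ham".
    There is no separate training step: every call compares against all stored emails.
    """
    test_vec = word_freqs(test_email)
    distances = sorted(
        (euclidean(test_vec, word_freqs(text)), label) for text, label in training
    )  # ascending order: nearest first
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

training = [
    ("verify your bank account now", "spam"),
    ("your account is suspended verify now", "spam"),
    ("claim your free prize now", "spam"),
    ("team meeting moved to monday", "ham"),
    ("minutes of the project meeting", "ham"),
    ("lunch on monday with the team", "ham"),
]
print(knn_classify("please verify your account now", training))
print(knn_classify("project meeting on monday", training))
```

With the default K = 5, each test email here is decided by a 3-to-2 vote, similar to the Fig 6 example. An odd K avoids ties between the two classes, and because every prediction scans the whole training set, memory use and prediction time grow with the number of stored emails.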

Fig 6. Example of KNN
In the example shown in Fig 6, K = 5, so the email we want to classify is compared with its 5
nearest neighbours. In this case, 3 of the 5 neighbours are ham (non-spam) and 2 are spam, so the
majority vote labels the test email as ham.
Because it is simple and needs no separate training stage, we chose the KNN algorithm.

5. Applications, Advantages, Disadvantages
Application:
1. Internet fraud detection
2. Identity theft prevention
Advantages:
1. Helps secure the connection between the mail transfer agent and the mail user agent.
2. Reduces the level of cyber-threat risk.
3. Protects valuable corporate and personal data.
Disadvantages:
1. Requires a large mail server and a large amount of memory, since KNN compares each incoming
email with every stored training email.

6. Future Scope
Cloud service providers have already implemented a number of security features to proactively
identify phishing attacks. Machine learning, improved email filtering, and malicious URL
detection are just a handful of the capabilities that keep users safe on the web. Some providers
even warn users when they reply to emails outside their corporate domain, which is particularly
important in an enterprise setting. [4]
While cloud providers are often quick to recognize large-scale attacks and inform the public about
the right precautions to take when opening shared files, many individuals and organizations are
still subject to costly breaches. Educating users in best practices and making them aware of what
to look for can go a long way in protecting data; organizations must also take a proactive approach
to detecting these threats as they evolve. [4]
In future, the system could be integrated with email service providers so that phishing emails are
stopped before they reach the user, defeating the phisher's attack in advance.

7. Conclusion
We designed a proactive solution that identifies whether a given URL is malicious. From this study,
we conclude that several hooks, or signs of a phishing email, can indicate that an email is not as
genuine as it appears: an unfamiliar tone or greeting; grammar and spelling errors; inconsistencies
in email addresses, links and domain names; threats or a sense of urgency; and suspicious
attachments. To prevent phishing, it is important to learn about the tactics of phishers. Employees
should be trained in security awareness as part of their orientation and told to be wary of
e-mails with attachments from people they do not know. [4]
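Several of the hooks listed above can also be encoded as simple rules that complement a trained classifier. The sketch below checks three of them (urgency, inconsistent sender and reply-to domains, risky attachments); the phrase list, the extension list and the `red_flags` function are hypothetical choices for illustration.

```python
URGENCY_PHRASES = ("urgent", "immediately", "account suspended", "verify now", "within 24 hours")
RISKY_EXTENSIONS = (".exe", ".js", ".scr", ".vbs", ".zip")

def domain_of(address):
    """Domain part of an email address, lower-cased."""
    return address.rsplit("@", 1)[-1].strip().lower()

def red_flags(sender, reply_to, body, attachments):
    """Return a list of human-readable warning signs found in one email."""
    flags = []
    lowered = body.lower()
    if any(p in lowered for p in URGENCY_PHRASES):
        flags.append("threat or sense of urgency")
    if reply_to and domain_of(reply_to) != domain_of(sender):
        flags.append("inconsistent sender and reply-to domains")
    # str.endswith accepts a tuple, so one call checks every risky extension.
    if any(name.lower().endswith(RISKY_EXTENSIONS) for name in attachments):
        flags.append("suspicious attachment")
    return flags

print(red_flags(
    sender="support@mybank.com",
    reply_to="help@mybank-security.net",
    body="URGENT: your account suspended. Verify now.",
    attachments=["statement.pdf.exe"],
))
```

Rules like these are easy to explain to users during awareness training, but attackers can avoid fixed phrases, so they work best as extra signals alongside a learned model.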

References
1) https://www.frontiersin.org/articles/10.3389/fcomp.2021.563060/full
2) https://www.researchgate.net/figure/Stages-in-a-Phishing-attack_fig1_235947501
3) https://www.google.com/search?q=detection+of+phishing+attack+using+artificial+intelligence&oq=detection+of+phishing+attack+using+artificial+intelligence+&aqs=chrome..69i57j69i59j35i39.6311j0j7&sourceid=chrome&ie=UTF-8
4) https://www.google.com/
5) https://easychair.org/publications/preprint_open/Kpsq
6) https://towardsdatascience.com/spam-email-classifier-with-knn-from-scratch-python-6e68eeb50a9e
7) https://github.com/diegoocampoh/MachineLearningPhishing
8) https://www.enjoyalgorithms.com/blog/email-spam-and-non-spam-filtering-using-machine-learning/
