Phishing Website Detection Using Machine Learning

PHISHING WEBSITE DETECTION
USING MACHINE LEARNING
Thesis submitted to the SASTRA Deemed to be University in partial

fulfillment of the requirements for the award of the degree of
B. Tech. Electronics & Instrumentation Engineering
Submitted by
Padiri Lokesh
(Reg. No: 123006093)
Abhishek Dhanopia
(Reg. No: 123006904)
June 2023
SCHOOL OF ELECTRICAL & ELECTRONICS ENGINEERING

THANJAVUR, TAMIL NADU, INDIA – 613 401
Bonafide Certificate
This is to certify that the thesis titled “Phishing Website Detection using Machine
Learning”, submitted in partial fulfillment of the requirements for the award of the degree of B.Tech.
Electronics & Instrumentation Engineering to the SASTRA Deemed to be University, is a
bona fide record of the work done by Mr. Padiri Lokesh (Reg. No. 123006093) and Mr. Abhishek
Dhanopia (Reg. No. 123006904) during the academic year 2022-23, in the School of Electrical &
Electronics Engineering, under my supervision. This thesis has not formed the basis for the
award of any degree, diploma, associateship, fellowship or other similar title to any candidate
of any University.
Signature of Project Supervisor :
Name with Affiliation : Dr. V.S.Balaji,Senior Asst Professor,SEEE
Date :
Final Project Viva-voce held on____________________________
Examiner 1 Examiner 2
Declaration
We declare that the thesis titled “Phishing Website Detection using Machine Learning”
submitted by us is an original work done by us under the guidance of Dr. V.S. Balaji,
Senior Asst. Professor, School of Electrical and Electronics Engineering, SASTRA
Deemed to be University, during the final semester of the academic year 2022-23, in the
School of Electrical and Electronics Engineering. The work is original and wherever we
have used materials from other sources, we have given due credit and cited them in the text of
the thesis. This thesis has not formed the basis for the award of any degree, diploma,
associateship, fellowship or other similar title to any candidate of any University.
Signature of the candidates:
Name of the candidates: Padiri Lokesh

Abhishek Dhanopia
Date:
ACKNOWLEDGEMENTS
We would like to thank our Honorable Chancellor Prof. R. Sethuraman for providing us
with an opportunity and the necessary infrastructure for carrying out this project as a part of
our curriculum.
We would like to thank our Honorable Vice-Chancellor Dr. S. Vaidhyasubramaniam for

the encouragement and strategic support at every step of our college life.
We extend our sincere thanks to Dr. R. Chandramouli, Registrar, SASTRA Deemed to be

University for providing the opportunity to pursue this project.
We extend our heartfelt thanks to Dr. K. Thenmozhi, Dean, School of Electrical &
Electronics Engineering and Dr. A. Krishnamoorthy, Associate Dean, Electronics and
Instrumentation Engineering.
Our guide Dr. V.S.Balaji, Senior Asst. Professor , School of Electrical & Electronics
Engineering was the driving force behind this whole idea from the start. His deep insight in
the field and invaluable suggestions helped us in making progress throughout our project
work.
We also thank the project review panel members for their valuable comments and insights
which made this project better.
We would like to extend our gratitude to all the teaching and non-teaching faculties of the
School of Electrical & Electronics Engineering who have either directly or indirectly helped
us in the completion of the project.
We gratefully acknowledge all the contributions and encouragement from our families and
friends, which resulted in the successful completion of this project. We thank you all for providing
us an opportunity to showcase our skills through the project.
ABSTRACT
In the current era, most people are connected to the internet and to various social media
platforms, and keep their personal information stored on mobile phones and computers.
Intruders exploit this loophole by sending spam mail containing links. Many users, knowingly
or unknowingly, click these links, and phishers use them to steal their information. We
therefore provide phishing website detection using machine learning. We preprocess the
data and train the model on data collected from the way2 messages website and held in a
database, so that phishing can be predicted easily. Because conventional data mining is
unable to process this data, we create a web page where, after logging in, the user can run
phishing detection by building the model and viewing the prediction score. We apply various
machine learning algorithms and determine which algorithm is best suited to predicting the
results.
Specific Contribution:
• Collection of data
• Developed the programs for all algorithms
• Created a web interface (GUI) for the project; coding and report writing
Specific Learning:
• Understood the distinct security problems addressed by phishing website detection
• Designed different modules to implement the proposed approach and integrated
them to work efficiently
• Understood the backend algorithm
Name of the student: Padiri Lokesh signature of the Guide:
Registration no: 123006093
ABSTRACT
Phishing is one of the most common and dangerous attacks among cybercrimes. These
attacks aim to steal the information used by individuals and organizations. A phishing
website appears the same as the legitimate website and directs the user to a page where
the user enters personal details on the fake website. Phishing websites contain various
hints in their contents and in web browser-based information. Previous work in
phishing detection includes machine learning and fuzzy logic approaches. Machine learning
algorithms are capable of handling large datasets efficiently, and the performance of
machine learning-based techniques relies on the types of classifiers and features used. In
the proposed method, machine learning is used to implement a system that can detect
phishing websites. Thirty features of a website are considered.
A Django server handles the API calls from a browser extension
developed for user interaction. The user interacts with the system through the browser
extension: in the browser itself, the user gives the website URL and checks whether the
website is phishing or legitimate.
Specific Contribution:
• Studied the basic architecture required
• Acquired the data set and shortlisted the required data
• Stored the data in a MySQL database
• Trained and tested on the data set
Specific Learning:
• Understood the mechanism of phishing URLs
• Learned the MySQL and SQLyog tools
• Learned to train the required model using Python
Keywords: Phishing, legitimate website, Extreme Learning Machine, Features
classification, Information security.
Name of the student: Abhishek Dhanopia signature of the Guide:
Registration no:123006904
TABLE OF CONTENTS
BONAFIDE CERTIFICATE
DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
1. INTRODUCTION
1.1 BRIEF OF PHISHING
1.2 TYPES OF PHISHING
1.3 TYPICAL PHISHING
1.4 EXISTING SYSTEM
1.5 PROPOSED SYSTEM
2. LITERATURE SURVEY
3. OBJECTIVE
4. METHODOLOGY
4.1 Implementation
4.2 Features
4.3 Address bar-based features
4.4 Abnormal-based features
5. RESULTS AND DISCUSSION
6. CONCLUSIONS AND FUTURE PLANS
7. REFERENCES
7.1 Similarity Check Report
LIST OF FIGURES
FIGURE NO    TITLE
4.1          Use case diagram
4.2          Class diagram
4.3          Sequence diagram
4.4          Collaboration diagram
5.1          GUI of home page
5.2          Load of data set
5.3          View of data set
5.4          Training of data set
5.5          Graph of various algorithms
5.6          Prediction of results
LIST OF TABLES
TABLE NO     TABLE NAME
2.1          Literature survey
4.1          Address bar based features and its condition
4.2          Abnormal based features and its condition
4.3          Domain based features and its condition
ABBREVIATIONS
ML - Machine Learning
DL - Deep Learning
HTTPS - Hypertext Transfer Protocol Secure
URL - Uniform Resource Locator
DNS - Domain Name System
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION TO PHISHING
Phishing is defined as the attempt to obtain sensitive information such as usernames, passwords,
and credit card details, often for malicious reasons, by masquerading as a trustworthy entity in an
electronic communication. Trying to get unsuspecting users to give up their money, credentials or
privacy is a particularly insidious form of social engineering that can have disastrous effects on
people's lives.
The word phishing is an evolution of the word fishing by hackers, who frequently replace the
letter 'f' with the letters 'ph' in a typed hacker dialect. The word arises from the fact that
users, or phish, are lured by the mimicked communication to a trap or hook that retrieves
their confidential information.
In the last few years, there has been an alarming trend of an increase in both the number
and sophistication of phishing attacks. As the definition suggests, phishing is a novel
cross-breed of social engineering and technical attacks designed to elicit confidential information
from the victim. The collected information is then used for several nefarious deeds including
fraud, identity theft, and corporate espionage. The growing frequency and success of these
attacks have led several researchers and corporations to take the problem seriously. They
have attempted to address it by considering new countermeasures and researching
novel techniques to prevent phishing.
1.2 Types of phishing attacks:
Despite their many varieties, the common denominator of all phishing attacks is their use of
a fraudulent pretense to acquire valuables. Some major categories are shown in
Figure 1.1, and the different types of phishing attacks are described below:
Figure 1.1: Types of Phishing Attacks
Spear Phishing:
Spear phishing is a targeted phishing attack that involves highly customized lure
content. To perform spear phishing, attackers will typically do reconnaissance work,
surveying social media and other information sources about their intended target.
Spear phishing may involve users logging into fake websites and opening documents
by clicking on links that automatically install malware.
Whaling:
Whaling is a form of phishing in which the attack is directed at high-level or senior
executives within specific companies with the direct goal of gaining access to their
credentials and/or bank information. The content of the email may be written as a
legal subpoena, customer complaint, or other executive issue.
Clone phishing:
Clone phishing is a type of phishing attack whereby a legitimate, and previously delivered,
email containing an attachment or link has had its content and recipient addresses taken and
used to create an almost identical or cloned email. The attachment or link within the email is
replaced with a malicious version and then sent from an email address spoofed to appear to
come from the original sender. It may claim to be a resend of the original or an updated
version of the original. Typically this requires either the sender or recipient to have been
previously hacked for the malicious third party to obtain the legitimate email.
Link manipulation
Misspelled URLs or the use of subdomains are common tricks used by phishers.
Even digital certification does not solve this problem because it is quite possible for a phisher
to purchase a valid certificate and subsequently change content to spoof.
Filter evasion
Phishers have sometimes used images instead of text to make it harder for anti-phishing filters to
detect the text commonly used in phishing emails.
Website forgery
Some phishing scams use JavaScript commands to alter the address bar of the website. The
fraudulent website that supports the phishing email is designed to mirror the legitimate
website it is purporting to be. The fraudsters use multiple methods to do this, including using
genuine-looking images and text, disguising the URL in the address bar, or removing the
address bar altogether. The purpose of the website is to trick consumers into thinking they
are at the company's genuine website and giving their personal information to the trusted
company they think they are dealing with.
Covert redirect
Covert redirect is a subtle method of performing phishing attacks that makes links appear
legitimate but redirects a victim to an attacker's website. The flaw is usually masqueraded
under a log-in popup based on an affected site's domain.
Normal phishing attempts can be easy to spot because the malicious page's URL will usually
be different from the real site link. For covert redirect, an attacker could use a real website
instead by corrupting the site with a malicious login popup dialogue box.
Social engineering:
Users can be encouraged to click on various kinds of unexpected content for a variety of
technical and social reasons.
For example, a malicious attachment might masquerade as a benign linked Google Doc.
Alternatively, users might be outraged by a fake news story, click a link, and become
infected.
Voice phishing:
Not all phishing attacks require a fake website. Messages that claim to be from a bank
tell users to dial a phone number regarding problems with their bank accounts.
1.3 A TYPICAL PHISHING ATTACK:
Currently, the most common form of phishing attacks includes three key components: the
lure, the hook, and the catch. They are as described below.
The Lure consists of a phisher spamming a large number of users with an email message
that typically appears, in a convincing way, to be from some legitimate institution that has a
presence on the internet. The message often uses a convincing story to encourage the user
to follow a URL hyperlink encoded in the email to a website controlled by the phisher and to
provide it with certain requested information. The social engineering aspect of the attack
normally makes itself known in the lure, as the spam gives some legitimate-sounding reason
for the user to supply confidential information to the website that is hyperlinked by the spam.
The Hook typically consists of a website that mimics the look and feel of a
legitimate target institution. In particular, the site is designed to be as indistinguishable from
the target's as possible. The purpose of the hook is for victims to be directed to it via the lure
portion of the attack and for the victims to disclose confidential information to the site.
Examples of the type of confidential information that is often harvested include usernames,
passwords, social security numbers in the U.S. (or other national ID numbers in other parts
of the world), billing addresses, checking account numbers, and credit card numbers. The
Hook website is generally designed both to convince the victim of its legitimacy and to
encourage the victim to provide confidential information to it with as little suspicion on the
victim's part as possible.
The Catch is the third portion of the phishing attack, which some alternatively call the kill. It
involves the phisher or a cashier making use of the collected information for some nefarious
purpose such as fraud or identity theft.
1.4 EXISTING SYSTEM:
In the existing system, detection relies on manual human intervention, which is
not widely applicable and is error-prone.
Legacy and conventional data mining algorithms cannot deal with huge volumes of
data, and are slower and less accurate.
1.5 PROPOSED SYSTEM:
Machine learning is a cutting-edge and trending approach for many kinds of diverse
applications in society, because it can deal with very large volumes of data, draws on refined
and revised algorithms, and benefits from the heavy processing power now available in terms of GP.
Architecture:
CHAPTER 2
Literature Survey
Table 2.1: Literature survey

Author: J. Shad and S. Sharma
Year of publication: 2021
Outcomes: Hyperlink features can be used to differentiate between defective and
non-defective websites. There are six main approaches: heuristic, blacklist, fuzzy
rule, machine learning, image processing, and the CANTINA-based approach.

Author: Y. Sönmez, T. Tuncer, H. Gökal, and E. Avci
Year of publication: 2019
Outcomes: The objective of phishing website URLs is to purloin personal information
such as user names, passwords, and online banking transactions. Phishers use
websites that are visually and semantically similar to the real websites.

Author: T. Peng, I. Harris, and Y. Sawa
Year of publication: 2008
Outcomes: Phishing attacks are one of the most common and least defended security
threats today. The approach is novel compared to previous work because it focuses on
the natural language text contained in the attack, performing semantic analysis of
the text to detect malicious intent.
CHAPTER 3
OBJECTIVE
The public faces difficulties when clicking URLs.
There is a need for a solution using digital technologies such as machine learning.
Objectives: Create a website platform using Python and ML algorithms.
The platform will compare ML algorithms to find the best accuracy.
The platform will return the status of the URL, predicting whether the website is
phishing or legitimate.
CHAPTER-4
METHODOLOGY
4.1 IMPLEMENTATION
To implement the phishing website detection system, a dataset is collected. The dataset is
collected from PhishTank, where a list of phishing websites can be obtained. From
PhishTank, 30,647 phishing websites are obtained. The list of legitimate websites is obtained
from the Alexa ranking website, from which 58,000 legitimate websites are obtained.
The other source is the UCI repository, which contains a list of 11,045 websites including
both legitimate and phishing websites.
4.2 Features
The dataset is maintained with the 30 features of the websites. The features of the website
that are considered are classified as follows:
Address bar-based features
Abnormal-based features
HTML and JavaScript-based features
Domain-based features
4.3 Address bar-based features:
The address bar-based features can be retrieved by analyzing the URL of any website. There
are about 12 address bar-based features, which are described in this section. The basic
structure of any URL is in the format below:
protocol://subdomain.domainname.countrycode/directory/filename
The URL is the first thing to analyze when deciding whether a website is a phishing website
or not. The URLs of phishing domains have some distinctive points. Features related to these
points are obtained when the URL is processed. Some of the URL-based features are given below.
Digit count in the URL.
The total length of the URL.
Checking whether the URL is typosquatting or not.
Checking whether it includes a legitimate brand name or not.
Using the IP address: If an IP address, possibly written in hexadecimal code, is used in place
of the domain name, the URL is not considered legitimate. The rule for this is given in Table 4.1.
Long URL: Attackers use long URLs to hide the suspicious part in the address bar.
Tiny URL: URL shortening is a method on the World Wide Web in which a URL may be
made considerably smaller in length and still lead to the required webpage. This is
accomplished by means of an HTTP redirect on a short domain name, which links
to the webpage that has a long URL.
URLs having the "@" symbol: Using the "@" symbol in the URL leads the browser to ignore
everything preceding the "@" symbol, and the real address often follows the "@" symbol.
The occurrence of the "//" symbol, a prefix or suffix separated by the (-) symbol, the number
of dots, etc. also come under the address bar-based features.
FEATURE                                  CONDITION
Using IP address                         IF domain part has an IP address → phishing
                                         Otherwise → legitimate
Long URL                                 IF URL length < 54 → legitimate
                                         Else if URL length ≥ 54 but below the phishing cutoff → suspicious
                                         Otherwise → phishing
Tiny URL                                 IF tiny URL → phishing
                                         Otherwise → legitimate
URL having "@" symbol                    IF URL has the "@" symbol → phishing
                                         Otherwise → legitimate
Redirecting using "//"                   IF position of the last occurrence of "//" in the URL > 7 → phishing
                                         Otherwise → legitimate
Adding prefix or suffix separated        IF domain part includes the (-) symbol → phishing
by (-) to the domain                     Otherwise → legitimate
Subdomain and multi subdomain            IF dots in domain part = 1 → legitimate
                                         Else if dots in domain part = 2 → suspicious
                                         Otherwise → phishing
HTTPS                                    IF HTTPS is used, the issuer is trusted and age of certificate > 1 year → legitimate
                                         Else if HTTPS is used and the issuer is not trusted → suspicious
                                         Otherwise → phishing
Domain registration length               IF domain expires in < 1 year → phishing
                                         Otherwise → legitimate
Favicon                                  IF favicon loaded from an external domain → phishing
                                         Otherwise → legitimate
Using non-standard port                  IF port is of the preferred status → phishing
Existence of "HTTPS" in domain part      IF HTTP token is used in the domain part of the URL → phishing
Table 4.1: Address bar based features and its condition
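Several of the rules in Table 4.1 depend only on the URL string, so they can be sketched without any network access. The following is a minimal illustration in Python, not the implementation used in this project. The function and constant names are hypothetical, the list of shortening hosts is a small made-up sample, stripping a leading "www." before counting dots is an assumption, and the upper limit of 75 for a "suspicious" URL length is a placeholder because the table does not state it.

```python
import ipaddress
import re
from urllib.parse import urlparse

LEGITIMATE, SUSPICIOUS, PHISHING = "legitimate", "suspicious", "phishing"

# Assumption: Table 4.1 does not state where "suspicious" ends for URL length;
# 75 is only a placeholder value for this sketch.
SUSPICIOUS_LENGTH_LIMIT = 75

# Hypothetical, deliberately incomplete list of URL-shortening hosts.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}

# Hex-encoded IPv4 host such as 0x58.0xcc.0xca.0x62 (host is lowercased first).
HEX_IPV4 = re.compile(r"(0x[0-9a-f]{1,2}\.){3}0x[0-9a-f]{1,2}")


def host_is_ip(host: str) -> bool:
    """True for IPv4/IPv6 literals and for hex-encoded IPv4 hosts."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return HEX_IPV4.fullmatch(host) is not None


def address_bar_features(url: str) -> dict:
    """Apply a subset of the Table 4.1 rules to a single URL."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: a leading "www." is ignored when counting dots.
    bare_host = host[4:] if host.startswith("www.") else host
    dots = bare_host.count(".")

    if len(url) < 54:
        length_label = LEGITIMATE
    elif len(url) <= SUSPICIOUS_LENGTH_LIMIT:
        length_label = SUSPICIOUS
    else:
        length_label = PHISHING

    if dots == 1:
        subdomain_label = LEGITIMATE
    elif dots == 2:
        subdomain_label = SUSPICIOUS
    else:
        subdomain_label = PHISHING

    return {
        "using_ip": PHISHING if host_is_ip(host) else LEGITIMATE,
        "long_url": length_label,
        "tiny_url": PHISHING if host in SHORTENER_HOSTS else LEGITIMATE,
        "at_symbol": PHISHING if "@" in url else LEGITIMATE,
        # Position of the last "//"; in a plain http(s) URL it is 5 or 6.
        "redirect": PHISHING if url.rfind("//") > 7 else LEGITIMATE,
        "prefix_suffix": PHISHING if "-" in host else LEGITIMATE,
        "subdomain": subdomain_label,
    }
```

For example, `address_bar_features("http://example.com//redirect")` flags the "redirect" feature, because the last "//" appears after the protocol part. The rules that need certificate, WHOIS or page content (HTTPS, domain registration length, favicon, port) are left out of this sketch.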
4.4 Abnormal based features
Unusual features such as the request URL, links, and submitting
information to email all come under the abnormal-based
features.
FEATURE                                  CONDITION
Request URL                              IF % of request URL < 22% → legitimate
                                         Else if % of request URL is between 22% and 61% → suspicious
                                         Otherwise → phishing
URL of anchor                            IF % of URL of anchor < 31% → legitimate
                                         Else if % of URL of anchor is between 31% and 67% → suspicious
                                         Otherwise → phishing
Links in <meta>, <script> and            IF % of links < 17% → legitimate
<link> tags                              Else if % of links is between 17% and 18% → suspicious
                                         Otherwise → phishing
Server form handler (SFH)                IF SFH is "about:blank" or is empty → phishing
                                         Otherwise → legitimate
Submitting information to email          IF the mail() or mailto: function is used to submit user information → phishing
Abnormal URL                             IF the hostname is not included in the URL → phishing
Table 4.2: Abnormal based features and its condition
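The percentage rules above share one shape: a value below a lower threshold is legitimate, a value in the middle band is suspicious, and anything higher is phishing. The sketch below illustrates that shape in Python under stated assumptions. It assumes that "% of request URL" means the share of page resources loaded from a host other than the page's own host, which the table does not spell out; relative URLs are counted as same-host; and all function names are hypothetical.

```python
from urllib.parse import urlparse

LEGITIMATE, SUSPICIOUS, PHISHING = "legitimate", "suspicious", "phishing"


def external_percentage(resource_urls, page_url):
    """Percentage of resource URLs whose host differs from the page's host.

    Assumption: relative URLs (no host) count as same-host resources.
    """
    if not resource_urls:
        return 0.0
    page_host = urlparse(page_url).hostname
    external = sum(
        1 for u in resource_urls
        if urlparse(u).hostname not in (None, page_host)
    )
    return 100.0 * external / len(resource_urls)


def three_way(pct, low, high):
    """Map a percentage to a label using a (low, high) threshold pair.

    The table is ambiguous at the exact boundaries; this sketch treats
    low <= pct <= high as suspicious.
    """
    if pct < low:
        return LEGITIMATE
    if pct <= high:
        return SUSPICIOUS
    return PHISHING


def sfh_label(form_action):
    """Server form handler rule: empty or about:blank is phishing."""
    action = form_action.strip().lower()
    return PHISHING if action in ("", "about:blank") else LEGITIMATE


def email_submit_label(form_action):
    """Flag forms that submit through mailto:.

    A server-side mail() call is not visible in the page and is not checked here.
    """
    if form_action.strip().lower().startswith("mailto:"):
        return PHISHING
    return LEGITIMATE
```

With the thresholds from the table, the request URL rule becomes `three_way(pct, 22, 61)` and the anchor rule becomes `three_way(pct, 31, 67)`. Extracting the resource and anchor URLs from the HTML is a separate step that this sketch leaves out.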
4.5 Domain-based features:

The features related to the domain of the website come under this category. To retrieve
these kinds of features, we depend on third-party websites. The purpose of phishing
domain detection is to detect phishing domain names. Therefore, passive queries about the
domain names that we want to classify as phishing or not provide useful information to us.
Some useful domain-based features are given below.
Is the domain name or its IP address in the blacklists of well-known reputation services?
How many days have passed since the domain was registered?
Is the registrant name hidden?
Page-based features use information about pages that is calculated by reputation
ranking services. Some of these features tell us how reliable a
website is. Examples are page rank and global rank.
The table below lists the domain-based features and their conditions. To retrieve these
features for any website, the application must be online.
FEATURE                                  CONDITION
Age of domain                            IF age of domain > 6 → legitimate
                                         Otherwise → phishing
DNS record                               IF no DNS record for the domain → phishing
                                         Otherwise → legitimate
Website traffic                          IF website rank < 1000000 → legitimate
                                         IF website rank > 1000000 → suspicious
Page rank                                IF page rank < 0.2 → phishing
Google index                             IF webpage indexed by Google → legitimate
Number of links pointing to page         IF links pointing to webpage = 0 → phishing
                                         IF links pointing to webpage > 0 and ≤ 2 → suspicious
                                         Otherwise → legitimate
Statistical reports based on features    IF host belongs to a phishing IP → phishing
Table 4.3: Domain based features and its condition
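Because these features come from third-party services, it can help to separate the lookup step from the rule step. The sketch below assumes the lookups have already been done and only applies three of the rules from the table to the returned values. The `DomainLookup` class and its field names are hypothetical, and where the table gives no outcome (for example, when no rank is available) the sketch returns `None` rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional

LEGITIMATE, SUSPICIOUS, PHISHING = "legitimate", "suspicious", "phishing"


@dataclass
class DomainLookup:
    """Values assumed to be fetched beforehand from third-party services."""
    has_dns_record: bool
    website_rank: Optional[int]      # None when no rank is available
    links_pointing_to_page: int


def dns_record_label(info: DomainLookup) -> str:
    """No DNS record for the domain is treated as phishing."""
    return LEGITIMATE if info.has_dns_record else PHISHING


def website_traffic_label(info: DomainLookup) -> Optional[str]:
    """Rank rule from the table; None where the table gives no outcome."""
    rank = info.website_rank
    if rank is None:
        return None
    if rank < 1_000_000:
        return LEGITIMATE
    if rank > 1_000_000:
        return SUSPICIOUS
    return None


def inbound_links_label(info: DomainLookup) -> str:
    """Zero links is phishing, one or two is suspicious, more is legitimate."""
    n = info.links_pointing_to_page
    if n == 0:
        return PHISHING
    if n <= 2:
        return SUSPICIOUS
    return LEGITIMATE
```

Keeping the rules as pure functions of an already-fetched record means they can be checked offline, while the online lookups can be replaced or cached independently.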
Input Design:
In an information system, input is the raw data that is processed to produce output. During
input design, the developers must consider the input devices such as PC, MICR, OMR,
etc.
The quality of system input determines the quality of system output. Well-designed
input forms and screens have the following properties −
• They should serve a specific purpose effectively, such as storing, recording, and retrieving
the information.
• They ensure proper completion with accuracy.
• They should be easy to fill in and straightforward.
• They should focus on the user's attention, consistency, and simplicity.
• All these objectives are obtained using the knowledge of basic design principles
regarding −
o What are the inputs needed for the system?
o How end users respond to different elements of forms and screens.
Objectives for Input Design:
The objectives of input design are −
• To design data entry and input procedures
• To reduce input volume
• To design source documents for data capture or devise other data capture methods
• To design input data records, data entry screens, user interface screens, etc.
• To use validation checks and develop effective input controls.
Output Design:
The design of output is the most important task of any system. During output design,
developers identify the type of outputs needed, and consider the necessary output controls
and prototype report layouts.
Objectives of Output Design:
The objectives of output design are:
• To develop an output design that serves the intended purpose and eliminates the
production of unwanted output.
• To develop an output design that meets the end user's requirements.
• To deliver the appropriate quantity of output.
• To form the output in an appropriate format and direct it to the right person.
• To make the output available on time for making good decisions.
MODULES:
1. User:
1.1 View Home page:
Here the user views the home page of the phishing website prediction web application.
1.2 View Upload page:
In the about page, users can learn more about the phishing prediction.
1.3 Input Model:
The user must provide input values for certain fields in order to get results.
1.4 View Results:
The user views the results generated by the model.
1.5 View score:
Here the user can view the score in %.
2. System
2.1 Working on dataset:
The system checks whether the data is available and loads the data from CSV files.
2.2 Pre-processing:
The data needs to be pre-processed according to the models. This helps to increase the
accuracy of the model and gives better information about the data.
2.3 Training the data:
After pre-processing, the data is split into two parts, train and test data, before
training with the given algorithms.
2.4 Model Building:
This module helps the user to create a model that predicts phishing websites with better
accuracy.
2.5 Generated Score:
Here the user views the score in %.
2.6 Generate Results:
We train the machine learning algorithm and calculate the phishing prediction.
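The system modules above (split into train and test data, train each algorithm, report a score in %) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's code. The two "models" are simple stand-ins chosen for this sketch, since the algorithms compared in the project are not named here; in practice a library such as scikit-learn would normally supply the classifiers. The label encoding (1 for phishing, -1 for legitimate) and the synthetic data generator are also assumptions.

```python
import random

PHISHING, LEGITIMATE = 1, -1  # assumption: label encoding chosen for this sketch


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (features, label) rows and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class MajorityVote:
    """Stand-in model: phishing when the rule outputs sum to a positive value."""
    name = "majority vote"

    def fit(self, rows):
        return self  # nothing to learn

    def predict(self, features):
        return PHISHING if sum(features) > 0 else LEGITIMATE


class BestSingleRule:
    """Stand-in model: keep the one feature that is most accurate on training data."""
    name = "best single rule"

    def fit(self, rows):
        def hits(i):
            return sum(
                1 for f, y in rows
                if (PHISHING if f[i] > 0 else LEGITIMATE) == y
            )
        self.index = max(range(len(rows[0][0])), key=hits)
        return self

    def predict(self, features):
        return PHISHING if features[self.index] > 0 else LEGITIMATE


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for f, y in rows if model.predict(f) == y) / len(rows)


def compare(models, rows, seed=0):
    """Train every model on the same split and return test accuracy per model."""
    train, test = train_test_split(rows, seed=seed)
    return {m.name: accuracy(m.fit(train), test) for m in models}


def make_toy_rows(n=40, seed=1):
    """Synthetic data: feature 0 equals the label, the other four are noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([PHISHING, LEGITIMATE])
        noise = [rng.choice([1, 0, -1]) for _ in range(4)]
        rows.append(([label] + noise, label))
    return rows
```

Calling `compare([MajorityVote(), BestSingleRule()], make_toy_rows())` returns a dictionary of accuracies, which can be multiplied by 100 to give the score in % and used to pick the best-performing algorithm. Using the same split for every model keeps the comparison fair.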
Figure 4.1: Use case diagram
Figure 4.2: Class diagram
Figure 4.3: Sequence diagram
Figure 4.4: Collaboration diagram
CHAPTER-5
RESULTS AND DISCUSSION

5.1 Home Page:
Here the user views the home page of the phishing website prediction web application.
5.2 Load:
On the load page, users can load the website dataset.
5.3 View:
Here we can see the uploaded dataset.
5.4 Model:
Here we can train on our data using different algorithms.
5.5 Graphs:
5.6 Prediction:
This interface shows the detection result, that is, whether the website is a phishing website or
legitimate.
CHAPTER-6
CONCLUSION AND FUTURE SCOPE

In this project, a phishing website detection method is implemented with the help of
machine learning techniques. The proposed system is able to distinguish phishing websites
from legitimate websites. The browser extension developed lets the user interact easily
with the system. Because the server is deployed on the internet, the user
does not need to be concerned with the server and simply interacts with the browser extension. The
browser extension connects to the server specified in the extension. On the server, the
classifier, which is trained beforehand on the collected dataset, makes the prediction from the
given features.
As future scope, further research can be done on the features, improving the
feature list by adding new features or by editing the existing ones. A better
machine learning model may be obtained by collecting a larger
dataset of phishing sites and legitimate sites than the one used here. A
mobile application that can detect phishing websites would also be helpful.
CHAPTER-7
REFERENCES
1. J. Shad and S. Sharma, “A Novel Machine Learning Approach to Detect Phishing
Websites Jaypee Institute of Information Technology,” pp. 425–430, 2018.
2. Y. Sönmez, T. Tuncer, H. Gökal, and E. Avci, “Phishing web sites features
classification based on extreme learning machine,” 6th Int. Symp. Digit. Forensic
Secur. ISDFS 2018 - Proceeding, vol. 2018–Janua, pp. 1–5, 2018.
3. T. Peng, I. Harris, and Y. Sawa, “Detecting Phishing Attacks Using Natural Language
Processing and Machine Learning,” Proc. - 12th IEEE Int. Conf. Semant. Comput.
ICSC 2018, vol. 2018–Janua, pp. 300–301, 2018.
4. M. Karabatak and T. Mustafa, “Performance comparison of classifiers on reduced
phishing website dataset,” 6th Int. Symp. Digit. Forensic Secur. ISDFS 2018 -
Proceeding, vol. 2018–Janua, pp. 1–5, 2018.
5. S. Parekh, D. Parikh, S. Kotak, and P. S. Sankhe, “A New Method for Detection of
Phishing Websites: URL Detection,” in 2018 Second International Conference on
Inventive Communication and Computational Technologies (ICICCT), 2018, vol. 0,
no. Icicct, pp. 949–952.
6. K. Shima et al., “Classification of URL bitstreams using bag of bytes,” in 2018 21st
Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN),
2018, vol. 91, pp. 1–5.
7. A. Vazhayil, R. Vinayakumar, and K. Soman, “Comparative Study of the Detection of
Malicious URLs Using Shallow and Deep Networks,” in 2018 9th International
Conference on Computing, Communication and Networking Technologies, ICCCNT
2018, 2018, pp. 1–6.
8. W. Fadheel, M. Abusharkh, and I. Abdel-Qader, “On Feature Selection for the
Prediction of Phishing Websites,” 2017 IEEE 15th Intl Conf Dependable, Auton.
Secur. Comput. 15th Intl Conf Pervasive Intell. Comput. 3rd Intl Conf Big Data Intell.
Comput. Cyber Sci. Technol. Congr., pp. 871–876, 2017.
9. X. Zhang, Y. Zeng, X. Jin, Z. Yan, and G. Geng, “Boosting the Phishing Detection
Performance by Semantic Analysis,” 2017.
10. L. MacHado and J. Gadge, “Phishing Sites Detection Based on C4.5 Decision Tree
Algorithm,” in 2017 International Conference on Computing, Communication,
Control and Automation, ICCUBEA 2017, 2018, pp. 1–5.
CHAPTER-7
APPENDIX
TITLE
Phishing website detection using Machine Learning

URL OF CODE
https://drive.google.com/file/d/1wmrxTIRsBvYr9-EdsSYQm5m5_BcxEAN6/view?usp=share_link
7.1 Similarity Check Report
