
Government Polytechnic Khamgaon

Computer Department

Capstone Project Execution

on
Phishing Website Detection System
Presented By:

2100210097 27 Aasawari Kshirsagar
2100210098 28 Rasika Majgaonkar
2100210112 37 Vaishnavi Sable
2100210125 49 Tanvi Wankhede

Guided By: Prof. V. M. Bande
CONTENTS

• Introduction
• Project Overview
• Problem Statement
• Methodology
• Tools and Technologies
• Project Plan and Timeline
• Models Implemented
• Result Analysis
• Demo
• Challenges Encountered
• Lessons Learned
• Conclusion
• References
INTRODUCTION

• A phishing website attack is a type of cyber threat in which attackers create deceptive websites that mimic legitimate ones in order to trick users into divulging sensitive information.

• The ultimate goal of a phishing attack is to exploit the victim's trust and obtain sensitive information that can be used for fraudulent activities, unauthorized access, or identity theft.

• Phishing website detection uses machine learning techniques to identify and block phishing websites.
PROJECT OVERVIEW

• The Phishing Website Detection project aims to create a robust system that accurately identifies whether a user-entered website is a phishing website or not.

• The project aims to improve the accuracy of identifying phishing websites compared to existing models, addressing the growing social issue of increased phishing attacks despite strong security measures.

• The ultimate objective is to help overcome this social problem by implementing a highly effective phishing detection system.
PROBLEM STATEMENT
METHODOLOGY

01 Data Collection
Utilized the Kaggle dataset as our primary source of data. Preprocessed the dataset by handling missing values, removing duplicates, and normalizing features to ensure data quality and consistency.

02 Feature Extraction
We focus on feature selection and engineering, carefully choosing features that are highly indicative of phishing behavior. By selecting and engineering these features, we aim to provide our models with the information they need to make accurate predictions.

03 Model Implementation and Training
Implemented machine learning models using the selected features and trained them on the preprocessed dataset to learn the patterns and relationships between the features and phishing behavior.

04 Deployment and Monitoring
Used Python Flask to integrate the model with the website so that it can be deployed on the Internet under its own domain. The software can also be embedded in browsers by providing a browser extension.
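As a rough illustration of steps 01 and 02, the sketch below turns raw URLs into numeric features and then applies the preprocessing described above (removing duplicates, filling missing values, normalizing). The six URL features, the helper names `extract_url_features` and `preprocess`, and the sample URLs are hypothetical choices for illustration; the slides do not list the features of the Kaggle dataset the project actually used.

```python
import ipaddress
from urllib.parse import urlparse

import pandas as pd


def is_ip_address(host: str) -> bool:
    """Return True when the host is a raw IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Map one URL to a few numeric phishing indicators (hypothetical feature choice)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "has_at_symbol": int("@" in url),
        "uses_ip_host": int(is_ip_address(host)),
        "uses_https": int(parsed.scheme == "https"),
        "hyphen_in_domain": int("-" in host),
    }


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, fill missing values with the column median,
    and min-max normalize every column to the range [0, 1]."""
    df = df.drop_duplicates().copy()
    df = df.fillna(df.median(numeric_only=True))
    for col in df.columns:
        lo, hi = df[col].min(), df[col].max()
        # A constant column would divide by zero, so it is set to 0.0 instead.
        df[col] = (df[col] - lo) / (hi - lo) if hi > lo else 0.0
    return df


urls = [
    "https://www.example.com/login",
    "http://192.168.0.1/secure-update@bank",
    "http://paypal-verify.example.net/account",
    "https://www.example.com/login",  # exact duplicate, dropped by preprocess()
]
features = pd.DataFrame([extract_url_features(u) for u in urls])
clean = preprocess(features)
print(clean)
```

Min-max scaling is only one option; it keeps every feature in [0, 1], which matters most for the distance-based KNN model.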
TOOLS AND TECHNOLOGIES

• Anaconda Environment with Python: Anaconda provides a convenient environment for managing Python packages and dependencies.

• Python Flask, HTML, CSS, JS: For designing the user interface and for backend integration.

• Machine Learning with Python Libraries: For training our models using Scikit-learn's algorithms and evaluating their performance. Once trained, the model is integrated into our Flask application to perform real-time detection.
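Integrating a trained model into Flask for real-time detection could look roughly like the sketch below. The `/predict` route, the JSON request shape, the four-element feature vector and `PlaceholderModel` are all hypothetical; in a real deployment the placeholder would be replaced by the trained classifier (for example one saved and reloaded with `joblib`), and the feature extraction must match what the model was trained on.

```python
from urllib.parse import urlparse

from flask import Flask, jsonify, request

app = Flask(__name__)


def extract_features(url: str) -> list:
    """Hypothetical feature vector; it must match the features the model was trained on."""
    host = urlparse(url).hostname or ""
    return [len(url), url.count("."), int("@" in url), int("-" in host)]


class PlaceholderModel:
    """Stand-in for the trained classifier so the sketch runs without a model file.

    It flags a URL when it contains '@' or its host contains a hyphen. A real
    deployment would use the trained model here instead.
    """

    def predict(self, rows):
        return [int(row[2] == 1 or row[3] == 1) for row in rows]


model = PlaceholderModel()


@app.route("/predict", methods=["POST"])
def predict():
    """Accept {"url": "..."} as JSON and return the model's verdict."""
    data = request.get_json(silent=True)
    url = data.get("url", "") if isinstance(data, dict) else ""
    if not url:
        return jsonify({"error": "missing 'url'"}), 400
    label = model.predict([extract_features(url)])[0]
    return jsonify({"url": url, "phishing": bool(label)})


# Exercise the endpoint without starting a server; app.run() would serve it for real.
client = app.test_client()
print(client.post("/predict", json={"url": "http://paypal-verify.example.net"}).get_json())
```

The HTML/JS front end would then send the user-entered URL to this endpoint and display the returned verdict.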
EXECUTION PLAN
Week 1 (1/1/24 to 7/1/24): We created the user interface for our project.

Week 2 (8/1/24 to 7/1/24): We studied different datasets and decided which dataset to use.

Week 5 (7/2/24 to 6/3/24): We trained all the selected models on the best possible features.

Week 9 (6/3/24 to 20/3/24): We tested all the implemented models and selected the best-performing model.

Week 11 (20/3/24 to 31/3/24): We completed the documentation.
CHALLENGES ENCOUNTERED

1. Finding a Suitable Environment

Anaconda proved to be one of the best environments because it ships with most of the libraries we needed preinstalled, such as scikit-learn and pandas.

2. Accuracy

We used hyperparameter tuning to increase the accuracy of the algorithms and to identify the parameter values that contribute most to accuracy.
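The slides do not say how the tuning was carried out; grid search with cross-validation is one common approach and is sketched below with scikit-learn's `GridSearchCV`. The synthetic data from `make_classification`, the Random Forest estimator and the parameter grid are assumptions for illustration, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the project itself used a Kaggle phishing dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative search space: every combination below is evaluated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,  # 5-fold cross-validation on the training split only
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 4))
# The refitted best model is scored once on data it never saw during tuning.
print("Held-out test accuracy:", round(search.score(X_test, y_test), 4))
```

Keeping a held-out test split out of the search avoids reporting an accuracy that was itself used to pick the parameters.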
MODELS IMPLEMENTED

• Ensemble Technique 1: Bagging
Random Forest Algorithm

• Ensemble Technique 2: Boosting
XGBoost

[Diagram: model output labelled "Not Phishing"]
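To make the bagging/boosting distinction concrete, the sketch below trains one model of each kind on synthetic data. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs no extra package; this swap is an assumption for illustration, not what the project ran. If the `xgboost` package is installed, `xgboost.XGBClassifier` follows the same scikit-learn `fit`/`predict` interface and can replace it in the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the project's dataset.
X, y = make_classification(n_samples=1000, n_features=15, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    # Bagging: many deep trees, each trained independently on a bootstrap
    # sample (with random feature subsets); their predicted probabilities are averaged.
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    # Boosting: shallow trees added one after another, each fitted to the
    # remaining errors (loss gradient) of the ensemble built so far.
    "Gradient Boosting (boosting)": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Bagging mainly reduces variance by averaging independent trees, while boosting mainly reduces bias by correcting mistakes sequentially.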
LOGISTIC REGRESSION

• Logistic regression is a statistical method used for binary classification by estimating the probability of a binary
outcome based on one or more predictor variables.
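The probability estimate described above is the logistic (sigmoid) function applied to a weighted sum of the predictors, p = 1 / (1 + exp(-(w . x + b))). The from-scratch sketch below fits the weights by plain gradient descent on a tiny, made-up dataset with two binary features (`has_at_symbol`, `uses_https`); the data, learning rate and epoch count are illustrative assumptions. In practice, scikit-learn's `LogisticRegression` fits the same model with a regularized solver.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated P(phishing | x) = sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # For log-loss, the gradient with respect to z is (p - y).
            error = predict_proba(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical data: features [has_at_symbol, uses_https]; label 1 = phishing.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train(X, y)
p = predict_proba(w, b, [1, 0])
print(f"P(phishing) for [1, 0]: {p:.3f} -> {'phishing' if p >= 0.5 else 'legitimate'}")
```

A threshold of 0.5 turns the probability into a binary label; it can be raised or lowered to trade false alarms against missed phishing sites.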
K-NEAREST NEIGHBOUR (KNN)

• The K-NN algorithm classifies a data point by finding its K nearest neighbors according to a distance metric, such as Euclidean distance, and assigning the class that is most common among those neighbors.
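The neighbor search and vote can be sketched in a few lines of plain Python. The two normalized features and the six training points are hypothetical; the sketch uses a majority vote with k = 3 (an odd k avoids ties between two classes). Because the method relies on distances, features should be on comparable scales, which is one reason normalization during preprocessing matters.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        ((euclidean(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical normalized features [url_length, num_dots]; label 1 = phishing.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [0.82, 0.85]))  # -> 1
print(knn_predict(train_X, train_y, [0.12, 0.18]))  # -> 0
```

scikit-learn's `KNeighborsClassifier` implements the same idea; its default Minkowski metric with p = 2 is Euclidean distance.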
RESULT ANALYSIS

• We implemented several classification models to predict phishing websites.

• The Random Forest and XGBoost classifiers performed better than the other models. Although Random Forest recorded slightly higher accuracy (99.97% vs. 99.56%), XGBoost gave better accuracy in the real-time environment, so we chose the XGBoost classifier as our final model.

Algorithm              Accuracy
Random Forest          99.97%
XGBoost Classifier     99.56%
Logistic Regression    93%
KNN                    91%
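The percentages in the table are classification accuracies: the fraction of websites whose label was predicted correctly. The short sketch below computes accuracy from hypothetical predictions and shows that it equals (TP + TN) / total; the labels are made up for illustration and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = phishing, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

# Confusion-matrix counts.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # phishing caught
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # legitimate passed
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarms
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # phishing missed

print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")  # -> Accuracy: 90.00%
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # -> TP=4 TN=5 FP=0 FN=1
```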
DEMO
LESSONS LEARNED

• Python Proficiency: Acquired proficiency in Python to comprehend machine learning algorithms, leveraging essential libraries such as NumPy, Pandas, Matplotlib, and Scikit-learn. This foundation facilitated the implementation and understanding of complex machine learning models.

• Environment Choice: Opted for Anaconda as the primary environment due to its efficient package management system and robust support for data science tools. This choice streamlined the setup process and ensured compatibility with project requirements.

• Dataset Selection and Preprocessing: Identified and acquired a suitable dataset tailored to the project's requirements. Prioritized data preprocessing to enhance data quality and ensure optimal input for model training, laying a solid foundation for accurate and reliable results.

• Hyperparameter Tuning: Recognized the significance of hyperparameters in model performance and allocated ample resources and time for hyperparameter tuning. This proactive approach enabled the fine-tuning of model parameters, enhancing overall predictive accuracy and robustness.
CONCLUSION

• Our testing showed that XGBoost outperformed Random Forest, Logistic Regression, and KNN in the real-time environment of our phishing website detection system.
• We chose to implement XGBoost because of this superior real-time performance; it achieved an accuracy of 99.56%.
• With XGBoost as the final model, we achieved high accuracy and delivered the phishing website detection system within all project deadlines.
REFERENCES

• https://ieeexplore.ieee.org/document/9730579
• https://ieeexplore.ieee.org/document/10169697
• https://ieeexplore.ieee.org/document/10249799
• https://ieeexplore.ieee.org/document/9824544
• https://ieeexplore.ieee.org/document/10049452
THANK YOU
