Computer Department
Introduction
Project Overview
Problem Statement
Methodology
Tools and Technologies
Project plan and Timeline
Models Implemented
Result Analysis
Demo
Challenges Encountered
Lessons Learned
Conclusion
References
INTRODUCTION
The ultimate goal of a phishing attack is to exploit the victim's trust and
obtain sensitive information that can be used for fraudulent activities,
unauthorized access, or identity theft.
Anaconda Environment with Python: Anaconda provides a convenient environment for managing Python
packages and dependencies.
Python Flask, HTML, CSS, JS: For designing the user interface and integrating it with the backend.
Machine Learning with Python Libraries: For training our model with Scikit-learn's algorithms and evaluating its
performance. Once trained, the model is integrated into the Flask application to perform real-time detection.
EXECUTION PLAN
Week 2 (8/1/24 to 7/1/24): We studied different datasets and decided which dataset to use.
Week 9 (6/3/24 to 20/3/24): We performed testing of all the models implemented and selected the best working model.
Anaconda is well suited as an environment because it ships with most of the required libraries
pre-installed, such as scikit-learn and pandas.
2. Accuracy -
We use a technique called hyperparameter tuning to increase the accuracy of the algorithms and
to find the parameters that contribute most to the accuracy.
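The slides do not say which search strategy was used for tuning. As a minimal sketch, assuming the simplest form (exhaustive grid search), the idea is to score every combination of candidate values and keep the best one. The names grid_search and toy_validation_accuracy, the parameter names, and the candidate values are illustrative assumptions, and the scoring function is a synthetic stand-in for "train the model with these settings and measure validation accuracy".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_accuracy(max_depth, learning_rate):
    # Synthetic scoring surface (an assumption, not project data):
    # it peaks at max_depth=6, learning_rate=0.1.
    return 1.0 - abs(max_depth - 6) * 0.01 - abs(learning_rate - 0.1)


# Hypothetical candidate values for two hyperparameters.
param_grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(param_grid, toy_validation_accuracy)
print(best_params)
```

In a scikit-learn workflow the same loop is usually delegated to GridSearchCV, which also handles cross-validation; the hand-written version above only shows the mechanics.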
MODELS IMPLEMENTED
• Ensemble Technique
1] Bagging
Random Forest Algorithm
• Ensemble Technique
2] Boosting
XGBOOST
[Diagram; surviving label: "Not Phishing"]
LOGISTIC REGRESSION
• Logistic regression is a statistical method used for binary classification by estimating the probability of a binary
outcome based on one or more predictor variables.
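As a rough illustration of the probability estimate described above, logistic regression passes a weighted sum of the predictor variables through the sigmoid function and compares the result with a threshold. The sketch below is written from scratch for clarity, not taken from the project; the feature meanings, weights, and bias are made-up values, and a real model would learn them from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1) without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Estimated probability that the sample belongs to class 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Return 1 (e.g. phishing) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical example: two numeric features with made-up coefficients.
weights, bias = [1.0, 1.0], -1.0
print(predict_proba([3.0, 0.0], weights, bias))
print(predict([3.0, 0.0], weights, bias))
```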
K - NEAREST NEIGHBOUR (KNN)
• The K-NN algorithm works by finding the K nearest neighbors to a given data point based on a distance metric,
such as Euclidean distance.
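The neighbor search and vote described above can be sketched in a few lines of standard-library Python. This is a from-scratch illustration, not the project's code: the function name knn_predict, the two-dimensional toy points, and the labels are assumptions, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (made up): two well-separated clusters of feature vectors.
train_points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),
]
train_labels = ["legitimate"] * 3 + ["phishing"] * 3

print(knn_predict(train_points, train_labels, (0.8, 0.8), k=3))
```

An odd value of K is a common choice for binary classification because it avoids tied votes.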
RESULT ANALYSIS
The Random Forest and XGBoost classifiers performed better than the other models. Random Forest scored slightly
higher in testing, but XGBoost gave better accuracy in the real-time environment, so we chose the XGBoost classifier as our final model.
Algorithm            Accuracy
Random Forest        99.97%
XGBoost Classifier   99.56%
Logistic Regression  93%
KNN                  91%
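For reference, accuracy in a comparison like this is the fraction of test samples whose predicted label matches the true label. The following is a minimal sketch of that calculation; the label lists are invented for illustration and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: 1 = phishing, 0 = not phishing.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(f"{accuracy(y_true, y_pred):.2%}")  # prints 75.00%
```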
DEMO
LESSONS LEARNED
Python Proficiency: Acquired proficiency in Python to comprehend machine learning algorithms, leveraging essential libraries such as
NumPy, Pandas, Matplotlib, and Scikit-learn. This foundation facilitated the implementation and understanding of complex machine learning
models.
Environment Choice: Opted for Anaconda as the primary environment due to its efficient package management system and robust support for
data science tools. This choice streamlined the setup process and ensured compatibility with project requirements.
Dataset Selection and Preprocessing: Identified and acquired a suitable dataset tailored to the project's requirements. Prioritized data
preprocessing to enhance data quality and ensure optimal input for model training, laying a solid foundation for accurate and reliable results.
Hyperparameter Tuning: Recognized the significance of hyperparameters in model performance and allocated ample resources and time for
hyperparameter tuning. This proactive approach enabled the fine-tuning of model parameters, enhancing overall predictive accuracy and
robustness.
CONCLUSION
Our testing showed that XGBoost performed best overall in our phishing website detection system: it clearly
outperformed Logistic Regression and KNN and, although Random Forest scored slightly higher in testing, it gave
better accuracy in the real-time environment.
We chose to implement XGBoost for this reason, reaching an accuracy of 99.56%.
By leveraging XGBoost, we achieved high accuracy and also met all project deadlines, successfully
executing our phishing website detection system.
REFERENCES
https://ieeexplore.ieee.org/document/9730579
https://ieeexplore.ieee.org/document/10169697
https://ieeexplore.ieee.org/document/10249799
https://ieeexplore.ieee.org/document/9824544
https://ieeexplore.ieee.org/document/10049452
THANK YOU