
2022 IEEE 14th International Conference on Computer Research and Development

An Ensemble Method for Phishing Websites Detection Based on XGBoost

Jiaqi Gu*
Dept. of Computer Science, Purdue University
West Lafayette, IN, US
*Corresponding author: gu222@purdue.edu

Hui Xu
School of Engineering and Applied Science, Harvard University
Cambridge, MA, US
hui.xu@post.harvard.edu

Abstract—Nowadays, the internet is spreading widely around the world. Besides the benefits the internet brings to its users, there are many potential harms such as phishing scams. These scams act like normal websites, stealing confidential information from users. To protect the privacy of internet users from being imperiled by scams, it is necessary to find a way to detect those phishing websites. When considering detection, Machine Learning methods that have the best possible performance are promising techniques to resolve the problem, since they can make reliable predictions or classifications for unseen scenarios based on given data. In this paper, I present an ensemble model to detect phishing websites using URL features. I used the dataset named "Phishing website Detector – phishing website dataset" from Kaggle. Next, several models using all types of mainstream ensemble algorithms (such as stacking, boosting, and bagging) were built. Finally, different methods measuring the performance of the models were used. The reason why only ensemble models are selected as the methods to be implemented is mainly their overall strong performance. As for the results, the proposed XGBoost model combining Random Forest and K-Nearest Neighbors outperforms all other models (Random Forest, AdaBoost, Parameter-tuned XGBoost, Stacking, and Voting), achieving an accuracy of 99.74% on training data and 96.44% on testing data. The confusion matrix generated according to the performance of the XGBoost model shows that it correctly predicts 1523 positive terms out of 1594 in total and 1995 negative terms out of 2054 in total. This resulting method can effectively detect phishing websites. Furthermore, if integrated into an application or web extension, it can protect internet users' privacy from being imperiled when they accidentally step into phishing website traps.

Keywords—ensemble method, Phishing Websites Detection, machine learning

I. INTRODUCTION

Nowadays the internet is spreading widely around the world. Every internet user can be the victim of an attack from phishing websites. These phishing websites are cyber eavesdroppers trying to steal confidential information from users stealthily. To prevent the privacy of the users from being imperiled, a valid and robust resolution needs to be presented. Here Machine Learning is a promising field to address this issue, since its techniques can detect potential dangers by learning given data and training predictive models. The mostly used models are single models, which process the training data in an efficient way and output relatively decent predictions. However, when facing real-world large-scale problems like analyzing phishing URLs, they can no longer work accurately. Instead of sticking with these single models, there is a far more feasible option: ensemble methods. Ensemble methods are basically a combination of multiple single models to achieve a better performance. How these methods do the combination is what distinguishes them. After the merging phase, an optimal predictive model is produced which gives a more compelling predictive accuracy at some acceptable cost of efficiency, or runtime. This paper proposes an original set of single models combined by an ensemble method and compares it with other high-performance ensemble methods presented in previous research.

II. RELATED WORK

The ensemble methods presented by researchers in previous studies are briefly explained in this section.

A Stacking algorithm combining Random Forest with K-Nearest Neighbors and a Bagging algorithm was presented by Ammara Zamir and colleagues [6]. The model outperformed all other types of classifiers built for comparison and achieved an accuracy of 97.4% in detecting phishing websites. Similar to this high-performing outcome, the stacking of GBDT, XGBoost, and LightGBM produced a predictive accuracy of 97.3% in the work of Yukun Li and colleagues [5].

AdaBoost implemented by Anggit F. Nigraha and Luthfia Rahman [3] effectively enhanced the performance of phishing scam detection. In the finalizing part of their experiment, a predictive accuracy of 97.4% was achieved on the provided dataset, which seemed promising compared to other algorithms. Besides, Abdulhamit Subasi and Emir Kremic [1] compared their AdaBoost method with MultiBoosting in detecting phishing websites, finding the supremacy of AdaBoost in predictive performance.

Extreme Gradient Boosting, or XGBoost, is presented in the experiments led by Dharani M. [7] and by Rakesh Pavan, Madhumitha Nara, and Suhas Gopinath [10]. The XGBoost model built by the team of Dharani [7], boasting an accuracy of 93% in phishing websites detection, outperformed the predictive



accuracy from Random Forest, which is 91%. Besides, the XGBoost model built in another experiment might be the more compelling one, since Rakesh and his colleagues [10] mentioned a unique optimization method. With this optimization applied to the XGBoost classifier, the performance was taken to an even higher level (an accuracy of 97.08% and a larger gap between other techniques and this optimized XGBoost).

Random Forest was examined in the experiment conducted by S. Jagadeesan and his team [9]. They did a comparison between a Random Forest model and two different SVM models. After the comparison, it was concluded that Random Forest boasts an overall better performance (an accuracy of 90.12% on the given dataset) in predicting phishing websites. The other two SVM models produced accuracies of 95.19% and 89.63%.

A voting algorithm was presented by Abdul Basit and his colleagues [8]. The algorithm took Random Forest (or RF) as the base learner and was combined with ANN (or Artificial Neural Network) and C4.5. The model showed a relatively high predictive accuracy and ROC Area score, which looks promising in phishing scam detection.

III. METHODOLOGIES

In this section, the basic working principles of the models under combination in the proposed ensemble method are explained. After that, the combining principles and the optimization of the method are clarified.

A. Models for Combination

1) KNN (K-Nearest Neighbors)

K-Nearest Neighbors is a simple algorithm that takes in training data and classifies unseen data based on a distance function. The distance function can be any one of the Euclidean, Manhattan, or Minkowski distance functions. In this case we use the Euclidean distance:

d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}    (1)

The function computes the length of the line segment between the training data point x and the predicting data point y in a k-dimensional Euclidean space. The computation result is the distance between the two vectors. Based on this, the algorithm compares and returns the K training points that are closest to the predicting point. Thereafter, the label of the predicting point can be determined by the majority label of the K neighbors.
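For illustration, the distance computation and the majority vote can be sketched in a few lines of Python. This is a minimal, self-contained illustration of Eq. (1) on toy data, not the implementation used in the experiments.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by the majority label of its k nearest neighbors (Eq. 1)."""
    # Euclidean distances between the query point and every training point
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two phishing-like (1) and two legitimate-like (0) feature vectors
X = np.array([[1.0, 0.9], [0.8, 1.1], [-1.0, -0.9], [-0.8, -1.2]])
y = np.array([1, 1, 0, 0])
print(knn_predict(X, y, np.array([0.9, 1.0]), k=3))  # expected output: 1
```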
2) RF (Random Forest)

Random Forest is a decision-tree-based ensemble method used for classification. The training algorithm for Random Forest applies the technique of bagging to the tree learners. For predictions, an unseen data point x is passed into all the trained regression trees and the average of their outputs is taken:

y_{pred} = \frac{1}{B} \sum_{b=1}^{B} f_b(x)    (2)

where B is the number of trees.

Additionally, an error, or uncertainty, can be computed:

\sigma = \sqrt{\frac{\sum_{b=1}^{B} \left(f_b(x) - y_{pred}\right)^2}{B - 1}}    (3)

This is the standard deviation of the predictions from the results of all single trees.
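As an illustration of Eqs. (2) and (3), the sketch below trains a small forest on synthetic data and derives the per-tree mean and standard deviation from the individual trees' positive-class probabilities. Applying the formulas to class probabilities is an assumption made here for demonstration, and the synthetic data merely stands in for the URL features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data standing in for the URL feature matrix (assumption, not the Kaggle set)
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

rf = RandomForestClassifier(n_estimators=10, criterion="gini", random_state=0).fit(X, y)

# Per-tree positive-class probabilities f_b(x) for one query point
x_query = X[:1]
per_tree = np.array([tree.predict_proba(x_query)[0, 1] for tree in rf.estimators_])

y_pred = per_tree.mean()       # Eq. (2): average over the B trees
sigma = per_tree.std(ddof=1)   # Eq. (3): sample standard deviation (B - 1 in the denominator)
print(f"mean={y_pred:.3f}, uncertainty={sigma:.3f}")
```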
B. The Proposed Model

1) XGBoost (Extreme Gradient Boosting)

Extreme Gradient Boosting is an ensemble method that increments several weak learners to generate a strong one that delivers a better predictive performance. The general algorithm for this method is described below.

Input:

A training set \{(x_i, y_i)\}_{i=1}^{N}, a differentiable loss function L(y, F(x)), and a learning rate \alpha.

Algorithm:

First, initialize the model with a constant value:

F_0(x) = \arg\min_{\theta} \sum_{i=1}^{N} L(y_i, \theta)    (4)

For each weak learner m in all weak learners M, compute gradients and hessians:

g_m(x_i) = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{m-1}(x)}    (5)

h_m(x_i) = \left[ \frac{\partial^2 L(y_i, f(x_i))}{\partial f(x_i)^2} \right]_{f(x) = f_{m-1}(x)}    (6)

After the computation of gradients and hessians, fit a base (weak) learner. To achieve this, use the training set \{x_i, -g_m(x_i)/h_m(x_i)\}_{i=1}^{N} and solve the optimization problem below:

\phi_m = \arg\min_{\phi \in \Phi} \sum_{i=1}^{N} \frac{1}{2} h_m(x_i) \left[ -\frac{g_m(x_i)}{h_m(x_i)} - \phi(x_i) \right]^2    (7)

f_m(x) = \alpha \phi_m(x)    (8)

Then, update the model:

f_m(x) = f_{m-1}(x) + f_m(x)    (9)

After the iterations of m in M, return the output:

f(x) = f_M(x) = \sum_{m=0}^{M} f_m(x)    (10)

The output stands for the predicted y labels.
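To make steps (4)-(10) concrete, the following sketch implements this Newton-style boosting loop for the squared-error loss (where g = f(x) - y and h = 1) with shallow regression trees as the weak learners. It is a simplified illustration of the update rule, not the full XGBoost implementation, which adds regularization, column subsampling, and other refinements.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

alpha, M = 0.3, 50                      # learning rate and number of weak learners
F = np.full(len(y), y.mean())           # Eq. (4): constant initialization (argmin of squared loss)

learners = []
for m in range(M):
    g = F - y                           # Eq. (5): gradient of 1/2 * (y - F)^2 w.r.t. F
    h = np.ones_like(y)                 # Eq. (6): hessian of the squared loss is 1
    phi = DecisionTreeRegressor(max_depth=3, random_state=m)
    phi.fit(X, -g / h)                  # Eq. (7): fit the weak learner to -g/h (h-weighted least squares)
    F += alpha * phi.predict(X)         # Eqs. (8)-(9): scale by the learning rate and update the model
    learners.append(phi)

def predict(X_new):
    # Eq. (10): the final model is the constant term plus the sum of all scaled weak learners
    return y.mean() + alpha * sum(t.predict(X_new) for t in learners)

print("training RMSE:", np.sqrt(np.mean((predict(X) - y) ** 2)))
```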
where B is the number of trees. and preprocessing data set. The data set named “phishing
website dataset” from Kaggle is first collected. The data set
Additionally, an error, or uncertainty can be computed: contains 11054 instances while each instance has 32 features.

The reason for sacrificing processing efficiency to use this feature-rich dataset is that a large feature set provides more detailed learning references for the algorithms, according to J. Rashid and his colleagues [2]. Besides, this is a widely-used dataset since it includes the most standard attributes of a typical URL. The detailed explanation of the physical meanings of those features can be found on the related Kaggle site. As for the data processing, the column of class labels is first separated from the rest of the dataset to create a feature set and a label set. Then the feature set containing all attributes of a URL is standardized by removing the mean and scaling to unit variance. The reason why the features are not sifted, as Rakesh R and his colleagues [4] did, is that all the features, after examining the URLs, are important. Thereafter, the training and testing sets are generated from the standardized feature set and the label set using a 0.67:0.33 train-test ratio. The meta-information of the split sets is given below:

TABLE I. TRAIN TEST DISTRIBUTION

                Train    Test
Phishing        3330     1567
Non-phishing    4076     2081
Total           7406     3648
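A sketch of this preprocessing pipeline is given below. The file name and the label column name ("class") are assumptions about the Kaggle export and may need to be adjusted.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Kaggle phishing dataset; the file name and label column are assumptions
df = pd.read_csv("phishing.csv")
X = df.drop(columns=["class"])          # feature set: all URL attributes
y = (df["class"] > 0).astype(int)       # label set mapped to {0, 1}; the Kaggle file may encode labels as -1/1

# Standardize features by removing the mean and scaling to unit variance
X_scaled = StandardScaler().fit_transform(X)

# 0.67:0.33 train-test split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.33, random_state=42)
print(X_train.shape, X_test.shape)
```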
B. Random Forest (RF) and KNN Models

The next several steps are to train and test the models using the data set given above. In this step specifically, the base learners used by the proposed method, RF and KNN, are separately trained and tested. For RF, a model with the default number of estimators (which is 10) is built. The number of estimators denotes the number of trees in the forest. As for each tree, the maximum depth is not limited. Besides the tree number and its depth, the criterion for the function that measures the quality of a single split is set to "gini", which stands for the information gain standard Gini impurity. As for the splits, the minimum number of samples required to split an internal node, or min_samples_split as the specific parameter, is set to 2. Furthermore, for these internal/leaf nodes, the minimum number of samples required to be at them is set to 1 and the minimum weighted fraction of the sum of weights required to be at them is set to 0. After setting all the major parameters, the training data is passed in for model construction, and the testing data is passed in for performance measurements. After experimenting with RF, a KNN model is also trained using the data set. The number of neighbors is set to the default value 3. Besides the number of neighbors, the uniform weight function is used in prediction. The function indicates that all points in each neighborhood are weighted in an equal manner. What is more, the leaf size passed to BallTree or KDTree in the method is set to 30. This affects the speed of query construction and the memory required to store the tree. With the parameters set up and the training data passed into the model, the testing data is passed in for data point classification and the results are recorded.
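The two base learners described above correspond roughly to the following scikit-learn configuration. This is a sketch that reuses the train/test split from the preprocessing sketch; unspecified parameters stay at their library defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# RF: 10 trees, Gini impurity, unlimited depth, min_samples_split=2,
# min_samples_leaf=1, min_weight_fraction_leaf=0.0
rf = RandomForestClassifier(n_estimators=10, criterion="gini", max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            min_weight_fraction_leaf=0.0, random_state=42)

# KNN: 3 neighbors, uniform weights, leaf_size=30 for the BallTree/KDTree
knn = KNeighborsClassifier(n_neighbors=3, weights="uniform", leaf_size=30)

for name, model in [("RF", rf), ("KNN", knn)]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```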
C. XGBoost Models (The Proposed Methods)

After separately experimenting with RF and KNN, these two are combined as base learners by the ensemble algorithm XGBoost. Firstly, the base models of RF and KNN are defined. The same set of parameters is still being used here (10 for the number of estimators in RF, Gini impurity for information gain, ...; 3 for the number of neighbors in KNN, 30 for the leaf size, ...). Then, an XGBoost model is built upon the base learners using the default parameters as well. To be more specific, the booster used here is the tree booster "gbtree", since the RF being combined is a tree-based model. As for the tree booster, the learning rate is set to 0.3, which is the step size shrinkage used in the update to prevent overfitting. Another parameter correlated with the overfitting issue is the maximum depth of a tree. This parameter is set to 6, which keeps the model at an acceptable level of complexity, making it less likely to overfit as well. Also, gamma is set to 0, which means that the minimum loss reduction required to make a further partition on a leaf node of the tree is 0. This puts the algorithm at its least conservative level. Furthermore, the number of threads used to run the model is set to the maximum number of threads available. This defines the running efficiency. What is more, the subsample ratio of the training instances taken in by the model is set to 1. Subsampling occurs once in each boosting iteration. In this case, during each subsampling, the model samples the entire training data prior to growing trees. The sampling method here is the uniform method, which indicates that each training instance has an equal probability of being selected. Thereafter, the training and testing data of the data set are passed into the model to train the classifier and make predictions. Eventually, with the predicted labels produced by the model, the overall performance is measured.
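The text above does not spell out the exact mechanics of feeding RF and KNN into XGBoost. One common way to realize such a combination is to use the two base learners' predicted probabilities as input features for an XGBoost classifier configured with the parameters listed above; the sketch below follows that interpretation and should be read as an assumption rather than the authors' exact pipeline. It reuses rf, knn, and the data split from the earlier sketches.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_predict

# Out-of-fold positive-class probabilities from the two base learners
rf_oof = cross_val_predict(rf, X_train, y_train, cv=5, method="predict_proba")[:, 1]
knn_oof = cross_val_predict(knn, X_train, y_train, cv=5, method="predict_proba")[:, 1]
meta_train = np.column_stack([rf_oof, knn_oof])

# XGBoost with the stated settings: gbtree booster, learning rate 0.3, max depth 6,
# gamma 0, subsample ratio 1, and all available threads
model = xgb.XGBClassifier(booster="gbtree", learning_rate=0.3, max_depth=6,
                          gamma=0, subsample=1.0, n_jobs=-1,
                          objective="binary:logistic")
model.fit(meta_train, y_train)

# At test time, feed the base learners' test-set probabilities to the meta-model
rf.fit(X_train, y_train)
knn.fit(X_train, y_train)
meta_test = np.column_stack([rf.predict_proba(X_test)[:, 1],
                             knn.predict_proba(X_test)[:, 1]])
print("accuracy:", (model.predict(meta_test) == y_test).mean())
```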
D. Related Baseline Models

In this step, several high-performance models mentioned in the related work section are tested. These include:

1) XGBoost (tree-based models as base learners with a specific parameter setting):

As stated by the team of Dharani [7], the XGBoost model built here uses the tree-based models and linear models (logistic regression) provided by default as base learners. The objective of the model is binary classification. According to their parameter tuning, the number of trees is 1000 and the maximum depth of each tree is 5. What is more, the learning rate is set to 0.1 to prevent overfitting. As for the minimum loss reduction required to make further partitions on leaf nodes of the trees, the value is set to 0. This is the same as the one in the proposed method. Furthermore, the subsample ratio of the training instances is set to 0.8, which means that the model randomly samples the training data in a ratio of 8:2 prior to growing trees. The subsample ratio of columns when constructing every single tree is also 8:2. For the weights of those instances, the minimum sum needed in a child is 1 in this case. If the tree partition step leads to a leaf node with a sum of instance weights smaller than this value, the building process will abandon any further partitioning. For the classes of these instances, considering the case where the classes are unbalanced, the parameter that controls the balance of positive and negative weights is set to 1. Besides, the random number seed is 27 and the maximum number of threads used to run the model is 4, which limits the model from using too many system resources.

After setting the parameters, the training data is passed in for model construction. Then the testing data is passed into the model and specific results are generated. The performance is measured according to the results as the last step.
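The configuration described above maps roughly onto the following XGBoost setup. This is a sketch mirroring the parameter tuning reported for [7] and reusing the data split from the preprocessing sketch; it is not necessarily their exact code.

```python
import xgboost as xgb

# Parameter-tuned baseline following the settings reported above
baseline_xgb = xgb.XGBClassifier(
    objective="binary:logistic",   # binary classification objective
    n_estimators=1000,             # number of trees
    max_depth=5,                   # maximum depth of each tree
    learning_rate=0.1,             # step size shrinkage
    gamma=0,                       # minimum loss reduction for a further partition
    subsample=0.8,                 # row subsample ratio (8:2)
    colsample_bytree=0.8,          # column subsample ratio per tree (8:2)
    min_child_weight=1,            # minimum sum of instance weight in a child
    scale_pos_weight=1,            # balance of positive and negative weights
    random_state=27,               # random number seed
    n_jobs=4,                      # maximum number of threads
)
baseline_xgb.fit(X_train, y_train)
print("accuracy:", baseline_xgb.score(X_test, y_test))
```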
2) Voting (RF, Artificial Neural Network, and CART/C4.5):

As stated by Abdul Basit and his colleagues [8], the Voting method takes Random Forest, ANN, and CART/C4.5 as the learners/estimators. As for the Random Forest, all the major parameters are set in the same way as mentioned in the explanation of the proposed method, except the number of estimators, or trees. In this case, the number of trees is 100. As for ANN, or Artificial Neural Network, the number of input, hidden, and output neurons is set to 10 each. After constructing the network, the maximum number of iterations is set to 3000 and the batch size is set to 100. As for the last learner, CART, using a Decision Tree as the core classifier, is selected over C4.5 in this case. This is because CART is very similar to C4.5 while being easier to implement. After setting the parameters of these three learners, a combination of the learners by the Voting algorithm is done. Note that the voting method here is "soft", which predicts the class label based on the argmax of the sums of the predicted probabilities. This is suitable for an ensemble of well-calibrated classifiers. Then the training data is passed in to build the Voting model, and the testing data is passed into the constructed model thereafter. Lastly, the performance is measured according to the results generated by the model.
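A scikit-learn sketch of this Voting ensemble is shown below, reusing the earlier data split. An MLPClassifier with one hidden layer of 10 neurons stands in for the ANN and a DecisionTreeClassifier stands in for CART; both stand-ins are assumptions about the concrete implementation.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

voting = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, criterion="gini", random_state=42)),
        # MLP with one hidden layer of 10 neurons stands in for the described ANN
        ("ann", MLPClassifier(hidden_layer_sizes=(10,), max_iter=3000, batch_size=100, random_state=42)),
        # DecisionTreeClassifier (CART) stands in for CART/C4.5
        ("cart", DecisionTreeClassifier(random_state=42)),
    ],
    voting="soft",   # argmax over the summed predicted probabilities
)
voting.fit(X_train, y_train)
print("accuracy:", voting.score(X_test, y_test))
```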
3) Stacking (RF, KNN, and Bagging):

As stated by Ammara Zamir and his colleagues [6], the Stacking method combines Random Forest, K-Nearest Neighbors, and the Bagging algorithm. As for Random Forest, all its parameters are set to the same values as the ones in the proposed method, except the number of trees/estimators. In this case, the tree number is 100. As for K-Nearest Neighbors, all its parameters, except the number of neighbors, are also set to the same values as the ones in the proposed method. Different from the 3-neighbor learner built previously, here a 5-neighbor learner is constructed. After setting up the two base learners, the final estimator, a Bagging classifier, is implemented. The base estimator of the Bagging classifier itself is a Decision Tree classifier. With this tree structure, the number of trees/base estimators is set to 10. To train each of these base estimators, both the number of samples and the number of features to draw from the training data are set to 1. After setting these major parameters of the Bagging classifier, the Stacking model is built upon the three learners. Then the training data and testing data are passed into the model for training and prediction. Thereafter, a measurement of performance is conducted using the predictive results produced by the model.
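A possible scikit-learn realization of this Stacking configuration is sketched below, again reusing the earlier data split. The mapping of the described parameters onto StackingClassifier and BaggingClassifier is an assumption.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

stacking = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, criterion="gini", random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5, weights="uniform", leaf_size=30)),
    ],
    # Bagging of 10 decision trees as the final estimator; max_samples/max_features of 1.0
    # draw the full sample and feature set for each base tree
    final_estimator=BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                      max_samples=1.0, max_features=1.0,
                                      random_state=42),
)
stacking.fit(X_train, y_train)
print("accuracy:", stacking.score(X_test, y_test))
```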
4) AdaBoost (Decision Tree as base learner and default parameter settings):

As stated by Anggit F. Nigraha and Luthfia Rahman [3], an AdaBoost model with a Decision Tree classifier as its base estimator is constructed. By default, the number of trees/estimators is set to 50, and the learning rate is set to 1.0. A higher learning rate increases the contribution of each classifier. As for the algorithm in the model, the real boosting algorithm "SAMME.R" is used in this case, since the base estimator supports the calculation of class probabilities. After the values are assigned to each of these parameters, the training data is passed into the model. Then the testing data is passed into the model to make predictions/classifications. Eventually, the performance is measured given the results generated by the model.
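This AdaBoost baseline corresponds roughly to the following scikit-learn sketch, reusing the earlier data split. Note that newer scikit-learn releases deprecate the "SAMME.R" option, so the algorithm argument may need to be dropped depending on the installed version.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# AdaBoost over depth-1 decision trees (scikit-learn's default base estimator)
# with the default-style settings described above
adaboost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=50, learning_rate=1.0,
                              algorithm="SAMME.R", random_state=42)
adaboost.fit(X_train, y_train)
print("accuracy:", adaboost.score(X_test, y_test))
```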
V. RESULTS

A. RF and KNN and the Proposed Method

The overall performances of RF, KNN, and the proposed XGBoost are compared first, since it is necessary to elucidate that the ensemble method here is better than its base learners, not in all aspects but in some determining areas. (Note that the measurements below are all for the testing data set.) The way to derive the F1 Score is explained first. (Note that TP stands for True Positive, FP stands for False Positive, and FN stands for False Negative.)

F_1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)}    (11)
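The metrics reported below can be reproduced with standard scikit-learn utilities, as sketched here for the proposed model from the earlier sketch (any of the fitted classifiers could be substituted).

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_pred = model.predict(meta_test)              # proposed XGBoost from the earlier sketch
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))   # Eq. (11)
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
```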
TABLE II. PERFORMANCE OF RF, KNN, AND THE PROPOSED XGBOOST

            RF        KNN       XGBoost (Proposed)
F1 Score    0.9678    0.9503    0.9643
Accuracy    0.9679    0.9503    0.9643
Runtime     0.5631s   2.2566s   0.2840s

It is not hard to see from Table II that the proposed XGBoost method outperforms KNN in both F1 Score/Accuracy and Runtime. As for RF, although its F1 Score and Accuracy are higher than the proposed method's, it sacrifices generality (as mentioned, it uses 100 estimators). What is more, its Runtime is slower than that of the proposed method. To view the performance difference among these three models, their ROC Curves are plotted below:

Fig. 1. ROC Curves of RF, KNN, and the proposed XGBoost

Although the ROC Curves of the three models indicate that they all have outstanding predictive performance, it is clear in Fig. 1 that the curve of the proposed XGBoost model has an AUC Score that is almost the same as the one from the Random Forest model and is obviously larger than the one from the KNN model. To conclude, considering the results from both Table II and Fig. 1, the proposed XGBoost method delivers a better overall performance than its two base learners RF and KNN.
B. The Proposed Method and the Baseline Models

After testing the superiority of the proposed XGBoost method over its own base learners, the predictive results from the baseline models are compared to those from the proposed method.

TABLE III. PERFORMANCES OF THE PROPOSED XGBOOST AND BASELINE MODELS

                       Accuracy   Runtime
XGBoost (Proposed)     0.9643     0.2840s
XGBoost                0.9613     2.4397s
AdaBoost               0.9331     0.3421s
Stacking               0.9605     6.0123s
Voting                 0.9643     8.3052s

Examining Table III, it is clear that the Accuracy of the proposed XGBoost method in predicting phishing or non-phishing labels is better than those of the default-setting XGBoost with specific parameter assignments, AdaBoost, and the Stacking model combining RF and KNN. As for the Voting model that uses RF, Artificial Neural Network, and CART, although its Accuracy is as high as the one from the proposed method, its Runtime is far behind. As in the previous part, ROC Curves are also plotted for an even clearer comparison among the models' performances.

Fig. 2. ROC Curves of the proposed XGBoost and baseline models
After studying Fig. 2 above, it is clear that no AUC Score of the ROC Curves generated by any of the baseline models surpasses the AUC Score of the proposed XGBoost model. To be more specific, the AUC Score from the parameter-tuned XGBoost is very close to the one from the curve of the proposed model. However, the scores from the curves of the Stacking and Voting models are slightly worse than the score from the curve of the proposed model. And the score from the AdaBoost model is obviously inferior under comparison. Therefore, considering the results from both Table III and Fig. 2, the proposed XGBoost method delivers a better overall performance than all the related baseline methods.

VI. CONCLUSION

In this work, a comparison between the proposed method (XGBoost using RF and KNN as base learners) and the other advanced ensemble methods is conducted. Each of these methods is built upon specific parameter settings which effectively boost their predictive performances. What is more, the data set used for training and testing reflects real-world phishing scenarios and is to some extent feature-rich. As for the comparison result, the experiment shows that the proposed method outperforms all other methods in either the dimension of predictive accuracy (96.44%) or the runtime of the entire learning and classification process (0.2841 seconds).

For future work, there are two major aspects worth considering: 1) Data set enlargement: although the data set currently in use is large enough, expanding it or adding more real-world, large-scale data sets is necessary for further testing; 2) Further optimization of the proposed method: regardless of the strong performance produced by the proposed method, there is still space for optimizing the method to produce even better outcomes.

REFERENCES
[1] A. Subasi and E. Kremic, "Comparison of Adaboost with MultiBoosting for Phishing Website Detection," Procedia Computer Science, vol. 168, pp. 272-278, 2020, doi: 10.1016/j.procs.2020.02.251.
[2] J. Rashid, T. Mahmood, M. W. Nisar and T. Nazir, "Phishing Detection Using Machine Learning Technique," 2020 First International Conference of Smart Systems and Emerging Technologies (SMARTTECH), 2020, pp. 43-46, doi: 10.1109/SMART-TECH49988.2020.00026.
[3] A. F. Nugraha and L. Rahman, "Meta-Algorithms for Improving Classification Performance in the Web-phishing Detection Process," 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2019, pp. 271-275, doi: 10.1109/ICITISEE48480.2019.9003952.
[4] Rakesh R, Kannan A, Muthurajkumar S, Pandiyaraju V and SaiRamesh L, "Enhancing the precision of phishing classification accuracy using reduced feature set and boosting algorithm," 2014 Sixth International Conference on Advanced Computing (ICoAC), 2014, pp. 86-90, doi: 10.1109/ICoAC.2014.7229752.
[5] Y. Li, Z. Yang, X. Chen, H. Yuan and W. Liu, "A stacking model using URL and HTML features for phishing webpage detection," Future Generation Computer Systems, vol. 94, 2018, doi: 10.1016/j.future.2018.11.004.
[6] A. Zamir, H. Khan, T. Iqbal, N. Yousaf, F. Aslam, A. Anjum and M. Hamdani, "Phishing web site detection using diverse machine learning algorithms," The Electronic Library, ahead-of-print, 2020, doi: 10.1108/EL-05-2019-0118.
[7] Dharani M., S. Badkul, K. Gharat, A. Vidhate and D. Bhosale, "Detection of Phishing Websites Using Ensemble Machine Learning Approach," ITM Web Conf., vol. 40, 03012, 2021, doi: 10.1051/itmconf/20214003012.
[8] A. Basit, M. Zafar, A. R. Javed and Z. Jalil, "A Novel Ensemble Machine Learning Method to Detect Phishing Attack," 2020 IEEE 23rd International Multitopic Conference (INMIC), 2020, pp. 1-5, doi: 10.1109/INMIC50486.2020.9318210.
[9] S. Jagadeesan, A. Chaturvedi and S. Kumar, "URL phishing analysis using random forest," International Journal of Pure and Applied Mathematics, vol. 118, no. 20, pp. 4159-4163, 2018.
[10] R. Pavan, M. Nara, S. Gopinath and N. Patil, "Bayesian Optimization and Gradient Boosting to Detect Phishing Websites," 2021 55th Annual Conference on Information Sciences and Systems (CISS), 2021, pp. 1-5, doi: 10.1109/CISS50987.2021.9400317.

