Azeez 2021

computers & security 108 (2021) 102328
Available online at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/cose
Adopting automated whitelist approach for detecting phishing attacks
Nureni Ayofe Azeez a, Sanjay Misra b,c,d,∗, Ihotu Agbo Margaret a, Luis Fernandez-Sanz d, Shafi’i Muhammad Abdulhamid e
a Department of Computer Sciences, University of Lagos, Lagos, Nigeria
b Department of Computer Engineering, Atilim University, Ankara, Turkey
c Department of Computer Engineering, Covenant University, Ota, Nigeria
d Department of Computer Science, University of Alcala, Madrid Spain
e Department of Cyber Security Sciences, Federal University of Technology, Minna, Nigeria
Article info
Article history: Received 18 November 2019; Revised 16 April 2021; Accepted 14 May 2021; Available online 24 May 2021
Keywords: Phishing; Blacklist; Whitelist; Cybersecurity; False positive; False negative

Abstract
Phishing is considered a great scourge in cyberspace. Presently, there are two major challenges known with the existing anti-phishing solutions: a low detection rate and a lack of quick access time in a real-time environment. However, it has been established that blacklist solution methods offer quick and immediate access time but with a low detection rate. This research paper presents an automated white-list approach for detecting phishing attacks. The white-list is determined by carrying out a detailed analysis between the visual link and the actual link. The similarities of the known trusted site are calculated by juxtaposing the domain name with the contents of the whitelist and later matching it with the IP address before a decision is made, and by further analyzing the actual link and the visual link. The technique then takes a final decision on the extracted information from the hyperlink, which can also be obtained from the web address provided by the user. The experiments carried out provided a very high level of accuracy, specifically when the dataset was at its smallest. Six different datasets were used to perform the experiments. The average accuracy obtained after the six experiments was 96.17%, and the approach detects phishing sites with a 95.0% true positive rate. It was observed that the level of accuracy varies from one dataset to another. This result shows that the proposed method performs better than the similar approaches it was benchmarked against. The efficiency of the approach was further established through its computation time, memory, bandwidth, and other computational resources, which were utilized with minimum requirements when compared with other approaches. This solution has provided immense benefits over the existing solutions by reducing the memory requirements and computational complexity, among other benefits. It has also shown that the proposed method can provide more robust detection performance when compared to other techniques.
© 2021 Elsevier Ltd. All rights reserved.
∗ Corresponding author.
E-mail addresses: nazeez@unilag.edu.ng (N.A. Azeez), sanjay.misra@covenantuniversity.edu.ng (S. Misra), ihotu_agbo@yahoo.com (I.A.
Margaret), luis.fernandezs@uah.es (L. Fernandez-Sanz), shafii.abdulhamid@futminna.edu.ng (S.M. Abdulhamid).
https://doi.org/10.1016/j.cose.2021.102328
0167-4048/© 2021 Elsevier Ltd. All rights reserved.
1. Introduction

The use of Cyberspace keeps increasing as it plays a significant role in today’s commerce and business activities (Azeez and Otudor, 2016), providing a lot of online services which tend to simplify our lives. These services allow us to access information ubiquitously. Online banking via the web, for instance, has become very popular as many users have become used to it (Ludl et al., 2007). The ubiquitous nature of internet technology for information sharing has undoubtedly brought about various forms of attacks. Prominent among them are replay and phishing, pharming, masquerading, and denial of service (Suryavanshi and Jain, 2015). It has, however, been established that the poor nature of security measures has contributed immensely to the vulnerable nature of the network (Adebowale et al., 2019), as well as identity hiding, fame, and notoriety (Khonji et al., 2013).

Phishing can succinctly be defined as fraudulent and suspicious practices that involve sending or disseminating various electronic mails claiming to originate from a reliable individual or company to lure the target to divulge classified personal information such as credit card numbers, passwords, and other biometric details that could be used on behalf of the owner to carry out nefarious activities. Phishing provides a medium for numerous computer attacks to give room for sharing socially inspired messages to various internet users by requesting valuable and confidential information that could finally be used against them to transact illegally on their behalf (Khonji et al., 2013). Phishers transmit their messages via SMS, computer games, VoIP, and webpages (Biedermann et al., 2014).

Several anti-phishing solutions and techniques have been developed. However, phishers have enhanced their strategies to overcome the newly developed techniques (Cao et al., 2014). Through this form of cyber-attack, much valuable and secret information has been revealed to the detriment of internet users. The process of phishing the website is given in Fig. 1 (Basnet et al., 2008). Conventionally, the blacklist is the most common approach used for phishing detection. The blacklist contains phishing sites; maintaining a blacklist, however, requires a whole lot of resources to report and verify suspicious websites. Also, it is quite difficult to maintain a global blacklist because new phishing sites emerge all the time (Cao et al., 2014).

The white-list, on the other hand, contains legitimate websites; but just like the blacklist, it is also difficult to maintain a global white-list. It is not possible to develop a database for a white-list that has all available authentic and legitimate websites because the websites are of a large scale and rapidly increasing (Azeez et al., 2015); hence the need for a personalized white-list. In this work, we adopted the white-list approach that gives room for automatic updates and subsequently provides effective and reliable protection against any form of phishing attack. The proposed method of phishing detection and blocking of the URL and web pages will improve phishing detection and therefore reduce cybercrime. The importance of anti-phishing is to control the way private and confidential information of persons and organizations is exposed, which in turn usually brings about the loss of property and revelation of confidential details to the public (Azeez and Ademolu, 2016). Therefore, the main contributions of this research work are as follows:

• We propose a secure architecture for the anti-phishing system.
• We present an anti-phishing algorithm using the Automated Whitelist Approach.
• We implemented the proposed algorithm in a real-life web application environment.
• We experimented to evaluate the performance of the proposed Automated Whitelist Approach to phishing attacks with standard metrics.

The purpose of this paper is to practically and scientifically demonstrate the application of Whitelisting for an anti-phishing solution. The rest of the paper is organized as follows: Section 2 briefly presents the literature review. Section 3 presents the proposed system architecture, and Section 4 presents a phishing detection algorithm. Also, Section 5 explains the evaluation metrics, while Section 6 gives the experimental results. Section 7 concludes the paper.

2. Related works

Phishing makes use of social engineering techniques to illegally obtain information from internet users (Blasi, 2009). The motive for doing this is to take over the monetary ownership
Fig. 1 – Traditional method of anti-phishing solutions.

and subsequently cause a great financial loss to any internet user (Azeez and Babatope, 2016). There are various anti-phishing techniques already proposed in the literature. Some are basically for handling phishing attacks at both web and email levels. The present methods are, however, inadequate and inefficient (Jain and Richariya, 2011).

2.1. Different approaches to anti-phishing

The entire database of all legal and legitimate logins is kept by the Automated Individual Whitelist (AIWL). Whenever a submission is made by any user, the database is quickly searched and verified. If the Login User Interface (LUI) of the webpage is legitimate, the user will be allowed to continue. If, however, it is not part of the database, a warning will be given to the user to intimate him/her of the danger in proceeding with such an interface. There are two main components of AIWL: the white-list of legitimate LUIs and the white-list maintainer. The former is used for checking if a URL is suspicious or familiar, while the latter is used to decide on whether to store an LUI in the database of the white-list (Jain and Richariya, 2011).

To determine how well users can recognize and identify phishing web pages with anti-phishing tools, Li et al. (2014) designed and conducted usability tests for two types of phishing-detection applications: blacklist-based and whitelist-based anti-phishing toolbars. The research outputs show no substantial performance changes between the applications. They detected that, in many web browsing cases, a substantial quantity of valuable and applied information for users is absent, such as information clarifying professional web page security certificates. However, the certificates are not vital in guaranteeing customer privacy and protection.

Sonowal and Kuppusamy (2020) provide a multilayer model to detect phishing, titled PhiDMA (Phishing Detection using Multi-filter Approach). The PhiDMA model incorporates five layers: Auto upgrade whitelist layer, URL features layer, Lexical signature layer, String matching layer, and Accessibility Score comparison layer. A prototype implementation of the proposed PhiDMA model is built with an accessible interface so that persons with visual impairments can access it without any barrier. The result from the experiment shows that the model is capable of detecting phishing sites with an accuracy of 92.72%.

In Pilton et al. (2021), phishing websites are detected by extracting various features from a collection of phishing and legitimate websites obtained from PhishTank and a starting point directory service. The constructed feature vector is further processed by the proposed feature selection module GenFeato to obtain the reduced set of features. This reduced feature vector is further processed by the proposed phishing detection module PhiDecto to predict the type of a website.

The research in (Jain and Richariya, 2011) developed an auto-updated white-list of legitimate websites. This is in an attempt to solve the problem of phishing in cyberspace. The solution assists various internet users to immediately verify the status of the site to be accessed. Whenever a user attempts to access a site that is not in the auto-updated white-list of legitimate websites, he/she will be warned of the implication of disclosing sensitive information. The solution further verifies the originality and legitimacy of a webpage by using features of hyperlinks. A technique was also proposed for the detection of various forms of phishing attacks by implementing a prototype web browser that processes each electronic mail for any phishing attack (Jain and Richariya, 2011). Their approach detects phishing emails using link-based features. A combination of their algorithm and the prototype of a web browser helps users to receive notification of a likely attack (Choudhary and Jain, 2017). The main challenge with this approach is its inability to incorporate other features along with the hyperlinks.

In Liu et al. (2010), a zero-hour phishing attack was addressed by proposing a solution that could identify and quickly offer hyperlinks in the source code of a site. At the same time, the latter will, however, be discovered by using a reliable search engine for identifying and searching the most frequent keywords. The Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is very useful in this case. At the end of both direct and indirect extraction of the associated webpage, the technique makes a comparison of the associated and suspicious web pages using ranking relation, layout similarity relation, and link relation (Azeez et al., 2011). The weakness of this approach is that the solution could not be integrated into web browsers as a lightweight plug-in to provide alerts for phishing attacks.

The work of Naresh et al. (2013) puts forward Link-Guard, a new end-host anti-phishing algorithm that uses generic features of hyperlinks in several attacks. The method detects both unknown and known phishing attacks. Link-Guard was implemented in Windows XP and was able to detect not only known but also unknown phishing attacks with minimal false negatives. Link-Guard was proven to be lightweight and prevents attacks in real time (Naresh et al., 2013). This approach, however, could not detect emerging attacks targeting IoT devices and systems.

Content-based phishing detection was proposed in the research of Zhang et al. (2007). The technique takes a set of features from many web pages. Through the process, the TF-IDF of a site can be easily calculated. It can subsequently create a lexical signature. The application submits the five terms with the highest values of TF-IDF to the search engine. It will eventually use the top “n” results to verify the originality and legitimacy of a site. This technique is, however, affected by the language used in the site.

CANTINA+ was proposed in 2011 by Xiang et al. (2011). This technique is a rich feature-based machine learning procedure to identify and detect phishing sites. The solution uses both the URL of a website and the Document Object Model (DOM) as sources for feature extraction. It was observed that a true positive rate of 92% and a false positive rate of 0.4% were achieved. The limitation of this approach is that the features considered are very few; hence the results obtained could not be relied upon.

An e-banking anti-phishing method was proposed by Aburrous et al. (2010). They adopted an intelligent but resilient approach for detecting and identifying phishing in e-banking. The solution, which is based on data mining algorithms and fuzzy logic, characterizes the e-banking phishing website and verifies its procedures by categorizing phishing and subsequently defining phishing site attack features with a layer structure. The approach did not take into consideration the best feature sets. Also, a number of emerging technologies that could immensely assist in phishing classification were not considered.

An architecture based on transparent virtualization techniques for detecting phishing was proposed by Biedermann et al. (2014). The solution can be adopted and deployed as a security measure for virtual machines deployed in the cloud (Azeez and Iliyas, 2016). The architecture can show and prevent phishing attacks by analyzing perceptual similarity and resemblance between fingerprints. What is more, it identifies and quickly detects Man-in-the-Browser (MitB) attacks by verifying sections of a site which have been processed by the browser (Nureni and Irwin, 2010). Verification of a limited section of the browser as well as limited features is the major constraint of this work.

The combination of machine learning techniques and a personalized white-listing approach was adopted by Belabed et al. (2012). The filtering of blocked phishing sites was achieved by the white-list, while phishing sites not blocked by the white-list were filtered out by a Support Vector Machine classifier positioned and structured to classify attacks. The approach could not detect whether legitimate websites are attacked by a DNS spoof. Also, the reliance of one feature of the classification model on a search engine can negatively influence the ease of use and responsiveness of the tool in the case of search engine dysfunction.

A technique based on analysis of URLs posted on social media sites and near real-time gathering was used in developing an automated phishing detection and analysis system (White et al., 2012). They characterized each page fetched with computed values such as the number of links and images. They did a comparative assessment of the distance between the images as a mode of comparison. This was achieved after capturing a screenshot of the page images and their computation. The approach is, however, not efficient in handling emergent phishing challenges (Choudhary and Jain, 2017).

Having realized that the “zero-day” phishing attack is a novel strategy being used by cybercriminals, specifically phishers, to wreak havoc on internet users, Almomani et al. (2013) proposed a unique framework built on a hybrid learning approach. The approach was termed phishing dynamic evolving neural fuzzy framework (PDENF), which is an adaptive online technique, improved upon with offline learning, to dynamically identify the phishing email before its arrival into the user’s account. The approach is proposed to perform better for high-speed learning with a low memory footprint as it reduces the complexity of the rule base. PDENF is suggested to work for high-speed “life-long” learning with a low memory footprint, and it minimizes the complexity of the rule base and configuration by creating few rules for email classification (Almomani et al., 2013). The main challenge with this work is its inability to compare the computational overhead of this approach with that of other similar approaches.

In an attempt to bring phishing into the limelight, Gupta et al. (2018) did a commendable job by providing adequate background information and history of various phishing attacks with a comprehensive explanation of the motives of phishers. They went further to provide an up-to-date taxonomy of different types of phishing attacks along with the taxonomy of different solutions to safeguard internet users from any form of phishing attack as evident in the literature. The article was finalized by providing insights into the challenges of the existing solutions (Gupta et al., 2018). The work principally deals with the review of previous related works; hence there is no empirical evaluation.

Having realized the great damage done by various phishing email activities across the global network, Almomani et al. (2013) attempted to carry out an all-inclusive survey of various phishing email filtering approaches. The main objective was to develop a complete solution to phishing emails. The solution was presented to adequately identify phishing emails at different phases of attack by adopting machine learning strategies. The authors went further to conduct a comparative assessment of various identified filtering techniques (Almomani et al., 2013). This is a survey of phishing email filtering techniques. There is no scientific result to contend with.

There is no gainsaying the fact that Botnets have negatively affected both individuals and uncountable organizations across the globe. The effect is so monumental that many developed countries of the world are feeling the impacts. The evolution of Citadel’s Conficker and Zeus has led to the adoption of the Domain Name System (DNS) to conceal themselves and avoid possible detection. Against the backdrop of this development, Alieyan et al. (2019) adopted a DNS response and query behavioral approach to detecting possible abnormal DNS responses and queries. The result obtained has an accuracy of 99.35% and a false-positive rate of 0.25%. The effectiveness of this approach was further affirmed when comparing it with other prominent DNS-based approaches. The challenge with this approach is its sole focus on DNS query and response behaviors. This technique is not sufficient for adequate evaluation to draw a reliable conclusion.

A cascading style sheet (CSS) and uniform resource identifier (URI) matching-based phishing detection framework was developed by Mishra and Gupta (2018). The objective is to identify and detect zero-day phishing attacks which are characterized by stealing classified personal information of internet users. To circumvent the phishing attacks, the proposed framework adopted the main characteristics of typical attacks for both CSS and URI matching. The proposed system proves to be effective by showing a True Negative rate of 100% as well as a True Positive rate of 93.27%. The intelligent phishing detection system is limited to URI and CSS matching; hence the concept utilized could not be reliable enough for detecting phishing in a global network.

To further ensure that online transactions are free of phishers and cybercriminals who are hell-bent on defrauding internet users, the authors verified both the literal and conceptual consistency between the web content and the uniform resource locator (URL). The approach attains a 99.1% degree of accuracy. It proves to be very useful and reliable in identifying phishing sites on the global network (Azeez et al., 2020). The main challenge with this approach is that the content of the websites
and the content between the URL are examined only by the conceptual similarity.

Fig. 2 – The architecture of the proposed anti-phishing system.

In the work of Gupta et al. (2017), a comprehensive survey was carried out on phishing, which is regarded as a prominent challenge to networks and systems. The authors explained the historical background of various phishing attacks as well as the intention of attackers. Furthermore, they provided the taxonomy of different types of available phishing attacks. They went further by providing present and past proposed solutions as presented by different authors. Beyond these, they equally explored the literature to explain challenges and issues hindering the fight against phishing at a global level (Gupta et al., 2017). This research, however, could not provide empirical evidence of the forms of phishing techniques considered since their work is solely based on content analysis.

To prevent internet fraudsters from stealing classified information, Mishra and Gupta (2018) came up with an intelligent phishing detection framework aimed at detecting zero-day phishing attacks. The adopted approach was based on cascading style sheet (CSS) and Uniform Resource Identifier (URI) matching. According to the authors, the approach was very reliable and efficient in detecting various categories of phishing attacks, as the True Positive and True Negative values were 93.27% and 100%, respectively (Mishra and Gupta, 2018). The main challenge with this approach is its inability to compare the results with other similar research to ascertain its effectiveness.

Implementation of a novel approach tagged “dynamic evolving neural fuzzy framework” was proposed by Almomani et al. (2013). They adapted an evolving connectionist system that relied heavily on a hybrid machine learning technique. The framework was improved by an online learning technique to identify repeatedly and dynamically any email that is considered phishing, including unknown zero-day emails, before it reaches the owner’s account. The framework proposed has never been tested; hence its efficiency could not be guaranteed.

There are different approaches already proposed by researchers across the globe to protect cyberspace from phishing. Many of the proposed approaches are very efficient, while some are unexpectedly weak in terms of performance and reliability. Consequent to this development, Almomani et al. (2020) present a comprehensive survey of state-of-the-art research on various attacks and proffer solutions against phishing in the global network. The authors went further to conduct a comparative study and evaluation of several filtering approaches. The paper’s limitation is its inability to handle and prevent the challenges associated with zero-day phishing email prediction and detection.

Going by the literature reviewed and the weaknesses noticed so far, the following questions are hereby posed as a guide for this work.

1 How can phishing be determined by comparing the domain name with the contents of the white-list, later matching it with the IP address before a decision is made, and further analyzing the actual link and the visual link?
2 How can the solution obtained in “1” be deployed for a real-time application?
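
The two-step lookup behind question 1 (domain name first, then IP address) can be sketched in Python as below. This is only an illustration under stated assumptions, not the authors' implementation: the `WHITELIST` dictionary, its sample entry, and the names `extract_domain` and `check_whitelist` are hypothetical, and treating a known domain with an unexpected IP address as phishing is an assumed decision rule that the text does not spell out. The resolver is passed in as a parameter so the sketch runs without network access.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: domain name -> IP address recorded when the site was trusted.
WHITELIST = {"example.com": "93.184.216.34"}

def extract_domain(url):
    """Return the lower-cased host part of a URL, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url          # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def check_whitelist(url, resolve, whitelist=WHITELIST):
    """Classify a URL as 'legitimate', 'phishing', or 'unknown'.

    resolve is any callable mapping a domain to its current IP address
    (for example socket.gethostbyname); it is injected so the sketch
    can be exercised offline.
    """
    domain = extract_domain(url)
    if domain not in whitelist:
        return "unknown"               # not whitelisted: hand over to the hyperlink module
    if resolve(domain) == whitelist[domain]:
        return "legitimate"            # domain and IP address both match
    return "phishing"                  # assumed rule: known domain, unexpected IP address
```

A first-time site therefore comes back as `"unknown"`, which corresponds to the case where the second module of the architecture takes over.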
3. System architecture

There are three (3) main modules for the proposed architecture as shown in Fig. 2. The URL and DNS matching sub-modules constitute the first module. These modules contain a white-list. The IP address and the domain name are contained in the white-list. Anytime a site is submitted, the application juxtaposes the domain name with the contents of the white-list and later matches it with the IP address before a decision is finally taken.

Two things might happen whenever a user submits a site: the user is either a first-time or a regular user. If he/she is a first-timer in using or accessing the website, the site’s domain will not be available in the white-list; hence the second module will become operational. The role of the second module is to confirm whether a webpage is phishing (Nivedha et al., 2017). This is done by extracting hyperlinks and subsequently applying a phishing detection algorithm, which is detailed in Section 4. The algorithm checks if the hyperlink is a legitimate or phishing type. If the status is the latter, the system gives a warning to the user (Ayofe et al., 2010). If the status is the former, the system will update the whitelist database.

4. Phishing detection algorithm

We determine the white-list by carrying out a comprehensive analysis between the actual link and the visual link. Also, the similarities of the known trusted site can be calculated. The proposed solution takes a final decision on the extracted information from the hyperlink, which is equally obtained from the web address supplied by the user. The algorithm is detailed in Algorithm 1.

The reason behind the extraction of hyperlinks is that the phishing site copies the page content from a targeted original or legitimate webpage, which may have many fake and mimicked hyperlinks pointing to the targeted legitimate page. Some of the URLs available in the phishing database may be redirected to their corresponding original or legitimate websites. If, however, the webpage is a genuine one, it will not point to a phishing page. The algorithm for detecting phishing decides on the status of any URL based on three metrics: null links present in the source code, a webpage that does not contain hyperlinks, and foreign links present in the source code (Vanhoenshoven et al., 2016). The adopted phishing detection algorithm was proposed by Jain and Gupta (2016).

4.1. A webpage that does not contain any hyperlinks

It has been established that HTML tags are easily traceable; hence, anti-phishing solutions can easily extract detailed information from them. Consequently, an attacker can develop and design a webpage that can conveniently bypass any anti-phishing solution. One main unique feature is that if a website is legitimate, extraction can easily be carried out on at least one of the hyperlinks (Bo et al., 2016). If the total number of links extracted is greater than zero, the website is treated as legitimate at this step. If, however, no hyperlink can be extracted, the webpage is regarded as a phishing type.

4.2. A webpage that contains null pointer

Whenever a link is not pointing to any webpage or document, it is referred to as a null link or null pointer. It is usually denoted by <a href="#">. Clicking such a link returns the user to the same page. The attackers use a null

pointer to achieve their ulterior motive (Chiba et al., 2016).

4.3. Number of links pointing to own domain and foreign domain

Immediately after the null link attributes are checked and established, the algorithm decides based on the extracted hyperlink set (Xiong et al., 2015). The majority of the hyperlinks are directed to the same domain if the webpage is legitimate, while in the case of phishing sites, the majority of the hyperlinks are directed to their respective targeted domain or a foreign domain. The algorithm calculates the total number of links extracted from the source code of the webpage and the total number of links pointing to a foreign domain. The decision on the nature of the hyperlinks is determined by the following equation as obtained in (Jain and Gupta, 2016):

$$\mathrm{Ratio} = \frac{L - ND_i}{L} \qquad (1)$$

where $ND_i$ = the total number of links pointing to the page's own domain, and $L$ = the total number of links extracted from the page source of the suspicious webpage.

5. Evaluation metrics

The following are the metrics used for evaluation: false-negative rate, true negative rate, false-positive rate, and true positive rate. They are defined as follows (Sun et al., 2015):

True positive rate (TPR) can simply be defined as the rate at which a phishing site is identified and classified as true phishing. It is denoted as:

$$\mathrm{TPR} = \frac{N_{P \to P}}{N_P} \times 100 \qquad (2)$$

False-positive rate (FPR) can be defined as the rate at which phishing sites are identified and classified as legitimate out of the total available phishing sites. It is denoted as:

$$\mathrm{FPR} = \frac{N_{P \to L}}{N_P} \times 100 \qquad (3)$$

False-negative rate (FNR) can be defined as the rate at which legitimate websites are classified and identified as phishing out of the total available legitimate websites. It is denoted as:

$$\mathrm{FNR} = \frac{N_{L \to P}}{N_L} \times 100 \qquad (4)$$

True negative rate (TNR) simply means the rate of legitimate sites also classified as legitimate. It is denoted as:

$$\mathrm{TNR} = \frac{N_{L \to L}}{N_L} \times 100 \qquad (5)$$

Accuracy (A) determines the rate of correct identification of both legitimate and phishing websites when considering all the available websites. It is denoted as:

$$\mathrm{Accuracy} = \frac{N_{L \to L} + N_{P \to P}}{N_L + N_P} \times 100 \qquad (6)$$

6. Implementation and results

The solution was implemented with PHP on WampServer 3.0.6, 64 bit. The URL of the suspicious page is supplied as input to verify its legitimacy. Subsequently, the parent domain of the supplied URL is checked against the entire white-list database. The page is confirmed illegitimate only if the status is finally defined as phishing; thereafter, a warning is given to the user on the consequences. The hyperlinks in the webpage are extracted by parsing the HTML file. Also, pattern matching was used to obtain the corresponding links from pages that are not properly formed. The status of the suspicious web pages was confirmed by comparing extracted and stored IP addresses.

Dataset - For evaluation purposes, a dataset of 200 sites has been utilized, that is, 140 phishing and 60 legitimate websites. The dataset consists of both phishing and legitimate web pages. The phishing webpages are collected from PhishTank (https://www.phishtank.com), while the legitimate web pages are taken from Alexa (http://www.alexa.com), as shown in Table 1.

Table 1 – The database used to test system.
S/N | Database | Phishing/Legitimate | URL of Dataset
1 | PhishTank | Phishing | https://www.phishtank.com
2 | Alexa | Legitimate | http://www.alexa.com/topsites

Fig. 3 shows the implementation page, where a user can input the URL of a web page for the system to analyze. Once the URL has been typed, the user clicks on the ‘Analyse’ button. In Fig. 3, the URL entered is ‘yahoo.com’, taken from ‘alexa.com’, and it was shown to be a legitimate site.

In Fig. 4, the populated database is shown after a series of domain inputs and analysis of the domains by the system.

Table 2 shows the result of the phishing system with the first 50 datasets (comprising 35 phishing and 15 legitimate webpages). The analysis shows that all webpages were correctly identified.

Fig. 5 shows the total number of web pages (phishing and legitimate) against the webpages (phishing and legitimate) correctly detected by the system after analyzing the first 50 datasets, comprising 35 phishing and 15 legitimate web pages. The system was later populated with additional datasets of 150, 300, 400, 600, and 800. The objective of doing this is to obtain a reliable level of accuracy so as to draw a consistent and dependable inference from the metrics used for evaluation. The datasets used are labeled A1-A6. Fig. 8 shows the level of accuracy for each of the datasets used for evaluation. The analysis shows that all webpages were correctly identified (Fig. 6).

Table 3 shows the result of the phishing system after analyzing the second 50 dataset (comprising of 35 phishing and
Fig. 3 – Screenshot showing result of an analyzed legitimate URL.
Fig. 4 – Screenshot of Populated Whitelist Database 1.
Table 2 – Result of the phishing system 1.
Dataset | Total phishing | Total legitimate | Phishing classified as phishing | Phishing classified as legitimate | Legitimate classified as legitimate | Legitimate classified as phishing | True positive rate | False negative rate | True negative rate | False positive rate | Accuracy
A1 | 35 | 15 | 35 | 0 | 15 | 0 | 100.00% | 0.00% | 100.00% | 0.00% | 100.00%
A2 | 105 | 45 | 98 | 15 | 37 | 15 | 82.2% | 17.8% | 82.2% | 17.8% | 80.2%
A3 | 234 | 66 | 216 | 31 | 53 | 31 | 84.6% | 19.7% | 80.30% | 15.4% | 82.30%
A4 | 412 | 88 | 391 | 43 | 66 | 43 | 79.10% | 25% | 75% | 20.9% | 76%
A5 | 516 | 84 | 475 | 67 | 81 | 67 | 82.4% | 5.6% | 94.4% | 17.6% | 80.1%
A6 | 674 | 126 | 619 | 82 | 99 | 82 | 76.10% | 21.3% | 78.7% | 23.90% | 74.7%
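The rates in Eqs. (2)–(6) follow directly from four classification counts. The Python sketch below is illustrative rather than the authors' code: the function name `evaluation_metrics` and its parameter names are assumptions, the class totals are assumed to be the sum of the two outcomes for each class, accuracy is taken as the correctly classified sites of both classes over all sites, and the FPR/FNR naming follows Eqs. (3) and (4) of this paper. It is exercised on row A1 of Table 2.

```python
def evaluation_metrics(p_as_p, p_as_l, l_as_l, l_as_p):
    """Compute the percentage rates of Eqs. (2)-(6) from raw counts.

    p_as_p: phishing pages classified as phishing      (NP -> P)
    p_as_l: phishing pages classified as legitimate    (NP -> L)
    l_as_l: legitimate pages classified as legitimate  (NL -> L)
    l_as_p: legitimate pages classified as phishing    (NL -> P)
    """
    n_p = p_as_p + p_as_l      # NP, assumed to be the sum of both outcomes
    n_l = l_as_l + l_as_p      # NL, assumed to be the sum of both outcomes
    if n_p == 0 or n_l == 0:
        raise ValueError("both classes need at least one sample")
    return {
        "TPR": 100.0 * p_as_p / n_p,
        "FPR": 100.0 * p_as_l / n_p,   # named as in Eq. (3) of the paper
        "FNR": 100.0 * l_as_p / n_l,   # named as in Eq. (4) of the paper
        "TNR": 100.0 * l_as_l / n_l,
        "Accuracy": 100.0 * (p_as_p + l_as_l) / (n_p + n_l),
    }


# Row A1 of Table 2: 35 phishing and 15 legitimate pages, all classified correctly.
a1 = evaluation_metrics(p_as_p=35, p_as_l=0, l_as_l=15, l_as_p=0)
print(a1)
```

Applied to the overall counts of row E1 in Table 6 (133, 7, 56, 4), the same function gives a true positive rate of 95.0% and an accuracy of 94.5%, which agrees with the values reported there.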
Fig. 5 – Graph of the total number of webpages (phishing and legitimate) against the correctly detected webpages (phishing
and legitimate) 1.
Fig. 6 – Level of Accuracy per Dataset.
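Table 3 – Result of the phishing system 2.
Dataset | Total phishing | Total legitimate | Phishing classified as phishing | Phishing classified as legitimate | Legitimate classified as legitimate | Legitimate classified as phishing | True positive rate | False negative rate | True negative rate | False positive rate | Accuracy
B1 | 35 | 15 | 34 | 1 | 14 | 1 | 97.14% | 6.67% | 93.33% | 2.86% | 96.00%
B2 | 115 | 35 | 97 | 20 | 33 | 20 | 95.30% | 5.58% | 94.42% | 4.70% | 92.30%
B3 | 244 | 56 | 213 | 32 | 54 | 32 | 92.70% | 3.36% | 96.64% | 7.30% | 91.50%
B4 | 418 | 82 | 389 | 45 | 66 | 45 | 89.20% | 19.5% | 80.5% | 10.8% | 87.20%
B5 | 511 | 89 | 451 | 69 | 80 | 69 | 88.6% | 10.12% | 89.88% | 11.4% | 84.40%
B6 | 694 | 106 | 615 | 84 | 101 | 84 | 88.50% | 4.72% | 95.28% | 11.5% | 83.57%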
15 legitimate web pages). The analysis shows that 2 web pages were not correctly identified.
Fig. 7 – Graph of the total number of webpages (phishing and legitimate) against the correctly detected webpages (phishing and legitimate) 2.
Fig. 8 – Level of Accuracy per Dataset.
Fig. 7 shows the total number of web pages (phishing and legitimate) against the correctly detected web pages (phishing and legitimate) after analyzing the second 50 of the dataset, comprising 35 phishing and 15 legitimate web pages. The analysis shows that 2 web pages were not correctly identified. For further evaluation, efforts were made to utilize 150, 300, 400, 600, and 800 datasets. Each of the datasets used
Table 4 – Result of the phishing system 3.
Dataset | Total phishing | Total legitimate | Phishing classified as phishing | Phishing classified as legitimate | Legitimate classified as legitimate | Legitimate classified as phishing | True positive rate | False negative rate | True negative rate | False positive rate | Accuracy
C1 | 35 | 15 | 33 | 2 | 13 | 2 | 94.29% | 13.33% | 86.67% | 5.71% | 95.80%
C2 | 115 | 35 | 95 | 22 | 33 | 22 | 89.90% | 5.72% | 94.28% | 10.1% | 87.45%
C3 | 244 | 56 | 209 | 38 | 53 | 36 | 90.35% | 5.36% | 94.64% | 9.65% | 88.67%
C4 | 416 | 84 | 384 | 48 | 68 | 48 | 86.70% | 19.91% | 80.09% | 13.3% | 84.67%
C5 | 511 | 89 | 465 | 63 | 72 | 63 | 79.9% | 19.12% | 80.88% | 20.1% | 77.95%
C6 | 694 | 106 | 618 | 90 | 92 | 90 | 84.20% | 13.21% | 86.79% | 15.8% | 81.80%
is labeled as B1-B6. Fig. 10 shows the level of accuracy for each of the datasets used for evaluation.
Table 4 shows the result of the phishing system after analyzing the third 50 of the dataset, comprising 35 phishing and 15 legitimate web pages. The analysis shows that 4 web pages were not correctly identified.
Fig. 9 – Graph of the total number of webpages (phishing and legitimate) against the correctly detected webpages (phishing and legitimate) 3.
Fig. 10 – Level of Accuracy per Dataset.
Fig. 9 shows the total number of web pages (phishing and legitimate) against the correctly detected web pages (phishing and legitimate) after analyzing the third 50 of the dataset, comprising 35 phishing and 15 legitimate web pages. The analysis shows that 4 web pages were not correctly identified. As done previously, further evaluations were carried out on 150, 300, 400, 600, and 800 datasets. Each of the datasets used is labeled as C1-C6. Fig. 10 shows the level of accuracy for each of the datasets used for evaluation, while the values obtained for each of the metrics evaluated are provided in Table 4.
Table 5 shows the result of the phishing system after analyzing the last 50 of the dataset, comprising 35 phishing and 15 legitimate web pages. The analysis shows that 5 web pages were not correctly identified.
Fig. 11 shows the total number of web pages (phishing and legitimate) against the correctly detected web pages (phishing and legitimate) after analyzing the last 50 of the dataset, comprising 35 phishing and 15 legitimate web pages. The analysis shows that 5 web pages were not correctly identified. Furthermore, the total number of datasets was increased to 150, 300, 400, 600, and 800. Each of the datasets used is labeled as D1-D6. Fig. 12 shows the level of accuracy for each of the datasets, while the values obtained are shown in Table 5.
Table 6 shows the overall result of the analysis carried out by the system; it shows the true positive rate, the false-negative rate, and the accuracy of 94.50%.
Fig. 13 shows the overall total number of web pages (phishing and legitimate) against the correctly detected web pages (phishing and legitimate) after analyzing the dataset comprising 140 phishing and 60 legitimate web pages.
Both high true positive rates and low false-negative rates are required in any efficient anti-phishing system. The calculated values for each false-negative rate, true positive rate, and accuracy are presented (Fig. 14). In an attempt to side-step the anti-phishing solution, 44.28% of phishing pages contain no hyperlinks in their source code, which is a strategy being used by internet phishers to lure and deceive internet
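Table 5 – Result of the phishing system 4.
Dataset | Total phishing | Total legitimate | Phishing classified as phishing | Phishing classified as legitimate | Legitimate classified as legitimate | Legitimate classified as phishing | True positive rate | False negative rate | True negative rate | False positive rate | Accuracy
D1 | 35 | 15 | 31 | 4 | 14 | 1 | 88.57% | 6.67% | 93.33% | 11.43% | 94.55%
D2 | 107 | 43 | 88 | 22 | 40 | 20 | 85.80% | 6.98% | 93.02% | 14.20% | 83.10%
D3 | 247 | 53 | 209 | 40 | 51 | 40 | 87.10% | 3.38% | 96.62% | 12.90% | 84.52%
D4 | 414 | 86 | 381 | 45 | 74 | 45 | 86.56% | 14.00% | 86.00% | 13.44% | 84.11%
D5 | 515 | 85 | 464 | 61 | 75 | 61 | 87.34% | 11.76% | 88.24% | 12.66% | 84.23%
D6 | 690 | 110 | 620 | 90 | 90 | 90 | 86.90% | 18.19% | 81.81% | 13.10% | 85.34%
Fig. 11 – Graph of the total number of webpages (phishing and legitimate) against the correctly detected webpages (phishing and legitimate).
Fig. 12 – Level of Accuracy per Dataset.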

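Table 6 – Overall Result of the phishing system.
Dataset | Total phishing | Total legitimate | Phishing classified as phishing | Phishing classified as legitimate | Legitimate classified as legitimate | Legitimate classified as phishing | True positive rate | False negative rate | True negative rate | False positive rate | Accuracy
E1 | 140 | 60 | 133 | 7 | 56 | 4 | 95.00% | 6.67% | 93.33% | 5.00% | 94.50%
E2 | 235 | 65 | 212 | 11 | 61 | 16 | 89.07% | 6.15% | 93.85% | 10.93% | 87.45%
E3 | 326 | 74 | 285 | 21 | 71 | 23 | 85.15% | 4.05% | 95.95% | 14.85% | 82.78%
E4 | 414 | 86 | 347 | 34 | 83 | 32 | 88.24% | 3.49% | 96.51% | 11.76% | 86.45%
E5 | 510 | 90 | 439 | 41 | 76 | 44 | 89.93% | 15.56% | 84.44% | 10.07% | 87.41%
E6 | 643 | 157 | 560 | 68 | 106 | 66 | 95.21% | 32.48% | 67.52% | 4.79% | 92.49%
Fig. 13 – Graph of the overall total number of webpages (phishing and legitimate) against the correctly detected webpages (phishing and legitimate).
Fig. 14 – Level of Accuracy per Dataset.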

Table 7 – Details of hyperlinks.
Webpages | Total instances | No. of webpages that contain no hyperlinks | No. of webpages that contain null hyperlinks | No. of webpages that point to a foreign domain
Phishing | 140 | 62 | 42 | 29
Legitimate | 60 | 0 | 0 | 4
Fig. 15 – Graph showing the details of the hyperlink features.
users. 30.0% of phishing pages contain null hyperlinks in their source code. After verifying both the null-link and no-link attributes, the phishing detection algorithm calculates the ratio of links pointing to a foreign domain to the total number of available links. These details are shown in Table 7.
Table 7 shows the number of web pages that contain no hyperlinks, null hyperlinks, and hyperlinks that point to a foreign domain. The legitimate pages are all seen to have hyperlinks, whereas a number of the phishing pages have no hyperlinks or only null hyperlinks. Fig. 15 shows that the phishing set contains more pages with no hyperlinks or null hyperlinks, and more pages pointing to a foreign domain, than the legitimate set.
Table 8 compares the new technique for implementing an anti-phishing solution with (Chiew et al., 2020; Hadia et al., 2016; Anandita et al., 2017; Al-Janabi et al., 2017; Al-garadi et al., 2016; Wu et al., 2018). The degrees of accuracy obtained for these techniques justify the new approach adopted in this work, as its degree of accuracy, which is 95.0%, surpasses all shown in Table 8.
From the experimental evaluation of this technique, it is shown that the approach is effective in detecting phishing sites, as it has an average accuracy of 96.17%, a 95.0% true positive rate, and a 6.67% false-negative rate. It is considered the best way for the anti-phishing solution because the similarities of the known trusted site can be calculated and quickly determined. It verifies the correctness and legitimacy of any webpage by certain hyperlink or URL features. This technique has provided great benefits over the existing solutions by reducing the memory requirements and computational complexity, among other benefits. It has also been shown that the proposed method can provide more robust detection performance when compared to other techniques.
At this juncture, it is pertinent to reference the questions posed earlier, which served as a guide for arriving at a solution in this work.

1 How can phishing be determined by comparing the domain name with the contents of the white-list and later matching it with the IP address before a decision is made, and by further analyzing the actual link and the visual link?

This question has been successfully answered. In the course of implementation, efforts were made to juxtapose each of the domain names with what was contained in the white-list. The results obtained were matched with the Internet Protocol address. Finally, both the visual and actual links were analyzed before arriving at the values obtained in Tables 2–6, which were graphically explained in Figs. 7-16. The metrics used for evaluation and the values obtained are shown clearly in those tables.
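The decision flow described in this answer (look up the domain in the white-list, match the stored IP address, then compare the visual and actual links) can be sketched as follows. This is a hedged illustration and not the authors' implementation: the in-memory `WHITELIST` dictionary, the example domain and the documentation-range IP addresses, the function names, and the injected `resolve` callback are all assumptions, and treating an IP mismatch for a white-listed domain as phishing is likewise an assumption of the sketch.

```python
from urllib.parse import urlparse

# Hypothetical white-list: trusted domain -> IP address recorded when it was added.
WHITELIST = {"example.com": "192.0.2.10"}


def host_of(url):
    """Return the lower-cased host of a URL, adding a scheme if it is missing."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def links_differ(visual_text, actual_href):
    """True when the text shown to the user looks like a URL but names a
    different host than the link target (visual link vs. actual link)."""
    text = visual_text.strip()
    if " " in text or "." not in text:
        return False              # anchor text is not a URL, nothing to compare
    return host_of(text) != host_of(actual_href)


def check_url(url, resolve):
    """Classify a URL using the white-list; `resolve` maps a host to an IP."""
    host = host_of(url)
    stored_ip = WHITELIST.get(host)
    if stored_ip is None:
        return "unknown"          # not white-listed: hand over to hyperlink analysis
    if resolve(host) == stored_ip:
        return "legitimate"       # domain and IP both match the white-list entry
    return "phishing"             # assumption: same domain, different IP is suspicious
```

Passing the resolver in as a parameter keeps the sketch free of network access; in a deployed system it would be replaced by an actual DNS lookup and the dictionary by the white-list database.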
Table 8 – Anti-phishing methods based on Machine Learning (ML) used for benchmarking the computational overhead of the new approach.
Authors | Degree of Accuracy | ML technique | Method name
Chiew et al., 2020 | 94.6% | SVM, Naive Bayes, C4.5, JRip, and PART classifiers | A new hybrid ensemble feature selection framework for machine learning-based phishing detection system
Hadi et al., 2016 | 0.925% | CBA, CMAR, MCAR, and ECAR | A new fast associative classification algorithm for detecting phishing websites
Anandita, D.P et al., 2017 | 93.8498% | SVM, RF, KNN | A Novel Ensemble Based Identification of Phishing Emails
Al-Janabi M. et al., 2017 | 92% | RF, LR, NB, k-NN | Using supervised machine learning algorithms to detect suspicious URLs in online social networks
Al-garadi et al., 2016 | 94.3% | Naïve Bayes (NB), Support vector machine (SVM), Random forest, KNN | Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network
Li et al., 2014 | 95.3% | Blacklist and Whitelist | Blacklist-based and Whitelist-based anti-phishing toolbars
Sonowal and Kuppusamy, 2020 | 92.72% | PhiDMA | Phishing Detection using Multi-filter Approach
Pilton et al., 2021 | 95.29% | PhishTank | Evaluating privacy - determining user privacy expectations on the web
Wu C. et al., 2018 | 93.5% | Reinforcement learning (RL) - Gym-plus | Enhancing Machine Learning Based Malware Detection Model by Reinforcement Learning
Proposed Technique | 96.17% | Automated White-List | Adopting Automated White-List (AWL) Approach for Anti-Phishing Solution
2 How can the solution obtained in “1” be deployed for a real-time application?

Efforts are currently being made by three selected universities in Nigeria, as well as four research institutes, where this approach will be tested in a real-time mode to appraise its efficiency and reliability.

7. Conclusion and future work

The clandestine moves and consequential effects of phishing across the globe are alarming. Many people have suffered losses online through the illegitimate actions of phishers. Against this backdrop, the authors deemed it fit to carry out this research in order to reduce and even thwart the efforts being made by cybercriminals to dupe internet users. The research work proposed an auto-updated white-list of genuine and legitimate websites, which are required to be supplied by each internet user so as to protect him/her against any form of phishing attack. This technique verifies the legitimacy of a site by making use of hyperlink features and identities. This work proposes a practical and effective solution for phishing attacks. While other approaches, like blacklist solutions and machine learning algorithms, are not very successful, the present work uses an automated white-list approach for handling phishing. This technique verifies the correctness and legitimacy of any webpage by certain hyperlink or URL features and is found to be comparatively very effective, as it detects phishing sites with a 95.0% true positive rate.
For future work, the authors are currently working on the inclusion of other features apart from those considered in this work. It is also worthwhile to examine the consequence of feature selection with classification algorithms. In addition, efforts are being made to add some other similarity-based features with the sole aim of adopting machine learning algorithms for improving the accuracy of the approach. Moreover, the response time for the most visited legitimate websites can be minimized by including the browsing history of URLs in the database of the white-list.

Authors’ contribution

Conceptualization, NA and SM; Data curation, IAM and NA; Investigation, IAM; Methodology, SM and LFS; Resources, NA and SM; Supervision, SM and NA; Validation, IAM; Visualization and Writing – original draft, LFS, NA and SM; Writing – review & editing, SMA, SM and LFS.

Declaration of Competing Interest

We declare that we do not have any conflict of interest with anybody.

REFERENCES

Azeez NA, Otudor AE. Modelling and simulating access control in wireless ad-hoc networks. Fountain J. Natl. Appl. Sci. 2016;5(2):18–30.
Ludl C, McAllister S, Kirda E, Kruegel C. On the effectiveness of techniques to detect phishing sites. In: DIMVA ’07 Proceedings of the 4th international conference on Detection of Intrusions
and Malware, and Vulnerability; 2007. p. 20–39. doi:10.1007/978-3-540-73614-1_2.
Suryavanshi N, Jain A. A review of various techniques for detection and prevention for phishing attack. Int. J. Adv. Comput. Technol. (IJACT) 2015;4(3):41–6.
Adebowale MA, Lwin KT, Sánchez E, Hossain MA. Intelligent web-phishing detection and protection scheme using integrated features of Images, frames and text. Expert Syst. Appl. 2019;115(2019):300–13.
Khonji M, Iraqi Y, Jones A. Phishing Detection: A Literature Survey. IEEE Commun. Surv. Tutor. 2013;15(4):2091–121. doi:10.1109/SURV.2013.032213.00009.
Biedermann S, Ruppenthal T, Katzenbeisser S. Data-centric phishing detection based on transparent virtualization technologies. In: Twelfth Annual Conference on Privacy, Security and Trust (PST); 2014. p. 215–23.
Cao J, Li Q, Ji Y, He Y, Guo D. Detection of forwarding based malicious urls in online social networks. Int. J. Parallel Program. 2014;44(1):1–18. doi:10.1007/s10766-014-0330-9.
Basnet R, Mukkamala S, Sung AH. Detection of phishing attacks: a machine learning approach. Soft Comput. Appl. Ind. 2008:373–83.
Azeez NA, Olayinka AF, Fasina EP, Venter IM. Evaluation of a flexible column-based access control security model for medical-based information. J. Comput. Sci. Appl. 2015;22(1):14–25.
Azeez NA, Ademolu O. Cyberprotector: identifying compromised urls in electronic mails with bayesian classification. In: 2016 International Conference Computational Science and Computational Intelligence (CSCI). IEEE; 2016. p. 959–65.
Blasi M. Techniques for Detecting Zero-Day Phishing Websites. MSc Thesis. Ames, Iowa: Iowa State University; 2009.
Azeez NA, Babatope AB. AANtID: an alternative approach to network intrusion detection. J. Comput. Sci. Appl. Int. J. Nigeria Comput. Soc. 2016:129–43.
Jain A, Richariya V. Implementing a web browser with phishing detection techniques. World Comput. Sci. Inf. Technol. J. (WCSIT) 2011;1(7):289–91. Retrieved 2016.
Li L, Berki E, Helenius M, Ovaska S. Towards a contingency approach with whitelist- and blacklist-based anti-phishing applications: what do usability tests indicate? Behav. Inf. Technol. 2014;33(11):1136–47.
Sonowal G, Kuppusamy KS. PhiDMA–a phishing detection model with multi-filter approach. J. King Saud Univ.-Comput. Inf. Sci. 2020;32(1):99–112.
Pilton C, Faily S, Henriksen-Bulmer J. Evaluating privacy-determining user privacy expectations on the web. Comput. Security 2021;105.
Choudhary N, Jain AK. Towards filtering of SMS spam messages using machine learning-based technique. In: Advanced Informatics for Computing Research, First International Conference, ICAICR 2017 Jalandhar. Springer Nature Singapore; 2017. p. 18–30.
Liu G, Qiu B, Wenyin L. Automatic detection of phishing target from phishing webpage. In: 20th International Conference on Pattern Recognition (ICPR); 2010. p. 4153–6.
Azeez NA, Iyamu T, Venter IM, et al. Grid security loopholes with proposed countermeasures. In: 26th International Symposium on Computer and Information Sciences. London: Springer; 2011. p. 411–18.
Naresh U, Sagar VU, Reddy MC. Intelligent phishing website detection and prevention system by using link guard algorithm. IOSR J. Comput. Eng. (IOSR-JCE) 2013;14(3):28–36.
Zhang Y, Hong J, Cranor L. CANTINA: a content-based approach to detecting phishing websites. In: 16th International World Wide Web Conference (WWW2007); 2007. p. 639–48.
Xiang G, Hong J, Rose C, et al. Cantina+: a feature-rich machine-learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. (TISSEC) 2011;14(2).
Aburrous M, Hossain MA, Dahal K, Thabtah F. Intelligent phishing detection system for e-banking using fuzzy data mining. Expert Syst. Appl. 2010;37:7913–21. Retrieved from www.elsevier.com/locate/eswa.
Biedermann S, Ruppenthal T, Katzenbeisser S, et al. Data-centric phishing detection based on transparent virtualization technologies. In: Twelfth Annual Conference on Privacy, Security and Trust (PST); 2014. p. 215–23.
Azeez NA, Iliyas HD. Implementation of a 4-tier cloud-based architecture for collaborative health care delivery. Nigerian J. Technol. Dev. 2016;13(1):17–25.
Nureni AA, Irwin B. Cyber security: Challenges and the way forward. Comput. Sci. Telecommun. 2010;29:56–69.
Belabed A, Aïmeur E, Chikh A, et al. A personalized white-list approach for phishing webpage detection. In: Seventh International Conference on Availability, Reliability and Security; 2012. p. 249–54.
White JS, Matthews JN, Stacy JL. A method for the automated detection of phishing websites through both site characteristics and image analysis. Proc. SPIE 8408, Cyber Sensing 2012, 2012.
Choudhary N, Jain AK. Towards filtering of SMS spam messages using machine learning-based technique. In: Advanced Informatics for Computing Research, First International Conference, ICAICR 2017 Jalandhar. Springer Nature Singapore; 2017. p. 18–30.
Almomani A, Gupta BB, Wan T, Altaher A, Manickam S. Phishing dynamic evolving neural fuzzy framework for online detection “zero-day” phishing email. Indian J. Sci. Technol. 2013;6(1):122–6. ISSN:0974-6846.
Gupta BB, Arachchilage NAG, Psannis KE. Defending against phishing attacks: taxonomy of methods, current issues and future directions. Telecommun. Syst. 2018;67:247–67. doi:10.1007/s11235-017-0334-z.
Almomani A, Gupta BB, Atawneh S, Meulenberg A, Almomani E. A survey of phishing email filtering techniques. IEEE Commun. Surv. Tutor. 2013:1–21.
Al-garadi MA, Varathan KD, Ravana SD. Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network. Comput. Hum. Behav. 2016;63(2016):433–43.
Al-garadi MA, Varathan KD, Ravana SD. Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network. Computers in Human Behavior 2016;63:433–43.
Alieyan K, Almomani A, Anbar M, Alauthman M, Abdullah R, Gupta BB. DNS rule-based schema to botnet detection. Enterprise Inf. Syst. 2019. doi:10.1080/17517575.2019.1644673.
Mishra A, Gupta BB. Intelligent Phishing detection system using similarity matching algorithms. Int. J. Inf. Commun. Technol. 2018;12(1–2):51–73. doi:10.1504/IJICT.2018.089022.
Azeez NA, Salaudeen BB, Misra S, Damaševičius R, Maskeliūnas R, et al. Identifying phishing attacks in communication networks using URL consistency features. Int. J. Electron. Security Digital Forens. 2020;12(2):200–13.
Gupta BB, Nalin AG, Arachchilage Psannis KE. Defending against phishing attacks: taxonomy of methods, current issues and future directions. Article Telecommun. Syst. 2017:1–22.
Almomani A, Wan T, Manasrah A, Altaher A, Baklizi M. An enhanced online phishing e-mail detection framework based on “Evolving connectionist system”. Indian J. Sci. Technol. 2013;9(3) Vol.
Almomani A, Gupta BB, Atawneh S, Meulenberg A, Almomani E. A survey of phishing email filtering techniques. IEEE Commun. Surv. Tutor. 2020:1–21. Accepted for Publication.
Nivedha S, Gokulan S, Karthik C, Gopinath R, et al. Improving phishing URL detection using fuzzy association mining. Int. J. Engineering and Sci. (IJES) 2017:21–31.
Ayofe AN, Adebayo SB, Ajetola AR, Abdulwahab AF. A framework for computer-aided investigation of ATM fraud in Nigeria. Int. J. Soft Comput. 2010;5(13):78–82.
Vanhoenshoven F, Napoles G, Falcon R, Vanhoof K, Koppen M, et al. Detecting malicious URLs using machine learning techniques. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE; 2016. p. 1–8.
Bo S, Akiyama M, Takeshi Y, Hatada M, et al. Automating url blacklist generation with similarity search approach. IEICE Trans. Inf. Syst. 2016;99(4):873–82.
Chiba D, Yagi T, Akiyama M, Shibahara T, Yada T, Mori T, Goto S, et al. Domainprofiler: discovering domain names abused in future. In: Dependable Systems and Networks (DSN), 2016 46th Annual IEEE/IFIP International Conference on. IEEE; 2016. p. 491–502.
Xiong C, Li P, Zhang P, Liu Q, Tan J, et al. Mird: Trigram-based malicious url detection implanted with random domain name recognition. Appl. Tech. Inf. Security 2015;2015:303–14. Springer.
Jain AK, Gupta BB. A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP J. Inf. Security 2016;9:1–11. doi:10.1186/s13635-016-0034-3.
Sun B, Akiyama M, Yagi T, Hatada M, Mori T, et al. Autoblg: Automatic url blacklist generator using search space expansion and filters. In: 2015 IEEE Symposium on Computers and Communication (ISCC). IEEE; 2015. p. 625–31.
Chiew KL, Tan CL, Wong K, Yong KSC, Tiong WK. A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Inf. Sci. 2020;484(2019):153–66.
Hadia W, Aburuba F, Alhawarib S. A new fast associative classification algorithm for detecting phishing websites. Appl. Soft Comput. 2016;48(2016):729–34.
Mishra A, Gupta BB (2018) Intelligent phishing detection system using similarity matching algorithms. International Journal of Information and Communication Technology. Vol. 12, Issue 1-2.
Anandita, DPY, Priyanka P, Kumar D, Tripathi R (2017) A Novel Ensemble Based Identification of Phishing Emails. Conference ICMLC 2-17, February 24–26, 2017, Singapore, Singapore. 2017 ACM. ISBN 978-1-4503-4817-1/17/02.
Al-Janabi M, Quincey E, Andras P. Using supervised machine learning algorithms to detect suspicious URLs in online social networks. In: 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining; 2017.
Wu C, Shi J, Yang Y, Li W. Enhancing Machine Learning-Based Malware Detection Model by Reinforcement Learning. ICCNS 2018; November 2–4, 2018, Qingdao, China.

Nureni Ayofe Azeez received his B.Tech. (Hons.) from the Federal University of Technology, Akure, Nigeria in 2005, MSc from the University of Ibadan, Oyo State, Nigeria in 2008, and Ph.D. from University of the Western Cape, South Africa in 2013, all in Computer Science. His areas of research include Security & Privacy, Access Control, Grid and Cloud Computing, Sensor Networks, E-Health and ICT4D. He is a recipient of The Young Scientist Award at the 22nd International CODATA Conference that was held in Cape Town, South Africa in October 2010. He is currently a Senior Lecturer in the Department of Computer Sciences, University of Lagos, Nigeria.

Sanjay Misra is a full Professor of Computer Engineering at Covenant University (400-500 ranked by THE (2019)), Nigeria. He holds a PhD in Inf. & Know. Engg (Software Engg) from the Uni of Alcala, Spain, and an M.Tech. (Software Engg) from MLN National Institute of Tech, India. As of today (21.05.2020), as per SciVal (SCOPUS-Elsevier) analysis, he is the most productive researcher (no. 1) https://t.co/fBYnVxbmiL in Nigeria during 2012-17, 13-18, 14-19 & 15-20 (in all disciplines), in comp science no. 1 in the country & no. 2 in the whole continent. Total around 500 articles (SCOPUS/WoS) with 400 co-authors from around the world (-105 JCR/SCIE) in the core & appl. area of Soft Engg, Web engg, Health Informatics, Cybersecurity, Intelligent systems, AI etc. He got several awards for outstanding publications (2014 IET Software Premium Award (UK)), and from TUBITAK-Turkish Higher Education and Atilim University. He has delivered more than 100 keynote/invited talks/public lectures in reputed conferences and institutes (traveled to more than 60 countries). He edited 49 LNCS, 4 LNEE, 2 CCIS & 7 IEEE proc, 3 books, EIC of ‘IT Personnel and Project Management, Int J of Human Capital & Inf Technology Professionals - IGI Global & editor in various SCIE journals.

Ihotu Agbo Margaret completed her Master of Science (MSc) degree programme in Computer Science from the University of Lagos, Nigeria in 2018. She obtained her BSc (Hons.) degree in Computer Engineering from Madonna University, Nigeria in 2012. Her research interests include Cyber Security and Internet of Things (IoT). She is a member of Nigeria Computer Society (NCS).

Luis Fernandez-Sanz is an associate professor at Dept. of Computer Science of University of Alcalá (UAH). He earned a degree in Computing in 1989 at Polytechnic University of Madrid (UPM) and his Ph.D. in Computing with a special award at University of the Basque Country in 1997. With more than 20 years of research and teaching experience (at UPM, Universidad Europea de Madrid and UAH), he has also been engaged in the management of the main Spanish Computing Professionals association (ATI: www.ati.es) as vice-president and he is chairman of ATI Software Quality group. He has held the position of vice-president of CEPIS (Council of European Professional Informatics Societies: www.cepis.org) from 2011-2013 and from 2016 to the present. His general research interests are software quality and engineering, accessibility, elearning and ICT professionalism and education.

Shafi’i Muhammad Abdulhamid is the former Head of Department (HOD) and a pioneer member of the first Cyber Security Science Department in Nigeria at the Federal University of Technology (FUT) Minna. He received his PhD in Computer Science from the prestigious University of Technology Malaysia (UTM), MSc in Computer Science from Bayero University Kano (BUK), Nigeria and a Bachelor of Technology in Mathematics with Computer Science from the FUT Minna, Nigeria. His current research interests are in Cyber Security, Cloud Computing, Soft Computing, Internet of Things Security, Malware Detection and Big Data Analytics. He has published many academic papers in reputable International journals, conference proceedings and book chapters. He has been appointed as an Editorial board member for Big Data and Cloud Innovation (BDCI) and Journal of Computer Science and Information Technology (JCSIT). Currently, he is the Chief Editor of a Scopus indexed book with IGI Global publishers titled “Advanced Security Strategies in Next Generation Computing Models”. He has also been appointed as a reviewer of several ISI and Scopus indexed International journals such as Journal of Network and Computer Applications (JNCA) Elsevier, Applied Soft Computing (ASOC) Elsevier, Journal of King Saud University Computer and Information Sciences (JKSU-CIS) Elsevier, Neural Computing and Applications (NCAA) Springer, Cluster Computing Springer, Egyptian Informatics Journal (EIJ) Elsevier, IEEE Access Journal (U.S.A.), Wireless Networks Springer, Plos One Journal, an International Journal Engineering Science and Technology (JESTHC) Elsevier, Brazilian Journal of Science and Technology (BJST) Springer to mention but a few. He has also served as Program Committee (PC) member in many National and International Conferences. He is one of the pioneer instructors at the Huawei Academy of FUT Minna and a holder of Huawei Certified Network Associate (HCNA) and Huawei Certified ICT Professional (HCIP). He is as well a member of IEEE Computer Society, International Association of Computer Science and Information Technology (IACSIT), Computer Professionals Registration Council of Nigeria (CPN), International Association of Engineers (IAENG), The Internet Society (ISOC), Cyber Security Experts Association of Nigeria (CSEAN) and Nigerian Computer Society (NCS). Presently, he is an Assistant Professor of Cybersecurity in IT Department, Community College of Qatar (CCQ), Doha, Qatar and also doubles as a Senior Lecturer of Cyber Security Science, FUT Minna, Nigeria. He is currently supervising both Masters and PhD students in Qatar, Malaysia and Nigeria.

