Performance Analysis of Data Mining, Machine Learning and Fuzzy Logic Algorithms For Detecting Phish

SUBMISSION DATE REPORT DATE
Apr 4, 2022 10:51 PM GMT+5:30 Apr 4, 2022 10:55 PM GMT+5:30
Performance analysis of Data mining, Machine
learning and Fuzzy logic algorithms for detecting
phishing URLs
Aryan Dosajh (2K18/MC/023), Mathematics and Computing, Delhi Technological University, Delhi, India, aryandosajh2 k18mc23@dtu.ac.in
Ashmeet (2K18/MC/025), Mathematics and Computing, Delhi Technological University, Delhi, India, ashmeet2 k18mc025@dtu.ac.in
Anirudh Awasthi (2K18/MC/016), Mathematics and Computing, Delhi Technological University, Delhi, India, anirudhawasthi2 k18mc016@dtu.ac.in
Sumedha Seniaray, Mathematics and Computing, Delhi Technological University, Delhi, India, sumedhaseniaray@dtu.ac.in
Payal Dabas, Mathematics and Computing, Delhi Technological University, Delhi, India, payaldabas@dtu.ac.in
Abstract—Phishing is now one of the leading cyber threats, in which a victim's sensitive information, such as a username, password, or payment card details, is obtained by an illegitimate website whose link is generally shared via e-mail. Such sites are generally fakes created by an attacker to resemble a trustworthy site. These web applications look like the official page of a company, such as an e-commerce website, bank application, or college portal, but have a slight variation in the URL that the user generally misses. These websites can also be unique websites that are not clones of any other website but instead promise the user some kind of reward in exchange for personal details. Our aim in this project is to detect such websites so that we can caution the user to proceed to them only if necessary. We have implemented two different approaches to solve this problem. One is a machine learning classification approach, which includes fuzzy pattern classifiers (both the top-down and bottom-up algorithms), while the other makes use of data mining, as data mining techniques can be an effective tool to detect phishing websites.

Index Terms—Fuzzy logic, phishing, data mining, malicious URLs

I. INTRODUCTION
Every day, everyone who uses the internet uses a web application. Some are accessed through our smartphone applications, some on our desktops or laptops, and some through a web browser. Consider the following scenario: we are using a web browser to purchase something online. When we look for an item, we come across several websites and like the goods on two of them. One of these websites is a well-known, secure, and reliable e-commerce site, while the other is less secure and less well known yet offers a considerably higher discount on the product. Since we have decided to pay for our merchandise online, we must input our credit/debit card information. The next question is, which website would you trust with your personal or sensitive information, such as your credit/debit card information? Obviously, you will be less hesitant when entering your credentials into a trusted e-commerce website, and you will be concerned when submitting your data to a website whose name you are hearing for the first time. This is because you have never used this website before, and you are more likely to input critical information on a website that you have used previously and have a high trust factor for. Hackers generally take advantage of this psychology, or to put it another way, this trust factor, and pose as a trustworthy company in order to steal your sensitive data or personal information. This is referred to as phishing. Phishing is an assault on a target's personal or sensitive information, such as passwords, bank account numbers, and email addresses, by impersonating a convincing company. Fishing, as the name implies, is a sport in which we use a worm or other bug as bait to catch fish. In a phishing assault, the bait is the website, which is generally a cloned version that appears to be genuine. We then submit our valuable information and fall prey to the hacker's lure: the hacker has tricked us into entering our personal information on that website.

II. TYPES OF PHISHING TECHNIQUES
A. Email Phishing
The majority of phishing assaults are carried out via email. In email phishing, the target is generally unknown. The hacker or thief sends an automated email that looks like it came from one of our social networking sites. The thief then acquires access to your email account by tricking you into entering your sensitive credentials or data. Because your contacts are linked to your email account, hackers gain access to all of them and send them emails bearing your ID, which they use to deceive them in turn. This is how the attack spreads.
B. Spear Phishing
It is a more personalized and more targeted form of phishing. The target is well known here; a typical scenario is one where a hacker knows everything about the target, such as their name, occupation, address, family members, and even their hobbies. The attacker then sends the target an email that is professionally created or cloned so that it seems to come from a believable source. The email is tailored to each recipient.

C. Angler Phishing
It is also called social media phishing (phishing through social media). In this form of phishing, a hacker sends an email or posts a message on your social media containing a link, or pretends to be a customer service agent, and lures victims into handing over confidential information.

D. Social Engineering
Social contact is used to carry out this sort of phishing. It employs psychological techniques to fool users into disclosing security information. This form of attack is carried out in a series of phases. First, the scammer researches the probable weak areas of the targets that would be used in the attack. The scammer then attempts to acquire the victim's trust before presenting a circumstance in which the target provides sensitive information. Baiting, scareware, pretexting, and spear phishing are some social engineering phishing techniques.

E. Link Manipulation
The major focus of phishing is on links. There are various ingenious techniques to modify a URL so that it seems to be a real URL. One approach is to display harmful URLs as named hyperlinks on websites. Another is to use misspelled URLs that look like valid URLs, such as ghoogle.com. IDN spoofing is a type of typosquatting that is much more difficult to detect than the previously mentioned link manipulation methods, because the attackers use a character from a non-English script that looks exactly like an English character, such as a Cyrillic "c" or "a" instead of their English counterparts.

III. RELATED WORK
Anti-phishing technology is meant to keep you safe from phishing scams. Many anti-phishing approaches, including software-based anti-phishing strategies, have been presented. The following are some of those methods: a) detection by blacklist, b) detection by visual similarity, and c) detection by a heuristic-based approach. For an effective phishing detection approach based on machine learning, overall test results reveal that, when combined with an SVM classifier, the suggested approach has the best performance, successfully discriminating 95.66 percent of phishing and legitimate websites while using just 22.5 percent of the original features [11]. Many machine learning models have been used to detect phishing websites. We have compared the following papers and the machine learning methods they used, in order to obtain a comparative study of which method is more effective and which makes more sense to use for distinguishing these websites from the original or good websites [?], [6].

Since it is feasible to rely on an accurate clustering algorithm that can handle a large number of samples in a reasonable length of time, a novel approach, fuzzy rough set–Web robot detection (FRS-WRD), based on fuzzy rough set theory, was suggested to better classify and cluster the Web visitors of three real-world Web sites [28]. In [21], the author used the URL to detect phishing sites automatically by extracting different terms of a URL and verifying them through a search engine. A real-time anti-phishing system based on seven distinct classification algorithms and natural language processing (NLP) based features is proposed in [12]. The method differs from prior research in the literature in the following ways: language independence, use of a large amount of phishing and genuine data, real-time execution, identification of new websites, independence from third-party services, and use of feature-rich classifiers. A fresh dataset was created to measure the system's performance, and the experimental findings were tested on it. According to the experimental and comparative findings for the implemented classification algorithms, the Random Forest method with solely NLP-based features performs best, with an accuracy of more than 95 percent in detecting phishing URLs [12].

Fuzzy pattern tree induction, a novel machine learning approach for classification, was recently introduced. A pattern tree is a hierarchical, tree-like structure with inner nodes denoted by generalised (fuzzy) logical operators and leaf nodes denoted by fuzzy predicates on input attributes. A pattern-tree classifier is made up of a collection of these pattern trees, one for each class label. This sort of classifier is interesting for a number of reasons. The learning method in [1] switches the pattern tree construction direction from bottom-up to top-down. In addition, a new termination criterion is presented that is better suited to the learning problem at hand [1].

IV. PROPOSED METHODOLOGY
We have discussed above what phishing is, the types of phishing, and the related anti-phishing approaches that are generally used in system software and antivirus products. There is one more technique, data mining, which not only confirms the full-scale features of the URLs as per the heuristic design but also works for very large datasets, in order to handle those classes that heuristics could not work on and to correct the verdict. Please refer to Fig. 1 to understand the logical flow. The techniques we have used are described below.

A. Dataset Description
One of the most difficult aspects of our research was the paucity of phishing datasets. Despite the fact that several scientific publications on phishing detection have been published, most of them have not provided the dataset that they used in their research. We have used two different datasets to implement our code and algorithms. The first dataset that we used in our research was sourced from the UCI repository. It contains 25 features, which can be classified into 4 types, namely,
Address Bar Features, Abnormal Based Features, HTML and JavaScript Based Features, and Domain Based Features. The second dataset used in our work was sourced from Kaggle. It contains more than 130,000 unique entries. It comprises 2 columns: the URL itself and the prediction column, which has two classes, good or bad. Good indicates that the URL has nothing phishy going on and the website is safe to visit, while bad signals that the URL contains malicious content and must be avoided. The granular details of the first dataset are as follows:
1) Address bar based features
• Use of an IP address
• Long URL to hide the suspicious part
• Using URL shortening services ("TinyURL")
• URLs having the "@" symbol
• Redirection using "//"
• Adding a prefix or suffix separated by "-" to the domain
• Subdomains and multi-subdomains
• HTTPS
• Domain registration length
• Favicon
• Using a non-standard port
• Existence of the "HTTPS" token in the domain part of the URL
2) Abnormal based features
• Request URL
• URL of Anchor
• Links in <meta>, <script> and <link> tags
• Server form handler (SFH)
• Submitting information to email
• Abnormal URL
3) HTML and JavaScript based features
• Website Forwarding
• Status bar Customization
• Disabling right click
• Using pop-up window
• IFrame Redirection
4) Domain based features
• Age of Domain
• DNS record

Fig. 1. Flowchart depicting the approach of the project.

B. Machine Learning algorithms
We have used multiple classification algorithms, which include Decision Tree, Random Forest, K-Nearest Neighbours, Gradient Boosting, etc.
• Decision Tree - Decision tree classifiers are a well-known classification approach. A decision tree is a tree structure that looks like a flowchart, with each internal node representing a feature or attribute, each branch representing a decision rule, and each leaf node representing the outcome. The root node is the topmost node in a decision tree. The tree learns to split the data according to the value of an attribute. It splits the data recursively, which is known as recursive partitioning. This gives the tree classifier more resolution, allowing it to cope with a wider range of data sets, whether numerical or categorical. However, the decision tree has a number of flaws that cause it to overfit the data. Furthermore, updating a decision tree with fresh samples is challenging.
• Linear SVC - The Linear Support Vector Classifier (SVC) approach conducts classification using a linear kernel function and works well with a large amount of data. Compared to the SVC model, the Linear SVC adds extra parameters such as the penalty normalisation ('L1' or 'L2') and the loss function. Because Linear SVC is based on the linear kernel, the kernel cannot be changed. The goal of a Linear SVC is to fit the data you provide, returning a "best fit" hyperplane that divides or categorises your data. After obtaining the hyperplane, you can feed some features to your classifier to get the "predicted" class. This makes this particular algorithm rather suitable for our uses, though it can be used in many situations.
• RBF SVC - Support Vector Classifier (SVC) with a Radial Basis Function (RBF) kernel. RBF is the default kernel used within sklearn's SVM classification algorithm. Through the gamma parameter we can control the influence of individual points on the overall algorithm. The larger gamma is, the closer other points must be to affect the model.
• NuSVC - The nu-support vector classifier (Nu-SVC) is similar to the SVC, with the only difference that the nu-SVC classifier has a nu parameter to control the number of support vectors. The nu-SVM, proposed by Scholkopf et al., has the advantage of using a parameter nu for controlling the number of support vectors. The parameter C in the ordinary SVM formulation is replaced by a parameter nu, which is bounded by 0 and 1. Earlier
the parameter C could have taken any positive value, thus this additional bound is beneficial in implementation. The parameter nu represents a lower bound on the fraction of examples that are support vectors and an upper bound on the fraction that lie on the wrong side of the hyperplane, respectively.
• One Class SVC - SVM is also increasingly being used in one-class problems, in which all of the data belongs to a single class. In this case, the algorithm is trained to understand what is "normal," so that when new data is presented, the algorithm can determine whether or not it belongs to the group. If not, the new data is categorised as unusual or anomalous. Usually, the purpose of a machine learning application is to use training data to distinguish test data among a number of classes. But what if you only have data from one class and your purpose is to evaluate fresh data to see whether it is similar or dissimilar to the training data? A method for this task, which has gained much popularity over the last two decades, is the One-Class Support Vector Machine.
• Random Forest - Random Forest, as the name indicates, is made up of a large number of distinct decision trees that work together to determine the output. In a random forest, each tree provides a class prediction, and the outcome is the class predicted most often among the trees. Random Forest achieves such good results because the trees protect each other from individual errors. Although some trees may predict the incorrect answer, many other trees will correct the final prediction, allowing the trees to move in the proper direction as a group. The fundamental disadvantage of Random Forests is their lack of repeatability due to the random nature of the forest generation process. Furthermore, because the final model and subsequent outcomes comprise numerous separate decision trees, they are difficult to interpret [21].
• K-Nearest Neighbours - K-Nearest Neighbours (KNN) is a non-parametric and lazy machine learning approach used for regression and classification problems. In KNN, no assumptions about the underlying data distribution are required. The KNN method predicts the values of new data points based on feature similarity, which means that a value will be assigned to the new data point based on how closely it resembles the points in the training set.
• Gradient Boosting - Gradient boosting is a type of machine learning boosting. It is based on the idea that when the best possible next model is combined with the previous models, the overall prediction error is minimised. The key concept is to set the target outcomes for the next model in order to minimise the error. How are the targets set? The target outcome for each case in the data is defined by how much changing the prediction for that case changes the overall prediction error.
• Fuzzy Pattern Classifier - A technique for automatically training the knowledge-representing membership functions of a Fuzzy-Pattern-Classification system, which works even when there is little data available and the universal set is not well specified. Numerous classifier systems (classifier ensembles) combine the decisions of multiple classifiers to generate a single (crisp or soft) class label [19].
• Fuzzy Pattern Tree Top Down Classifier - Fuzzy pattern tree induction was recently developed as a novel machine learning approach for classification. A pattern tree is a hierarchical, tree-like structure with inner nodes denoted by generalised (fuzzy) logical operators and leaf nodes denoted by fuzzy predicates on input attributes. A pattern-tree classifier is made up of a collection of these pattern trees, one for each class label. This sort of classifier is interesting for a number of reasons (R. Senge and E. Hüllermeier, "Top-Down Induction of Fuzzy Pattern Trees" [1]).

We implemented these methods on our first dataset. Using the textual URL present in our dataset, we generated multiple features from it. All our features are categorical and have either 1/0 or -1/0/1 as output values. A few examples of the categorical variables created are as follows:
• URL Length - It is observed that extremely lengthy URLs have a greater chance of being phishy URLs. Hence we created this as a feature to verify this using our algorithms and see its impact on the classification. A URL with a length of less than 54 gets a value of 1, a length between 54 and 75 gets a value of 0, and all others get a value of -1, since a length of more than 75 indicates that the website may be phishy.
• Use of URL Shortening Services - Since phishing websites obviously cannot have exactly the same name as a legitimate website, some people might be able to spot the difference and will not fall prey to the scam. To get around this, scammers tend to use URL shortening services (like bitly), since these services are also used by legitimate websites and appear trustworthy to people at first glance. So any website that has a shortened URL gets a value of 1 under this column, while others get 0.

We look at many more characteristics of the URL, such as the presence of a prefix or suffix, the URL having an @ symbol, the presence of double-slash (//) redirection, whether the URL has a subdomain or not, etc., and add these to our list of features as well.

C. Data mining
Data mining is the technique of predicting outcomes by identifying anomalies, patterns, and correlations in huge data sets. This information can be used to increase revenue, lower expenses, strengthen customer relationships, reduce risks, and more, by employing a variety of techniques [15]. We use these techniques on our second dataset as we aim to achieve better results. Since we created categorical values for our first dataset, we decided to take a different approach to classifying the URLs and stay close to Natural Language Processing techniques in the second one. This would give more
versatility to our results and help us determine which data pre-processing techniques are better suited to our problem statement of classifying phishy URLs. The techniques we have used for data mining are as follows:
• Tokenizer - We have used a tokenizer because the URL string contains some words that are more useful than others in detecting whether the URL is genuine or not. Examples of such words are 'virus', '.dat', '.exe', etc. RegexpTokenizer was used to implement this. This helps our algorithms focus more on such recurring words.
• Snowball Stemmer - This helps us identify the root words of the different words obtained from the URLs. This is useful because not all phishing URLs contain identical words, but there is a good chance that they have some words originating from a common root word. Using the Snowball stemmer, we aim to identify such words and use them in our classification algorithms.
• The final preprocessing step is to remove punctuation marks like commas from our text column so that it can be passed as input to the algorithms.
We used our second dataset for this approach, splitting it in the standard 75-25 train-test ratio. This gave us a textual dataset on which we used algorithms like logistic regression and multinomial naive Bayes.

D. Fuzzy logic approach
The aim of our research was to make use of a classifier that had not been used extensively in previous experiments. We used the FuzzyPatternTreeTopDownClassifier along with the standard FuzzyPatternClassifier. The first algorithm builds fuzzy pattern trees using the top-down method. A fuzzy pattern tree is a tree-like structure that is hierarchical in nature. Its leaf nodes hold fuzzy predicates on input attributes, and its inner nodes are marked with generalized fuzzy-logical operators. The standard FuzzyPatternClassifier uses the traditional bottom-up approach to build fuzzy pattern trees. In simple terms, it is a collection of fuzzy pattern trees where each tree is associated with a class, and to classify a new instance a prediction is made in favor of the class whose tree produces the highest score. We used the fylearn Python library to apply the algorithms to our dataset. The first dataset, with the categorical variables, was used with these algorithms, and the results obtained from all three approaches were compared to establish the best method to classify phishing websites [43].

V. RESULT
We evaluated three approaches using two different datasets. The first approach uses different machine learning classifiers on dataset 1. The second approach uses data mining techniques, which include only logistic regression and multinomial naive Bayes, on dataset 2. The third and last approach uses the fuzzy pattern classifiers on dataset 1. The tables below show the corresponding accuracies.

TABLE I
MACHINE LEARNING CLASSIFIERS
CLASSIFIERS             ACCURACY (%)
DECISION TREE           95.587832
RANDOM FOREST           96.875856
LINEAR SVC              92.765141
RBF SVC                 94.738284
NuSVC                   92.244451
One class SVM           48.835297
K NEAREST NEIGHBOURS    95.039737
GRADIENT BOOSTING       95.533023

TABLE II
FUZZY PATTERN CLASSIFIERS
CLASSIFIERS                          ACCURACY (%)
FuzzyPatternClassifier               45.21786791
FuzzyPatternTreeTopDownClassifier    89.03809263

TABLE III
DATA MINING TECHNIQUES
CLASSIFIERS                ACCURACY (%)
Logistic Regression        96
Multinomial Naïve Bayes    96

VI. CONCLUSION AND FUTURE SCOPE
This research study compares various machine learning and data mining classifiers, along with fuzzy techniques and classifiers, for high-accuracy phishing URL detection. After performing the experiments, it was concluded that Random Forest outperformed the others, achieving the highest accuracy of 96.87 per cent. As a publicly available dataset was used in this work, we plan to employ our model along with NLP techniques on a real-time phishing URL dataset.

REFERENCES
[1] R. Senge and E. Hüllermeier, "Top-Down Induction of Fuzzy Pattern Trees," in IEEE Transactions on Fuzzy Systems, vol. 19, no. 2, pp. 241-252, April 2011, doi: 10.1109/TFUZZ.2010.2093532.
[2] E. Hüllermeier, "Fuzzy sets in machine learning and data mining: Status and prospects," Fuzzy Sets Syst., vol. 156, no. 3, pp. 387–406, 2005.
[3] Phishtank, "Out of the Net, into the Tank." [Online].
[4] Statistics on Phishing, https://www.statista.com/statistics/266161/websites-most-affected-by-phishing/
[5] Statistics and Information about Phishing Websites, https://www.tessian.com/blog/phishing-statistics-2020/, By Maddie Rosenthal, 12 January 2022
[6] Singh, Priyanka, Yogendra PS Maravi, and Sanjeev Sharma. "Phishing websites detection through supervised learning networks." Computing and Communications Technologies (ICCCT), 2015 International Conference on. IEEE, 2015, pp. 61-65
[7] J. Crowe, 'Phishing by the Numbers: Must-Know Phishing Statistics 2016', 2016.
[8] Hawanna, Varsharani Ramdas, V. Y. Kulkarni, and R. A. Rane. "A novel algorithm to detect phishing URLs." Automatic Control and Dynamic Optimization Techniques (ICACDOT), International Conference on. IEEE, 2016, pp. 548-552.
[9] Dutta AK (2021) Detecting phishing websites using machine learning technique. PLoS ONE 16(10): e0258361. https://doi.org/10.1371/journal.pone.0258361
[10] Wu CY, Kuo CC, Yang CS, "A phishing detection system based on machine learning" In: 2019 International Conference on Intelligent Computing and its Emerging Applications (ICEA), pp 28–32, 2019.
[11] J. Rashid, T. Mahmood, M. W. Nisar and T. Nazir, "Phishing Detection Using Machine Learning Technique," 2020 First International Conference of Smart Systems and Emerging Technologies (SMARTTECH), 2020, pp. 43-46, doi: 10.1109/SMART-TECH49988.2020.00026.
[12] Ozgur Koray Sahingoz, Ebubekir Buber, Onder Demir, Banu Diri, Machine learning based phishing detection from URLs, Expert Systems with Applications, Volume 117, 2019, Pages 345-357, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2018.09.029.
[13] Ping Yi, Yuxiang Guan, Futai Zou, Yao Yao, Wei Wang, Ting Zhu, "Web Phishing Detection Using a Deep Learning Framework", Wireless Communications and Mobile Computing, vol. 2018, Article ID 4678746, 9 pages, 2018. https://doi.org/10.1155/2018/4678746
[14] Maher Aburrous, M.A. Hossain, Keshav Dahal, Fadi Thabtah, Intelligent phishing detection system for e-banking using fuzzy data mining, Expert Systems with Applications, Volume 37, Issue 12, 2010, Pages 7913-7921, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2010.04.044.
[15] Neda Abdelhamid, Aladdin Ayesh, Fadi Thabtah, Phishing detection based Associative Classification data mining, Expert Systems with Applications, Volume 41, Issue 13, 2014, Pages 5948-5959, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2014.03.019.
[16] Gandotra, E., Gupta, D. (2021). An Efficient Approach for Phishing Detection using Machine Learning. In: Giri, K.J., Parah, S.A., Bashir, R., Muhammad, K. (eds) Multimedia Security. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-8711-5-12
[17] Ensieh Modiri Dovom, Amin Azmoodeh, Ali Dehghantanha, David Ellis Newton, Reza M. Parizi, Hadis Karimipour, Fuzzy pattern tree for edge malware detection and categorization in IoT, Journal of Systems Architecture, Volume 97, 2019, Pages 1-7, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2019.01.017.
[18] Y.-S. Chen, Y.-H. Yu, H.-S. Liu and P.-C. Wang, "Detect phishing by checking content consistency," Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), 2014, pp. 109-119, doi: 10.1109/IRI.2014.7051880.
[19] H. Chapla, R. Kotak and M. Joiser, "A Machine Learning Approach for URL Based Web Phishing Using Fuzzy Logic as Classifier," 2019 International Conference on Communication and Electronics Systems (ICCES), 2019, pp. 383-388, doi: 10.1109/ICCES45898.2019.9002145.
[20] M. Aydin and N. Baykal, "Feature extraction and classification phishing websites based on URL," 2015 IEEE Conference on Communications and Network Security (CNS), 2015, pp. 769-770, doi: 10.1109/CNS.2015.7346927.
[21] J. Stobbs, B. Issac and S. M. Jacob, "Phishing Web Page Detection Using Optimised Machine Learning," 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020, pp. 483-490, doi: 10.1109/TrustCom50675.2020.00072.
[22] M. Zabihimayvan and D. Doran, "Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection," 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2019, pp. 1-6, doi: 10.1109/FUZZ-IEEE.2019.8858884.
[23] Ozgur Koray Sahingoz, Ebubekir Buber, Onder Demir, Banu Diri, Machine learning based phishing detection from URLs, Expert Systems with Applications, Volume 117, 2019, Pages 345-357, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2018.09.029.
[24] Babagoli, M., Aghababa, M.P. Solouk, V. Heuristic nonlinear regression
[28] Hamidzadeh, J., Zabihimayvan, M. Sadeghi, R. Detection of Web site visitors based on fuzzy rough sets. Soft Comput 22, 2175–2188 (2018). https://doi.org/10.1007/s00500-016-2476-4
[29] N. Sanglerdsinlapachai and A. Rungsawang, "Using Domain Top-page Similarity Feature in Machine Learning-Based Web Phishing Detection," 2010 Third International Conference on Knowledge Discovery and Data Mining, 2010, pp. 187-190, doi: 10.1109/WKDD.2010.108.
[30] R. B. Basnet and T. Doleck, "Towards Developing a Tool to Detect Phishing URLs: A Machine Learning Approach," 2015 IEEE International Conference on Computational Intelligence Communication Technology, 2015, pp. 220-223, doi: 10.1109/CICT.2015.63.
[31] D. Vaishnavi, S. Suwetha, Y. B. Jinila, R. Subhashini and S. P. Shyry, "A Comparative Analysis of Machine Learning Algorithms on Malicious URL Prediction," 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), 2021, pp. 1398-1402, doi: 10.1109/ICICCS51141.2021.9432138.
[32] H. Ishibuchi, T. Nakashima and T. Murata, "Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems," in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 5, pp. 601-618, Oct. 1999, doi: 10.1109/3477.790443.
[33] Hisao Ishibuchi, Yusuke Nojima, Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning, International Journal of Approximate Reasoning, Volume 44, Issue 1, 2007, Pages 4-31, ISSN 0888-613X, https://doi.org/10.1016/j.ijar.2006.01.004.
[34] Julián Luengo, Francisco Herrera, Domains of competence of fuzzy rule based classification systems with data complexity measures: A case of study using a fuzzy hybrid genetic based machine learning method, Fuzzy Sets and Systems, Volume 161, Issue 1, 2010, Pages 3-19, ISSN 0165-0114, https://doi.org/10.1016/j.fss.2009.04.001.
[35] S. Ahn et al., "A Fuzzy Logic Based Machine Learning Tool for Supporting Big Data Business Analytics in Complex Artificial Intelligence Environments," 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2019, pp. 1-6, doi: 10.1109/FUZZ-IEEE.2019.8858791.
[36] M. Yeganejou and S. Dick, "Classification via Deep Fuzzy c-Means Clustering," 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1-6, doi: 10.1109/FUZZ-IEEE.2018.8491461.
[37] J. Rhuggenaath, Y. Zhang, A. Akcay, U. Kaymak and S. Verwer, "Learning fuzzy decision trees using integer programming," 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1-8, doi: 10.1109/FUZZ-IEEE.2018.8491636.
[38] R. Altilio, A. Rosato and M. Panella, "A Sparse Bayesian Model for Random Weight Fuzzy Neural Networks," 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1-7, doi: 10.1109/FUZZ-IEEE.2018.8491645.
[39] P. Melin, E. Ramirez and G. Prado-Arechiga, "A new variant of Fuzzy K-Nearest Neighbor using Interval Type-2 Fuzzy Logic," 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1-7, doi: 10.1109/FUZZ-IEEE.2018.8491472.
[40] Kuncheva, Ludmila. (2008). Fuzzy classifiers. Scholarpedia. 3. 2925. 10.4249/scholarpedia.2925.
[41] AlexaRank, https://www.alexa.com/siteinfo.
[42] IC3 Annual Report 2018 https://pdf.ic3.gov/2018IC3Report.pdf
[43] Ensieh Modiri Dovom, Amin Azmoodeh, Ali Dehghantanha, David Ellis Newton, Reza M. Parizi, Hadis Karimipour, Fuzzy pattern tree for edge malware detection and categorization in IoT, Journal of Systems Architecture, Volume 97, 2019, Pages 1-7, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2019.01.017.
strategy for detecting phishing websites. Soft Comput 23, 4315–4327
(2019). https://doi.org/10.1007/s00500-018-3084-2
[25] DIDIER DUBOIS HENRI PRADE (1990) ROUGH FUZZY SETS
AND FUZZY ROUGH SETS*, International Journal of General Sys-
tems, 17:2-3, 191-209, DOI: 10.1080/03081079008935107
[26] Dan Meng, Xiaohong Zhang, Keyun Qin, Soft rough fuzzy sets
and soft fuzzy rough sets, Computers Mathematics with Applica-
tions, Volume 62, Issue 12, 2011, Pages 4635-4645, ISSN 0898-1221,
https://doi.org/10.1016/j.camwa.2011.10.049.
[27] Anna Maria Radzikowska, Etienne E. Kerre, A comparative study
of fuzzy rough sets, Fuzzy Sets and Systems, Volume 126, Issue 2,
2002, Pages 137-155, ISSN 0165-0114, https://doi.org/10.1016/S0165-
0114(01)00032-X.
TOP SOURCES
The sources with the highest number of matches within the submission. Overlapping sources will not be displayed.
1. abacademies.org (Internet), 3%
2. K. J. Somaiya College of Engineering Vidyavihar, Mumbai on 2020-03-01 (Submitted works), 2%
3. arxiv.org (Internet), 2%
4. en.cs.uni-paderborn.de (Internet), 2%
5. Rochester Institute of Technology on 2021-05-30 (Submitted works), 2%
6. Coventry University on 2021-06-14 (Submitted works), 1%
7. University of Florida on 2016-04-26 (Submitted works), 1%
8. HCUC on 2022-04-01 (Submitted works), 1%
9. crownstone.rocks (Internet), <1%
10. Liverpool John Moores University on 2021-11-09 (Submitted works), <1%
11. towardsdatascience.com (Internet), <1%
12. D. Vaishnavi, S. Suwetha, Y.Bevish Jinila, R. Subhashini, S.Prayla Shyry... (Crossref), <1%
13. Senge, R, and E Hüllermeier. "Top-Down Induction of Fuzzy Pattern Tr... (Crossref), <1%
14. University of Greenwich on 2021-12-17 (Submitted works), <1%
15. researchgate.net (Internet), <1%
16. link.springer.com (Internet), <1%
17. th-owl.de (Internet), <1%
18. Ba Lam To, Luong Anh Tuan Nguyen, Huu Khuong Nguyen, Minh Hoa... (Crossref), <1%
19. avesis.yildiz.edu.tr (Internet), <1%
20. ijict.itrc.ac.ir (Internet), <1%
21. Coventry University on 2021-08-06 (Submitted works), <1%
22. coek.info (Internet), <1%
23. Kolati Sri Rama Chandra Murthy, Tanay Bhattacharya, Narendran Rajag... (Crossref), <1%
24. Mehek Thaker, Mihir Parikh, Preetika Shetty, Vinit Neogi, Shree Jaswal.... (Crossref), <1%
25. yuhuaqian.net (Internet), <1%
26. pure.aber.ac.uk (Internet), <1%
27. tojqi.net (Internet), <1%
28. Liverpool John Moores University on 2022-03-05 (Submitted works), <1%
29. National College of Ireland on 2019-04-07 (Submitted works), <1%
