
SCHOOL OF COMPUTING

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

1156CS701- MAJOR PROJECT (IN-House/Internship/Abroad)

WINTER SEMESTER 19-20

SEMESTER END PROJECT VIVA VOCE EXAMINATIONS


“PHISHING WEB SITES CLASSIFICATION
BASED ON EXTREME LEARNING MACHINE”

SUPERVISED BY

Dr. S. Sankara Narayanan, M.E., Ph.D.,
ASSISTANT PROFESSOR

PRESENTED BY

1. M. BALAJI (7182) (16UECS0238)
2. R. DIVYASAI (7011) (16UECN0107)
3. G. JAGADEESH SAICHAND (7005) (16UECN0111)

BATCH NO:5 PRESENTED DATE:03-07-2020


AGENDA

• ABSTRACT
• OBJECTIVE
• INTRODUCTION
• LITERATURE REVIEW
• DESIGN AND METHODOLOGIES
• STANDARDS & POLICIES USED
• IMPLEMENTATION
• TESTING
• RESULTS
• CONCLUSION
• FUTURE ENHANCEMENTS
• REFERENCES



ABSTRACT
• Phishing is one of the most common and most dangerous attacks among
cybercrimes.
• Phishing means imitating trusted websites in order to obtain the proprietary
information entered into websites every day for various purposes, such as
usernames, passwords, and citizenship numbers. The aim of these attacks is to
steal the information used by individuals and organizations to conduct
transactions.
• It also extracts as much personal information as possible. Phishing websites
contain various hints in their contents and in web-browser-based information.
The aim of this study is to apply Extreme Learning Machine (ELM)
classification to data from the UC Irvine Machine Learning Repository and to
analyze the results.
• Extreme Learning Machine (ELM) is a feed-forward artificial neural network
(ANN) model with a single hidden layer. ELM does not use image visualization
or text visualization to identify phishing websites, because that is a very long
process.
• ELM concentrates on the URL and determines whether or not it belongs to a
phishing website. ELM has been compared with Naive Bayes (NB) and found to
have the highest accuracy.
OBJECTIVES
• Phishing is among the most common and most dangerous attacks among
cybercrimes. The purpose of these attacks is to steal the information used by
individuals and organizations to conduct transactions.
• The objectives of the study are to classify or detect phishing websites, to reduce
cyberbullying carried out through phishing websites, and to protect information
and private data. Extreme Learning Machine (ELM) is used to classify URL data
from the UC Irvine Machine Learning Repository database.
• The current project is aimed at classifying phishing websites based on their
features. For this we took the phishing dataset collected from the UCI Machine
Learning Repository and built our model with three different classifiers: SVM,
Naive Bayes, and ELM. We obtained good accuracy scores.
• There is scope to improve it further: with more data, the project would be
more effective and we would get better results.



INTRODUCTION
• Internet usage has become a vital part of our everyday activities as a
consequence of fast-growing technology. With the rapid growth of
technology and the intensive use of digital systems, the data security of
those systems has gained great importance.
• The main aim of maintaining security in information technologies is to
ensure that the necessary precautions are taken against the threats and
dangers that users are likely to face while using these technologies.
• Phishing is defined as imitating trusted websites in order to obtain the
proprietary information entered into websites every day for various
purposes, such as usernames, passwords, and citizenship numbers.
Phishing websites contain various hints in their contents and in
web-browser-based information.
• The contents of the website or the email include requests intended to lure
people into entering or updating their personal information or changing
their passwords, as well as links to websites that look like exact copies
of the websites of the organizations concerned.
LITERATURE REVIEW
[1] Srushti Patil (2019). A Methodical Overview on Phishing Detection along with an Organized Way
to Construct an Anti-Phishing Framework. 978-1-5386-9533-3/19, 2019.

In this paper, the author compares models across five kinds of approaches based on the
number of features used, the dataset size, and the accuracy.

[2] Abdulghani Ali Ahmed, Nurul Amirah Abdullah (2016). Real Time Detection of Phishing
Websites. 978-1-5090-0996-1/16, 2016.

In this paper, the authors assess only the validity of Uniform Resource Locators (URLs),
based on certain characteristics, in order to detect phishing attacks.

[3] Shraddha Parekh, Dhwanil Parikh (2018). A New Method for Detection of Phishing
Websites: URL Detection. 978-1-5386-1974-2/18, 2018 IEEE, 952.

In this paper, the authors state that future work will aim to develop a system that can learn
by itself about new types of phishing attacks by adding a more enhanced feature to the
detection process.



STANDARDS & POLICIES USED

• Python Notebook
Software Requirements
• OS: Windows or Linux
• Python IDE: Python 2.7.x and above
• Jupyter IDE
• Setuptools and pip to be installed for Python 3.6 and above
• Language: Python scripting
Hardware Requirements
• Processor: Intel i3 and above, 2.0 GHz
• RAM: 4 GB and higher
• Hard Disk: 500 GB minimum



DESIGN AND METHODOLOGIES

• ELM is a feed-forward ANN with just one hidden layer. For an ANN to learn well,
parameters such as the threshold value, weights, and activation function must have
values suited to the data system being modeled.
• In gradient-based learning approaches, all of these parameters are changed iteratively
to reach appropriate values. Hence, they can be slow and produce low-performing
results because of the likelihood of getting stuck in local minima.
• In the ELM learning process, unlike an ANN that updates its parameters by a
gradient-based method, input weights are selected randomly while output weights are
calculated analytically.
• An analytical learning process substantially reduces both the training time and the
chance of getting stuck in local minima, which increases the performance ratio.
• To activate the cells in the hidden layer of ELM, a linear function as well as
non-linear (sigmoid, sine, Gaussian), non-differentiable, or discrete activation
functions may be used.
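
The ELM procedure described on this slide (random input weights, analytically computed output weights) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's implementation: the class name `ELMClassifier`, the `ACTIVATIONS` table, the hidden-layer size, the synthetic data, and the use of the Moore-Penrose pseudoinverse to solve for the output weights are all choices made for this sketch.

```python
import numpy as np

# Activation options named on the slide; the exact forms used here are assumptions.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sine": np.sin,
    "gaussian": lambda z: np.exp(-(z ** 2)),
}


class ELMClassifier:
    """Minimal single-hidden-layer ELM for binary labels in {-1, +1}."""

    def __init__(self, n_hidden=40, activation=ACTIVATIONS["sigmoid"], seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H for the inputs X.
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights are the least-squares solution, computed in one step.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0.0, 1, -1)


# Synthetic stand-in data (NOT the UCI phishing dataset): the label depends
# on the sign of the sum of the first two features.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)

model = ELMClassifier(n_hidden=40).fit(X, y)
train_accuracy = float(np.mean(model.predict(X) == y))
print(f"training accuracy: {train_accuracy:.3f}")
```

Because no iterative weight updates are involved, training reduces to one matrix pseudoinverse; swapping `activation` for another entry of `ACTIVATIONS` changes only the hidden-layer mapping.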

IMPLEMENTATION
• ARCHITECTURE DIAGRAM



• DATA FLOW DIAGRAM

• SEQUENCE DIAGRAM



• COLLABORATION DIAGRAM

TESTING

UNIT TESTING
• This testing method treats a module as a single unit and tests it at its
interfaces and in its communication with other modules, rather than going into
details at the statement level.
• Here the module is treated as a black box that takes some input and generates
output. Outputs are checked for a given set of input combinations.
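
A black-box unit test of this kind can be sketched with Python's standard `unittest` module. The function `url_is_too_long` and its 75-character threshold are hypothetical stand-ins for a module of the project; the test only checks outputs for chosen inputs and never looks inside the function.

```python
import unittest


def url_is_too_long(url, max_length=75):
    """Hypothetical module under test: flags URLs longer than max_length."""
    return len(url) > max_length


class UrlIsTooLongTest(unittest.TestCase):
    """Treats url_is_too_long as a black box: inputs in, outputs checked."""

    def test_short_url_is_not_flagged(self):
        self.assertFalse(url_is_too_long("http://example.com"))

    def test_long_url_is_flagged(self):
        self.assertTrue(url_is_too_long("http://example.com/" + "a" * 100))

    def test_boundary_length_is_not_flagged(self):
        # Exactly max_length characters is still acceptable.
        self.assertFalse(url_is_too_long("a" * 75, max_length=75))


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UrlIsTooLongTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```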



• INTEGRATION TESTING
• Testing is an important quality control measure employed during software
development. Its basic role is to find errors.
• Sub-functions, when combined, may not produce the desired result. Global data
structures can cause problems. Integration testing is a systematic technique for
building the program structure while conducting tests.
• The goal is to find errors associated with interfacing: unit-tested modules are taken
and a program structure is built as dictated by the design. In non-incremental
integration, all of the modules are combined in advance and the program is tested
as a whole.
• Here errors can appear in a seemingly endless loop. In incremental testing, the
program is constructed and tested in small segments, where the errors are
isolated and corrected.



• FUNCTIONAL TESTING

• Here all of the pre-tested individual modules are assembled to form the larger
system, and tests are carried out at system level to ensure that all modules work
in sync with one another.
• This testing methodology helps ensure that modules that work flawlessly when
checked individually also run in cohesion with the other modules.
• For this testing we create test cases that check every module once and then generate
combinations of test paths throughout the system to ensure that no path leads
to failure.



• WHITE BOX TESTING

White box testing involves the testing of software code for


• Security holes
• Flow of inputs
• Expected output
• The functionality of loops

• BLACK BOX TESTING

• This method is used when the specified function that a product has been
designed to perform is known.
• The concept of a black box is used to represent a system whose inner
workings are not available for inspection.
• In a black box the test item is "black": since its logic is unknown, all that is
known is what goes in and what comes out, that is, the input and the output.

Black box testing attempts to find errors in the following categories:

• Incorrect or missing functions
• Interface errors
• Errors in data structures
• Performance errors
• Initialization and termination errors

RESULTS

CONCLUSION
We defined the features of phishing attacks and proposed a classification model to
classify those attacks.
The procedure consists of a feature extraction stage and a classification stage. For
feature extraction, we clearly defined feature extraction rules, and these rules are
used to obtain the features.
To classify these features, SVM, NB, and ELM were used. For ELM, 6 different
activation functions were used, and ELM achieved the highest accuracy score.
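
As an illustration of rule-based feature extraction from a URL, the sketch below maps a URL to a small feature vector. The four rules, their names, and the +1/-1 encoding are assumptions chosen for this example; they are not the rule set used in the project.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url):
    """Apply a few illustrative rules; +1 means the rule fired, -1 means it did not."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Rule: the host is a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    rules = {
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,          # "@" can hide the real destination
        "hyphen_in_host": "-" in host,        # hyphenated look-alike domains
        "not_https": parsed.scheme != "https",
    }
    return {name: (1 if fired else -1) for name, fired in rules.items()}


for example_url in (
    "https://example.com/",
    "http://192.168.0.1/login",
    "http://user@secure-bank.example.com/",
):
    print(example_url, extract_url_features(example_url))
```

The values of the returned dictionary, taken in a fixed order, form one input row for a classifier such as SVM, NB, or ELM.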



FUTURE ENHANCEMENTS
• The present project is aimed at classifying phishing websites based on their
features. For that we took the phishing dataset collected from the UCI
Machine Learning Repository and built our model with three different
classifiers: SVC, Naive Bayes, and ELM. We got good accuracy scores.
• There is scope to enhance it further.
• If we can have more data, our project will be much more effective and we can
get very good results.
• For this we need API integrations to get the data of different websites.



REFERENCES
[1] Srushti Patil (2019). A Methodical Overview on Phishing Detection along with an
Organized Way to Construct an Anti-Phishing Framework. 978-1-5386-9533-3/19,
2019.

[2] Abdulghani Ali Ahmed, Nurul Amirah Abdullah (2016). Real Time Detection of
Phishing Websites. 978-1-5090-0996-1/16, 2016.

[3] Shraddha Parekh, Dhwanil Parikh (2018). A New Method for Detection of Phishing
Websites: URL Detection. 978-1-5386-1974-2/18, 2018 IEEE, 952.

[4] Zuochao Dou (2017). Systematization of Knowledge (SoK): A Systematic Review of
Software Based Web Phishing Detection. 10.11.17 09, IEEE.



Thank you
