
TIRUMALA ENGINEERING COLLEGE

DETECTION OF CYBER ATTACKS IN NETWORK


BY USING MACHINE LEARNING

Submitted for the partial fulfillment
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

Presented by
R. Jaya Surya (19NE1A1247)
K. Kalyani (19NE1A1225)
G. Lavanya (19NE1A1232)
M. Ramya (19NE1A1237)

Guided by
D. PAVAN KUMAR, B.TECH, M.TECH
Asst. Prof.

DEPARTMENT OF INFORMATION TECHNOLOGY
November - 2022
Introduction
Cyber-attacks are increasing across the cyber world, and advanced security measures
are needed to reduce or prevent them. Attack types include DDoS, man-in-the-middle,
data leakage, probe, User-to-Root (U2R) and Remote-to-Local (R2L). Hackers or
intruders use these attacks to gain unauthorized access to private networks, websites,
information, or even personal computers. Cyber security refers to the technologies,
processes, and practices designed to protect networks, devices, programs, and
information from attacks, damage, or unauthorized access. These measures must be able
to extract useful information from varied audit datasets, which is then applied to the
problem of intrusion detection. With the help of machine learning, we can apply these
ideas in cyber security to strengthen the protection offered by an intrusion detection
system.
Objectives
The main objective of this project is to detect malware or botnet traffic in a NetFlow
dataset using different Machine Learning approaches. More specifically, our proposed
approach seeks to:
• Detect malware or botnet traffic in NetFlow data. The system should take a NetFlow
dataset of any size, clean or containing malware, and classify its traffic as either
normal or attack traffic.
• Compare a variety of Machine Learning methods and recommend the most suitable one
for specific use cases.
EXISTING SYSTEM
Naive Bayes and Principal Component Analysis (PCA) were used with the KDD99 dataset
by Almansob and Lomte. Similarly, PCA and SVM were used with KDD99 by Chithik and
Rabbani for IDS. In the paper by Aljawarneh et al., the assessment and experiments
for their IDS model were carried out on the NSL-KDD dataset [11]. Literature reviews
show that the KDD99 dataset is consistently used for IDS [6]-[10]. KDD99 has 41
features and was created in 1999. Consequently, KDD99 is old and provides no
information about modern attack types such as zero-day exploits. We therefore used
the up-to-date CICIDS2017 dataset [12] in our study.
EXISTING SYSTEM DRAWBACKS

• Strict Regulations

• Difficult to work with for non-technical users

• Restrictive to resources

• Constantly needs Patching

• Constantly being attacked


PROPOSED SYSTEM

The important steps of the algorithm are given below.

1) Normalize every dataset.

2) Split each dataset into training and testing sets.

3) Build IDS models using the RF, ANN, CNN and SVM algorithms.

4) Evaluate the performance of every model.
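
Steps 1, 2 and 4 can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: min-max scaling is assumed as the normalization method, accuracy is assumed as the evaluation metric, and the function names, the 25% test ratio and the toy flow records are all hypothetical. Step 3 (model training) is left out here.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to the [0, 1] range (step 1)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle the samples and split them into training and testing sets (step 2)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (one possible metric for step 4)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy flows: [duration in seconds, bytes transferred]; 1 = attack, 0 = normal.
flows = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [5.0, 1000.0]]
labels = [0, 0, 1, 1]

scaled = min_max_normalize(flows)
X_train, X_test, y_train, y_test = train_test_split(scaled, labels)
```

The sketch follows the order of the steps above (normalize, then split). A common alternative is to compute the minimum and maximum on the training set only, so that no information from the test set influences the scaling.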
PROPOSED SYSTEM ADVANTAGES

• Protection from malicious attacks on your network.


• Deletion and/or quarantining of malicious elements within a preexisting
network.
• Prevents unauthorized users from accessing the network.
System Requirements
HARDWARE REQUIREMENTS
Workstation : Lenovo
Processor : Intel Core i5
RAM : 8 GB
GPU : 16 GB (NVIDIA GeForce GTX)
Hard Disk Drive : 1 TB


SOFTWARE REQUIREMENTS
Operating System : Windows 10
Machine Learning Framework : TensorFlow 1.1
GUI Framework : Flask
Supporting Libraries : Pandas, NumPy
Programming Language : Python 3.7
SYSTEM ARCHITECTURE
MACHINE LEARNING ALGORITHMS USED
✔ RANDOM FORESTS
✔ SUPPORT VECTOR MACHINES
✔ NEURAL NETWORKS
Random Forest Classifier

• Random Forest is a popular machine learning algorithm that belongs to
the family of supervised learning techniques. It can be used for both
Classification and Regression problems in ML. It is based on the concept
of ensemble learning, which is the process of combining multiple
classifiers to solve a complex problem and improve the performance of
the model.
• Instead of depending on a single decision tree, we can improve the
prediction accuracy of our model with a random forest, which combines
a number of decision trees.
The algorithm works in the following steps:
• Select random samples from a given dataset.
• Construct a decision tree for each sample and get a prediction result from each
decision tree.
• Perform a vote for each predicted result.
• Select the prediction result with the most votes as the final prediction.
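
The four steps above can be sketched in plain Python. This is a simplified illustration rather than the project's implementation: each "tree" is only a one-level decision stump on one randomly chosen feature, and the function names, the number of trees and the toy data are hypothetical. A library implementation such as scikit-learn's RandomForestClassifier grows much deeper trees.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-level decision tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # Fall back to the left label when no sample lies right of the threshold.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: select a random sample (drawn with replacement) from the dataset.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Step 2: construct one tree for this sample.
        forest.append(train_stump(sample, sample_labels, rng))
    return forest


def predict_forest(forest, row):
    # Steps 3 and 4: every tree votes and the label with the most votes wins.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized flow features; 0 = normal, 1 = attack.
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels)
```

Calling predict_forest(forest, row) on a new flow returns the majority label. Because each tree sees a different random sample, an unlucky sample affects only that one tree's vote.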
SUPPORT VECTOR MACHINES
The Support Vector Machine method is an algorithm that uses kernels to transform the data
space and then tries to find a separating line (a hyperplane) that splits the data into two classes.
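
The idea of transforming the data space can be illustrated with a small sketch. It does not train an SVM: the feature map, the hand-picked weights and the toy points are all assumptions chosen for illustration. A real SVM learns the maximum-margin hyperplane from the data and applies the kernel implicitly rather than computing the mapped features.

```python
def feature_map(point):
    """Map a 2-D point (x1, x2) to 3-D by adding the squared radius x1^2 + x2^2."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)


def linear_classifier(mapped, weights=(0.0, 0.0, 1.0), bias=-1.0):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    score = sum(w * v for w, v in zip(weights, mapped)) + bias
    return 1 if score > 0 else -1


# Hypothetical data: class -1 lies inside the unit circle, class +1 outside it.
# No straight line separates the two classes in the original 2-D space.
inner = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.4)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.0, -2.0)]

predictions = [linear_classifier(feature_map(p)) for p in inner + outer]
print(predictions)  # -> [-1, -1, -1, 1, 1, 1]
```

After the mapping, the flat plane "squared radius = 1" separates the two classes, even though they could not be split by a line in the original space.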

NEURAL NETWORKS
Recently, neural networks have gained popularity since they perform very well when a lot of
data is available. We test here a simple dense (or fully connected) neural network with 2 hidden
layers: the first one has 256 neurons and the second one has 128. The network uses batch
normalization, no dropout, and a ReLU activation function (except for the output layer, where
a sigmoid function is used). The model has 39,681 trainable parameters and 768 non-trainable
parameters.
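
The parameter counts can be checked with a short calculation. The sketch below assumes that every dense layer has a bias, that one batch-normalization layer follows each hidden layer, and that the output layer has a single sigmoid unit. The number of input features is not stated in the text; under these assumptions, 22 input features reproduces both reported totals.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def batch_norm_params(n_units):
    """Return (trainable, non_trainable) counts for a batch-normalization layer."""
    # gamma and beta are trained; the moving mean and variance are not.
    return 2 * n_units, 2 * n_units


def count_params(n_features, hidden=(256, 128), n_outputs=1):
    trainable, non_trainable = 0, 0
    previous = n_features
    for units in hidden:
        trainable += dense_params(previous, units)
        bn_trainable, bn_fixed = batch_norm_params(units)
        trainable += bn_trainable
        non_trainable += bn_fixed
        previous = units
    trainable += dense_params(previous, n_outputs)  # sigmoid output layer
    return trainable, non_trainable


# ASSUMPTION: 22 input features; this value is not given in the text.
print(count_params(22))  # -> (39681, 768)
```

The 768 non-trainable parameters are the moving means and variances of the two batch-normalization layers: 2 × 256 + 2 × 128.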
IMPLEMENTATION
Python Libraries used:
• NumPy
• Seaborn
• scikit-learn (sklearn)
• Matplotlib
• Pandas

All of the above libraries should be installed in the Python environment before we start working
with the model.
We use the import statement to import these libraries into our code.
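
One way to confirm that the libraries are installed before importing them is sketched below. It uses only the standard library; the helper name missing_libraries is hypothetical, and the list uses the import names of the packages (for example, scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names of the libraries listed above.
REQUIRED = ["numpy", "seaborn", "sklearn", "matplotlib", "pandas"]


def missing_libraries(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```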
LOADING DATA SETS
RANDOM FORESTS
NEURAL NETWORK
SUPPORT VECTOR MACHINE
CONCLUSION

In this work, support vector machine, ANN, CNN, Random Forest and deep learning algorithms
were compared on the modern CICIDS2017 dataset. Results show that the deep learning
algorithm achieved significantly better results than SVM, ANN, RF and CNN. In the future, we
plan to use port scan attempts as well as other attack types with machine learning and deep
learning algorithms, together with Apache Hadoop and Spark technologies, based on this
dataset. All of these algorithms help us detect cyber attacks in the network. Many attacks have
occurred over past years; when such attacks are recognized, the feature values observed during
them are stored in datasets. Using these datasets, we predict whether a cyber attack has taken
place. These predictions can be made with four algorithms (SVM, ANN, RF and CNN), and this
paper helps to identify which algorithm gives the best accuracy in determining whether a cyber
attack has occurred.
FUTURE ENHANCEMENT

In future, the model can be optimized to handle imbalanced datasets
from various sources and domains. Also, the model can be modified
to run on the Hadoop MapReduce platform.
References
➢ K. Graves, CEH: Official Certified Ethical Hacker Review Guide: Exam 312-50.
John Wiley & Sons, 2007.
➢ R. Christopher, “Port scanning techniques and the defense against them,”
SANS Institute, 2001.
➢ S. Staniford, J. A. Hoagland, and J. M. McAlerney, “Practical
automated detection of stealthy portscans,” Journal of Computer Security,
vol. 10, no. 1-2, pp. 105–136, 2002.
➢ S. Robertson, E. V. Siegel, M. Miller, and S. J. Stolfo, “Surveillance
detection in high bandwidth environments,” in DARPA Information
Survivability Conference and Exposition, 2003. Proceedings, vol. 1. IEEE,
2003, pp. 130–138.
➢ K. Ibrahimi and M. Ouaddane, “Management of intrusion detection
systems based-kdd99
➢ N. Moustafa and J. Slay, “The significant features of the unsw-nb15 and the kdd99 data set for
network intrusion detection systems,” in Building Analysis Datasets and Gathering Experience
Returns for Security (BADGERS), 2015 4th International Workshop on. IEEE, 2015, pp. 25–31.
➢ L. Sun, T. Anthony, H. Z. Xia, J. Chen, X. Huang, and Y. Zhang, “Detection and classification of
malicious patterns in network traffic using benford’s law,” in Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference (APSIPA ASC), 2017. IEEE, 2017,
pp. 864–872.
➢ S. M. Almansob and S. S. Lomte, “Addressing challenges for intrusion detection system using naive
bayes and pca algorithm,” in Convergence in Technology (I2CT), 2017 2nd International Conference
for. IEEE, 2017, pp. 565–568.
➢ M. C. Raja and M. M. A. Rabbani, “Combined analysis of support vector machine and principle
component analysis for ids,” in IEEE International Conference on Communication and Electronics
Systems, 2016, pp. 1–5.