
A FEDERATED LEARNING APPROACH TO DETECT AND AVOID

SOURCES OF MISCLASSIFIED CYBER ATTACK DATA

A REPORT
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE AWARD OF THE DEGREE
OF
BACHELOR OF ENGINEERING
IN
DIVISION OF INFORMATION TECHNOLOGY

Submitted by:
RISHABH SETIYA 2018UIT2597
SRIRAM M. PANT 2018UIT2623
AYUSH GOEL 2018UIT2582

Under the supervision of


DR DEEPAK KUMAR SHARMA

DIVISION OF INFORMATION TECHNOLOGY


NETAJI SUBHAS INSTITUTE OF TECHNOLOGY
UNIVERSITY OF DELHI
DECEMBER, 2021
DECLARATION

Division of Information Technology

University of Delhi

Delhi-110007, India

We, (Rishabh Setiya Roll No. 2018UIT2597, Sriram M. Pant Roll No. 2018UIT2623 and

Ayush Goel Roll No. 2018UIT2582) students of B. E., Division of Information

Technology, hereby declare that the Project-Thesis titled “A Federated Learning

Approach to Detect and Avoid Sources of Misclassified Cyber Attack Data” which is

submitted by us to the Division of Information Technology, Netaji Subhas Institute of

Technology, Delhi (University of Delhi) in partial fulfillment of the requirement for the

award of the degree of Bachelor of Engineering, is original and not copied from any

source without proper citation. This work has not previously formed the basis for the

award of any degree.

Rishabh Setiya Sriram M. Pant Ayush Goel


2018UIT2597 2018UIT2623 2018UIT2582

Place: Delhi
Date: 20 December 2021

CERTIFICATE

Division of Information Technology

University of Delhi

Delhi-110007, India

This is to certify that the work embodied in the Project-Thesis titled “A Federated

Learning Approach to Detect and Avoid Sources of Misclassified Cyber Attack Data” has

been completed by Rishabh Setiya Roll No. 2018UIT2597, Sriram M. Pant Roll No.

2018UIT2623 and Ayush Goel Roll No. 2018UIT2582 of B.E., Department of

Information Technology, under my guidance and supervision towards fulfillment of the

requirements for the award of the degree of Bachelor of Engineering. This work has not

been submitted for any other diploma or degree of any university.

Dr Deepak Kumar Sharma


SUPERVISOR
Place: Delhi
Date: 20 December 2021

ABSTRACT
Federated learning is a form of machine learning in which the training data stays at the source

while the model gets trained from the data arising from a number of sources. The model

is communicated between a central server and the data source nodes which train it

independently. The models are averaged based on the amount of data available with the

sources.

In such a scenario, it is possible that some of the data sources provide mislabelled data

which could be due to a deliberate attempt to tamper with the learning process or inability

to correctly label the data. Our attempt is to identify such sources and gradually reduce

the priority that we give them by varying the learning rate.

We assign a trust parameter to each of the data sources involved in training. Then we run a

single epoch of training on each of the sources in a loop. A test dataset is available at the

central server. It is used for checking whether the previous epoch had a constructive or

destructive effect on the model. That is, whether the accuracy of the model improved or

degraded on the test dataset. If the effect was constructive, then the trust parameter of that

source is increased and if it was destructive then the trust parameter is decreased.

The expectation from this experiment is that the trust parameter of malicious sources

would eventually drop to nil and the trust parameter of trustworthy sources would keep

increasing.

The dataset used in this project is an intrusion detection dataset which contains labelled

data corresponding to “Benign” and “Attack” classes. The model that we use for

classification is an autoencoder which classifies a data point as an anomaly if the

reconstruction error is higher than a threshold.

LIST OF CONTENTS
DECLARATION i

CERTIFICATE ii

ABSTRACT iii

LIST OF CONTENTS iv

LIST OF FIGURES vi

CHAPTER 1 INTRODUCTION AND BACKGROUND 1-18

1.1 STATEMENT OF OBJECTIVE 2

1.2 NEURAL NETWORK 2

1.2.1 FEED-FORWARD NEURAL NETWORKS 3

1.2.2 CONVOLUTIONAL NEURAL NETWORKS 4

1.2.3 RECURRENT NEURAL NETWORKS 6

1.3 AUTOENCODERS 6

1.4 FEDERATED LEARNING 10

1.4.1 LIFE OF A FEDERATED LEARNING MODEL 12

1.5 CYBER ATTACKS 14

1.5.1 BRUTE FORCE 14

1.5.2 DENIAL OF SERVICE 15

1.5.3 CROSS SITE SCRIPTING 16

1.5.4 SQL INJECTION 17

1.5.5 BOTNET 17

1.5.6 PORT SCAN 18

CHAPTER 2 RELATED WORK 19-23

CHAPTER 3 METHODOLOGY 25-38

3.1 LIBRARIES USED 25

3.1.1 TENSORFLOW 25

3.1.2 PANDAS 25

3.1.3 MATPLOTLIB 26

3.1.4 SCIKIT-LEARN 26

3.2 PROPOSED ALGORITHM 27

3.2.1 CLOUD ALGORITHM 27

3.2.2 EDGE ALGORITHM 30

3.2.3 SETTING THRESHOLD 31

3.3 MATHEMATICAL JUSTIFICATION 33

3.4 SIMULATION 36

3.4.1 ABOUT CIC-IDS-2017 DATASET 36

3.4.2 APPROACH FOLLOWED 37

CHAPTER 4 RESULTS AND DISCUSSION 39-40

CHAPTER 5 CONCLUSION AND FUTURE WORK 41-42

5.1 CONCLUSION 41

5.2 FUTURE WORK 41

REFERENCES 43

PLAGIARISM REPORT 47

APPENDIX 54

LIST OF FIGURES

Figure Caption Page No.

1.1 Architecture of a feed-forward deep neural network 3

1.2 Convolutional layer having filter size 3x3 and input size 5x5 4

1.3 Pooling layer of size 2x2 with stride 2 5

1.4 Convolutional Neural Network Architecture 5

1.5 Recurrent Neural Network Architecture 6

1.6 Architecture of an autoencoder with three hidden layers 7

1.7 Basic form of CAE architecture 8

1.8 Variational autoencoder architecture 9

1.9 Denoising autoencoder architecture 10

1.10 Visualization of steps of federated learning 12

1.11 Life of a Federated Learning Model 12

1.12 Representation of a Denial-of-Service Attack 15

1.13 Representation of a Distributed Denial-of-Service Attack 16

1.14 Representation of Cross Site Scripting attack 17

3.1 RMSE distribution for Friday Morning Working Hours after training on benign data (Monday) 32

3.2 Initial view of mathematical description presented 34

3.3 The model is guided by ‘good’ force 35

3.4 The model realizes its ‘mistake’ 35

4.1 Variation of trust of the four sources against epoch 39

CHAPTER 1 INTRODUCTION

It's a struggle to design safe networks and systems for everyday usage as the

world becomes increasingly reliant on computers and automation [1]. With the

rise of online marketplaces and services, the number of security dangers to

businesses grows tremendously. The majority of company decisions are now

based on data [2]. With more data and information available than ever before,

it's more critical than ever to properly analyse and evaluate it. There is no

such thing as an impenetrable network or a foolproof firewall. Attackers

are constantly coming up with new exploits and attack methods to bypass

security measures.

These attacks may be with the intention of gathering confidential data,

encrypting the data on a machine or simply hampering the normal functioning

of devices.

These attacks can sometimes be identified from the flow level data, which is

information summarized from a number of packets. This flow level data can

be fed to a machine learning model to identify the attacks.
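
As a minimal sketch of what "flow level data" means, the snippet below groups individual packets by their 5-tuple and summarises each group into a few statistics. The packet fields, the chosen statistics and the function name `summarise_flows` are assumptions made for illustration; real flow exporters compute many more features and usually treat both directions of a connection as one flow.

```python
from collections import defaultdict


def summarise_flows(packets):
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    grouped = defaultdict(list)
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        grouped[key].append(pkt)

    flows = {}
    for key, pkts in grouped.items():
        times = [p["time"] for p in pkts]
        flows[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(p["size"] for p in pkts),
            "duration": max(times) - min(times),
        }
    return flows


# Hypothetical packets: two belong to one flow, the third to another.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80,
     "proto": "TCP", "time": 0.0, "size": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80,
     "proto": "TCP", "time": 0.5, "size": 1500},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 6000, "dport": 22,
     "proto": "TCP", "time": 1.0, "size": 90},
]
flows = summarise_flows(packets)
```

Each value in `flows` is one row of flow level data that could be passed to a model in place of the raw packets.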

However, there are concerns of privacy, as this flow level data may be

confidential to the organization to which the computer belongs. This issue can

be tackled by federated learning [3]. Federated learning is an emerging

paradigm in which the training data does not move from one machine to

another but the updates to model weights are sent to the centralized server.

Nevertheless, there may be machines on which the data does not get correctly

classified and attacks remain undetected. Some malicious entrants in the

network may deliberately tamper with the data so that the model either fails

to gain or loses its ability to identify attacks.

The aim of this project is to identify such malicious actors and reduce their

ability to damage the model. If these actors turn helpful later on, then their

ability to train the model should be restored so that their data also helps to

improve the model.

Hence, we approach the problem with a rewarding, penalizing and forgiving

nature.
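
The rewarding, penalizing and forgiving behaviour can be illustrated with a small sketch. It is not the algorithm proposed later in this report; the additive step size, the clamp to the range [0, 1] and the function name `update_trust` are assumptions chosen only to show the idea that trust falls after a destructive epoch and can recover after constructive ones.

```python
def update_trust(trust, accuracy_before, accuracy_after, step=0.1):
    """Reward a source whose epoch improved test accuracy, penalise it otherwise.

    The step size and the [0, 1] clamp are assumptions made for this sketch.
    An unchanged accuracy is treated as "not constructive" here.
    """
    if accuracy_after > accuracy_before:
        trust += step  # constructive epoch: reward
    else:
        trust -= step  # destructive epoch: penalise
    return min(1.0, max(0.0, trust))


trust = 0.5
# Hypothetical accuracies: three destructive epochs, then two constructive ones.
history = [(0.90, 0.85), (0.85, 0.80), (0.80, 0.78), (0.78, 0.82), (0.82, 0.88)]
for before, after in history:
    trust = update_trust(trust, before, after)
```

Because trust is only reduced and never locked at zero, a source that starts behaving well again regains influence, which is the "forgiving" part.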

1.1 STATEMENT OF OBJECTIVE

Formulation of a machine learning protocol which

• detects data flows as benign or attack

• can perform even if some data sources are poisoned

• does not require aggregation of data from all sources

• keeps actual data secure and unavailable to any party except the

source itself

1.2 NEURAL NETWORKS

Neural networks [4] are a subset of machine learning inspired by the

functioning of neurons in biology. They consist of an input layer, an output

layer and one or more hidden layers.

They are widely used for handwriting recognition, image compression,

prediction of future values from past trends in areas like the stock market,

decision making in the granting of loans and various other situations.

Fig 1.1: Architecture of a feed-forward deep neural network

1.2.1 Feed-Forward Neural Networks

In feed-forward neural networks [5], the inputs enter through the first layer,

propagate in the forward direction and the last layer produces the outputs. Each

neuron sums up the inputs coming into it and outputs the result after applying

an activation function on the sum.

For the neural network to learn, it must modify its weights to produce the

expected results on the output layer. In the beginning, however, the weights

are initialised randomly, so the outputs are essentially arbitrary.

The difference between the actual output and the expected output propagates

in the backward direction and, depending on this error, the weights of each layer

get modified slightly. This process is repeated until the error is less than a

threshold.
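
The loop described above can be sketched for the simplest possible case, a single sigmoid neuron learning the logical OR function. This is an illustrative toy, not the network used in this project; the learning rate, the error threshold and the zero initialisation (used instead of random weights so that the run is reproducible) are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs followed by the activation function."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, learning_rate=0.5, error_threshold=0.01, max_epochs=20000):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in samples:
            out = neuron_output(weights, bias, inputs)
            error = out - target
            total_error += error ** 2
            # Gradient of half the squared error, passed back through the sigmoid.
            delta = error * out * (1.0 - out)
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
        # Stop once the mean squared error is below the threshold.
        if total_error / len(samples) < error_threshold:
            break
    return weights, bias


# Truth table of logical OR as (inputs, expected output) pairs.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
predictions = [round(neuron_output(weights, bias, x)) for x, _ in samples]
```

In a multi-layer network the same error signal is propagated further back, layer by layer, which is what backpropagation automates.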

For many cases, a single hidden layer is sufficient to give good results. In this

project, a simple feed-forward neural network with a single hidden layer is

used. Training takes place through backpropagation.

Fig 1.2: Convolutional layer having filter size 3x3 and input size 5x5

1.2.2 Convolutional Neural Networks

The simple feed-forward neural networks may give a decent result on some

simple binary patterns like handwritten characters but they do not generalize

well to visual inputs like colour photographs and videos.

A convolutional layer [6] consists of a matrix of weights (called a filter) which

slides over the input matrix. At each position, the element-wise products of the filter

and the underlying input region are summed up to give a single element of the output matrix.

The number of cells by which the filter shifts to the right in each step, and then

down after it reaches the right end, is known as the stride.
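
A minimal sketch of this sliding operation is given below, using plain Python lists and the 5x5 input and 3x3 filter sizes of Fig 1.2. The function name `convolve2d`, the square shapes and the absence of padding are assumptions made for brevity.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing element-wise products (no padding).

    Assumes a square image and a square kernel.
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5 input, values 0..24
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]                 # 3x3 filter that keeps the centre cell

result = convolve2d(image, kernel)             # [[6, 7, 8], [11, 12, 13], [16, 17, 18]]
strided = convolve2d(image, kernel, stride=2)  # [[6, 8], [16, 18]]
```

With a stride of 1 the 5x5 input gives a 3x3 output; with a stride of 2 the filter skips every other position and the output shrinks to 2x2.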

Another layer used in CNNs is the pooling layer. Each cell of the output of a

pooling layer is the maximum or the average of the values in a small region of the

input.
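
As a small illustration, the sketch below applies max pooling of size 2x2 with stride 2, the configuration of Fig 1.3, to a 4x4 matrix. The input values and the function name `max_pool` are made up for the example.

```python
def max_pool(matrix, size=2, stride=2):
    """Return the maximum of each size x size window of `matrix`."""
    rows = (len(matrix) - size) // stride + 1
    cols = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(matrix[i * stride + di][j * stride + dj]
                for di in range(size) for dj in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


matrix = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool(matrix)  # [[6, 4], [8, 9]]
```

Average pooling would follow the same pattern with the mean of each window in place of `max`.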

Fig 1.3: Pooling layer of size 2x2 with stride 2

The last few layers of CNNs are fully connected. This means that the 2D matrix

is converted to a 1D array and the rest of the operations are similar to simple

feed-forward neural network.

Fig 1.4: Convolutional Neural Network Architecture


The idea proposed in this project can also be extended to and tested on image

datasets by using a CNN instead of a simple feed-forward architecture.

1.2.3 Recurrent Neural Networks

Fig 1.5: Recurrent Neural Network Architecture

RNNs [7] are used for sequential or time series data. The output of the previous

step is also fed as input to every neuron in the recurrent layer. The idea

proposed in this project may be extended to RNNs too by using an LSTM

autoencoder.

1.3 AUTOENCODERS

Auto-associative memory is a class of neural networks. The output layer

consists of as many neurons as the input layer and the model is trained to

replicate the identity function.

This means that the input features and the expected outputs are the same. The

purpose of training such a model is to check whether a particular test pattern

is the same as, or similar to, the patterns on which the model was trained.

Autoencoders [8] are a type of auto-associative memory which consists of an

encoder part followed by a decoder part. The difference between the original

pattern and the output of the network is used to calculate an error known as the

reconstruction error.

A low value indicates that the pattern is similar to training patterns while a

high value tells us that it is different from those patterns.

An appropriate value of threshold needs to be set to distinguish the normal

data from abnormalities or anomalies.
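
The decision rule can be sketched as follows. The reconstructed vectors below are made-up numbers standing in for the output of a trained autoencoder, and the threshold of 0.2 is a hypothetical value; RMSE is used as the reconstruction error and the class names follow the "Benign" and "Attack" labels of the dataset.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between a pattern and its reconstruction."""
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


def classify(original, reconstructed, threshold):
    """Flag the pattern as an anomaly when the reconstruction error is too high."""
    return "Attack" if rmse(original, reconstructed) > threshold else "Benign"


threshold = 0.2  # hypothetical value; it must be tuned on real data

# A pattern the model reconstructs well, and one it reconstructs poorly.
close = classify([0.1, 0.5, 0.9], [0.12, 0.48, 0.91], threshold)
far = classify([0.1, 0.5, 0.9], [0.6, 0.1, 0.3], threshold)
```

Here `close` is classified as "Benign" and `far` as "Attack", because only the second reconstruction error exceeds the threshold.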

Fig 1.6: Architecture of an autoencoder with three hidden layers

There are three components in an autoencoder:

• Encoder: In an encoder, the model reduces the input dimensions and

converts the data into an encoded form.

• Bottleneck: This is the layer that holds the compressed form of the input

data. It is the layer in which the input data is reduced to the lowest number

of dimensions.

• Decoder: In this, the model recreates the data from the encoded form to

be as similar to the actual input as it can.

Some standard autoencoders are described briefly below:

1. Convolutional Autoencoders (CAEs): A CAE encodes, or breaks down, the input into a set of basic signals and then attempts to rebuild the input from those signals. A CAE [9] can also be used to modify the geometry of an image or to generate its reflectance. The encoder layers in this type of autoencoder are convolution and pooling layers, whereas the decoder consists of transpose convolution layers (for which "deconvolution" is a misnomer) and unpooling layers.

Fig 1.7: Basic form of CAE architecture

2. Variational Autoencoders (VAEs): This type of autoencoder can generate new images, much as Generative Adversarial Networks (GANs) do. Variational autoencoders [10] make strong assumptions about the distribution of the latent variables. For latent representation learning, they adopt a variational approach, which results in an additional loss component and in the use of the Stochastic Gradient Variational Bayes estimator for training. The probability distribution of the latent vector of a variational autoencoder often matches the training data much better than that of a standard autoencoder. Because their generative behaviour is more adaptable and configurable than that of GANs, VAEs are suited to generating artwork of many kinds. Fig 1.8 shows the basic architecture of a VAE.

Fig 1.8: Variational autoencoder architecture

3. Denoising Autoencoders: In this type of autoencoder, noise is added to the source images [11] and the model learns to remove it. As a result, the model learns features of the data rather than simply copying the input to the output. During training, these autoencoders take a slightly corrupted input and are trained to recover the original, uncorrupted input. To cancel out the added noise, the model learns a vector field that maps the input data towards a lower-dimensional manifold that describes the natural data. By identifying the most relevant features in this way, the encoder learns a more robust representation of the data.

Fig 1.9: Denoising autoencoder architecture

1.4 FEDERATED LEARNING

Federated learning [3] is a distributed approach towards machine learning. In

the traditional approach, data from all the sources is aggregated on a single server

and then the model is trained. In federated learning, the current model at any

time is sent to all the sources of data.

The training happens at the source itself. The data does not leave the sources

at any time.

Only the weight updates are sent back to the aggregation server. Techniques such as

homomorphic encryption can be used to ensure that the updates cannot be used to

infer statistics about the edge device data. The aggregation server updates the weights.

The update from each source is weighted by the amount of data that is used

for training by that source.
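
This weighting can be sketched as a weighted average over the sources. The flat lists of weights, the sample counts and the function name `federated_average` are illustrative assumptions; a real model would average every weight tensor in the same way.

```python
def federated_average(client_weights, client_sizes):
    """Average the client models, weighting each by its number of training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Three hypothetical sources holding 100, 300 and 600 training samples.
client_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [100, 300, 600]
global_weights = federated_average(client_weights, client_sizes)  # [4.0, 5.0]
```

The source with 600 samples pulls the average towards its own weights, which is also why a poisoned source with a lot of data can do considerable damage.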

While theoretically viable, this architecture would not have been practical in the past because the computational capacity of mobile phones was too limited to run a machine learning model. However, this changed in early 2017 [12].

Billions of smartphones with integrated AI chips for compute-heavy tasks were introduced, starting with Huawei’s Mate, Google’s Pixel, and Apple’s iPhones, all containing advanced hardware designed to run machine learning-based tasks efficiently. The share of such smartphones increased by 32% from 2017 to 2020 [12].

This method of learning poses another big challenge. The model may contain

a large number of weights. Downloading and uploading all the weight changes

for each global epoch requires a high bandwidth.

Often there is a large gap between the download and upload speeds. The

download speed could be as high as 100 Mbps while the upload speed may be

less than 10 Mbps.
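
A short worked calculation shows the effect of this gap. The model size of 25 million 32-bit weights is a hypothetical figure chosen for the example; the link speeds are the ones quoted above.

```python
def transfer_seconds(num_weights, bytes_per_weight, link_mbps):
    """Time needed to move all the weights over a link of the given speed."""
    bits = num_weights * bytes_per_weight * 8
    return bits / (link_mbps * 1_000_000)


num_weights = 25_000_000  # hypothetical model size

download = transfer_seconds(num_weights, 4, 100)  # 8.0 seconds at 100 Mbps
upload = transfer_seconds(num_weights, 4, 10)     # 80.0 seconds at 10 Mbps
```

Under these assumptions every global epoch costs ten times longer to upload than to download, before any training time is counted.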

Uploading of the entire model from the sources to the server is often the

bottleneck step in federated learning. However, this approach is necessary for

preservation of privacy. The owners of data may not agree to share their data

for training the model.

Federated Learning permits faster testing and deployment of intelligent models, as well as lower latency and power consumption, while maintaining data privacy. In addition to delivering an update to the shared model, the improved local model can be used on the device right away.

Fig 1.10: Visualization of steps of federated learning

Fig 1.11: Life of a Federated Learning Model

1.4.1 Life of a Federated Learning Model


A model architect creating a model for a specific domain often leads the Federated Learning process. A natural language processing specialist, for example, might create a model for predicting the next word on a virtual keyboard.

A typical workflow [3], at a high level, is as follows:

1. Identification of Problem to be solved with Federated Learning: The

model architect recognizes a problem that can be solved using Federated

Learning.

2. Instrumentation of the client: If necessary, the users or clients (for

example, a mobile app) are programmed to retain the necessary training

data locally (with time and quantity constraints).

In many circumstances, the app will have already saved this information (a messaging app, for example, must save messages, and a photo management app already stores images). Extra data or metadata, such as user interaction data that provides labels for a supervised learning task, may be necessary in some cases.

3. Prototyping through simulation: Using a suitable dataset, the model

engineer can test learning hyperparameters and prototype model

architectures in a Federated Learning simulation.

4. Training of Federated Model: To train different variations of the model or to use different optimization hyperparameters, multiple federated training processes are initiated.

5. Model Evaluation: After the models have been sufficiently trained, they are analysed and suitable candidates are chosen. The analysis may include metrics computed on datasets held in the data centre, or federated evaluation, in which the models are distributed to held-out clients for assessment on their local data.

6. Model Deployment: Once a good model has been chosen, it goes through a normal model launch procedure, which includes manual quality inspection, live A/B testing (where the updated model is used on some devices while the previous-generation model is used on others, to compare their in-vivo performance), and a staged rollout (so that bad behaviour can be identified and corrected before it affects too many users). The owner of the application determines the model's specific launch procedure, which is usually independent of how the model was trained. In other words, this phase applies equally to a model built using federated learning and to a model developed using a typical data centre approach.

1.5 CYBER ATTACKS


Cyber-attacks [13] can be carried out with different intentions: to gain access to the information stored on a computer, to intercept the data flowing to and from the system, or simply to stop the machine from functioning.

The abilities of attackers are growing along with the progress in the hardware

capabilities of the latest high-end machines.

1.5.1 Brute Force Attack


In this type of attack, hackers try to guess the credentials like passwords by

trying a large number of combinations to gain access to the user accounts.

Smaller and simpler passwords can be guessed in a few minutes while longer

and complicated passwords involving different punctuation symbols, numbers

and a mix of upper- and lower-case letters could take many years to be guessed.
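
The difference can be made concrete with a rough calculation of the worst-case time needed to try every combination. The guessing rate of one billion attempts per second and the two example password policies are assumptions for illustration only.

```python
def worst_case_seconds(alphabet_size, length, guesses_per_second):
    """Time to try every password of the given length over the given alphabet."""
    return alphabet_size ** length / guesses_per_second


SECONDS_PER_YEAR = 365 * 24 * 3600
rate = 1_000_000_000  # hypothetical attacker speed: 1e9 guesses per second

short = worst_case_seconds(26, 6, rate)   # six lower-case letters
long_ = worst_case_seconds(94, 12, rate)  # twelve printable ASCII characters
long_years = long_ / SECONDS_PER_YEAR
```

Under these assumptions the six-letter password falls in well under a second, while the twelve-character password would take on the order of ten million years to exhaust.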

Brute force attacks are of different kinds depending on which passwords are

attempted. Some attacks use dictionary words while others use logical guesses

involving birthdates or names of family members. A combination of both may

also be used.

1.5.2 Denial of Service Attack

Servers have a limit on the number of requests they can serve at a time. If the number of requests grows too large, the server takes a long time to serve its clients, who may become frustrated, leave the website and take their business to another company's website.

Fig 1.12: Representation of a Denial of Service (DoS) attack

In extreme cases, the requests would result in timeout and not get served at

all.

Attackers try to disrupt the functioning of a server by continuously sending a

large number of requests to it.

If the malicious requests are sent from a number of computers, then the attack

is known as a Distributed Denial of Service (DDoS) attack.

Fig 1.13: Representation of a Distributed Denial of Service (DDoS) attack

1.5.3 Cross Site Scripting

Websites which accept user input in the form of blog posts, comments or other types of forms may be vulnerable to cross site scripting. The attacker can insert malicious code into such an input area of a trustworthy website.

This code could be used to change the content that is displayed on the website or to gain access to the website credentials of users, which are stored by browsers on the users’ computers. This allows the attacker to perform any action on the website on behalf of the real users.
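
The sketch below shows, under simplified assumptions, why unescaped user input is dangerous and how escaping neutralises it. The comment-rendering function and the payload are hypothetical; `html.escape` is part of the Python standard library. Escaping on output is only one of several defences a real website would need.

```python
import html


def render_comment(comment):
    """Embed a user comment in a page after escaping HTML special characters."""
    return "<p>" + html.escape(comment) + "</p>"


payload = '<script>alert("xss")</script>'

# Vulnerable: the input is pasted into the page as-is, so the browser runs the script.
unsafe = "<p>" + payload + "</p>"

# Safer: the markup is converted to harmless text before it reaches the page.
safe = render_comment(payload)
```

After escaping, the browser displays the script as text instead of executing it.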

Fig 1.14: Representation of Cross Site Scripting attack

1.5.4 SQL Injection

If a website or any other application is supposed to provide results after fetching some data from a database based on a user’s query entered in a text input field, it may be vulnerable to an SQL injection attack.

The application uses the user’s input to construct an SQL query which is executed on the database. If the query can be manipulated by an adversary, they may retrieve information which they are not authorized to access. It may even be possible to delete entire tables from the database using this attack.
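
A minimal demonstration using Python's built-in `sqlite3` module and an in-memory database is given below. The table, its contents and the malicious input are invented for the example. The first query is built by string concatenation and is manipulated by the input; the second passes the same input as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Hypothetical malicious input typed into a search field.
user_input = "alice' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text and changes its meaning.
query = "SELECT secret FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(query).fetchall()

# Safer: the input is bound as a value and is never interpreted as SQL.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The concatenated query returns the secrets of every user, whereas the parameterised query returns nothing because no user has that literal name.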

1.5.5 Botnet

The term is short for robot network. The attacker infects a large number of computers and uses the whole network of such computers to carry out malicious activities. A botnet is also known as a zombie army.

IoT devices like set-top boxes, cameras, voice assistant speakers and smart watches may also be used as zombies. These botnets could be used for mining cryptocurrency, launching DDoS attacks or recruiting more zombies into the network.

As the botnet uses a small fraction of the processing capabilities of a machine,

the effect is not easily observable because the devices appear to function

normally to their users.

1.5.6 Port Scan

Attackers send data to a number of ports of the victim’s computer to identify

which ports accept connections and what responses they send back.

It helps them find out which devices are vulnerable to attacks and which ones

have strong measures of security like firewalls installed.

It tells them which services are running on the device, their versions and

whether it is accepting anonymous logins or not. This information allows the

adversary to decide which exploits to use.

CHAPTER 2 RELATED WORK

Chen et al. [14] proposed the federated learning-based Attention Gated Recurrent Unit (FedAGRU), an intrusion detection algorithm for detecting abnormalities in wireless edge networks. Because it is based on federated learning, it sends model updates to the cloud rather than sharing users’ personal data, which could create privacy issues for the users. The algorithm also increases the weights of high-priority devices and avoids updating the weights of the remaining devices, using a threshold value to determine whether a device is important to the cloud server. This reduces irrelevant updates and therefore the bandwidth and extra computational cost required to send those parameters to the server. Because the updates are asynchronous, the cloud server starts updating the model parameters as soon as it receives model updates from multiple client devices, without waiting for the clients whose models are still processing data, which reduces the time required to aggregate and update the parameters of the global model. In addition, FedAGRU assigns different weights to different clients according to their modified model parameters, after calculating the new model parameters as a weighted average of the previous parameters.

The FedAGRU algorithm is based on GRU-SVM, a combined model made from a GRU (Gated Recurrent Unit) and an SVM (Support Vector Machine), instead of using SGD (Stochastic Gradient Descent), and is relatively faster and uses fewer parameters in comparison. They found that FedAGRU is 8% more accurate than the centralized learning algorithms it was compared with and 70% less expensive computation-wise than other federated learning algorithms.

Some researchers have also compared different models on different datasets to find the most suitable model for their IDS research. Hindy et al. [15] presented an autoencoder-based neural network model for implementing an Intrusion Detection System to detect zero-day attacks. A zero-day attack exploits a previously unknown flaw or bug in a system, one that has not yet been discovered by its developers, to gain control of the system or to steal data such as user credentials. These attacks pose a serious challenge for many detection approaches, such as signature-based detection models, which depend on previously detected threats to identify the threats currently attacking the network. Hence, they suffer from many false negatives, because the increasing number and variety of new threats may remain undetected. The authors aim to resolve this issue by building a better IDS using an autoencoder model which can detect zero-day attacks faster than previous machine learning models with a reasonable number of false positives. To demonstrate the effectiveness of their autoencoder model, they also built an outlier-detecting One-Class Support Vector Machine (SVM) model for comparison, and evaluated both on two datasets, CICIDS2017 and NSL-KDD.

There are some differences between the CICIDS2017 and NSL-KDD datasets. CICIDS2017 is diverse and covers a wide variety of common cyber-attacks such as Heartbleed, SSH and FTP brute force, DDoS and many more, which makes it suitable for representing the many types of attack that a network faces daily in the real world, whereas NSL-KDD covers only benign traffic, Denial of Service (DoS), probing, Remote to Local (R2L) and User to Root (U2R) attacks. Before the datasets are used for training and validation, correlated features are dropped to reduce model instability, and flow-based data is used, as it is better suited to an IDS. During the scaling and selection of the features, only benign data was used, to eliminate the risk of any influence from attack instances. The models are then trained on both datasets, with 75% of the normal data used for training and the rest for testing. The results show that autoencoders are better suited to detecting zero-day attacks than the one-class SVM, with an accuracy of 89-99% for the NSL-KDD dataset and 75-98% for the CICIDS2017 dataset.

Yulianto et al. [16] proposed improving the performance of an AdaBoost-based Intrusion Detection System using a sampling technique called Synthetic Minority Oversampling Technique (SMOTE) and feature selection techniques such as Principal Component Analysis (PCA) and Ensemble Feature Selection (EFS). Previous research had not achieved good enough results with the AdaBoost classifier because of class imbalance in the CICIDS2017 dataset and unsuitable feature selection for the classification methods. AdaBoost, also known as Adaptive Boosting, is a boosting technique that can greatly improve classification ability over numerous iterations by reducing the bias and variance of the model, and it is best suited to weak learners such as decision trees.

Therefore, the authors’ goal is to reduce the imbalance in the training dataset using SMOTE, which generates synthetic samples for the minority class and thus improves the sensitivity of the model towards that class, and to select important features from the new dataset with the help of PCA and ensemble feature selection, which reduce the dimensionality of the dataset while limiting the loss of information. The researchers report that their approach outperforms previous research, with an accuracy of 81.83%.

Mathur [17] tried to incorporate the time dimension of the data and investigated whether the overlap between anomalous and benign data flows in the network can be eliminated. He also used the CICIDS2017 dataset because of its wide range of attacks. During pre-processing, the timestamps of all the flow data were modified to include microseconds to make the data more precise. The author used an autoencoder with ReLU (Rectified Linear Unit) as the activation function and RMSE (Root Mean Square Error) as the loss function. He also experimented with different numbers of neurons in the autoencoder model and found that the RMSE decreased as the number of neurons increased, but that it changed in roughly equal proportion for benign and malicious data. For time analysis of the network, he used an ensemble of autoencoders and trained each of them on a particular port and on a specific substructure of the data. However, the test results showed that an overlap still existed between the benign and malicious traffic flows, which suggests that some overlap between the two will always exist no matter how the data is labelled.

Panwar et al. [18] took different combinations of four feature selection algorithms with two machine learning algorithms and compared the performance of all the combinations on the basis of four parameters: time, accuracy, specificity, and sensitivity. The four feature selection algorithms are Classifier Subset Evaluator with Decision Tree, Classifier Subset Evaluator with Naive Bayes, Classifier Subset Evaluator with J48, and CfsSubset Attribute Evaluator, and the two machine learning algorithms are REPTree and OneR. The same CICIDS2017 dataset is used for training and evaluation. The software tool used for data analysis and investigation was WEKA. They found that feature selection reduced the dataset size and time while giving high performance. For Port Scan, Heartbleed/DoS, Brute Force, Botnet, Web and DDoS attacks, the REPTree classification algorithm with the CfsSubset Attribute Evaluator with J48 feature selection technique performs best, while the REPTree classification algorithm with the Classifier Subset Evaluator with Naive Bayes feature selection technique performs best for infiltration attacks.

Farukee et al. [19] proposed a model for DDoS attack detection in IoT networks using a one-dimensional convolutional neural network (1D CNN) and a Multilayer Perceptron (MLP), with random forest as the feature selector. Training and evaluation both use the CICIDS2017 dataset. The main motive of the work was to detect a DDoS attack as early as possible, before it affects the system significantly. RF-1DCNN achieved a high accuracy of 99.63%, and the accuracy obtained with RF-MLP was 99.63%. They also concluded that the feature selection approach appears to have a considerable impact on accuracy, stability, and interpretability.

CHAPTER 3 METHODOLOGY

We performed our experimental analysis with the help of a Python program. The autoencoder was created using Keras [20] and TensorFlow.

3.1 LIBRARIES USED

3.1.1 TensorFlow

TensorFlow is an open-source machine learning platform originally created by Google and used in many of its services [21]. TensorFlow Federated, which is meant for research on federated learning using TensorFlow, is under active development.

TensorFlow lets the user focus on developing the model while it takes care of the finer details behind the scenes. Keras is the high-level API for TensorFlow.

A sequential model in TensorFlow allows us to create a linear stack of layers, in which each layer has a single input tensor and a single output tensor. We have used this type of model in this thesis.

3.1.2 Pandas

Pandas is a data manipulation and analysis library for Python. It provides the Series and DataFrame data structures [22].

A Series is a one-dimensional array which can hold data of any type. A DataFrame is a two-dimensional data structure which resembles an Excel sheet or an SQL table. Its columns are one-dimensional arrays, and each column may have a different data type.

This library lets us read a CSV file containing column names along with the data into a DataFrame.

3.1.3 Matplotlib

Matplotlib is the most widely used library for plotting graphs in Python [23].

It plots two-dimensional graphs such as line plots connecting the points of two arrays of numbers, scatter plots, bar graphs, step graphs, histograms, box plots, pie charts and contours. It can also render a three-dimensional array as an image by using the data as RGB values for the colour of each pixel.

3.1.4 Scikit-Learn

Scikit-learn is an open-source machine learning library for Python which supports classification, regression, clustering, dimensionality reduction and pre-processing of data.

It also allows us to train neural networks but does not offer GPU support, which limits its usefulness for large-scale applications. The customization that can be applied to the model is also limited.

In this thesis, we use this library to pre-process the data with the Min-Max scaler.
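As an illustration of what min-max scaling computes, the following minimal sketch rescales each feature column to the range [0, 1] in plain Python. It is not the scikit-learn implementation: the function name `min_max_scale` and the toy rows are hypothetical, and mapping a constant column to 0 is an assumption made here to avoid division by zero.

```python
def min_max_scale(rows):
    """Rescale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    minima = [min(column) for column in columns]
    maxima = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, minima, maxima):
            span = high - low
            # Assumption: a constant column (span == 0) is mapped to 0.0.
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical toy flows: (destination port, flow duration).
toy_flows = [[80, 1000.0], [443, 3000.0], [8080, 2000.0]]
scaled_flows = min_max_scale(toy_flows)
```

After scaling, the smallest value of every column becomes 0 and the largest becomes 1, so features with very different ranges contribute comparably to the reconstruction error.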

3.2 PROPOSED ALGORITHM

In federated learning, we have two algorithms: the cloud algorithm, which runs on the machine that aggregates the updates and computes the global model, and the edge algorithm, which runs on the devices that provide the data for training.

Our proposal for both of them is as follows.

3.2.1 CLOUD ALGORITHM

The aggregation server stores a database of all the edge devices that are interested in training the global model. It assigns a default value to each of them, and we refer to this value as the trust of that device. As we need to evaluate the impact of the update sent by each device, we cannot aggregate all the updates and compute their weighted average based on the amount of data. Instead, we fix the amount of data that each device uses for training in a single epoch.

The server sends its current model, along with the amount of data that should be used for training, to an edge device. It then receives the weight updates from that device, scales them by the old value of its trust and applies them to the global model. It then evaluates the impact of this update by computing the accuracy of the updated model on a dataset kept for this purpose. If the impact is positive (accuracy increases), the trust of the device that sent the update is incremented. If the impact is negative (accuracy decreases), the trust of that device is decremented.

The server then moves to the next edge device and repeats the above procedure with it.

The process is repeated cyclically with all the edge devices in the database. If a new device wants to join the system, it is assigned the default value of trust and added to the end of the database.


Different combinations of arithmetic and geometric increments and decrements can be tried and the results compared.

The upper bound and the lower bound in this algorithm each have their own significance.

In the absence of an upper bound, the trust would keep increasing and the weight updates would grow so large that the model would oscillate from one suboptimal state to another without ever arriving at the optimal state. Eventually, the updates would become so large that the model would be rendered useless.

The lower bound is necessary to stop the trust from reaching a negative value (if arithmetic decrement is used) or a value so small that it cannot grow back within a reasonable number of epochs once the device has correctly labelled data that would improve the performance (if increments and decrements are geometric).
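The bounded trust update described above can be sketched as follows. This is a minimal illustration rather than the exact code of our program: the function name `update_trust` and the arithmetic step of 0.1 are hypothetical, while the geometric factors (1.1 and 0.9) and the bounds (0.1 and 5) are the values reported in Chapter 4.

```python
def update_trust(trust, improved, mode="geometric",
                 up=1.1, down=0.9, step=0.1,
                 lower=0.1, upper=5.0):
    """Return the new trust of a device after its update has been evaluated.

    improved is True when the accuracy on the server data increased.
    """
    if mode == "geometric":
        trust = trust * up if improved else trust * down
    elif mode == "arithmetic":
        # Assumption: the same step is used for increment and decrement.
        trust = trust + step if improved else trust - step
    else:
        raise ValueError("mode must be 'geometric' or 'arithmetic'")
    # Clamp so that the trust never leaves [lower, upper].
    return min(upper, max(lower, trust))
```

For example, `update_trust(4.9, True)` is clamped to 5.0 by the upper bound, and `update_trust(0.05, False, mode="arithmetic")` is clamped to 0.1 instead of becoming negative.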

3.2.2 EDGE ALGORITHM

An edge device receives a request from the aggregation server to train the model. The input that it receives contains the current global model as well as the amount of data that it should use for training. Alternatively, it already knows the amount of data required and sends a request to the aggregation server to let it join the training epoch of the global model. The amount of data is fixed to maintain uniformity among all the edge devices.

The device uses the specified amount of data to train the model using backpropagation and gradient descent. It then sends back the update that it wants the aggregation server to make to the global model.

After that, it keeps generating or collecting more data until it receives another training request from the server or has accumulated the fixed amount of data that it must use for training. Once that much data is available, it can itself ask the server for the current model in order to train it.
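The edge-side step can be sketched as below. To keep the example short and self-contained, a linear model trained with stochastic gradient descent stands in for the autoencoder and backpropagation used in our experiments; the function name `compute_edge_update`, the learning rate and the toy samples are hypothetical.

```python
def compute_edge_update(global_weights, samples, n_required, learning_rate=0.01):
    """Train on exactly n_required local samples and return the weight update.

    Returns None when the device has not yet collected enough data.
    """
    if len(samples) < n_required:
        return None  # keep generating or collecting data
    weights = list(global_weights)
    for features, target in samples[:n_required]:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        # Gradient of 0.5 * error**2 with respect to each weight is error * x.
        weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
    # The device sends only the difference, not its data.
    return [new - old for new, old in zip(weights, global_weights)]


# Hypothetical local data: (features, target) pairs.
local_samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
update = compute_edge_update([0.0, 0.0], local_samples, 2, learning_rate=0.5)
```

Only the update leaves the device; the server decides how strongly to apply it by scaling it with the trust of that device.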

3.2.3 SETTING THRESHOLD

In an ideal scenario, we would set a threshold such that all the benign samples have a reconstruction error below it and all the anomalous samples have an error above it. However, it is impossible to set such a threshold because of a significant overlap between the reconstruction errors of benign and anomalous samples. Even if we take the threshold equal to 0.9 times the maximum reconstruction error of the training data, almost all the samples get classified as benign. This is because the benign dataset also has some samples which are highly dissimilar to the other samples and hence have a very large reconstruction error, greater than the reconstruction error of most of the anomalous samples.

Fig 3.1 RMSE distribution for Friday Morning Working Hours after training on benign data (Monday)

Hence, we attempt to choose the threshold such that we get the fewest false positives while still detecting most of the anomalous samples correctly. This can be done by using a percentile of the reconstruction error on the training data as the threshold. If we set it to the 90th percentile of the reconstruction error of the training data, we can expect about 10% of the benign samples to be classified as malicious as well. This is clearly undesirable, but inevitable given the distribution of reconstruction errors.
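The percentile rule can be sketched in a few lines. The nearest-rank definition of the percentile used here is an assumption (library routines may interpolate between values instead), and the function names and the evenly spaced benign errors are hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def classify(errors, threshold):
    """Label a sample malicious when its reconstruction error exceeds the threshold."""
    return ["malicious" if error > threshold else "benign" for error in errors]


# Hypothetical reconstruction errors of 100 benign samples.
benign_errors = [i / 100 for i in range(1, 101)]
threshold = percentile(benign_errors, 90)
labels = classify(benign_errors, threshold)
flagged_fraction = labels.count("malicious") / len(labels)
```

On this toy data exactly 10 of the 100 benign samples lie above the threshold, which is the false-positive rate that a 90th percentile threshold accepts by construction.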

3.3 MATHEMATICAL JUSTIFICATION

For simplicity, consider two functions of two variables, where the variables are the model parameters. The aim of the model is to reach the optimal location of one of the functions. Two guiding forces modify the variables. One force moves the model towards the optimal location of the function that we want it to reach. The other force guides the model towards the optimal location of the other function, which is undesired. The model is led by the two forces cyclically, turn by turn. For the first few steps, the model feels that both forces are friendly and trusts both of them. After a certain step, it realizes that while one force is friendly and leading it to its destination, the other is trying to take it somewhere else. When it notices that it moves farther from its destination when led by one force, its trust in that force starts to fade, and the steps it takes in the direction indicated by that force become smaller and smaller. It sees the destination coming closer when led by the other force, and its step size for that force increases gradually. However, the step size should increase only up to a limit, as steps that are too large would take the model even farther from the destination than it previously was instead of bringing it closer.

Fig 3.2 Initial view of mathematical description presented

In Figure 3.2, the circle at the start of the dotted arrow represents the initial position of the model. The dotted arrow is the misguiding force. The triangle represents the optimal location, or centre of attraction, of the misguiding force. The cross symbol represents the centre of attraction, or optimal point, of the good guiding force. The hollow circle is the position of the model after it takes a step in the direction of the misguiding force. This move decreases its distance from both centres. However, the model only knows its distance from the correct destination, and as this distance has decreased, its trust in the misguiding force increases.

The next figure, Figure 3.3, shows the turn of the ‘good’ force. The distance between the cross symbol and the current position decreases once again, and the trust of the model in this force also increases.

Fig 3.3 The model is guided by ‘good’ force

Fig 3.4 The model realizes its ‘mistake’

Figure 3.4 shows the point where the distance of the model from its optimal destination increases on the turn of the ‘bad’ force. At this point, the model realizes that it should not trust this force, and the size of the step on the turn of this ‘bad’ force gradually decreases until it reaches a lower bound.
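The picture drawn in this section can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: the coordinates, the base step of 0.05 and the rule of moving a trust-scaled fraction of the way towards the target of the acting force are hypothetical choices, while the factors 1.1 and 0.9 and the bounds 0.1 and 5 are the values used in Chapter 4.

```python
import math


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def simulate_two_forces(start, good_target, bad_target, turns=60, base_step=0.05,
                        up=1.1, down=0.9, lower=0.1, upper=5.0):
    """Let a 'bad' and a 'good' force pull the model in alternate turns."""
    position = list(start)
    targets = {"bad": bad_target, "good": good_target}
    trust = {"bad": 1.0, "good": 1.0}
    for turn in range(turns):
        force = "bad" if turn % 2 == 0 else "good"
        before = distance(position, good_target)
        # Step size grows and shrinks with the trust in the acting force.
        fraction = base_step * trust[force]
        target = targets[force]
        position[0] += fraction * (target[0] - position[0])
        position[1] += fraction * (target[1] - position[1])
        after = distance(position, good_target)
        # The model only observes its distance from the correct destination.
        factor = up if after < before else down
        trust[force] = min(upper, max(lower, trust[force] * factor))
    return position, trust


final_position, final_trust = simulate_two_forces(
    start=(0.0, 0.0), good_target=(10.0, 0.0), bad_target=(8.0, 6.0))
```

With these coordinates the first step of the ‘bad’ force also brings the model closer to the good target, as in Figure 3.2, so its trust initially rises; once the model is close to the good target, steps towards the other target increase the distance and the trust in the ‘bad’ force starts to fall.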

3.4 SIMULATION

3.4.1 About CIC-IDS-2017 Dataset

The dataset used in this project is the CIC-IDS-2017 dataset [24], an IDS evaluation dataset generated by the Canadian Institute for Cybersecurity (CIC), based at the University of New Brunswick in Fredericton, Canada. The dataset was generated over a period of five days, from Monday through Friday, and has been available to the research community since 2018 in PCAP format as well as in CSV format, where each record is a labelled flow with 78 features. Some datasets, such as NSL-KDD and CAIDA, cover fewer attacks, whereas CICIDS2017 includes a variety of attacks to simulate the real world as closely as possible.

The Monday CSV file represents the data for the first day of the week and includes benign traffic only. The simulated attacks, executed from Tuesday to Friday, are recorded in the CSV files of those days. They are brute-force attacks on File Transfer Protocol (FTP) and Secure Shell (SSH), DoS, Heartbleed, Web attacks, Infiltration, Botnets and Distributed Denial-of-Service (DDoS) attacks.

Each record of the CSV files is a traffic flow in the network.

What is a traffic flow? It is the sequence of IP packets passing an observation point during a certain time interval. It is not as comprehensive as packet data but is more than sufficient for keeping track of statistical characteristics and identifying anomalies that may be present in the network because of malicious traffic. Such statistical data can be analysed further to verify that the network is being used properly by the people of its organisation and that no misconduct takes place. The dataset was captured on the victim network through a mirrored port on the primary switch, which saved all the traffic to a PCAP file. The CICFlowMeter tool was then used to create bidirectional flows and to calculate their features from the PCAP files. The features of the flow data include the source IP address, source port, destination port and many more.

The attack network and the victim network used for creating the dataset had all the required equipment, such as routers, firewalls, switches, hubs and computers with different operating systems like Windows, macOS and Ubuntu.

The dataset contains “Infinity” and “NaN” in some places. We replaced “Infinity” by 10^10 and “NaN” by 0. Some categories of attacks have very little representation, making the data on some of the days highly imbalanced.
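The replacement of these special values can be sketched with the Python standard library alone. This is an illustration, not the pandas code of our program: the helper `clean_value` and the two-row CSV text are hypothetical, while the column names are taken from the feature list in the Appendix.

```python
import csv
import io
import math

INFINITY_REPLACEMENT = 1e10  # "Infinity" is replaced by 10^10
NAN_REPLACEMENT = 0.0        # "NaN" is replaced by 0


def clean_value(text):
    """Parse one CSV cell and replace Infinity and NaN by finite numbers."""
    value = float(text)  # float() understands "Infinity" and "NaN"
    if value == math.inf:
        return INFINITY_REPLACEMENT
    if math.isnan(value):
        return NAN_REPLACEMENT
    return value


# Hypothetical excerpt with two of the dataset's numeric columns.
sample_csv = "Flow Bytes/s,Flow Packets/s\n1200.5,3.0\nInfinity,NaN\n"
rows = [
    {name: clean_value(cell) for name, cell in record.items()}
    for record in csv.DictReader(io.StringIO(sample_csv))
]
```

The cleaning has to happen before min-max scaling, since a single infinite value would otherwise make the range of its column infinite.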

3.4.2 Approach followed

The features are scaled using the min-max scaler from the scikit-learn library to bring all the features to a common range. The number of features could have been reduced using principal component analysis: the first 20 principal components would have been enough to cover more than 99% of the variance in the data.

We have used an autoencoder with a single hidden layer containing 10 neurons for detecting anomalies in the data.

The activation function used in the model is ‘ReLU’, which stands for Rectified Linear Unit, a piecewise linear function that outputs its input if the input is positive and zero otherwise.

The model is then compiled with Root Mean Square Error as the loss function and the Adam optimizer.
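To make the two definitions concrete, the sketch below computes the reconstruction error of one sample with a tiny autoencoder written in plain Python. It is not the Keras model used in our experiments: the helper names, the toy weights (two inputs, one hidden neuron) and the use of ReLU on the output layer are assumptions made only for illustration.

```python
import math


def relu(values):
    """ReLU: keep positive inputs, replace the rest by zero."""
    return [max(0.0, value) for value in values]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j][i] connects input i to output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + bias
            for row, bias in zip(weights, biases)]


def rmse(original, reconstructed):
    """Root mean square error between a sample and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def reconstruction_error(sample, encoder, decoder):
    hidden = relu(dense(sample, *encoder))
    output = relu(dense(hidden, *decoder))  # assumption: ReLU on the output too
    return rmse(sample, output)


# Hypothetical weights: (weight matrix, bias vector) for each layer.
toy_encoder = ([[1.0, 1.0]], [0.0])
toy_decoder = ([[0.5], [0.5]], [0.0, 0.0])
error = reconstruction_error([0.2, 0.4], toy_encoder, toy_decoder)
```

Here the sample [0.2, 0.4] is reconstructed as approximately [0.3, 0.3], giving an RMSE of about 0.1; a sample is flagged as anomalous when this error exceeds the chosen threshold.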

Adam [25], short for Adaptive Moment Estimation, is an efficient method for gradient-based optimization that works well even with a large number of parameters or a large amount of data. It combines the ‘gradient descent with momentum’ algorithm and the RMSProp algorithm.
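How the two ingredients combine can be seen in a single update step. The sketch below follows the commonly stated form of the Adam update with its usual default constants; the function name `adam_step` and the toy problem of minimising x squared are hypothetical, and in our program the optimizer provided by Keras is used rather than hand-written code.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step counter."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # momentum: average of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g  # RMSProp: average of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy problem: minimise f(x) = x**2, whose gradient is 2 * x.
x, m, v = [1.0], [0.0], [0.0]
first_x, _, _ = adam_step(x, [2 * x[0]], m, v, t=1, lr=0.1)
for t in range(1, 101):
    x, m, v = adam_step(x, [2 * x[0]], m, v, t, lr=0.1)
```

Because of the bias correction, the very first step has a size of almost exactly the learning rate regardless of the magnitude of the gradient, which is one reason the method behaves well across parameters of very different scales.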

We use two for loops for training. A single iteration of the outer loop corresponds to one global epoch. A single iteration of the inner loop trains the model for one epoch on a single source of data, compares the result with the previous result, updates the trust value of that source in a dictionary which contains the trust values of all the sources, and sets the stored previous result to the new one.
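The structure of the two loops can be sketched as follows. The function `federated_training` and its callback parameters `train_one_epoch` and `evaluate` are hypothetical names, and the toy problem (a single weight, one source pulling it towards 1 and another towards -1) is a stand-in for the autoencoder and the CICIDS2017 sources; the sketch shows the control flow, not our actual program.

```python
def federated_training(model, sources, train_one_epoch, evaluate, global_epochs,
                       default_trust=1.0, up=1.1, down=0.9,
                       lower=0.1, upper=5.0):
    """Cycle through the sources, scaling every update by the trust of its source."""
    trust = {name: default_trust for name in sources}
    previous_result = evaluate(model)
    for _ in range(global_epochs):              # outer loop: one global epoch
        for name, data in sources.items():      # inner loop: one source at a time
            update = train_one_epoch(model, data)
            model = [w + trust[name] * u for w, u in zip(model, update)]
            result = evaluate(model)
            factor = up if result > previous_result else down
            trust[name] = min(upper, max(lower, trust[name] * factor))
            previous_result = result
    return model, trust


# Toy stand-ins: each source pulls the single weight towards its own target.
toy_sources = {"clean": 1.0, "mislabelled": -1.0}


def toy_train(model, target):
    return [0.1 * (target - model[0])]


def toy_evaluate(model):
    return -abs(model[0] - 1.0)  # higher is better, like accuracy


final_model, final_trust = federated_training(
    [0.0], toy_sources, toy_train, toy_evaluate, global_epochs=25)
```

In this toy run every update of the clean source improves the score and every update of the mislabelled source worsens it, so after 25 global epochs the two trust values sit at the upper and the lower bound respectively.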

CHAPTER 4 RESULTS AND DISCUSSION

The following results were generated using four sources. The first 2 lakh (200,000) samples from the Wednesday data (which contains DoS attacks) were used as server data for measuring accuracy. The next 2 lakh samples from the Wednesday data were used as the first source and the 2 lakh samples after those as the third source. The first 2 lakh records from the Monday data (benign) were used as the second source, the next 2 lakh samples as the fourth source, and the remaining 129918 samples for setting the threshold. The 90th percentile of the reconstruction errors on this threshold-setting data was used as the threshold.

In source 1, 113515/200000 samples are malicious. In source 3, 4404/200000

samples are malicious. In server data, 1229971/200000 samples are malicious.

Geometric increment and decrement were used with the multiplication factors

1.1 and 0.9 respectively.

Fig 4.1: Variation of trust of the four sources against epoch

The upper bound for trust values was set to 5 and the lower bound to 0.1. Figure 4.1 shows the result graphically. The trust value of source 2 reaches the upper bound of 5 and stays constant after that. The trust value of source 4 alternates above and below 1. The trust value of source 1 decreases almost every time, as the accuracy is very low due to the large number of malicious samples. The trust value of source 3 first increases up to a certain epoch and then keeps decreasing.
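As a rough guide for reading Figure 4.1, the number of consecutive favourable or unfavourable evaluations needed to hit each bound can be worked out directly. The sketch assumes a starting trust of 1.0, which is not stated explicitly above; the function name `updates_to_reach` is hypothetical.

```python
def updates_to_reach(bound, factor, start=1.0):
    """Count consecutive multiplications by factor needed to reach or pass bound."""
    trust, count = start, 0
    if factor > 1:
        while trust < bound:
            trust *= factor
            count += 1
    else:
        while trust > bound:
            trust *= factor
            count += 1
    return count


steps_to_upper = updates_to_reach(5.0, 1.1)  # uninterrupted increments
steps_to_lower = updates_to_reach(0.1, 0.9)  # uninterrupted decrements
```

Under this assumption, 17 uninterrupted increments take the trust from 1 to the upper bound and 22 uninterrupted decrements take it to the lower bound. A source whose updates alternately help and hurt drifts slowly downwards, since 1.1 × 0.9 = 0.99.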

CHAPTER 5 CONCLUSION AND FUTURE WORK

5.1 CONCLUSION

Our focus in this thesis was to study the impact of assigning variable learning

rates to different sources involved in training. Due to a significant overlap of

patterns of malicious and benign flows, the model described here is not an

ideal choice to detect cyber-attacks.

However, our idea of varying the trust parameter depending on the effect each epoch has on the model appears to be successful to a certain extent. When the model is trained with a mix of benign and attack data, the overlap between the training data and attack flows indeed increases, making it tougher to identify the flows that account for attacks.

5.2 FUTURE WORK

We have explored only the effect of varying the learning rate based on the trust parameter. In future, the effect of varying other parameters, such as the number of epochs on each dataset and the batch size, can also be explored.

Different types of datasets can be experimented with. In the case of cyber-attacks, attackers keep changing their exploits to counter various security measures. However, in cases where the anomalies occur without such deliberate attempts and are similar in pattern to other anomalies, this method may give better results. Hence, this approach should be attempted on such datasets.

While carrying out this analysis, we assumed a perfect flow of information, which does not happen in real networks. If a network simulator were used for the experimentation instead of the simple program that we have used, the challenges of a real network would be highlighted. That would lead to further research on issues of actual networks, such as encrypting the individual weight updates to prevent analysis that could reveal information about the data present at the sources.

REFERENCES

[1] Stahl R (2017) Technology reliant society, has it gone too far? (https://thesnapper.millersville.edu/index.php/2017/04/19/technology-reliant-society-opinion)

[2] Stobierski T (2019) The advantages of data-driven decision-making (https://online.hbs.edu/blog/post/data-driven-decision-making)

[3] Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, Charles Z, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. 2019 Dec 10.

[4] Dongare AD, Kharde RR, Kachare AD. Introduction to artificial neural network. International Journal of Engineering and Innovative Technology (IJEIT). 2012 Jul;2(1):189-94.

[5] Fine TL. Feedforward neural network methodology. Springer Science & Business Media; 2006 Apr 6.

[6] Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J, Chen T. Recent advances in convolutional neural networks. Pattern Recognition. 2018 May 1;77:354-77.

[7] Medsker L, Jain LC, editors. Recurrent neural networks: design and applications. CRC press; 1999 Dec 20.

[8] Bank D, Koenigstein N, Giryes R. Autoencoders. arXiv preprint arXiv:2003.05991. 2020 Mar 12.

[9] Chow JK, Su Z, Wu J, Tan PS, Mao X, Wang YH. Anomaly detection of defects on concrete structures with the convolutional autoencoder. Advanced Engineering Informatics. 2020 Aug 1;45:101105.

[10] An J, Cho S. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE. 2015 Dec 27;2(1):1-8.

[11] Gondara L. Medical image denoising using convolutional denoising autoencoders. In 2016 IEEE 16th international conference on data mining workshops (ICDMW) 2016 Dec 12 (pp. 241-246). IEEE.

[12] Llanasas R (2020) How AI and machine learning are transforming mobile technology (https://www.greenbook.org/mr/market-research-technology/how-ai-is-transforming-mobile-technology/)

[13] Akbari Roumani M, Fung CC, Rai S, Xie H. Value analysis of cyber security based on attack types. ITMSOC: Transactions on Innovation and Business Engineering. 2016;1:34-9.

[14] Chen Z, Lv N, Liu P, Fang Y, Chen K, Pan W. Intrusion Detection for Wireless Edge Networks Based on Federated Learning. IEEE Access. 2020 Dec 1;8:217463-72.

[15] Hindy H, Atkinson R, Tachtatzis C, Colin JN, Bayne E, Bellekens X. Utilising deep learning techniques for effective zero-day attack detection. Electronics. 2020 Oct;9(10):1684.

[16] Yulianto A, Sukarno P, Suwastika NA. Improving Adaboost-based intrusion detection system (IDS) performance on CIC IDS 2017 dataset. In Journal of Physics: Conference Series 2019 Mar 1 (Vol. 1192, No. 1, p. 012018). IOP Publishing.

[17] Mathur NO. (2020) Application of Autoencoder Ensembles in Anomaly and Intrusion Detection using Time-Based Analysis (Masters dissertation, University of Cincinnati).

[18] Singh Panwar S, Raiwani YP, Panwar LS. Evaluation of network intrusion detection with features selection and machine learning algorithms on CICIDS-2017 dataset. In International Conference on Advances in Engineering Science Management & Technology (ICAESMT)-2019, Uttaranchal University, Dehradun, India 2019 Mar 15.

[19] Farukee MB, Shabit MZ, Haque MR, Sattar AS. DDoS Attack Detection in IoT Networks Using Deep Learning Models Combined with Random Forest as Feature Selector. In International Conference on Advances in Cyber Security 2020 Dec 8 (pp. 118-134). Springer, Singapore.

[20] Ketkar N. Introduction to keras. In Deep learning with Python 2017 (pp. 97-111). Apress, Berkeley, CA.

[21] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M. Tensorflow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) 2016 (pp. 265-283).

[22] McKinney W. pandas: a foundational Python library for data analysis and statistics. Python for high performance and scientific computing. 2011 Nov 18;14(9):1-9.

[23] Tosi S. Matplotlib for Python developers. Packt Publishing Ltd; 2009 Nov 9.

[24] Yulianto A, Sukarno P, Suwastika NA. Improving adaboost-based intrusion detection system (IDS) performance on CIC IDS 2017 dataset. In Journal of Physics: Conference Series 2019 Mar 1 (Vol. 1192, No. 1, p. 012018). IOP Publishing.

[25] Bock S, Goppold J, Weiß M. An improvement of the convergence proof of the ADAM-Optimizer. arXiv preprint arXiv:1804.10587. 2018 Apr 27.

APPENDIX

Description of Features in the Dataset

Sr. No | Feature Name | Description
1 | Destination Port | Destination Port
2 | Flow Duration | Duration of the flow in Microseconds
3 | Total Fwd Packets | Total count of packets in the forward direction
4 | Total Bwd Packets | Total count of packets in the backward direction
5 | Total Length of Fwd Packets | Total size of packets in the forward direction
6 | Total Length of Bwd Packets | Total size of packets in the backward direction
7 | Fwd Packet Length Min | Minimum size of packets in the forward direction
8 | Fwd Packet Length Max | Maximum size of packets in the forward direction
9 | Fwd Packet Length Mean | Mean size of packets in the forward direction
10 | Fwd Packet Length Std | Standard deviation of packet sizes in the forward direction
11 | Bwd Packet Length Min | Minimum size of packets in the backward direction
12 | Bwd Packet Length Max | Maximum size of packets in the backward direction
13 | Bwd Packet Length Mean | Mean size of packets in the backward direction
14 | Bwd Packet Length Std | Standard deviation of packet sizes in the backward direction
15 | Flow Bytes/s | Number of flow bytes per second
16 | Flow Packets/s | Number of flow packets per second
17 | Flow IAT Mean | Mean time between two packets sent in the flow
18 | Flow IAT Std | Standard deviation of time between two packets sent in the flow
19 | Flow IAT Max | Maximum time between two packets sent in the flow
20 | Flow IAT Min | Minimum time between two packets sent in the flow
21 | Fwd IAT Min | Minimum time between two packets sent in the forward direction
22 | Fwd IAT Max | Maximum time between two packets sent in the forward direction
23 | Fwd IAT Mean | Mean time between two packets sent in the forward direction
24 | Fwd IAT Std | Standard deviation of time between two packets sent in the forward direction
25 | Fwd IAT Total | Total time between two packets sent in the forward direction
26 | Bwd IAT Min | Minimum time between two packets sent in the backward direction
27 | Bwd IAT Max | Maximum time between two packets sent in the backward direction
28 | Bwd IAT Mean | Mean time between two packets sent in the backward direction
29 | Bwd IAT Std | Standard deviation of the time between two packets sent in the backward direction
30 | Bwd IAT Total | Total time between two packets sent in the backward direction
31 | Fwd PSH Flags | Number of times the PSH flag was set in packets travelling forward
32 | Bwd PSH Flags | Number of times the PSH flag was set in packets travelling backwards
33 | Fwd URG Flags | Number of times the URG flag was set in packets travelling forward
34 | Bwd URG Flags | Number of times the URG flag was set in packets travelling backwards
35 | Fwd Header Length | Total bytes used for headers in the forward direction
36 | Bwd Header Length | Total bytes used for headers in the backward direction
37 | Fwd Packets/s | Number of forward packets per second
38 | Bwd Packets/s | Number of backward packets per second
39 | Min Packet Length | Minimum length of a packet
40 | Max Packet Length | Maximum length of a packet
41 | Packet Length Mean | Mean length of a packet
42 | Packet Length Std | Standard deviation of packet length
43 | Packet Length Variance | Variance of packet length
44 | FIN Flag Count | Number of packets with FIN Flag
45 | SYN Flag Count | Number of packets with SYN Flag
46 | RST Flag Count | Number of packets with RST Flag
47 | PSH Flag Count | Number of packets with PUSH Flag
48 | ACK Flag Count | Number of packets with ACK Flag
49 | URG Flag Count | Number of packets with URG Flag
50 | CWE Flag Count | Number of packets with CWE
51 | ECE Flag Count | Number of packets with ECE
52 | Down/Up Ratio | Download and upload ratio
53 | Average Packet Size | Average size of packets
54 | Avg Fwd Segment Size | Average size observed in the forward direction
55 | Avg Bwd Segment Size | Average size observed in the backward direction
56 | Fwd Avg Bytes/Bulk | Average bytes bulk rate in the forward direction
57 | Fwd Avg Packets/Bulk | Average packet bulk rate in the forward direction
58 | Fwd Avg Bulk Rate | Average bulk rate in the forward direction
59 | Bwd Avg Bytes/Bulk | Average bytes bulk rate in the backward direction
60 | Bwd Avg Packets/Bulk | Average packets bulk rate in the backward direction
61 | Bwd Avg Bulk Rate | Average bulk rate in the backward direction
62 | Subflow Fwd Packets | Average number of packets in a sub flow in the forward direction
63 | Subflow Fwd Bytes | Average bytes in a sub flow in the forward direction
64 | Subflow Bwd Packets | Average number of packets in a sub flow in the backward direction
65 | Subflow Bwd Bytes | Average bytes in a sub flow in the backward direction
66 | Init Win bytes fwd | Number of bytes sent in the initial window in the forward direction
67 | Init Win bytes bwd | Number of bytes sent in the initial window in the backward direction
68 | act data pkt fwd | Count of packets with at least 1 byte of TCP data payload in the forward direction
69 | Min seg size fwd | Minimum segment size observed in the forward direction
70 | Active Mean | Mean time a flow was active before becoming idle
71 | Active Std | Standard Deviation of the time a flow was active before becoming idle
72 | Active Max | Maximum time a flow was active before becoming idle
73 | Active Min | Minimum time a flow was active before becoming idle
74 | Idle Min | Minimum time a flow was idle before becoming active
75 | Idle Mean | Mean time a flow was idle before becoming active
76 | Idle Max | Maximum time a flow was idle before becoming active
77 | Idle Std | Standard deviation of the time a flow was idle before becoming active
78 | Label | The target variable, ‘Benign’ or a specific ‘Attack category’
