Bachelor Thesis Project Semester 7

A FEDERATED LEARNING APPROACH TO DETECT AND AVOID
SOURCES OF MISCLASSIFIED CYBER ATTACK DATA
A REPORT
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE AWARD OF THE DEGREE
OF
BACHELOR OF ENGINEERING
IN
DIVISION OF INFORMATION TECHNOLOGY
Submitted by:
RISHABH SETIYA 2018UIT2597
SRIRAM M. PANT 2018UIT2623
AYUSH GOEL 2018UIT2582
Under the supervision of

DR DEEPAK KUMAR SHARMA
DIVISION OF INFORMATION TECHNOLOGY

NETAJI SUBHAS INSTITUTE OF TECHNOLOGY
UNIVERSITY OF DELHI
DECEMBER, 2021
DECLARATION
Division of Information Technology
University of Delhi
Delhi-110007, India
We, (Rishabh Setiya Roll No. 2018UIT2597, Sriram M. Pant Roll No. 2018UIT2623 and
Ayush Goel Roll No. 2018UIT2582) students of B. E., Division of Information
Technology, hereby declare that the Project-Thesis titled “A Federated Learning
Approach to Detect and Avoid Sources of Misclassified Cyber Attack Data” which is
submitted by us to the Division of Information Technology, Netaji Subhas Institute of
Technology, Delhi (University of Delhi) in partial fulfillment of the requirement for the
award of the degree of Bachelor of Engineering, is original and not copied from any
source without proper citation. This work has not previously formed the basis for the
award of any degree.
Rishabh Setiya Sriram M. Pant Ayush Goel

2018UIT2597 2018UIT2623 2018UIT2582
Place: Delhi
Date: 20 December 2021
CERTIFICATE
Division of Information Technology
University of Delhi
Delhi-110007, India
This is to certify that the work embodied in the Project-Thesis titled “A Federated
Learning Approach to Detect and Avoid Sources of Misclassified Cyber Attack Data” has
been completed by Rishabh Setiya Roll No. 2018UIT2597, Sriram M. Pant Roll No.
2018UIT2623 and Ayush Goel Roll No. 2018UIT2582 of B.E., Department of
Information Technology, under my guidance and supervision towards fulfillment of the
requirements for the award of the degree of Bachelor of Engineering. This work has not
been submitted for any other diploma or degree of any university.
Dr Deepak Kumar Sharma

SUPERVISOR
Place: Delhi
Date: 20 December 2021
ABSTRACT
Federated learning is a form of machine learning in which the training data stays at the
source while the model is trained on data arising from a number of sources. The model
is communicated between a central server and the data source nodes, which train it
independently. The resulting models are averaged, weighted by the amount of data
available at each source.
In such a scenario, it is possible that some of the data sources provide mislabelled data
which could be due to a deliberate attempt to tamper with the learning process or inability
to correctly label the data. Our attempt is to identify such sources and gradually reduce
the priority that we give them by varying the learning rate.
We assign a trust parameter to each of the data sources involved in training. Then we run a
single epoch of training on each of the sources in a loop. A test dataset is available at the
central server. It is used for checking whether the previous epoch had a constructive or
destructive effect on the model, that is, whether the accuracy of the model on the test
dataset improved or degraded. If the effect was constructive, the trust parameter of that
source is increased, and if it was destructive, the trust parameter is decreased.
The expectation from this experiment is that the trust parameter of malicious sources
would eventually drop to nil and the trust parameter of trustworthy sources would keep
increasing.
The dataset used in this project is an intrusion detection dataset which contains labelled
data corresponding to “Benign” and “Attack” classes. The model that we use for
classification is an autoencoder which classifies a data point as an anomaly if the
reconstruction error is higher than a threshold.
LIST OF CONTENTS
DECLARATION
CERTIFICATE
ABSTRACT
LIST OF CONTENTS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION AND BACKGROUND
1.1 STATEMENT OF OBJECTIVE
1.2 NEURAL NETWORKS
1.2.1 FEED-FORWARD NEURAL NETWORKS
1.2.2 CONVOLUTIONAL NEURAL NETWORKS
1.2.3 RECURRENT NEURAL NETWORKS
1.3 AUTOENCODERS
1.4 FEDERATED LEARNING
1.4.1 LIFE OF A FEDERATED LEARNING MODEL
1.5 CYBER ATTACKS
1.5.1 BRUTE FORCE
1.5.2 DENIAL OF SERVICE
1.5.3 CROSS SITE SCRIPTING
1.5.4 SQL INJECTION
1.5.5 BOTNET
1.5.6 PORT SCAN
CHAPTER 2 RELATED WORK
CHAPTER 3 METHODOLOGY
3.1 LIBRARIES USED
3.1.1 TENSORFLOW
3.1.2 PANDAS
3.1.3 MATPLOTLIB
3.1.4 SCIKIT-LEARN
3.2 PROPOSED ALGORITHM
3.2.1 CLOUD ALGORITHM
3.2.2 EDGE ALGORITHM
3.2.3 SETTING THRESHOLD
3.3 MATHEMATICAL JUSTIFICATION
3.4 SIMULATION
3.4.1 ABOUT CIC-IDS-2017 DATASET
3.4.2 APPROACH FOLLOWED
CHAPTER 4 RESULTS AND DISCUSSION
CHAPTER 5 CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
5.2 FUTURE WORK
REFERENCES
PLAGIARISM REPORT
APPENDIX
LIST OF FIGURES
Figure Caption
1.1 Architecture of a feed-forward deep neural network
1.2 Convolutional layer having filter size 3x3 and input size 5x5
1.3 Pooling layer of size 2x2 with stride 2
1.4 Convolutional Neural Network Architecture
1.5 Recurrent Neural Network Architecture
1.6 Architecture of an autoencoder with three hidden layers
1.7 Basic form of CAE architecture
1.8 Variational autoencoder architecture
1.9 Denoising autoencoder architecture
1.10 Visualization of steps of federated learning
1.11 Life of a Federated Learning Model
1.12 Representation of a Denial-of-Service Attack
1.13 Representation of a Distributed Denial-of-Service Attack
1.14 Representation of Cross Site Scripting attack
3.1 RMSE distribution for Friday Morning Working Hours after training on benign data (Monday)
3.2 Initial view of mathematical description presented
3.3 The model is guided by ‘good’ force
3.4 The model realizes its ‘mistake’
4.1 Variation of trust of the four sources against epoch
CHAPTER 1 INTRODUCTION
Designing safe networks and systems for everyday use is a struggle as the
world becomes increasingly reliant on computers and automation [1]. With the
rise of online marketplaces and services, the number of security threats to
businesses has grown tremendously. The majority of company decisions are now
based on data [2]. With more data and information available than ever before,
it is more critical than ever to analyse and evaluate it properly. There is
no such thing as an impenetrable network or a foolproof firewall. Attackers
are constantly coming up with new exploits and attack methods to bypass
security measures.
These attacks may be intended to gather confidential data, to encrypt the
data on a machine, or simply to hamper the normal functioning of devices.
These attacks can sometimes be identified from the flow level data, which is
information summarized from a number of packets. This flow level data can
be fed to a machine learning model to identify the attacks.
However, there are concerns of privacy, as this flow level data may be
confidential to the organization to which the computer belongs. This issue can
be tackled by federated learning [3]. Federated learning is an emerging
paradigm in which the training data does not move from one machine to
another but the updates to model weights are sent to the centralized server.
Nevertheless, there may be machines on which the data does not get correctly
classified and attacks remain undetected. Some malicious entrants in the
network may deliberately tamper with the data so that the model fails to gain,
or loses, its ability to identify attacks.
The aim of this project is to identify such malicious actors and reduce their
ability to damage the model. If these actors turn helpful later on, their
ability to contribute to training should be restored so that their data also
helps to improve the model.
Hence, we approach the problem with a rewarding, penalizing and forgiving
nature.
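The rewarding, penalizing and forgiving behaviour can be illustrated with a minimal Python sketch. The function name `update_trust`, the step size and the bounds on the trust value are assumptions made purely for illustration; they are not the parameters used in this project.

```python
def update_trust(trust, accuracy_before, accuracy_after, step=0.1):
    """Return the new trust value of a source after one of its training epochs.

    Hypothetical rule: `step` and the [0, 1] bounds are illustrative assumptions.
    """
    if accuracy_after > accuracy_before:
        trust += step      # reward a constructive epoch
    elif accuracy_after < accuracy_before:
        trust -= step      # penalize a destructive epoch
    # Clamp to [0, 1]; a penalized source can regain trust later (forgiving).
    return min(1.0, max(0.0, trust))
```

Under these assumptions, a source whose trust has dropped to zero can climb back up if its later epochs improve the test accuracy.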
1.1 STATEMENT OF OBJECTIVE
Formulation of a machine learning protocol which
• classifies data flows as benign or attack
• can perform even if some data sources are poisoned
• does not require aggregation of data from all sources
• keeps actual data secure and unavailable to any party except the
source itself
1.2 NEURAL NETWORKS
Neural networks [4] are a subset of machine learning inspired by the
functioning of biological neurons. They consist of an input layer, an output
layer and one or more hidden layers.
They are widely used for handwriting recognition, image compression,
forecasting from past trends in areas such as the stock market, decision
making in the granting of loans, and various other applications.
Fig 1.1: Architecture of a feed-forward deep neural network
1.2.1 Feed-Forward Neural Networks
In feed-forward neural networks [5], the inputs enter through the first layer,
propagate in the forward direction and the last layer produces the outputs. Each
neuron sums up the inputs coming into it and outputs the result after applying
an activation function on the sum.
For the neural network to learn, it must modify its weights to produce the
expected results at the output layer. In the beginning, however, the outputs
are essentially random.
The difference between the actual output and the expected output propagates
in the backward direction, and depending on this error the weights of each layer
are modified slightly. This process is repeated until the error is less than a
threshold.
For many cases, a single hidden layer is sufficient to give good results. In this
project, a simple feed-forward neural network with a single hidden layer is
used. Training takes place through backpropagation.
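The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the code used in this project: the sigmoid activation, the bias terms and the function names are assumptions.

```python
import math

def sigmoid(z):
    # Assumed activation function; other activations would work the same way.
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """Each neuron sums its weighted inputs plus a bias, then applies the activation."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass through a network with a single hidden layer."""
    hidden = dense_layer(inputs, hidden_w, hidden_b)
    return dense_layer(hidden, output_w, output_b)
```

Here `weights` holds one list of incoming weights per neuron, so the number of inner lists fixes the width of the layer.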
Fig 1.2: Convolutional layer having filter size 3x3 and input size 5x5
1.2.2 Convolutional Neural Networks
The simple feed-forward neural networks may give a decent result on some
simple binary patterns like handwritten characters but they do not generalize
well to visual inputs like colour photographs and videos.
A convolutional layer [6] consists of a matrix of weights (called filter) which
slides over the input matrix. The element-wise products are summed up to give
a single element of the output matrix.
The number of cells by which the filter shifts to the right in each step, and
then down once it reaches the right edge, is known as the stride.
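The sliding-filter operation can be sketched as follows. The sketch assumes a square input, a square filter and no padding; the function name `conv2d` is chosen for illustration only.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square filter over a square input and sum the element-wise products."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for a in range(k):
                for b in range(k):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output
```

With a 5x5 input and a 3x3 filter, as in Fig 1.2, a stride of 1 yields a 3x3 output and a stride of 2 yields a 2x2 output.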
Another layer used in CNNs is the pooling layer. Each cell of the output of a
pooling layer is the maximum or the average of the values in a small region of
the input.
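A pooling layer can be sketched in the same style. The function name `pool2d` and its parameters are illustrative assumptions; the defaults correspond to a 2x2 pooling region with stride 2.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Replace each size x size region of the input by its maximum or average."""
    if mode == "max":
        combine = max
    else:
        combine = lambda values: sum(values) / len(values)
    out_rows = (len(matrix) - size) // stride + 1
    out_cols = (len(matrix[0]) - size) // stride + 1
    return [[combine([matrix[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)])
             for j in range(out_cols)]
            for i in range(out_rows)]
```

For a 4x4 input, both modes produce a 2x2 output, halving each dimension.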
Fig 1.3: Pooling layer of size 2x2 with stride 2
The last few layers of CNNs are fully connected. This means that the 2D matrix
is converted to a 1D array and the rest of the operations are similar to simple
feed-forward neural network.
Fig 1.4: Convolutional Neural Network Architecture

The idea proposed in this project can be extended and tested on datasets of
images also by using a CNN instead of a simple feed-forward architecture.
1.2.3 Recurrent Neural Networks
Fig 1.5: Recurrent Neural Network Architecture
RNNs [7] are used for sequential or time-series data. The output of the previous
step is also fed as input to every neuron in the recurrent layer. The idea
proposed in this project may be extended to RNNs too by using an LSTM
autoencoder.
1.3 AUTOENCODERS
Auto-associative memory is a class of neural networks. The output layer
consists of as many neurons as the input layer and the model is trained to
replicate the identity function.
This means that the input features and the expected outputs are the same. The
purpose of training such a model is to check whether a particular test pattern
is the same as, or similar to, the patterns on which the model was trained.
Autoencoders [8] are a type of auto associative memory which consist of an
encoder part followed by a decoder part. The difference between the original
pattern and output of the network is used to calculate the error known as
reconstruction error.
A low value indicates that the pattern is similar to training patterns while a
high value tells us that it is different from those patterns.
An appropriate value of threshold needs to be set to distinguish the normal
data from abnormalities or anomalies.
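The thresholding step can be sketched as follows, assuming the reconstruction has already been produced by a trained autoencoder. Root mean square error is used here as the reconstruction error; the function names and the threshold value are illustrative assumptions.

```python
import math

def rmse(original, reconstructed):
    """Root mean square error between an input pattern and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))

def is_anomaly(original, reconstructed, threshold):
    """Flag the pattern as an anomaly when the reconstruction error exceeds the threshold."""
    return rmse(original, reconstructed) > threshold
```

A pattern that the model reconstructs almost exactly falls below the threshold and is treated as normal, while a poorly reconstructed pattern is flagged.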
Fig 1.6: Architecture of an autoencoder with three hidden layers
There are three components in an autoencoder:
• Encoder: The encoder reduces the input dimensions and
converts the data into an encoded form.
• Bottleneck: This layer holds the compressed form of the input
data, in which the input is represented in the lowest number
of dimensions.
• Decoder: The decoder reconstructs the data from the encoded form to
be as similar to the actual input as it can.
Some standard autoencoders are described briefly below:
1. Convolutional Autoencoder (CAEs): It encodes or breaks down the input
into a set of basic signals and then attempts to rebuild the input from all of
those signals. We can also utilize CAE [9] to alter the image geometry or to
produce reflectance. Encoder layers are convolution and pooling layers in
this type of autoencoder, whereas decoder consists of transpose convolution
(deconvolution is a misnomer for this) and unpooling layers.
Fig 1.7: Basic form of CAE architecture
2. Variational Autoencoders (VAEs): New images can be produced using
this type of autoencoder, much as with Generative Adversarial Networks
(GANs). Variational autoencoders [10] make strong assumptions
about the distribution of the latent variables. For latent
representation learning, they adopt a variational technique that results in an
additional loss component and a Stochastic Gradient Variational Bayes estimator
for the training process. The probability distribution of the latent vector of
a variational autoencoder often matches the training data much better than that
of a normal autoencoder. VAEs are appropriate for artwork production of any
kind since their generative nature is significantly more adaptable and
configurable than that of GANs. Fig 1.8 shows the basic architecture of a VAE.
Fig 1.8: Variational autoencoder architecture
3. Denoising Autoencoders: In this type of autoencoder, noise is
added to the source images [11] and the model learns to remove it.
As a result, rather than copying the input to the output, the model learns
features of the data. During training, these autoencoders receive a slightly
degraded input and must recover the original unaltered input. To cancel out the
added noise, the model learns a vector field that maps the input data towards
a lower-dimensional manifold that reflects the natural data. In this way, by
identifying the most relevant features, the encoder learns a more robust
representation of the data.
Fig 1.9: Denoising autoencoder architecture
1.4 FEDERATED LEARNING
Federated learning [3] is a distributed approach towards machine learning. In
the traditional approach, data from all the sources is aggregated on a single
server and then the model is trained. In federated learning, the current model
at any time is sent to all the sources of data.
The training happens at the source itself. The data does not leave the sources
at any time.
Only the weight updates are sent back to the aggregation server. Techniques such
as homomorphic encryption can be used to ensure that the updates cannot be used
to infer statistics about the edge device data. The aggregation server updates
the weights.
The update from each source is weighted by the amount of data that is used
for training by that source.
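The data-weighted aggregation can be sketched as follows. This is a minimal illustration in which each source's weights are represented as a flat list of numbers; the function name `federated_average` is an assumption.

```python
def federated_average(client_weights, client_sizes):
    """Average the clients' weight vectors, weighting each by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(n_params)]
```

For example, a source that trained on three times as much data as another contributes three times as much to every averaged weight.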
While theoretically viable, this architecture would not have been practical in
the past because the computational capacity of mobile phones was too
limited to run a machine learning model. However, this changed in early
2017 [12].
Billions of smartphones with integrated AI chips for compute-heavy tasks were
introduced, starting with Huawei’s Mate, Google’s Pixel, and
Apple’s iPhones, all containing advanced hardware designed to run machine
learning-based tasks efficiently. The share of such smartphones increased
by 32% from 2017 to 2020 [12].
This method of learning poses another big challenge. The model may contain
a large number of weights. Downloading and uploading all the weight changes
for each global epoch requires a high bandwidth.
Often there is a large gap between the download and upload speeds. The
download speed could be as high as 100 Mbps while the upload speed may be
less than 10 Mbps.
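The effect of this gap can be estimated with simple arithmetic. The model size of one million weights and the 4 bytes per weight used below are assumptions chosen for illustration; only the two link speeds come from the text.

```python
def transfer_seconds(num_weights, bytes_per_weight, link_mbps):
    """Time in seconds to move all model weights over a link of the given speed."""
    bits = num_weights * bytes_per_weight * 8
    return bits / (link_mbps * 1_000_000)

# Assumed model: one million weights stored as 4-byte floats (32 million bits).
download = transfer_seconds(1_000_000, 4, 100)  # 100 Mbps download link
upload = transfer_seconds(1_000_000, 4, 10)     # 10 Mbps upload link
```

Under these assumptions the upload takes ten times as long as the download, and the cost is paid again in every global epoch.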
Uploading of the entire model from the sources to the server is often the
bottleneck step in federated learning. However, this approach is necessary for
preservation of privacy. The owners of data may not agree to share their data
for training the model.
Federated Learning permits speedier testing and deployment of intelligent
models as well as decreased latency and power consumption while maintaining
data privacy. In addition to delivering an update to the shared model, the
enhanced local model can be used right away.
Fig 1.10: Visualization of steps of federated learning
Fig 1.11 Life of a Federated Learning Model
1.4.1 Life of a Federated Learning Model

A model architect developing a model for a specific domain often leads the
Federated Learning process. A natural language processing domain specialist,
for example, might create a model for predicting the next word on a virtual
keyboard.
A typical workflow [3], at a high level, is as follows:
1. Identification of Problem to be solved with Federated Learning: The
model architect recognizes a problem that can be solved using Federated
Learning.
2. Instrumentation of the client: If necessary, the clients (for
example, a mobile app) are programmed to retain the necessary training
data locally (with time and quantity constraints).
In many circumstances, the app will already have saved this information
(a messaging app, for example, must save messages, and a photo
management app already stores images). Extra data or metadata,
such as user interaction data to provide labels for a supervised
learning task, may be necessary in some cases.
3. Prototyping through simulation: Using a suitable dataset, the model
engineer can test learning hyperparameters and prototype model
architectures in a Federated Learning simulation.
4. Training of Federated Model: To train distinct variations of the model
or employ distinct optimization hyperparameters, multiple federated
training tasks are initiated.
5. Model Evaluation: After the tasks have trained sufficiently, the
models are analysed and suitable candidates are chosen. The analysis may
include metrics computed on datasets in the data centre, or federated
evaluation, in which the models are distributed to held-out clients for
assessment on their local data.
6. Model Deployment: Eventually, once a good model has been chosen, it follows
a normal model launch procedure, which includes manual quality
inspection, live A/B testing (where the updated model is used on some
devices while the previous-generation model is used on others, to compare
their in-vivo performance), and a staged rollout (so that
bad behaviour may be identified and corrected before it affects too many
users). The owner of the application determines the model's specific
launch method, which is usually unrelated to how the model was trained.
In other words, this phase applies equally to a model built
using federated learning and to a model developed using a typical data centre
technique.
1.5 CYBER ATTACKS

Cyber-attacks [13] may be carried out with different intentions: to
gain access to the information stored on a computer, to intercept the data
flowing to and from the system, or simply to stop the machine from functioning.
The abilities of attackers are growing along with the progress in the hardware
capabilities of the latest high-end machines.
1.5.1 Brute Force Attack

In this type of attack, hackers try to guess the credentials like passwords by
trying a large number of combinations to gain access to the user accounts.
Smaller and simpler passwords can be guessed in a few minutes while longer
and complicated passwords involving different punctuation symbols, numbers
and a mix of upper- and lower-case letters could take many years to be guessed.
Brute force attacks are of different kinds depending on which passwords are
attempted. Some attacks use dictionary words while others use logical guesses
such as birthdates or names of family members. A combination of both may
also be used.
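The effect of password length and character variety on an exhaustive search can be illustrated with a short calculation. The guessing rate passed to `worst_case_years` is a hypothetical figure supplied by the caller, not a measured one, and the function names are chosen for illustration.

```python
def search_space(alphabet_size, length):
    """Number of possible passwords of exactly the given length."""
    return alphabet_size ** length

def worst_case_years(alphabet_size, length, guesses_per_second):
    """Years needed to try every candidate at an assumed guessing rate."""
    seconds = search_space(alphabet_size, length) / guesses_per_second
    return seconds / (365 * 24 * 3600)
```

A six-letter lowercase password has 26^6 = 308,915,776 candidates, whereas each additional character drawn from a larger alphabet multiplies the search space by the alphabet size.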
1.5.2 Denial of Service Attack
Servers have a limit on the number of requests they can serve at a time.
If the number of requests grows too large, the server takes a long
time to serve its clients, who may become frustrated, leave the website and
take their business to another company.
Fig 1.12 Representation of a Denial of Service (DoS) attack
In extreme cases, the requests would result in timeout and not get served at
all.
Attackers try to disrupt the functioning of a server by continuously sending a
large number of requests to it.
If the malicious requests are sent from a number of computers, then the attack
is known as a Distributed Denial of Service (DDoS) attack.
Fig 1.13 Representation of a Distributed Denial of Service (DDoS) attack
1.5.3 Cross Site Scripting
Websites which accept user input in the form of blog posts, comments or other
types of forms may be vulnerable to cross site scripting. The attacker can insert
malicious code into such an input area of a trusted website.
This code could be used to change the content that is displayed on the website
or to gain access to users' website credentials, which are stored by browsers
on users’ computers. This allows the attacker to perform any action on the website
on behalf of the real users.
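One common safeguard, shown here only as an illustrative sketch and not as part of this project, is to escape user input before placing it in a page so that the browser displays it as text instead of executing it. The function name `render_comment` is hypothetical; `html.escape` is part of the Python standard library.

```python
import html

def render_comment(comment):
    """Escape HTML special characters so that user input is displayed, not executed."""
    return "<p>" + html.escape(comment) + "</p>"

rendered = render_comment("<script>alert(1)</script>")
```

After escaping, the angle brackets of the injected script tag are replaced by `&lt;` and `&gt;`, so the browser treats the comment as plain text.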
Fig 1.14 Representation of Cross Site Scripting attack
1.5.4 SQL Injection
If a website or any other application is supposed to provide results after
fetching some data from a database based on a user’s query entered in a text
input field, it may be vulnerable to SQL injection attack.
The application uses the user’s input to construct an SQL query which is
executed on the database. If the query can be manipulated by an adversary,
then they may retrieve information which they are not authorized to access. It may
even be possible to delete entire tables from the database using this attack.
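The difference between a vulnerable query and a safe one can be sketched with Python's built-in `sqlite3` module. The table, its contents and the function names are hypothetical and serve only to illustrate the mechanism.

```python
import sqlite3

# Hypothetical in-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

def find_user_unsafe(name):
    # Vulnerable: the input is pasted directly into the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safe: the input is passed as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()

malicious = "x' OR '1'='1"
```

With the malicious input, the unsafe version returns every row of the table because the injected condition is always true, while the parameterized version returns no rows.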
1.5.5 Botnet
The term is a short form for robot-network. The attacker infects a large number
of computers and uses the whole network of such computers to carry out
malicious activities. Botnet is also known as zombie army.
IoT devices like set-top boxes, cameras, voice assistant speakers, smart
watches etc. may also be used as zombies. These botnets could be used for
mining cryptocurrency, launching DDoS attacks or recruiting more zombies
into the network.
As the botnet uses a small fraction of the processing capabilities of a machine,
the effect is not easily observable because the devices appear to function
normally to their users.
1.5.6 Port Scan
Attackers send data to a number of ports on the victim’s computer to identify
which ports accept connections and what responses they send back.
This helps them find out which devices are vulnerable to attacks and which ones
have strong security measures such as firewalls installed.
It tells them which services are running on the device, their versions and
whether the device accepts anonymous logins. This information allows the
adversary to decide which exploits to use.
CHAPTER 2 RELATED WORK
Chen et al. [14] proposed the federated learning-based
Attention Gated Recurrent Unit (FedAGRU), an intrusion detection algorithm
that detects abnormalities in wireless edge networks. As it is based
on federated learning, it sends model updates to the cloud
rather than sharing the user’s personal data, which could create privacy
issues for the user. The algorithm also increases the weights of
devices which are high in priority and avoids uploading the updates
of the remaining devices, using a threshold value to
determine whether a device is important to the cloud server. This
reduces irrelevant updates, thereby reducing the bandwidth
and extra computational cost required for sending those parameters to the
server. Because the updates are asynchronous, the cloud server starts updating
the model parameters as soon as it receives model updates from multiple client
devices, without waiting for the remaining clients whose models are still
processing data. This reduces the time required for
aggregating and updating the parameters of the global model. In addition,
FedAGRU assigns different weights to different clients according to their
modified model parameters, after calculating the new model parameters as
a weighted average of the previous parameters.
The FedAGRU algorithm is based on GRU-SVM, a model that combines a
GRU (Gated Recurrent Unit) with an SVM (Support Vector Machine) instead
of using SGD (Stochastic Gradient Descent), and is relatively faster and uses
fewer parameters in comparison to SGD. They found that FedAGRU is 8%
more accurate than centralized learning algorithms and is
70% less expensive computation-wise than other federated learning
algorithms.
Some researchers have also compared different models on different datasets to
find the most suitable model for their IDS research. Hindy et al. [15]
presented an autoencoder-based neural network model for implementing an
Intrusion Detection System to detect zero-day attacks. A zero-day attack
exploits a flaw or bug in the system which has not yet been discovered by its
developers, and which an attacker uses either to gain
control of the system or to steal data such as user credentials.
These attacks pose a serious challenge for detection approaches
such as signature-based detection, which depend on previously
detected threats to identify the threats currently attacking the
network. Hence, they suffer from many false negatives due to the increasing
number and variety of new threats, which may remain undetected.
The authors aim to resolve this issue by building a better IDS
using an autoencoder model which can detect zero-day attacks a lot faster than
previous machine learning models with a reasonable number of false positives.
To demonstrate the effectiveness of their autoencoder model, they also
built an outlier-detecting One-Class Support Vector Machine (SVM) model for
comparison, and used two different datasets, CICIDS2017 and NSL-KDD.
The two datasets differ: CICIDS2017 is diverse and covers a wide variety of
common cyber-attacks such as Heartbleed, SSH and FTP Brute Force, DDoS
and many more, which makes it suitable for representing the many types of
attack a network faces daily, whereas NSL-KDD covers only
benign traffic, Denial of Service (DoS), probing, Remote to Local (R2L) and
User to Root (U2R). Before the datasets are used for training and validation,
correlated features are dropped to reduce model instability,
and flow-based data are used, as they are better suited to an IDS.
During the scaling and selection of features, only benign data was
used, to eliminate any risk of influence from attack instances.
The models are then trained on both datasets, with 75% of the normal data
used for training and the rest for testing. The results show that
autoencoders are better at detecting zero-day attacks than
one-class SVM, showing an accuracy of 89-99% for the NSL-KDD dataset and
75-98% for the CICIDS2017 dataset.
Yulianto et al. [16] proposed improving the performance of an AdaBoost-based
Intrusion Detection System using a sampling technique called Synthetic
Minority Oversampling Technique (SMOTE) and feature selection techniques such as
Principal Component Analysis (PCA) and Ensemble Feature Selection (EFS),
as previous research was not able to give good enough results with the
AdaBoost classifier due to problems which include class imbalance in the
CICIDS2017 dataset and improper feature selection.
AdaBoost, also known as Adaptive Boosting, is a boosting technique that can
greatly improve classification ability over numerous iterations by reducing
the bias and variance of the model, and is best suited to weak learners such
as decision trees.
The authors' goal is therefore to reduce the imbalance present in the training
dataset using SMOTE, which generates synthetic samples for the minority class
and thus improves the sensitivity of the model towards the minority class, and
to select important features from the new dataset with the help of PCA and
ensemble feature selection, which reduces the dimensionality of the dataset
while limiting the loss of information. The researchers found their
approach to outperform previous research, with an accuracy of 81.83%.
Mathur [17] tried to incorporate the time dimension of the data and investigated
whether the overlap between anomalous and benign data flows in the network can
be eliminated. He also used the CICIDS2017 dataset due to its wide range
of attacks. During pre-processing of the dataset, the time stamps
of all the flow data were modified to include microseconds to make
the data more precise. The author used an autoencoder with ReLU
(Rectified Linear Unit) as the activation function and RMSE (Root Mean Square
Error) as the loss function. He also experimented with different numbers of
neurons in the autoencoder model and found that on increasing the number of
neurons the RMSE value decreased, but it changed in roughly equal proportion
for both benign and malicious data. For time analysis of the network, he used an
ensemble of autoencoders and trained each of them on a particular port
and on a specific substructure of the data. However, the results of the testing
showed that an overlap still existed between the benign
and malicious traffic flows, suggesting that some
overlap between the data will always exist no matter how the data is labelled.
Panwar et al. [18] took different combinations of four feature selection
algorithms with two machine learning algorithms and compared the
performance of all the combinations on the basis of four parameters: time,
accuracy, specificity, and sensitivity. The four feature selection algorithms
are Classifier Subset Evaluator with Decision Tree, Classifier Subset
Evaluator with Naive Bayes, Classifier Subset Evaluator with J48, and
CfsSubset Attribute Evaluator, and the two machine learning
algorithms are REPTree and OneR. For training and evaluation, the same
CICIDS2017 dataset is used. The software tool for data analysis and
investigation was WEKA. They found that feature selection
reduced the dataset size and the time taken while giving high performance. For
Port Scan Attack, Heartbleed Attack/DoS Attack, Brute Force Attack, Botnet Attack,
Web Attack, and DDoS Attack, the REPTree classification algorithm with the
CfsSubset Attribute Evaluator with J48 feature selection technique performs
best, while the REPTree classification algorithm with the Classifier Subset
Evaluator with Naive Bayes feature selection technique performs best for
the infiltration attack.
Farukee et al. [19] proposed a model for DDoS attack detection in IoT
networks using a one-dimensional convolutional neural network (1D CNN) and
a multilayer perceptron (MLP), with random forest (RF) as the feature selector.
Training and evaluation both used the CICIDS2017 dataset.
The main motive of the proposed work was to detect a DDoS attack as early
as possible, before it affects the system significantly. RF-1DCNN
achieved a high accuracy of 99.63%, and the accuracy obtained with RF-MLP
was 99.63%. They also concluded that the feature selection approach
appears to have a considerable impact on accuracy, stability, and
interpretability.
CHAPTER 3 METHODOLOGY
We performed our experimental analysis with the help of a Python program.
The autoencoder was created using Keras [20] and TensorFlow.
3.1 LIBRARIES USED
3.1.1 TensorFlow
TensorFlow is an open-source machine learning platform originally created by
Google and used in many of its services [21]. Development of
TensorFlow Federated, which is meant for research on federated learning using
TensorFlow, is ongoing.
TensorFlow allows the user to focus on development of the model while it
takes care of the finer details behind the scenes. Keras is the high-level API
for TensorFlow.
A sequential model in TensorFlow allows us to create a linear stack of layers
in which each layer has a single input tensor and a single output tensor. We have used
this type of model in this thesis.
3.1.2 Pandas
Pandas is a data manipulation and analysis library for Python. It provides
the Series and DataFrame data structures [22].
A Series is a one-dimensional array which can hold multiple types of data.
A DataFrame is a two-dimensional data structure which resembles an Excel sheet
or an SQL table. Its columns are one-dimensional arrays, and each column
may have a different data type.
This library lets us read a CSV file containing column names
along with the data into a DataFrame.
3.1.3 Matplotlib
Matplotlib is one of the most widely used libraries for plotting graphs in Python [23].
It plots two-dimensional graphs such as a simple plot of two arrays of
numbers with the points connected by lines, scatter plots, bar graphs, step graphs,
histograms, box plots, pie charts and contours. It can also convert a
three-dimensional array to an image by using the data as RGB values for the colour
of each pixel.
3.1.4 Scikit-Learn
Scikit-learn is an open-source machine learning library for Python which supports
classification, regression, clustering, dimensionality reduction and
pre-processing of data.
It also allows us to train neural networks but does not offer GPU support, which
limits its usefulness for large-scale applications. The customization that
can be applied to the model is also limited.
In this thesis, we use this library for pre-processing the data with its Min-Max
scaler.
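As an illustration of this pre-processing step, the following minimal sketch applies the min-max formula (x - min) / (max - min) to each column in plain Python. The function name `min_max_scale` and the sample values are hypothetical and are not taken from the program used in this thesis; scikit-learn's `MinMaxScaler` performs the equivalent column-wise operation with its default range of 0 to 1.

```python
def min_max_scale(columns):
    """Scale each column (a list of floats) to the range [0, 1]."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        scaled.append([(x - lo) / span if span else 0.0 for x in col])
    return scaled


# Hypothetical values for a single feature column.
flow_duration = [10.0, 20.0, 40.0]
scaled_columns = min_max_scale([flow_duration])
print(scaled_columns)
```

The smallest value of each column maps to 0 and the largest to 1, so features with very different units become comparable before they are fed to the autoencoder.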
3.2 PROPOSED ALGORITHM
In federated learning, we have two algorithms: the cloud algorithm, which
runs on the machine that aggregates the updates and computes the global model,
and the edge algorithm, which runs on the devices that provide the data for
training.
Our proposal for both of them is as follows.
3.2.1 CLOUD ALGORITHM
The aggregation server stores a database of all the edge devices which are
interested in training the global model. It assigns a default value to each of them, and we
refer to this value as the trust of that device. As we need to evaluate the
impact of the update from each device, we cannot aggregate all the updates and
compute their weighted average based on the amount of data. Instead, we fix the
amount of data that each device uses for training in a single epoch.
The server sends its current model, along with the
amount of data that should be used for training, to an edge device. The server
then receives the update to the weights from that edge device, scales it by
the old value of trust and applies it to the global model. It then
evaluates the impact of this update by computing the accuracy of the updated model
on a dataset which is kept for this purpose. If the impact is positive (accuracy
increases), the trust of the device which sent that update is incremented. If
the impact is negative (accuracy decreases), the trust
of that device is decremented.
The server then moves to the next edge device and repeats the
procedure with it.
The process is repeated with all the edge devices in the database cyclically. If
a new device wants to join the system, it is assigned the default
value of trust and added to the end of the database.
Different combinations of arithmetic and geometric increments and decrements
can be tried and the results compared.
The upper bound and the lower bound in this algorithm each have their own
significance.
In the absence of an upper bound, the trust would keep increasing and the updates
to the weights would grow so large that the model would oscillate from one
suboptimal state to another without ever arriving at the
optimal state. Eventually, the updates would be so large that the model would be
rendered useless.
The lower bound is necessary to stop the trust from reaching a negative value
(if arithmetic decrement is used) or a value so small that it cannot
grow back reasonably when the device has correctly labelled data which would improve
the performance (if increments and decrements are geometric).
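The bounded trust update described above can be sketched as follows. This is an illustrative sketch rather than the exact implementation: the function name `update_trust`, the `mode` argument and the arithmetic step of 0.1 are assumptions, while the geometric factors 1.1 and 0.9 and the bounds 0.1 and 5 are the values reported in Chapter 4.

```python
def update_trust(trust, improved, mode="geometric",
                 up=1.1, down=0.9, step=0.1,
                 lower=0.1, upper=5.0):
    """Return the new trust after one evaluated update, clamped to [lower, upper]."""
    if mode == "geometric":
        trust = trust * (up if improved else down)
    elif mode == "arithmetic":
        # step is a hypothetical increment; the text does not fix its value.
        trust = trust + (step if improved else -step)
    else:
        raise ValueError("mode must be 'geometric' or 'arithmetic'")
    # The clamp keeps the update scale finite and the trust recoverable.
    return min(upper, max(lower, trust))


print(update_trust(1.0, improved=True))                      # geometric increment
print(update_trust(4.9, improved=True))                      # capped at the upper bound
print(update_trust(0.1, improved=False, mode="arithmetic"))  # held at the lower bound
```

The caller decides what counts as an improvement. The text does not say what happens when the accuracy is unchanged, so that case is left to the caller here.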
3.2.2 EDGE ALGORITHM
An edge device receives a request from the aggregation server to train the
model. The input that it receives contains the current global model as well as
the amount of data that it should use for training. Alternatively, it already
knows the amount of data required and sends a request to the aggregation
server to let it join the current training epoch of the global model. The
amount of data is fixed to maintain uniformity among all the edge devices.
The device uses the specified amount of data to train the model using backpropagation
and gradient descent. It then sends back the update that it wants the aggregation
server to make to the global model.
After that, it keeps generating or collecting data until it receives another
training request from the server or has accumulated the fixed amount of
data that it must use for training. Once that much data is available, it can itself
ask the server for the current model in order to train it.
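The edge-side step can be illustrated with a toy example. The sketch below is an assumption-laden stand-in: it uses a small linear model trained by per-sample gradient descent instead of the autoencoder, and the names `local_update`, `n_required` and `lr` are hypothetical. It only shows the shape of the exchange: the device trains a copy of the received weights on exactly the fixed amount of data and returns the difference as its update.

```python
def local_update(global_weights, samples, n_required, lr=0.01):
    """Train a copy of the model on exactly n_required samples; return the weight delta."""
    if len(samples) < n_required:
        raise ValueError("not enough data collected yet")
    batch = samples[:n_required]
    w = list(global_weights)
    for x, y in batch:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # The gradient of 0.5 * err**2 with respect to weight i is err * x[i].
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    # Only the change in the weights is sent back to the aggregation server.
    return [wi - gi for wi, gi in zip(w, global_weights)]


# Hypothetical device data: three samples of the form (features, target).
collected = [([1.0], 2.0)] * 3
delta = local_update([0.0], collected, n_required=2, lr=0.5)
print(delta)
```

Returning only the delta keeps the raw samples on the device, which is the property that federated learning relies on.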
3.2.3 SETTING THRESHOLD
In an ideal scenario, we would set the threshold in such a
way that all the benign samples have a reconstruction error below it and all the
anomalous samples have an error above it. However, it is impossible to
set such a threshold because of the significant overlap between the reconstruction
errors of benign and anomalous samples. Even if we take the threshold equal
to 0.9 times the maximum reconstruction error of the training data, we see that
almost all the samples get classified as benign. This is because the benign
dataset also has some samples which are highly dissimilar to the other samples
and hence have a very large reconstruction error, greater than the
reconstruction error of most of the anomalous samples.
Fig 3.1 RMSE distribution for Friday Morning Working Hours after training on benign data (Monday)
Hence, we attempt to choose the threshold in such a way that we get
as few false positives as possible while still detecting most of the
anomalous samples correctly. This can be done by using a percentile of
the reconstruction error on the training data as the threshold. If we set it to the 90th
percentile of the reconstruction error of the training data, we can expect
about 10% of the benign samples to be classified as malicious as well.
This is clearly undesirable, but inevitable given the distribution of
reconstruction errors.
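The percentile rule can be sketched in a few lines. The function names `percentile_threshold` and `classify` and the integer stand-in errors are hypothetical, and the nearest-rank definition of a percentile is an assumption (library routines may interpolate between neighbouring values instead).

```python
def percentile_threshold(errors, pct=90):
    """Nearest-rank percentile of the training reconstruction errors."""
    ordered = sorted(errors)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def classify(errors, threshold):
    """Label a sample anomalous when its error exceeds the threshold."""
    return ["anomalous" if e > threshold else "benign" for e in errors]


# Stand-in reconstruction errors for 100 benign training samples.
train_errors = list(range(1, 101))
threshold = percentile_threshold(train_errors, 90)
flagged = classify(train_errors, threshold).count("anomalous")
print(threshold, flagged)   # 90 10
```

With the threshold at the 90th percentile, 10 of the 100 benign training samples exceed it, which is the 10% false-positive rate discussed above.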
3.3 MATHEMATICAL JUSTIFICATION
For simplicity, consider two functions of two variables. The variables
are the model parameters. The aim of the model is to reach the optimal
location of one of the functions. Two guiding forces modify the
variables. One force moves the model towards the optimal location
of the function that we want it to reach. The other force guides
the model towards the optimal location of the other function, which
is undesired. The model is led by the two forces cyclically, turn by turn.
For the first few steps, the model perceives both forces as friendly and
trusts both of them. After a certain step, it realizes that while one force
is friendly and is leading it to its destination, the other is trying to
take it somewhere else. When it notices that it moves farther from
its destination whenever it is led by one force, its trust in that force starts
to fade and the steps it takes in the direction indicated by that
force become smaller and smaller. It sees the destination
coming closer when it is led by the other force, so its step size for
that force increases gradually. However, the step size should increase only up to a
limit, as steps that are too large, instead of taking the model to its destination, would
take it even farther from the destination than it previously was.
Fig 3.2 Initial view of mathematical description presented
In Figure 3.2, the circle at the start of the dotted arrow represents the initial
position of the model. The dotted arrow is the misguiding force. The triangle
represents the optimal location, or centre of attraction, of the misguiding force.
The cross symbol represents the centre of attraction, or optimal point, of the
good guiding force. The hollow circle is the position of the model after it takes
a step in the direction of the misguiding force. This move decreases its distance
from both the centres. However, the model knows only its distance
from the correct destination, and as this distance has decreased, its trust in the
misguiding force increases.
The next figure, Figure 3.3, shows the turn of the ‘good’ force. The distance
between the cross symbol and the current position decreases once again, and
the model's trust in this force also increases.
Fig 3.3 The model is guided by ‘good’ force
Fig 3.4 The model realizes its ‘mistake’
Figure 3.4 shows the point where the distance of the model from its
optimal destination increases on the turn of the ‘bad’ force.
At this point, the model realizes that it should not trust this force, and
the size of the step on this ‘bad’ force’s turn gradually decreases until
it reaches a lower bound.
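The behaviour described in this section can be reproduced with a small two-dimensional simulation. The sketch below is only an illustration under stated assumptions: the coordinates, the base step fraction of 0.2, the initial trust of 1.0 and the function name `simulate` are hypothetical, while the geometric factors and bounds are those reported in Chapter 4. Each force pulls the point a fraction of the way towards its own centre of attraction, and the trust in a force grows only when the distance to the ‘good’ centre shrinks on its turn.

```python
import math


def simulate(start, good, bad, rounds=100, base=0.2,
             up=1.1, down=0.9, lower=0.1, upper=5.0):
    """Alternate between a 'bad' and a 'good' force and adapt the trust in each."""
    pos = list(start)
    trust = {"good": 1.0, "bad": 1.0}            # assumed default trust
    targets = {"good": good, "bad": bad}
    for _ in range(rounds):
        for name in ("bad", "good"):             # the misguiding force moves first
            before = math.dist(pos, good)
            tx, ty = targets[name]
            step = base * trust[name]            # step size scales with trust
            pos = [pos[0] + step * (tx - pos[0]),
                   pos[1] + step * (ty - pos[1])]
            after = math.dist(pos, good)
            # Only the distance to the desired destination is observed.
            factor = up if after < before else down
            trust[name] = min(upper, max(lower, trust[name] * factor))
    return pos, trust


final_pos, final_trust = simulate(start=(0.0, 0.0), good=(10.0, 0.0), bad=(6.0, 8.0))
print(final_pos, final_trust)
```

In this configuration the first move of the ‘bad’ force also reduces the distance to the ‘good’ centre, so its trust rises at first, as in Figure 3.2. Once the point is close to the ‘good’ centre, every ‘bad’ move increases the distance, and the trust in that force decays to the lower bound.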
3.4 SIMULATION
3.4.1 About CIC-IDS-2017 Dataset
The dataset used in this project is the CIC-IDS-2017 dataset [24], an
IDS evaluation dataset generated by the Canadian Institute for
Cybersecurity (CIC), based at the University of New Brunswick in
Fredericton, Canada. This dataset was generated over a period of five
days, from Monday through Friday, and has been available to
the research community since 2018 in PCAP format as well as in CSV
format, where each record is a labelled flow with 78 features. Some
datasets like NSL-KDD and CAIDA contain fewer attacks, whereas
CICIDS2017 uses a variety of attacks to simulate the real
world as closely as possible.
The Monday CSV file represents the data for the first day of the week
and includes benign traffic only. The simulated attacks were executed
from Tuesday to Friday and appear in the CSV files for those days. These are brute-force attacks
on File Transfer Protocol (FTP) and Secure Shell (SSH), DoS,
Heartbleed, web attacks, infiltration, botnets and Distributed
Denial-of-Service (DDoS) attacks.
Each record of the CSV files is a traffic flow in the network.
A traffic flow is the sequence of IP packets passing an
observation point during a certain time interval. It is not as
comprehensive as the packet data, but it is more than sufficient for keeping
track of statistical characteristics and identifying anomalies which may
be present in the network due to malicious data.
This kind of statistical data can be further analysed to verify that the
network is being used properly by the people of its organisation and that no
misconduct takes place. The datasets were generated on the victim network
with the help of a mirrored port on the primary switch, which saved all
the data to a PCAP file. The CICFlowMeter tool was used to create
bidirectional flows and to calculate the features from the PCAP files.
Some of the features of the flow data are source IP address, source port,
destination port and many more.
The attack network and the victim network used for creating the datasets
had all the required equipment such as routers, firewalls, switches, hubs and
computers with different operating systems like Windows, macOS and
Ubuntu.
The dataset contains “Infinity” and “NaN” in some places. We replaced
“Infinity” by 10^10 and “NaN” by 0. Some categories of attacks have very
little representation, making the data on some of the days highly
imbalanced.
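The replacement of the non-finite values can be sketched as below. This is a plain-Python illustration, not the code used in this thesis: the function name `clean_value` is hypothetical, and mapping a negative infinity to -10^10 is an assumption, since the text only mentions “Infinity”.

```python
import math


def clean_value(raw, inf_value=1e10, nan_value=0.0):
    """Parse one CSV field, replacing infinities and NaN with finite values."""
    x = float(raw)   # float() accepts strings such as "Infinity" and "NaN"
    if math.isnan(x):
        return nan_value
    if math.isinf(x):
        # Assumption: keep the sign, so "-Infinity" would become -1e10.
        return math.copysign(inf_value, x)
    return x


row = ["3.5", "Infinity", "NaN"]
print([clean_value(field) for field in row])   # [3.5, 10000000000.0, 0.0]
```

Replacing these values before scaling matters because a single infinite entry would make the maximum of its column infinite and break min-max scaling.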
3.4.2 Approach followed
The features are scaled using the min-max scaler from the scikit-learn
library to bring them to a common range. The number of features could
have been reduced using principal component analysis; the first 20
components would have been enough to capture more than 99% of the variance in the data.
We have used an autoencoder with a single hidden layer containing 10
neurons for detecting anomalies in the data.
The activation function used in the model is ‘ReLU’, which stands for
Rectified Linear Unit, a piecewise function which returns its
input if it is positive and zero otherwise.
The model is then compiled with root mean square error and
the Adam optimizer.
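To make the role of ReLU and of the reconstruction error concrete, the following plain-Python sketch runs a forward pass through a tiny autoencoder and computes the RMSE for each sample. It is a toy under stated assumptions: the weights are hand-picked rather than trained, the layer sizes (2 inputs, 1 hidden neuron) are far smaller than the model described above, the output layer is assumed to be linear, and all function names are hypothetical.

```python
import math


def relu(x):
    """Rectified Linear Unit: the input if positive, otherwise zero."""
    return x if x > 0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[j] holds the input weights of neuron j."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def reconstruct(sample, enc_w, enc_b, dec_w, dec_b):
    hidden = dense(sample, enc_w, enc_b, activation=relu)
    return dense(hidden, dec_w, dec_b)           # assumption: linear output layer


def rmse(sample, reconstruction):
    """Root mean square error between a sample and its reconstruction."""
    n = len(sample)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sample, reconstruction)) / n)


# Hand-picked weights: the single hidden neuron only encodes the first feature.
enc_w, enc_b = [[1.0, 0.0]], [0.0]
dec_w, dec_b = [[1.0], [0.0]], [0.0, 0.0]

for sample in ([1.0, 0.0], [0.0, 1.0]):
    output = reconstruct(sample, enc_w, enc_b, dec_w, dec_b)
    print(sample, output, rmse(sample, output))
```

The first sample matches the pattern the toy model can encode and is reconstructed with zero error, while the second is not and receives a larger error. This is the quantity that is compared against the threshold.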
Adam [25], short for Adaptive Moment Estimation, is an
efficient method for optimizing gradient descent even when
working with a large number of parameters or a large amount of data. It is a
combination of the ‘gradient descent with momentum’ algorithm and the
RMSProp algorithm.
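How Adam combines the two ideas can be seen in a scalar sketch. The function name `adam_minimize`, the learning rate, the step count and the quadratic test function are hypothetical choices for illustration, and the values of `beta1`, `beta2` and `eps` are the commonly quoted defaults, assumed here rather than taken from this thesis.

```python
import math


def adam_minimize(grad, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a one-variable function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum-style running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-style running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

The first moment `m` smooths the direction of the step, while dividing by the square root of the second moment `v` rescales the step size for each parameter.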
We use two for loops for training. A single iteration of the outer loop
corresponds to one global epoch. A single iteration of the inner loop
trains the model for one epoch on a single source of data, compares
the result with the previous result, updates the trust value of that source in
a dictionary which contains the trust values of all the sources, and sets
the stored previous result to the new one.
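The structure of the two loops can be sketched as below. This is a hedged outline, not the program used in this thesis: the model is reduced to a single number, `train_one_epoch` and `evaluate` are hypothetical stand-ins for training the autoencoder and measuring accuracy on the server data, the default trust of 1.0 is an assumption, and a result that does not improve is treated as a decrease.

```python
def federated_training(sources, train_one_epoch, evaluate, global_epochs,
                       up=1.1, down=0.9, lower=0.1, upper=5.0):
    """Cycle through the sources, scaling each update by the trust in its source."""
    trust = {name: 1.0 for name in sources}        # assumed default trust
    model = 0.0                                    # toy model: a single number
    previous_result = evaluate(model)
    for _ in range(global_epochs):                 # outer loop: one global epoch
        for name, data in sources.items():         # inner loop: one source at a time
            update = train_one_epoch(model, data)
            model = model + trust[name] * update   # scaled by the old trust value
            result = evaluate(model)
            factor = up if result > previous_result else down
            trust[name] = min(upper, max(lower, trust[name] * factor))
            previous_result = result
    return model, trust


# Toy setup: one source pulls towards the true optimum (10), the other away from it.
sources = {"honest": 10.0, "mislabelled": -10.0}
model, trust = federated_training(
    sources,
    train_one_epoch=lambda m, target: 0.1 * (target - m),
    evaluate=lambda m: -abs(m - 10.0),             # higher is better, like accuracy
    global_epochs=30,
)
print(model, trust)
```

In this toy run the trust of the honest source climbs to the upper bound and that of the mislabelled source falls to the lower bound, so the final model ends up close to the optimum.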
CHAPTER 4 RESULTS AND DISCUSSION
The following results were generated using 4 sources. The first 2 lakh (200,000) samples
from the Wednesday data (which contains DoS attacks) were used as server data
for measuring accuracy. The next 2 lakh samples from the Wednesday data were
used as the first source and the 2 lakh samples after those as the third source. The
first 2 lakh records from the Monday data (benign) were used as the second source,
the next 2 lakh samples as the fourth source, and the remaining 129918
samples for setting the threshold. The 90th percentile of the reconstruction
errors on the data kept for setting the threshold was used as the threshold.
In source 1, 113515/200000 samples are malicious. In source 3, 4404/200000
samples are malicious. In server data, 1229971/200000 samples are malicious.
Geometric increment and decrement were used with the multiplication factors
1.1 and 0.9 respectively.
Fig 4.1: Variation of trust of the four sources against epoch
The upper bound for trust values was set to 5 and the lower bound was
set to 0.1. Figure 4.1 shows the result graphically. The trust value of
source 2 reaches the upper bound of 5 and stays constant after that.
The trust value of source 4 alternates above and below 1. The trust value
of source 1 decreases almost every time, as the accuracy is very low
due to the large number of malicious samples. The trust value of source 3
first increases up to a certain epoch and then keeps decreasing.
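As a quick check on these settings, the sketch below counts how many consecutive increments or decrements are needed to reach each bound. The starting trust of 1.0 is an assumption, since the default value is not stated here, and the function name `steps_to_bound` is hypothetical.

```python
def steps_to_bound(start, factor, bound):
    """Count multiplications by factor needed to reach or cross the bound."""
    trust, steps = start, 0
    rising = factor > 1
    while (trust < bound) if rising else (trust > bound):
        trust *= factor
        steps += 1
    return steps


print(steps_to_bound(1.0, 1.1, 5.0))   # 17 consecutive improvements reach the upper bound
print(steps_to_bound(1.0, 0.9, 0.1))   # 22 consecutive degradations reach the lower bound
print(round(1.1 * 0.9, 2))             # 0.99: one increase and one decrease nearly cancel
```

Under this assumption a source needs 17 uninterrupted improvements to saturate at 5 and 22 uninterrupted degradations to fall to 0.1. A source whose impact alternates loses about 1% of its trust per pair of epochs, which is consistent with a trust value that hovers around 1.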
CHAPTER 5 CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
Our focus in this thesis was to study the impact of assigning variable learning
rates to the different sources involved in training. Due to the significant overlap between the
patterns of malicious and benign flows, the model described here is not an
ideal choice for detecting cyber-attacks.
However, our idea of varying the trust parameter depending on the effect each
epoch has on the model appears to be successful to a certain extent. When the model is
trained with a mix of benign and attack data, the overlap between the training
data and the attack flows increases, which makes it harder to identify the
flows that correspond to attacks.
5.2 FUTURE WORK
We have explored only the effect of varying the learning rate based on the trust
parameter. In future, the effect of varying other parameters, such as the number of
epochs on each dataset and the batch size, can also be explored.
Different types of datasets can be experimented with. In the case of cyber-attacks,
attackers keep changing their exploits to counter various security measures.
However, in cases where the anomalies occur without such deliberate attempts
and are similar in pattern to other anomalies, this method may give better
results. Hence, this approach should be attempted on such datasets.
While carrying out this analysis, we assumed a perfect flow of information,
which does not happen in real networks.
If a network simulator is used for the experimentation instead of the simple
program that we have used, the challenges of a real network would be
highlighted. That would lead to further research on dealing with issues of
actual networks, such as encryption of individual updates to the weights to prevent
analysis which could potentially reveal information about the data
present at the sources.
REFERENCES
[1] Stahl R (2017) Technology reliant society, has it gone too far?
(https://thesnapper.millersville.edu/index.php/2017/04/19/technology-reliant-society-opinion)
[2] Stobierski T (2019) The advantages of data-driven decision-making
(https://online.hbs.edu/blog/post/data-driven-decision-making)
[3] Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN,
Bonawitz K, Charles Z, et al. Advances and open problems in federated
learning. arXiv preprint arXiv:1912.04977. 2019 Dec 10.
[4] Dongare AD, Kharde RR, Kachare AD. Introduction to artificial neural
network. International Journal of Engineering and Innovative Technology
(IJEIT). 2012 Jul;2(1):189-94.
[5] Fine TL. Feedforward neural network methodology. Springer Science &
Business Media; 2006 Apr 6.
[6] Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X,
Wang G, Cai J, Chen T. Recent advances in convolutional neural networks.
Pattern Recognition. 2018 May 1;77:354-77.
[7] Medsker L, Jain LC, editors. Recurrent neural networks: design and
applications. CRC Press; 1999 Dec 20.
[8] Bank D, Koenigstein N, Giryes R. Autoencoders. arXiv preprint
arXiv:2003.05991. 2020 Mar 12.
[9] Chow JK, Su Z, Wu J, Tan PS, Mao X, Wang YH. Anomaly detection of
defects on concrete structures with the convolutional autoencoder. Advanced
Engineering Informatics. 2020 Aug 1;45:101105.
[10] An J, Cho S. Variational autoencoder based anomaly detection using
reconstruction probability. Special Lecture on IE. 2015 Dec 27;2(1):1-8.
[11] Gondara L. Medical image denoising using convolutional denoising
autoencoders. In 2016 IEEE 16th International Conference on Data Mining
Workshops (ICDMW) 2016 Dec 12 (pp. 241-246). IEEE.
[12] Llanasas R (2020) How AI and machine learning are transforming mobile
technology
(https://www.greenbook.org/mr/market-research-technology/how-ai-is-transforming-mobile-technology/)
[13] Akbari Roumani M, Fung CC, Rai S, Xie H. Value analysis of cyber
security based on attack types. ITMSOC: Transactions on Innovation and
Business Engineering. 2016;1:34-9.
[14] Chen Z, Lv N, Liu P, Fang Y, Chen K, Pan W. Intrusion Detection for
Wireless Edge Networks Based on Federated Learning. IEEE Access. 2020
Dec 1;8:217463-72.
[15] Hindy H, Atkinson R, Tachtatzis C, Colin JN, Bayne E, Bellekens X.
Utilising deep learning techniques for effective zero-day attack detection.
Electronics. 2020 Oct;9(10):1684.
[16] Yulianto A, Sukarno P, Suwastika NA. Improving Adaboost-based
intrusion detection system (IDS) performance on CIC IDS 2017 dataset.
In Journal of Physics: Conference Series 2019 Mar 1 (Vol. 1192, No. 1, p.
012018). IOP Publishing.
[17] Mathur NO. (2020) Application of Autoencoder Ensembles in Anomaly
and Intrusion Detection using Time-Based Analysis (Masters dissertation,
University of Cincinnati).
[18] Singh Panwar S, Raiwani YP, Panwar LS. Evaluation of network
intrusion detection with features selection and machine learning algorithms
on CICIDS-2017 dataset. In International Conference on Advances in
Engineering Science Management & Technology (ICAESMT)-2019,
Uttaranchal University, Dehradun, India 2019 Mar 15.
[19] Farukee MB, Shabit MZ, Haque MR, Sattar AS. DDoS Attack Detection
in IoT Networks Using Deep Learning Models Combined with Random
Forest as Feature Selector. In International Conference on Advances in Cyber
Security 2020 Dec 8 (pp. 118-134). Springer, Singapore.
[20] Ketkar N. Introduction to keras. In Deep learning with Python 2017 (pp.
97-111). Apress, Berkeley, CA.
[21] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M,
Ghemawat S, Irving G, Isard M, Kudlur M. Tensorflow: A system for
large-scale machine learning. In 12th USENIX Symposium on Operating Systems
Design and Implementation (OSDI 16) 2016 (pp. 265-283).
[22] McKinney W. pandas: a foundational Python library for data analysis
and statistics. Python for high performance and scientific computing. 2011
Nov 18;14(9):1-9.
[23] Tosi S. Matplotlib for Python developers. Packt Publishing Ltd; 2009
Nov 9.
[24] Yulianto A, Sukarno P, Suwastika NA. Improving Adaboost-based
intrusion detection system (IDS) performance on CIC IDS 2017 dataset. In
Journal of Physics: Conference Series 2019 Mar 1 (Vol. 1192, No. 1, p.
012018). IOP Publishing.
[25] Bock S, Goppold J, Weiß M. An improvement of the convergence proof
of the ADAM-Optimizer. arXiv preprint arXiv:1804.10587. 2018 Apr 27.
APPENDIX
Description of Features in the Dataset
Sr. No | Feature Name | Description
1 | Destination Port | Destination Port
2 | Flow Duration | Duration of the flow in microseconds
3 | Total Fwd Packets | Total count of packets in the forward direction
4 | Total Bwd Packets | Total count of packets in the backward direction
5 | Total Length of Fwd Packets | Total size of packets in the forward direction
6 | Total Length of Bwd Packets | Total size of packets in the backward direction
7 | Fwd Packet Length Min | Minimum size of packets in the forward direction
8 | Fwd Packet Length Max | Maximum size of packets in the forward direction
9 | Fwd Packet Length Mean | Mean size of packets in the forward direction
10 | Fwd Packet Length Std | Standard deviation of packet sizes in the forward direction
11 | Bwd Packet Length Min | Minimum size of packets in the backward direction
12 | Bwd Packet Length Max | Maximum size of packets in the backward direction
13 | Bwd Packet Length Mean | Mean size of packets in the backward direction
14 | Bwd Packet Length Std | Standard deviation of packet sizes in the backward direction
15 | Flow Bytes/s | Number of flow bytes per second
16 | Flow Packets/s | Number of flow packets per second
17 | Flow IAT Mean | Mean time between two packets sent in the flow
18 | Flow IAT Std | Standard deviation of time between two packets sent in the flow
19 | Flow IAT Max | Maximum time between two packets sent in the flow
20 | Flow IAT Min | Minimum time between two packets sent in the flow
21 | Fwd IAT Min | Minimum time between two packets sent in the forward direction
22 | Fwd IAT Max | Maximum time between two packets sent in the forward direction
23 | Fwd IAT Mean | Mean time between two packets sent in the forward direction
24 | Fwd IAT Std | Standard deviation of time between two packets sent in the forward direction
25 | Fwd IAT Total | Total time between two packets sent in the forward direction
26 | Bwd IAT Min | Minimum time between two packets sent in the backward direction
27 | Bwd IAT Max | Maximum time between two packets sent in the backward direction
28 | Bwd IAT Mean | Mean time between two packets sent in the backward direction
29 | Bwd IAT Std | Standard deviation of the time between two packets sent in the backward direction
30 | Bwd IAT Total | Total time between two packets sent in the backward direction
31 | Fwd PSH Flags | Number of times the PSH flag was set in packets travelling forward
32 | Bwd PSH Flags | Number of times the PSH flag was set in packets travelling backwards
33 | Fwd URG Flags | Number of times the URG flag was set in packets travelling forward
34 | Bwd URG Flags | Number of times the URG flag was set in packets travelling backwards
35 | Fwd Header Length | Total bytes used for headers in the forward direction
36 | Bwd Header Length | Total bytes used for headers in the backward direction
37 | Fwd Packets/s | Number of forward packets per second
38 | Bwd Packets/s | Number of backward packets per second
39 | Min Packet Length | Minimum length of a packet
40 | Max Packet Length | Maximum length of a packet
41 | Packet Length Mean | Mean length of a packet
42 | Packet Length Std | Standard deviation of packet length
43 | Packet Length Variance | Variance of packet length
44 | FIN Flag Count | Number of packets with FIN Flag
45 | SYN Flag Count | Number of packets with SYN Flag
46 | RST Flag Count | Number of packets with RST Flag
47 | PSH Flag Count | Number of packets with PUSH Flag
48 | ACK Flag Count | Number of packets with ACK Flag
49 | URG Flag Count | Number of packets with URG Flag
50 | CWE Flag Count | Number of packets with CWE
51 | ECE Flag Count | Number of packets with ECE
52 | Down/Up Ratio | Download and upload ratio
53 | Average Packet Size | Average size of packets
54 | Avg Fwd Segment Size | Average size observed in the forward direction
55 | Avg Bwd Segment Size | Average size observed in the backward direction
56 | Fwd Avg Bytes/Bulk | Average bytes bulk rate in the forward direction
57 | Fwd Avg Packets/Bulk | Average packet bulk rate in the forward direction
58 | Fwd Avg Bulk Rate | Average bulk rate in the forward direction
59 | Bwd Avg Bytes/Bulk | Average bytes bulk rate in the backward direction
60 | Bwd Avg Packets/Bulk | Average packets bulk rate in the backward direction
61 | Bwd Avg Bulk Rate | Average bulk rate in the backward direction
62 | Subflow Fwd Packets | Average number of packets in a sub flow in the forward direction
63 | Subflow Fwd Bytes | Average bytes in a sub flow in the forward direction
64 | Subflow Bwd Packets | Average number of packets in a sub flow in the backward direction
65 | Subflow Bwd Bytes | Average bytes in a sub flow in the backward direction
66 | Init Win bytes fwd | Number of bytes sent in the initial window in the forward direction
67 | Init Win bytes bwd | Number of bytes sent in the initial window in the backward direction
68 | act data pkt fwd | Count of packets with at least 1 byte of TCP data payload in the forward direction
69 | Min seg size fwd | Minimum segment size observed in the forward direction
70 | Active Mean | Mean time a flow was active before becoming idle
71 | Active Std | Standard deviation of the time a flow was active before becoming idle
72 | Active Max | Maximum time a flow was active before becoming idle
73 | Active Min | Minimum time a flow was active before becoming idle
74 | Idle Min | Minimum time a flow was idle before becoming active
75 | Idle Mean | Mean time a flow was idle before becoming active
76 | Idle Max | Maximum time a flow was idle before becoming active
77 | Idle Std | Standard deviation of the time a flow was idle before becoming active
78 | Label | The target variable, ‘Benign’ or a specific ‘Attack category’

Bachelor Thesis Project Semester 7

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Bachelor Thesis Project Semester 7

Uploaded by

Copyright:

Available Formats
