
Darknet Traffic Detection System Based on Deep Learning to Counter Cyberattacks
CONTENTS
01 Introduction
02 Literature Review
03 Methodology
04 Implementation
05 Results and Conclusions
01
INTRODUCTION
1. What is the Darknet?
2. Why is it important to detect Darknet traffic?
3. How does deep learning assist in detecting Darknet
traffic?
Darknet
The Darknet is an encrypted network technology that offers internet users anonymity. Its content can be accessed only with specific network settings and tools. To ensure user anonymity, the Darknet relies on privacy networks and peer-to-peer connections such as Virtual Private Networks (VPN) and The Onion Router (Tor).
Introduction
Why is it important to detect Darknet traffic?
The Darknet is largely used for illicit activities such as the trade in illegal narcotics, cybercrime, and terrorism.
The surface web accounts for around 4%-5% of internet space and contains billions of static web pages. The Deep Web contains material that is not accessible via search engines and accounts for 95%-96% of all data on the Internet. In addition, Deep Web content is reported to be 500-550 times larger than surface web content.
According to one claim, data from over 50 million Facebook users was auctioned on the Darknet in September 2018 for the bitcoin equivalent of $3 USD per user.
Introduction
How Does the Darknet Work?
The onion routing protocol encapsulates the information to be delivered in multiple layers of encryption that resemble the layers of an onion. The information is then sent through a series of onion routers (nodes in the network), each of which "peels" away (decrypts) just one layer of encryption to reveal where the data should go next. When the final layer is decrypted, the information is sent to its destination. The sender stays anonymous because each intermediary knows only the locations of the nodes immediately before and after it.
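The layering described above can be sketched in a few lines of Python. This is a toy illustration only: the XOR keystream cipher, the node names, the keys, and the `wrap` and `peel` helpers are all assumptions made for the example, and none of it reflects how Tor is actually implemented.

```python
import hashlib

def keystream_xor(key, data):
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.
    Illustrative only; it is NOT secure and is not the cipher Tor uses."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

def wrap(message, route, destination):
    """Build the onion from the innermost layer outwards.
    route is a list of (node_name, node_key) pairs in forwarding order."""
    payload = message
    next_hop = destination
    for name, key in reversed(route):
        # Each layer hides both the inner payload and the address of the next hop.
        payload = keystream_xor(key, next_hop.encode() + b"|" + payload)
        next_hop = name
    return payload

def peel(key, onion):
    """One node removes its own layer and learns only the next hop."""
    plain = keystream_xor(key, onion)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner

# Hypothetical three-node circuit.
route = [("entry", b"key-entry"), ("middle", b"key-middle"), ("exit", b"key-exit")]
onion = wrap(b"GET /index.html", route, "server.example")

hop1, layer1 = peel(b"key-entry", onion)      # entry node learns only "middle"
hop2, layer2 = peel(b"key-middle", layer1)    # middle node learns only "exit"
hop3, message = peel(b"key-exit", layer2)     # exit node learns the destination
```

Each call to `peel` exposes a single next hop, which mirrors the point that no intermediary sees the full path.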
Introduction
How does deep learning assist in detecting Darknet traffic?
• To detect attack patterns in internet traffic data, a model must be able to follow the attacker's changing behavior automatically in order to minimize risk.
• Deep learning suits this task because of its auto-learning capabilities: it can be applied to internet traffic data to categorize network traffic as either conventional or darknet traffic.
Introduction
This research highlights the importance of developing a Darknet traffic detection system for various industries and organizations, particularly government agencies, to enhance their capabilities in monitoring and preventing cyber threats.
The research introduces a new method for transforming tabular data into images, enabling visualization and analysis of correlations and patterns in the Darknet traffic dataset. The Identity-oriented Tabular Image Transformation prioritizes correlated features, and traffic is accurately classified into VPN, Tor, non-VPN, and non-Tor types using transfer learning with a deep learning model called EfficientNetV2B0.
02
Literature Review
Dataset
Research Papers
Literature Review
CIC-Darknet2020 Dataset

158,659 samples
Each CIC-Darknet2020 sample is made up of traffic characteristics obtained from sessions of raw packet captures.
The top-level traffic categories are Tor, non-Tor, VPN, and non-VPN.
The application-type categories include surfing, chat, email, file transfer, P2P, VOIP, and audio and video streaming.
Literature Review
Research Papers

Authors | Machine Learning Approach | Technique | Dataset/Features | Results | Year
Lashkari et al. [50] | Deep Learning | CNN | 23 | Characterize darknet traffic | 2020
Alimoradi et al. | Deep Learning | DNN | 79 | Detector classifies Tor-VPN traffic into 4 classes | 2022
Choorod et al. | Supervised | J48, KNN, RF with 10 folds | 16 | Detected Tor and non-Tor based on a payload with 90% accuracy | 2021
03
Methodology
Methodology

1. Dataset Cleaning
2. Feature Selection
3. Finding Correlations
4. Identity-oriented Tabular Image Transformation
5. Dataset Split
6. Transfer Learning of EfficientNetV2B0
7. Evaluation
Methodology
Dataset Cleaning
To ensure the integrity and reliability of our dataset, a thorough cleaning process was employed. One critical aspect involved the removal of entries containing NaN (Not a Number) and Inf (Infinity) values.
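A minimal sketch of this cleaning step, using only the Python standard library. The record layout and column names below are hypothetical and serve only to show the filtering rule: a row is kept only if every numeric value in it is finite.

```python
import math

def drop_non_finite_rows(rows):
    """Keep only rows whose numeric values are all finite (no NaN, +Inf or -Inf)."""
    clean = []
    for row in rows:
        numeric = [v for v in row.values() if isinstance(v, (int, float))]
        if all(math.isfinite(v) for v in numeric):
            clean.append(row)
    return clean

# Hypothetical flow records; the column names are illustrative only.
rows = [
    {"flow_duration": 120.0, "flow_bytes_per_s": 3500.0, "label": "Tor"},
    {"flow_duration": 0.0, "flow_bytes_per_s": float("inf"), "label": "VPN"},
    {"flow_duration": 45.0, "flow_bytes_per_s": float("nan"), "label": "Non-Tor"},
]
clean_rows = drop_non_finite_rows(rows)  # only the first record survives
```

With pandas and NumPy, the same filter is commonly written as `df.replace([np.inf, -np.inf], np.nan).dropna()`; whether the original work used that idiom is not stated.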
Methodology
Feature Selection
The elimination of features is a crucial step in enhancing the efficiency and
interpretability of our classification model.
We summarize the features that have been eliminated and the corresponding
justifications:
Methodology
Feature Selection
Feature | Reasons for Elimination
Source, Destination Ports | Ports are non-deterministic and can be easily manipulated by malicious actors to appear innocuous. After the initial connection, services may change ports, potentially confusing classifiers and causing misclassification.
Flow-id | The feature is in the form (Source IP)-(Destination IP)-(Source Port)-(Destination Port)-(Protocol), providing duplicate information. This redundancy does not contribute to the classification task.
Source, Destination IP | These IP addresses are artifacts of the dataset generation process and do not represent the broader distribution of IP addresses on the internet. The focus is on classifying traffic patterns rather than individual users.
Timestamp | The classifier, designed for tabular samples, does not benefit from clustering samples based on the proximity of their timestamps.
Zero-Valued Features | Features with zero values in every sample provide no discriminatory information for classification against similar traffic.
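The elimination rules above could be applied roughly as follows. The column names in `DROPPED` and in the sample rows are assumptions for illustration, since the exact header names of the dataset file are not given here.

```python
# Assumed column names; the actual CSV header may differ.
DROPPED = {"Flow ID", "Src IP", "Dst IP", "Src Port", "Dst Port", "Timestamp"}

def eliminate_features(rows, label_columns=("Label",)):
    """Drop identifier-like columns and any feature that is zero in every row."""
    if not rows:
        return []
    candidates = [c for c in rows[0] if c not in DROPPED]
    all_zero = {
        c for c in candidates
        if c not in label_columns and all(r[c] == 0 for r in rows)
    }
    kept = [c for c in candidates if c not in all_zero]
    return [{c: r[c] for c in kept} for r in rows]

# Hypothetical records used only to exercise the function.
sample_rows = [
    {"Flow ID": "a-b-1-2-6", "Src IP": "10.0.0.1", "Dst IP": "10.0.0.2",
     "Src Port": 1, "Dst Port": 2, "Timestamp": "t0",
     "Flow Duration": 10, "Bwd PSH Flags": 0, "Label": "Tor"},
    {"Flow ID": "c-d-3-4-6", "Src IP": "10.0.0.3", "Dst IP": "10.0.0.4",
     "Src Port": 3, "Dst Port": 4, "Timestamp": "t1",
     "Flow Duration": 20, "Bwd PSH Flags": 0, "Label": "VPN"},
]
reduced = eliminate_features(sample_rows)
```

In this example only `Flow Duration` and `Label` remain, because `Bwd PSH Flags` is zero in every row.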
Methodology
Finding Correlations

We divide the dataset into two distinct datasets based on their labels: the traffic dataset and the application dataset. Our objective is to analyze the correlations within each dataset. We extract the correlations among labels and features, and subsequently use this information to generate visual representations in the form of images. A correlation matrix heatmap captures the relationships between the variables in each dataset, focusing on their association with the traffic type label and the application type label.
Methodology
Finding Correlations
Correlation matrix heatmaps with the traffic type label (left) and with the application type label (right).
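As a rough illustration of how feature-to-label correlations might be computed, the sketch below uses the Pearson coefficient with an integer-encoded label. Both choices are assumptions: the slides do not say which correlation measure or label encoding was used, and integer codes are a crude stand-in for a nominal label.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, reported as 0 here
    return cov / math.sqrt(var_x * var_y)

def correlations_with_label(features, labels):
    """features: dict of name -> list of values; labels: list of class names."""
    codes = {name: i for i, name in enumerate(sorted(set(labels)))}
    encoded = [codes[label] for label in labels]
    return {name: pearson(values, encoded) for name, values in features.items()}

# Hypothetical toy data.
features = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [5, 5, 5, 5]}
labels = ["x", "x", "y", "y"]
corr = correlations_with_label(features, labels)
```

Sorting features by the absolute value of these coefficients is one plausible way to obtain an ordering that prioritizes correlated features.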
Methodology
Identity-oriented Tabular Image Transformation
Identity-oriented Tabular Image Transformation is a novel approach that converts tabular data into image representations.
Methodology
Identity-oriented Tabular Image Transformation
1. The function first constructs the row and column
vectors by padding them with the identity value
according to the specified padding. The row vector
is obtained by taking the values from the
DataFrame that correspond to the row ordering, and
the column vector is obtained in a similar way
using the column ordering.
2. Next, a product function is defined that applies the operation and activation functions to the input values and clamps the result to the interval [0, 1]. This product function is then used to create a composed outer function.
3. Finally, the composed outer function is applied to
the row and column vectors producing a matrix
representing the image.
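The three steps above can be sketched as a generalized outer product. This is an interpretation under stated assumptions, not the authors' implementation: the function name, the symmetric padding on both ends, the default multiplication operation with identity value 1.0, and the identity activation are all chosen for illustration, and feature values are assumed to be scaled to [0, 1] beforehand.

```python
def tabular_to_image(sample, row_order, col_order, padding=0, identity=1.0,
                     operation=lambda a, b: a * b, activation=lambda v: v):
    """Turn one tabular sample (dict of feature -> value) into a 2-D matrix."""
    # Step 1: build the row and column vectors, padded with the identity value.
    pad = [identity] * padding
    row_vec = pad + [sample[name] for name in row_order] + pad
    col_vec = pad + [sample[name] for name in col_order] + pad

    # Step 2: product function = operation, then activation, clamped to [0, 1].
    def product(a, b):
        return min(1.0, max(0.0, activation(operation(a, b))))

    # Step 3: apply the product to every (row, column) pair (an outer product).
    return [[product(r, c) for c in col_vec] for r in row_vec]

# Hypothetical sample with three already-normalized features.
sample = {"f1": 0.5, "f2": 1.0, "f3": 0.2}
order = ["f1", "f2", "f3"]
image = tabular_to_image(sample, order, order, padding=1)  # 5 x 5 matrix
```

With multiplication as the operation, a padded border row equals the column vector itself, so the raw feature values stay visible at the edge of the image while the interior encodes pairwise interactions.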
Methodology
Dataset Split

85% training: 120,966 images
10% validation: 14,439 images
5% test: 7,076 images
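A simple way to produce an 85/10/5 split is a seeded shuffle followed by slicing. The function name, the seed, and the use of a plain (non-stratified) shuffle are assumptions; the slides do not describe how the split was drawn.

```python
import random

def split_dataset(items, train=0.85, val=0.10, seed=42):
    """Shuffle once with a fixed seed, then slice into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains (about 5%)
    return train_set, val_set, test_set

train_set, val_set, test_set = split_dataset(range(1000))
```

The reported counts sum to 142,481 images, which corresponds to roughly 84.9%, 10.1%, and 5.0%, so this sketch will not reproduce those exact numbers.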
Methodology
Transfer Learning of EfficientNetV2B0

The EfficientNetV2B0 model is initialized with pre-trained weights obtained from the ImageNet dataset. The final fully-connected layers of the base model are excluded, allowing customization for our classification task. A fully-connected layer is then appended to the end of the model. This layer employs the softmax activation function to produce predictions, and its number of units is determined by the selected classes:
1. Tor
2. Non-Tor
3. VPN
4. Non-VPN
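To make the role of the final softmax layer concrete, the sketch below converts four raw output scores (logits) into class probabilities and picks the most likely class. It is a standalone standard-library illustration rather than the actual model code; the class order and the example logits are assumptions.

```python
import math

CLASSES = ["Tor", "Non-Tor", "VPN", "Non-VPN"]  # assumed ordering

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable class name together with all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Hypothetical logits, one per class.
label, probs = predict([2.0, 0.5, -1.0, 0.1])
```

Because the layer has one unit per class, adding or removing a traffic class only changes the size of this final layer, not the pre-trained base.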
Methodology
Transfer Learning of EfficientNetV2B0

Optimizer: Adam
Batch Size: 64
Epochs: 20
Learning Rate: 0.001
Loss Function: Categorical Crossentropy
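Two of these settings lend themselves to a quick worked example: what categorical cross-entropy computes for one sample, and how many optimizer steps one epoch takes at batch size 64. The helper below is a plain-Python sketch (it clips probabilities only from below, which may differ from a framework's implementation), and the step count assumes the 120,966 training images reported for the split with the last partial batch kept.

```python
import math

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one sample: -sum(t * log(p)), with p clipped to avoid log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# A confident correct prediction gives a small loss ...
loss_good = categorical_crossentropy([1, 0, 0, 0], [0.7, 0.1, 0.1, 0.1])
# ... while putting little probability on the true class gives a large one.
loss_bad = categorical_crossentropy([1, 0, 0, 0], [0.1, 0.7, 0.1, 0.1])

# Steps per epoch with the reported training-set size and batch size.
steps_per_epoch = math.ceil(120966 / 64)
```

Over 20 epochs this amounts to 20 times `steps_per_epoch` weight updates under the stated assumptions.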


Evaluation
Loss

The loss plot shows that the model did not suffer from overfitting, a phenomenon where the model becomes too specialized to the training data and performs poorly on new, unseen data.
Evaluation
Accuracy
The training accuracy reached 97.23%, while the validation accuracy attained 97.16%, a difference of only 0.07 percentage points. The test accuracy was 97.03%. The test accuracy represents the model's performance on an independent dataset that was not used during training or validation.
Evaluation
Confusion Matrix

It is noteworthy that out of a total of 4666 samples belonging to the "non-tor" category, only around 20 samples were misclassified. On the other hand, among the 1146 samples belonging to the VPN class, 1042 were correctly classified (about 90.9%), while 104 were incorrectly classified. These results indicate that the model identified the large majority of VPN samples correctly. In total, there were only 266 misclassified samples.
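The per-class figures quoted above translate directly into recall values. The short calculation below uses only the counts stated on this slide; the non-tor value is approximate because the misclassified count is given as "around 20".

```python
def recall(correct, total):
    """Fraction of a class's samples that were classified correctly."""
    return correct / total

vpn_recall = recall(1042, 1146)            # about 0.909
non_tor_recall = recall(4666 - 20, 4666)   # about 0.996, using the approximate count
vpn_errors = 1146 - 1042                   # 104, as stated
```

The slide does not state how many samples the confusion matrix covers in total, so overall accuracy is not recomputed here.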
Evaluation
Classification Report
Comparison with other Papers
Authors | Year | Traffic Attributes | Machine Learning Approaches | Technique | Features | Output
Lashkari et al. [50] | 2020 | Flow, Packet | Supervised | CNN | 23 features | Characterize darknet traffic with 86% accuracy
Alimoradi et al. [51] | 2022 | Flow, Packet | Deep Learning | DNN | 79 features | Detector classifies Tor-VPN traffic into 4 classes with an accuracy of 96%
Choorod et al. [52] | 2021 | Packet | DPI, Supervised Learning | J48, KNN, RF with 10 folds | 16 features | Detected Tor and non-Tor based on a payload with 90% accuracy
Proposed method | 2024 | Packet, Flow | Deep Learning | EfficientNetV2B0 | 72 features | Characterize darknet traffic (tor, nontor, vpn, nonvpn) with 97.03% accuracy
05
CONCLUSIONS
CONCLUSIONS

● The model's high accuracy, low misclassification rate, and successful classification of the various classes highlight its strong performance.
● The proposed method also achieves a testing accuracy of 97.03%.
● The Identity-oriented Tabular Image Transformation technique for converting CIC-Darknet2020 to images, combined with the EfficientNetV2B0 model, has proven effective in achieving high accuracy without overfitting.
THANKS!
Does anyone have
any questions?
