Research Papers
Literature Review
CIC-Darknet2020 Dataset
Machine Learning
Authors | Approach | Technique | Dataset/Features | Results | Year
Lashkari et al. [50] | Deep Learning | CNN | 23 | Characterize darknet traffic | 2020
1. Dataset Cleaning
2. Finding Correlations
3. Feature Selection
4. Identity-oriented Tabular Image Transformation
5. Dataset Split
6. Transfer Learning of EfficientNetV2B0
7. Evaluation
Methodology
Dataset Cleaning
Source, Destination Ports: Ports are non-deterministic and can be easily manipulated by malicious actors to appear innocuous. After the initial connection, services may change ports, potentially confusing classifiers and causing misclassification.
Source, Destination IP: These IP addresses are artifacts of the dataset generation process and do not represent the broader distribution of IP addresses on the internet. The focus is on classifying traffic patterns rather than individual users.
Timestamp: The classifier, designed for tabular samples, does not benefit from clustering samples based on the proximity of their timestamps.
Zero-Valued Features: Features with zero values in every sample provide no discriminatory information for classification against similar traffic.
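The cleaning rules above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the column names (`Src IP`, `Dst Port`, `Flow Duration`, and so on), the helper `clean_rows`, and the two sample rows are assumptions made for the example.

```python
# Hypothetical names for the identity and time columns described above.
DROP_COLUMNS = {"Src IP", "Dst IP", "Src Port", "Dst Port", "Timestamp"}


def clean_rows(rows, label_columns=("Label",)):
    """Drop port, IP and timestamp columns, then features that are zero in every row."""
    if not rows:
        return []
    kept = [c for c in rows[0] if c not in DROP_COLUMNS]
    # A feature carries no information when every sample holds zero for it.
    zero_valued = {
        c for c in kept
        if c not in label_columns and all(r[c] == 0 for r in rows)
    }
    return [{c: r[c] for c in kept if c not in zero_valued} for r in rows]


# Two made-up flow records, for illustration only.
rows = [
    {"Src IP": "10.0.0.1", "Dst IP": "10.0.0.2", "Src Port": 443,
     "Dst Port": 51000, "Timestamp": "t0", "Flow Duration": 120,
     "URG Flag Count": 0, "Label": "Tor"},
    {"Src IP": "10.0.0.3", "Dst IP": "10.0.0.4", "Src Port": 80,
     "Dst Port": 52000, "Timestamp": "t1", "Flow Duration": 45,
     "URG Flag Count": 0, "Label": "VPN"},
]
cleaned = clean_rows(rows)
```

In this toy input only `Flow Duration` and `Label` survive, because `URG Flag Count` is zero in both rows.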
Methodology
Finding Correlations
We divide the dataset into two distinct datasets based on their labels: the traffic dataset and the application dataset. Our objective is to analyze the correlations within each dataset. We extract the correlations between labels and features, and subsequently use this information to generate visual representations in the form of images. The correlation matrix heatmap captures the relationships between variables in the dataset, focusing on their association with the traffic type label and the application type label.
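A correlation matrix of this kind can be sketched as follows. The slides do not say which correlation coefficient is used, so Pearson correlation is an assumption here, as are the column names, the numeric label encoding, and the toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # constant column: correlation is undefined, report 0
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations between all named columns, label included."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns; "traffic_label" is a hypothetical numeric encoding of the label.
columns = {
    "flow_duration": [1.0, 2.0, 3.0, 4.0],
    "packet_count": [2.0, 4.0, 6.0, 8.0],
    "traffic_label": [0, 0, 1, 1],
}
matrix = correlation_matrix(columns)
```

The row `matrix["traffic_label"]` then holds the association of each feature with the label, which is the part of the heatmap the text focuses on.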
Figure: Correlation matrix heatmaps with the traffic type label (left) and with the application type label (right).
Methodology
Identity-oriented Tabular Image Transformation
Dataset Split
Training (85%): 120,966 images
Validation (10%): 14,439 images
Test (5%): 7,076 images
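An 85/10/5 split of this kind could be produced as in the sketch below. The slides do not say how the split was drawn, so the shuffled random split, the fixed seed, and the helper name `split_indices` are assumptions; a stratified split would be an equally plausible choice.

```python
import random


def split_indices(n_samples, train_pct=85, val_pct=10, seed=0):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 5%
    return train, val, test


train, val, test = split_indices(1000)
```

With 1000 samples this yields 850 training, 100 validation and 50 test indices, with no index shared between the three sets.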
Methodology
Transfer Learning of EfficientNetV2B0
Optimizer: Adam
Batch Size: 64
Epochs: 20
The training results show that the model did not suffer from overfitting, a phenomenon where the model becomes too specialized to the training data and performs poorly on new, unseen data.
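The training setup can be sketched with Keras as follows. Only the backbone (EfficientNetV2B0), the optimizer, the batch size and the number of epochs come from the slides. The input shape, the frozen backbone, the single dense softmax head, the loss function and the four-class output are assumptions made for this illustration. TensorFlow is imported inside the function so that the configuration can be inspected without it installed.

```python
# Hyperparameters as listed on the slide.
TRAINING_CONFIG = {"optimizer": "adam", "batch_size": 64, "epochs": 20}


def build_model(num_classes=4, input_shape=(224, 224, 3)):
    """EfficientNetV2B0 backbone with a new classification head (assumed design)."""
    import tensorflow as tf  # lazy import: only needed when a model is built

    base = tf.keras.applications.EfficientNetV2B0(
        include_top=False,    # drop the ImageNet classifier
        weights="imagenet",   # start from pretrained weights
        input_shape=input_shape,
        pooling="avg",
    )
    base.trainable = False    # assumption: backbone is frozen, only the head trains
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="sparse_categorical_crossentropy",  # assumption: integer class labels
        metrics=["accuracy"],
    )
    return model


def train(model, x_train, y_train, x_val, y_val):
    """Fit with the batch size and epoch count from the slide."""
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        batch_size=TRAINING_CONFIG["batch_size"],
        epochs=TRAINING_CONFIG["epochs"],
    )
```

Comparing the `accuracy` and `val_accuracy` curves in the returned history is the usual way to check for the overfitting discussed above.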
Evaluation
Accuracy
The training accuracy reached 97.23%, while the validation accuracy attained 97.16%. The two values differ by only 0.07 percentage points, which is consistent with the absence of overfitting. The test accuracy was 97.03%. The test accuracy represents the model's performance on an independent dataset that was not used during training or validation.
Evaluation
Confusion Matrix
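A confusion matrix and the accuracy derived from it can be computed as in this minimal sketch. The four class names follow the traffic types named in the comparison table. The label lists are made-up toy data, not results from the model.

```python
from collections import Counter

CLASSES = ["tor", "nontor", "vpn", "nonvpn"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration only.
y_true = ["tor", "tor", "vpn", "nonvpn", "nontor", "vpn"]
y_pred = ["tor", "vpn", "vpn", "nonvpn", "nontor", "vpn"]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```

Off-diagonal cells show which traffic types are confused with each other. In the toy data one `tor` sample is predicted as `vpn`.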
Choorod et al. [52] | 2021 | Packet | Supervised Learning | DPI, J48, KNN, RF with 10 folds | 16 features | Detected Tor and non-Tor based on a payload with 90% accuracy
Proposed method | 2024 | Packet, flow | Deep Learning | EfficientNetV2B0 | 72 features | Characterize darknet traffic (tor, nontor, vpn, nonvpn) with 97.03% accuracy
CONCLUSIONS