
Abstract - The growth of illegal activity and malware traffic inside the Darknet presents a critical

challenge to cybersecurity efforts. This project addresses that challenge by applying leading machine
learning methods to curb misuse of the Darknet while preserving its anonymity and
security benefits. Leveraging the CIC-Darknet2020 dataset, the research conducts binary and multi-class
classification of Darknet network traffic. The analysis employs state-of-the-art algorithms, including
Autoencoder, CNN + LSTM, and XGBoost, to characterize complex network activity. Notably,
XGBoost delivers excellent performance across both binary and multi-class classification,
achieving high accuracy, precision, recall, and F1-score. Furthermore, the study extends its analysis by
introducing ensemble methods such as the Voting Classifier and Stacking Classifier, which improve
predictive accuracy by combining diverse base estimators. By combining the Autoencoder with XGBoost,
alongside investigating the CNN + LSTM architecture, the project strengthens model robustness. These
hybrid approaches leverage the strengths of several algorithms, yielding performance metrics superior
to those of the individual models. Through this comprehensive approach, the project aims to develop a
nuanced understanding of Darknet traffic, paving the way for more resilient protection measures on the
web. By improving the efficiency of Darknet traffic categorization, the project contributes to combating
unlawful activity and malware traffic inside the Darknet environment.
Index Terms – DarkNet Traffic, XGBoost, LSTM, Ensemble Learning, AutoEncoders, Voting Classifier, Stacking Classifier

INTRODUCTION:
In the ever-evolving field of cybersecurity, the detection and classification of Darknet traffic remain
difficult challenges. The Darknet is a complex and elusive environment that, owing to its clandestine
nature and association with illegitimate activity, demands advanced approaches for effective analysis
and mitigation. Traditional approaches to Darknet traffic analysis frequently struggle to unravel its
complexities, motivating the exploration of innovative methods to improve accuracy and
efficiency.
This research embarks on a methodical exploration, seeking to address the challenges of Darknet traffic
analysis by leveraging machine learning techniques. Specifically, the study first investigates the
use of Autoencoders, a prominent class of artificial neural network. Autoencoders, known
for their ability to identify complex patterns and structures in datasets, offer a feasible path toward
improving the accuracy and efficiency of Darknet traffic evaluation. By exploiting the Autoencoder's
inherent capacity to encode and decode incoming data, investigators can detect subtle
deviations from typical network behavior, allowing harmful conduct to be labeled more precisely and
efficiently.
After evaluating several models and comparing which delivers the best results, the research
pivots towards alternative machine learning models better able to capture the complexities of Darknet
traffic. Among these models, Convolutional Neural Networks (CNNs) emerge as
a promising candidate owing to their ability to extract features from both spatial and temporal
dimensions. Additionally, Long Short-Term Memory (LSTM) networks are employed to process sequential
data and capture long-range dependencies between time steps. Combining the benefits of CNNs and LSTMs,
a hybrid CNN+LSTM network is developed to extract significant insights from the training data.
Furthermore, the study includes XGBoost, a dependable machine learning method based on
gradient-boosted decision trees. Known for its ability to strengthen machine learning models,
XGBoost offers valuable insight into the underlying patterns within Darknet traffic data.
In addition to investigating individual models, ensemble methods such as the Voting Classifier and Stacking
Classifier are used to harness the collective predictive capacity of multiple classifiers. By leveraging the
outputs of several classification models, these ensembles improve the overall accuracy and
dependability of Darknet traffic categorization.
Ultimately, this study aims to contribute to ongoing efforts to fortify cybersecurity by using a nuanced
approach that integrates the strengths of various machine learning methods. By untangling the
complexities of Darknet traffic and bolstering defense systems against cyber-attacks, the research seeks
to improve the resilience of cybersecurity infrastructure and safeguard against unlawful activities inside
the Darknet environment.
1 RELATED WORK
Muhammad Bilal Sarwar proposed a deep learning classifier based on convolutional neural networks and a
sophisticated, optimized machine learning algorithm for darknet traffic detection and categorization. The research was
conducted on a state-of-the-art dataset comprising eight categories of network packets. XGBoost proved the most
accurate classifier, with an average F-score of 85% [1].

In paper [2], the authors built a framework for utilizing darknet traffic to identify threats. Machine Learning
algorithms created a pattern that can identify both known and undiscovered threats of the same type by extracting
the appropriate network traffic features of the threats from the darknet data.

The paper [3] presents a novel approach called DeepImage, which feeds a two-dimensional convolutional neural
network with the gray image created by selecting the most significant features to detect and describe darknet
activity. An 86% accurate darknet traffic characterization is achieved by merging two encrypted traffic datasets to
produce a darknet dataset for the evaluation of the suggested method[3].

To detect evidence of illicit activity on the darknet, the authors investigated the automatic classification of Tor
Darknet images using Semantic Attention Keypoint Filtering, a technique that combines saliency maps with a Bag of
Visual Words (BoVW) to filter non-significant features at the pixel level that do not belong to the object of interest.
Using dense SIFT descriptors, they assessed SAKF on a customized Tor picture dataset against CNN features,
including MobileNet v1, Resnet50, and BoVW[4].

In the paper[5], authors proposed a unique feed-forward back-propagation-based deep neural network detection
system to identify various application layer DDoS attacks effectively. The suggested neural network architecture
achieved an accuracy of 98%.

The authors of the paper [6] proposed a stacking ensemble learning classifier for the analysis and classification of
darknet traffic. This novel approach makes use of predictions generated by three base learning techniques to address
darknet attack issues. In the training and testing phases, the classifiers achieved accuracies of more than 99% and
97%, respectively.

The authors of the paper [7] proposed a community detection-based approach to make it easier to analyze massive
volumes of darknet data. The proposed system automatically detects and isolates vertical, focused, and horizontal
scans.

The authors of the paper[8] concluded that texts on the Darknet, whether legal or illicit, can be distinguished not
only by the words they include but also by the superficial syntactic structure in the distribution of POS tags.

Paper [9] proposed an efficient automated Darknet traffic detection system. The system achieved an accuracy of
99.5% with a bagging decision tree ensemble.

Paper [10] proposed a DarkMor framework for detecting traffic on the dark web. Feature fusion model and traffic
perception model make up DarkMor's key components. It outperforms cross-modal algorithms, achieving an
accuracy of 97.78%.
Paper [11] proposes CDBC, a between-class learning algorithm based on the Chebyshev distance, and presents
a darknet detection framework built on it. The study uses two datasets and
eleven different classifier types tested both in CDBC and non-CDBC contexts. The experimental findings
demonstrate that an accuracy of 99.99% can be achieved using CDBC.

The paper[12] provides an extensive analysis of the existing methods for classifying encrypted network traffic
within the darknet and anonymous traffic. This research also provides an in-depth analysis of darknet traffic
approaches that employ machine learning (ML) techniques to track and identify traffic attacks within the darknet.

To enhance the detection of darknet traffic, this study[13] evaluates the classification of such traffic and the
underlying application types using Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural
Networks (CNN), and Auxiliary-Classifier Generative Adversarial Networks (AC-GAN). Using the CIC-
Darknet2020 dataset, the authors find that the RF model performs better than the cutting-edge machine learning
methods utilized in previous work[13].

Within this context, the study [14] aims to evaluate the best prediction models through the analysis of dark web
traffic. The results demonstrated the significance of feature selection. The accuracy of the model can occasionally be
improved by selecting the appropriate features.

Paper [15] recommends activation and loss functions that may be useful to defenders. Using the CICIDS
2017 dataset, the effect of these functions is evaluated with an SVM-RBF classifier.

2. PROPOSED MODELLING

In this work, our primary focus is feature selection from a dataset of roughly 85 columns, aiming to identify the
subset of features that contributes most to achieving the highest classification accuracy. To this end, we used
the Recursive Feature Elimination with Cross-Validation (RFECV) method together with the XGBoost
algorithm. Through RFECV, we iteratively identified and selected the most relevant features according to their
contribution to the model's performance, ultimately reducing the dimensionality of the dataset to a more manageable
size. After feature selection, we proceeded to evaluate various machine learning models to determine the most
suitable approach for classifying the data. Our assessment covered the Autoencoder-CNN-LSTM, CNN-LSTM,
XGBoost, Voting Classifier, and Stacking Classifier.

A. Data-Set Collection
We made use of the CIC-Darknet2020 dataset, comprising approximately 140,000 rows and 85 columns,
which is an excellent resource for cybersecurity study and analysis. It uses a two-layered approach to
represent both benign and darknet traffic.
Traffic originates at the first layer, which generates both benign and darknet flows. Benign
traffic represents ordinary network activity, while darknet traffic covers more suspicious or
malicious behavior.
The following figures show the number of samples of benign and Darknet traffic at the first layer, as well
as the number of encrypted flows in our Darknet traffic.
Figure 1 Number of samples of both benign and Darknet traffic.

Figure 2 Number of encrypted flows in Darknet traffic.

Darknet traffic is further classified into various types, each representing a different application or
communication protocol.

Table 1 Details of Darknet Network Traffic


Each of these categories simulates specific network behaviors and communication protocols, allowing
researchers and cybersecurity experts to study and analyze them separately or together.

B. Normalization

Normalization is implemented using the Min-Max scaling method, which adjusts the attributes in the range of 0 and
1 by leveraging minimum and maximum values of each feature.
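As a concrete sketch, Min-Max scaling maps each feature column onto [0, 1] using that column's own minimum and maximum (a NumPy illustration; scikit-learn's MinMaxScaler implements the same transform):

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X into the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
X_scaled = min_max_scale(X)  # every column now spans exactly [0, 1]
```

Scaling both columns to a common range prevents features with large raw magnitudes (e.g. byte counts) from dominating features with small ones (e.g. flag counts).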

C. Feature Selection

Recursive Feature Elimination with Cross-Validation (RFECV) is used for feature selection. It recursively
removes features from the dataset and selects the optimal subset of features.

The initial step in RFECV is to train a machine learning model on the entire feature set and rank the significance of
each feature. Then, the least significant features are removed and the model is retrained on the reduced set. The
process is repeated recursively until the optimal subset of features is obtained.

Cross-validation is employed during each iteration of feature elimination to assess the model's performance on
various subsets of the data, guaranteeing that the feature selection process is robust and does not rely on a single
train-test split.

RFECV is used with a 10-fold cross-validation strategy and an XGBoost classifier as the estimator.
XGBoost provides a means to estimate feature importance based on how often features are used in making
decisions across all the ensemble’s trees. By evaluating the significance of each feature, RFECV can identify the
features that boost the model’s predictive performance and those that may be removed without significantly
affecting the model’s performance.

The number of features chosen by RFECV is decided based on accuracy and other performance metrics. Eventually,
the chosen feature subset is utilized for analysis and modeling.
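The RFECV procedure described above can be sketched with scikit-learn. The gradient-boosted estimator below is a stand-in for XGBoost (the xgboost package exposes the same `feature_importances_` interface that RFECV uses for ranking), and the synthetic data merely mimics the shape of the problem:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the ~85-column flow dataset (shapes are illustrative).
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           random_state=42)

# 10-fold CV as described above; GradientBoostingClassifier stands in for
# XGBoost, whose feature_importances_ drive the per-iteration ranking.
selector = RFECV(estimator=GradientBoostingClassifier(n_estimators=25,
                                                      random_state=42),
                 step=1, cv=StratifiedKFold(n_splits=10), scoring="accuracy")
selector.fit(X, y)

X_reduced = selector.transform(X)   # keep only the selected feature columns
```

With the real dataset, `XGBClassifier()` can be dropped in as the estimator without changing anything else, since RFECV only requires an estimator that exposes feature importances.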

3. METHODOLOGY

An autoencoder with a CNN-LSTM architecture is used for anomaly detection. The choice of a
CNN-LSTM autoencoder is motivated by the characteristics of the input data. This
architecture is suited to processing sequential data with both spatial and temporal structure,
which is common in many real-world applications such as time-series data. The encoder’s CNN layers can
efficiently capture spatial dependencies in the input data, while the decoder’s LSTM layers can model temporal
dependencies over time. Although the CNN-LSTM architecture is a powerful tool for capturing complex patterns in
sequential data, the best choice ultimately depends on the details of the dataset and the problem at hand.
Experimentation with various architectures and hyperparameters is necessary to determine the optimal model for a
given task.

The encoder part of the autoencoder is a stack of dense layers with different activation
functions, including 'tanh' and 'relu'. These layers progressively reduce the dimensionality of the input data,
compressing it into a latent representation with fewer dimensions. The encoder layers gradually extract
features from the input data and capture its latent patterns. Batch normalization is applied after every dense layer to
standardize the inputs to the next layer and stabilize training. The encoder culminates in a layer with
7 units, representing the compressed latent space. This layer serves as the bottleneck of the autoencoder, forcing it to
learn a compressed representation of the input data.

The decoder part of the autoencoder aims to reconstruct the original input data from the compressed latent
representation produced by the encoder. The decoder layers mirror the structure of the encoder layers in reverse
order. Each dense layer in the decoder attempts to rebuild the original input from the compressed form. By learning
to reconstruct the input data, the autoencoder is incentivized to capture the most salient features of the data while
discarding unnecessary information.

The important applications of autoencoders include unsupervised learning tasks like anomaly detection and
dimensionality reduction. They aim to reconstruct input data from compressed latent space by learning to represent
the most significant features of the data.
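As an illustration of the reconstruction-error principle behind autoencoder anomaly detection, the sketch below trains a small dense autoencoder with a 7-unit bottleneck using scikit-learn's MLPRegressor trained to reproduce its input. The paper's Keras Conv/LSTM autoencoder follows the same idea; the synthetic "flows" here are only a stand-in:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 20))
X_normal = rng.normal(size=(600, 5)) @ W      # benign flows: 5-dim latent structure
X_anomaly = rng.normal(size=(30, 20)) * 3.0   # anomalous flows: no shared structure

# Dense autoencoder trained to reproduce its input; the 7-unit middle layer
# is the bottleneck, mirroring the encoder design described above.
ae = MLPRegressor(hidden_layer_sizes=(32, 7, 32), activation="relu",
                  max_iter=1000, random_state=0)
ae.fit(X_normal, X_normal)

def reconstruction_error(X):
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

err_normal = reconstruction_error(X_normal).mean()
err_anomaly = reconstruction_error(X_anomaly).mean()
# flows the autoencoder never learned to compress reconstruct poorly,
# so a threshold on the reconstruction error flags anomalies
```

In practice the threshold is chosen from the error distribution on held-out benign traffic, e.g. a high percentile of the benign reconstruction errors.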

CNNs can capture spatial patterns in sequential data, whereas LSTMs excel at modelling temporal
relationships. The CNN-LSTM model combines these two designs to exploit the benefits of both methods.
CNNs are widely used for tasks involving image or sequential data because of their ability to extract features
and learn the underlying patterns of the input data.

LSTMs, on the other hand, are capable of identifying temporal patterns and long-range dependencies, making them
well suited to sequential data processing tasks where the order of the input matters. When the input data has both
spatial and temporal relationships, the CNN-LSTM architecture performs particularly well.

The CNN-LSTM model begins with a series of convolutional layers (Conv1D) that extract relevant features
from the input data. These convolutional layers are good at identifying spatial patterns in the sequential data. The
feature maps produced by the convolutional layers are then down-sampled using max-pooling layers
(MaxPooling1D), which lowers their dimensionality while keeping crucial information. After the convolutional
layers, LSTM layers model the temporal dependencies in the data. Sequential data analysis is a good fit for LSTM
units because of their capacity to remember information over long sequences and to grasp long-range dependencies.
The LSTM layers process the temporal components of the data, allowing the model to learn intricate patterns and
relationships over time. Finally, classification based on the extracted features is done using fully
connected layers (Dense) with dropout regularization.
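The shape transformations performed by the convolution and max-pooling stages can be illustrated in plain NumPy (a sketch of the tensor shapes only, not the trained Keras model; kernel size and filter count are illustrative assumptions):

```python
import numpy as np

def conv1d(x, kernels):
    """Valid 1-D convolution. x: (timesteps, channels); kernels: (k, channels, filters)."""
    k, _, f = kernels.shape
    t = x.shape[0] - k + 1
    out = np.empty((t, f))
    for i in range(t):
        # Each output step is the kernel applied to a k-step window of the input.
        out[i] = np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU activation

def max_pool1d(x, size=2):
    """Down-sample by taking the max over non-overlapping windows of `size` steps."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=(40, 8))                  # one flow: 40 time steps, 8 features
h = conv1d(x, rng.normal(size=(3, 8, 16)))    # -> (38, 16) feature maps
h = max_pool1d(h)                             # -> (19, 16) after down-sampling
# h would then feed the LSTM layers, which consume the 19-step sequence
```

The pooling halves the sequence length while each step keeps the strongest local activation, which is the "lower dimensionality while keeping crucial information" trade-off described above.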

Figure 3 CNN+LSTM plot of accuracy vs epoch


Figure 4 CNN+LSTM Plot of loss vs epoch

The Voting Classifier is an ensemble learning technique for classification tasks. It predicts either the class
that receives the most votes across the individual classifiers (the mode, in hard voting) or the class with the
highest average probability (in soft voting). In this study, the Random Forest Classifier and the Decision
Tree Classifier are the two base classifiers whose predictions are combined to form the Voting
Classifier. The Random Forest classifier is founded on the bagging method of ensemble learning: during
training it builds several decision trees and outputs the average prediction (regression) or the mode of the classes
(classification) across the individual trees.

Figure 5 Workflow Diagram of proposed methodology


The Random Forest Classifier is a well-known algorithm for complicated classification jobs because of its resilience
and capacity to handle high-dimensional data with many features. Decision trees, which recursively divide the
feature space into smaller regions according to feature values, are straightforward yet effective classification
algorithms, well known for their interpretability and their capacity to capture intricate relationships within the
data. Decision trees, particularly deep ones, are nevertheless vulnerable to overfitting. Including the Decision Tree
Classifier makes the ensemble in the Voting Classifier more diverse, and it functions as a complementary model to
the Random Forest Classifier.

The final prediction made by the Voting Classifier is either the weighted average of probabilities (soft voting) or the
mode of the class labels (hard voting), incorporating the predictions of the two classifiers. In this instance, the
choice of "soft" voting means that the probability scores produced by each base classifier form the basis of the
final prediction. These two models were selected for the Voting Classifier because of their complementary strengths:
the Random Forest Classifier offers stability and resilience through its ensemble approach, while the Decision Tree
Classifier captures finer details and supplements the ensemble through its capacity to model complicated decision
boundaries. By combining these models, the Voting Classifier aims to reduce overfitting and boost generalization
performance.
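A minimal soft-voting sketch with the two base models named above, using scikit-learn on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the darknet flow data (shapes are illustrative).
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "soft" voting averages the class-probability scores of the base models.
voting = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="soft")
voting.fit(X_tr, y_tr)
acc = voting.score(X_te, y_te)
```

Switching `voting="hard"` would instead take the mode of the predicted labels, as described above.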

The Stacking Classifier, an effective ensemble model, combines the predictions of several base classifiers to boost
predictive performance. Although its benefits include adaptability, model variety, and meta-knowledge, whether it
is the best choice depends on the dataset's characteristics. RandomForest is a powerful ensemble learning method
based on decision trees, known for its ability to manage complex data interactions. It serves as one of the base
models because it performs well in classification tasks and is less likely to overfit. The MLPClassifier represents
a Multi-Layer Perceptron, a type of artificial neural network. Neural networks with deep architectures such as
the MLP can discover difficult connections and learn complex patterns in data. The
MLPClassifier is included as a base model to provide a different view of the data and to capture nonlinearities that
other models might miss.

While the RandomForest method partitions the feature space using decision rules, the MLPClassifier learns complex
hierarchical representations of the data. By merging these two different paradigms, the Stacking Classifier can
recognize a wider variety of patterns and relationships in the input data.
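A corresponding sketch of the stacking ensemble, again on synthetic stand-in data. The logistic-regression meta-learner is an assumption (it is scikit-learn's default final estimator); the text above names only the two base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the darknet flow data (shapes are illustrative).
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The meta-learner is trained on the base models' out-of-fold predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("mlp", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

Unlike voting, which simply averages the base predictions, stacking learns how to weight them from cross-validated predictions, which is the "meta-knowledge" referred to above.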

4. RESULTS AND DISCUSSIONS


The efficiency of each model is assessed using metrics like accuracy, precision, F1-score and recall.

Table 2 Comparison Table


Accuracy: The fraction of the model’s predictions that are correct is called accuracy. It is usually
expressed as a percentage.
Accuracy = (TP+TN)/(TP+TN+FP+FN)
Where TP stands for True Positives, TN for True Negatives, FP for False Positives, and FN for False
Negatives.
Figure 6 Accuracy Comparison
Precision: It is the fraction of instances predicted positive that are actually positive. This metric matters
most when the cost of a false positive is high.
Precision = TP/(TP+FP)

Figure 7 Precision Comparison


Recall: It measures the capacity of a model to recognize all positive samples correctly. A high recall
signifies that the model is adept at identifying positive samples, while a low recall implies that the
model is missing some positive samples.
Recall = TP/(TP+FN)

Figure 8 Recall Comparison


F1 Score: The F1 Score is used for binary classification tasks and is the harmonic mean of recall and
precision. It considers both recall and precision, giving a balanced assessment of the model’s performance.
F1 Score = 2* (Precision * Recall)/(Precision + Recall)
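The four formulas above can be collected into a small helper; the confusion-matrix counts below are hypothetical, purely for illustration, and are not the paper's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only:
acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
# acc = 0.85, prec ≈ 0.889, rec = 0.80, f1 ≈ 0.842
```

Note how the example trades off the two components: precision is pulled down by the 10 false positives, recall by the 20 false negatives, and F1 sits between them.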

Conclusion
The proposed system presents a progressive architecture for Darknet traffic categorization, addressing the
difficult challenges raised by illegal web activity. By combining binary and multi-class
classification and using advanced algorithms such as Autoencoder, CNN + LSTM, and XGBoost, the system
achieves a comprehensive characterization of complicated network traffic. The introduction of ensemble techniques
such as the Voting Classifier and Stacking Classifier reinforces predictive accuracy, while the combination of
Autoencoder and XGBoost, alongside the investigation of CNN + LSTM, improves model robustness. In
addition to advancing the Darknet traffic field, this hybrid approach also offers a better
understanding of network dynamics. The system's potential to enable more resilient protection measures
marks an important step towards reducing the misuse of the Darknet in the ever-developing
cybersecurity field.
The future scope of this research lies in improving the proposed Darknet traffic classification
scheme. Exploring emerging deep learning architectures and continuously updating the dataset to track
evolving Darknet techniques could improve accuracy. A proactive defense against Darknet activity
could be achieved by integrating threat intelligence and collaborating with cybersecurity organizations.
Additionally, the system's resilience to evolving cyber threats could be fortified by adopting decentralized
techniques and by integrating anomaly detection techniques.

REFERENCES
[1] M. B. Sarwar, M. K. Hanif, R. Talib, M. Younas and M. U. Sarwar, "DarkDetect: Darknet Traffic Detection and
Categorization Using Modified Convolution-Long Short-Term Memory," in IEEE Access, vol. 9, pp. 113705-
113713, 2021, doi: 10.1109/ACCESS.2021.3105000.


[2] S. Kumar, H. Vranken, J. v. Dijk and T. Hamalainen, "Deep in the Dark: A Novel Threat Detection System using
Darknet Traffic," 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp.
4273-4279, doi: 10.1109/BigData47090.2019.9006374.

[3] Habibi Lashkari, Arash, Gurdip Kaur and Abir Rahali. “DIDarknet: A Contemporary Approach to Detect and
Characterize the Darknet Traffic using Deep Image Learning.” Proceedings of the 2020 10th International
Conference on Communication and Network Security (2020): n. pag.

[4] https://doi.org/10.48550/arXiv.2005.10086
[5] Muhammad Asad, Muhammad Asim, Talha Javed, Mirza O Beg, Hasan Mujtaba, Sohail Abbas, DeepDetect:
Detection of Distributed Denial of Service Attacks Using Deep Learning, The Computer Journal, Volume 63, Issue
7, July 2020, Pages 983–994, https://doi.org/10.1093/comjnl/bxz064

[6] Almomani, A. Darknet traffic analysis, and classification system based on modified stacking ensemble learning
algorithms. Inf Syst E-Bus Manage (2023). https://doi.org/10.1007/s10257-023-00626-2

[7] F. Soro, M. Allegretta, M. Mellia, I. Drago and L. M. Bertholdo, "Sensing the Noise: Uncovering Communities
in Darknet Traffic," 2020 Mediterranean Communication and Computer Networking Conference (MedComNet),
Arona, Italy, 2020, pp. 1-8, doi: 10.1109/MedComNet49392.2020.9191555.

[8] Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, and Omri Abend. 2019. The Language of Legal
and Illegal Activity on the Darknet. In Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics, pages 4271–4279, Florence, Italy. Association for Computational Linguistics.

[9] Abu Al-Haija Q, Krichen M, Abu Elhaija W. Machine-Learning-Based Darknet Traffic Detection System for IoT
Applications. Electronics. 2022; 11(4):556. https://doi.org/10.3390/electronics11040556.

[10] Jin, Yang and Liang, Weiheng and Wang, Xin and Li, Siyu and Jiang, Xinyun and Mu, Yufei and Zeng,
Shunyang, Darkmor: A Framework for Dark Web Traffic Detection that Integrates Local Feature and Spatial
Characteristics. Available at SSRN: https://ssrn.com/abstract=4648074 or http://dx.doi.org/10.2139/ssrn.4648074

[11] Song B, Chang Y, Liao M, Wang Y, Chen J, Wang N. CDBC: A novel data enhancement method based on
improved between-class learning for darknet detection. Math Biosci Eng. 2023 Jul 12;20(8):14959-14977. doi:
10.3934/mbe.2023670. PMID: 37679167.

[12] arXiv:2311.16276 [cs.CR]

[13] arXiv:2206.06371 [cs.LG]

[14] Allhusen, Andrew and Alsmadi, Izzat and Wahbeh, Abdullah and Al-Ramahi, Mohammad and Al-Omari,
Ahmad, Dark Web Analytics : A Comparative Study of Feature Selection and Prediction Algorithms (October 25,
2021). Available at SSRN: https://ssrn.com/abstract=3949786 or http://dx.doi.org/10.2139/ssrn.3949786

[15] N. Nirmalajyothi, K. G. Rao, B. Bobba and K. Swathi, “Performance Analysis of Different Activation and Loss
Functions of Stacked Autoencoder for Dimension Reduction for NIDS on Cloud Environment,” International
Journal of Engineering Trends and Technology, vol. 69(4), pp. 169-176.

[16] Narisetty, Nirmalajyothi, et al. "Hybrid Intrusion Detection Method Based on Constraints Optimized SAE and
Grid Search Based SVM-RBF on Cloud." International Journal of Computer Networks and Applications 8.6
(2021): 776-787.
