
Efficient Network Traffic Analysis using Machine Learning and Data Mining:

Abstract:
Recent improvements in smart devices have led to an explosion in the amount of data being
created, together with a heterogeneity that calls for new network solutions to better understand
and analyse traffic. These solutions need to be smart and scalable so that they can handle huge
amounts of data automatically. As high-performance computing (HPC) keeps improving,
machine learning (ML) is becoming a more practical way to solve hard problems, and it has
proven effective in a number of fields. At the same time, both industry and academia have
shown great interest in network slicing (NS) because of the need to meet a wide range of service
requirements. The use of machine learning in NS management is therefore an interesting
subject. This work focuses on network data analysis with the goal of dividing the network into
slices based on traffic behaviour.

The K-means clustering method is used to better understand and separate traffic behaviours.
The results showed a strong relationship between instances that unsupervised learning placed in
the same cluster. With the help of network function virtualization, this approach can be applied
in real-world settings.

Introduction:

As smart devices become more common, networks are becoming more dynamic and diverse than
ever before. So, network operators are rushing to come up with new ways to run their networks.
Because of this, it is very hard to build a network architecture that can handle the different types
of devices and make the best use of resources.[1]

The study and prediction of network traffic apply to a wide range of fields and have attracted
growing research interest. Experiments of various kinds are carried out, and their results are
examined to find problems in computer network applications that are already in use.
Analysis and forecasting of network traffic is a preventive step that can be taken to make sure
that communication over a network is kept private, reliable, and of high quality. Several
methods, such as those based on neural networks and those based on data mining, have been
proposed and are currently being tested to see how well they work. In a similar way, other
models, both linear and nonlinear, have been proposed to predict network traffic. Different and
interesting combinations of network analysis and prediction algorithms have been used to get
results that are both useful and efficient. [2]

Modern Networks-on-Chip (NoC) are no longer judged only by their ability to scale. Instead,
they are judged by how well they deliver high performance, Quality-of-Service (QoS), and flow
isolation at the lowest possible cost. Even though traditional architectures that support Virtual
Channels (VC) have mechanisms for flow partitioning and isolation, an adversarial workload can
still interfere with and degrade the performance of other workloads, even when it runs in a
different set of VCs from them. A VC can still be disrupted if an adversarial workload is
aggressive enough. [3]

A multi-layered satellite network (MLSN) is a promising way to make broadband
communication available in every part of the world. To make the best use of the network
resources available to the MLSN, traffic must be distributed evenly across all of the satellite
layers. The proposed routing approach uses a traffic distribution model to determine how best to
spread the load across the network. This model is based on an estimate of the network's capacity
and a theoretical analysis of how often each layer becomes congested. The load-balancing
scheme of the proposed routing approach was derived from this model. The routing method was
evaluated extensively through computer simulations, and the results show that the traffic
distribution model describes how traffic moves through the MLSN with sufficient accuracy. [4]

Architecture:

Network-slicing based architecture:

Network slicing is one of the most common ways to make better use of existing resources. It has
been proposed as an alternative to the "one size fits all" principle, with the goal of sharing the
same physical network infrastructure among multiple services with different needs. As a result,
network slicing lowers both capital and operational expenses.[3]

According to the International Telecommunication Union (ITU), network slicing is "considered
to be a key enabler for a wide range of services". Its benefits include support for multiple tenants
and customizable slices for different services, each of which has different quality-of-service and
resource requirements. [3]

Figure 1 Network slicing using behavior clustering

K-means

The K-means algorithm is one of the earliest and most widely used methods for partitioning data.
It is well known for clustering large data sets efficiently. Its goal is to group the data into K
clusters according to a similarity metric. In other words, instances within the same cluster are
highly similar to one another and dissimilar to instances in other clusters. Every cluster is
represented by its center (centroid). This study uses the Euclidean distance as the similarity
criterion. [4]
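
The assignment and update steps of K-means can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `kmeans`, the random initialisation, the fixed seed, and the two flow features (mean packet size and duration) are all hypothetical choices made for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k points drawn at random from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical flow features: (mean packet size in bytes, duration in seconds).
flows = [(60.0, 1.0), (64.0, 1.2), (58.0, 0.9),
         (1400.0, 30.0), (1420.0, 28.0), (1380.0, 31.0)]
labels, centroids = kmeans(flows, k=2)
print(labels)
```

On this toy input the three small, short flows end up in one cluster and the three large, long flows in the other. In practice the features would normally be scaled first, since Euclidean distance is dominated by the attribute with the largest range.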

Figure 2 k value vs Davies Bouldin Score

Figure 3 k value vs silhouette coefficient

Decision Tree Algorithm:

The decision tree algorithm belongs to the family of supervised machine learning algorithms. It
can be used for both classification and regression. Its goal is to build a model that predicts the
value of a target variable with high accuracy. To do this, it represents the problem as a tree: each
internal node corresponds to a test on an attribute, and each leaf node corresponds to a class
label.[5]
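
The tree representation described above can be illustrated with a small hand-built tree. This sketch only shows how a prediction walks from the root to a leaf; it does not learn the tree from data. The class names `Node` and `Leaf`, the feature names, the thresholds, and the class labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label predicted at this leaf


@dataclass
class Node:
    feature: str      # attribute tested at this internal node
    threshold: float  # go left if value <= threshold, otherwise right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict(tree, sample):
    """Walk from the root to a leaf and return that leaf's class label."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hand-built, hypothetical tree; a real one would be learned from labelled data.
tree = Node(
    feature="mean_packet_bytes", threshold=500.0,
    left=Leaf("interactive"),
    right=Node(feature="duration_s", threshold=10.0,
               left=Leaf("web"), right=Leaf("bulk_transfer")),
)

print(predict(tree, {"mean_packet_bytes": 80.0, "duration_s": 2.0}))
print(predict(tree, {"mean_packet_bytes": 1400.0, "duration_s": 45.0}))
```

A learning algorithm would choose the attribute and threshold at each internal node automatically, typically by picking the split that best separates the class labels in the training data.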

Figure 4 Decision Tree Algorithm

SVM (Support Vector Machine) Algorithm:

The Support Vector Machine (SVM) is a supervised machine learning method that can be used
for both classification and regression, although it is better suited to classification. The objective
of SVM is to find a hyperplane in an N-dimensional space that distinctly classifies the data
points. The dimension of the hyperplane depends on the number of features used. With two
input features, the hyperplane is simply a straight line. With three input features, it becomes a
two-dimensional plane. With more than three features, it becomes difficult to visualise. [6]
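
For the two-feature case, where the hyperplane is a straight line, the decision rule can be sketched as follows. The weights `w` and offset `b` are hand-picked, hypothetical values used only to show how a point is assigned to a side of the hyperplane; finding the hyperplane from training data is the part that SVM training solves and is not shown here.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical parameters for two input features:
# the hyperplane x1 + x2 - 3 = 0 is a straight line in the plane.
w, b = (1.0, 1.0), -3.0

print(classify(w, b, (3.0, 2.0)))  # point on the positive side of the line
print(classify(w, b, (0.5, 1.0)))  # point on the negative side of the line
```

The same two functions work unchanged for any number of features, because the hyperplane is always described by one weight per feature plus an offset.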

Figure 5 Support Vector Machine Algorithm

Davies-Bouldin Index

The fundamental idea of cluster validity is to minimize the distance between items within a
cluster and to maximize the distance between clusters. In this work, the Davies-Bouldin Index
(DBI) was used to determine the optimal number of clusters by assessing both the within-cluster
and the between-cluster distances.
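
The index can be computed directly from its standard definition: for each cluster, take the worst ratio of combined within-cluster scatter to centroid distance against any other cluster, then average over clusters. The sketch below assumes Euclidean distance, at least two clusters, and distinct centroids; the function names and the toy data are hypothetical.

```python
import math


def centroid(cluster):
    """Mean point of a non-empty cluster of equal-length tuples."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def davies_bouldin(clusters):
    """clusters: list of at least two clusters, each a non-empty list of points."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from a cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k


well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_together = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
print(davies_bouldin(well_separated))
print(davies_bouldin(close_together))
```

Lower values indicate compact, well-separated clusters, so the well-separated partition scores lower than the one whose clusters sit close together. To choose the number of clusters, the index is computed for several values of k and the k with the lowest score is preferred.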

Silhouette coefficient

The silhouette coefficient (SC) measures how similar an item is to its own cluster (cohesion)
compared with other clusters (separation). For each item, it computes the mean intra-cluster
distance as well as the mean distance to the nearest other cluster. In contrast to DBI, where lower
values are better, clustering quality improves as SC approaches 1. SC ranges from -1 to 1. A
negative value indicates that items have probably been assigned to the wrong cluster; a value
near zero indicates that the clusters overlap; and a value near one indicates that the clusters are
compact and well separated from one another.
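
The per-item computation can be sketched as follows, using the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the item's own cluster and b is the mean distance to the members of the nearest other cluster. The function name, the Euclidean distance, the zero score for singleton clusters, and the toy data are assumptions made for this illustration.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # groups distant points together
print(silhouette_coefficient(points, good_labels))
print(silhouette_coefficient(points, bad_labels))
```

The sensible grouping yields a score close to 1, while the grouping that pairs distant points yields a negative score, matching the interpretation given above.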

Conclusion:

This research was done to show that machine learning and traffic analysis can work together,
especially to define network slicing. This study shows how useful machine learning can be for
analysing traffic patterns and, as a result, for making intelligent network slices.

Using the chosen attributes, an experiment was carried out to examine the resulting clusters,
which are also called "future slices," in order to find useful behaviours. The aim is to make
operations such as network slicing and resource management easier. The attributes fall into three
groups: application-related features, time-related features, and bandwidth-related features.

References:

[1] X. Shen et al., "AI-assisted network-slicing based next-generation wireless networks,"
IEEE Open Journal of Vehicular Technology, vol. 1, pp. 45-66, 2020.
[2] M. Joshi and T. H. Hadi, "A review of network traffic analysis and prediction
techniques," arXiv preprint arXiv:1507.05722, 2015.
[3] O. Aouedi, K. Piamrat, S. Hamma, and J. K. Perera, "Network Traffic Analysis using
Machine Learning: an unsupervised approach to understand and slice your network,"
Annals of Telecommunications, pp. 1-13, 2021.
[4] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing
Surveys (CSUR), vol. 31, no. 3, pp. 264-323, 1999.
[5] "Decision tree algorithm for classification: Machine learning 101," Analytics Vidhya,
Feb. 25, 2021.
https://www.analyticsvidhya.com/blog/2021/02/machine-learning-101-decision-tree-algorithm-for-classification/
[6] "Support vector machine algorithm," GeeksforGeeks, Jan. 20, 2021.
https://www.geeksforgeeks.org/support-vector-machine-algorithm/
