Network Traffic Identification Using Deep Learning: Anand G. Jitendra K. Darren L. Gokulnath M

Network Traffic Identification using Deep Learning
Anand G.1 Jitendra K.1 Darren L.1 Gokulnath M.1

1 Guide:
Prof. (Dr) Narendra Shekokar
Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering
April 22, 2019

Acknowledgement
First and foremost, we would like to thank our guide Prof. (Dr.)
Narendra Shekokar for his timely guidance and suggestions, without
which this project would not have seen completion.

Outline
Introduction and problem statement
Shortcomings of current methods
Proposed Method
Examples
Datasets
PCAP files used
Architecture
Results
Summary

Introduction
In recent years, reliance on the Internet for delivering services has
surged. It has therefore become important for both ISPs and their
clients to know what type of data is being sent over the network.
Traffic classification refers to the task of identifying the types
and quantities of packets flowing across the network.
Given a network flow, we have to determine how much of the data
belongs to each class. Examples of classes include HTTPS, SMTP,
VoIP, VPN, non-VPN, etc.

Shortcomings of current methods
A naive way to solve this problem is port-based classification.
However, this simple approach has become less effective because
applications can change their port numbers.
Signature-based methods match a sequence of bytes or properties
against a known signature. However, this works only for known
protocols, which limits its use.
Another approach is deep packet inspection. This has its own
issues: breach of privacy, high time consumption, and failure on
encrypted data.
As malicious content often tries to obscure its intent, it is
difficult to analyse without a thorough statistical analysis.
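The weakness of the port-based approach mentioned above can be illustrated with a minimal sketch. The lookup table and the function name `classify_by_port` are hypothetical and chosen only for illustration; the port-to-protocol pairs are the standard well-known assignments.

```python
# Hypothetical lookup table of a few well-known ports (illustrative subset).
WELL_KNOWN_PORTS = {22: "SSH", 25: "SMTP", 80: "HTTP", 443: "HTTPS"}


def classify_by_port(src_port, dst_port):
    """Label a packet by the first well-known port found among its two ports."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


print(classify_by_port(51234, 443))   # HTTPS
print(classify_by_port(51234, 8443))  # unknown: same service on a non-standard port
```

Any application that moves to a non-standard port is missed, and any traffic tunnelled over port 443 is labelled HTTPS regardless of what it carries.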

Proposed Method
Deep learning addresses the issues faced by the other methods by
not relying on the actual contents, but instead replacing handcrafted
features with efficient algorithms for unsupervised or
semi-supervised feature learning and hierarchical feature
extraction.
This mainly involves tuning the hyper-parameters of the neural
network for the problem of traffic classification.
These hyper-parameters include the number of layers, the number of
nodes in each layer, the padding style, the input length (in the
case of text input), etc.
One way to approach this problem is to evaluate which tasks neural
networks are good at, and whether the data can be represented in a
suitable format.

Literature Survey

Examples:
Here, the PCAP data was transformed into an image which was fed into
the neural network:
Figure: CNN - Convolutional Neural Network

Datasets
Since deep learning requires large amounts of data, we procured the
VPN-nonVPN dataset.
The VPN-nonVPN dataset, obtained from the Canadian Institute for
Cybersecurity, contains data collected in a regular session and in a
session over VPN. It has 14 traffic categories: VoIP, VPN-VoIP,
P2P, VPN-P2P, etc.
From this dataset, we chose PCAP files belonging to a few classes
and ran the model on them.

PCAP files used
The model we built was trained to classify the following data:

Skype Audio
YouTube
Email
Skype Chat
SCP (Download)
Vimeo
Facebook Audio

Preprocessing
We hex dump the .pcap file. This results in an array of hex strings,
each string containing the data of a single packet.
Then, we group each pair of consecutive hex characters to form a byte
value (8 bits).
Each byte is interpreted as a grayscale pixel value.
We consider only the first 784 bytes (1568 hex characters) to form a
28x28 image.
If the packet data is shorter than 784 bytes, we pad the image with
zero bytes to fill it.
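The preprocessing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `packet_hex_to_image` is hypothetical, and it assumes the hex string for each packet has already been extracted from the .pcap file.

```python
IMAGE_SIDE = 28
IMAGE_BYTES = IMAGE_SIDE * IMAGE_SIDE  # 784 bytes per image


def packet_hex_to_image(packet_hex):
    """Turn one packet's hex string into a 28x28 grid of grayscale values (0-255)."""
    # Keep at most the first 784 bytes, i.e. 1568 hex characters.
    truncated = packet_hex[: 2 * IMAGE_BYTES]
    # Assumption: a dangling half-byte (odd number of characters) is dropped.
    if len(truncated) % 2:
        truncated = truncated[:-1]
    # Each pair of hex characters becomes one byte, i.e. one pixel.
    pixels = list(bytes.fromhex(truncated))
    # Pad short packets with zero bytes up to 784 pixels.
    pixels += [0] * (IMAGE_BYTES - len(pixels))
    # Reshape the flat list into 28 rows of 28 pixels.
    return [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]


image = packet_hex_to_image("45000034")
print(image[0][:5])  # [69, 0, 0, 52, 0]
```

The first four pixels are the packet bytes 0x45, 0x00, 0x00 and 0x34; everything after them is zero padding.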

Architecture
The images, of size 28x28, are fed into the neural network. The
architecture is as follows:
First Convolution Layer, with 32 filters. Generates 32 feature maps
of the same dimension as the input (28x28).
First Max Pool Layer, which reduces the dimension to 14x14.
Second Convolution Layer, with 64 filters. Generates 64 feature maps
of the same dimension (14x14).
Second Max Pool Layer, which reduces the dimension to 7x7.
Fully Connected Layer, with 256 nodes.
Two LSTM layers.
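The dimensions quoted for the convolutional part can be checked with a short shape-tracing sketch. It assumes stride-1 convolutions with "same" padding (implied by the unchanged dimensions) and non-overlapping 2x2 max pooling (implied by the halving); neither setting is stated explicitly in the slides. The helper names are hypothetical, and the LSTM layers are left out because their input arrangement is not specified.

```python
def conv_same(shape, filters):
    """Stride-1 convolution with 'same' padding: height and width are unchanged."""
    height, width, _channels = shape
    return (height, width, filters)


def max_pool(shape, pool=2):
    """Non-overlapping max pooling: height and width are divided by `pool`."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def trace_shapes(input_shape=(28, 28, 1)):
    """Return the (layer name, output shape) pairs up to the fully connected layer."""
    steps = [("input", input_shape)]
    shape = conv_same(input_shape, 32)
    steps.append(("conv1", shape))
    shape = max_pool(shape)
    steps.append(("pool1", shape))
    shape = conv_same(shape, 64)
    steps.append(("conv2", shape))
    shape = max_pool(shape)
    steps.append(("pool2", shape))
    height, width, channels = shape
    steps.append(("flatten", (height * width * channels,)))
    steps.append(("dense", (256,)))
    return steps


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions the second pooling layer outputs 7x7x64, so 3136 values are flattened into the 256-node fully connected layer.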

Visualisation
Figure: CNN+LSTM

Results
We ran the model for 10 epochs on Google Colab, using a GPU.
We achieved an accuracy of 90% on the training set and a validation
accuracy of 86%.
The weighted averages of precision and recall were 87.66% and
86.13%, respectively.
The weighted average of the F1 score was 82.39%.
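For readers unfamiliar with weighted averages of per-class metrics, the sketch below shows how they are computed from a confusion matrix, weighting each class by its number of true samples. The function name `weighted_scores` and the 2x2 matrix are hypothetical and are not the project's results.

```python
def weighted_scores(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.

    Returns the support-weighted averages of precision, recall and F1.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    w_precision = w_recall = w_f1 = 0.0
    for c in range(n):
        true_positives = confusion[c][c]
        support = sum(confusion[c])                       # true samples of class c
        predicted = sum(confusion[r][c] for r in range(n))  # samples predicted as c
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Hypothetical two-class confusion matrix, for illustration only.
p, r, f1 = weighted_scores([[50, 50], [0, 100]])
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8333 0.75 0.7333
```

Two properties are visible in this toy example: the weighted recall equals the overall accuracy (150 of 200 correct), and the weighted F1 can be lower than both the weighted precision and the weighted recall, because F1 is computed per class before averaging.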

Results - continued
Figure: Classification Report
Figure: Validation Accuracy

Future Work
Make use of weighted classes.
Make use of vanilla RNNs to prevent overfitting.
Develop and experiment with architectures that generalize based on the data.
Use pre-trained models such as Inception and VGG16, and fine-tune them
for our use case.
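As one possible reading of the first item, class weights could be derived from inverse class frequencies so that rare traffic classes contribute more to the loss. The sketch below uses the common "balanced" heuristic, weight = total samples / (number of classes x samples in the class); the function name and the per-class counts are hypothetical.

```python
def balanced_class_weights(counts):
    """Map each class to total / (num_classes * count): rarer classes weigh more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * count) for label, count in counts.items()}


# Hypothetical packet counts per class, for illustration only.
weights = balanced_class_weights({"Skype Audio": 600, "YouTube": 300, "Email": 100})
for label, weight in weights.items():
    print(label, round(weight, 4))
# Skype Audio 0.5556
# YouTube 1.1111
# Email 3.3333
```

With these weights, each class contributes the same total weight (count x weight) to training, regardless of how many samples it has.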

Summary
Current methods for traffic classification fall short for the
reasons discussed above: ports can be changed, signatures cover only
known protocols, and deep packet inspection fails on encrypted data.
These shortcomings can be overcome by deep learning.
We propose a model based on a combination of a CNN and an RNN in
order to exploit both the spatial and temporal features present in
network flows.

Related Works
T. Auld, A. W. Moore, and S. F. Gull (2007)
Bayesian Neural Networks for Internet Traffic Classification
IEEE Transactions on Neural Networks
M. Lotfollahi, R. Shirali, M. Jafari Siavoshani, and M. Saberian (2017)
Deep Packet: A Novel Approach for Encrypted Traffic Classification
Using Deep Learning
Z. Wang (2015)
The Applications of Deep Learning on Traffic Identification
Black Hat USA

