Network Traffic Identification using Deep Learning

Anand G., Jitendra K., Darren L., Gokulnath M.

Guide:
Prof. (Dr.) Narendra Shekokar
Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering

April 22, 2019

Anand G., Jitendra K., Darren L., Gokulnath Network


M. Traffic Identification using Deep Learning April 22, 2019 1 / 18
Acknowledgement

First and foremost, we would like to thank our guide Prof. (Dr.)
Narendra Shekokar for his timely guidance and suggestions, without
which this project would not have seen completion.

Outline

Introduction and problem statement
Shortcomings of current methods
Proposed Method
Examples
Datasets
PCAP files used
Architecture
Results
Summary

Introduction

Recent years have seen a massive surge in reliance on the Internet for
providing various services. It has therefore become important for both
ISPs and the clients using the network to know what type of data is
being sent over it.
Traffic classification refers to the task of identifying the types
and quantities of packets flowing across the network.
Given a network flow, we have to determine how much of the data
belongs to which class. Examples of classes include HTTPS, SMTP,
VoIP, VPN, non-VPN etc.

Shortcomings of current methods

A naive way to solve this problem is port-based classification.
However, this simple approach has become less effective as
applications have found ways to change their port numbers.
Signature-based methods match a sequence of bytes or properties
against a known signature. However, this works only for known
protocols, which limits its use.
Another approach is deep packet inspection. This has its own
issues: breach of privacy, high time consumption, and failure on
encrypted data.
As malicious content often tries to obscure its intent, it is
difficult to analyse without a thorough statistical analysis.
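
To make the first limitation concrete, here is a minimal sketch of a port-based classifier. The lookup table and the `classify_by_port` helper are hypothetical names introduced only for illustration; the port numbers are the usual well-known assignments.

```python
# Hypothetical lookup table: well-known destination port -> application label.
WELL_KNOWN_PORTS = {
    25: "SMTP",
    80: "HTTP",
    443: "HTTPS",
}


def classify_by_port(dst_port):
    """Return the label registered for a port, or 'unknown' if there is none."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


# Traffic on its standard port is labelled as expected.
print(classify_by_port(25))    # SMTP
# The same protocol moved to a non-standard port is no longer recognised.
print(classify_by_port(2525))  # unknown
# Anything tunnelled over port 443 is labelled HTTPS, whatever it carries.
print(classify_by_port(443))   # HTTPS
```

The classifier never looks at the traffic itself, so an application only has to change its port to be mislabelled.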

Proposed Method

Deep learning addresses the issues faced by the other methods: rather
than relying on the actual contents, it replaces handcrafted
features with efficient algorithms for unsupervised or
semi-supervised feature learning and hierarchical feature
extraction.
This mainly involves tuning the hyper-parameters of the neural
network for the problem of traffic classification.
The hyper-parameters include: number of layers, number of nodes in
each layer, padding style, input length (in case of text input) etc.
One way to approach this problem would be by evaluating what tasks
neural networks are good at, and whether the data can be represented
in that format.

Literature Survey

Examples:
Here, the PCAP data was transformed into an image which was fed into
the neural network:

Figure: CNN - Convolutional Neural Network

Datasets

Since deep learning requires large amounts of data, we procured the
VPN-nonVPN dataset.
The VPN-nonVPN dataset, obtained from the Canadian Institute for
Cybersecurity, contains data collected in a regular session and in a
session over VPN. It has 14 traffic categories: VoIP, VPN-VoIP,
P2P, VPN-P2P, etc.
From this, we chose PCAP files belonging to a few classes and ran
the model on them.

PCAP files used

The model we built was trained to classify the following data:
Skype Audio
YouTube
Email
Skype Chat
SCP (Download)
Vimeo
Facebook Audio

Preprocessing

We hex dump the .pcap file. This results in an array of hex strings,
each string containing the data of a single packet.
Then, we combine each pair of consecutive hex characters to form a
byte value (8 bits).
Each byte is interpreted as a grayscale pixel value.
We consider only the first 784 bytes (1568 hex characters) to form a
28x28 image.
If the packet data is shorter than 784 bytes, we pad the image with
zero bytes to complete it.
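
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `packet_hex_to_image` is hypothetical, and it assumes each packet arrives as a hex string of even length with no separators.

```python
IMAGE_SIDE = 28
IMAGE_BYTES = IMAGE_SIDE * IMAGE_SIDE  # 784 bytes = 1568 hex characters


def packet_hex_to_image(hex_string):
    """Convert one packet's hex dump into a 28x28 grid of grayscale values (0-255)."""
    # Keep only the first 784 bytes, i.e. the first 1568 hex characters.
    truncated = hex_string[: 2 * IMAGE_BYTES]
    # Pair up consecutive hex characters into byte values.
    data = bytes.fromhex(truncated)
    # Pad short packets with zero bytes up to 784 bytes.
    data = data.ljust(IMAGE_BYTES, b"\x00")
    # Cut the flat byte sequence into 28 rows of 28 pixels.
    return [
        list(data[row * IMAGE_SIDE:(row + 1) * IMAGE_SIDE])
        for row in range(IMAGE_SIDE)
    ]


image = packet_hex_to_image("45000034ff")  # a made-up 5-byte packet
print(len(image), len(image[0]))  # 28 28
print(image[0][:6])               # [69, 0, 0, 52, 255, 0]
```

Each resulting grid is one grayscale image; a short packet such as the one above is mostly zero padding.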

Architecture

The images, of size 28x28, are fed into the neural network. The
architecture is as follows:
First Convolution Layer, with 32 filters. Generates 32 convolved
images (feature maps) of the same dimension as the input.
First Max Pool Layer, which reduces the dimension to 14x14.
Second Convolution Layer, with 64 filters. Generates 64 convolved
images (feature maps) of the same dimension as its input.
Second Max Pool Layer, which reduces the dimension to 7x7.
Fully Connected Layer, with 256 nodes.
Two LSTM layers.
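
The dimensions listed above can be checked with a short shape trace. This sketch assumes stride-1 convolutions with "same" padding (suggested by the outputs keeping the input dimension) and 2x2 max pooling with stride 2 (suggested by the halving). The kernel size is not stated in the slides and does not affect the shapes under these assumptions. The helper names are hypothetical.

```python
def conv_same(shape, filters):
    """Stride-1 convolution with 'same' padding: height and width are unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_2x2(shape):
    """2x2 max pooling with stride 2: height and width are halved."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


shape = (28, 28, 1)           # one grayscale packet image
shape = conv_same(shape, 32)  # (28, 28, 32)
shape = max_pool_2x2(shape)   # (14, 14, 32)
shape = conv_same(shape, 64)  # (14, 14, 64)
shape = max_pool_2x2(shape)   # (7, 7, 64)

# Number of values handed to the 256-node fully connected layer.
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)       # (7, 7, 64) 3136
```

Under these assumptions the fully connected layer receives 3136 values per image and reduces them to 256 features before the LSTM layers.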

Visualisation

Figure: CNN+LSTM

Results

We ran the model for 10 epochs on Google Colab, using a GPU.
We achieved an accuracy of 90% on the training set and a validation
accuracy of 86%.
The weighted averages of precision and recall were 87.66% and 86.13%
respectively.
The weighted average of the F1 score was 82.39%.
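
A weighted average weights each class's score by its support (its number of true samples). The sketch below shows how such averages can be computed from per-class counts. The two-class counts are hypothetical toy data, not the project's results, and `weighted_scores` is an illustrative helper. The example also shows that a weighted F1 score can fall below both the weighted precision and the weighted recall, as in the figures above.

```python
def weighted_scores(per_class_counts):
    """Support-weighted precision, recall and F1.

    per_class_counts maps a class name to a (tp, fp, fn) tuple.
    """
    total_support = sum(tp + fn for tp, fp, fn in per_class_counts.values())
    w_precision = w_recall = w_f1 = 0.0
    for tp, fp, fn in per_class_counts.values():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = (tp + fn) / total_support  # share of true samples in this class
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Hypothetical toy data: 100 true samples per class; half of class A is
# misclassified as class B, and class B is always classified correctly.
counts = {"A": (50, 0, 50), "B": (100, 50, 0)}
p, r, f1 = weighted_scores(counts)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # 0.8333 0.7500 0.7333
```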

Results - continued

Figure: Classification Report
Figure: Validation Accuracy

Future Work

Make use of weighted classes.
Make use of vanilla RNNs to prevent overfitting.
Develop/experiment with architectures to generalize based on data.
Use pre-trained models like Inception and VGG16, and fine-tune them
to our use case.
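
As an illustration of the first item, one common way to weight classes is by inverse frequency, so that rare classes contribute more to the loss. The sketch below uses the formula total / (number of classes x class count). The sample counts are hypothetical, and `balanced_class_weights` is an illustrative helper rather than part of the project.

```python
def balanced_class_weights(samples_per_class):
    """Inverse-frequency weights: total / (num_classes * count) for each class."""
    total = sum(samples_per_class.values())
    num_classes = len(samples_per_class)
    return {
        label: total / (num_classes * count)
        for label, count in samples_per_class.items()
    }


# Hypothetical packet counts per class, chosen only to show the effect.
counts = {"Skype Audio": 6000, "YouTube": 3000, "Email": 1000}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
# Skype Audio: 0.5556
# YouTube: 1.1111
# Email: 3.3333
```

A mapping of this kind could then be supplied to the training loss so that errors on under-represented classes are penalised more heavily.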

Summary

Current methods used for traffic classification are not efficient, for
the reasons mentioned above.
These shortcomings can be overcome by deep learning.
We propose a model based on a combination of CNN and RNN in
order to exploit both the spatial and temporal features present in
network flows.

Related Works I

T. Auld, A. W. Moore, and S. F. Gull (2007)
Bayesian Neural Networks for Internet Traffic Classification
IEEE Transactions on Neural Networks
M. Lotfollahi, R. Shirali, M. Jafari Siavoshani, and M. Saberian (2017)
Deep Packet: A Novel Approach For Encrypted Traffic Classification
Using Deep Learning
Z. Wang (2015)
The Applications of Deep Learning on Traffic Identification
Black Hat USA
