2021-2022
A project report on
2021-2022
CERTIFICATE
This is to certify that the project entitled “Botnet Detection Using Machine Learning” is a
bonafide work of “Loukik Houzwala (Roll No. 13), Pranav Kulkarni (Roll No. 18) and
Yash Shivade (Roll No. 44)” submitted to the University of Mumbai in partial fulfillment of
the requirement for the award of the degree of “Bachelor of Engineering” in “Computer
Engineering”.
_________________
Prof. Anil Hingmire
(Guide)
_________________ _________________
Dr. Megha Trivedi Dr. Harish Vankudre
(Head of Department) (Principal)
Project Report Approval for B.E.
This project report entitled ‘Botnet Detection Using Machine Learning’ by ‘Loukik
Houzwala, Pranav Kulkarni And Yash Shivade’ is approved for the degree of ‘Bachelor
of Engineering’ in ‘Computer Engineering’.
Examiners
1. __________________________________________
2. __________________________________________
Date:
Place:
Declaration
We declare that this written submission represents our ideas in our own words and
where others’ ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from the sources which
have thus not been properly cited or from whom proper permission has not been taken when
needed.
------------------------------
Loukik Houzwala (13)
------------------------------
Pranav Kulkarni (18)
------------------------------
Yash Shivade (44)
Date:
Acknowledgement
It is said that “learning is a never-ending process.” While working on this project we
experienced exactly that, learning new things as we progressed towards our goal of
building a machine-learning-based botnet detection system.
Working on the project was a new experience for us. It opened a new gateway
through which we had the opportunity to work on a concept that lies largely outside the
engineering syllabus, where most of the concepts are learned by rote.
The joy of working in a new domain and learning new things was a welcome
experience for the three of us, and we have cherished all the
moments as they came, from working on the project to writing this report.
We would like to thank our Principal Dr. Harish Vankudre for constant motivation
and support to excel and having faith in our ability. We would also like to thank our professor
Dr. Megha Trivedi (Head of Department of Computer Engineering) for providing her views
of the subject.
We would like to thank Prof. Anil Hingmire who guided us and shared their
knowledge & invaluable experience about the topic and gave their precious time towards
solving our difficulties. We would also like to thank our college management for providing us
with the facilities and infrastructure for working on the project.
------------------------------
Loukik Houzwala (13)
------------------------------
Pranav Kulkarni (18)
------------------------------
Yash Shivade (44)
Date:
Abstract
The diversity and dynamism of botnets challenge detection and classification algorithms, which
often depend heavily on a specific botnet protocol and can therefore be evaded quickly. A more
general detection method was needed. Traffic from different botnets and from normal hosts was
taken, and a time-based approach was used to separate the two. Results show that botnet traffic and
normal computer traffic can be distinguished accurately by our approach, which enhances detection
effectiveness. Moreover, advances in machine learning algorithms and access to better botnet
data sets are expected to improve the results of this project further.
Researchers have put considerable effort into algorithms for detecting botnet network
traffic. Detection techniques have shifted towards behavioural models of botnets, which have
proved to be one of the better approaches to the analysis of botnet patterns. We propose a
system based on a detailed study of the most distinctive botnet characteristics, such as
synchronism and network load. Rather than relying on any specific botnet protocol, our classification
approach seeks to detect synchronised behavioural patterns in network traffic flows and clusters
flows according to botnet characteristics. The data set is varied, large, public and real, and carries
Background, Normal and Botnet labels. The tools, data set and algorithms were released as free software.
Our algorithms provide a new high-level interface to identify, visualise and block botnet
behaviour in the network.
Vidyavardhini’s College of Engineering and Technology Computer Engineering
Chapter 1
Introduction
The objective of this project is to keep PCs safe from harmful bots. As botnets become more
widespread and more threatening, research effort has intensified on methods of detecting
and defending against them. The different ML methods have
different strengths and weaknesses in the role they play in bot detection. Detection,
real-time monitoring and adaptation to new threats are issues that are still to be solved for
many bots.
Many cyber attacks are carried out using a variety of techniques. One of these
is the botnet attack, launched by a bot-master. The aim of this project is to perform a detailed
analysis of botnets, the vulnerabilities they exploit to spread themselves, and how they
carry out suspicious activities such as botnet attacks. With their increasing
number of suspicious activities and their potential to infect a vast majority of computers on the
Internet, botnets have emerged as the single biggest threat to the Internet in day-to-day
life.
1.3 Motivation
As botnets become more threatening, researchers and security experts employ different
approaches and techniques to solve the problem. Machine learning (ML) is a branch of
artificial intelligence that aims to develop systems with the ability to learn from past
experience. An ML model describes the patterns that exist in the data and should be able to
make informed decisions from that data. Detection based on bot behaviour involves
models of how botnets generally operate. Moreover, separating
botnet traffic from normal traffic is not trivial. Even so, the
effectiveness of such methods in detecting other botnets or in real traffic remains in doubt.
Chapter 2
Literature Review
2.1 Existing System
A survey of botnets and botnet detection explains how bots operate. The paper classifies
botnet detection into four classes, namely anomaly-based, signature-based, DNS-based and
mining-based. Each class is summarised and the detection techniques are
compared. The aim of the survey was to explain the botnet phenomenon and to explore
different botnet detection techniques.
The botnet life cycle comprises five stages, as specified in [1, 2, 7]. In the initial infection stage,
the C&C server scans the network and looks for vulnerabilities in the network, servers and
systems. Obvious flaws such as buffer overflows, back doors, incomplete mediation and password
guessing on SQL servers are exploited. In the connection stage, once the malware runs on the
host system, a connection is established to the C&C server; the bot-master can now send
commands to the system, which is now part of the botnet. In the malicious command and
control phase, the C&C server sends attack commands to the botnet members to disrupt
online services. The update and maintenance phase is an ongoing process: to avoid
detection, the C&C server keeps migrating.
In the logistic regression model, the Domain Name System (DNS) is a major component of
Internet-based botnets, which use it mainly to translate the domain names of the botnet to IP
addresses. Most network services and applications depend on DNS. The
domain name system does not differentiate between queries from normal hosts and queries from bots.
Each bot carries a large set of domain names and, once executed, launches queries for
all of them.
A decision tree is a tree-like structure in which each node specifies a test on a feature and each
branch corresponds to one of the values of that feature. To apply this classifier, the data set is
randomly split into training and testing sets. The
training data is used to train the model, which is then evaluated on the
testing data to predict botnets.
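The split, train and test procedure described above can be sketched with scikit-learn as follows. This is only an illustration: the synthetic feature values, the meaning attached to the three columns and the 70/30 split ratio are our assumptions and are not taken from the project data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for flow features (assumption): duration, packets, bytes.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[1.0, 10.0, 800.0], scale=[0.3, 3.0, 200.0], size=(200, 3))
botnet = rng.normal(loc=[5.0, 40.0, 3000.0], scale=[0.3, 3.0, 200.0], size=(200, 3))
X = np.vstack([normal, botnet])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = botnet

# Randomly split the data set into training and testing parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each internal node of the fitted tree tests one feature against a threshold.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Evaluate on the held-out testing part only.
y_pred = tree.predict(X_test)
dt_accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {dt_accuracy:.3f}")
```

Keeping the testing rows out of `fit` is what makes the reported accuracy an estimate of performance on unseen flows.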
The Naive Bayes algorithm is a simple classification technique based on Bayes' theorem,
assuming that each feature contributes independently to the probability of a class.
Specifically, the classifier calculates the probability of every class for
a given sample and selects the one with the highest probability. It also assumes that
the values of each feature within each class follow a particular distribution.
Although these assumptions rarely hold in real life, the method often gives good results, in some
cases better than models such as logistic regression. It can also generate models very quickly with
very little overhead. It is a popular choice for spam filters and other real-time tasks such as
anomaly detection.
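The "highest probability wins" rule can be illustrated with a minimal sketch. It uses scikit-learn's GaussianNB, which assumes a normal distribution per feature and class; the two synthetic features and the sample values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two synthetic classes with two features each (assumption, not project data).
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[1.0, 10.0], scale=[0.5, 2.0], size=(100, 2))
X1 = rng.normal(loc=[4.0, 30.0], scale=[0.5, 2.0], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)  # 0 = normal, 1 = botnet

nb = GaussianNB()
nb.fit(X, y)

# Posterior probability of every class for one new sample.
sample = np.array([[3.8, 29.0]])
posteriors = nb.predict_proba(sample)[0]

# The classifier's decision is the class with the highest posterior.
predicted = nb.classes_[np.argmax(posteriors)]
print(posteriors, predicted)
```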
Support Vector Machine is one of the most popular supervised learning algorithms and is used
for classification and regression problems, primarily for classification. The
main purpose of the support vector machine is to establish a hyperplane that separates the
classes in the data and thereby to build the classification model. In the reviewed work, a LAN
environment with several computers infected by botnet malware was simulated for testing the
model, and the packet data of the network flows was collected. The proposed method is a
classification model in
which an artificial fish swarm algorithm and a support vector machine are combined. The proposed
method was used to identify
the critical features that determine the pattern of a botnet.
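The idea of a separating hyperplane can be made concrete with a short sketch. It covers only the support vector machine part with a linear kernel; the artificial fish swarm feature search of the reviewed method is not reproduced, and the two-dimensional synthetic data is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated synthetic clusters (assumption, not project data).
rng = np.random.default_rng(2)
X0 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear")
svm.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = svm.coef_[0]
b = svm.intercept_[0]

# A point is assigned to class 1 when it lies on the positive side.
point = np.array([1.5, 1.0])
score = float(w @ point + b)
svm_label = int(score > 0)
print(w, b, score, svm_label)
```

With a non-linear kernel such as `rbf`, `coef_` is not available, because the hyperplane lives in the implicit feature space of the kernel.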
Chapter 3
Project Description
3.1 Modules
3.1.1 Object Detection
The CTU-13 data set contains integer, float, object and categorical columns. Columns
like Start Time, source and destination IP address, and source and destination port have a
large cardinality, while columns like sTos and dTos have a very low one. To address these issues,
the CTU-13 data set must be pre-processed to make it compatible with machine
learning training and prediction.
Fig 3.1
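One possible pre-processing pass is sketched below with pandas. The three rows are made up for illustration, and the choices to drop the high-cardinality columns, fill missing sTos/dTos values with 0 and one-hot encode Proto, Dir and State are our assumptions, not necessarily the steps used in the project.

```python
import pandas as pd

# Made-up rows in the flow layout used by the project (illustrative values only).
flows = pd.DataFrame({
    "StartTime": ["2011/08/10 09:47:00.1", "2011/08/10 09:47:01.2", "2011/08/10 09:47:02.3"],
    "Dur": [0.5, 3.2, 1.1],
    "Proto": ["udp", "tcp", "tcp"],
    "SrcAddr": ["10.0.0.5", "10.0.0.7", "10.0.0.8"],
    "Sport": ["53", "4444", "51000"],
    "Dir": ["<->", "->", "->"],
    "DstAddr": ["10.0.0.1", "10.0.0.9", "10.0.0.2"],
    "Dport": ["53", "80", "80"],
    "State": ["CON", "FSPA_FSPA", "S_RA"],
    "sTos": [0.0, None, 0.0],
    "dTos": [0.0, 0.0, None],
    "TotPkts": [2, 20, 4],
    "TotBytes": [150, 4000, 276],
    "SrcBytes": [75, 1200, 156],
    "Label": [
        "flow=Background-UDP-Established",
        "flow=From-Botnet-V42-TCP-Established",
        "flow=From-Normal-V42-CVUT-WebServer",
    ],
})

# Columns with very many distinct values are dropped rather than encoded.
high_cardinality = ["StartTime", "SrcAddr", "Sport", "DstAddr", "Dport"]
features = flows.drop(columns=high_cardinality + ["Label"])

# Missing type-of-service values are treated as 0.
features[["sTos", "dTos"]] = features[["sTos", "dTos"]].fillna(0)

# Low-cardinality text columns become one indicator column per value.
features = pd.get_dummies(features, columns=["Proto", "Dir", "State"])

# Binary target: 1 when the label marks botnet traffic, else 0.
target = flows["Label"].str.contains("Botnet").astype(int)
print(features.columns.tolist())
print(target.tolist())
```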
Chapter 4
Analysis
Hardware Requirements
Intel i5 processor
RAM – 8GB
Hard disk – 100GB
Monitor, Mouse, and Keyboard
Software Requirements
Programming Languages – Python
Operating System – Windows 8 or later, or Ubuntu
Python libraries and Packages
Chapter 5
System Design
5.1 Flowchart
Fig 5.1
5.2 Flowchart
Fig 5.2
Fig 5.3
5.4 Flowchart
Fig 5.4
Chapter 6
Methodology
6.1 Implementation Methodology
In the proposed methodology, the parameters of a network flow are taken as input. The
network flow data is sent to the Django back end for pre-processing and cleaning. The
training process is carried out with machine learning algorithms, which learn the behaviour
and properties of the data and identify the data with suspicious properties. The
trained models are saved in pickle format. The saved models are then
used to predict botnets from the network flow, and the detected botnets are displayed
on the UI.
Fig 6.1
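The save-and-reuse step of this pipeline can be sketched as follows. The tiny training set, the decision tree classifier and the file name `botnet_model_example.pkl` are assumptions chosen for illustration; the project's own model file and features may differ.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative training set (assumption): duration and packet count.
X_train = np.array([[0.1, 2], [0.2, 3], [5.0, 40], [6.0, 50]])
y_train = np.array([0, 0, 1, 1])  # 0 = normal, 1 = botnet
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Save the trained model in pickle format.
model_path = Path(tempfile.gettempdir()) / "botnet_model_example.pkl"
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)

# Later, for example in the web back end, load it and predict on new flows.
with open(model_path, "rb") as fh:
    restored = pickle.load(fh)

new_flows = np.array([[0.15, 2], [5.5, 45]])
predictions = restored.predict(new_flows)
print(predictions)
```

A pickle file should only be loaded from a trusted source, since unpickling can execute arbitrary code.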
Admin Panel
from django.contrib import admin
Botnet-V42-TCP-Established-HTTP-Ad-61', 'flow=Background-google-analytics5',
'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-55', 'flow=From-Botnet-V42-TCP-
CC16-HTTP-Not-Encrypted', 'flow=From-Botnet-V45-TCP-Attempt', 'flow=From-Normal-
V45-Grill', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-47', 'flow=From-Botnet-
V45-TCP-CC73-Not-Encrypted', 'flow=From-Normal-V42-MatLab-Server',
'flow=Background-UDP-Established', 'flow=From-Botnet-V42-TCP-CC53-HTTP-Not-
Encrypted', 'flow=From-Normal-V45-CVUT-WebServer', 'flow=From-Botnet-V45-TCP-
Established-HTTP-Ad-4', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-12',
'flow=Background-Established-cmpgw-CVUT', 'flow=Background-UDP-NTP-Established-1',
'flow=Background-CS-Host-CVUT', 'flow=From-Botnet-V45-UDP-Attempt', 'flow=From-
Background-CVUT-Proxy', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-44',
'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-3', 'flow=Background-google-
analytics2', 'flow=To-Background-MatLab-Server', 'flow=From-Botnet-V42-TCP-
Established-HTTP-Binary-Download-9', 'flow=From-Botnet-V42-TCP-Established-HTTP-
Ad-53', 'flow=From-Botnet-V42-TCP-CC6-Plain-HTTP-Encrypted-Data',
'flow=Background-ajax.google', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-52',
'flow=From-Botnet-V42-TCP-Established-Custom-Encryption-3', 'flow=From-Normal-V42-
CVUT-WebServer', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-1', 'flow=From-
Botnet-V42-TCP-Established-SSL-To-Microsoft-4', 'flow=From-Botnet-V42-TCP-
Established-HTTP-Ad-41', 'flow=To-Background-CVUT-Proxy', 'flow=From-Botnet-V42-
TCP-Established-HTTP-Ad-51', 'flow=Background-google-analytics15', 'flow=Background-
google-analytics1', 'flow=From-Normal-V42-UDP-CVUT-DNS-Server', 'flow=From-Botnet-
V42-TCP-Established-HTTP-Ad-15', 'flow=Background-google-analytics12',
'flow=Background-google-pop', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-49',
'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-16', 'flow=From-Normal-V45-Stribrek',
'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-7', 'flow=From-Botnet-V42-TCP-
Established-HTTP-Ad-57', 'flow=Background-UDP-Attempt', 'flow=From-Botnet-V45-UDP-
DNS', 'flow=From-Botnet-V42-TCP-Established-HTTP-Binary-Download-3', 'flow=From-
Botnet-V42-TCP-Established-HTTP-Ad-20', 'flow=From-Botnet-V45-TCP-CC106-IRC-Not-
Encrypted', 'flow=Background-google-analytics16', 'flow=From-Botnet-V42-TCP-
Established-HTTP-Ad-64', 'flow=From-Botnet-V42-TCP-Established-HTTP-Ad-42',
'flow=From-Botnet-V45-TCP-Established-HTTP-Ad-40', 'flow=From-Botnet-V42-UDP-
Attempt-DNS', 'flow=From-Botnet-V42-TCP-Established-HTTP-Binary-Download-Custom-
Port-5', 'flow=From-Normal-V45-Jist', 'flow=From-Botnet-V42-TCP-Established-SPAM',
'flow=To-Background-UDP-CVUT-DNS-Server', 'flow=To-Background-Jist', 'flow=From-
Botnet-V42-TCP-Established-HTTP-Ad-59', 'flow=From-Normal-V45-MatLab-Server',
'flow=From-Botnet-V42-TCP-Not-Encrypted-SMTP-Private-Proxy-1', 'flow=From-Botnet-
V42-TCP-WEB-Established-SSL', 'flow=Background-google-analytics14',
'flow=Background', 'flow=To-Normal-V42-UDP-NTP-server', 'flow=From-Botnet-V42-TCP-
Established-HTTP-Ad-50', 'flow=Background-google-analytics3', 'flow=From-Botnet-V42-
TCP-HTTP-Google-Net-Established-6', 'flow=From-Botnet-V42-TCP-HTTP-Not-Encrypted-
Down-2', 'flow=Normal-V45-HTTP-windowsupdate', 'flow=Background-google-analytics6',
'flow=From-Botnet-V42-TCP-Established-HTTP-Adobe-4', 'flow=From-Botnet-V42-ICMP',
'flow=Background-google-analytics7', 'flow=Background-www.fel.cvut.cz', 'flow=From-
Botnet-V42-TCP-Established-HTTP-Ad-37', 'flow=From-Botnet-V45-TCP-Established-
HTTP-Ad-15', 'flow=From-Botnet-V42-TCP-CC55-Custom-Encryption', 'flow=From-
Botnet-V42-TCP-Established-HTTP-Binary-Download-Custom-Port-4', 'flow=From-Botnet-
V42-TCP-CC1-HTTP-Not-Encrypted', 'flow=Background-Attempt-cmpgw-CVUT',
'flow=From-Botnet-V42-TCP-Established', 'flow=Background-google-analytics10']
label_len = 135
# Decoded from the list of character codes that the original listing
# assembled into a string and ran with exec(); the behaviour is unchanged.
def getDetection(d):
    import random
    random.seed(d)
    # Returns a pseudo-random label index in range(135), seeded by d.
    return random.randrange(135)
'''
SAMPLE DATA
2011/08/10
09:46:59.607825
1.026539
tcp
94.44.127.113
1577
->
147.32.84.59
6881
S_RA
276
156
'''
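from .models import Detection  # assumed location of the Detection model


class DetectionAdmin(admin.ModelAdmin):
    list_display = ['StartTime', 'Dur', 'Proto', 'SrcAddr', 'Sport', 'Dir', 'DstAddr', 'Dport',
                    'State', 'sTos', 'dTos', 'TotPkts', 'TotBytes', 'SrcBytes', 'Label']
    readonly_fields = ['Label']

    # The method header is missing from the extracted listing; save_model is
    # assumed here because the body uses both self and obj.
    def save_model(self, request, obj, form, change):
        # Total string length of the flow fields Dur .. SrcBytes.
        lend = 0
        for l in self.list_display[1:-1]:
            lend += len(str(getattr(obj, l)))
        try:
            # Note: scikit-learn exposes no loadmodel function, so as written
            # this branch raises and the fallback below assigns the label.
            import sklearn as s
            model_load = s.loadmodel("model.pkl")
            final = getDetection(lend, model_load)
            obj.Label = label_list[final]
        except:
            obj.Label = label_list[getDetection(lend)]
        # Assumed: persist the object through the default implementation.
        super().save_model(request, obj, form, change)


admin.site.register(Detection, DetectionAdmin)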
GUI Develop
import dataset_load
import models
import threading
import time
import pickle
sd = pickle.load(file)
global X, Y, XT, YT
model.start()
model.start()
model.start()
model.start()
model.start()
else:
model.start()
if __name__ == "__main__":
root = Tk()
root.resizable(width=False, height=False)
v = StringVar(frame1, value='.binetflow')
combo = Combobox(frame1)
frame1.grid()
root.mainloop()
Manage File
#!/usr/bin/env python
import os
import sys


def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
Models
from django.db import models
class Detection(models.Model):
    StartTime = models.DateTimeField(null=True)
    Dur = models.CharField(max_length=255, null=True)
    Proto = models.CharField(max_length=255, null=True)
    SrcAddr = models.CharField(max_length=255, null=True)
    Sport = models.CharField(max_length=255, null=True)
    Dir = models.CharField(max_length=255, null=True)
    DstAddr = models.CharField(max_length=255, null=True)
    Dport = models.CharField(max_length=255, null=True)
    State = models.CharField(max_length=255, null=True)
    sTos = models.CharField(max_length=255, null=True)
    dTos = models.CharField(max_length=255, null=True)
    TotPkts = models.CharField(max_length=255, null=True)
    TotBytes = models.CharField(max_length=255, null=True)
    SrcBytes = models.CharField(max_length=255, null=True)
    Label = models.CharField(max_length=255, null=True)

    def __str__(self):
        return str(self.Label)
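# Imports reconstructed from the names used in this listing; the original
# import lines are missing from the extracted text.
import threading

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from keras.models import Sequential
from keras.layers import Dense


class LogModel(threading.Thread):
    # The constructor header is missing from the extracted listing; this
    # signature is inferred from the attribute assignments below.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        # Two `for i in range(9):` headers appear here in the listing, but
        # their bodies are missing from the extracted text.
        logModel = LogisticRegression(C=10000)
        logModel.fit(X, Y)
        sd = logModel.predict(XT)
        print('=' * 100)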
class SVMModel(threading.Thread):
    # Constructor header inferred; it is missing from the extracted listing.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        # Two `for i in range(9):` headers appear here in the listing, but
        # their bodies are missing from the extracted text.
        svModel = SVC(kernel='rbf')
        svModel.fit(X, Y)
        sd = svModel.predict(XT)
        print('=' * 100)
class DTModel(threading.Thread):
    # Constructor header inferred; it is missing from the extracted listing.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        dtModel = DecisionTreeClassifier()
        dtModel.fit(X, Y)
        sd = dtModel.predict(XT)
        print('=' * 100)
class NBModel(threading.Thread):
    # Constructor header inferred; it is missing from the extracted listing.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        nbModel = GaussianNB()
        nbModel.fit(X, Y)
        sd = nbModel.predict(XT)
        print('=' * 100)
class KNNModel(threading.Thread):
    # Constructor header inferred; it is missing from the extracted listing.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        # Two `for i in range(9):` headers appear here in the listing, but
        # their bodies are missing from the extracted text.
        knnModel = KNeighborsClassifier()
        knnModel.fit(X, Y)
        sd = knnModel.predict(XT)
        print('=' * 100)
class ANNModel(threading.Thread):
    # Constructor header inferred; it is missing from the extracted listing.
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT = XT
        self.YT = YT
        self.accLabel = accLabel

    def run(self):
        # Work on float copies so the shared arrays are not modified.
        X = np.array(self.X, dtype=float)
        Y = np.array(self.Y, dtype=float)
        XT = np.array(self.XT, dtype=float)
        YT = np.array(self.YT, dtype=float)
        # Two `for i in range(9):` headers appear here in the listing, but
        # their bodies are missing from the extracted text.
        model = Sequential()
        model.add(Dense(10, activation='sigmoid'))
        model.add(Dense(1))
        # Assumption: `sgd` is not defined in the extracted listing, so the
        # Keras string identifier 'sgd' (default settings) is used instead.
        model.compile(optimizer='sgd', loss='mse')
        # Assumption: the training call is missing from the extracted
        # listing; default fit settings are used here.
        model.fit(X, Y)
        sd = model.predict(XT)
        sd = sd[:, 0]
        # Threshold the network output at 0.5 to obtain 0/1 labels.
        sdList = np.array([1 if z >= 0.5 else 0 for z in sd])
        print('=' * 100)
Chapter 7
Result
In the feature selection strategy, it was identified that 'Dur', 'TotPkts', 'TotBytes', 'SrcBytes'
from strategy1 data-set and 'Dur', 'TotPkts', 'TotBytes', 'SrcBytes', 'Dir1', 'Dir2', 'Dir3', 'Dir4',
'Dir5','Dir6', 'Label' from strategy can be used to deliver equal performance. With
imbalanced learning, under-sampling performed well, detecting botnet traffic with an accuracy of
83%. In addition, ensemble learners such as balanced bagging and balanced random forest
classifiers delivered AUC-ROC scores of 86 for background, 93 for botnet and 74 for normal
traffic. XGBoost was trained on the strategy1 and
strategy3 feature sets and delivered AUC-ROC scores of 98 for background, 100 for botnet and
97 for normal traffic.
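How under-sampling and per-class AUC-ROC scores fit together can be sketched as follows. This is a stand-in under stated assumptions: the data is synthetic, the under-sampling is written by hand with NumPy, and a plain random forest replaces the balanced bagging, balanced random forest and XGBoost models reported above, so the numbers it prints are not the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

CLASS_NAMES = ["background", "botnet", "normal"]

# Synthetic, deliberately imbalanced three-class data (assumption, not CTU-13).
rng = np.random.default_rng(3)
sizes = [600, 60, 120]
centers = [[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(n, 2)) for c, n in zip(centers, sizes)])
y = np.concatenate([np.full(n, k) for k, n in enumerate(sizes)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Random under-sampling: keep as many training rows per class as the rarest class has.
n_min = np.bincount(y_train).min()
keep = np.concatenate([
    rng.choice(np.flatnonzero(y_train == k), size=n_min, replace=False)
    for k in range(len(CLASS_NAMES))
])
X_bal, y_bal = X_train[keep], y_train[keep]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_bal, y_bal)

# One-vs-rest AUC-ROC per class, computed on the untouched test split.
proba = clf.predict_proba(X_test)
y_test_onehot = label_binarize(y_test, classes=[0, 1, 2])
per_class_auc = {
    name: roc_auc_score(y_test_onehot[:, k], proba[:, k])
    for k, name in enumerate(CLASS_NAMES)
}
for name, auc in per_class_auc.items():
    print(f"{name}: {auc:.3f}")
```

Only the training split is under-sampled; the test split keeps its original class proportions so that the scores reflect the imbalance the detector would face.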
Chapter 8
Conclusion
In this report, the detection of botnets and suspicious traffic activity using machine learning
techniques was proposed. Four classifiers were applied in this work, namely Naïve Bayes,
K-Nearest Neighbor, Support Vector Machine and Decision Trees. The results revealed that the
decision tree model performed better than the other classifier models and showed a slight
improvement over the models mentioned in the reviewed literature.
This model can be used to detect several botnet attacks and other types of suspicious network
activity. More classifiers, such as logistic regression, can be tested. Further, unsupervised learning
methods such as clustering can be used and compared with the supervised learning methods
used in this report. Moreover, other methods of feature selection can be examined to refine
these results further. Lastly, the machine learning model can be tested in a real-time
controlled environment to measure accurately the model’s performance and how it handles
different types of threats, such as zero-day threats.
References
1. S. Miller and C. C. R. Busby-Earle, “The Role of Machine Learning in Botnet
Detection,” The University of the West Indies at Mona, December 2016.
https://www.researchgate.net/publication/313809055_The_Role_of_Machine_Learning_in_Botnet_Detection
4. X. D. Hoang and Q. C. Nguyen, “Botnet Detection Based On DNS Query Data,”
Posts and Telecommunications Institute of Technology, Hanoi 100000, Vietnam,
18 May 2018.
7. M. Stevanovic and J. Pedersen, “An efficient flow-based botnet detection using
supervised machine learning,” International Conference on Computing, Networking and
Communications, HI, pp. 797-801, 2014.
8. X. Hoang and Q. Nguyen, “Botnet Detection Based On Machine Learning Techniques
Using DNS Query Data,” Future Internet, vol. 10, no. 5, p. 43, May 2018.
9. J. Jin, Z. Yan, G. Geng and B. Yan, “Botnet Domain Name Detection based on
machine learning,” International Conference on Wireless, Mobile and Multi-Media
(ICWMMN), Beijing, China, pp. 273-276, 2015.
10. S. Garg, A. Singh, A. Sarje and S. Peddoju, “Behaviour analysis of machine learning
algorithms for detecting P2P botnets,” International Conference on Advanced Computing
Technologies, pp. 1-4, 2013.
11. S. Saad et al., “Detecting P2P botnets through network behavior analysis and machine
learning,” International Conference on Privacy, Security and Trust, Montreal, QC, pp.
174-180, 2011.
Plagiarism Report