
TABLE OF CONTENTS

COPYRIGHT
DECLARATION
RECOMMENDATION
DEPARTMENTAL ACCEPTANCE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Background and Motivation
1.2 Cross-site Scripting
1.2.1 Reflected XSS
1.2.2 Stored XSS
1.3 SQL injection
1.4 Distributed Denial of Service Attack
1.5 Problem Definition
1.6 Objectives
1.7 Scope of the Work

2 LITERATURE REVIEW
2.1 History of Web Application Firewall
2.2 Architecture of Web Application Firewall
2.3 Working of Web Application Firewall
2.4 Web Application Firewall Capabilities
2.5 Web application security
2.6 Why Web security is important
2.7 Different types of web attacks
2.8 Related Works
2.9 Research Gap

3 METHODOLOGY
3.1 System Working Architecture
3.2 Flowchart of the Module
3.3 Data Collection
3.3.1 Data preparation for SQL injection and XSS
3.3.2 Setting up the environment for DDoS data preparation
3.3.3 Correlative Data collection
3.3.4 Performance Parameters
3.4 Tools

4 RESULTS AND DISCUSSION
4.1 IDS ISCX 2012 DDoS data analysis
4.2 Analysis of CISC 2019 Dataset
4.3 DDoS Generated Data set
4.4 Sample raw Packets for DDoS Detection
4.5 Sample raw Data for XSS and SQL injection
4.6 Implementation of XSS and SQL injection in the LSTM Model
4.7 Implementation result of DDoS LSTM Model
4.8 Time for the execution

5 CONCLUSION AND RECOMMENDATION
5.1 Conclusion
5.2 Limitation
5.3 Recommendation

REFERENCES
APPENDIX A
APPENDIX A

LIST OF FIGURES

1.1 Working With the WAF
1.2 Reflected XSS
1.3 Stored XSS
1.4 SQL injection attack
1.5 Working of DDoS attack
2.1 History of WAF
2.2 System Architecture of WAF
3.1 Block Diagram
3.2 Flow chart
3.3 Data collection Methodology for XSS and SQL injection
3.4 Data collection Methodology for DDoS attack
3.5 Correlative Data collection
4.1 IP header length Normal vs DDoS IDS dataset
4.2 Don't Fragment status DDoS vs Normal traffic
4.3 IP Time to live Normal vs DDoS
4.4 IP protocol DDoS vs Normal traffic
4.5 Acknowledgement flag DDoS vs Normal traffic
4.6 Source and destination port Normal vs DDoS traffic
4.7 Flow bytes/s DDoS vs Normal traffic
4.8 IP protocol DDoS vs Normal traffic
4.9 Destination port DDoS vs Normal traffic
4.10 Push Flag DDoS vs Normal traffic
4.11 Flow Rate DDoS vs Normal traffic
4.12 Don't Fragment flag DDoS vs Normal traffic
4.13 Packet Length DDoS vs Normal traffic
4.14 Frame length DDoS vs Normal traffic
4.15 TTL value DDoS vs Normal traffic
4.16 Sample Dataset captured in Wireshark
4.17 Raw data collected in Burp Suite
4.18 Confusion matrix of XSS and SQL injection model
4.19 Confusion Matrix of testing XSS and SQL injection model
4.20 Raw data collected in Burp Suite
4.21 Raw data collected in Burp Suite
4.22 Time taken for the feature extraction
4.23 Time taken for the prediction
5.1 Traffic from the LOIC instance to the web server
5.2 Captured packets in the Wireshark tool
5.3 DoS traffic sent from the HULK tool
5.4 Payload passing from the DVWA application
5.5 HTTP traffic captured in the Burp Suite tool

LIST OF ABBREVIATIONS

XSS : Cross-Site Scripting
HTTP : Hypertext Transfer Protocol
FPR : False Positive Rate
ROC : Receiver Operating Characteristics
TPR : True Positive Rate
HTTPS : Hypertext Transfer Protocol Secure
WAF : Web Application Firewall
ACL : Access Control List
FTP : File Transfer Protocol
RSH : Remote Shell
NN : Neural Network
DoS : Denial of Service
SQL : Structured Query Language
WCSS : Within Cluster Sum of Squares
TP : True Positive
TN : True Negative
FP : False Positive
FN : False Negative
AI : Artificial Intelligence
DDoS : Distributed Denial of Service
LOIC : Low Orbit Ion Cannon

CHAPTER 1

INTRODUCTION

1.1 Background and Motivation

Protecting computers and networks from infiltration, theft, and disruption is a common
difficulty across many disciplines of computer science. The importance of a security
system grows as the number of internet users increases. Many security solutions have
been built, such as Intrusion Detection Systems (IDS) and firewalls. In most cases,
however, network layer firewalls and intrusion detection systems do not inspect HTTP
packets at the application layer [1]. As a result, they cannot fully safeguard web
servers. Web applications, especially those hosted in the cloud, are among the most
appealing targets for attackers looking to break into an organization's information
infrastructure. Internal data leaks, financial and credit losses, and website
manipulation can all result from an organization's failure to implement web security.
A web application firewall (WAF) is a tool that identifies and prevents many types of
attacks, such as SQL injection, XSS, and DDoS [2]. WAFs apply IDS methods at the
application layer to secure web applications.

Many users depend on web applications for education, banking, social media,
information, and more. Security vulnerabilities in these applications put those users
at risk. Attackers can exploit the vulnerabilities to gain access to sensitive
information, send malicious HTTP requests, install malware, redirect unsuspecting
users to malicious websites, and engage in other malicious activities. A web
application firewall helps to secure online applications by controlling, filtering,
and monitoring the HTTP traffic between a web application and its clients on the
Internet. It typically defends online applications against threats including
cross-site scripting (XSS), cross-site request forgery, SQL injection, and DDoS
[3][4][5]. A web application firewall is a layer 7 defense designed to counter
different forms of attack at the application layer. This kind of attack mitigation is
usually one part of a larger set of technologies that work together to provide
comprehensive protection against a variety of threats [3].

When deployed in front of a web application, a WAF acts as a barrier between the web
application and clients on the internet [1]. A WAF is a type of reverse proxy: it
keeps the web server from being exposed directly to clients by detecting bad traffic
before it reaches the server. An ordinary (forward) proxy server, by contrast, acts as
an intermediary that protects the identity of a client machine. A WAF is controlled by
a set of rules known as policies and by a pre-trained module that classifies new
incoming requests. The policies try to guard against application vulnerabilities by
filtering out harmful traffic. The usefulness of a WAF derives in part from the speed
and ease with which policy modifications can be deployed, allowing a faster reaction
to various attack vectors [2].

Figure 1.1: Working With the WAF

Hence, we introduce a layered architecture for the web application firewall. The
layers are ordered by traffic rate: the attack type with the higher request rate is
detected and filtered in the first layer, and only the traffic that passes the first
layer is processed in the second layer. Because different attacks differ in nature,
extracting the parameters suited to each attack and classifying new requests with a
pre-trained model increases the performance and accuracy of the web application
firewall.

1.2 Cross-site Scripting

Cross-site scripting (XSS) is an injection attack in which an attacker uses
vulnerabilities in a trusted website to inject malicious code. The code can be used to
steal personally identifiable information (PII) from users, such as login information,
session cookies, and other sensitive data [6]. It can even remain on the website
permanently and continue to target multiple users. XSS attacks have recently targeted
many social networks, such as LinkedIn, Orkut, Twitter, and Facebook, exposing many
users to theft of personal information and other criminal activity [5].

The two approaches for attackers to insert malicious code into a webpage are:

1.2.1 Reflected XSS

A reflected, or non-persistent, XSS attack is delivered to the victim through a
separate channel, such as an email message or another website. The attacker sends
malicious code as part of a request directed to a vulnerable site, which then reflects
the code back to the user's browser [7].

Figure 1.2: Reflected XSS
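
The reflection step can be illustrated with a minimal Python sketch. It is not part of the system built in this thesis; the function names and the sample payload are hypothetical and serve only to show how an unescaped request parameter ends up in the returned page, and how HTML escaping neutralizes it.

```python
import html

def render_unsafe(query: str) -> str:
    # Vulnerable: the request parameter is reflected into the page verbatim.
    return "<p>Results for: " + query + "</p>"

def render_escaped(query: str) -> str:
    # Safer: HTML metacharacters are escaped before the value is reflected.
    return "<p>Results for: " + html.escape(query) + "</p>"

# Hypothetical payload that an attacker might place in a crafted link.
payload = "<script>alert(1)</script>"

print(render_unsafe(payload))   # the script tag reaches the browser intact
print(render_escaped(payload))  # the tag is rendered as harmless text
```

In the unsafe version the browser would execute the injected script; in the escaped version the angle brackets arrive as `&lt;` and `&gt;` and are displayed as text.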

1.2.2 Stored XSS

A stored, or persistent, XSS attack occurs when malicious content is stored on the
targeted server. When a user later requests the stored information, for example a web
page that contains the malicious script, the code is returned as part of the
response [7].

Figure 1.3: Stored XSS

1.3 SQL injection

SQL injection is a web security vulnerability that allows an attacker to manipulate
the database queries made by a web application [4]. It gives the attacker access to
data they would not normally be able to retrieve. This can include data belonging to
other users as well as any other information the application itself can access. In
many cases the attacker can also change or erase this data, permanently altering the
application's content or behavior [8].

Figure 1.4: SQL injection attack

Most SQL injection vulnerabilities arise in the WHERE clause of a SELECT query.
However, SQL injection vulnerabilities can occur at any point in a query and across
multiple query types [8]. The other most common places where SQL injection occurs are:

• In UPDATE statements, within the updated values or the WHERE clause.

• In INSERT statements, within the inserted values.

• In SELECT statements, within the table or column name.

• In SELECT statements, within the ORDER BY clause.
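
The WHERE-clause case can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the `users` table, its columns, and the payload are hypothetical and are not taken from the thesis data set.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

def lookup_vulnerable(name: str):
    # Vulnerable: user input is concatenated into the WHERE clause.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_parameterized(name: str):
    # Safer: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "nobody' OR '1'='1"
print(lookup_vulnerable(payload))     # every row is returned
print(lookup_parameterized(payload))  # no row matches the literal string
```

The concatenated query becomes `... WHERE name = 'nobody' OR '1'='1'`, which is true for every row, whereas the parameterized query treats the whole payload as a single name.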

1.4 Distributed Denial of Service Attack

DDoS attacks are deliberate attempts to disrupt the normal traffic of a targeted
server, service, or network by flooding the target or its surrounding infrastructure
with Internet traffic. DDoS attacks are effective because they use numerous
compromised computer systems as sources of attack traffic [3][9]. A DDoS attack is
similar to an unexpected traffic jam that prevents regular traffic from reaching its
destination. The exploited machines can include PCs and other networked resources,
such as IoT devices.

Figure 1.5: Working of DDoS attack

The attacking networks are made up of malware-infected computers and other devices
(such as IoT devices) that an attacker can control remotely. The individual devices
are called bots, and a collection of bots is called a botnet. Once a botnet has been
formed, the attacker can direct an attack by sending remote commands to each bot.
When a botnet targets a victim's server or network, each bot sends requests to the
target's IP address, potentially overwhelming the server or network and preventing
normal traffic from getting through. Because each bot is a legitimate Internet device,
distinguishing attack traffic from normal Internet traffic is challenging [3].

A site or service that suddenly behaves differently or becomes unavailable is the most
visible indicator of a DDoS attack. However, because other causes, such as a
legitimate traffic spike, can produce similar performance problems, further analysis
is usually required. Some of the telltale signs of a DDoS attack can be detected and
analyzed with different analytical tools.

1.5 Problem Definition

Web applications must be protected from web attacks. Most current WAFs are
signature-based. Attackers, however, are becoming smarter and keep finding new
techniques and tactics, so signature-based protection alone is not a sufficient
solution: a signature-based system works only for known attacks and threats and cannot
handle zero-day attacks. In addition, when a service is provided through the web, the
entire service and business process depends on the site, so any DDoS attack can hamper
the web application and directly affect the service, the business, and in turn the
economy. A deep learning-based web application firewall is built by training the
system so that it can detect new attack vectors and protect the web application
according to the effort invested in training the module. Correctly identifying threats
is also challenging: a request that is wrongly flagged as a threat can severely
disrupt the business process, so reducing false positives in the web application
firewall is a major challenge in this field.

1.6 Objectives

The main objective of this thesis is to analyze different types of web attacks and to
develop a web application firewall capable of detecting cross-site scripting, SQL
injection, and Distributed Denial of Service attacks. The specific objectives are:

• Identify the different attack vectors and payloads used in web attacks and carry out
a comparative study of the different attack types in a simulation environment.

• Design and implement a layered, modular WAF architecture.

• Train the module and analyze its results, efficiency, and accuracy.

1.7 Scope of the Work

There are different types of web attacks for which network layer defense is not
sufficient. The most popular among them are DDoS, XSS, and SQL injection. This thesis
focuses on detecting web attacks with the LSTM deep learning model. Its primary goal
is to analyze HTTP traffic and packets in order to detect DDoS, XSS, and SQL injection
attacks.

CHAPTER 2

LITERATURE REVIEW

2.1 History of Web Application Firewall

Network firewalls started to appear in the early 1990s and have since evolved into
the modern WAF [10]. At that time, application firewalls were not separate systems;
they were part of a Network Intrusion Detection System (NIDS) with limited layer 7
capabilities covering a few specific protocols such as FTP and remote shell (rsh).
With the proliferation of the Internet, protecting web applications, for example in
e-commerce, became important, which led to the inception of dedicated WAFs. A
dedicated WAF is separate from the network firewall and is specific to the protection
of web applications and web services. WAFs have kept evolving, adopting different
techniques to improve both the threat detection rate, i.e., the percentage of attacks
detected, and the accuracy, which is reflected in the percentage of false positives.
The ML-based WAF, an enhancement of signature-based and anomaly-based detection, is
one of the notable steps in this evolution [10].

Figure 2.1: History of WAF

2.2 Architecture of Web Application Firewall

The WAF engine consists of two modules, the Configuration Module and the Packet
Analyzer Module, which are the core components of the WAF. When live packets arrive
from the internet, the Configuration Module filters them according to the
configuration and rule files and passes the traffic on to the Packet Analyzer Module.
The Packet Analyzer Module analyzes the packets, extracts features from them, and
tests those features against the previously trained data to identify the nature of
each packet. Consequently, only packets that have been analyzed and allowed are passed
from the Packet Analyzer Module to the web application server.

Figure 2.2: System Architecture of WAF

The major components of the WAF are:

a) Configuration Module: The Configuration Module applies the settings and the Access
Control List (ACL) to the WAF [1]. Both are configured through text files written in a
specific syntax. If no errors are found in the configuration files, the Configuration
Module applies the settings and the ACL, and the packets are routed to the Packet
Analyzer Module.

b) Packet Analyzer Module: The Packet Analyzer Module inspects the packets queued at
the kernel level and decides whether to allow or reject each one. The allowed packets
are considered safe and are passed to the web application servers.

2.3 Working of Web Application Firewall

A WAF can be deployed as a hardware device, as a virtual appliance, as software
running on the same web server as the web application, or through the cloud. It
operates with a particular set of rules called policies, and in each of these
deployment models it is placed in front of the web application, intercepting all
traffic between the application and the Internet. The policies determine which traffic
behavior the WAF looks for and what action it takes on bad traffic and on attempts to
exploit vulnerabilities. The WAF continuously inspects the GET and POST requests sent
to the web application in order to identify and filter malicious HTTP requests and
activity. Intelligent WAFs can additionally challenge a request to determine whether
the user behind it is a human or a bot. When vulnerabilities are found in the
application, the WAF can patch them immediately at the traffic level, automatically
blocking attackers and malicious actors, such as bots, attacking IP addresses, and
attack-based inputs, from exploiting these loopholes.

2.4 Web Application Firewall Capabilities

WAFs are the first line of defense against complex attacks that threaten the
integrity of a business. The most effective and efficient solutions provide the
following capabilities:

• Input protection: Comprehensive application filtering that accepts only valid user
input.

• HTTP validation: Detection of HTTP vulnerabilities and prevention of attacks through
configurable validation rules [2].

• Policies tailored to widely used applications: Policies can be set up according to
the requirements of each application, which protects applications from vulnerabilities
and also provides real-time insight into the traffic.

• Data leakage prevention: Alerts on and prevents unusual traffic or data leakage by
identifying, filtering, and shielding private data.

• Automated attack blocking: Automation tools block attacks by denying malicious
traffic entry into the network.

2.5 Web application security

Web application security is critical to protecting data, customers, and organizations
from data theft, interruptions in business continuity, and other harmful results of
cybercrime. More than three-quarters of all cybercrime targets applications and their
vulnerabilities. Web application security strategies strive to protect applications
through measures such as web application firewalls (WAFs), multi-factor authentication
for users, the use, protection, and validation of cookies to maintain user state, and
various methods for validating user input to ensure that it is not malicious before it
is processed by an application [11].

2.6 Why Web security is important

One recent case study predicted that cybercrime will cause losses of 6 trillion by
2024. Security devices and technologies are crucial for limiting these losses. In
addition to direct financial and data theft, web application threats can destroy
assets, customer goodwill, and business reputations. That is why web application
security is essential for organizations of all sizes.

2.7 Different types of web attacks

Every website on the Internet is vulnerable to cyber-attacks in some way. The dangers
range from human error to sophisticated attacks carried out by organized groups of
criminals. According to Verizon's Data Breach Investigations Report, the major
incentive for cyber attackers is financial. Any website, whether an e-commerce site or
a simple small-company site, is at risk of being attacked. Each attack on a website is
unique, and with so many kinds of attack in circulation it may appear impossible to
defend against them all. Nevertheless, a lot can be done to protect a website from
these attacks and to reduce the chance that attackers will target it. The major known
types of web attack are:

• Distributed denial of service attack

• Brute force attack

• Man-in-the-middle attack

• Cross-site scripting attack

• Code injection attack

• SQL injection attack

• Phishing attack

• Fuzzing

• Zero-day attack

• Path traversal

2.8 Related Works

Gustavo et al. [12] explored deep learning techniques for the analysis of HTTP
traffic. The authors used a transformer encoder to analyze the traffic and make its
classification easier.

Moradi et al. [1] used an n-gram feature extraction model in a web application
firewall to detect anomalies. The authors applied three different machine learning
models and compared their performance.

Pen et al. [6] analyzed XSS and SQL injection with supervised and unsupervised
learning models and proposed an auto-encoder-based model for detecting such attacks.

Rajesh et al. [3] analyzed different features for DDoS attack detection. The authors
also presented a comparative analysis of the system's results under different machine
learning methods.

Lente et al. [13] used an LSTM model, converting words into vectors, and evaluated its
performance for different batch sizes.

Keracan et al. [14] analyzed HTTP traffic for web attacks across different data sets.
The authors used tokenization, data augmentation, and k-fold validation to train a
model that predicts normal and malicious traffic.

2.9 Research Gap

There has been research into TCP, UDP, SYN, and NTP flood DDoS attacks, but not
specifically into the HTTP flood DDoS attack. This thesis therefore analyzes the HTTP
flood DDoS attack and correlates two classes of attack: one that affects the
availability of a service and one that affects the confidentiality and integrity of
web services. There has also been work on detecting XSS and SQL injection using word
tokenization, but no related work extracts features from the HTTP traffic, so this
work addresses that as well.

CHAPTER 3

METHODOLOGY

3.1 System Working Architecture

The web application firewall sits between the web server and the client. Incoming
HTTP traffic is parsed and analyzed in the request processing unit of the WAF. The WAF
is trained with a training data set and then predicts whether new incoming HTTP
traffic is good or bad. It consists of two modules, one for DDoS attack detection and
one for SQL injection and XSS detection. Because DDoS attacks differ in nature from
XSS and SQL injection, each module is trained with its own appropriate data set;
training with separate data sets gives better results than training with a single
data set, since both the data and the attacks differ in nature.

Each new HTTP request is parsed, and the parameters required for prediction are
extracted from it. These parameters are first applied to the pre-trained DDoS module.
If the traffic is classified as a bad request, the request is dropped; otherwise it is
passed to the second module, which tests for SQL injection and XSS in the same way. If
the second module also predicts good traffic, the HTTP session is passed to the web
server; otherwise the session is dropped in the WAF itself. Because the rate of DDoS
requests is very high, checking for DDoS in the first layer of the WAF increases the
accuracy and performance of the system.
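
The two-layer decision flow described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the two detector functions are toy stand-ins for the pre-trained LSTM modules, and the request fields `src_ip` and `body` as well as the sample addresses are assumptions made for the example.

```python
from typing import Callable, Dict

Request = Dict[str, str]
Detector = Callable[[Request], bool]

def make_layered_waf(ddos_detector: Detector, injection_detector: Detector):
    """Return a handler that applies the DDoS check first, then the XSS/SQLi check."""
    def handle(request: Request) -> str:
        if ddos_detector(request):        # layer 1: high-rate attack traffic
            return "dropped at layer 1"
        if injection_detector(request):   # layer 2: only sees layer-1 survivors
            return "dropped at layer 2"
        return "forwarded to web server"
    return handle

# Toy stand-ins for the two pre-trained models (assumptions for illustration only).
FLOODING_SOURCES = {"10.0.0.66"}

def toy_ddos_detector(request: Request) -> bool:
    return request["src_ip"] in FLOODING_SOURCES

def toy_injection_detector(request: Request) -> bool:
    body = request["body"].lower()
    return "<script" in body or "' or '" in body

waf = make_layered_waf(toy_ddos_detector, toy_injection_detector)
print(waf({"src_ip": "10.0.0.66", "body": "id=1"}))
print(waf({"src_ip": "10.0.0.5", "body": "id=1' OR '1'='1"}))
print(waf({"src_ip": "10.0.0.5", "body": "id=1"}))
```

Passing the detectors in as functions keeps the ordering logic separate from the models, so each layer can be trained and replaced independently.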

Figure 3.1: Block Diagram

3.2 Flowchart of the Module

Two different data sets are used to train the modules, because different features and
parameters are used to detect DDoS, XSS, and SQL injection. One data set contains the
DDoS detection parameters, such as IP address, packet length, request type, and time.
For SQL injection and XSS, the HTTP header and HTTP body are extracted and the
characters and parameters present in them are analyzed. Both the training and the
testing data are applied to the LSTM model using the generated data. Once training is
complete, the weights of the nodes in the module have been adjusted, so a new request
can be classified by applying it to the pre-trained module.

Figure 3.2: Flow chart

• Decoder: The captured data is in raw form, so it must be decoded into a standard
format. The DDoS data is decoded in the Wireshark tool, whereas the SQL injection and
XSS logs are decoded using URL decoding.

• Feature and Parameter Selection: The decoded data set contains many different
features and parameters, and the appropriate ones must be selected for training the
module. The DDoS detection features are selected by analyzing standard data sets and
by correlation analysis of the captured data. For SQL injection and XSS detection,
the standard data set is analyzed, normal and attack traffic are compared, and the
distinguishing characters and parameters are selected.

• Normalization: The features must be numerical for training the module, so
non-numeric data is converted to numeric data.

• Vectorization/Tokenization: The normalized numeric data is converted into the vector
form that the LSTM model accepts for training.

• LSTM module: LSTM is an artificial neural network used in deep learning. Unlike a
network with only feed-forward connections, it can process sequences of data. It has
memory, so it suits cases where some parameters must be remembered for a long time and
others only for a short time. An LSTM cell consists of four gates. The input gate
takes the input at the current time step together with the output of the previous
step. The forget gate and the remember gate decide which content is passed on and
which is dropped. The output gate provides the input for the next step of the
sequence. DDoS attack detection relies on long sequences of data whose elements depend
on one another, so an LSTM could be a better model than an ordinary feed-forward
network.
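
The gating behavior described in the last step can be made concrete with a single LSTM cell written in plain Python. This is a sketch under stated assumptions, not the model trained in this thesis: it follows the common textbook formulation with input, forget, and output gates plus a candidate value, uses scalar weights for readability, and the parameter names and values are hypothetical. How these gates map onto the four gates named above is likewise an assumption.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    x: current input, h_prev: previous output, c_prev: previous cell state,
    p: dict of scalar weights (w* for input, u* for previous output, b* bias).
    """
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate content
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # output passed on to the next time step
    return h, c

# Hypothetical weights; a trained model would learn these values.
names = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {name: 0.5 for name in names}

h, c = 0.0, 0.0
for x in [0.1, 0.9, 0.4]:   # a short input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` is what carries information across many time steps, while the gates control how much of it is kept, updated, and exposed at each step.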

3.3 Data Collection

The standard DDoS data sets IDS ISCX 2012 and CISCDOS2019 are used to analyze and
identify the parameters that distinguish normal traffic from DDoS attack traffic.
Similarly, for SQL injection and XSS detection, the standard CISC data set is used to
identify the features that could indicate XSS and SQL injection attacks. In addition,
a simulation environment was prepared for DDoS, XSS, and SQL injection, and the data
collected from it was processed to train the WAF module.

3.3.1 Data preparation for SQL injection and XSS

The test environment is built around the DVWA web application, through which XSS and
SQL injection payloads can be submitted, and the Burp Suite tool, which acts as an
intermediate proxy and captures the traffic. Different XSS and SQL injection payloads
are submitted through the web browser, and the traffic forwarded from the browser to
the server is captured in the Burp Suite proxy. The captured raw traffic is then
analyzed and processed to extract the required features and parameters from the raw
log. Around 5700 payloads are used to collect the HTTP attack traffic, and the normal
traffic is collected by entering normal input through the user interface.
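
The processing of a captured request can be sketched as URL decoding followed by simple character-based features. This is an illustrative sketch only: the character set, the feature names, and the two sample requests are assumptions and do not reproduce the feature list actually used in this thesis.

```python
from urllib.parse import unquote_plus

# Hypothetical set of characters that often appear in XSS/SQL injection payloads.
SPECIAL_CHARS = "<>'\"();=-/"

def extract_features(raw_value: str) -> dict:
    decoded = unquote_plus(raw_value)   # URL decoding of the captured value
    lowered = decoded.lower()
    features = {"length": len(decoded)}
    for ch in SPECIAL_CHARS:
        features["count_" + ch] = decoded.count(ch)
    # Two example keyword indicators (assumptions, not the thesis feature list).
    features["has_script_tag"] = int("<script" in lowered)
    features["has_union_select"] = int("union" in lowered and "select" in lowered)
    return features

attack = "q=%3Cscript%3Ealert%281%29%3C%2Fscript%3E"   # encoded XSS payload
normal = "q=blue+shoes"                                  # ordinary search input

print(extract_features(attack))
print(extract_features(normal))
```

Each request becomes a fixed-length numeric record, which is the form needed before normalization and vectorization.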

Figure 3.3: Data collection Methodology for XSS and SQL injection

3.3.2 Setting up the environment for DDoS data preparation

The LOIC and HULK tools are set up in a Kali Linux environment running in VMware. A
sample e-commerce site is hosted on the local host of a Windows environment and is
then flooded with traffic from four LOIC instances, two on each machine. The forwarded
traffic is captured in Wireshark in raw form and is then processed to extract the
useful information for training the WAF module.
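
One kind of information that can be derived from such a capture is the request rate per source address. The sketch below is an assumption about how this could be computed from exported (timestamp, source IP) pairs; the function name, the one-second window, and the sample records are hypothetical and are not taken from the captured data.

```python
from collections import defaultdict, deque

def max_requests_per_window(packets, window_seconds=1.0):
    """Peak number of packets per source IP within any sliding time window.

    packets: iterable of (timestamp_in_seconds, src_ip), sorted by timestamp.
    """
    recent = defaultdict(deque)   # timestamps still inside the window, per IP
    peak = defaultdict(int)
    for ts, ip in packets:
        window = recent[ip]
        window.append(ts)
        # Discard timestamps that have fallen out of the window.
        while ts - window[0] >= window_seconds:
            window.popleft()
        peak[ip] = max(peak[ip], len(window))
    return dict(peak)

# Hypothetical records, e.g. exported from a packet capture.
packets = [
    (0.00, "10.0.0.5"),
    (0.10, "10.0.0.66"),
    (0.20, "10.0.0.66"),
    (0.30, "10.0.0.66"),
    (1.50, "10.0.0.5"),
    (1.60, "10.0.0.66"),
]
print(max_requests_per_window(packets))
```

A flooding source shows a much higher peak than a normal client, which makes the rate a candidate input feature alongside fields such as packet length.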

Figure 3.4: Data collection Methodology for DDoS attack

3.3.3 Correlative Data collection

For the correlative analysis, the first layer of defense is DDoS protection and the
second layer is protection against XSS and SQL injection. SQL injection traffic looks
like normal data to the DDoS protection layer. To collect data for this case,
different payloads are submitted from the browser, the HTTP requests are passed
through the Burp Suite proxy, where the HTTP traffic is collected, and the same
traffic is collected in Wireshark. The log collected in each tool is used to train the
corresponding model. The data collected in Wireshark is treated as normal data because
it is collected through normal browsing of the web. The label of the data collected in
the Burp Suite log depends on whether it comes from payloads or from normal browsing.

Figure 3.5: Correlative Data collection
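
The layered order of checks described above can be sketched as follows. This is
only an outline of the control flow: the two classifier arguments are
placeholders for the trained models, and the toy rules, feature names, and
thresholds are hypothetical values chosen to make the example runnable.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def layered_check(
    packet_features: Features,
    http_features: Features,
    is_ddos: Callable[[Features], bool],
    is_injection: Callable[[Features], bool],
) -> str:
    """Apply the DDoS layer first, then the XSS/SQL injection layer."""
    if is_ddos(packet_features):
        return "blocked: ddos"
    if is_injection(http_features):
        return "blocked: xss/sqli"
    return "allowed"


# Stand-in rules used only to make the sketch runnable; the thesis uses
# trained LSTM models in both layers.
def toy_ddos_rule(features: Features) -> bool:
    return features["packets_per_second"] > 1000


def toy_injection_rule(features: Features) -> bool:
    return features["special_char_count"] > 3


flood = layered_check({"packets_per_second": 5000}, {"special_char_count": 0},
                      toy_ddos_rule, toy_injection_rule)
sqli = layered_check({"packets_per_second": 5}, {"special_char_count": 5},
                     toy_ddos_rule, toy_injection_rule)
normal = layered_check({"packets_per_second": 5}, {"special_char_count": 0},
                       toy_ddos_rule, toy_injection_rule)
print(flood, sqli, normal)
```

Because the second layer is only reached when the first layer lets the traffic
through, a SQL injection request sent at a normal rate passes the DDoS layer and
is handled by the second layer.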

3.3.4 Performance Parameters

The following parameters are calculated for both models to evaluate how well
each model detects attacks.

• True Positive (TP): The condition where actual attack traffic is detected as
an attack.

• False Positive (FP): The condition where normal traffic is detected as attack
traffic.

• True Negative (TN): The condition where normal traffic is detected as normal
traffic.

• False Negative (FN): The condition where attack traffic is detected as normal
traffic.

• Attack Detection Rate (ADR): It measures how accurately attacks are detected
by the system. ADR = (Total detected attacks / Total attacks) * 100

• Recall: It is the fraction of actual attacks that the system correctly
identifies. Recall Rate = TP / (TP + FN)

• Precision: It is the fraction of predicted attacks that are actual attacks.
Precision Rate = TP / (TP + FP)
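
These formulas can be computed directly from the four confusion-matrix counts.
The sketch below is illustrative only. The function name and the sample counts
are made up, and it assumes that "total detected attacks" in the ADR formula
means the true positives, so that ADR equals recall expressed as a percentage.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, ADR, recall, and precision from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Assumption: detected attacks = TP, total attacks = TP + FN.
        "attack_detection_rate": 100 * tp / (tp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Made-up counts for illustration; they are not results from this work.
metrics = detection_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```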

3.4 Tools

• Python: Python is a programming language that is widely used in data science
and machine learning. In this thesis work, Python is used to develop the DDoS,
SQL injection, and XSS models and to analyze the distribution of the data
graphically. We have used Python libraries such as NumPy, matplotlib, and Keras
for model development. Python is also used for feature extraction from the raw
data.

• JupyterLab: JupyterLab is an interactive environment for code, data, and
notebooks. It provides a flexible and interactive environment for project
development, especially for data analysis and presentation. In this thesis
work, Jupyter Notebook is used as the platform for coding and model
development.

• Low Orbit Ion Cannon: LOIC is an open-source application developed by
Praetox Technologies. It is mainly used for stress testing of applications by
requesting a service at a very high rate. It floods the target with TCP, UDP,
or HTTP requests, as selected, so a single LOIC instance performs a DoS attack.
In this work, multiple LOIC instances on different source machines were used to
attack the target from multiple sources, and the traffic was collected in
Wireshark.

• Wireshark: It is an open-source packet capture and analysis tool with which
packets can be captured at the interface level of a device. In this project, we
captured the incoming packets from the different sources for both the DDoS
traffic and the normal traffic arriving at the web server.

• Burp Suite: Burp Suite acts as a web proxy, so that HTTP traffic generated by
the client is routed to it and it forwards the requests to the web server. In
this thesis work, we captured HTTP traffic with and without payloads and
collected the data for training the SQL injection and XSS detection model.

• Kali Linux: Kali Linux is a Debian-based OS platform that provides different
security testing and auditing tools. It was created and is maintained by the
company Offensive Security. In this thesis work, Kali Linux is used to deploy
the DDoS attack tools for data generation.

For the DDoS simulation setup, we used the LOIC and HULK tools to generate DDoS
traffic in a simulation environment. Similarly, we captured the XSS and SQL
injection HTTP traffic in the Burp Suite proxy tool.

CHAPTER 4

RESULTS AND DISCUSSION

4.1 IDS ISCX 2012 DDOS data analysis

To identify the features that can be used to distinguish normal traffic from
DDoS attack traffic, the ISCX dataset was used, taking a sample of 100K normal
records and 100K DDoS attack records. From this sample, the features of attack
traffic and normal traffic were analyzed. The analyzed features are listed below:

The features that best distinguish normal traffic from attack traffic are
presented graphically below.

• IP header length: In the line graph below, the first 100K records are attack
traffic and the last 100K are normal traffic. The attack traffic is flooded
with packets that have a smaller IP header, whereas the normal traffic has a
larger IP header length. Attack traffic mostly has the same IP header length,
whereas the IP header length varies in normal traffic.

Figure 4.1: IP header length normal vs DDoS IDS dataset

• IP Don’t fragment: In the bar graph below, the first 100K records are attack
traffic and the last 100K are normal traffic. Most of the normal traffic has
the Don’t Fragment flag enabled, whereas the attack traffic mostly does not.
Packets in normal traffic are sent with Don’t Fragment enabled so that they
cannot be fragmented further, whereas attack packets can be fragmented and
passed through.

Figure 4.2: Dont Fragment status DDoS vs Normal traffic

• IP Time to live: In the bar graph below, the first 100K records are attack
traffic and the last 100K are normal traffic. From the graph, it can be
concluded that the attack traffic is sent with a lower TTL than the normal
traffic. Most of the traffic has a TTL of around 100, but normal traffic has a
higher TTL than attack traffic.

Figure 4.3: IP Time to live Normal Vs DDoS

• IP protocol: In the bar graph below, the first 100K records are attack
traffic and the last 100K are normal traffic. The attack traffic uses UDP,
whereas the normal traffic is mostly TCP. Because UDP is connectionless and
does not require acknowledgments, most of the attack packets are sent over UDP.

Figure 4.4: IP protocol DDoS vs Normal Traffic

• Acknowledgment flag: In the bar graph below, the first 100K records are
attack traffic and the last 100K are normal traffic. From the graph, we can
conclude that the attack traffic floods the target with requests without
waiting for acknowledgments from the target, whereas the normal traffic mostly
has the acknowledgment flag set.

Figure 4.5: Acknowledgement flag DDoS vs Normal traffic

• Source and destination port: The source and destination ports of normal and
attack traffic are presented graphically below, where the first 100K records
are attack traffic and the last 100K are normal traffic. From the two graphs,
we can conclude that the attack traffic uses higher port numbers than the
normal traffic. The most commonly used ports are in the lower range, whereas
the attack uses ports in the higher range with greater variation.

Figure 4.6: Source and destination port Normal vs DDoS traffic

The six features above are the most useful for differentiating normal traffic
from DDoS attack traffic. Since the attack traffic is mostly UDP, the TCP packet
length is higher in normal traffic than in attack traffic. The TCP push flag is
mostly set in normal traffic, and the TCP packet length and TCP window size are
slightly denser in normal traffic than in attack traffic. No distinguishing
property was found for features such as the reset flag, TCP FIN flag, TCP SYN
flag, TCP time delta, and frame length.
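
The comparison in this section was made visually from the graphs. As a
complementary illustration, the sketch below ranks features by a simple
numerical separation score, the absolute difference between the class means
divided by the standard deviation of all values combined. The score, the
function names, and the sample rows are assumptions made for this example and
are not part of the analysis carried out in this work.

```python
from statistics import mean, pstdev


def separation_score(attack_values, normal_values) -> float:
    """Difference of class means relative to the spread of all values."""
    spread = pstdev(list(attack_values) + list(normal_values))
    if spread == 0:
        return 0.0
    return abs(mean(attack_values) - mean(normal_values)) / spread


def rank_features(attack_rows, normal_rows):
    """Sort feature names from most to least separating."""
    scores = {
        name: separation_score(
            [row[name] for row in attack_rows],
            [row[name] for row in normal_rows],
        )
        for name in attack_rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration only; they are not taken from the ISCX dataset.
attack = [
    {"ttl": 60, "ack_flag": 0, "fin_flag": 0},
    {"ttl": 62, "ack_flag": 0, "fin_flag": 1},
]
normal = [
    {"ttl": 120, "ack_flag": 1, "fin_flag": 0},
    {"ttl": 128, "ack_flag": 1, "fin_flag": 1},
]
ranking = rank_features(attack, normal)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the FIN flag has the same mean in both classes and therefore
scores zero, which mirrors the observation above that some flags do not help to
separate the two kinds of traffic.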

4.2 Analysis of CISC 2019 Dataset

For data variation, another standard dataset, the CISC dataset, was also
analyzed. It contains DDoS data based on methods such as DNS, LDAP, MSSQL,
NetBIOS, NTP, SNMP, SSDP, UDP, SYN, TFTP, and TCP. Of these, the TCP- and
UDP-based attack data were compared in terms of total length, flow rate, flag
count, etc. The analysis found significant differences in the total length of
forwarded packets, total forwarded packets, flow rate in bytes/sec, flow rate in
packets/sec, push flag, urgent flag, and average forwarded segment size.
Although the values of the different parameters are normalized, they are
suitable for comparative analysis and for identifying the parameters that
differentiate normal traffic from DDoS attack traffic.

Taking sample data from the dataset, where 80K records are attack data and 20K
are normal data, the graphical representation of the variation of values in the
normal and DDoS attack datasets is as follows:

• Flow rate: The flow rate is higher in the attack dataset than in the normal
dataset. In the figure below, the first 80K records show the flow rate of
attack traffic, whereas the last 20K are normal traffic.

Figure 4.7: Flow bytes/s DDoS vs Normal traffic

• IP protocol: As shown in the graphical representation below, UDP, whose IP
protocol number is 17, is mainly used in the attack traffic, whereas the normal
traffic uses TCP, whose protocol number is 6. Here, the first 80K records are
attack traffic and the last 20K are normal traffic.

Figure 4.8: IP protocol DDoS vs Normal traffic

• Destination port: As shown in the graphical representation below, the first
80K records are attack traffic, which is sent to many different ports, whereas
the last 20K records are normal traffic with less variation in destination
ports.

Figure 4.9: Destination port DDoS vs Normal traffic

• Push flag: Normal traffic is mostly sent with the push flag set, whereas
attack traffic is mostly sent without it. Here, the first 80K records show the
push flag of attack traffic and the last 20K that of normal traffic.

Figure 4.10: Push Flag DDoS vs Normal traffic

4.3 DDoS Generated Data set

Normal traffic and DDoS traffic were collected in the simulation environment and
compared across different parameters to identify how DDoS traffic differs from
normal traffic. The features analyzed in the standard datasets were collected,
and their behavior was studied in the generated dataset. Out of almost 50
characteristics, the most distinguishable parameters between normal traffic and
DDoS traffic are:

• Flow rate: There is a significant difference in the flow rate between normal
and DDoS attack traffic. In the graphical representation below, the first 50%
of the data is an attack dataset from a single LOIC instance, while the
remaining 50% is a normal dataset.

Figure 4.11: Flow Rate DDoS vs Normal traffic

• Don’t fragment flag: There is a difference between normal and DDoS attack
traffic. The first 50K records are an attack dataset in which packets are
transferred with the don’t fragment flag enabled, while the last 50K are a
normal dataset in which most packets are transferred without the don’t fragment
flag enabled.

Figure 4.12: Dont fragment flag DDoS vs Normal traffic

• Length of packet: As shown in the graphical representation below, the first
50K records are an attack dataset in which the packet length is small compared
to the normal dataset. The last 50K records are the normal dataset.

Figure 4.13: Packet Length DDoS vs Normal traffic

• Frame length on wire: As shown in the graphical representation below, the
first 50K records are an attack dataset in which the frame length on wire is
small compared to normal data. The last 50K records are a normal dataset.

Figure 4.14: Frame length DDoS vs Normal traffic

• Time to live: The time-to-live value of attack traffic is low compared to
that of normal traffic. In the graph below, the first 50K records are DDoS
attack traffic and the last 50K are normal traffic. The time to live in normal
traffic is high, which means that a packet of normal traffic can be forwarded
over a higher number of hops.

Figure 4.15: TTL value DDoS vs Normal traffic

In addition to these parameters, other parameters like request method, window size,
window scaling factors, and acknowledgement number have slightly distinguishable
graphical representation while analyzing the normal and DDoS traffic. These
parameters, which have the distinguishable property of separating the normal and
DDoS traffic, are used for the training of the module.

4.4 Sample raw Packets for DDoS Detection

The interface traffic is captured in the Wireshark tool, and different
parameters are extracted from the captured packets to produce data suitable for
training a deep learning model.

Figure 4.16: Sample Dataset captured in Wire Shark

4.5 Sample raw Data for XSS and SQL injection

First, the standard datasets for XSS and SQL injection were analyzed, and then
the features present in the HTTP traffic were studied. The web proxy intercepts
web traffic as it travels from the web client to the web server. The captured
XSS and SQL injection data was compared with a standard dataset to find
appropriate parameters for distinguishing normal traffic from attack traffic.
The required parameters identified during this traffic analysis were then
extracted from the captured data. The extracted feature values are used to train
the LSTM module.

Figure 4.17: Raw data collected in Burp Suite

4.6 Implementation of XSS and SQL injection in the LSTM Model

When the prepared data was applied to the LSTM model, the observed accuracy was
nearly 89%; the training and testing accuracy in each epoch is shown below. To
implement the LSTM model, the features for training the model are first chosen,
and then the data for the selected features is extracted. The data is then
normalized, transformed, and fed to the model with a train-test ratio of 80:20,
so that 20% of the data is used for validation of the model.
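
The normalization and 80:20 split can be sketched in plain Python as follows.
The normalization method is not stated in the text above, so min-max scaling is
an assumption here, as are the function names, the random seed, and the sample
values.

```python
import random


def min_max_scale(rows):
    """Scale every column of equal-length numeric rows into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def train_test_split(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle the samples and hold out `test_ratio` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) - round(len(indices) * test_ratio)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


# Made-up feature rows and labels (1 = attack, 0 = normal) for illustration.
rows = [[10, 200], [20, 400], [30, 600], [40, 800], [50, 1000]]
labels = [0, 0, 1, 1, 1]
scaled = min_max_scale(rows)
x_train, x_test, y_train, y_test = train_test_split(scaled, labels)
print(len(x_train), len(x_test))
```

For simplicity the sketch scales the whole dataset before splitting. Computing
the minimum and maximum on the training part only and reusing them for the test
part avoids leaking information from the test data.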

As shown in the confusion matrix below, among the 20% of the data used for
testing, 922 attacks are correctly predicted as bad, while 1129 good samples are
correctly predicted as good. Similarly, 7 good samples are detected as bad by
the model, and 221 bad samples are classified as good.

Figure 4.18: Confusion matrix of XSS and SQL injection model

Figure 4.19: Confusion Matrix of testing XSS and SQL injection model

As a result, the model’s overall accuracy is nearly 89 percent. The obtained
attack detection rate is approximately 80%. Similarly, the model’s precision and
recall are 99 percent and 83 percent, respectively.

4.7 Implementation result of DDoS LSTM Model

When the generated dataset of 264K records, of which around 132K are normal
traffic and the remaining 132K are attack traffic, was processed and applied to
the LSTM model, an accuracy of almost 96% was obtained. The training and
validation accuracy is shown in the line graph below.

Figure 4.20: Raw data collected in Burp Suite

As shown in the confusion matrix below, among the 20% of the data used for
testing, 26357 attacks are correctly classified as bad, while 25262 good samples
are correctly predicted as good. Similarly, 1043 good samples are detected as
bad by the model, and 1033 bad samples are classified as good.

As a result, the model’s overall accuracy is almost 97 percent. The obtained attack
detection rate is almost 99%. Similarly, the model’s precision and recall are 80
percent and 99 percent, respectively.

Figure 4.21: Raw data collected in Burp Suite

4.8 Time for the execution

It is important that the performance of the web application is not hampered by
the WAF, and the processing of input traffic in the WAF should not be delayed or
altered. We therefore have to consider the performance aspect while improving
the security aspect.

As shown in the figure below, the time taken to collect a single HTTP packet
input from the browser in the Jupyter Notebook acting as a proxy and to extract
the features is 0.0019 seconds.

Figure 4.22: Time taken for the feature extraction

Similarly, the time taken to predict the class of a new input should be
considered. Here, the average time taken to predict the class of a single input
in the XSS and SQL injection model is 0.0007 seconds. Similarly, the average
time taken to predict the class of a single record in the DDoS detection model
is 0.00039 seconds. The time taken for the conversion of a single record is
0.004 seconds. The total time taken to process data in the WAF is therefore
small enough that the end user would not notice a delay.

Figure 4.23: Time taken for the prediction
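
Per-step timings of this kind can be measured with Python’s `time.perf_counter`.
The sketch below is a minimal illustration: the `extract` and `predict`
functions are placeholders for the real feature extraction and model prediction,
and the helper name and the number of repetitions are assumptions. For
reference, the four figures reported above (0.0019 + 0.0007 + 0.00039 + 0.004
seconds) add up to roughly 0.007 seconds.

```python
import time


def average_seconds(func, argument, repeats=100):
    """Average wall-clock time of calling func(argument) over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(argument)
    return (time.perf_counter() - start) / repeats


# Placeholder stages; in the thesis these are feature extraction and
# prediction with the trained LSTM models.
def extract(request):
    return [len(request), request.count("'"), request.count("<")]


def predict(features):
    return sum(features) > 50


request = "GET /?id=1' OR '1'='1 HTTP/1.1"
timings = {
    "extract": average_seconds(extract, request),
    "predict": average_seconds(predict, extract(request)),
}
total = sum(timings.values())
print(timings, total)
```

Averaging over many repetitions smooths out scheduling noise, which matters
when a single call takes well under a millisecond.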

CHAPTER 5

CONCLUSION AND RECOMMENDATION

5.1 Conclusion

Using LSTM deep learning modules, the proposed model detects DDoS, XSS, and
SQL injection attacks with good accuracy. The first detection layer is a DDoS
attack detection model with an accuracy of nearly 97%, and the second layer is for
XSS and SQL injection detection with an accuracy of nearly 89%. We analyzed
and used additional features and parameters for attack detection, which reduced
the false positives during traffic filtering in the WAF. Since DDoS traffic is at a
higher rate than normal, it improves the system’s performance when we check the
traffic in the layered format, i.e. first check for DDoS then SQL injection, and XSS.
We also analyzed the performance of the web application when an extra layer of
filtering is added and found a slight impact on performance that is not
noticeable from the user experience perspective.

5.2 Limitation

The nature of HTTP traffic toward different sites may differ, so the studied
features may not be sufficient to identify good or bad traffic. For the DDoS
data preparation, we used only 4 instances of LOIC, which might not be
sufficient because in a real environment thousands of such bots are used. This
research is limited to three types of attack; other types such as remote code
execution, brute force, and path traversal can be added in the future. Since the
performance of the web application is critical when a web application firewall
is deployed, training the system with a very large dataset might affect
performance during prediction.

5.3 Recommendation

This thesis work discusses only three types of web attacks: DDoS, SQL injection,
and XSS. Other common types of web attacks, such as RCE, malware, and brute
force, can be added. Because of their similar detection properties, we examined
SQL injection and XSS in a single module in this work. Similarly, the
characteristics of other attacks can be studied and a module developed by adding
those attack types. Because the characteristics of a brute force attack are
somewhat similar to those of a DDoS attack, detection parameters can be added to
the dataset and the model built to detect both types of attacks. RCE can also be
added to the XSS and SQL injection model, since remote code execution has
characteristics similar to those of XSS and SQL injection.

REFERENCES

[1] Ali Moradi Vartouni, Mohammad Teshnehlab, and Saeed Sedighian Kashi.
Leveraging deep neural networks for anomaly-based web application firewall.
IET Information Security, 13(4):352–361, 2019.

[2] Michiaki Ito and Hitoshi Iyatomi. Web application firewall using character-level
convolutional neural network. In 2018 IEEE 14th International Colloquium
on Signal Processing & Its Applications (CSPA), pages 103–106. IEEE, 2018.

[3] Shriram Rajesh, Marvin Clement, Sooraj SB, Al Shifan SH, and Jyothi John-
son. Real-time ddos attack detection based on machine learning algorithms.
Available at SSRN 3974241, 2021.

[4] Manar Hasan Ali AL-Maliki and Mahdi Nsaif Jasim. Review of sql injection at-
tacks: Detection, to enhance the security of the website from client-side attacks.
International Journal of Nonlinear Analysis and Applications, 13(1):3773–3782,
2022.

[5] Caio Lente, Roberto Hirata Jr, and Daniel Macêdo Batista. An improved
tool for detection of xss attacks by combining cnn with lstm. In Anais
Estendidos do XXI Simpósio Brasileiro em Segurança da Informação e de
Sistemas Computacionais, pages 1–8. SBC, 2021.

[6] Yao Pan, Fangzhou Sun, Zhongwei Teng, Jules White, Douglas C Schmidt,
Jacob Staples, and Lee Krause. Detecting web attacks with end-to-end deep
learning. Journal of Internet Services and Applications, 10(1):1–22, 2019.

[7] BA Vishnu and KP Jevitha. Prediction of cross-site scripting attack using ma-
chine learning algorithms. In Proceedings of the 2014 International Conference
on Interdisciplinary Advances in Applied Computing, pages 1–5, 2014.

[8] Sandeep Kumar, Renuka Mahajan, Naresh Kumar, and Sunil Kumar Khatri.
A study on web application security and detecting security vulnerabilities. In
2017 6th International Conference on Reliability, Infocom Technologies and
Optimization (Trends and Future Directions)(ICRITO), pages 451–455. IEEE,
2017.

[9] Yan Li and Yifei Lu. Lstm-ba: Ddos detection approach combining lstm and
bayes. In 2019 Seventh International Conference on Advanced Cloud and Big
Data (CBD), pages 180–185. IEEE, 2019.

[10] A. S. Hovan George SH George S. A brief study on the evolution of next
generation firewall and web application firewall. In Internation journal of
advance research and communication Engineering, 2021.

[11] Asish Kumar Dalai and Sanjay Kumar Jena. Evaluation of web application
security risks and secure design patterns. In Proceedings of the 2011 Interna-
tional Conference on Communication, Computing & Security, pages 565–568,
2011.

[12] Nicolás Montes, Gustavo Betarte, Rodrigo Martı́nez, and Alvaro Pardo. Web
application attacks detection using deep learning. In Iberoamerican Congress
on Pattern Recognition, pages 227–236. Springer, 2021.

[13] Caio Lente, Roberto Hirata Jr, and Daniel Macêdo Batista. An improved
tool for detection of xss attacks by combining cnn with lstm. In Anais
Estendidos do XXI Simpósio Brasileiro em Segurança da Informação e de
Sistemas Computacionais, pages 1–8. SBC, 2021.

[14] Hacer Karacan and Mehmet Sevri. A novel data augmentation technique
and deep learning model for web application security. IEEE Access, 9:150781–
150797, 2021.

APPENDIX A

APPENDIX I

Figure 5.1: Traffic from the LOIC instance to the web server

The figure above illustrates the HTTP traffic sent to the web server from the
LOIC tool. We can select the type of traffic to send: TCP, UDP, or HTTP.

Figure 5.2: Capture packet in Wire Shark tool

The figure above illustrates the capture of HTTP packets in Wireshark.

APPENDIX II

Figure 5.3: DoS traffic sent from the HULK tool

As shown in the figure above, once the HULK program is started, it creates bots
that send traffic toward the target, so that the target is flooded with a huge
volume of traffic.

APPENDIX III

As shown in the figure, the payloads for the different types of attack were
passed through the input fields provided in the DVWA application.

Figure 5.4: Payload passing from the DVWA application

APPENDIX III
The HTTP traffic passed from the DVWA site is collected in the Burp Suite web
proxy.

Figure 5.5: HTTP traffic captured at the Burp Suite tool

APPENDIX A

Questions During the Mid Defense


1) There was a question about the accuracy of the model in the graphical
representation. I have corrected this by training the model with a balanced
dataset and obtained the better result presented in Sections 4.5 and 4.6.

2) There was a question about the performance of the system, the time taken for
processing in the WAF, and its impact on the web application performance. I have
discussed this topic in Section 4.8.
