
Article

Detection of Denial of Service (DoS) attacks in Cloud IoT with environmental sensors using anomaly detection algorithms.
Sergio M. Martinez 1
1 Group of Artificial Intelligence Applications. Department of Software Engineering and Artificial Intelligence.
Faculty of Computer Science, Office 420 bis. Universidad Complutense de Madrid. Address: Calle Profesor
José García Santesmases, 9, Ciudad Universitaria, 28040 Madrid, Spain. E-mail: sergim13@ucm.es
Version September 6, 2019 submitted to Journal Not Specified

Abstract: Computer attacks are now commonplace, and new types of threats are discovered every day, disrupting the normal operation of computer systems and causing serious losses to companies. It is therefore necessary to take into account the Cyber Situational Awareness of our networks and systems; combined with new cognitive approaches, this aims at more effective attack detection and a better response time. This research proposes the implementation of a real-time denial of service (DoS) attack detection module that works in a Cloud IoT environment using a variety of remote environmental sensors. In tests under this architecture, a DoS attack was simulated, and the data from the network frames was captured both during normal operation and under attack. The data obtained was analyzed and an experimental data set was proposed; together with the KDD-Cup99 data set, it was used to evaluate two algorithms for network anomaly detection, the unsupervised algorithms HBOS and OCSVM. Their performance was measured using the ROC curve, and HBOS was found to be the best algorithm for the proposed purpose because of its good performance and speed.

Keywords: Cloud IoT; environmental sensors; anomaly detection; DoS attacks; Cyber Situational Awareness.

1. Introduction
Any computer system, regardless of its platform, is vulnerable to some kind of attack. This creates the need for constant research into, and development of, tools that help detect and mitigate these constantly evolving malicious attempts. Artificial Intelligence (AI) and Cyber Situational Awareness are becoming leading tools in the field of computer security, thanks to the good performance of new algorithms developed with these technologies and methodologies.

The environment proposed in this work follows a "Cloud IoT" architecture for the implementation of an assisted living system for the elderly. IoT devices such as environmental sensors, connected to a cloud in a "centralized cloud" scheme, will be used to control and customize environments according to the preferences and needs of the elderly. The system carries large amounts of real-time data through the network from sensors that constantly measure environmental parameters, and therefore requires high availability. An extra layer of security is thus needed in the form of a network-level anomaly detection system, which must be able to differentiate between normal network traffic and a real-time Denial of Service (DoS) attack.

Submitted to Journal Not Specified, pages 1 – 14 www.mdpi.com/journal/notspecified

The HBOS (Histogram-Based Outlier Score) and OCSVM (One-Class Support Vector Machine) anomaly detection algorithms were used. HBOS was chosen for its speed and low consumption of computational resources, while OCSVM was used as a reference against which to contrast the HBOS results. Two data sets were used to evaluate these algorithms: the well-known KDD-Cup99 data set and a custom data set obtained from a test environment with an IoT Cloud architecture in which a DoS attack was simulated.

Based on the tests carried out with the two algorithms and data sets, it was concluded that HBOS is the best option for implementing a real-time DoS attack detection system, since it obtained excellent results and, above all, its data processing speed is very high.

This paper is structured as follows. The state of the art summarizes the technologies used and the conclusions of some previous works on which this research is based. Next, the methodologies used to implement the testing environment and to obtain the custom data set are presented. Finally, the contributions of the research are described and the results and conclusions are analyzed.

2. State of the art

2.1. Cloud IoT


A cloud is an IT environment designed for the purpose of provisioning IT resources remotely, scalably and measurably. IT resources comprise the use of computers to store, retrieve, transmit and manipulate data or information [6].

The Internet of Things (IoT) refers to the point in time at which more "Things" or "Objects" are connected to the Internet than people. This connected environment will generate large amounts of data that must be processed, stored and presented in a simple and accessible way [25]. According to figures from Cisco IBSG, this point is estimated to have been reached between 2008 and 2009 [8]. IoT devices can be understood as devices other than standard computers that can connect to a network and transmit data, so that they can interact over the Internet and be monitored and controlled. These devices include intelligent sensors, speakers, clocks, intelligent actuators, "wearable" devices, etc. [31].

Cloud computing is built on the sharing of resources, which is a key requirement for an IoT platform. Besides sharing resources, the cloud always seeks to maximize their use and is location-independent, since it can be accessed from anywhere and with any device that has an active Internet connection. This suggests that a convergence between IoT and the Cloud can generate great opportunities for both technologies. Cloud computing can provide the virtual infrastructure to manage the IoT paradigm, integrating monitoring devices, storage devices, analytical tools, visualization platforms and data delivery to the user [25].

The convergence between IoT and the Cloud gave rise to the IoT-Centric Cloud architecture, in which IoT devices are connected to different local clouds that are in turn connected to a central cloud. In this way, the processing and storage of information stay close to the user and to the data source. A local cloud is created on demand and consists of a device capable of providing the processing, storage and network capabilities needed to serve users for a defined time; local clouds can involve a large number of nodes (sensors, actuators, smartphones, etc.). The global cloud, on the other hand, is the more traditional form of cloud, with scalable and flexible processing and storage capacity, and thus forms the main structure or "Backbone" of the network [10].
Figure 1 shows the general scheme of the centralized IoT Cloud architecture.
Figure 1. General scheme of the centralized IoT-Cloud architecture, where the IoT devices are
distributed in several local clouds in charge of providing resources to the devices and serving as
a bridge of communication with the global cloud.

2.2. MQTT IoT protocol


IoT devices and sensors generally have limited processing and storage capabilities and are connected in a dynamic wireless structure, where devices can connect and disconnect, replacing and/or adding connection nodes. This makes a traditional connection paradigm very complex to implement and maintain. The problem can be solved with centralized data communication, in which information is delivered to receivers based not on their network addresses but on their content and interests [24]. MQTT (MQ Telemetry Transport) is a connectivity protocol for IoT based on this centralized data communication paradigm, using Publish/Subscribe (pub/sub) messaging. The central node, called the "Broker", manages the so-called topics, which identify messages by their content: the broker manages client subscriptions to topics and routes the messages received from devices to the subscribers of each topic. MQTT was designed as an extremely light and simple publish/subscribe message transport, so that it makes economical use of the resources of the devices it runs on. Its design also minimizes the network bandwidth used while trying to ensure reliability and a certain degree of security in message delivery. All these features have made MQTT a popular standard in IoT applications [22].
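To make the broker's routing role concrete, the following is a minimal in-process sketch of topic-based pub/sub written in plain Python, not a real MQTT client or broker. It implements the MQTT topic-filter wildcards ("+" matches exactly one topic level; "#" matches any number of remaining levels and must come last) but omits everything else the protocol provides, such as QoS levels, retained messages, sessions, networking and the special handling of topics starting with "$". The class, function and topic names are hypothetical.

```python
from collections import defaultdict


def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a concrete topic."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":                 # multi-level wildcard: matches the rest
            return True
        if i >= len(t_levels):       # filter has more levels than the topic
            return False
        if f != "+" and f != t_levels[i]:
            return False             # literal level mismatch
    return len(f_levels) == len(t_levels)


class ToyBroker:
    """Routes each published message to every subscriber whose filter matches."""

    def __init__(self):
        self.subscriptions = defaultdict(list)   # topic filter -> callbacks

    def subscribe(self, topic_filter, callback):
        self.subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        for topic_filter, callbacks in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for cb in callbacks:
                    cb(topic, payload)


received = []
broker = ToyBroker()
broker.subscribe("home/+/temperature", lambda t, p: received.append((t, p)))
broker.publish("home/livingroom/temperature", "21.5")
broker.publish("home/livingroom/humidity", "40")   # no matching subscription
print(received)  # [('home/livingroom/temperature', '21.5')]
```

Publishers never address subscribers directly; adding or removing a device only changes the broker's subscription table, which is the property that makes this paradigm suit dynamic IoT networks.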
2.3. Cyber Situational Awareness.

Situational Awareness is a multifaceted phenomenon that can be understood from a technical and a cognitive viewpoint. From the technical viewpoint, Situational Awareness comes down to the compilation, processing and fusion of multivariate data, with the main goal of obtaining pieces of information that can be related to and evaluated against each other. The cognitive viewpoint, on the other hand, concerns the human capacity to understand the technical implications of the data and to draw conclusions accordingly. Cyber Situational Awareness can be regarded as the part of Situational Awareness that concerns the "Cyber" environment. The data for the technical viewpoint can be obtained from "IT sensors", whose output can be fed into a data fusion process or interpreted directly, as the application requires. This Cyber environment includes any kind of computer network activity in which suspicious activity can occur. Activity detected at any level of the TCP/IP stack provides insight that, fused with other state parameters, can provide basic data for obtaining overall situational awareness [40].
2.4. Intrusion detection systems using Artificial Intelligence.

Artificial Intelligence is currently in a period of boom and growth thanks to new trends in development and research, and to hardware advances that support the processing and storage needed to run AI algorithms [27]. This creates the need to identify and classify its integration with other types of current software development, based on the following points [26]:

• Point of application: the "when" and "where" of the integration of AI into the system. Three levels can be distinguished: process level, product level and execution-time level. At the process level, AI is used in the software development process and does not necessarily affect the source code that will be deployed. At the product level, AI directly affects the source code, as in an intelligent system that can edit code to repair errors and development bugs. The execution level covers AI applications that affect the software at run time, such as self-adaptive systems with active feedback during their operation.
• Level of automation: the degree of freedom of action given to the AI used in the system. At low levels of automation, the AI is limited to collecting relevant data and sending it to a person, who interprets the information and decides on actions. The more freedom of action the AI has, the less human intervention is needed: at higher levels, the AI can suggest courses of action to the human operator or, ultimately, take decisions and perform actions directly, informing the person of the actions taken and their results only if it deems it necessary.
• Technology used: AI is a field in constant change and evolution, so it is difficult to reach a consensus on its main technologies, but, generalizing, five types can be defined: algorithms that use inverse deduction, backpropagation, genetic programming, probabilistic inference and kernel machines.
Intrusion Detection Systems (IDS) are systems dedicated to computer and communication system security. Generally speaking, these systems follow a basic architecture based on four types of functional blocks [34]:

• E blocks: Event blocks, composed of sensory elements that monitor the system to be protected, acquiring information about the events that occur for later analysis in other blocks.
• D blocks: Database blocks, which store the information coming from the E blocks.
• A blocks: Analysis blocks, the processing modules that analyze events to detect possible hostile behavior towards the protected system; when a possible hostile event is detected, an alarm is generated so that action can be taken in response.
• R blocks: Response blocks, which execute a response to the threat when a hostile event has been detected.
Figure 2 shows how the different blocks of an IDS can interact with each other. One or several E blocks obtain data from the monitored environment; these blocks can pass their data directly to A blocks and/or store it in a D block. The D block stores the data and later delivers it to the blocks that need it. There may be several A blocks in charge of processing different types of parameters; likewise, these blocks can be cascaded, representing several processing stages. The R block decides which responses to execute based on the results of the analyses carried out by the A blocks and on the data stored by the D block.

Figure 2. Functional blocks of an intrusion detection system.
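As a hedged illustration of how the four block types can be wired together, the sketch below models E, D, A and R blocks as plain Python objects and callables. All names, the event format and the toy rule in the A block are hypothetical; a real IDS would replace the rule with a detector such as the ones evaluated later in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleIDS:
    """Toy IDS wiring: E-block events -> D block (storage) -> A blocks -> R block."""
    analyzers: list                                   # A blocks: event -> bool (hostile?)
    storage: list = field(default_factory=list)       # D block: stored events
    responses: list = field(default_factory=list)     # R block: log of actions taken

    def ingest(self, event):
        """Entry point for E-block output: store it (D) and analyze it (A)."""
        self.storage.append(event)
        if any(analyze(event) for analyze in self.analyzers):
            self.respond(event)

    def respond(self, event):
        """R block: this sketch only records an alert for the offending source."""
        self.responses.append(f"alert: hostile event from {event['src']}")


def syn_rate_rule(event):
    """Hypothetical A block: too many SYN packets per second from one source."""
    return event["syn_per_s"] > 1000


ids = SimpleIDS(analyzers=[syn_rate_rule])
events = [{"src": "10.0.0.5", "syn_per_s": 3}, {"src": "10.0.0.9", "syn_per_s": 5000}]
for ev in events:
    ids.ingest(ev)   # each dict stands in for one E-block observation
print(ids.responses)  # ['alert: hostile event from 10.0.0.9']
```

Keeping the A blocks as a list makes it easy to add further analyzers side by side; cascading them, as described above, would mean feeding one analyzer's output into the next.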

IDSs can be classified by the type of analysis performed by the A blocks: they can be signature-based or anomaly-based. Signature-based systems look for defined patterns in the analyzed data; because they rely on previously known attacks, they cannot detect a "zero-day attack" for which there is no previous record. Anomaly-based systems, by contrast, define the "normal" functioning of the protected system and generate an alarm whenever an event is considered anomalous or outside the general behavior of the system; in this way they are able to detect zero-day attacks [34].
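The difference between the two approaches can be illustrated with a toy example. The sketch below contrasts a signature matcher, which only knows a fixed set of previously seen patterns, with a simple anomaly detector that learns the mean and standard deviation of one feature from attack-free traffic and flags values more than k standard deviations away. The feature (packets per second), the signature set and k = 3 are hypothetical choices for illustration; this z-score rule is neither HBOS nor OCSVM.

```python
import statistics

# Hypothetical signature database: (protocol, type, size) of known attacks.
KNOWN_SIGNATURES = {("ICMP", "echo-request", 65500)}


def signature_detect(packet):
    """Flag only packets that exactly match a known signature."""
    return (packet["proto"], packet["type"], packet["size"]) in KNOWN_SIGNATURES


class ZScoreDetector:
    """Learn 'normal' from attack-free samples; flag values far from the mean."""

    def __init__(self, normal_values, k=3.0):
        self.mean = statistics.mean(normal_values)
        self.std = statistics.stdev(normal_values)
        self.k = k

    def is_anomalous(self, value):
        return abs(value - self.mean) > self.k * self.std


normal_rates = [48, 52, 50, 47, 53, 51, 49, 50]   # packets/s in normal operation
detector = ZScoreDetector(normal_rates)            # mean 50, std 2 -> band of +/- 6

# A previously unseen ("zero-day") flood: no signature, but a large rate spike.
new_attack = {"proto": "TCP", "type": "SYN", "size": 60, "rate": 4000}
print(signature_detect(new_attack))                # False: no matching signature
print(detector.is_anomalous(new_attack["rate"]))   # True: far outside normal range
print(detector.is_anomalous(51))                   # False: within the normal band
```

The anomaly detector needs no prior knowledge of the attack, only a model of normal behavior, which is why this family can catch zero-day attacks, at the cost of false alarms whenever legitimate traffic drifts outside the learned profile.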
Anomaly-based Network Intrusion Detection Systems (A-NIDS) are security systems that have been successfully integrated with AI at the execution level, since the module in charge of processing the data works together with the rest of the system. The methods currently implemented are still far from perfect, so there are several approaches and directions in which improvements can be made to obtain more efficient detection methods [4].

In network-level anomaly detection, it must be taken into account that network traffic is extremely complex and that new applications with their own traffic patterns emerge every day, so researchers in the field have had considerable difficulty building a model that captures network normality in a general way. Network anomaly detection systems are therefore currently trained for, and specialized in, specific systems [37].

2.5. Anomaly detection algorithms.


Anomaly detection algorithms can be classified as supervised, unsupervised and semi-supervised. Supervised algorithms normally start from a training data set that contains both normal records and anomalies, with the normal data greatly outnumbering the anomalies. Unsupervised algorithms do not use training data to make their predictions; the general idea is to use the intrinsic characteristics of the data to perform the classification. Semi-supervised algorithms, also called one-class classifiers, use training data that contains only normal records; the algorithm is expected to learn the patterns of normal data in order to perform the classification. Anomaly detection algorithms can also be classified by their practical requirements. In some cases, fast algorithms are needed, with response times that could be considered "real time", at the cost of some overall performance. In other cases, a longer analysis time is an acceptable price for not letting an anomaly pass undetected. Similarly, it is important to consider when the anomalies need to be detected: some applications require real-time detection, while in others post-incident detection may be sufficient [2].

This paper focuses on the OCSVM and HBOS algorithms. They were selected for their versatility and flexibility, and HBOS specifically for its speed and low consumption of computational resources, which make it a good option for a real-time detection system [2]. OCSVM was selected to contrast the results of HBOS, since support vector machines have been used successfully, with favorable results, for anomaly detection at the network layer and in various other situations with varied data; they are therefore a good reference for how an anomaly detection algorithm should perform [11].

The OCSVM (One-Class Support Vector Machine) algorithm is a kernel machine built on a geometric model, which is solved as a constrained optimization problem. The general idea is to find a hyperplane that divides the feature space, separating a single class from the rest of the data [14]. It is commonly used for semi-supervised anomaly detection: the algorithm is trained with anomaly-free (normal) data and can then identify the anomalies in the rest of the data. It can also be used in an unsupervised way, trained with all the available data, including anomalies, after which each instance is scored by its normalized distance to the decision boundary [2].
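For reference, the standard one-class SVM formulation (due to Schölkopf and colleagues) can be written as below; we assume, without the text stating it, that the implementation used here follows it. Here φ is the feature map induced by the kernel, n the number of training instances, and ν ∈ (0, 1] an upper bound on the fraction of training instances treated as outliers. The hyperplane separates the mapped training data from the origin with margin ρ/‖w‖, and f(x) = −1 marks an anomaly. With the "rbf" kernel, gamma controls how fast similarity decays with distance.

```latex
\min_{w,\,\xi,\,\rho}\; \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad
\langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sgn}\bigl(\langle w, \phi(x)\rangle - \rho\bigr),
\qquad
k(x, x') = \langle \phi(x), \phi(x')\rangle = \exp\bigl(-\gamma\,\lVert x - x'\rVert^{2}\bigr)
```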

The HBOS (Histogram-Based Outlier Score) algorithm is known to be simple and fast. It is based on statistical models for anomaly detection and assumes independence between features. The main idea is to build a univariate histogram for each feature of the data set; for each instance, the inverse heights of the bins it falls into across all features, which represent density estimates, are multiplied. The histograms are normalized to a maximum height of 1, which ensures equal weight for each feature. Assuming independence between features gives the algorithm a very high processing speed compared with other types of anomaly detection algorithms [2]. HBOS is an unsupervised anomaly detection algorithm, so it does not need a training stage: it calculates an anomaly score directly from the histograms of the data set. Performance tests have shown it to be fast, especially on large data sets, and quite accurate on global anomalies, but it fails on local anomalies, which cannot be captured by histograms [16].
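The scoring step described above can be sketched in plain Python. As in the original HBOS formulation, the product of inverse bin heights is computed as a sum of logarithms, HBOS(p) = Σ_i log(1 / hist_i(p)), which ranks instances identically but avoids numerical underflow. This is a simplified sketch with static equal-width bins; it is not PyOD's implementation, and the `n_bins` argument only mirrors the parameter name discussed in Section 4.

```python
import math


def hbos_scores(X, n_bins=10):
    """Histogram-Based Outlier Score with static equal-width bins (simplified sketch).

    X: list of instances, each a list of numeric features.
    Returns one score per instance; higher means more anomalous.
    """
    n, d = len(X), len(X[0])
    scores = [0.0] * n
    for j in range(d):                       # one univariate histogram per feature
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins if hi > lo else 1.0   # constant feature: one bin
        # Bin index of each value; the maximum value is put in the last bin.
        idx = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in idx:
            counts[b] += 1
        peak = max(counts)
        heights = [c / peak for c in counts]  # tallest bin has height 1
        for i, b in enumerate(idx):
            # Every instance lies in a non-empty bin, so heights[b] > 0.
            scores[i] += -math.log(heights[b])   # log(1 / height)
    return scores


# Nine "normal" packets (length, TTL) and one with an unusual length.
X = [[100, 64], [100, 64], [100, 64], [104, 64], [104, 64],
     [104, 64], [98, 63], [98, 63], [98, 63], [1500, 64]]
scores = hbos_scores(X, n_bins=5)
print(max(range(len(X)), key=scores.__getitem__))  # 9: the outlier scores highest
```

Each feature is processed independently in a single pass, which is where the speed comes from; the same independence assumption is why an instance that is unusual only in the combination of its features (a local anomaly) can go unnoticed.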

3. Materials and Methods


To test the DoS attack detection module, the Python programming language was used with the open source "Anaconda" distribution on a computer running Windows 10 [20]. The algorithms were implemented using "PyOD", a library dedicated to anomaly detection that provides a group of specialized tools for implementing anomaly detection algorithms on varied data [21].

The architecture used as the test environment for data capture is shown in Figure 3. It consists of a global cloud, a local cloud and several IoT devices and sensors connected to the local cloud. The main idea is to have a controlled environment in which the data captures needed to evaluate the algorithms can be made.

Figure 3. Test environment implemented.

The global cloud was a computer running Linux, responsible for receiving all data from the local clouds. A public cloud was not used in the test environment because public clouds have their own defense systems, which could interfere with the simulated attacks against the system. The local cloud was implemented on a "Raspberry Pi 2 Model B" running Linux, with the open source software "Eclipse Mosquitto" installed. Mosquitto is an MQTT broker for IoT devices, used to manage the connections and messages exchanged between the IoT devices and the network. The data collected by the local cloud is sent over the Wi-Fi network to the global cloud and handled there with the open source Python library "Paho-mqtt". The IoT environment was simulated by means of a temperature sensor connected to an "Arduino" device and a "Raspberry Pi 2 Model B" generating simulated IoT data over MQTT. Figure 4 shows the "DHT22" temperature sensor and the Arduino device used in the test environment.
Figure 4. Temperature sensor and Arduino device.
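The simulated devices publish periodic sensor readings to the broker. As a hedged illustration, the sketch below builds MQTT-style messages (a topic plus a JSON payload) for simulated DHT22 readings using only the standard library. The topic layout `home/<room>/temperature`, the payload fields and the noise model are hypothetical assumptions, not the configuration used in the tests. With a real broker, each (topic, payload) pair would be passed to an MQTT client's publish call, for example Paho-mqtt's `Client.publish(topic, payload)`.

```python
import json
import random


def simulate_dht22_messages(room, n, seed=0, base_temp=21.5):
    """Yield (topic, payload) pairs imitating periodic DHT22 readings.

    Hypothetical topic layout and payload format, for illustration only.
    """
    rng = random.Random(seed)              # seeded so the stream is reproducible
    topic = f"home/{room}/temperature"
    for i in range(n):
        # DHT22 readings have 0.1-unit resolution; add small Gaussian noise.
        temp = round(base_temp + rng.gauss(0.0, 0.3), 1)
        humidity = round(45.0 + rng.gauss(0.0, 2.0), 1)
        payload = json.dumps({"seq": i, "temperature_c": temp, "humidity_pct": humidity})
        yield topic, payload


msgs = list(simulate_dht22_messages("livingroom", 3))
for topic, payload in msgs:
    print(topic, payload)
```

The sequence number in each payload makes it easy to spot lost or delayed messages on the receiving side, which becomes relevant when the network is under a DoS attack.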

To simulate the traffic of a denial of service attack, the open source software "HyenaeFE" was used. This is a flexible and advanced network packet generator that lets users generate various types of DoS and DDoS attacks, with configurable parameters such as packet type (TCP, UDP, ICMP), number of packets sent, packet size and delays [19]. Packets were monitored and captured with "Wireshark", an open source network protocol analyzer. This tool allows in-depth analysis of network packets, data capture and saving of the data for later analysis, which makes it well suited to capturing the data set and to the subsequent tests with anomaly detection algorithms [32].

When choosing the data sets, it must be borne in mind that designing an effective Machine Learning-based intrusion detection system (IDS) requires an equally effective evaluation data set [19]. Data sets are composed of several parameters or characteristics, known as "features", which should be chosen according to their impact on the environment they characterize; current research suggests that the number of parameters should be kept as small as possible [18].

Two data sets were used to evaluate the algorithms. The first, the "KDD-Cup99" data set, is commonly used for unsupervised anomaly detection; it was artificially created from simulations of normal and attack traffic at the network level under the IP protocol in a computer network environment, and it is generally used to test intrusion detection systems. The data set has 42 parameters and 620,098 records, of which 1,052 are anomalies representing various types of attacks, including DoS attacks. It was obtained from the Harvard University website [23]. The second data set was obtained from the data captured in the implemented test environment. Its parameters were chosen from data that can be obtained directly from the network frames coming from the local clouds: since the system is designed to work in real time, any unnecessary processing of the data consumes extra time and resources. The resulting data set has 12 parameters, described in Table 1. The parameters corresponding to the protocol used were omitted, since they have little impact on the detection of DoS attacks: the attacking packets that should be classified as anomalies use the same network protocols as legitimate packets, so it is difficult to detect anomalous patterns with those parameters [18].
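As a quick worked check of the class imbalance in the KDD-Cup99 version described above (620,098 records, 1,052 anomalies), the anomaly fraction is about 0.17%, or roughly 588 normal records per anomaly. Such a skewed ratio is one reason a threshold-free measure such as the ROC AUC is useful here; it is also the kind of value that PyOD's `contamination` parameter (the expected outlier fraction used to turn scores into binary labels) is meant to express.

```python
total_records = 620_098
anomalies = 1_052

normal = total_records - anomalies
ratio = anomalies / total_records          # fraction of anomalous records
per_anomaly = normal / anomalies           # normal records per anomaly

print(f"normal records: {normal}")                   # normal records: 619046
print(f"anomaly fraction: {ratio:.4%}")              # anomaly fraction: 0.1697%
print(f"normal per anomaly: {per_anomaly:.1f}")      # normal per anomaly: 588.4
```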


Table 1. Parameters used for the captured dataset

Name                       Description
Packet length              Parameter with great impact on the detection of DoS attacks: the attacking packets have similar sizes.
Time to live (TTL)         IP header field that is decremented by each node (router) the packet passes through on its way from source to destination. It may show distinctive patterns for the forged IP addresses from which the attacks are generated.
TCP packet header flags    Represent the status of the received packet; their values may be distinctive for some DoS attacks. In total, there are 10 sub-parameters within the TCP flags field.
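The 12 parameters of Table 1 can be read as packet length, TTL and 10 flag indicators. The text does not list the 10 flag sub-parameters, so the sketch below assumes they correspond to the 12-bit TCP flags field as Wireshark dissects it: the 3 reserved bits (treated as one field) plus NS/AE, CWR, ECE, URG, ACK, PSH, RST, SYN and FIN, which gives exactly 10. The function name and the example packet values are hypothetical.

```python
# Bit masks for the 12-bit TCP flags field (3 reserved bits + 9 flag bits).
FLAG_MASKS = {
    "res": 0xE00,   # 3 reserved bits, treated as a single field
    "ns":  0x100,   # nonce sum (accurate ECN in newer specifications)
    "cwr": 0x080,
    "ece": 0x040,
    "urg": 0x020,
    "ack": 0x010,
    "psh": 0x008,
    "rst": 0x004,
    "syn": 0x002,
    "fin": 0x001,
}


def packet_features(length, ttl, tcp_flags):
    """Build a hypothetical 12-value feature vector: length, TTL, 10 flag indicators."""
    flags = [1 if tcp_flags & mask else 0 for mask in FLAG_MASKS.values()]
    return [length, ttl] + flags


syn_only = packet_features(60, 64, 0x002)   # SYN-only packet, as sent in a SYN flood
psh_ack = packet_features(120, 64, 0x018)   # PSH+ACK segment carrying application data
print(syn_only)  # [60, 64, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(psh_ack)   # [120, 64, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
```

All values come straight from the packet headers, with no per-flow state, which matches the stated goal of avoiding unnecessary processing in a real-time detector.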

4. Evaluation and results


The tests were performed in two stages: the first evaluated the algorithms on the KDD-Cup99 data set and the second on the data set captured from the test environment. The tests were organized this way because the KDD-Cup99 data set has already been used with the HBOS and OCSVM algorithms to obtain performance metrics [2], which makes it a good reference for becoming familiar with the algorithms, checking that they work properly and tuning their hyper-parameters. The ROC curve and the area under it (AUC) were used to visualize the evaluation metrics.
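The AUC summarizes the ROC curve as the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal instance. As an illustration, not the PyOD evaluation function used in the tests, the sketch below computes it with the Mann-Whitney rank formulation, giving tied scores half credit; the labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """AUC = P(score of random anomaly > score of random normal); ties count 1/2.

    labels: 1 for anomaly, 0 for normal. Higher score = more anomalous.
    Runs in O(n log n) using average ranks (Mann-Whitney U statistic).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores and give them their average (1-based) rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2   # number of correctly ordered pairs
    return u / (n_pos * n_neg)


labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
print(roc_auc(labels, scores))  # 0.875: 7 of the 8 anomaly/normal pairs are ordered correctly
```

An AUC of 1 means every anomaly outscores every normal instance, and 0.5 corresponds to random ranking; because it only depends on the ordering of scores, it does not require choosing a detection threshold.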

4.1. Stage 1 - Evaluation with the KDD-Cup99 data set


The HBOS algorithm was evaluated first. Its most relevant parameter is the number of bins used to build the histograms on which it works; with n_bins = 15, an AUC of 0.993 was obtained, which matches the value reported in [2]. The resulting ROC curve is shown in Figure 5, where the area under the curve is very close to one.

Figure 5. ROC - HBOS - KDD-Cup99 chart.

Before OCSVM can be evaluated on the KDD-Cup99 data set, it must go through a training phase using instances of the same data set that belong only to the normal class. Once trained, the algorithm is evaluated; its most important parameter is gamma, a coefficient of the algorithm's kernel. The kernel itself is also a parameter; in this case, the "rbf" kernel was chosen. With the algorithm's default gamma value, an AUC of 0.9528 was obtained. This value and the algorithm's computation time agree with the results reported in [2]. Figure 6 shows the resulting ROC curve, in which a step caused by the algorithm's misclassifications can be observed.

Figure 6. ROC - OCSVM - KDD-Cup99 chart.

The tests performed with this data set were useful to become familiar with the PyOD library, its algorithms and its evaluation functions, and to check that they work correctly. The results were verified against the research already carried out in [2], reaching the same conclusions.

4.2. Stage 2 - Evaluation with the captured data set


As with the KDD-Cup99 data set, HBOS was evaluated first. With n_bins = 28, an AUC of 0.881 was obtained, with a processing speed as fast as in the KDD-Cup99 evaluation. Figure 7 shows the resulting ROC curve, where the step caused by the algorithm's misclassifications is more pronounced.

Figure 7. ROC - HBOS - Captured Dataset chart.

For OCSVM, as with the previous data set, a training stage was first carried out using only normal instances of the captured data set. The evaluation was then carried out with gamma = 300, yielding an AUC of 0.919, a better result than that of HBOS. Figure 8 shows how a step is formed by errors in the algorithm's predictions.
Figure 8. ROC - OCSVM - Captured Dataset chart.
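The gamma parameter controls how quickly the RBF kernel similarity k(x, y) = exp(-gamma * ||x - y||^2) decays with distance. The sketch below, in plain Python with toy 12-dimensional vectors assumed to be scaled to [0, 1], compares gamma = 1/n_features, which we assume is what PyOD's default gamma="auto" resolves to, with the gamma = 300 used on the captured data set. With gamma = 300, two instances that differ by only 0.1 in a single feature already have a similarity of about 0.05, so the learned boundary becomes very local around the training instances.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


n_features = 12
x = [0.5] * n_features
y = [0.5] * (n_features - 1) + [0.6]     # differs by 0.1 in one feature

default_gamma = 1.0 / n_features          # assumed value of gamma="auto"
for gamma in (default_gamma, 300.0):
    print(f"gamma={gamma:.4f}  k(x, y)={rbf_kernel(x, y, gamma):.4f}")
# gamma=0.0833  k(x, y)=0.9992
# gamma=300.0000  k(x, y)=0.0498
```

A large gamma lets the model fit a tight region around the normal training data, which can sharpen detection, but it also makes the model more sensitive to the particular normal samples seen during training.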

5. Discussion
Based on the AUC values obtained in the tests, both algorithms, HBOS and OCSVM, achieve fairly good values that exceed 0.85. On the KDD-Cup99 data set, HBOS obtained an almost perfect score and was much faster than OCSVM. On the captured data, the opposite happened in terms of accuracy, since OCSVM obtained a better result in the predicted values, although HBOS was still much faster. This variation in AUC between the two data sets is attributed to the structure and parameters of each data set, since KDD-Cup99 has different parameters than the captured data set; the intrinsic properties of the parameters of the captured data set make OCSVM work better than HBOS there. Tests with other data sets whose parameters differ could likewise change the relative behavior of each algorithm.

Figure 9 shows the ROC curves of HBOS and OCSVM on the KDD-Cup99 data set. The area under the OCSVM curve is smaller than the area under the HBOS curve, which indicates that HBOS is more accurate on this data set.

Figure 9. ROC Curve - HBOS vs OCSVM - KDD-Cup99 Data.

Figure 10 shows the ROC curves of HBOS and OCSVM on the data set captured in the tests. In this case, the area under the curve is smaller for HBOS than for OCSVM, the opposite of the previous data set, where HBOS was more accurate.
Figure 10. ROC Curve - HBOS vs OCSVM - Captured Data.

Although OCSVM achieves an AUC about 0.04 higher on the captured data (0.919 versus 0.881), HBOS also achieves a fairly high value; combined with its processing speed, this makes it the best choice for a network anomaly detection system that works in real time, since the delay introduced by data analysis will be minimal. It must be taken into account that network traffic data arrives at a high rate, so processing speed is the main factor to consider. It should also be noted that denial of service attacks rely on sending an overwhelming number of packets to saturate the systems. With this in mind, a few misclassified packets do not matter much: detecting the vast majority of the attack's packet flow is enough to issue an alert. Table 2 shows the results obtained by the algorithms on the two data sets.

Table 2. Comparison of the results of the algorithms HBOS and OCSVM.

Dataset            Algorithm    Obtained ROC AUC
KDD-Cup99          HBOS         0.993
KDD-Cup99          OCSVM        0.9528
Captured Dataset   HBOS         0.881
Captured Dataset   OCSVM        0.919
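The argument that a few misclassified packets do not prevent detection of a flood can be made concrete with a window-level alert rule: raise an alert when the fraction of packets labeled anomalous within a window exceeds a threshold. The sketch below is hypothetical; the window of 100 packets and the 0.5 threshold are arbitrary illustration values, not parameters from the tests.

```python
def window_alerts(packet_labels, window=100, threshold=0.5):
    """Return the start indices of non-overlapping windows that raise an alert.

    packet_labels: per-packet detector output (1 = anomalous, 0 = normal).
    An alert is raised when the anomalous fraction in a window exceeds threshold.
    """
    alerts = []
    for start in range(0, len(packet_labels) - window + 1, window):
        chunk = packet_labels[start:start + window]
        if sum(chunk) / window > threshold:
            alerts.append(start)
    return alerts


# Normal traffic with occasional false positives (every 50th packet flagged).
normal = [1 if i % 50 == 0 else 0 for i in range(300)]
# Flood in which the detector misses every 5th attack packet (80% detected).
flood = [0 if i % 5 == 0 else 1 for i in range(300)]
print(window_alerts(normal + flood))  # [300, 400, 500]
```

Even with 20% of attack packets missed and 2% of normal packets wrongly flagged, only the flood windows trigger alerts, which illustrates why per-packet accuracy matters less than speed for volumetric attacks.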

6. Conclusions
This work implements a TCP-SYN denial of service attack detection module to protect an assisted living system; the module could also be applied in other similar scenarios. Cyber Situational Awareness is achieved by detecting abnormal network data patterns and behaviors. The proposed system offers a great capacity for reasoning about noisy data, such as network frames, which depend heavily on the applications used. The system also makes it possible to create and use workflows and processes for attack detection and to take actions accordingly. Likewise, the system makes it possible to stay one step ahead of the attackers' possible actions, since it can detect zero-day attacks. Finally, strategic planning has to go hand in hand with network anomaly detection systems, since a clear and defined plan is necessary to determine which network traffic will be considered "normal" in the future.

Another contribution of this research is the analysis of the properties and characteristics of the data captured in the simulated environment from various remote sensors in a Cloud IoT setting. This analysis was used to propose an experimental data set built solely from the network frames captured during normal traffic and during traffic under DoS attack.
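This section does not list the exact columns of the experimental data set, so the following Python sketch is hypothetical. It assumes the captured frames were exported to CSV with Wireshark's default columns (No., Time, Source, Destination, Protocol, Length, Info) and derives three illustrative features per frame: the inter-arrival time, the frame length and a flag for bare SYN segments. The addresses in the sample are invented; 1883 is the standard MQTT port.

```python
import csv
import io
from typing import List


def frames_to_features(csv_text: str) -> List[List[float]]:
    """Turn a Wireshark-style CSV export into numeric rows of
    [inter-arrival time in s, frame length in bytes, bare-SYN flag]."""
    rows: List[List[float]] = []
    prev_time = None
    for frame in csv.DictReader(io.StringIO(csv_text)):
        t = float(frame["Time"])
        delta = 0.0 if prev_time is None else t - prev_time
        prev_time = t
        # Wireshark's Info column shows "[SYN]" only for a bare SYN;
        # a SYN-ACK appears as "[SYN, ACK]" and is not counted.
        is_syn = 1.0 if "[SYN]" in frame["Info"] else 0.0
        rows.append([delta, float(frame["Length"]), is_syn])
    return rows


# Invented sample: a TCP handshake towards an MQTT broker on port 1883.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.1","TCP","74","51000 → 1883 [SYN] Seq=0 Win=64240 Len=0"
"2","0.000310","10.0.0.1","10.0.0.5","TCP","74","1883 → 51000 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0"
"3","0.000450","10.0.0.5","10.0.0.1","TCP","66","51000 → 1883 [ACK] Seq=1 Ack=1 Win=64256 Len=0"
'''
for row in frames_to_features(SAMPLE):
    print(row)
```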

The operation of an IoT Cloud network architecture was studied, and a test environment was implemented consisting of IoT devices, a local cloud acting as a data broker over the MQTT protocol, and a global cloud in which the data from the devices are processed. The technologies needed to implement this architecture and the tools needed to carry out a TCP-SYN denial of service attack were explored, keeping the environment controlled so that the network packets sent and received in the global cloud could be captured.
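In a TCP-SYN flood the attacker sends a stream of connection requests (segments with the SYN flag set and the ACK flag clear) and never completes the three-way handshake, so the target accumulates half-open connections. The Python sketch below shows how such segments can be recognized from the flag byte (offset 13) of a raw TCP header; `build_header` is a hypothetical helper that fabricates a minimal 20-byte header for the example and is not part of the tooling used in the tests.

```python
import struct

# Masks for the control flags in byte 13 of the TCP header (RFC 793).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def is_syn_only(tcp_header: bytes) -> bool:
    """True for a connection request: SYN set and ACK clear."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    flags = tcp_header[13]
    return bool(flags & SYN) and not (flags & ACK)


def build_header(src_port: int, dst_port: int, flags: int) -> bytes:
    """Hypothetical helper for this example: a minimal 20-byte TCP header
    (data offset 5 words, no options, zero sequence numbers and checksum)."""
    return struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0,
                       5 << 4, flags, 64240, 0, 0)


syn = build_header(51000, 1883, SYN)            # client opens a connection
syn_ack = build_header(1883, 51000, SYN | ACK)  # server answers
print(is_syn_only(syn), is_syn_only(syn_ack))   # True False
```

During a flood, one would expect the share of such segments that are never followed by the final ACK of the handshake to rise sharply, which is the kind of shift in the captured traffic that the anomaly detectors are meant to pick up.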

The unsupervised anomaly detection algorithms HBOS and OCSVM were implemented in Python using PyOD, a library dedicated to anomaly detection algorithms. The performance tests were divided into two stages, and the ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the Curve) metric were used as performance measures. The first stage used the KDD-Cup99 data set, which contains normal network traffic and traffic under various computer attacks. This stage served to become familiar with the algorithms, and its results were compared with those of previous studies, confirming that the implementations worked correctly. The second stage evaluated the captured data set, on which both algorithms also performed well.
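The study used PyOD's implementations (its `HBOS` and `OCSVM` classes). To show why HBOS is fast and needs no labels, the following stdlib-only Python sketch re-implements its core idea with static-width bins: one normalized histogram per feature, built in a single pass, and a score equal to the sum of log(1/h) over the features. It is a simplified illustration, not PyOD's code; the bin count, the floor for empty bins and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple


class SimpleHBOS:
    """Minimal Histogram-Based Outlier Score with static-width bins.

    One histogram per feature, normalized so that its tallest bin equals 1.
    The outlier score of a sample is sum_i log(1 / h_i(x_i)), so samples that
    fall into rare bins get high scores. Features are assumed independent.
    """

    def __init__(self, n_bins: int = 10, floor: float = 1e-3) -> None:
        self.n_bins = n_bins
        self.floor = floor  # assumed lower bound so empty bins give a finite score
        self.edges: List[Tuple[float, float]] = []  # (minimum, bin width) per feature
        self.heights: List[List[float]] = []        # normalized bin heights per feature

    def _bin(self, value: float, lo: float, width: float) -> int:
        # Clamp so that the maximum value and unseen out-of-range values stay in range.
        return min(max(int((value - lo) / width), 0), self.n_bins - 1)

    def fit(self, X: Sequence[Sequence[float]]) -> "SimpleHBOS":
        self.edges, self.heights = [], []
        for column in zip(*X):  # one pass per feature, no labels needed
            lo, hi = min(column), max(column)
            width = (hi - lo) / self.n_bins or 1.0  # constant feature: any width works
            counts = [0] * self.n_bins
            for value in column:
                counts[self._bin(value, lo, width)] += 1
            tallest = max(counts)
            self.edges.append((lo, width))
            self.heights.append([c / tallest for c in counts])
        return self

    def score(self, X: Sequence[Sequence[float]]) -> List[float]:
        scores = []
        for row in X:
            total = 0.0
            for value, (lo, width), heights in zip(row, self.edges, self.heights):
                h = heights[self._bin(value, lo, width)]
                total += math.log(1.0 / max(h, self.floor))
            scores.append(total)
        return scores


# Toy data (invented): 200 "normal" rows of [frame length, packets/s]
# followed by 5 anomalous rows with a burst of 400 packets/s.
data = [[60 + i % 5, 10 + i % 3] for i in range(200)] + [[60, 400]] * 5
model = SimpleHBOS(n_bins=5).fit(data)  # unsupervised: fit on the data to be scored
scores = model.score(data)
ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
print(sorted(ranked[:5]))  # [200, 201, 202, 203, 204]
```

Building the histograms and scoring are both single passes over the data, which is why HBOS scales linearly with the number of packets; the price is that it treats features as independent and cannot model interactions between them.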

It was concluded that HBOS is the best algorithm for implementing the DoS attack detection module. This decision was based on its good performance in the tests with the captured data and, above all, on the high speed at which it processes data; processing speed was judged the most important factor because the detection module must operate in real time. Although HBOS scored slightly lower than OCSVM on the captured data, its performance remains acceptable for detecting DoS attacks, since a few misclassified packets do not prevent the module from detecting an incoming attack and raising an alert.

Overall, the study showed that it is feasible to implement an anomaly detection module with an algorithm that requires neither prior training nor labeled data sets. It also showed that data captured directly from network frames, without further pre-processing, can be used to detect DoS attacks, and that the anomaly detection module can be implemented in the proposed IoT Cloud architecture.

Future work could aim to improve the performance of the HBOS algorithm by testing different combinations of parameters in the captured data sets. Since each parameter reflects specific aspects and situations of the network state, different combinations should be tested both to improve the detection of anomalous packets and to detect further types of attacks. The proposed test environment could also be extended to include the anomaly detection module in the global cloud, so that the performance of HBOS can be checked on data received from the network in real time. It would be interesting to carry out various types of attacks and observe how the algorithm behaves and how each attack changes the parameters of the network data. Likewise, a module should be investigated that receives the alert signals from the attack detection module and can act on the network. Finally, the detection module should be extended so that, besides detecting attacks, it identifies the type of attack under way, allowing different mitigation actions to be taken for each type.


Acknowledgments: This work has been made possible by the support of SECTEI (Subsecretaría de Ciencia, Tecnología e Innovación de la Ciudad de México) for the second author during his postdoctoral studies at the Universidad Complutense de Madrid.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. F. E. Grubbs, Procedures for Detecting Outlying Observations in Samples. Technometrics 1969.
2. M. Goldstein and S. Uchida, A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE 2016.
3. P. Garcia and J. Díaz, Anomaly-based network intrusion detection: Techniques, systems and challenges. Elsevier 2008.
4. A. Gurina and E. Vladimir, Anomaly-Based Method for Detecting Multiple Classes of Network Attacks. MDPI 2019.
5. Cisco - What Are the Most Common Cyberattacks? Available online: https://www.cisco.com/c/en/us/products/security/common-cyberattacks.html (accessed on 15 May 2019).
6. T. Erl and R. Puttini, Cloud Computing: Concepts, Technology, & Architecture. Prentice Hall 2013.
7. M. Zekri and S. El Kafhali, DDoS Attack Detection using Machine Learning Techniques in Cloud Computing Environments. IEEE 2017.
8. D. Evans, The Internet of Things - How the Next Evolution of the Internet is Changing Everything. White Paper. Cisco Internet Business Solutions Group 2011.
9. V. Chandola and A. Banerjee, Anomaly Detection: A Survey. ACM Computing Surveys 2009.
10. A. Biswas and R. Giaffreda, IoT and cloud convergence: Opportunities and challenges. IEEE World Forum on Internet of Things (WF-IoT) 2014.
11. M. Behniafar and A. Nowroozi, A Survey of Anomaly Detection Approaches in Internet of Things. ISeCure 2018.
12. Top IoT Vulnerabilities. Available online: https://www.owasp.org/index.php/Top_IoT_Vulnerabilities (accessed on 26 May 2019).
13. I. Butun and B. Kantarci, Anomaly Detection and Privacy Preservation in Cloud-Centric Internet of Things. IEEE 2015.
14. L. Bouchra and A. Gjini, Anomaly Detection Using Similarity-based One-Class SVM for Network Traffic Characterization. 2018.
15. State of the IoT 2018: Number of IoT devices now at 7B. Available online: https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/ (accessed on 29 May 2019).
16. M. Goldstein and A. Dengel, Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm. 2012.
17. T. Salman and D. Bhamare, Machine Learning for Anomaly Detection and Categorization in Multi-cloud Environments. IEEE CSCloud 2017.
18. I. Cvitic and D. Perakovic, Network Parameters Applicable in Detection of Infrastructure Level DDoS Attacks. TELFOR 2017.
19. M. Alkasassbeh and E. Hawari, Towards Generating Realistic SNMP-MIB Dataset for Network Anomaly Detection. 2016.
20. Anaconda Distribution - Home Page. Available online: https://www.anaconda.com/distribution/ (accessed on 06 Jun 2019).
21. Y. Zhao, Z. Nasrullah and Z. Li, PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of Machine Learning Research 2019.
22. MQTT Protocol. Available online: http://mqtt.org/ (accessed on 06 Aug 2019).
23. KDD-Cup99 Dataset. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF (accessed on 06 Aug 2019).
24. A. Stanford-Clark and H. L. Truong, MQTT For Sensor Networks (MQTT-SN) Protocol Specification. 2013.
25. J. Gubbi and R. Buyya, Internet of Things (IoT): A vision, architectural elements and future directions. 2013.
26. R. Feldt and F. de Oliveira, Ways of Applying Artificial Intelligence in Software Engineering. 2018.
27. S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. 2004.
28. P. Domingos, A few useful things to know about Machine Learning. 2012.
29. L. Atzori and A. Iera, The Internet of Things: A Survey. 2010.
30. P. Parwekar, From Internet of Things towards cloud of things. 2011.
31. A. Soro and M. Brereton, The Messaging Kettle: It’s ioTea time. 2015.
32. W. Shaoqiang and X. DongSheng, Analysis and Application of Wireshark in TCP/IP Protocol Teaching. 2010.
33. C. Kruegel and G. Vigna, Anomaly Detection of Web-based Attacks. 2003.
34. L. Zhang and G. White, Analysis of Payload Based Application Level Network Anomaly Detection. 2007.
35. W. Lu and I. Traore, A New Unsupervised Anomaly Detection Framework for Detecting Network Attacks in Real-Time. 2005.
36. J. Hadian and H. Gonzales, Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling. 2017.
37. L. Zhang and G. White, Anomaly Detection for Application Level Network Attacks Using Payload Keywords. 2017.
38. J. Herrera and L. Barona, A Survey on Situational Awareness of Ransomware Attacks - Detection and Prevention Parameters. Remote Sensing - MDPI. 2019.
39. P. Liu and S. Jajodia, Theory and Models for Cyber Situation Awareness. Springer. 2017.
40. U. Franke and J. Brynielsson, Cyber situational awareness – a systematic review of the literature. Computers & Security. 2014.

© 2019 by the authors. Submitted to Journal Not Specified for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
