
A Honey Net, Big Data and RNN Architecture for Automatic Security Monitoring of Information System

Alaeddine Boukhalfa, Nabil Hmina, and Habiba Chaoui

Ibn Tofail University, Kenitra, Morocco


{alaeddine.boukhalfa,mejhed90}@gmail.com,
hmina@univ-ibntofail.ac.ma

Abstract. Security monitoring of the information system is a major concern for
organizations. Attackers can harm or abuse system resources in many different ways,
and this variety of attacks raises the question of how to handle them. In addition,
attacks evolve and can become undetectable by existing security methods. To address
these problems, we propose in this paper an automatic security monitoring system for
the information system. It is based on exposing Honeypots and collecting attack data
from them, storing the variety of attacks using Big Data techniques, and processing
and analyzing them with a Recurrent Neural Network (RNN), a Deep Learning method, in
order to extract knowledge from these threats and to face unknown threats that
resemble them.

Keywords: NIDS · Security monitoring · Big data · Deep learning · RNN · Honeypot · Honey net

1 Introduction

Currently, with the rapid evolution of information technology and its wide application
in all areas of life, the detection and prevention of intrusions constitute a major
security challenge. The role of Network Intrusion Detection Systems (NIDS) is to inspect
and analyze the data traffic passing through the network in order to detect anomalies,
and to raise an alert or block communication between the communicating entities. This
analysis is based on predefined algorithms and rules that rely on signatures or
suspicious traffic behavior.
Moreover, attackers and intruders constantly look for new means to break through
security barriers and attack the system. Current NIDS analysis and detection methods are
inefficient against them, because they do not evolve to automatically detect and reveal
these new, unidentified kinds of attacks. This motivates smart new methods of analysis
and recognition that can adapt to the changing aspects of threats.
In addition, attacks can take different forms, such as SQL injection or probing attacks.
This variety pushes us to find a solution that treats them all in a uniform way.

© Springer Nature Switzerland AG 2019


M. Ezziyyani (Ed.): AI2SD 2018, AISC 915, pp. 800–808, 2019.
https://doi.org/10.1007/978-3-030-11928-7_72

To solve these problems, we propose in this paper an architecture for a local
organization network that evolves to detect intrusions automatically. This new
approach relies on attracting attackers and gathering their attacks using Honeypots,
storing the variety of attacks in a Big Data storage system, and applying the
Recurrent Neural Network (RNN), a Deep Learning method, in order to recognize attacks
and exploit this knowledge to stop new threats.
The paper is structured as follows. We provide an overview of related work in
Sect. 2. In Sect. 3 we present our proposed architecture. Sect. 4 is reserved for the
conclusion and future work.

2 Related Work

The idea of a NIDS based on Deep Learning was discussed in [1]. The authors applied
Self-Taught Learning (STL), a Deep Learning approach, to a network traffic data set
called NSL KDD [2], which contains normal records and attack records. They compared its
attack recognition performance with that of an older classification method, Soft-Max
Regression (SMR), and showed experimentally that STL recognizes attacks better. This
solution is a good approach for distinguishing normal traffic from suspicious traffic,
but the implementation of the NIDS in the real world is not illustrated.
Almost the same work was done in [3]. KDD CUP 99 [4], an older version of NSL KDD that
also gathers traffic data, was used to benchmark the precision of traffic detection
between a Support Vector Machine method based on Restricted Boltzmann Machines
(SVM-RBMs) [5] and classic classification methods. The researchers did not justify their
choice of Deep Learning method. The study concluded that SVM-RBMs can better identify
the origin of threats and take less time to process a large amount of data.
In [6], a new approach to detecting code injection attacks was established. A new
hybrid Deep Learning model called Hybrid Deep Learning Network (HDLN) was built. The
injection attacks considered are attached to JavaScript code. The authors used the
Abstract Syntax Tree (AST) to identify more features and employed three methods to
select key features. HDLN was evaluated, first, with respect to the number of hidden
layers, the number of filters and the number of neurons; the results showed that the
accuracy rises as the number of filters increases. Second, it was evaluated against
traditional classifiers using IG feature vectors, and its accuracy was greater than that
of all the other classifiers. Finally, the authors compared the precision of the model
with an earlier machine learning work by the same team and showed that the new model is
more efficient than the previous one. This is a valuable effort, except that it focuses
only on injection attacks related to JavaScript code and not on all kinds of attacks.
Paper [7] also proposes a hybrid Deep Learning model, which combines an Auto-Encoder
and a Deep Belief Network (DBN). The Auto-Encoder was employed to reduce the
dimensionality of the data and identify its principal features, whereas the DBN had the
role of discovering the malicious code. The new model was applied to KDD CUP 99 [4] and
assessed against the DBN alone; the results indicated that the new model is better in
terms of accuracy and consumes less time than the DBN. The authors did not specify their
motivation for combining the DBN and the Auto-Encoder to form the presented Deep
Learning model.
A distributed way of detecting attacks in the Internet of Things (IoT) was adopted in
[8]. The authors applied Deep Learning on each node of fog-to-things networks; the
purpose was to let each node identify attacks locally and share parameters with its
neighbors, in order to accelerate identification and optimize the update of parameters.
The system was evaluated with NSL KDD [2], and the Big Data management system Spark was
used to implement the fog nodes. The results demonstrated that the distributed design is
more effective than a centralized design and than Shallow Learning, which is a machine
learning method. The proposal is very interesting and a significant enhancement of
detection, except that the researchers did not state exactly which method of the Deep
Learning family was adopted.
Network security monitoring in an environment where Big Data transits was discussed in
[9]. The manuscript first cites the reasons why network security monitoring is needed:
prediction issues; security devices and mechanisms that are not suitable for large-data
environments; abnormal alarms that must be detected quickly, with fair diagnostics of
alert information; and correlation algorithms that are employed only for anomaly
identification and not for the full set of network security devices and tools. Second,
the authors illustrate a network security monitoring system based on the accumulation of
Big Data, its integration, which consists of cleaning and classification, and its
analysis to extract information and present it in order to make decisions. Finally, they
give an overview of some correlation algorithms used for analyzing data. The paper
proposed a schema of a security monitoring system, but it did not consider improving
correlation methods to keep up effectively with the evolution of attacks.
Another approach to security monitoring of Internet of Things (IoT) networks was
discussed in [10]. A large variety of security logs was gathered from consumer
electronic devices and stored in parallel in the nodes of the Big Data management system
Hadoop. A normalization operation was applied to obtain a unique data format. The
analysis was performed by applying aggregation and correlation methods, adopting Complex
Event Processing (CEP), which aims to detect and analyze the information contained in
events and then take action in real time. The results were visualized using advanced
visualization tools. The article is founded solely on displaying and reporting data,
without taking action against attacks.
Our work differs from the existing works: it is a concrete representation of a security
monitoring system for the organization network, it treats any type of attack, and it is
evolutionary so that it can take decisions against new threats.

3 System Architecture
3.1 Components
The proposed architecture for security monitoring of the information system is described
in Fig. 1. It is based on establishing mechanisms for the collection and analysis of
attacks, in order to infer knowledge from them and to cope with other attacks that
resemble them. It is composed of an organization network separated by two firewalls. We
describe the components one by one:

Fig. 1. Architecture of automatic security monitoring of the information system based on Honey
Net, Big Data and RNN

Demilitarized Zone
The Demilitarized Zone (DMZ) is a subnet isolated from both the local network and the
internet. It contains:
Honey Net
Honeypots are devices dedicated to attracting attackers and recording information about
their attacks. Thus, any attempt to interact with a Honeypot server is evidence of
unwanted activity.
The interaction level of a Honeypot defines how far the intruder can go in attacking
the system. Low interaction Honeypots minimize the level of interaction of the provided
services; however, attackers can easily discover their nature. Medium interaction
Honeypots have a reduced set of services with some ability to hide from attackers.
With a high interaction level, Honeypots expose themselves as attractive real machines
with complete operating systems.
Research Honeypots are intended for scientific research and are used only to record
intrusion attempts, contrary to production Honeypots, which protect the system and take
action when there is a threat [11].
We opt for a collection of high interaction research Honeypots, which constitutes a
Honey Net, to gather as much data as possible. We do not use production Honeypots
because an action taken against threats may make the hackers believe that the machine is
monitored, whereas we want them to keep attacking and using more means, which will help
us gather more data. Therefore, our Honeypots will play the role of network machines
with security flaws, intended to be attacked from the internet.
Firewall 1
Because Honeypots are attractive, we have to think about protecting our network.
Firewall 1 is configured to block traffic coming from the internet to the local network
to ensure its security, and to allow only traffic from the internet to the DMZ. Thus, we
allow intruders to attack the Honeypots from the internet.
Firewall 2
Firewall 2 will stop all traffic to prevent intruders from infiltrating via the
Honeypots, and allow only traffic coming from specific ports dedicated to loading data
from the Honeypots to our production machines.
To obtain more security, we try to get the two firewalls from two different vendors
because they will have different security bugs; this will enhance the security of our
local network [12].
Thus, to compromise our local network, two obstacles must be overcome.
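The two filtering policies described above can be summarized by the following minimal sketch. It is only an illustration: the zone names, the port number 9000 and the function names are hypothetical, and real firewalls would be configured with vendor-specific rules rather than Python code.

```python
# Hypothetical zone labels for the three network areas of Fig. 1.
INTERNET, DMZ, INTRANET = "internet", "dmz", "intranet"

# Assumption: a single port is reserved for loading Honeypot data.
LOADING_PORTS = frozenset({9000})


def firewall1_allows(src_zone, dst_zone):
    """Firewall 1: traffic from the internet may reach the DMZ only."""
    return src_zone == INTERNET and dst_zone == DMZ


def firewall2_allows(src_zone, dst_zone, port):
    """Firewall 2: only the data loading channel from the DMZ may pass."""
    return src_zone == DMZ and dst_zone == INTRANET and port in LOADING_PORTS
```

With these rules, an intruder can reach the Honeypots through Firewall 1, but a compromised Honeypot can reach the intranet only through the loading port.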
Intranet
It is our organization's network, which groups the production machines, in addition to:
Data storage server.
Given the variety and mass of data that can be collected from our Honeypots, we plan to
set up a Big Data management system for storing and analyzing this data. Two of the most
famous open source Big Data frameworks are:
• Hadoop is a framework dedicated to the storage and processing of Big Data in a
distributed way. It is based mainly on two parts: one for storage, called Hadoop
Distributed File System (HDFS), and one for processing, called MapReduce. To manage the
cluster, Hadoop uses a master/slave architecture [13].
• Spark is a framework also designed for Big Data management. It does not have its own
distributed file storage; it relies on storage systems such as that of Hadoop (HDFS) or
others, but it is considered faster than Hadoop for iterative processing [14].
We will test them both during our future experiments and choose the most efficient and
suitable one for our architecture.
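To make the MapReduce processing model mentioned above more concrete, the following sketch counts attack records per attack type in plain Python. It imitates the map, shuffle and reduce phases in a single process and does not use the Hadoop or Spark APIs; the record field "type" and all function names are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (attack_type, 1) pair for a Honeypot record."""
    yield record["type"], 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one attack type."""
    return key, sum(values)


def count_attacks(records):
    """Run the three phases in sequence and return {attack_type: count}."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions would run in parallel on different nodes, which is what makes the model suitable for large volumes of attack data.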
Archiving server of processed data
Because of the amount of data that can be processed over time, this server is also a
Big Data management system. It is dedicated to archiving the attack data after its
treatment by the RNN.

3.2 Treatments
After exposing the set of Honeypots, the treatment operations are as follows:
Collecting of attacks
In this phase, we use an Extract Transform Load (ETL) tool to load the attack data
recorded by the Honeypots to the Big Data storage server; we will choose the most
efficient and adequate tool during our research.
We describe the loading algorithm in detail in Table 1 and the following program:
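LoadData (StartDate)
{
    EndDate = Null;
    If (StartDate == Null)
    {
        StartDate = SysDate;    // first load: start from the current system date
    }
    While (DataDate >= StartDate)
    {
        LoadDataToServer;       // load the records dated on or after StartDate
    }
    EndDate = SysDate;          // kept as the StartDate of the next loading
    Return EndDate;
}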

With this algorithm, any data recorded by the Honeypot during the current loading will
be loaded during the next loading.
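One possible reading of this loading algorithm is sketched below in Python. It is not the ETL tool itself: the record field "date", the in-memory list used as storage and the optional now parameter are assumptions made for illustration. The sketch also fixes the end date before scanning the records, which is our interpretation of how records written during a load are deferred to the next one.

```python
from datetime import datetime


def load_data(honeypot_records, storage, start_date=None, now=None):
    """Copy the records of the current loading window into storage.

    Returns the end date, to be passed as start_date on the next call.
    """
    # Assumption: the end date is fixed before scanning, so a record written
    # while this load is running falls into the next window.
    end_date = now if now is not None else datetime.now()
    if start_date is None:
        # First load: per Table 1, start from the current system date.
        start_date = end_date
    for record in honeypot_records:
        # Half-open window [start_date, end_date): no record is loaded twice.
        if start_date <= record["date"] < end_date:
            storage.append(record)
    return end_date
```

Each call returns the date that the next call should receive as its start date, so successive loads cover consecutive windows without overlap.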
Elimination of redundancy
After loading the data into the Big Data storage server, a redundancy elimination step
is necessary: we will delete duplicate data with a NoSQL query to improve the
performance of the subsequent processing.
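The idea of this step can be illustrated in plain Python, used here as a stand-in for the NoSQL query, which depends on the storage system that will be chosen. The sketch assumes that records are dictionaries and that two records are duplicates when all their fields are equal; the function names are hypothetical.

```python
import hashlib
import json


def record_key(record):
    """Return a stable fingerprint of a record (field order is ignored)."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def drop_duplicates(records):
    """Keep the first occurrence of each distinct record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Fingerprinting each record keeps the memory cost proportional to the number of distinct records rather than to their size.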

Table 1. Description of the variables of the loading algorithm

Variable | Description
StartDate | Start date of loading data to the Big Data storage server. During the first load, it is initialized with the current date of the operating system on which the ETL that performs the loading is installed
SysDate | Current date of the operating system of the machine on which the ETL that performs the loading is installed
DataDate | Date at which the attack was recorded by the Honeypot
LoadDataToServer | Operation that loads data from the Honeypot to the Big Data storage server
EndDate | End date of loading data, noted by the operating system on which the ETL that performs the loading is installed. It is returned and kept at the end of loading in order to be the start date of the next data loading

Learning by Deep Learning
Although Big Data constitutes a processing challenge related to volume, variety,
velocity, and veracity, Deep Learning has emerged as a way to extract knowledge and
perform predictive analytics from it, and it is a promising research field [15]. It is a
set of machine learning techniques inspired by the deep structure of the human brain
[3]. Several Deep Learning models have appeared recently. The original and typically
used ones are the Stacked Auto-Encoder (SAE), Deep Belief Network (DBN), Convolutional
Neural Network (CNN) and Recurrent Neural Network (RNN); the others are hybrids based on
these [16].
Unlike other Deep Learning models, the RNN is characterized by its memory. As depicted
in Fig. 2, it includes an Input Layer, a Hidden Layer and an Output Layer. To make a
decision at time step t, it takes as input, in addition to the current input, the
information about the decision at time step t − 1 stored in the Hidden Layer. Moreover,
it is designed to draw knowledge from sequences of data such as sentences, word
sequences and speech [6]. This justifies our choice of model, in order to keep a memory
of the old attacks.
So the RNN will be applied to all the loaded data to learn about the different
attacks.
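As an illustration only, the recurrence described above can be sketched in plain Python as a simple Elman-style step, in which the hidden state at time step t is computed from the current input and the hidden state at time step t − 1. The tanh activation, the weight names W_xh, W_hh and b_h, and the function names are assumptions made for this sketch; no training procedure and no Output Layer are shown.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """Compute h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) for one time step."""
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_sequence(xs, W_xh, W_hh, b_h):
    """Feed a sequence of input vectors and return the final hidden state."""
    h = [0.0] * len(b_h)  # the memory starts empty
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
    return h
```

Because the hidden state is fed back at every step, the final state depends on earlier inputs as well as on the last one, which is the memory property that motivates the choice of the RNN.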
Archiving of processed attacks
At the end of each analysis by the RNN, we will archive all processed data in another
Big Data storage server so as not to lose our data, and we will empty the server
dedicated to processing, to prepare a blank server for future loading of new data and
treatment by the RNN.
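The order of these two operations matters: the data must be copied to the archive before the processing server is emptied. A minimal sketch, in which two in-memory lists stand in for the two Big Data servers and the function name is hypothetical, could look as follows.

```python
def archive_and_reset(processing_store, archive_store):
    """Copy all processed records to the archive, then empty the processing store.

    Returns the number of records that were archived.
    """
    # Archive first, so that nothing is lost if the second step fails.
    archive_store.extend(processing_store)
    moved = len(processing_store)
    # Only then clear the processing store for the next loading.
    processing_store.clear()
    return moved
```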
With this particular design of the system, we have illustrated a new way of monitoring
security, which can progress automatically, by learning, to discover new threats.

[Figure: Recurrent Neural Network with Input Layer, Hidden Layer and Output Layer]

Fig. 2. Design of layers of RNN

4 Conclusion and Future Work

We have presented in this paper an automatic security monitoring system for the
information system, designed to recognize old attacks and cope with new ones that
resemble them. Our architecture is based on a collection of Honeypots to collect
threats, a Big Data storage server to store the volume and variety of attack data, and
the application of the RNN method to learn about threats.
In the future, we will implement our work in a real-life environment and, at the same
time, try to exploit our RNN in a NIDS while keeping our architecture running to allow
it to learn more about attacks. Moreover, we will try to schedule the loading,
processing and archiving of data in order to achieve automatic real-time security
monitoring.

References
1. Niyaz, Q., Sun, W., Javaid, A., Alam, M.: A deep learning approach for network intrusion
detection system. In: Proceedings of the 9th EAI International Conference on Bio-inspired
Information and Communications Technologies (BICT’15). pp. 21–26. ACM, United States
(2015). https://doi.org/10.4108/eai.3-12-2015.2262516
2. NSL KDD. https://github.com/defcom17/NSL_KDD
3. Dong, B., Wang, X.: Comparison Deep learning method to traditional methods using for
network intrusion detection. In: 2016 8th IEEE International Conference on Communication
Software and Networks, pp 581. https://doi.org/10.1109/iccsn.2016.7586590
4. KDD Cup 99. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
5. Yang, J., Deng, J., Li, S., Hao, Y.: Improved traffic detection with support vector machine
based on restricted Boltzmann machine. Soft Comput. 21(11), 3101–3112 (2017).
https://doi.org/10.1007/s00500-015-1994-9
6. Yan, R., Xiao, X., Hu, G., Peng, S., Jiang, Y.: New deep learning method to detect code
injection attacks on hybrid applications. J. Syst. Softw. 137, 1–27 (2018).
https://doi.org/10.1016/j.jss.2017.11.001
7. Li, Y., Ma, R., Jiao, R.: A hybrid malicious code detection method based on deep learning.
Int. J. Secur. Appl. (IJSIA) 9(5), 205–216 (2015). https://doi.org/10.14257/ijsia.2015.9.5.21
8. Abeshu Diro, A., Chilamkurti, N.: Distributed attack detection scheme using deep learning
approach for Internet of Things. Int. J. Future Gener. Comput. Syst. (FGCS) 82, 761–768
(2018). https://doi.org/10.1016/j.future.2017.08.043
9. Lan, L., Jun, L.: Some special issues of network security monitoring on big data
environments. In: 2013 IEEE 11th International Conference on Dependable, Autonomic and
Secure Computing, pp. 10–15. (2013) https://doi.org/10.1109/dasc.2013.30
10. Saenko, I., Kotenko, I., Kushnerevich, A.: Parallel processing of big heterogeneous data for
security monitoring of IoT networks. In: 2017 25th Euromicro International Conference on
Parallel, Distributed and Networks-Based Processing, pp 329–336 (2017).
https://doi.org/10.1109/pdp.2017.45
11. Campbell, M.R., Padayachee, K., Masombuka, T.: A survey of Honeypot research: trends
and opportunities. In: The 10th International Conference for Internet Technology and
Secured Transactions (ICITST-2015), pp. 208–210 (2015).
https://doi.org/10.1109/icitst.2015.7412090
12. Designing a DMZ.: SANS Institute 2003.
https://www.sans.org/reading-room/whitepapers/firewalls/designing-dmz-950
13. Saraladevi, B., Pazhaniraja, N., Victer Paul, P., Saleem Basha, M.S., Dhavachelvan, P.: Big
data and Hadoop—a study in security perspective. In: 2nd International Symposium on Big
Data and Cloud Computing (ISBCC’15), p. 598 (2015).
https://doi.org/10.1016/j.procs.2015.04.091
14. Gu, L., Li, H.: Memory or time: performance evaluation for iterative operation on hadoop
and spark. In: 2013 IEEE International Conference on High Performance Computing and
Communications & 2013 IEEE International Conference on Embedded and Ubiquitous
Computing, pp. 721–722 (2013). https://doi.org/10.1109/hpcc.and.euc.2013.106
15. Chen, X., Lin, X.: Big data deep learning challenges and perspectives. IEEE Access 2, 514
(2014). https://doi.org/10.1109/access.2014.2325029
16. Zhang, Q., Yang, L.T., Chen, Z., Li, P.: A survey on deep learning for big data. Inform.
Fusion 42, 147 (2017). https://doi.org/10.1016/j.inffus.2017.10.006
