1 Introduction
Currently, with the rapid evolution of information technology and its wide application in all domains of life, the detection and prevention of intrusions constitute a major security challenge. The role of Network Intrusion Detection Systems (NIDS) is to inspect and analyze the data traffic passing through the network in order to detect anomalies, and then to raise an alert or block communication between the communicating entities. This analysis is based on predefined algorithms and rules that rely on signatures or on suspicious traffic behavior.
Moreover, attackers and intruders constantly look for new means and opportunities to break through security barriers and attack the system. Current NIDS analysis and detection methods are inefficient because they do not evolve to automatically detect and reveal new, previously unidentified forms of attack. This calls for smart new methods of analysis and recognition that can adapt to the changing nature of threats.
In addition, attacks can take different forms, such as SQL injection or probing attacks; this variety pushes us to find a solution that treats them all in a unified way.
2 Related Work
The idea of a NIDS based on Deep Learning was discussed in [1]: the authors applied Self-Taught Learning (STL), a Deep Learning approach, to a network traffic dataset called NSL KDD [2], which contains normal records and attack records. They compared its attack recognition performance with that of an older classification method, Soft-Max Regression (SMR), and showed experimentally that STL recognizes attacks better. This solution is a good approach to distinguishing normal traffic from suspicious traffic, but no real-world implementation of the NIDS is described.
Almost the same work was done in [3]: KDD CUP 99 [4], an older version of NSL KDD that also gathers traffic data, was used to benchmark the traffic detection precision of a Support Vector Machine based on Restricted Boltzmann Machines (SVM-RBMs) [5] against classic classification methods. The researchers did not justify their choice of Deep Learning method. The study clearly concluded that SVM-RBMs identify the origin of threats better and take noticeably less time to process large amounts of data.
In [6], a new approach to detecting code injection attacks was established. A new hybrid Deep Learning model, the Hybrid Deep Learning Network (HDLN), was built, targeting injection attacks attached to JavaScript code. The authors used the Abstract Syntax Tree (AST) to identify more features and employed three methods to select the key features. HDLN was evaluated, first, with respect to the number of hidden layers, the number of filters and the number of neurons; the results showed that accuracy increases with the number of filters. Second, it was evaluated against other traditional classifiers using IG feature vectors, and its accuracy was higher than that of all the other classifiers. Finally, the authors compared the precision of the model with a machine learning approach previously developed by the team and showed that the new model is more efficient. The work is valuable, except that it focuses only on injection attacks related to JavaScript code rather than on all kinds of attacks.
Paper [7] also proposes a hybrid Deep Learning model, which combines an Auto-Encoder and a DBN. The Auto-Encoder was employed to reduce the dimensionality of the data and identify its principal features, whereas the DBN was responsible for detecting the malicious code. The new model was applied to KDD CUP 99 [4] and assessed against the DBN alone; the results indicated that the new
802 A. Boukhalfa et al.
Deep Learning model performs best in terms of accuracy and takes less time than the DBN. The authors did not explain their motivation for combining a DBN and an Auto-Encoder to form the presented Deep Learning model.
A distributed approach to detecting attacks in the Internet of Things (IoT) was adopted in [8]: the authors applied Deep Learning on each node of fog-to-things networks, so that each node identifies attacks locally and independently while sharing parameters with its neighbors, in order to accelerate identification and optimize parameter updates. The system's security was evaluated on NSL KDD [2], and the Big Data management system Spark was used to implement the fog nodes. The results demonstrated that the distributed design is more effective than both a centralized design and Shallow Learning, a machine learning method. The proposal is very interesting and is an advanced and significant enhancement of detection, except that the researchers did not specify exactly which method of the Deep Learning family was adopted.
Network security monitoring in an environment where Big Data transits was discussed in [9]. The paper first cites the reasons why network security monitoring is needed: prediction issues; security devices and mechanisms are not suited to environments with large amounts of data; abnormal alarms must be detected quickly and an equitable diagnosis of alert information must be made; and correlation algorithms are employed only for anomaly identification rather than across the full set of network security devices and tools. Second, the authors illustrate a network security monitoring system based on the accumulation of Big Data, its integration (cleaning and classification), and its analysis to extract information and present it for decision-making. Finally, they give an overview of some correlation algorithms used for analyzing data. The paper proposes a schema for a security monitoring system, but it does not consider improving correlation methods to keep pace with the evolution of attacks.
Another approach to security monitoring of Internet of Things (IoT) networks was discussed in [10]: a large and varied set of security logs was gathered from consumer electronic devices and then stored in parallel on the nodes of the Big Data management system Hadoop. Normalization was applied to obtain a single data format, and the analysis was performed using aggregation and correlation methods based on Complex Event Processing (CEP), which aims to detect and analyze the information contained in events and then take action in real time. The results were visualized using advanced visualization tools. The article's approach is based solely on displaying and reporting data, without taking action against attacks.
Our work differs from these existing works: it is a concrete design of a security monitoring system for an organization's network, it treats any type of attack, and it will evolve to take decisions against new threats.
A Honey Net, Big Data and RNN Architecture … 803
3 System Architecture
3.1 Components
The proposed architecture for security monitoring of the information system is described in Fig. 1. It is based on mechanisms for collecting and analyzing attacks, inferring knowledge from them, and coping with other attacks that share some points of resemblance. It is composed of an organization network separated by two firewalls. We describe the components one by one:
Fig. 1. Architecture of automatic security monitoring of the information system based on Honey
Net, Big Data and RNN
Demilitarized Zone
The Demilitarized Zone (DMZ) is an isolated subnet located between the local network and the internet. It contains:
Honey Net
Honeypots are devices dedicated to attracting attackers and recording information about their attacks. Therefore, any attempt to interact with a Honeypot server is evidence of unwanted activity.
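The principle that any contact with a Honeypot is suspicious can be illustrated with a minimal Python sketch: a TCP listener that accepts connections and records the source address and the first bytes sent. It is closer to a low-interaction Honeypot than to the high-interaction machines chosen below, and the function `run_honeypot` and its log fields are hypothetical; the demonstration binds to the loopback interface on a port chosen by the operating system.

```python
import socket
import threading
from datetime import datetime, timezone


def run_honeypot(server_sock, log, max_connections=1):
    """Accept connections on a listening socket and log every attempt.

    No legitimate client should ever contact the honeypot, so each
    connection is recorded as unwanted activity.
    """
    for _ in range(max_connections):
        conn, (src_ip, src_port) = server_sock.accept()
        with conn:
            conn.settimeout(2.0)
            try:
                payload = conn.recv(1024)  # first bytes sent by the intruder
            except socket.timeout:
                payload = b""
            log.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "src_ip": src_ip,
                "src_port": src_port,
                "payload": payload,
            })


# Demonstration on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
events = []
worker = threading.Thread(target=run_honeypot, args=(server, events))
worker.start()

# Simulated intruder probing the honeypot.
with socket.create_connection(server.getsockname()) as client:
    client.sendall(b"GET /admin HTTP/1.0\r\n\r\n")

worker.join()
server.close()
```

A real high-interaction Honeypot exposes a complete operating system instead of a single socket, but the logged record (time, source, payload) is the kind of attack data that later flows to the Big Data storage server.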
The interaction level of a Honeypot defines the extent to which an intruder can interact with the system. Low-interaction Honeypots minimize the level of interaction of the provided services; however, attackers can easily reveal their nature. Medium-interaction Honeypots offer a reduced set of services with some ability to hide from attackers. With a high interaction level, Honeypots present themselves as attractive real machines with complete operating systems.
Research Honeypots are intended for scientific research and are used only to record intrusion attempts, contrary to production Honeypots, which protect the system and take action when facing a threat [11].
We opt for a collection of high-interaction research Honeypots, forming a Honey Net, to gather as much data as possible. We do not use production Honeypots because an action taken against threats may make the hackers believe that the machine is monitored; we want them to keep attacking and using more means, which will help us gather more data. Therefore, our Honeypots will play the role of network machines with security flaws intended to be attacked from the internet.
Firewall 1
Because Honeypots attract attackers, we must also protect our own network. Firewall 1 is configured to block traffic coming from the internet to the local network, to ensure its security, and to allow only traffic from the internet to the DMZ. Thus, intruders are allowed to attack the Honeypots from the internet.
Firewall 2
Firewall 2 blocks all traffic, to prevent intruders from infiltrating the local network via the Honeypots, and allows only traffic on specific ports dedicated to loading data from the Honeypots into our production machines.
For additional security, we plan to obtain the two firewalls from two different vendors, because they will have different security bugs; this will enhance the security of our local network [12].
Thus, an intruder attempting to compromise our local network must overcome two obstacles.
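The two firewall policies can be modelled as ordered rule lists with a default-deny fallback. The Python sketch below is illustrative only: the zone names, the `Rule` class, the `decide` function and the port number `DATA_LOAD_PORT` are hypothetical, since the text does not name the dedicated loading ports, and real firewalls are configured with their vendors' own rule languages. It also assumes the Honeypots push data to the intranet on a single destination port.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical port dedicated to loading Honeypot data into the intranet.
DATA_LOAD_PORT = 9000


@dataclass(frozen=True)
class Rule:
    src_zone: str
    dst_zone: str
    action: str                     # "allow" or "deny"
    dst_port: Optional[int] = None  # None matches any destination port

    def matches(self, src, dst, port):
        return (self.src_zone == src and self.dst_zone == dst
                and (self.dst_port is None or self.dst_port == port))


def decide(rules, src, dst, port):
    """Return the action of the first matching rule; deny anything unmatched."""
    for rule in rules:
        if rule.matches(src, dst, port):
            return rule.action
    return "deny"


# Firewall 1: at the internet edge, in front of the DMZ and the intranet.
FIREWALL_1 = [
    Rule("internet", "dmz", "allow"),       # let intruders reach the Honeypots
    Rule("internet", "intranet", "deny"),   # protect the local network
]

# Firewall 2: between the DMZ and the intranet.
FIREWALL_2 = [
    Rule("dmz", "intranet", "allow", DATA_LOAD_PORT),  # data loading only
    Rule("dmz", "intranet", "deny"),                   # everything else
]
```

A first-match evaluator whose final fallback is "deny" mirrors the "block everything except" wording used for both firewalls; traffic the text does not discuss is simply denied in this sketch.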
Intranet
It is our organization's network, which groups the production machines, in addition to:
Data storage server.
Given the variety and volume of data that can be collected from our Honeypots, we plan to set up a Big Data management system for storing and analyzing this data. Two of the best-known open source Big Data frameworks are:
• Hadoop is a framework dedicated to the distributed storage and processing of Big Data. It is based mainly on two parts: one for storage, called the Hadoop Distributed File System (HDFS), and one for processing, called MapReduce. To manage the cluster, Hadoop uses a master/slave architecture [13].
• Spark is also a framework designed for Big Data management. It does not have its own distributed file system; it relies on storage systems such as Hadoop's (HDFS) or others, but it is considered faster than Hadoop for iterative processing [14].
We will test both during our future experiments and choose the most efficient and suitable one for our architecture.
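To make the MapReduce processing model concrete, the following single-machine Python sketch counts recorded attacks per source IP address with explicit map, shuffle and reduce steps. It only mimics the programming model and does not use Hadoop's Java API; the record fields and the per-IP count are hypothetical examples, and the IP addresses come from documentation ranges.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (source IP, 1) pair per attack record."""
    yield record["src_ip"], 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts emitted for one source IP."""
    return key, sum(values)


def map_reduce(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


records = [
    {"src_ip": "203.0.113.5", "service": "ssh"},
    {"src_ip": "198.51.100.7", "service": "http"},
    {"src_ip": "203.0.113.5", "service": "http"},
]
counts = map_reduce(records)
```

On a Hadoop cluster the map and reduce calls would run in parallel on the nodes holding the HDFS blocks; in Spark, the same aggregation could be written with the RDD `reduceByKey` transformation.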
Archiving server of processed data
Because of the amount of data that can be processed over time, this server is a Big Data management system dedicated to archiving the attack data after processing by the RNN.
3.2 Treatments
After exposing the set of Honeypots to the internet, the processing operations are as follows:
Collection of attacks
In this phase, we use an Extract Transform Load (ETL) tool to load the attack data recorded by the Honeypots into the Big Data storage server; we will choose the most efficient and suitable tool during our research.
The loading algorithm is described in detail in Table 1 and in the following program:
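LoadData (StartDate)
{
    // Fix the end of the loading window before loading starts
    EndDate = SysDate;
    If (StartDate == null)
    {
        StartDate = SysDate;
    }
    For each Record recorded by the Honeypots
    {
        If (Record.DataDate >= StartDate And Record.DataDate < EndDate)
        {
            LoadDataToServer(Record);
        }
    }
    // The returned EndDate is the StartDate of the next loading
    Return EndDate;
}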
With this algorithm, data recorded by the Honeypots during the current loading will be loaded during the next loading.
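A minimal Python sketch of this incremental loading, assuming the Honeypot records are available as an in-memory list of dictionaries with a hypothetical `recorded_at` timestamp, and that the list `storage` stands in for the Big Data storage server. The function `load_data` and its injectable clock `now` are illustrative names, not part of any ETL tool. The sketch captures `end_date` before the loop starts, which is what guarantees that records written during a loading are left for the next one.

```python
from datetime import datetime, timedelta


def load_data(honeypot_records, storage, start_date=None, now=datetime.now):
    """Copy records with start_date <= recorded_at < end_date into storage.

    end_date is fixed before the loop starts, so records written while the
    load is running fall outside the window; the caller passes the returned
    end_date as start_date of the next call, which picks them up.
    """
    end_date = now()
    if start_date is None:
        start_date = end_date  # first run: start from the current time
    for record in honeypot_records:
        if start_date <= record["recorded_at"] < end_date:
            storage.append(record)
    return end_date


# Demonstration with a fixed clock instead of the system time.
t0 = datetime(2019, 1, 1, 12, 0)
records = [
    {"recorded_at": t0 - timedelta(minutes=5), "event": "ssh login attempt"},
    {"recorded_at": t0 + timedelta(minutes=1), "event": "port scan"},
]
storage = []
next_start = load_data(records, storage,
                       start_date=t0 - timedelta(hours=1), now=lambda: t0)
# The port scan is stamped after end_date (t0), so only the next load takes it.
load_data(records, storage, start_date=next_start,
          now=lambda: t0 + timedelta(minutes=10))
```

Using a half-open window `[start_date, end_date)` ensures that a record stamped exactly at the boundary is loaded once, by the later run, and never twice.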
Elimination of redundancy
After loading the data into the Big Data storage server, eliminating redundancy is necessary: we will delete duplicate data with a NoSQL query to improve the performance of subsequent processing.
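The NoSQL store is not specified yet, so the following Python sketch only shows the underlying deduplication logic, assuming records are dictionaries: each record is fingerprinted on the fields that define a duplicate (the field list is a hypothetical choice) and only the first occurrence of each fingerprint is kept. In a NoSQL store, the same fingerprint could serve as the document key so that duplicates are rejected at write time.

```python
import hashlib
import json


def record_key(record, fields=("src_ip", "dst_port", "payload", "recorded_at")):
    """Build a stable fingerprint from the fields that define a duplicate."""
    subset = {f: record.get(f) for f in fields}
    canonical = json.dumps(subset, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def remove_duplicates(records):
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


events = [
    {"src_ip": "203.0.113.5", "dst_port": 22, "payload": "root:123456",
     "recorded_at": "2019-01-01T12:00:00"},
    {"src_ip": "203.0.113.5", "dst_port": 22, "payload": "root:123456",
     "recorded_at": "2019-01-01T12:00:00"},
    {"src_ip": "198.51.100.7", "dst_port": 80, "payload": "GET /",
     "recorded_at": "2019-01-01T12:00:03"},
]
unique_events = remove_duplicates(events)
```

Serializing the selected fields with sorted keys makes the fingerprint independent of dictionary ordering, so two records with the same content always hash to the same key.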
References
1. Niyaz, Q., Sun, W., Javaid, A., Alam, M.: A deep learning approach for network intrusion detection system. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (BICT'15), pp. 21–26. ACM, United States (2015). https://doi.org/10.4108/eai.3-12-2015.2262516
2. NSL KDD. https://github.com/defcom17/NSL_KDD
3. Dong, B., Wang, X.: Comparison deep learning method to traditional methods using for network intrusion detection. In: 2016 8th IEEE International Conference on Communication Software and Networks, p. 581 (2016). https://doi.org/10.1109/iccsn.2016.7586590
4. KDD Cup 99. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
5. Yang, J., Deng, J., Li, S., Hao, Y.: Improved traffic detection with support vector machine based on restricted Boltzmann machine. Soft Comput. 21(11), 3101–3112 (2017). https://doi.org/10.1007/s00500-015-1994-9
6. Yan, R., Xiao, X., Hu, G., Peng, S., Jiang, Y.: New deep learning method to detect code injection attacks on hybrid applications. J. Syst. Softw. 137, 1–27 (2018). https://doi.org/10.1016/j.jss.2017.11.001
7. Li, Y., Ma, R., Jiao, R.: A hybrid malicious code detection method based on deep learning. Int. J. Secur. Appl. (IJSIA) 9(5), 205–216 (2015). https://doi.org/10.14257/ijsia.2015.9.5.21
8. Abeshu Diro, A., Chilamkurti, N.: Distributed attack detection scheme using deep learning
approach for Internet of Things. Int. J. Future Gener. Comput. Syst. (FGCS) 82, 761–768
(2018). https://doi.org/10.1016/j.future.2017.08.043
9. Lan, L., Jun, L.: Some special issues of network security monitoring on big data environments. In: 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, pp. 10–15 (2013). https://doi.org/10.1109/dasc.2013.30
10. Saenko, I., Kotenko, I., Kushnerevich, A.: Parallel processing of big heterogeneous data for security monitoring of IoT networks. In: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 329–336 (2017). https://doi.org/10.1109/pdp.2017.45
11. Campbell, M.R., Padayachee, K., Masombuka, T.: A survey of Honeypot research: trends and opportunities. In: The 10th International Conference for Internet Technology and Secured Transactions (ICITST-2015), pp. 208–210 (2015). https://doi.org/10.1109/icitst.2015.7412090
12. Designing a DMZ. SANS Institute (2003). https://www.sans.org/reading-room/whitepapers/firewalls/designing-dmz-950
13. Saraladevi, B., Pazhaniraja, N., Victer Paul, P., Saleem Basha, M.S., Dhavachelvan, P.: Big
data and Hadoop—a study in security perspective. In: 2nd International Symposium on Big
Data and Cloud Computing (ISBCC’15), p. 598 (2015). https://doi.org/10.1016/j.procs.
2015.04.091
14. Gu, L., Li, H.: Memory or time: performance evaluation for iterative operation on hadoop
and spark. In: 2013 IEEE International Conference on High Performance Computing and
Communications & 2013 IEEE International Conference on Embedded and Ubiquitous
Computing, pp. 721–722 (2013). https://doi.org/10.1109/hpcc.and.euc.2013.106
15. Chen, X., Lin, X.: Big data deep learning challenges and perspectives. IEEE Access 2, 514
(2014). https://doi.org/10.1109/access.2014.2325029
16. Zhang, Q., Yang, L.T., Chen, Z., Li, P.: A survey on deep learning for big data. Inform.
Fusion 42, 147 (2017). https://doi.org/10.1016/j.inffus.2017.10.006