Cognitive Memory-Guided AutoEncoder For Effective Intrusion Detection in IoT

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2021.3102637, IEEE
Transactions on Industrial Informatics
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. X, NO. X, 2021 1
Huimin Lu, Tian Wang, Xing Xu, and Ting Wang
Abstract—With the development of Internet of Things (IoT) technology, intrusion detection has become a key technology that protects IoT devices from network intrusion. Artificial intelligence techniques have been widely used for intrusion detection in previous methods. However, unknown attacks may emerge as networks evolve, and attack samples are difficult to collect, resulting in unbalanced sample categories. In this case, previous intrusion detection methods suffer from high false positive rates and low detection accuracy, which restricts their application in real situations. In this paper, we propose a novel method based on deep neural networks to tackle the intrusion detection task, termed Cognitive Memory-guided AutoEncoder (CMAE). The CMAE method leverages a memory module to enhance the ability to store normal feature patterns while inheriting the advantages of the autoencoder (AE). Therefore, it is robust to imbalanced samples. Besides, using the reconstruction error as the evaluation criterion allows unknown attacks to be detected effectively. To obtain superior intrusion detection performance, we propose a feature reconstruction loss and a feature sparsity loss to constrain the proposed memory module, promoting the discriminability of the memory items and the ability to represent normal data. Compared with previous state-of-the-art methods, sufficient experimental results reveal that the proposed CMAE method achieves excellent performance and effectiveness for intrusion detection.

[Figure 1 image not recoverable. Diagram labels, upper part: preprocess dataset; create and init model; input sample to model; reconstruct sample; calculate reconstruction errors; back-propagation to optimize model. Lower part: Internet; normal access; attack access; Intrusion Detection System (IDS); preprocess traffic data; reconstruct traffic data by model; calculate reconstruction errors; classify data by threshold; access allowed; access denied.]
Fig. 1. The upper part shows a block diagram of the proposed research, while the lower part is the application of the proposed method in network intrusion detection.
Index Terms—Internet of Things, Intrusion Detection, Deep Neural Networks, Cognitive Memory, AutoEncoder

This work was supported in part by the National Natural Science Foundation of China (61976049, 61906086) and the Sichuan Science and Technology Program, China (2019ZDZX0008). (Corresponding author: Xing Xu)
H. Lu is with the Department of Mechanical and Control Engineering, Kyushu Institute of Technology, Kita-Kyushu 819-0395, Japan (e-mail: dr.huimin.lu@ieee.org).
T. Wang and X. Xu are with the Center for Future Multimedia and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610051, China (e-mail: wangtianguge@gmail.com and xing.xu@uestc.edu.cn).
T. Wang is with Nanjing Tech University, College of Electrical Engineering and Control Science, Nanjing 211816, China (e-mail: wangting0310@njtech.edu.cn).

I. INTRODUCTION

Cyber security has become a very important field with the rise of Internet of Things (IoT) technology in recent years. With the rapid development of 5G and cloud computing, IoT technology has been integrated into daily life to provide convenience for humans [1], for example in smart home equipment and autonomous cars. The main task of IoT equipment is to collect data and exchange information. Besides, we can control and communicate with it via the Internet. Due to a lack of cyber security awareness, IoT devices are likely to be attacked by network hackers, causing economic losses. For example, Yahoo suffered the theft of user account data in 2014, and several organizations around the world were attacked by the WannaCry worm in 2017 [2]. According to the threat report published by McAfee in August 2019, the current top ten attack methods are account hijacking, vulnerabilities, malware, unauthorized access, code injection, defacement, theft, denial of service (DoS), targeted attack, and unknown [3]. Besides, the unique security goals and challenges of the Industrial Internet of Things are discussed in [4]. Therefore, how to design an efficient detection system that protects IoT devices from network intrusion has attracted the attention of many researchers.

In recent years, many innovative works in the field of intrusion detection have been proposed. Generally, most research treats intrusion detection as a classification problem with labeled samples. Traditional machine learning algorithms, such as K-nearest neighbors (KNN) [5], support vector machine (SVM) [6], and random forest (RF) [7], were widely adopted in early intrusion detection work. These traditional methods require manual feature selection, which makes it difficult to extend them to large-scale, high-dimensional intrusion detection data. Li et al. [8] propose to learn hierarchical features from data collected in IoT. Deep learning has shown superior performance in different fields, such as
image recognition, autonomous driving, and natural language processing. For example, the method in [9] takes advantage of convolutional neural networks (CNN) to solve the problem of intrusion detection. Other deep neural networks, e.g., the recurrent neural network [10] and the auto-encoder [11], have also been adopted in existing methods. Moreover, the existing methods based on deep neural networks (DNN) usually use labeled attack data for training and classify the categories of intrusion.

However, the current methods have the following flaws when they are applied to intrusion detection in IoT. First, collecting attack data from the Internet is difficult, leading to imbalanced categories in current datasets. This problem causes current models to have a high false-positive rate when few attack samples are available [12], [13]. Second, with the evolution of intrusion technology, many new attack methods will appear in real network environments to hijack IoT devices. Traditional classification-based methods cannot effectively solve this problem, which makes them difficult to apply in real situations [14], [15]. In this paper, we split network access in IoT into two categories: the attack class and the normal class. We propose to design a model that fits the normal pattern well, so that an attack deviates from the pattern represented by the model and can therefore be detected. We regard intrusion detection as an unsupervised problem and train the autoencoder using only normal data. Since the model learns the pattern of normal data, it is not affected by the imbalance of the attack category samples. The previous method [15] also used an autoencoder to design a zero-day attack intrusion detection system, which utilizes the reconstruction error to detect intrusion. However, the diversity of normal patterns makes it hard for a plain autoencoder to solve the network intrusion problem. In addition, the method suffers from high false-positive rates.

Therefore, we propose an innovative model termed Cognitive Memory-guided AutoEncoder (CMAE) for intrusion detection in IoT. Fig. 1 shows the training block diagram of the proposed CMAE model and the potential application of the model in network intrusion detection. The upper part of Fig. 1 describes the training procedure of the proposed model, which includes 6 main steps. In particular, because the training set contains discrete features and missing values, we need to preprocess the data. Then, we input the data to the model for training by gradient descent. The lower part of Fig. 1 shows the application of the proposed model, which can help IoT devices resist network attacks. We can intercept a network request, preprocess it, input it into the proposed model to obtain an anomaly score, and filter the request by setting a threshold.

Moreover, Fig. 2 shows the overall framework of the proposed model, which consists of an encoder, a memory module, and a decoder. We first preprocess the structured data from the training set and then input it into the proposed model. After obtaining a latent space feature representation by encoding the input data, we leverage the memory module to store and locate the latent space feature representations of normal data. Then, the decoder uses the feature retrieved by the memory module to reconstruct the original input data. To record the diversity of normal patterns, we propose to apply a feature reconstruction loss and a feature sparsity loss to train the proposed memory module. The feature reconstruction loss emphasizes the ability of the memory module to represent and locate normal features. Besides, it requires the latent space feature and the memory representation feature to be close. The feature sparsity loss makes the memory items discriminative, so that they record the diversity of normal patterns. Extensive experiments conducted on the KDDCUP [16] dataset demonstrate that the proposed CMAE model is effective and robust for intrusion detection in IoT.

In summary, the main contributions of this work are as follows:
• We propose a novel model dubbed Cognitive Memory-guided AutoEncoder (CMAE) for intrusion detection, which introduces a memory module to store and locate the latent space feature representations of normal data. The proposed memory module makes the reconstruction difference of attack samples higher than that of normal samples.
• We propose a feature reconstruction loss and a feature sparsity loss to constrain the memory module, improving its representation ability and the diversity of the memory items. Moreover, the proposed loss function reduces the memory consumption of the memory module and accelerates the convergence of the proposed model.
• The proposed model achieves excellent results on the KDDCUP dataset for intrusion detection. An extensive experimental analysis, including ablation studies, is also presented in this paper.

II. RELATED WORK

A. Intrusion Detection

Intrusion detection is a widely used technology to detect potential intrusions and suspicious activities in IoT. According to their characteristics, the main methods are Anomaly-based Intrusion Detection Systems (AIDS) and Signature-based Intrusion Detection Systems (SIDS). A SIDS issues an intrusion signal when monitored behavior matches a previously defined intrusion pattern, which is collected by analyzing and summarizing past intrusion data. Kruegel et al. [17] proposed to apply machine learning clustering techniques to improve the matching process of SIDS. In contrast, an AIDS learns the patterns of normal behavior, which are easily collected, to detect attack behavior that deviates from the normal patterns. Many innovative AIDS methods [9], [12] have also been proposed to resist network intrusion attacks. With the rise of machine learning technology, various supervised learning methods have been proposed to train a classifier on pre-labeled data. However, traditional machine learning methods are limited by the curse of dimensionality. Therefore, deep neural networks have been proposed to detect intrusions. For example, Long Short Term Memory (LSTM) and Recurrent Neural Network (RNN) methods are proposed to solve intrusion detection tasks [18]. In addition, Ding et al. [9]
[Figure 2 image not recoverable. Diagram labels: Input; Encoder; $z_{query}$; query operation; $w$; $z_{retrieve}$; Decoder; Output.]
Fig. 2. The architecture diagram of the proposed CMAE method. The CMAE model consists of three basic substructures: an encoder, a decoder, and a memory module. Notably, the encoder and decoder are composed of convolutional neural networks. The memory module retrieves a feature representation with the attention-based query operation.
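As a rough illustration of the memory module named in the caption of Fig. 2, the following sketch implements the attention-based query (formalized later in Eqs. 3 and 4) together with the two losses that constrain the memory (Eqs. 7 and 8). It is not the authors' implementation. The function names, the plain-list data layout, and the toy memory are hypothetical, and the feature reconstruction loss is assumed to use squared element-wise differences, following the "square L2 norm" wording of the text.

```python
import math


def memory_read(z_query, memory):
    """Attention-based read over memory items (sketch of Eqs. 3 and 4).

    z_query: list of D floats; memory: list of K items, each a list of D floats.
    Returns (weights, z_retrieve).
    """
    # Dot-product score between the query and every memory item.
    scores = [sum(q * m for q, m in zip(z_query, item)) for item in memory]
    # Subtracting the max score leaves the softmax unchanged but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # Eq. 4: softmax similarities w_i
    # Eq. 3: weighted sum of the memory items.
    z_retrieve = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(len(z_query))
    ]
    return weights, z_retrieve


def feature_reconstruction_loss(z_query, z_retrieve):
    """Eq. 7, assuming squared element-wise differences (an assumption)."""
    return sum((q - r) ** 2 for q, r in zip(z_query, z_retrieve))


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def feature_sparsity_loss(memory):
    """Eq. 8: cosine similarity summed over all ordered pairs i != j."""
    k = len(memory)
    return sum(
        cosine_similarity(memory[i], memory[j])
        for i in range(k) for j in range(k) if i != j
    )


# Hypothetical toy memory with K = 2 items of dimension D = 2.
toy_memory = [[1.0, 0.0], [0.0, 1.0]]
toy_weights, toy_retrieved = memory_read([10.0, 0.0], toy_memory)
```

With orthogonal memory items the sparsity loss is zero, which is the direction the loss pushes the memory toward; a query aligned with one item is retrieved almost entirely from that item.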
proposed a novel convolutional neural network (CNN) based method to efficiently detect network intrusions.

In real environments, most of the network traffic in IoT is normal behavior. Most of the datasets used for intrusion detection suffer from class imbalance, which leads to the high false-positive rate and low detection performance of current methods and makes them hard to apply in production environments. Some recent methods propose to solve the class imbalance problem by oversampling, including the Synthetic Minority Over-sampling Technique (SMOTE), Random Over Sample (ROP), and the Adaptive Synthetic Sampling Approach (ADASYN). Moreover, with the rise of generative models, current methods also generate data to balance the dataset. Yang et al. [12] proposed an intrusion detection method to improve classification effectiveness. This method applies a generator to balance the number of data samples and then uses the balanced dataset to learn a classifier. Some unsupervised methods [15], [19], [20] are also applied to intrusion detection tasks. These models are trained using normal data and consider intrusion behavior as an outlier. Zhai et al. [21] proposed to perform anomaly detection with a deep structured energy based model (DSEBM), which has two variants, DSEBM-r and DSEBM-e. Zong et al. [22] proposed to combine the autoencoder network and the Gaussian mixture model for intrusion detection. A generative model is proposed to detect anomalous data in [23]. Wang et al. [24] extend the univariate quantile function to the multivariate setting for detecting outliers.

B. Memory Networks

Capturing long-term dependency is key to improving the performance of models that deal with sequential data. LSTM [25] uses local memory cells to record past information. However, the memorization performance of LSTM is limited by the size of the memory cell. To overcome this problem, the memory network [26] has been proposed, which uses long-term memory to store question-and-answer knowledge or to record the contextual information of a chat. However, the training process of memory networks requires layer-wise supervision information, which makes them difficult to train using backpropagation. Therefore, Sukhbaatar et al. [27] proposed end-to-end learning that uses soft attention to estimate the relevance of each memory item. Graves et al. [28] extended the capability of neural networks with an external content-based attention memory. In the computer vision field, memory networks have been adopted for image generation and visual question answering. Park et al. [29] proposed a memory module for video anomaly detection, using a new read and write strategy, a feature compactness loss, and a feature separateness loss. In addition, Gong et al. [30] applied memory networks to remember patterns of normal feature representations for anomaly detection. In this paper, we leverage a memory module to guide the autoencoder to remember the feature representations of normal data. We introduce memory modules into the field of intrusion detection to store the feature representations of normal network connections. The existing methods [29], [30] apply memory modules to visual data for anomaly detection. In contrast, the proposed method designs a specific memory module for structured data such as cyberspace data in IoT. To extract better latent space features, we propose to process the structured data with convolutional neural networks, which obtain better performance in our experiments. Moreover, we propose to use a feature reconstruction loss and a feature sparsity loss to improve the proposed memory module, which increases the diversity of the memory items and the representation ability of the memory module.

III. PROPOSED MODEL

A. Overview

As shown in Fig. 2, the architecture of the proposed CMAE model is composed of three basic substructures: an encoder (for encoding the input data into a latent feature), a memory module (utilizing an additional memory to store and locate features of the latent space), and a decoder (for decoding the feature located by the memory module back to the original input data). We first preprocess the structured data from the training
set and then input it into the proposed model. The details of data preprocessing are given in the experimental setup. After obtaining a latent space feature representation by encoding the input data, we use the memory module to store and locate the latent space feature representations of normal data. Besides, we use a reconstruction loss to require the proposed model to reconstruct normal data faithfully. Moreover, we propose a feature reconstruction loss and a feature sparsity loss to constrain the memory module, which aim to better locate the normal features and store sparse feature representations of the latent space. At testing time, the feature representations of abnormal attack data tend to be reconstructed as normal data by the decoder, as the proposed memory module only records the latent space feature representations of normal data. Therefore, the reconstruction error of a normal sample is smaller than that of abnormal attack data, which can be used to determine whether a network access is an attack initiated by a criminal.

B. Network Architecture

1) Encoder and Decoder: The proposed intrusion detection model CMAE uses an auto-encoder as the basic network structure. The input data is encoded into a latent space feature representation by an encoder, and this representation is reconstructed to the original input data by a decoder. In the proposed model, the memory module stores the feature representations of the latent space, so the encoder can be regarded as a query generator. The query generated by the encoder can be represented by the memory module with the attention-based query operation. Therefore, we first define $x \in \mathbb{R}^{N}$, where $N$ denotes the dimension of the input data, to represent an input, $En(\cdot)$ to represent the encoder, and $De(\cdot)$ to represent the decoder. Then, we input $x$ into $En(\cdot)$ to get $z_{query} \in \mathbb{R}^{D}$, where $D$ is the feature dimension, and obtain $z_{retrieve} \in \mathbb{R}^{D}$ through the memory module. Finally, the decoder $De(\cdot)$ reconstructs $z_{retrieve}$ back to $x$. According to the above description, the network structure is as follows:

$$z_{query} = En(x; \theta_{En}), \tag{1}$$
$$x_{rec} = De(z_{retrieve}; \theta_{De}), \tag{2}$$

where $\theta_{En}$ and $\theta_{De}$ denote the parameters of the encoder $En(\cdot)$ and the decoder $De(\cdot)$, respectively. In particular, we apply convolutional neural networks (CNN) as the basic modules of the encoder and the decoder. The convolution kernel can effectively extract the interrelationships between local features, thereby effectively processing discrete data. Extensive experiments show that the use of convolution kernels enhances the performance of the model on the intrusion detection task on the KDDCUP dataset.

2) Memory Module: The proposed model uses a memory module to store and locate the latent space feature representations of normal data. The proposed memory module is represented by a matrix $M \in \mathbb{R}^{K \times D}$, where $D$ is the dimension of the stored features and $K$ represents the number of memory items, as shown in Fig. 2. Besides, each memory unit $m_i$ represents a feature representation memorized in the memory. We can obtain a feature representation by applying a specific combination of memory items. Therefore, $z_{retrieve}$ can be represented by the memory module $M$ through a specific combination, as follows:

$$z_{retrieve} = \sum_{i}^{K} w_i m_i, \tag{3}$$

where $w_i$ represents the similarity between the query $z_{query}$ and the memory item $m_i$. We then obtain a combination pattern of memory items by multiplying each similarity by the corresponding memory item. We regard the operation of obtaining this specific combination as the attention-based query operation. As in the previous methods [29], [30], the similarity calculated between each memory item $m$ and the query $z_{query}$ is regarded as a combination pattern. We obtain the similarity between the query $z_{query}$ and each memory item $m$ as follows:

$$w_i = \frac{\exp\left(z_{query} (m_i)^{T}\right)}{\sum_{j}^{K} \exp\left(z_{query} (m_j)^{T}\right)}. \tag{4}$$

Then, we can obtain $z_{retrieve}$ from the memory module by Eqs. 3 and 4 with $z_{query}$. Besides, $z_{retrieve}$ is reconstructed back to the original input data. In order to get a better memory representation, we propose to use a feature reconstruction loss and a feature sparsity loss. Their detailed descriptions are presented in the following section.

Algorithm 1 The proposed intrusion detection algorithm
Input: Training dataset D; testing dataset T; training epochs; batch size; learning rate lr.
1: repeat
2:   for sample in training dataset D do
3:     Input x ∈ D to En(·) to get z_query with Eq. 1.
4:     Input z_query to the memory module to get z_retrieve with Eq. 3.
5:     Reconstruct the input from z_retrieve with Eq. 2.
6:     Compute the loss L with Eq. 5.
7:     Back-propagate L to update the proposed model.
8:   end for
9: until convergence
10: repeat
11:   for sample in testing dataset T do
12:     Input x ∈ T to the proposed model to get a reconstruction x_rec.
13:     Compute the difference errors between x and x_rec.
14:     Mark intrusion samples by threshold.
15:   end for
16: until
Output: The classification results of the testing dataset T.

C. Training Loss

In this paper, we propose to use a memory module to improve the ability to remember normal data. We use the reconstruction loss, the feature reconstruction loss, and the feature sparsity loss to train the proposed CMAE model, formulated as follows:

$$L = L_{rec} + \lambda_r L_{fea\_rec} + \lambda_s L_{fea\_spa}, \tag{5}$$
where $L_{rec}$, $L_{fea\_rec}$ and $L_{fea\_spa}$ denote the reconstruction loss, the feature reconstruction loss and the feature sparsity loss, respectively. Besides, $\lambda_r$ and $\lambda_s$ are parameters for balancing the loss terms. Moreover, the intrusion detection procedure of the proposed method is presented in Algorithm 1.

1) Reconstruction Loss: The reconstruction loss is the basic loss function that enables the model to converge, making the data reconstructed by the decoder more similar to the input data. In this paper, we choose the squared error loss to minimize the difference between the input $x$ and the reconstruction $x_{rec}$ as follows:

$$L_{rec} = \lVert x - x_{rec} \rVert_2^2. \tag{6}$$

Besides, the reconstruction error is the criterion for determining whether a network access is an attack.

2) Feature Reconstruction Loss: In the previous method [30], an entropy loss on the similarity $w$ is proposed to train the memory module. This method ignores the accuracy of the memory representation, that is, there is a large error between $z_{retrieve}$ and $z_{query}$ after memory positioning. Therefore, the feature reconstruction loss is proposed to minimize the representation error between $z_{query}$ and $z_{retrieve}$ with the square L2 norm as:

$$L_{fea\_rec} = \sum_{i}^{D} \lVert z_{query_i} - z_{retrieve_i} \rVert_2, \tag{7}$$

where $D$ is the dimension of $z_{query}$ and $z_{query_i}$ is its $i$-th element. Without the feature reconstruction loss, a difference between $z_{query}$ and $z_{retrieve}$ means that the feature space decoded by the decoder is not similar to the feature space encoded by the encoder, with the result that some abnormal samples can be reconstructed. We propose to train the proposed model with Eq. 7, which improves the representation ability of the memory module and reduces the difference between $z_{query}$ and $z_{retrieve}$.

3) Feature Sparsity Loss: The previous method uses a large amount of memory space to store the latent space feature representations, many of which are similar. In this way, the memory module occupies more memory resources, and abnormal samples can be represented by a large number of memory items. To prevent this problem, we propose a feature sparsity loss that emphasizes the discrimination between memory units as follows:

$$L_{fea\_spa} = \sum_{i}^{K} \sum_{j \neq i}^{K} d(m_i, m_j), \tag{8}$$

where $d(\cdot)$ represents the cosine similarity function and $K$ denotes the number of memory items. Our loss encourages less similarity between memory units, so that less memory is needed to represent the large feature space. Different from the previous method, the proposed model enables abnormal features to be represented as normal features by Eq. 8. Extensive experiments show that the proposed feature sparsity loss improves the capability of recording diverse normal feature representations.

IV. EXPERIMENT

A. Experimental Setup

In this section, experiments with the proposed CMAE intrusion detection model, conducted on the public benchmark KDDCUP dataset, are used to demonstrate its performance and effectiveness. In this paper, the CMAE model only uses normal data for training. Once the proposed model converges, a network connection record is marked as an attack if its reconstruction difference is larger than a given threshold.

1) Dataset: In the experimental stage, we select the widely used KDDCUP dataset [16] to demonstrate the performance and effectiveness of the proposed model. In particular, the dataset consists of five major categories, i.e., normal, surveillance and probing (Probe), denial of service (DOS), remote to local (R2L), and user to root (U2R). The four types of abnormal records can be subdivided into multiple types, as shown in Table I. Each record contains 41-dimensional features, of which 34 are continuous and 7 are discrete, plus 1 classification label and 1 classification difficulty label. One-hot encoding is used to represent the discrete features, and we finally obtain 120-dimensional features for training. Following the previous methods [30], [21], [22], the 80% of the data labeled "abnormal" in the dataset are regarded as normal samples, and the rest of the data, labeled "normal", are regarded as abnormal samples.

TABLE I
THE SPECIFIC CATEGORIES OF KDDCUP DATASET.

Category   Types
Normal     normal
DOS        neptune, satan, portsweep, teardrop, pod, land
U2R        buffer overflow, warezmaster, imap
R2L        guess password, multihop, phf, ftp write, spy, warezclient
Probe      ipsweep, rootkit, loadmodule, nmap

TABLE II
THE CONFUSION MATRIX USED IN OUR EXPERIMENT.

                 Predicted Attack   Predicted Normal
Actual Attack    True Positive      False Negative
Actual Normal    False Positive     True Negative

2) Evaluation Metric: The F1-score, Precision, Recall and Geometric Mean (G-mean) are used to demonstrate the performance and effectiveness of intrusion detection. A larger value indicates better detection performance. These metrics are calculated from the confusion matrix, which is used to evaluate classification performance. As shown in Table II, the confusion matrix contains four statistics: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). TP and TN indicate that an attack sample and a normal sample are correctly classified, respectively. FP indicates that a normal sample is classified as an attack. FN indicates that an attack sample is predicted as normal.
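The evaluation metrics above can be computed from the confusion-matrix counts as in the following minimal sketch. The function names are hypothetical. Precision, Recall and F1-score use their standard definitions; the text does not define G-mean, so it is assumed here to be the geometric mean of Precision and Recall, a form that reproduces the G-mean column of Table IV for rows such as OC-SVM (Precision 0.7457, Recall 0.8523, G-mean 0.7972).

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(precision, recall):
    """Geometric mean of precision and recall (assumed definition)."""
    return math.sqrt(precision * recall)


def detection_metrics(tp, fp, fn):
    """Compute the metrics from counts, with the attack class as positive.

    tp: attacks flagged as attacks; fp: normal records flagged as attacks;
    fn: attacks predicted as normal. TN is not needed for these metrics.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "g_mean": g_mean(precision, recall),
    }


# Hypothetical counts for illustration only.
example = detection_metrics(tp=8, fp=2, fn=2)
```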
TABLE III
INTRUSION DETECTION RESULTS ON CONTAMINATED TRAINING DATA FROM KDDCUP. EACH CELL LISTS PRECISION / RECALL / F1-SCORE FOR THE GIVEN CONTAMINATION RATIO c.

Model \ Ratio c | 1%                       | 2%                       | 3%                       | 4%                       | 5%
OC-SVM [19]     | 0.7129 / 0.6785 / 0.6953 | 0.6668 / 0.5207 / 0.5847 | 0.6393 / 0.4470 / 0.5261 | 0.5991 / 0.3719 / 0.4589 | 0.1155 / 0.3369 / 0.1720
DCN [20]        | 0.7585 / 0.7611 / 0.7598 | 0.7380 / 0.7424 / 0.7402 | 0.7163 / 0.7293 / 0.7228 | 0.6971 / 0.7106 / 0.7037 | 0.6763 / 0.6893 / 0.6827
DSEBM-e [21]    | 0.6995 / 0.7135 / 0.7065 | 0.6780 / 0.6876 / 0.6827 | 0.6213 / 0.6367 / 0.6289 | 0.5704 / 0.5813 / 0.5758 | 0.5345 / 0.5375 / 0.5360
DAGMM [22]      | 0.9201 / 0.9337 / 0.9268 | 0.9186 / 0.9340 / 0.9262 | 0.9132 / 0.9272 / 0.9201 | 0.8837 / 0.8989 / 0.8912 | 0.8504 / 0.8643 / 0.8573
CMAE (Ours)     | 0.9212 / 0.9406 / 0.9308 | 0.9287 / 0.9401 / 0.9344 | 0.9214 / 0.9350 / 0.9282 | 0.8950 / 0.9071 / 0.9011 | 0.8569 / 0.8672 / 0.8620
TABLE IV details of the comparison baselines are listed as follow: One-

C OMPARISION OF DETECTION PERFORMANCE FOR THE DIFFERENT Class support vector machine proposes (OC-SVM) [19] to
STATE - OF - THE - ART METHODS ON THE KDDCUP DATASET (%).
use the kernel method for anomaly detection. Deep clustering
Method\Metric Precision Recall F1-score G-mean network (DCN) [20] is a popular clustering network that
constrains the performance of the autoencoder through the k-
OC-SVM [19] 0.7457 0.8523 0.7954 0.7972
DCN [20] 0.7696 0.7829 0.7762 0.7762 means method. Deep structured energy based model (DSEBM)
DSEBM-r [21] 0.8521 0.6472 0.7328 0.7426 [21] uses energy methods to complete anomaly detection
DSEBM-e [21] 0.8619 0.6446 0.7399 0.7454 task, which is divided into DSEBM-r and DSEBM-e. The
DAGMM [22] 0.9297 0.9442 0.9369 0.9369
AnoGAN [23] 0.8786 0.8297 0.8865 0.8537 autoencoder network and the Gaussian mixture model are
MTQ [24] 0.9622 0.9622 0.9622 0.9622 combined, which is proposed in Deep autoencoding Gaussian
mixture model (DAGMM) [22], to solve the problem of intrusion detection. AnoGAN [23] is a common method based on Generative Adversarial Networks (GANs). MTQ [24] extends the univariate quantile function to the multivariate setting. The memory-based network MemAE [30] also uses a memory module to remember normal patterns. However, MemAE does not use the feature reconstruction loss and the feature sparsity loss, and it takes up a lot of memory space. ALAD [31] is based on bi-directional GANs for anomaly detection; it derives adversarially learned features and uses reconstruction errors. GOAD [32] is a classification-based method that achieves superior results on this task. CADGMM [33] is a correlation-aware unsupervised anomaly detection method based on a deep Gaussian mixture model, which captures the complex correlation among data points.

TABLE IV (continued)
Method         Precision   Recall   F1-score   G-mean
MemAE [30]     0.9627      0.9655   0.9641     0.9640
ALAD [31]      0.9601      0.9577   0.9501     0.9589
GOAD [32]      -           -        0.9840     -
CADGMM [33]    0.9601      0.9753   0.9671     0.9676
CMAE (Ours)    0.9667      0.9830   0.9748     0.9748

3) Implementation Details: In this experiment, we obtain 120-dimensional features after preprocessing the data and reshape each feature vector to a size of 11 × 11 by zero-padding. Both the encoder and decoder are composed of convolutional neural networks. We define Conv2d(c_in, c_out, k, s) and ConvT2d(c_in, c_out, k, s) to represent a 2D convolutional layer and a 2D deconvolutional layer, respectively, where c_in, c_out, k and s denote the input channels, output channels, kernel size and stride. Besides, FC(in, out) denotes a fully connected layer, where in and out denote the input and output sizes. The encoder is implemented by: Conv2d(1, 16, 3, 1)-Conv2d(16, 32, 5, 1)-Conv2d(32, 64, 5, 1)-FC(64, 16). The decoder is implemented by: FC(16, 64)-ConvT2d(64, 32, 5, 1)-ConvT2d(32, 16, 5, 1)-ConvT2d(16, 1, 3, 1). The ReLU activation function is adopted in the convolutional and deconvolutional layers, except that the Sigmoid activation function is used in the last deconvolutional layer to obtain better performance. In addition, a batch normalization operation precedes each ReLU function. We set the number of memory items K to 10 and the feature dimension D to 16. The initial learning rate, batch size and number of iterations are set to 1e-3, 128 and 100, respectively. Meanwhile, we use the Adam algorithm to optimize the proposed model in an end-to-end manner. Besides, λr and λs are set to 1 and 0.1, respectively.

B. Experimental Results
In this section, the intrusion detection performance of the proposed CMAE model is demonstrated by extensive experiments conducted on the KDDCUP dataset. The experimental results consist of two parts, using clean training data and contaminated training data, respectively.

In the first experiment, we used clean data for training, following the previous method [22]. We randomly sample 50% of the data from the KDDCUP dataset as the training set and use the remaining data as the testing set. Moreover, in the training phase, only data with a normal label in the training set are used to train the proposed model. As shown in Table IV, we evaluate the intrusion detection performance of the proposed CMAE method with Precision, Recall, F1-score and Geometric Mean (G-mean) as evaluation metrics. Table IV reveals that the proposed CMAE model outperforms the existing state-of-the-art methods on most metrics. On Recall, the proposed method obtains an improvement of about 1% over CADGMM, reaching 0.9830. On F1-score, it achieves an improvement of about 1% over MemAE, reaching 0.9748. However, the proposed method is slightly inferior to the GOAD method on F1-score. In addition, we also achieve a small improvement in Precision. Besides, the proposed CMAE method achieves a value of 0.9748 on G-mean (Table IV), outperforming the existing state-of-the-art methods. Overall, the proposed CMAE method achieves superior intrusion detection performance on the KDDCUP dataset.
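The relationship between the reported metrics can be checked with a few lines of arithmetic. The sketch below recomputes the F1-score and G-mean from the Precision and Recall reported for CMAE in Table IV. It assumes that G-mean is taken as the geometric mean of Precision and Recall; this definition is an assumption made for illustration, and the helper names are hypothetical.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def g_mean(precision: float, recall: float) -> float:
    """Geometric mean of precision and recall (assumed definition)."""
    return math.sqrt(precision * recall)


# Precision and Recall of CMAE as reported in Table IV.
cmae_precision, cmae_recall = 0.9667, 0.9830

cmae_f1 = f1_score(cmae_precision, cmae_recall)
cmae_g = g_mean(cmae_precision, cmae_recall)
print(f"F1-score = {cmae_f1:.4f}, G-mean = {cmae_g:.4f}")
```

Under this assumption both values round to 0.9748, which agrees with the CMAE row of Table IV.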
[Figure 3: three panels. (a) Score (%) for Precision, Recall and F1-score versus latent dimension (4, 8, 16, 32, 64). (b) Score (%) for Precision, Recall and F1-score versus number of memory items (5 to 25). (c) Average loss per batch (rec, fea_rec, fea_spa) versus number of iterations (0 to 100).]
Fig. 3. (a) The performance of intrusion detection with different dimensions of latent space features. As the dimension of latent space features increases, the results on all metrics first increase and then decrease. (b) The performance of intrusion detection with different numbers of memory items. As the number of memory items increases, the results on all metrics first increase and then decrease. When the number of memory items is in the range of 10-20, the results on all metrics tend to be stable. (c) The visualization of the training loss, obtained by the proposed CMAE method, on the KDDCUP dataset.
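As a sanity check on the encoder and decoder stacks listed in the Implementation Details, the feature-map sizes can be traced by hand. The sketch below assumes no padding inside the convolutional layers, appends one zero to turn the 120-dimensional feature vector into an 11 × 11 grid, and treats the final decoder layer as a transposed convolution, as the text's reference to the "last deconvolution layer" suggests. The helper names are hypothetical, and only the shape arithmetic is shown, not the network itself.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a 2D convolution without padding."""
    return (size - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a 2D transposed convolution without padding."""
    return (size - 1) * stride + kernel


def to_grid(features: list, side: int = 11) -> list:
    """Zero-pad a flat feature vector to side*side values and reshape it."""
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# 120 preprocessed features become an 11 x 11 grid with one padded zero.
grid = to_grid([1.0] * 120)

# Encoder: kernel sizes 3, 5, 5 with stride 1.
size = len(grid)
encoder_sizes = [size]
for kernel in (3, 5, 5):
    size = conv_out(size, kernel)
    encoder_sizes.append(size)

# Decoder: kernel sizes 5, 5, 3 with stride 1 (assumed transposed convolutions).
decoder_sizes = [size]
for kernel in (5, 5, 3):
    size = conv_transpose_out(size, kernel)
    decoder_sizes.append(size)

print("encoder:", encoder_sizes)
print("decoder:", decoder_sizes)
```

Under these assumptions the encoder reduces the 11 × 11 input to a 1 × 1 map with 64 channels, which matches the input size of FC(64, 16), and the decoder restores the 11 × 11 size.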
For the second experiment, we adopted the same setting as the first experiment. However, we mix c% of samples with an attack label into the training data used in the first experiment to contaminate the training dataset. We use the contaminated training data to demonstrate the robustness of the proposed method. In Table III, we select different values of c to conduct experiments on the KDDCUP dataset. As shown in Table III, the values of all metrics show a downward trend as the contamination rate c% increases. When the contamination rate reaches 5%, the intrusion detection performance of the proposed method drops to its minimum. However, the proposed CMAE method achieves new state-of-the-art results for every value of c in the range from 1% to 5%, compared to the previous methods. The proposed method has better detection performance on the contaminated training data and is less affected by the contamination rate c%. In the face of contaminated data, the experimental results demonstrate the superior robustness of the proposed method for intrusion detection.

Moreover, Fig. 3(c) presents the training procedure of the proposed CMAE method. We can observe that the training losses of the proposed CMAE model, including the reconstruction loss, the feature reconstruction loss and the feature sparsity loss, converge quickly within a small number of training epochs. The training loss fluctuates within a range after the proposed model converges. Besides, we compare the proposed method with a supervised learning method. The supervised learning method uses Normal, DoS and Probe data for training, while the proposed method only uses normal data for training. From Table V, we can observe that the supervised method performs far worse than the proposed method on the unknown attack types (U2R and R2L). Moreover, the proposed method matches or exceeds the supervised method on all four attack types.

TABLE V
THE PRECISION OF USING DIFFERENT TRAINING METHODS.
Method         DOS      Probe    U2R      R2L
Supervised     0.9894   0.9999   0.0501   0.1023
CMAE (Ours)    0.9905   0.9999   0.9610   0.9325

C. Further Analysis
1) Analysis on Parameter Sensitiveness: We conduct an ablation experiment on the dimension of the latent space features to investigate its influence on the proposed method, since the latent space feature directly affects the feature extraction and reconstruction process. Fig. 3(a) shows that when the dimension of the latent space features is too large or too small, the performance of the proposed model for intrusion detection shows a downward trend in all metrics. A small dimension does not extract enough features, while a large dimension produces redundant features. In particular, setting the dimension of the latent space features to 16 on the KDDCUP dataset achieves the best performance for intrusion detection. Moreover, as shown in Fig. 3(b), the performance of the proposed model for intrusion detection also shows a downward trend in all metrics when the number of memory items is too large or too small. In particular, setting the number of memory items to 10 achieves the best intrusion detection performance.

TABLE VI
THE QUANTITATIVE COMPARISON OF USING DIFFERENT COMPONENTS.
Memory module   Conv   L_fea_rec   L_fea_spa   Precision   Recall   F1-score
✗               ✗      -           -           0.8450      0.8556   0.8592
✗               ✓      -           -           0.8610      0.8756   0.8692
✓               ✓      ✗           ✗           0.9180      0.9335   0.9258
✓               ✓      ✓           ✗           0.9356      0.9514   0.9434
✓               ✓      ✗           ✓           0.9515      0.9676   0.9595
✓               ✓      ✓           ✓           0.9667      0.9830   0.9748

2) Analysis on Different Components: We also investigate the impact of the different components presented in this paper on intrusion detection. Table VI shows the performance of different combinations of components on the KDDCUP dataset. We observe that the intrusion detection performance is reduced when the proposed model is used without the memory module, as shown in Table VI. The “Conv” represents whether
to use a convolutional neural network as the main structure of the proposed model. According to Table VI, using a convolutional neural network improves Precision, Recall and F1-score by about 0.016, 0.020 and 0.010 (1.6, 2.0 and 1.0 percentage points), respectively. Besides, using the memory module effectively improves the performance of the proposed CMAE model for intrusion detection. Moreover, compared to using only the feature reconstruction loss, using only the feature sparsity loss achieves an improvement of about 0.016 (1.6 percentage points) on each of the three metrics. When all the proposed components are used, the model achieves the best results in Table VI on the three metrics of intrusion detection.

3) Analysis on Memory Module: To show the importance of the feature sparsity loss in more detail, we visualize the memory items with a heat map. As shown in Fig. 4, the i-th row of the heat map represents the similarity between the i-th memory item and the other memory items, where (a) is the result without the feature sparsity loss and (b) is the result with it. From Fig. 4, we observe that several memory items in (a) show high similarity to each other (marked in orange), while each memory item in (b) only has a high similarity with itself (marked in red). This shows that the proposed feature sparsity loss makes the memory items discriminative, improving the performance of the memory module for intrusion detection.

[Figure 4: two heat maps of the similarity between memory items, (a) and (b), each with a color scale from 0.0 to 1.0.]
Fig. 4. The visualization of the memory module. The i-th row of each heat map represents the similarity between the i-th memory item and the other memory items. The heat maps of (a) and (b) are obtained without and with the feature sparsity loss, respectively.

4) Results on CICIDS2017 Dataset: We also conduct experiments on the CICIDS2017 dataset [34], which is a recently proposed IoT dataset. The CICIDS2017 dataset contains benign traffic and the most up-to-date common attacks. The data is collected from Monday to Friday. There were no attacks on Monday, and there were attacks on the other days. The data from Monday is used for training, and then the intrusion detection performance on the data from Tuesday to Friday is evaluated. As shown in Fig. 5, the AUROC indicator is used to evaluate the performance of intrusion detection. We observe that the proposed model has better intrusion detection performance on Wednesday and Thursday, with AUC values of 0.8474 and 0.9113, respectively. However, the AUC values on Tuesday and Friday are 0.6858 and 0.6232, respectively. From Fig. 5, the proposed method also has competitive intrusion detection performance on the CICIDS2017 data.

[Figure 5: ROC curves (TPR versus FPR) with AUC values in the legend: Tuesday 0.6858, Wednesday 0.8474, Thursday 0.9113, Friday 0.6232.]
Fig. 5. The ROC curve of the proposed model along with the AUC values on the CICIDS2017 dataset.

V. CONCLUSION
At present, the cyber security of the industrial Internet of Things (IoT) is a key challenge, which brings the problem of network intrusion. In this paper, we proposed a novel model termed Cognitive Memory-guided AutoEncoder (CMAE) to improve the performance of intrusion detection in IoT. We treated intrusion detection as an unsupervised problem, that is, the memory module is used to record the latent space feature representations of normal data. Therefore, network access that is not well represented by the memory module is considered an attack. The proposed network applies convolutional neural networks to build the main structure, which effectively processes structured data. In addition, we proposed the feature reconstruction loss and the feature sparsity loss to obtain an efficient memory module. The feature reconstruction loss improves the representation ability of the memory module and eliminates the difference between the query feature and the retrieved feature. The feature sparsity loss requires the memory items to be discriminative and to record the diversity of normal patterns. Experimental results on the challenging KDDCUP dataset demonstrated the effectiveness and robustness of the proposed method, which outperforms existing state-of-the-art intrusion detection methods. In the future, we will explore more efficient memory modules for intrusion detection and apply the proposed memory module to more challenging applications.

REFERENCES
[1] A. Zielonka, A. Sikora, M. Woźniak, W. Wei, Q. Ke, and Z. Bai, “Intelligent internet of things system for smart home optimal convection,” IEEE Transactions on Industrial Informatics, vol. 17, no. 6, pp. 4308–4317, 2020.
[2] Q. Chen and R. A. Bridges, “Automated behavioral analysis of malware: A case study of wannacry ransomware,” in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017, pp. 454–460.
[3] C. Beek, T. Dunton, J. Fokker, S. Grobman, T. Hux, T. Polzer, M. Rivero, T. Roccia, J. Saavedra-Morales, R. Samani et al., “Mcafee labs threats report: August 2019,” McAfee Labs, 2019.
[4] M. Serror, S. Hack, M. Henze, M. Schuba, and K. Wehrle, “Challenges and opportunities in securing the industrial internet of things,” IEEE Transactions on Industrial Informatics, vol. 17, no. 5, pp. 2985–2996, 2020.
[5] G. Serpen and E. Aghaei, “Host-based misuse intrusion detection using pca feature extraction and knn classification algorithms,” Intelligent Data Analysis, vol. 22, no. 5, pp. 1101–1114, 2018.
[6] Y. Tian, M. Mirzabagheri, S. M. H. Bamakan, H. Wang, and Q. Qu, “Ramp loss one-class support vector machine; a robust and effective approach to anomaly detection problems,” Neurocomputing, vol. 310, pp. 223–235, 2018.
[7] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] P. Li, Z. Chen, L. T. Yang, Q. Zhang, and M. J. Deen, “Deep convolutional computation model for feature learning on big data in internet of things,” IEEE Transactions on Industrial Informatics, vol. 14, no. 2, pp. 790–798, 2017.
[9] Y. Ding and Y. Zhai, “Intrusion detection system for nsl-kdd dataset using convolutional neural networks,” in Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, 2018, pp. 81–85.
[10] C. Li, J. Wang, and X. Ye, “Using a recurrent neural network and restricted boltzmann machines for malicious traffic detection,” NeuroQuantology, vol. 16, no. 5, 2018.
[11] V. L. Thing, “Ieee 802.11 network anomaly detection and attack classification: A deep learning approach,” in 2017 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2017, pp. 1–6.
[12] Y. Yang, K. Zheng, C. Wu, and Y. Yang, “Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network,” Sensors, vol. 19, 2019.
[13] X. Xu, J. Li, Y. Yang, and F. Shen, “Toward effective intrusion detection using log-cosh conditional variational autoencoder,” IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6187–6196, 2020.
[14] R. Tang, Z. Yang, Z. Li, W. Meng, H. Wang, Q. Li, Y. Sun, D. Pei, T. Wei, Y. Xu et al., “Zerowall: Detecting zero-day web attacks through encoder-decoder recurrent neural networks,” in IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, 2020, pp. 2479–2488.
[15] H. Hindy, R. Atkinson, C. Tachtatzis, J.-N. Colin, E. Bayne, and X. Bellekens, “Towards an effective zero-day attack detection using outlier-based deep learning techniques,” arXiv e-prints, pp. arXiv–2006, 2020.
[16] M. Lichman et al., “Uci machine learning repository,” 2013.
[17] C. Kruegel and T. Toth, “Using decision trees to improve signature-based intrusion detection,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2003, pp. 173–191.
[18] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long short term memory recurrent neural network classifier for intrusion detection,” in 2016 International Conference on Platform Technology and Service (PlatCon). IEEE, 2016, pp. 1–5.
[19] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, J. C. Platt et al., “Support vector method for novelty detection.” in NIPS, vol. 12. Citeseer, 1999, pp. 582–588.
[20] X. Yang, K. Huang, J. Y. Goulermas, and R. Zhang, “Joint learning of unsupervised dimensionality reduction and gaussian mixture model,” Neural Processing Letters, vol. 45, no. 3, pp. 791–806, 2017.
[21] S. Zhai, Y. Cheng, W. Lu, and Z. Zhang, “Deep structured energy based models for anomaly detection,” pp. 1100–1109, 2016.
[22] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep autoencoding gaussian mixture model for unsupervised anomaly detection,” in International conference on learning representations, 2018.
[23] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International conference on information processing in medical imaging. Springer, 2017, pp. 146–157.
[24] J. Wang, S. Sun, and Y. Yu, “Multivariate triangular quantile maps for novelty detection,” in Advances in Neural Information Processing Systems, 2019, pp. 5060–5071.
[25] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[26] J. Weston, S. Chopra, and A. Bordes, “Memory networks,” arXiv preprint arXiv:1410.3916, 2014.
[27] S. Sukhbaatar, J. Weston, R. Fergus et al., “End-to-end memory networks,” Advances in neural information processing systems, vol. 28, pp. 2440–2448, 2015.
[28] A. Graves, G. Wayne, and I. Danihelka, “Neural turing machines,” arXiv preprint arXiv:1410.5401, 2014.
[29] H. Park, J. Noh, and B. Ham, “Learning memory-guided normality for anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14372–14381.
[30] D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. v. d. Hengel, “Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1705–1714.
[31] H. Zenati, M. Romain, C.-S. Foo, B. Lecouat, and V. Chandrasekhar, “Adversarially learned anomaly detection,” in 2018 IEEE International conference on data mining (ICDM). IEEE, 2018, pp. 727–736.
[32] L. Bergman and Y. Hoshen, “Classification-based anomaly detection for general data,” 2019.
[33] H. Fan, F. Zhang, R. Wang, L. Xi, and Z. Li, “Correlation-aware deep generative model for unsupervised anomaly detection,” Advances in Knowledge Discovery and Data Mining, vol. 12085, p. 688, 2020.
[34] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization.” ICISSp, vol. 1, pp. 108–116, 2018.

Huimin Lu received the B.S. degree in computer science and technology from Yangzhou University, China, in 2009, the M.S. degrees in electrical engineering from the Kyushu Institute of Technology, Kitakyushu, Japan, and Yangzhou University, Yangzhou, China, in 2011, and the Ph.D. degree in electrical engineering from the Kyushu Institute of Technology in 2014. From 2013 to 2016, he was a JSPS Research Fellow. He is currently an Associate Professor with the Kyushu Institute of Technology, a Visiting Professor with Shanghai Jiao Tong University, China, and an Excellent Young Researcher of the Ministry of Education, Culture, Sports, Science and Technology, Japan. His current research interests include computer vision, robotics, artificial intelligence, and ocean observing.

Tian Wang received the B.S. degree from Shaanxi Normal University, China, in 2019. He is currently pursuing his master’s degree in the Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include computer vision and machine learning.

Xing Xu (M’15) received the B.E. and M.E. degrees from Huazhong University of Science and Technology, China, in 2009 and 2012, respectively, and the Ph.D. degree from Kyushu University, Japan, in 2015. He is currently with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. He is the recipient of six academic awards, including the IEEE Multimedia Prize Paper 2020, Best Paper Award from ACM Multimedia 2017, and the World’s FIRST 10K Best Paper Award-Platinum Award from IEEE ICME 2017. His current research interests mainly focus on multimedia information retrieval and computer vision.

Ting Wang received her B.S. and M.S. degrees in automation and control science from Northwestern Polytechnical University, Xi’an, China, in 2003 and 2006. She received her Ph.D. degree in signals and image processing from Paris-Est University, Paris, France, in 2012. She is currently a postdoctoral researcher at the School of Instrument Science and Engineering, Southeast University, Nanjing, China. She is also an associate professor at the College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing, China. Her research interests include the tele-operation control of the flexible anthropomorphic arm system.
