Deep and Fuzzy - 2021

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TFUZZ.2021.3062899, IEEE
Transactions on Fuzzy Systems
TFS-2020-0774_R1
The Fusion of Deep Learning and Fuzzy Systems: A State-of-the-art Survey
Yuanhang Zheng, Zeshui Xu, Fellow, IEEE, and Xinxin Wang
The work was supported by the National Natural Science Foundation of China (No. 71771155). (Corresponding Author: Zeshui Xu, Xinxin Wang.)
Yuanhang Zheng, Zeshui Xu and Xinxin Wang are with the Business School, Sichuan University, Chengdu 610064, China (e-mails: yuanhang_zheng@foxmail.com; xuzeshui@263.net; wangxinxin_cd@163.com).
1063-6706 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract: Deep learning presents excellent learning ability in constructing learning models and greatly promotes the development of artificial intelligence, but its conventional models cannot handle uncertain or imprecise circumstances. Fuzzy systems can not only depict uncertain and vague concepts that widely exist in the real world, but also improve the prediction accuracy of deep learning models. Thus, it is important and necessary to review the recent contributions on the fusion of deep learning and fuzzy systems. First, we introduce deep learning to the fuzzy community from two perspectives: statistical results of relevant publications and conventional deep learning algorithms. Then, the fusing framework and graphic form of deep learning and fuzzy systems are constructed. These are followed by the current state of several types of fuzzy techniques used in deep learning, some reasons for using fuzzy techniques in deep learning, and the application fields of the fusion. Finally, some discussions and future challenges are provided regarding the fusion technology of deep learning and fuzzy systems, the application scenarios of fusing deep learning and fuzzy systems, and some limitations of the current fusion. After summarizing the recent contributions, we find that this field is an emerging research direction that is attracting increasing attention. In particular, fuzzy systems have a great effect on deep learning models in classification, prediction, natural language processing, auto-control, etc., and the fusion is applied in different fields, including but not limited to computer science, natural language, medical systems, smart energy management systems and the machinery industry.

Index Terms: Deep learning; Fuzzy systems; Fusion; Artificial intelligence.

I. INTRODUCTION

The emergence of deep learning has greatly promoted the vigorous development of artificial intelligence [1]. In order to make machines as capable of analytical learning as humans, able to recognize and understand data such as text, images and sounds, machine learning in artificial intelligence is designed to learn the internal rules and the representation levels of sample data, and then use the learned rules to construct a universal model for predicting unknown data [2]. Because of the limited number of trainable parameters, a traditional machine learning algorithm can only learn a limited number of features. In some cases, it may not be able to form a sufficiently complex function, so its generalization ability in complex classification is restricted to some extent [3]. However, deep learning, in the form of the multi-layer neural network (also called the deep neural network, DNN), can form sufficiently complex functions for fitting features. In other words, it has strong expressive ability. Therefore, deep learning has aroused increasing interest among researchers and has been widely utilized in practice, such as in pattern recognition [4], speech recognition [5], computer vision [6], auto-controlling [7], medical systems [8] and the financial field [9], etc.

As a special algorithm of machine learning, deep learning is in essence a complex non-linear regression model. With the advent of the neuron model (also called the MP model, after its proposers McCulloch and Pitts) [10], the transformation from the biological neuron to the artificial neuron opened the prelude of bionic learning in artificial intelligence. Then, the proposal of the perceptron, also regarded as the single-layer neural network [11], led to the first rise of neural networks. However, the perceptron can only perform simple linear classification tasks and is unable to solve even simple classification tasks such as XOR. To solve the problem of the complex computation required by two-layer neural networks, Rumelhart and Hinton proposed the Backpropagation algorithm [12], which drove a new research upsurge of neural networks in the industry. Ten years later, the Support Vector Machines (SVM) algorithm, invented by Vapnik et al. [13], was born and soon showed advantages over neural networks in several ways: no reference needed, high efficiency, and global optimal solution. However, in the decade in which the neural network was abandoned, several researchers continued to study it. When the Deep Belief Network (DBN) was proposed by Hinton in 2006 [1], the concept of deep learning was created and ranked first in tests across various research fields. Since then, research on deep learning has been very popular. Some algorithms, such as Deep Boltzmann Machine (DBM) [14], Convolutional Neural Network (CNN) [15], Recurrent Neural Network (RNN) [16], etc., have been developed rapidly. Recently, other extended deep learning models have been
established, like Attention-Based Long Short-Term Memory (ABLSTM) [17], Deep Spatial-Spectral Representation Learning (DSSRL) [18] and Deep Residual Shrinkage Networks (DRSN) [19].

Apparently, such classic deep learning algorithms compute with crisp values, while imprecision, uncertainty, and vagueness are common in the real world. For example, when each point must belong to one and only one space unit, points with similar spatial positions may fall into different regions when they are divided, resulting in sharp boundary problems. Moreover, when executing low-frequency image segmentation, fuzzy recognition performs better than precise recognition. Thus, the classic deep learning algorithms based on precise mathematics may significantly reduce the similarity between the points, affect the result of pattern recognition and the learning ability of the model, and then reduce the accuracy of model prediction.

Fortunately, fuzzy systems, a practical and burgeoning technology built on the basic concepts of the fuzzy set and the continuous membership function, can effectively handle uncertain and imprecise information. When we refer to the concept of “youth”, its connotation is clear, but its extension, that is, at what age people are young, is difficult to state. Because there is no definite boundary between “young” and “not young”, this is a vague concept. As early as the 1920s, the famous philosopher and mathematician Russell wrote a paper on “vagueness”. He believed that all natural language is ambiguous, such as “red” and “old” [20]. Professor Zadeh from the University of California published a famous paper in 1965 [21], in which he proposed the important concept of the “membership function” to express the fuzziness of things for the first time, which laid the foundation of fuzzy systems. Then in 1969, Marinos published a research report on fuzzy logic [22], and in 1974, Zadeh published a paper on fuzzy reasoning [23]. The fuzzy set (also called type-1 fuzzy set) [21], the type-2 fuzzy set [24], and the linguistic variable [25] are three main fuzzy techniques used in fuzzy systems research. Since then, fuzzy systems have become a hot topic.

There are at least three reasons why fuzzy systems have already demonstrated some advantages in the context of deep learning:

i) They effectively depict the vagueness and uncertainty in common applications of deep learning algorithms. In general, the classes of objects that exist in the real physical world are not precisely defined by 0 or 1; instead, they are classified by a membership degree in the interval $[0,1]$ [21]. The fuzzy state is common, and precise mathematics, probabilistic theory or stochastic theory cannot completely resolve fuzzy problems in practice.

ii) They successfully handle boundary points or low-frequency points in image processing. Computer vision is one of the most important applications of deep learning algorithms. As verified by a large number of experimental comparative analyses [26-29], the improved deep learning algorithm with fuzzy space division performs much better and gets closer to reality than the conventional deep learning models.

iii) They reasonably exhibit the imprecision and ambiguity of natural language by linguistic variables. It is known that deep learning algorithms have efficiently improved natural language processing (NLP), such as the cognition, understanding and generation of natural language. Fortunately, the fusing models that integrate deep learning and fuzzy systems can not only manage uncertainty, ambiguity and vagueness in NLP, but also provide intelligible and natural results for users by using linguistic variables [89].

Therefore, fuzzy systems have an effect on deep learning algorithms, and this paper aims to make a comprehensive review of the fusion of deep learning and fuzzy systems. The main contributions of the paper are highlighted in the following aspects: 1) introduce deep learning to the fuzzy community from two perspectives: some statistical results of relevant publications in regard to the fusion of deep learning and fuzzy systems, and conventional deep learning models including Restricted Boltzmann Machine (RBM), CNN, RNN; 2) present a framework and a graphical form of fusing deep learning and fuzzy systems, and analyze the current fusions from three points: the fuzzy techniques used in deep learning, the effects of fuzzy systems working with deep learning, and the application fields of the fusion; 3) provide some discussions and future challenges about the fusion technology of deep learning and fuzzy systems, the application scenarios of fusing deep learning and fuzzy systems, and some limitations of the current fusion.

The rest of the paper is organized as follows: Section 2 introduces deep learning to the fuzzy community. Then, Section 3 reviews the fusion of deep learning and fuzzy systems. In addition, Section 4 offers some discussions and challenges of fusing deep learning and fuzzy systems in the future. Finally, some conclusions are given in Section 5.

II. BASIC KNOWLEDGE OF DEEP LEARNING

In this section, we introduce deep learning to the fuzzy community from two perspectives: (1) some statistical results of publications in regard to the fusion of deep learning and fuzzy systems; (2) three conventional deep learning models.

A. Some statistical results of relevant publications

The Web of Science (WOS) is a worldwide authoritative database, containing a remarkable treasure of scientific contents and high-impact literature. As one of the important sub-databases in WOS, the WOS Core Collection consists of 6 Citation Indexes (Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts and Humanities Citation Index (A&HCI), Conference Proceedings Citation Index-Science (CPCI-S), Conference Proceedings Citation Index-Social Science & Humanities (CPCI-SSH) and Emerging Sources Citation Index (ESCI)), and 2 Chemical Indexes (Current Chemical Reactions (CCR-EXPANDED) and Index Chemicus (IC)).

When the search topic was set as “‘deep learning’ and ‘fuzzy’” at the same time in the WOS Core Collection, 798 publications were selected from January 1994 to November 2020. Then, based on these papers, some statistical results can be obtained from two aspects: publications by year and total citations by year.
1) Publications by year

According to the data in WOS, the earliest publication appeared in 1994. To exhibit the development of deep learning and fuzzy systems, the number of publications by year is illustrated in Fig. 1.

Fig. 1. The number of publications by year

From Fig. 1, we can easily see that the number of publications regarding “deep learning” and “fuzzy” is relatively small from 1994 to 2013, fluctuating around 2 per year, and then increases rapidly from 8 papers in 2014 to 280 papers in 2020. Judging by this growth trend, it can be inferred that the number of papers in 2021 will continue to increase and may even exceed that of 2020. In addition, even though the fusion of deep learning and fuzzy systems is a novel and developing direction, it is developing very rapidly.

2) Total citations by year

After the appearance of the first publication about “deep learning” and “fuzzy” in 1994, the first citation came up in 1996. To show the trend of citation, we collect the annual number of total citations and its percentage from the WOS Core Collection (“Percentage” refers to the ratio of annual citations to total citations of all years). The results are shown in Fig. 2.

Fig. 2. The number and the percentage of total citations by year

It is shown that the number and the percentage of total citations are both low from 1996 to 2008, and then increase rapidly from 2009, with especially sharp growth from 2016 to 2020. The probable reason behind the data is that publications about the fusion of deep learning and fuzzy systems were not very popular before 2008, and from then on, more and more scholars have paid close attention to this research field. According to the trend of previous citations, the number of total citations in 2021 is likely to be much larger than before.

B. Conventional models of deep learning

1) Restricted Boltzmann Machine and its variants

Restricted Boltzmann Machine (RBM), as the building block of the Deep Belief Network (DBN), is a generative stochastic neural network that can learn a probability distribution from an input data set [60]. Inspired by the energy function in statistical physics, an RBM is composed of two layers: a visible layer and a hidden layer. Different from the conventional Boltzmann Machine (BM) [30], each unit in one layer is connected to all units in the other layer, while there is no connection among the units in the same layer, as shown in Fig. 3. Because of its flexibility and computational efficiency, RBM has been extensively applied to control [31], image classification [27] and fault detection [32], etc.

Fig. 3. Comparison of graphical structure of a) BM model and b) RBM model

Let $\mathbf{v} = (v_i)_{m \times 1}$ be a layer of visible units, $\mathbf{h} = (h_j)_{n \times 1}$ be a layer of hidden units, and $\theta = \{\mathbf{w}, \mathbf{a}, \mathbf{b}\}$ be a set of parameters, where $\mathbf{w} = (w_{ij})_{m \times n}$ is the weight of each edge between $v_i$ and $h_j$, and $\mathbf{a} = (a_i)_{m \times 1}$ and $\mathbf{b} = (b_j)_{n \times 1}$ are the biases of the visible units $v_i$ and the hidden units $h_j$, respectively [33]. Therefore, the energy function of the RBM is defined as:

$$E(\mathbf{v}, \mathbf{h} \mid \theta) = -\sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j - \sum_{i=1}^{m} \sum_{j=1}^{n} v_i w_{ij} h_j = -\mathbf{a}^{T}\mathbf{v} - \mathbf{b}^{T}\mathbf{h} - \mathbf{v}^{T}\mathbf{w}\mathbf{h}. \quad (1)$$

Based on the above function, the joint probability distribution of the RBM is presented as:

$$P(\mathbf{v}, \mathbf{h} \mid \theta) = \frac{e^{-E(\mathbf{v}, \mathbf{h} \mid \theta)}}{Z(\theta)}, \quad (2)$$

where $Z(\theta) = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h} \mid \theta)}$ is the partition function, that is, the sum of $e^{-E}$ over all possible states. In other words, the probability of a given state is its exponentiated negative energy divided by the same quantity summed over all possible states.

Then, based on the energy function and the joint probability distribution, the conditional probabilities of the RBM can be given as:

$$P(h_j = 1 \mid \mathbf{v}) = \mathrm{sigmoid}\left(\sum_{i=1}^{m} v_i w_{ij} + b_j\right), \quad (3)$$

$$P(v_i = 1 \mid \mathbf{h}) = \mathrm{sigmoid}\left(\sum_{j=1}^{n} h_j w_{ij} + a_i\right). \quad (4)$$
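As a minimal sketch of Eqs. (1) to (4), the following pure-Python snippet evaluates the energy, the joint probability and the two conditional probabilities of a tiny RBM. The parameter values and the function names (`energy`, `partition`, `joint`, `p_h_given_v`, `p_v_given_h`) are hypothetical and chosen only for illustration; the partition function is computed by brute-force enumeration, which is only feasible for a handful of units.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, w, a, b):
    """Energy E(v, h | theta) of Eq. (1); w[i][j] links visible i and hidden j."""
    visible = sum(a[i] * v[i] for i in range(len(v)))
    hidden = sum(b[j] * h[j] for j in range(len(h)))
    pairwise = sum(v[i] * w[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -visible - hidden - pairwise


def partition(w, a, b):
    """Z(theta): sum of exp(-E) over every binary (v, h) state (brute force)."""
    m, n = len(a), len(b)
    return sum(math.exp(-energy(vs, hs, w, a, b))
               for vs in product((0, 1), repeat=m)
               for hs in product((0, 1), repeat=n))


def joint(v, h, w, a, b):
    """Joint probability P(v, h | theta) of Eq. (2)."""
    return math.exp(-energy(v, h, w, a, b)) / partition(w, a, b)


def p_h_given_v(v, w, b):
    """Eq. (3): P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(sum(v[i] * w[i][j] for i in range(len(v))) + b[j])
            for j in range(len(b))]


def p_v_given_h(h, w, a):
    """Eq. (4): P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(sum(h[j] * w[i][j] for j in range(len(h))) + a[i])
            for i in range(len(a))]


# Hypothetical toy parameters: m = 3 visible units, n = 2 hidden units.
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
v = (1, 0, 1)

print(energy(v, (1, 0), w, a, b))
print(p_h_given_v(v, w, b))
```

Because there are no connections inside a layer, the conditional in Eq. (3) can be checked against the joint distribution of Eq. (2): summing the joint over all hidden states with $h_j = 1$ and dividing by the sum over all hidden states should reproduce the sigmoid expression.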
The training objective of the RBM is to maximize the product of probabilities for a training set $V$, where $V$ is treated as a matrix and each of its rows is regarded as a visible layer $\mathbf{v}$:

$$\arg\max_{\mathbf{w}} \prod_{\mathbf{v} \in V} P(\mathbf{v}). \quad (5)$$

Or, equivalently, the objective is to maximize the expectation of the logarithm probability of $V$:

$$\arg\max_{\mathbf{w}} \mathbb{E}\left[\sum_{\mathbf{v} \in V} \log P(\mathbf{v})\right]. \quad (6)$$

The most widely used algorithm to train the RBM, that is, to optimize the weight matrix $\mathbf{w}$, is the contrastive divergence (CD) algorithm proposed by Hinton, which was first used to train the “product of experts” model [34]. The algorithm uses Gibbs sampling to update the weights in the process of gradient descent, which is similar to the backpropagation algorithm used in the training process of a feedforward neural network. For convenience, the specific steps of the sampling process in basic one-step contrastive divergence (CD-1) can be summarized as follows:

a) Randomly select a training sample $\mathbf{v}^{(1)}$ from $V$, where $\mathbf{v}^{(1)}$ is also a layer of visible units. Assume $t = 1$; we calculate the conditional probability of each unit in the hidden layer, that is, $P(h_j^{(1)} = 1 \mid \mathbf{v}^{(1)}) = \mathrm{sigmoid}\left(\sum_{i=1}^{m} v_i^{(1)} w_{ij}^{(1)} + b_j^{(1)}\right)$. Then, we sample each unit $h_j^{(1)} \in \{0,1\}$ from $P(h_j^{(1)} = 1 \mid \mathbf{v}^{(1)})$ to form $\mathbf{h}^{(1)} = (h_j^{(1)})_{n \times 1}$.

b) Based on $\mathbf{h}^{(1)} = (h_j^{(1)})_{n \times 1}$, we calculate the conditional probability of each unit in the visible layer, that is, $P(v_i^{(1)} = 1 \mid \mathbf{h}^{(1)}) = \mathrm{sigmoid}\left(\sum_{j=1}^{n} h_j^{(1)} w_{ij}^{(1)} + a_i^{(1)}\right)$. Then we sample each visible unit in $\{0,1\}$ from $P(v_i^{(1)} = 1 \mid \mathbf{h}^{(1)})$ to form $\mathbf{v}^{(2)} = (v_i^{(2)})_{m \times 1}$.

c) When $t = 2$, the visible layer $\mathbf{v}^{(2)}$ and the hidden layer $\mathbf{h}^{(2)}$ can be obtained. Then, the parameters $\theta = \{\mathbf{w}^{(2)}, \mathbf{a}^{(2)}, \mathbf{b}^{(2)}\}$ can be updated based on $\mathbf{v}^{(1)}$ and $\mathbf{v}^{(2)}$, $\mathbf{h}^{(1)}$ and $\mathbf{h}^{(2)}$; the learning rate should be decreased.

d) After repeating $t$ times, we get the final $\theta = \{\mathbf{w}^{(t)}, \mathbf{a}^{(t)}, \mathbf{b}^{(t)}\}$ based on $\mathbf{v}^{(t-1)}$ and $\mathbf{v}^{(t)}$, $\mathbf{h}^{(t-1)}$ and $\mathbf{h}^{(t)}$.

Fig. 4 shows an example of the CD-1 learning procedure for an RBM:

Fig. 4. CD-1 learning process for an RBM

Because the classic RBM is only suitable for binary-valued data, various RBM variants have been constructed to handle different situations with multiple types of data. Next, two common and widely used RBM variants are introduced as follows:

1. For the sake of managing sequential data, Sutskever and Hinton developed the Temporal RBM (TRBM) for high-dimensional data, which can be exhibited as follows [35]:

Fig. 5. The graphic structure of TRBM

2. To tackle noisy data in the process of deep learning, the Point-wise Gated RBM (pgRBM) is an RBM variant focused on image processing technology, which changes the network model and the energy function of the RBM [28, 29], as shown in Fig. 6.

Fig. 6. The graphic structure of pgRBM

2) CNN and its variants

Convolutional neural network (CNN) is a class of feedforward neural networks with deep structure that contain convolution computation [36]. It is one of the representative algorithms of deep learning [37]. The prototype of CNN was proposed by Waibel et al. in 1989 [38]; however, because of its complex and time-consuming computation, it was not widely used. Then, developed by LeCun and his co-researchers, CNNs with the gradient-based learning algorithm successfully solved the handwritten digit classification problem [39], which kicked off a wide range of research on CNN.
Distinguished from other neural networks, CNN has fewer parameters and higher computational efficiency [40, 41]. In early studies, CNN was generally applied to image classification [42], object recognition [43], gesture recognition [44] and neural style transfer [45], etc. Recently, due to the advancement and validity of the CNN algorithm, it has also been widely utilized in natural language processing [37], recommendation systems [46], remote sensing science [47] and other fields, with excellent achievements.

In general, the overall architecture of a typical CNN is composed of three types of layers: convolution layer, pooling layer and fully connected layer. The main function of the convolution layer is to extract features of the provided data with various convolution kernel sizes, and the pooling layer is designed to select features from the output of its immediately preceding layer. Finally, the features learned by the network are converted into a one-dimensional vector through the fully connected layer. For convenience, the overall architecture of CNN is exhibited in Fig. 7.

Fig. 7. The overall architecture of CNN

In the following, some specific explanations of each layer are given:

a) Convolution layer. The convolution layer, unique to the CNN algorithm, is designed to extract the features of the input data and contains multiple convolution kernels. Each element of the convolution kernel has a weight coefficient and a bias vector, which is similar to the neuron of a neural network. Besides, each neuron in the convolution layer is connected to multiple neurons of the previous layer that are close to its location. In particular, the size of this adjacent area depends on the size of the convolution kernel and is called the “receptive field” in the literature [48]. When the convolution kernel is working, it scans the input features regularly, multiplies and sums the matrix elements of the input features in the receptive field, and adds the bias value [36]:

$$\mathbf{v}^{l} = \left[\mathbf{v}^{l-1} \otimes \mathbf{w}^{l}\right] + \mathbf{b}^{l}, \quad (7)$$

where $\mathbf{v}^{l-1}$ is the input of the $l$-th convolution layer, and $\mathbf{v}^{l}$ is the output after convolution. $\mathbf{w}^{l}$ and $\mathbf{b}^{l}$ are the weight coefficient and the bias vector of the $l$-th convolution layer, respectively. In general, the convolution process can be shown below:

Fig. 8. An example of convolution process

b) Pooling layer. The pooling layer is also called the sub-sampling layer. After feature extraction at the convolution layer, the output feature map is transferred to the pooling layer for feature selection and information filtering, and the number of input and output feature maps does not change in this layer. However, due to the sub-sampling operation, the size of each output feature map is reduced. In general, the sub-sampling operation is formulated as:

$$\mathbf{v}^{l+1} = \mathrm{sub}\left(\mathbf{v}^{l}\right), \quad (8)$$

where $\mathbf{v}^{l}$ and $\mathbf{v}^{l+1}$ are the input and output of the $(l+1)$-th pooling layer, respectively, and $\mathrm{sub}$ is the sub-sampling operation. Max pooling and average pooling have both long been used in the design of CNN.

Fig. 9. Examples of max pooling and average pooling operation

The pooling layer also contains an activation function to help express complex features, which is as follows:

$$\mathbf{h}^{l+1} = f\left(\mathbf{v}^{l+1}\right), \quad (9)$$

where $\mathbf{v}^{l+1}$ is the input of the activation function in the $(l+1)$-th convolution layer, and $\mathbf{h}^{l+1}$ is the output of the $(l+1)$-th convolution layer. The activation function $f$ is usually the rectified linear unit (ReLU), the Sigmoid function or the hyperbolic tangent.

c) Fully connected layer. The function of this layer is to make a nonlinear combination of the extracted features obtained from the convolution layer and the pooling layer by a certain activation function; that is, the fully connected layer itself is not expected to extract features, but uses the existing higher-order features to complete the learning goal. Therefore, the function of the fully connected layer is expressed as:

$$\mathbf{v}^{l+1} = \mathrm{ful}\left(\mathbf{h}^{l+1}\right), \quad (10)$$

where $\mathbf{h}^{l+1}$ and $\mathbf{v}^{l+1}$ are the input and output of the $(l+1)$-th fully connected layer, respectively, and $\mathrm{ful}$ is the activation function in this layer. Usually, ReLU, tanh and softmax functions are utilized to obtain the final result of CNN.
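To make the layer operations of Eqs. (7) to (9) concrete, the following pure-Python sketch applies one convolution, one sub-sampling and one ReLU activation to small nested lists. The function names (`conv2d_valid`, `pool2d`, `relu`) and the toy inputs are hypothetical, and the sketch assumes a single channel, stride 1 and no padding. As in most deep learning libraries, the kernel is slid over the input without being flipped (cross-correlation).

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Eq. (7): multiply and sum inside each receptive field, then add the bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(out_w)]
            for r in range(out_h)]


def pool2d(feature_map, size=2, mode="max"):
    """Eq. (8): non-overlapping size x size sub-sampling ("max" or "average")."""
    if mode == "max":
        agg = max
    else:
        agg = lambda values: sum(values) / len(values)
    return [[agg([feature_map[r + i][c + j]
                  for i in range(size) for j in range(size)])
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


def relu(feature_map):
    """Eq. (9) with f chosen as the rectified linear unit."""
    return [[max(0.0, x) for x in row] for row in feature_map]


# Hypothetical toy inputs.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
grid = [[1, 3, 2, 4], [5, 7, 6, 8], [9, 2, 1, 0], [3, 4, 5, 6]]

print(conv2d_valid(image, kernel, bias=1.0))   # 2 x 2 feature map
print(pool2d(grid, mode="max"))                # 4 x 4 reduced to 2 x 2
print(pool2d(grid, mode="average"))
```

Sub-sampling halves each spatial dimension here while leaving the number of feature maps unchanged, which matches the description of the pooling layer.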
Several popular state-of-the-art CNN architectures are introduced briefly in the following, such as LeNet-5 [39] and GoogLeNet [49].

1. LeNet-5 [39]. Although LeNet came into being in the 1990s, it did not attract wide attention until LeNet-5 was proposed by LeCun et al. in 1998. The structure of LeNet-5 (see Fig. 10) consists of two convolution layers, two sub-sampling layers, two fully connected layers, and an output layer with Gaussian connection.

Fig. 10. Graphical architecture of LeNet-5 (inputs: 32×32; convolution layer 6@28×28; sub-sampling layer 6@14×14; convolution layer 16@10×10; sub-sampling layer 16@5×5; full connection layers 120 and 84; outputs: 10)

2. GoogLeNet [49]. GoogLeNet, the winner of ILSVRC 2014, was proposed by Christian Szegedy to reduce the computational complexity of CNN. It is constructed from (see Fig. 11) a previous layer, six convolution layers, one max pooling layer and a filter concatenation.

Fig. 11. Graphical architecture of GoogLeNet

3) RNN and its variants

Recurrent neural network (RNN) is a class of recursive neural networks that takes sequence data as input and recurses in the evolutionary direction of the sequence, with all its nodes (recurrent units) connected in a chain [36]. When people read books or listen to music, their understanding of the next word depends largely on their understanding of the previous words or sentences, and the traditional RBM and CNN cannot handle this situation. Starting from the Jordan network in 1986 [50] and the Elman network in 1990 [51], and developed with the backpropagation learning algorithm, RNN has occupied an important position among deep learning algorithms until now. Because of its memory ability, parameter sharing and Turing completeness, RNN has been successfully applied to Natural Language Processing (NLP), such as speech recognition [52], language modeling [53] and machine translation [54], and is also used in various kinds of time series forecasting [55]. Firstly, the overall architecture of RNN and its unfolding through time are shown in Fig. 12.

Fig. 12. Overall architecture of RNN and its unfolded through time

From Fig. 12, we know that $\mathbf{x}$ is a vector, where $\mathbf{x} = (x_i^{(t)})_{m \times T}$, denoting the $i$-th data of the input layer at the $t$-th moment; $\mathbf{h}$ is a vector, where $\mathbf{h} = (h_j^{(t)})_{n \times T}$, denoting the $j$-th data of the hidden layer at the $t$-th moment; $\mathbf{o}$ is a vector, where $\mathbf{o} = (o_k^{(t)})_{K \times T}$, denoting the $k$-th data of the output layer at the $t$-th moment. Moreover, $\mathbf{u}$ is a weight matrix from the input layer to the hidden layer, $\mathbf{p}$ is a weight matrix from the hidden layer to the output layer, and $\mathbf{w}$ is a weight matrix that weights the previous value of the hidden layer as an input for the current moment.

In what follows, we can clearly see how the hidden layer of the previous moment affects the hidden layer of the current moment in RNN.

Fig. 13. The hidden layer at the $(t-1)$-th moment affects the hidden layer at the $t$-th moment in RNN

It is apparent that the recursion is:

$$\mathbf{h}^{(t)} = f\left(\mathbf{u} \cdot \mathbf{x}^{(t)} + \mathbf{w} \cdot \mathbf{h}^{(t-1)} + \mathbf{b}\right), \quad (11)$$

$$\mathbf{o}^{(t)} = g\left(\mathbf{p} \cdot \mathbf{h}^{(t)} + \mathbf{b}'\right), \quad (12)$$

where $\mathbf{b}$ and $\mathbf{b}'$ are the bias vectors of the hidden layer and the output layer, respectively. $f$ and $g$ denote nonlinear activation functions, such as the Logistic or Tanh function.

Next, we introduce various types of modified RNN models in the following:

1. Long Short-Term Memory (LSTM) [56]. Although the RNN algorithm is able to handle time-dependent information, it cannot deal with “long-term dependencies” issues, as shown by Bengio et al. in 1994 [57]. Hence, Hochreiter and Schmidhuber [56] proposed an RNN variant called LSTM based on special units, blocks and gates, whose structure is exhibited in Fig. 14.
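The basic recursion of Eqs. (11) and (12) can be sketched in a few lines of pure Python. The helper names (`matvec`, `rnn_step`, `rnn_output`, `run_rnn`), the choice of tanh for $f$ and the logistic function for $g$, and the zero initial hidden state are all assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_step(x_t, h_prev, u, w, b, f=math.tanh):
    """Eq. (11): h(t) = f(u x(t) + w h(t-1) + b)."""
    pre = [ux + wh + bi
           for ux, wh, bi in zip(matvec(u, x_t), matvec(w, h_prev), b)]
    return [f(z) for z in pre]


def rnn_output(h_t, p, b_out, g=logistic):
    """Eq. (12): o(t) = g(p h(t) + b')."""
    return [g(z + bo) for z, bo in zip(matvec(p, h_t), b_out)]


def run_rnn(xs, u, w, p, b, b_out):
    """Unfold the recursion through time over the input sequence xs."""
    h = [0.0] * len(b)  # assumed zero initial hidden state
    outputs = []
    for x_t in xs:
        h = rnn_step(x_t, h, u, w, b)
        outputs.append(rnn_output(h, p, b_out))
    return outputs, h


# Hypothetical scalar example: one input, one hidden and one output unit.
outputs, h_last = run_rnn(xs=[[1.0], [0.0]],
                          u=[[1.0]], w=[[0.5]], p=[[1.0]],
                          b=[0.0], b_out=[0.0])
print(outputs, h_last)
```

In this toy run the second input is zero, so the final hidden state depends only on the previous hidden state through $\mathbf{w}$, which is exactly the memory effect that the recursion describes.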
7
III. FUSION OF DEEP LEARNING AND FUZZY SYSTEMS

c (t −1) c (t )
× + In this section, we review the fusion of deep learning and
tanh fuzzy systems from three aspects: several types of fuzzy
techniques used in deep learning, fusing effects and applied
(t )
f i (t ) × o(t )
fields.
× Firstly, the overall framework of fusing deep learning and
c (t )
fuzzy systems is designed after overviewing the most of
σ σ tanh σ
Fig. 14. Graphical architecture of LSTM

From Fig. 14, it is easy to see that the LSTM model removes or adds information through three gates (information transformation agencies): the input gate $i^{(t)}$, the forget gate $f^{(t)}$ and the output gate $o^{(t)}$, which can be defined as:

$f^{(t)} = \sigma\left(w_f \cdot [h^{(t-1)}, x^{(t)}] + b_f\right)$, (13)
$i^{(t)} = \sigma\left(w_i \cdot [h^{(t-1)}, x^{(t)}] + b_i\right)$, (14)
$\tilde{c}^{(t)} = \tanh\left(w_c \cdot [h^{(t-1)}, x^{(t)}] + b_c\right)$, (15)
$c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}$, (16)
$o^{(t)} = \sigma\left(w_o \cdot [h^{(t-1)}, x^{(t)}] + b_o\right)$, (17)
$h^{(t)} = o^{(t)} \odot \tanh\left(c^{(t)}\right)$, (18)

where the internal state $c^{(t)}$ in the LSTM outputs information to the external state of the hidden layer $h^{(t)}$, $c^{(t-1)}$ is the memory unit of the previous time step and $\tilde{c}^{(t)}$ is the candidate state. $\odot$ represents the element-wise product of vectors. $w_f$, $w_i$, $w_c$ and $w_o$ are the weight coefficients of the different gates. Similarly, $b_f$, $b_i$, $b_c$ and $b_o$ are the bias vectors of the different gates.

2. Bidirectional Recurrent Neural Network (Bi-RNN) [58].
Bi-RNN is a type of deep recurrent neural network with two layers, applied where the learning goal depends on the complete input sequence rather than only on the inputs up to the current time step. In speech recognition, for example, the word corresponding to the current pronunciation may be related to the words that follow, so the complete utterance needs to be entered [36]. The two chain connections of Bi-RNN, which recurse in opposite directions, are shown in Fig. 15.

Fig. 15. Graphical architecture of Bi-RNN (input layer, forward layer, backward layer and output layer, unrolled over time steps t-1, t and t+1)

existing literature, shown as Fig. 16. As we can see, in Layer 1, if the inputs are crisp values, the precise data are transformed into fuzzy data through fuzzification and lined up in a row; if not, the inputs are lined up in a row directly and passed to the next layer. Similarly, a judgement needs to be made before the second layer. If the rule learning needs feedback, deep learning algorithms with short-term memory are executed to get fuzzy results; if not, deep learning algorithms without short-term memory are adopted to get fuzzy results, which then go to the next layer. Finally, in the fuzzy output layer, the fuzzy results are transformed into crisp values through defuzzification and the crisp outputs are obtained.

Fig. 16. The framework of fusing deep learning and fuzzy systems (flowchart: Inputs; "Crisp values?", with Yes leading to Fuzzification; Layer 1: Fuzzy input layer; "Need feedback?", with Yes leading to deep learning with short-term memory and No leading to deep learning without short-term memory; Layer 2: Rule layer; Defuzzification; Layer 3: Fuzzy output layer; Outputs)

Then, the whole graphical form of fusing deep learning and fuzzy systems is constructed in Fig. 17. Some conventional deep learning models such as RBM, CNN and RNN are fused with fuzzy techniques such as the type-1 fuzzy set, the type-2 fuzzy set and the linguistic variable. They have effects on classification, prediction, natural language processing and automatic control, so they have been applied in various fields such as computer vision, natural language, the financial industry and so on.
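The three-layer flow of Fig. 16 (fuzzy input layer, rule layer, fuzzy output layer) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the surveyed works: the three triangular terms on [0, 1], the untrained random weights, the small feedforward network standing in for "deep learning without short-term memory", and the weighted-average defuzzification.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Hypothetical linguistic terms on [0, 1]; the peaks double as output-term centers.
TERMS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
CENTERS = np.array([0.0, 0.5, 1.0])

def fuzzify(x):
    """Layer 1: turn n crisp features into a row of 3n membership degrees."""
    return np.concatenate([triangular(x, *abc) for abc in TERMS.values()])

def rule_layer(mu, w, b):
    """Layer 2 stand-in: a feedforward map (no feedback, no memory)
    whose softmax output is read as normalized firing strengths."""
    z = np.tanh(mu @ w + b)
    e = np.exp(z - z.max())
    return e / e.sum()

def defuzzify(strengths):
    """Layer 3: weighted average of the output-term centers gives a crisp value."""
    return float(strengths @ CENTERS)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.5])          # crisp inputs, so fuzzification is applied
mu = fuzzify(x)
w = rng.normal(size=(mu.size, 3))      # untrained weights, for illustration only
b = np.zeros(3)
y = defuzzify(rule_layer(mu, w, b))
print(mu.round(2), round(y, 3))
```

If the rule learning needed feedback, the `rule_layer` stand-in would be replaced by a recurrent model such as an LSTM; inputs that are already fuzzy would skip `fuzzify` and be lined up in a row directly.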
Fig. 17. The graphical form of fusing deep learning and fuzzy systems (panels: deep learning models RBM, CNN and RNN; fuzzy theory, comprising the type-1 fuzzy set with triangular, trapezoid, Gaussian and sigmoid membership functions, the type-2 fuzzy set with its footprint of uncertainty, and the linguistic variable with terms Low, Medium and High; effects: classification, prediction, natural language processing and automatic control; applied fields: computer vision, natural language, financial industry, medical system, manufacturing industry, time series forecasting, smart energy systems, computer science information systems, transportation and environment)

A. Several types of fuzzy techniques used in deep learning

1) Type-1 fuzzy set and deep learning
Type-1 fuzzy sets can deal with uncertain information, redundant informative blocks and boundary point classification through membership functions, and can control over-fitting by expressing data in a fuzzy space instead of presenting them as exact numbers. In collaboration with DBN, fuzzy logic based on the type-1 fuzzy set contributes to fusing medical images, outperforming existing image fusion techniques [26]. Moreover, DBN has been applied to model type-1 fuzzy logic relationships in high order fuzzy time series forecasting [69], and sub-DBN classifiers can be aggregated through fuzzy membership weights to obtain the final classification result [70]. The type-1 fuzzy integral with RBM and the type-1 fuzzy set with DBN are also used for big data classification [71] and decision support systems [72]. As for CNN, type-1 fuzzy logic integrated with CNN and RNN supports multimodal sentiment analysis [73] and short-term load prediction [74], where it can handle partial or mixed sentiments and control over-fitting in model learning. In addition, a fuzzy Hazard-Operability process associated with CNN is used to code multi-dimensional process data, overcoming the inherent uncertainty in traditional Hazard-Operability analysis and obtaining exact quantitative risk results [75]. The fuzzy integral and CNN are also valid for facial emotion recognition [76]. In terms of combination with RNN, a revised RNN based on the type-1 fuzzy set shows strong stability, with a lower computational cost and a higher accuracy than several state-of-the-art methods [77], and an improved recurrent type-1 fuzzy inference system is proposed for time series forecasting [78].

2) Type-2 fuzzy set and deep learning
Compared with the type-1 fuzzy set or type-1 fuzzy logic system, the type-2 fuzzy set or type-2 fuzzy logic system is equipped with a higher degree of flexibility through the footprint of uncertainty in the membership functions, which is more efficient for expressing uncertain and imprecise information before the model construction [61, 62]. Firstly, from the perspective of DBN or DNN, a new interval type-2 fuzzy activation layer was proposed to improve the learning performance of DNN [79], and a novel adaptive type-2 fuzzy-based deep reinforcement learning algorithm, combining the strengths of DNN and the type-2 fuzzy set, was developed for sensor control, even acting smartly in some complicated circumstances [80]. Some deep general type-2 fuzzy systems based on RBM were established to handle unknown time-varying topology in controllers [81] and to capture unsupervised temporal features from wind speed data [82], outperforming type-1 fuzzy systems in simulation experiments. Next, in cooperation with CNN, the type-2 fuzzy set and its fuzzy entropy were efficiently used to select low-frequency sub-images during image fusion, which is superior to various state-of-the-art fuzzy logic-based fusion techniques and exhibits the great potential of middle and deep layers for performance improvement [83]. Meanwhile, the type-2 fuzzy rough set was introduced to develop a type-2 fuzzy rough convolutional neural network for facial emotion recognition [84]. In addition, as for type-2 fuzzy logic and RNN models, a new chaotic type-2 transient-fuzzy deep neuro-oscillatory network shows extraordinary strength in resolving the massive data overtraining and deadlock problems in deep learning [85], and a novel real-time recurrent interval type-2 fuzzy neural system was presented to identify uncertain bounds [86]. Last but not least, a parsimonious learning machine and its extension were proposed in both type-1 and type-2 fuzzy systems, showcasing significant improvements in terms of computational complexity and predictive accuracy [87].

3) Fuzzy linguistic and deep learning
In the real world, besides numerical values, vague information in the form of language may come from human experts. In order to handle this type of information, deep learning models for processing fuzzy linguistic input and output are required. Furthermore, the combination of fuzzy linguistic and deep learning not only enriches the variants of deep learning, but also helps natural language processing and other application fields. Similar to the feedforward neural network, the continuous bag of words model (CBOW) is a highly efficient shallow neural network algorithm for generating vector representations of a language vocabulary. Qiu et al. proposed a fuzzy information retrieval approach based on CBOW, combining some techniques of deep learning and fuzzy set theory, in which each document in the query language has a membership degree to each corresponding word [88]. Many novel methods have been developed that combine linguistic variables and RNN models [89, 67, 91-92], such as a method called Recurrent Fuzzy Neural Network for the Chinese text-to-speech system [89], a method based on fuzzy linguistic reasoning and RNN for locational marginal prices forecasting [67], fuzzy recurrent neural network methods in which the values of fuzzy time series are linguistic variables [91], and two types of hierarchical fuzzy cerebellar model articulation controller for managing linguistic variables in nonlinear systems modeling [92]. The Takagi-Sugeno-Kang fuzzy system is one of the famous fuzzy classifiers [93]. Recently, two Takagi-Sugeno-Kang fuzzy classifier variants have been
proposed for handling fuzzy linguistic information [68,115], and a novel deep fuzzy classifier was constructed from Takagi-Sugeno-Kang fuzzy subclassifiers, where the fuzzy rules are generated by random selection of fixed linguistic terms. The proposed method improved computational efficiency and classification accuracy relative to most benchmark methods [140]. Moreover, the adaptive neuro-fuzzy inference system (ANFIS) is a fuzzy system whose membership function parameters are tuned by neuro-adaptive learning methods [59]. An LSTM based deep learning prediction model and a linguistic variable based ANFIS prediction model were both presented for the missing sensor data problem in the IoT ecosystem [96], and further, multiple kernel learning with an ANFIS based deep learning method was proposed for heart disease diagnosis [97]. In addition, an intelligent straddle trading system was proposed for financial volatility trading via a set of linguistic if-then semantic fuzzy rules [98].

B. When and why fuzzy systems work with deep learning

1) Classification
Classification is a set of actions that divides things according to different characteristics to make them more regular. Using fuzzy systems jointly with deep learning for classification not only improves the accuracy of classification through deep-level and practical feature extraction models, but also settles boundary point classification and promotes knowledge representation and reasoning under uncertainty. In particular, image classification is one of the most important tasks in classification, and CNN has been employed remarkably for picture segmentation [99]. By setting regularization parameters, CNN with fuzzy logic was used to learn features for Colonoscopic Polyp detection and performed well [100]. A fuzzy deep model based on fuzzy RBM was proposed for high-dimensional data classification [141], which improved the classification accuracy when directly handling high-dimensional raw images in comparison to most established methods. Similarly, in soft tissue sarcoma classification, low frequency sub-images were selected by type-2 fuzzy entropy and the features were learned by CNN with stochastic gradient descent [83]. For other image classification work in medical detection using DNN and fuzzy systems, readers can refer to [101]. Furthermore, human actions and emotions are another interesting research topic in image classification. A data processing technique motivated by human perception was developed based on neural networks and fuzzy functions, decreasing the classification error in comparison to most of the existing methods [142]. Bendre et al. [102] coupled the Spatio-Temporal LSTM with fuzzy logic to remedy uncertainty in human action recognition and generate the weights of the classification model, Nguyen et al. [103] presented a novel multimodal emotion understanding framework based on CNN and fuzzy logic to extract high-level emotion features, and Jamal et al. [104] also used CNN and fuzzy logic to improve the accuracy of human smile detection. Moreover, Yeganejou et al. [143] proposed an architecture combining CNN and fuzzy clustering for feature extraction and derivation, which performed better than most benchmark methods in terms of classification accuracy. Guan et al. [144] established a novel deep fuzzy neural network based on fuzzy units and convolutional units to handle uncertainty and extract discriminative features, which improved the accuracy of segmentation. Lastly, some ANFIS variants, fuzzy multilayer clustering and fuzzy C-means clustering methods based on deep learning were applied to the classification field [97, 106, 107, 145, 146].

2) Prediction
Prediction is to measure the situation or figure out the result in advance, and it follows steps including feature extraction and learning model construction. Because individuals in the real world tend to have partial or mixed opinions about a target, assessment information may be imprecise and massive, so fuzzy systems combined with deep learning methods can provide a creative perspective for better inference. Firstly, in economic and financial prediction, a chaotic type-2 transient-fuzzy deep neuro-oscillatory network with retrograde signaling was applied to worldwide financial prediction, successfully resolving the problems of massive data overtraining and deadlock [85]. Four methods, including DBN and ANFIS, were effectively employed to forecast hotel room prices [109]. A new deep genetic hierarchical network methodology was proposed to predict credit scoring, comprising four types of learner: support vector machines, k-nearest neighbors, probabilistic neural networks and fuzzy systems [110]. Next, in sentiment analysis and prediction, many complex emotions hidden in English text, other languages or other modalities such as electrocardiogram signals can be recognized and inferred through CNN, RNN and fuzzy logic systems. The computational complexity of the combined model was found to be much lower than that of the traditional fuzzy classifier [73]. A new approach fusing Takagi-Sugeno-Kang fuzzy rules and deep learning models was created to improve Granger causality, which is one of the fundamental principles for calculating brain effective connectivity [111]. In the aspect of transportation forecasting, a fuzzy hybrid framework combining CNN, LSTM and a fuzzy logic system was used to assess the traffic flow in advance [112], and an improved fuzzy neural network was also developed to forecast travel speed considering periodic characteristics [113]. As for hydropower and wind speed forecasting, a grey wolf optimization method coupled with ANFIS and an interval probability distribution learning framework based on RBM and the type-2 fuzzy set were developed [114, 82]. In the research field of ecosystems, some hybrid machine-learning models combining deep learning and ANFIS were applied to forecasting missing sensor data and dust sources, and performed better than some conventional prediction methods [96, 117]. Finally, many techniques combining deep learning algorithms with fuzzy time series have been developed for time series forecasting [118, 119].

3) Natural language processing
Known as the pearl on the crown of artificial intelligence, natural language processing (NLP) explores how to deal with and use natural language, mainly consisting of the cognition, understanding and generation of natural language, in which the cognition and understanding are to let the computer turn the input language into interesting symbols and relations, and then to process the natural language generation system according to
converting computer data into natural language. Deep learning technology has successfully solved many problems in NLP, and the combination with fuzzy systems largely improves fuzzy identification and fuzzy logic judgment in morphology, complex language translation and other tasks. In text retrieval, a fuzzy inference system and RNN framework helped to develop a new web blog searching technique, which increased accuracy by 94% compared to conventional techniques [120]. A deep learning model based on fuzzy matching of topic words was constructed for exploiting information from Twitter [121]. From the perspective of sentiment analysis or text classification, Yang et al. [122] proposed an integrated algorithm based on fuzzy mathematics and the genetic algorithm, called evolutionary fuzzy deep belief networks with incremental rules, and Wang et al. [123] developed a sophisticated technique based on DBN, fuzzy clustering and information geometry; both performed better than the existing methods for sentiment analysis. In the field of text-to-SQL (that is, natural language to logical form), a fuzzy semantic deep network query model was proposed based on fuzzy decision and some deep learning algorithms, such as LSTM, Word2Vec embedding technology and the attention mechanism [124]. As for machine translation, a model based on DNN and fuzzy logic was proposed for better extracting reordering rules of sentence structure [125]. In order to make the most of the unstructured nursing notes in clinical decision support systems, a fuzzy token and deep learning-based approach was presented for healthcare analysis and disease group prediction [126].

4) Automatic control
Automatic control refers to the use of additional equipment or devices, without direct human participation, so that a working state or parameter of a machine, piece of equipment or production process automatically follows a predetermined law. Deep learning has shown great superiority and potential in feature extraction and parameter fitting, and the fusion with fuzzy systems has also resulted in better performance under uncertainty in control. Firstly, in the literature review [127], various state-of-the-art methodologies used for data-driven control were recalled in detail, including SVM, the multiple least square support vector machine, neural networks, deep learning, fuzzy logic and probabilistic latent variable models. Secondly, in the aspect of robots or intelligent systems, an intelligent fuzzy sliding mode control method based on deep learning and fuzzy logic was proposed for constructing a complex robot system under disturbances [128], a robust adaptive control scheme using RBM and a type-2 fuzzy system was designed for the fractional-order multi-agent system [81], and an online deep fuzzy learning method for the control of nonlinear systems, combining fuzzy logic and deep learning, was developed with two training stages [147]. All of them displayed better performance than existing control models. Thirdly, in the machinery field, a new adaptive learning controller constituted by an interval type-2 fuzzy system and deep learning was established for air-feed on proton exchange membrane fuel cell plants [80], a proportional-integral-derivative controller based on a fuzzy system and deep learning was developed for improving the working efficiency of hydro-turbines [130], and a logic-driven autoencoder was presented with the aid of a fuzzy logic system and a deep learning algorithm to improve the accuracy of the code [131].

C. Application fields of fuzzy techniques working with deep learning

In this section, the detailed summary of reports for the application fields of fuzzy techniques working with deep learning is exhibited in TABLE I.
TABLE I. SUMMARY OF REPORTS FOR THE APPLICATION FIELDS OF FUZZY TECHNIQUES WORKING WITH DEEP LEARNING
Application field | Article | Aim/Nature of problem | Fuzzy technique | Deep learning algorithm | Benchmark methods for comparison | Superiority of the proposed method
Computer vision [26] Medical images Type-1 fuzzy DBN Sparse unmixing by variable splitting and augmented Better edge strength,
fusing logic Lagrangian (SUnSAL), Joint sparsity model (JSM), ability to efficiently
Dual-tree complex wavelet transform (DCWT), conserve the informative
pixelwise Convolutional neural network (pCNN), blocks from input images
Probability filtering and region correction (PFRC), and overcome the issue of
Point spread functions (PSF), Texture and edge redundant informative
enhanced images for fusion (TEF), Hidden Markov block
model (HMM), Deep learning (DL)
[66] Improving accuracy Linguistic rules Deep learning FH-GBML-C, GFS-AdaBoost-C, zero-order-TSK-FC, Promising performance
without losing L2-TSK-FC, FS-FCSVM, DBN, t-HFC, and high interpretability
interpretability of LIBSVM(Linear), LIBSVM(Gaussian), HID-TSK-FC
the Takagi–Sugeno–
Kang fuzzy
classifiers
[68] Fuzzy temporal Linguistic RNN Takagi–Sugeno–Kang (TSK)-type recurrent fuzzy Higher generalization
sequence processing information, network (TRFN), Time-delay neural network (TDNN) ability, better results on
and gesture Takagi–Sugeno– both training and test
recognition Kang (TSK)- patterns
type fuzzy
systems
[76] Facial emotion Fuzzy integral CNN AlexNet, GoogLeNet, and LeNet Higher recognition
recognition accuracy
[83] Soft tissue sarcoma Type-2 fuzzy CNN Feed-forward Neural Network (FFNN), Time Delay More robust deep
classification in MR entropy Neural Network (TDNN), Nonlinear Autoregressive features, less maximum
images Neural Network (NARNN), stacked auto-encoder absolute error for hourly
(SAE), DBN, E-GA-APSO-WNN, ELM-HBSA, predictions
Persistence, Support Vector Regression (SVR)
[100] Colonoscopic Type-1 fuzzy CNN CNN Higher sensitivity and
images segmentation logic accuracy to polyp
detection
[101] Image classification Intuitionistic DNN K-nearest neighbor (KNN), Neural networks (NN), Higher prediction rate,
in bone cancer fuzzy rank Multi-perceptron layer network (MLP), Radial basis lower error rate along
identification correlation function (RBF), Back propagation neural networks with the effective
(BPN), DNN segmentation process
[102] Human action Type-1 fuzzy Spatio-Temporal Raw Skeleton, Joint Feature, CHARM, Hierarchical Higher accuracy of action
recognition logic LSTM RNN, Deep LSTM, Deep LSTM+Co-occurrence, recognition
Clips+CNN+Concatenation, Clips+CNN+Pooling,
Clips+CNN+MTLN, ST-LSTM (w/o Attention), ST-
LSTM (w/o Attention)+Trust Gate
[103] High-level emotion Type-1 fuzzy CNN CNN Better performance in
features extraction logic feature extraction
from text, audio, and
visual modalities
[104] Face smile detection Type-1 fuzzy CNN Machine learning, deep learning Higher accuracy for face
logic recognition
Natural [73] Multimodal Type-1 fuzzy CNN, RNN Transfer Deep Network (TDN), Rule3(R3), SVM, Higher classification
language sentiment analysis logic Naive Bayes (NB) accuracy
[88] Fuzzy information Type-1 fuzzy set Word embedding Continuous bag of words model (CBOW), feedforward Better performance in
retrieval and continuous bag- neural network language model (FFNNLM) information retrieval
of-word model
[89] Chinese Linguistic RNN Time-domain pitch synchronous overlap add (TD- Better performance in
text-to-speech variable PSOLA) method intelligibility and
system naturalness of the
synthetic speech
[120] Intelligent Type-2 fuzzy RNN Deep auto-encoder, DNN, and Artificial neural Enhanced classification
personalized web inference system networks accuracy
blog searching
[122] Customers’ Fuzzy DBN Transductive SVM (TSVM), Fuzzy deep belief Significant improvement
sentiment mathematics networks (FDBN) in classification accuracy
classification
[123] Sentiment Fuzzy clustering DBN TSVM, FDBN, deep belief network using information Significant improvement
classifications geometry (NGL-DBN) in classification accuracy
[124] Text-to-SQL (that is Fuzzy decision LSTM, Word2Vec sequence-to-sequence (seq2seq), SQLNet, TypeSQL, Faster convergence and
natural language to embedding IncSQL, SyntaxSQLNet better performance
logical form) technology and
attention
mechanism
Financial [67] Locational marginal Fuzzy linguistic RNN Back-propagation neural network Better accuracy
industry prices forecasting reasoning
[85] Worldwide financial Type-2 fuzzy RNN Feed-forward backpropagation networks (FFBPN), Reliable daily forecast
prediction logic support vector machine (SVM), Deep neural network- with a relatively stable
principal component analysis (DNN-PCA), Interval degree of accuracy
type-2 fuzzy-neuro network (IT2FNN), chaotic type-1
fuzzy neuro-oscillatory network (CT1FNON)
[98] Financial volatility Linguistic if- Neural-fuzzy 3-layered back-propagation trained feed-forward neural Better modeling and
trading then semantic semantic network network with 20 hidden nodes (FFNN-BP), radial basis forecasting performance
fuzzy rules function (RBF) network, cerebellar model articulation
controller (CMAC), Hybrid neural fuzzy inference
system (HyFIS), ANFIS
[125] Reordering rules of Fuzzy logic DNN Rule based Part of speech (POS) tagging, cosine Better performance in
sentence structure similarity based POS tagging case of verb, pronoun,
extracting adverb and conjunction
[126] Automated Fuzzy similarity Term weighting, Regression and hybrid methods based on Neural Better performance in
classification of word embedding network, deep learning, fuzzy logic, probabilistic latent prediction
unstructured clinical (Doc2Vec), and variable model, SVM
nursing notes coherence-based
topic modeling
(Latent Dirichlet
Allocation)
Medical system [109] Hotel room price Fuzzy system DBN, ANFIS Deep belief, ANFIS, polynomial smooth support vector Outstanding
forecasting machine (PSSVM) performances in the field
of forecasting hotel room
prices in the hospitality
and tourism sector
[110] Credit scoring Fuzzy system probabilistic neural Multicriteria Convex Quadric Programming (MCQP), Better prediction
prediction networks SVM, Homo. Classifier, Clustering-launched accuracy
classification (CLC), Weight-adjusted boosting
ensemble method (WABEM)-SVM, Vertical bagging
decision trees, Artificial immune classifier based on the
artificial immune network (AINE-based classifier),
Heterogeneous hybrid ensemble with consensus
approach combination rule, MLP, Fuzzy rule-based
classifier with multi-objective evolutionary
optimization, Gaussian mixture model, iFair-b
Manufacturing [72] Clinical decision type-1 fuzzy set Deep learning Fuzzy Inference System (FIS), DBN, DNN, Sequential Better classification
industry support systems framework Minimal Optimization (SMO), CNN, Fully Connected accuracy
Layer First (FCLF), Fuzzy-Rough Nearest Neighbor
(FRNN), Wavelet transformation (WT), interval type-2
fuzzy logic system (IT2FLS), Data-driven Fuzzy
Decision Support System (DDFCDSS), Random
Sampling (RS), Cross Validation (CV)
Time series [75] Real-time risk Fuzzy hazard- CNN Multi-layer Perceptron (MLP), CNN, Ladder network Better generalization
forecasting warning of process operability (LN), Generative adversarial network (GAN) abilities and better
industries process learning accuracy
especially for the
unbalanced samples
[97] Heart disease Fuzzy logic Multiple kernel Least Square with Support Vector Machine (LS with Higher sensitivity, higher
diagnosis learning with SVM), General Discriminant Analysis and Least Square specificity and less Mean
ANFIS Support Vector Machine (GDA with LS-SVM), Square Error
Principal Component Analysis with Adaptive Neuro-
Fuzzy Inference System (PCA with ANFIS), Latent
Dirichlet Allocation with Adaptive Neuro-Fuzzy
Inference System (LDA with ANFIS)
smart energy [69] High order fuzzy Type-1 fuzzy DBN, LSTM Multilayer perceptron (MLP), LSTM, DBN and SVM Statistical superior
management time series logic performance
systems forecasting
[74] Short-term load Type-1 fuzzy CNN Seasonal Autoregressive Integrated Moving Average Better performance
prediction logic (SARIMA), Probabilistic Weighted Fuzzy Time Series
(PWFTS), Weighted Fuzzy Time Series (WFTS),
Integrated Weighted Fuzzy Time Series (IWFTS),
LSTM
Computer [70] Intrusion detection Fuzzy DBN KNN, Multinomial Naive Bayes (MultinomialNB), Higher overall accuracy,
science system building membership Random Forest (RF), SVM, ANN, DBN recall, precision and F1-
information weight score, better performance
systems in terms of accuracy,
detection rate and false
positive rate
[80] Fuel cell air-feed Type-2 fuzzy set DNN PI, Model-Free SMC, SIT2-FPI, DDPG based SIT2-FPI Superior robustness
sensors control performance and
outstanding learning
ability
[81] Fractional-order type-2 fuzzy RBM Type-1 fuzzy system (T1FS), type-1 fuzzy systems Good performance under
multi-agent systems systems based on a structure evolving algorithm (T1FS-SE), time-varying topology,
controlling radial basis function (RBF), RBF based on a clustering unknown dynamics and
method (RBF2) and linguistic modeling approach external disturbances
[82] Wind speed Type-2 fuzzy RBM Feed-forward Neural Network (FFNN), Time Delay More robust deep features
forecasting systems Neural Network (TDNN), Nonlinear Autoregressive can be obtained
Neural Network (NARNN), Stacked Auto-Encoder
(SAE), DBN
[86] Uncertain bounds Type-2 fuzzy RNN Type-1 FNN schemes, real-time recurrent interval type- Similar identification
identification logic system 2 FNN scheme, type-reduced interval type-2 FNN performance with faster
structure real-time identification
[87] Data stream Type-1 fuzzy Self-adaptive DFNN, GDFNN, FAOSPFNN, eTS, simp_eTS, Significant improvements
regression system, type-2 neuro-fuzzy GENEFIS, PANFIS, pRVFLN in terms of computational
fuzzy system systems complexity and number
of required parameters,
while attaining
comparable and often
better predictive accuracy
[92] Nonlinear systems Linguistic RNN Recurrent fuzzy cerebellar model articulation controller Lesser training time,
modeling variables (R-FCMAC) and RNN greater convergence
speed, lesser parameters,
simpler structure
[96] Missing sensor data Linguistic ANFIS, LSTM Deep learning, GP-ANFIS, SC-ANFIS, FCM-ANFIS, Higher prediction
prediction in IoT variable GKR, SVR accuracy and more
stability of missing sensor
data
[114] Hydropower Type-2 fuzzy set RBM, ANFIS ANFIS Ability of forecasting the
generation hydropower generation
prediction satisfactorily
Transportation [71] Big data Type-1 fuzzy DBN MR-RBM Better performance on
classification integral, type-1 testing accuracy
fuzzy set
[111] Non-linear effective Takagi–Sugeno– DBN Linear Granger Superiority in detection of
connectivity Kang fuzzy rules effective connectivity
estimation
environment [112] Short-term traffic Fuzzy logic CNN, LSTM Sub-models improved KNN algorithm (ST-KNN), Higher accuracy and
flow prediction system improved deep learning algorithm (ST-DL), common robustness
time series prediction algorithm (ARIMA), LSTM,
FNNs, Gated recurrent unit (GRU), and static fuzzy
hybrid method (DCMix)
[113] Traffic speed Gaussian fuzzy Evolving fuzzy Back Propagation Neural Network (BPNN), Nonlinear Strong learning ability
prediction membership neural network Autoregressive with Exogenous Inputs Neural Network
considering periodic function (NARXNN), Evolving Fuzzy Neural Network (EFNN),
characteristic Support Vector Machine Radial Basis Function (SVM-
RBF), ARIMA, VAR
[117] Dust source Fuzzy set ANFIS ANFIS-BA, ANFIS-CA, ANFIS-DE Better prediction
modeling and accuracy
prediction
From TABLE I, the fusion of deep learning and fuzzy systems has been applied in various fields,
including but not limited to computer vision, natural language, the financial industry, medical
systems, the manufacturing industry, time series forecasting, smart energy management systems,
computer science information systems, transportation and the environment. Moreover, many types of
deep learning models work with fundamental fuzzy techniques, which not only broadens the
application fields of fuzzy systems but also improves the learning capability of deep learning
algorithms.

IV. DISCUSSIONS AND FUTURE CHALLENGES

A. Fusion technology of deep learning and fuzzy systems

Because fuzzy methods can handle uncertain and complex information within a deep learning
framework, the fusion of deep learning and fuzzy systems has been reported to be superior to
fuzzy systems alone for clustering algorithms, providing a more robust result, especially in
image recognition [144]. Some comments on the fusion technology of fuzzy systems and deep
learning are as follows:

1) Among the fuzzy techniques considered in deep learning models, type-1 fuzzy sets, type-2
fuzzy sets and linguistic variables are the most widely used. Among them, type-1 and type-2
fuzzy systems, with specific membership functions, have been widely integrated with RBM, DBN,
CNN, and RNN algorithms [69, 71, 73, 77, 79, 81, 83, 85]. A type-2 fuzzy system is, to some
extent, an advanced version of a type-1 fuzzy system that uses two dimensions to depict
uncertain information, but it also brings more complex computation and more parameters into the
deep learning models. Linguistic variables are frequently used to handle NLP problems with RNN
or LSTM rather than with RBM, DBN, and CNN [67, 96]. The three fuzzy techniques differ in their
membership degrees: the membership degree is a crisp value in a type-1 fuzzy set, whereas a
type-2 fuzzy set expresses more uncertainty through multiple values in its membership function,
and a linguistic variable is designed to elaborate quantitative information for fuzzy systems.
In fact, there are several extensions of type-1 fuzzy sets, type-2 fuzzy sets and linguistic
variables that have not yet been generally used in deep learning models. For example,
interval-valued fuzzy sets [90], intuitionistic fuzzy sets [94], hesitant fuzzy sets [95], and
fuzzy multisets [108] are four accepted variants of the type-1 fuzzy set, since they permit
users to depict some uncertainty in the membership functions of the elements. In terms of type-2
fuzzy sets, interval type-2 fuzzy sets [24, 62], general type-2 fuzzy sets [116], and
constrained type-2 fuzzy sets [148] have proved appropriate for various situations. The same
holds for fuzzy linguistic information: hesitant fuzzy linguistic term sets [63, 64, 129, 133],
probabilistic linguistic term sets [132, 134], double hierarchy hesitant fuzzy linguistic sets
[105], and nested probabilistic-numerical linguistic term sets [135] are common forms. Because
fuzzy systems are popular in fusion with deep learning, these extensions may serve as further
research directions at the level of fuzzy techniques in fusing deep learning and fuzzy systems.

2) At present, there are two main ways to truly embrace the fuzzy approach in artificial neural
networks. One is to move in the direction of hybridizing neural networks and fuzzy models by
adding layers for data fuzzification and for learning fuzzy rules [149]; the resulting models
are called fuzzy deep neural networks [150] or neuro-fuzzy networks [151]. These models can help
to characterize poor-quality data that are incomplete, uncertain, or vague, and further improve
performance. Although the efficacy of the fusion models of deep learning and fuzzy systems
depends on datasets and architectures, the strategy of passing on the outputs of networks such
as CNN as features seems to be generally applicable [143]. The other way is to establish fuzzy
neural networks with complicated replication and a specific usage, which have been disregarded
by the wider community, as described in [152, 153]. Each module follows a feature map and aims
to build a connection between the features and the result. Under such a situation, models
perform well and can achieve a robust and accurate result in general, especially for predictions
based on data dependencies, such as time series, but the limitations are also obvious in image
processing [154]. The fuzzy module can be seamlessly integrated into the deep learning network
and satisfies end-to-end learning. Therefore, integration with advanced deep learning algorithms
is a key technique. Currently, unlike the conventional deep learning algorithms, some advanced
methodologies, such as seq2seq [137], the attention mechanism [138], and bidirectional encoder
representation from transformers (BERT) [139], have almost no integration with fuzzy systems.
Even BERT, the most popular algorithm used in NLP, has not yet been extended to the fuzzy
context: a search in WOS with the topic “‘fuzzy’, ‘bidirectional encoder representation from
transformers’ and ‘natural language processing’” returned zero publications. Moreover, ANFIS is
a creative method that combines the advantages of fuzzy systems and neural networks. It would be
beneficial if more novel methods were created by combining advanced fuzzy techniques with deep
learning algorithms. In any case, how to combine fuzzy techniques with deep models remains an
open subject for scholars to explore further.

B. Application scenarios of fusing deep learning and fuzzy systems

Fuzzy systems have played an important role in constructing deep learning frameworks for tasks
including but not limited to classification, prediction, NLP and auto-control. The major
application scenarios of fusing deep learning and fuzzy systems can be divided into the
following categories:

(1) Image processing. There are various kinds of uncertainties in images, and many useful
characteristics or rules are fuzzy in nature according to human experience, such as the shape,
color, and pixels like patches [144]. The fusion of deep learning and fuzzy systems can cope
with this uncertain and complex information to provide a more robust segmentation result. The
objective of the fuzzy learning model is to learn the intrinsic, complex rules between the
feature map and the corresponding segmentation result. In particular, fuzzy classifiers have
been popular for image classification, such as CNN with fuzzy c-means clustering [155] and the
Takagi-Sugeno-Kang (TSK) fuzzy classifier [140]. In addition, the fusion of deep learning and
fuzzy systems has been applied to
various types of image processing. For example, person reidentification, which involves complex
pedestrian images, is suitable for the fusion methods because the features in the original space
are highly nonlinear and inseparable. Zhang et al. [145] proposed an unsupervised person
reidentification method that learns a suitable feature space and produces fuzzy labels for
unlabeled pedestrian images to reduce the risk of overfitting. There is also great room for
growth and development: i) Characteristic description and selection are important tasks in
learning model construction. Although some fuzzy techniques have been utilized to depict the
uncertainty in image segmentation, the increase in computing time and in space complexity has
become an urgent challenge, due to the introduction of more parameters in the learning process.
Thus, novel fusion algorithms with high computational efficiency need to be developed to handle
complex issues in practice; ii) Ambiguity is one of the important features of image processing.
The same image may carry different meanings in different contexts, so that its label and
interpretation are ambiguous. Tackling this fuzziness and ambiguity is an important challenge.

(2) Decision making. It is an important subject in computer science, management science, and
operational research. Deep learning can be used in various stages of decision making, including
data fusion [95], evaluation [96] and optimization [97]. Some thoughts on combining deep
learning and fuzzy methods in decision making are as follows: i) Membership functions and fuzzy
rules are key issues in fuzzy systems. In previous studies of fuzzy systems, they were usually
selected according to the experience or intuition of decision makers. With the help of deep
learning, however, suitable membership functions or fuzzy rules can be determined by pre-built
models. Furthermore, the whole fuzzy system may be constructed through learning in advance; ii)
Determining the significance of attributes or schemes is one of the most important tasks in
fuzzy decision making. Previously, the weights of attributes or schemes were calculated by
mathematical programming methods or optimization algorithms [136], and were determined from the
limited data of a single decision-making case. The weights are altered if the decision-making
case changes, but most of the time they are not affected by the decision-making context. Thus,
it is necessary to obtain general weights for attributes or schemes through deep learning
models; iii) Research on large-scale group decision making and large-scale group consensus
reaching processes has attracted the interest of more and more scholars. In particular, with the
rapid development of blockchain, the study of large-scale group consensus reaching methodology
has become a popular research direction. In a blockchain, people are willing to provide
qualitative opinions in the form of linguistic variables or quantitative assessment information
as crisp values. Thus, it is important to construct a self-adaptive decision-making model to
decide the consensus reaching mechanism in blockchain, and deep learning algorithms can make a
difference.

(3) Other fields. Because the fusion of deep learning and fuzzy systems has successfully
improved the accuracy of prediction and classification, or handled the fuzzy characteristics in
learning model construction, it has also been applied to computer science [70], NLP [89],
medical systems [109], smart energy management systems [69], the machinery industry [72] and so
on. The main impacts of utilizing fuzzy systems in deep learning are as follows: i) effectively
describing the fuzzy characteristics in common applications of deep learning algorithms. Fuzzy
states are common, and precise mathematics, probability theory and stochastic theory cannot
resolve the fuzzy problems in the real world [21]; ii) assisting humans to express uncertain or
imprecise information in the form of linguistic variables. With the emergence of more and more
complex problems in practice, qualitative information becomes ubiquitous and indispensable in
the decision-making process. Besides, linguistic information is an effective way to represent
qualitative information because of the limitation of thinking or artificial characteristics;
iii) being more interpretable than traditional deep learning models when solving certain
problems, because the membership function assigns a fuzzy linguistic term label to the feature
points obtained with the deep learning algorithm [144], which can be further interpreted clearly
by heat maps in guided back-propagation [143].

C. Limitations of the current fusion

Although deep learning is a powerful and useful technique for establishing prediction models, it
is usually criticized as a “black box” because it is difficult to explain. It is also criticized
when training data are limited or the algorithm is complex, and when poor-quality data or
unsuitable parameter tuning leads to biased and unsatisfactory prediction results (please see
[156, 157] for more criticisms regarding deep learning). On the other hand, fuzzy systems often
face criticism from outside the community; please see [65] for more details. Although the fusion
of deep learning and fuzzy systems indeed improves the prediction or classification accuracy
compared with traditional deep learning models, some potential drawbacks still exist:

1) Membership functions may lead to uncertain computational complexity. Under the same
environment, some membership functions enable fuzzy learning with less running time, while
others lead to more complicated algorithms, such as LSFC [144]. The structure becomes more
complex and the computation takes more time compared with conventional deep learning networks
[158-160]. Deep learning models already have complex architectures, and fuzzy systems bring more
parameters to be tuned, such as membership functions and fuzzy rules, into the fusion models.
Thus, the computational cost is considerably high due to the complicated structure and learning
style. For example, ANFIS, one of the typical models in the fusion of deep learning and fuzzy
systems, also faces such limitations [160, 161].

2) In most current models, interpretability has not been improved much by adding fuzzy ideas to
the original deep networks [143, 160], and model interpretability and accuracy are often not
guaranteed simultaneously [162]. As we can see, interpretability is one of the most important
drawbacks of traditional deep learning models. Although the work combining fuzzy systems and CNN
improves the interpretability of the previous CNN to some degree through a case study, the
results cannot logically be generalized through more benchmark datasets or simulation experiments
[143]. Moreover, “interpretability” is not rigorously defined beyond its intuitive use [143].

3) It is hard to determine so many units in the fusion models of deep learning and fuzzy systems
[159]. In both traditional deep learning networks and the models combined with fuzzy systems,
the number of units in each hidden layer and the number of hidden layers in the structure need
to be determined. Furthermore, many parameters in the deep architecture must be set, such as the
Gibbs step, learning rate, and batch learning, which affect the performance of the methods [159].

4) When the input lies in the interval [0, 1], it is hard to work with fuzzified data, and it is
not straightforward simply to apply fuzzification in artificial neural networks [142]. Although
fuzzy methods have been widely applied in the field of pattern recognition [140, 142], a
preprocessing stage that combines traditional neural networks with fuzzy information is still
rare. There is similar work in [140, 142], but it is tightly restricted to a specific dataset
(such as MNIST [155]) and cannot be used for other datasets. Therefore, fuzzy-inspired data
preprocessing may need a long time to develop, and it remains an open research issue.

V. CONCLUSIONS

Deep learning presents excellent learning ability in constructing learning models and greatly
promotes the development of artificial intelligence. Meanwhile, fuzzy systems play an important
role in depicting the uncertain and imprecise concepts that widely exist in the real world. More
specifically, research on the fusion of deep learning and fuzzy systems combines the advantages
of both deep learning algorithms and fuzzy systems. The resulting models not only achieve
learning or prediction with high accuracy, but also handle circumstances of uncertainty and
vagueness.

In order to present an up-to-date analysis, we have gone through the recent contributions on the
fusion of deep learning and fuzzy systems. Firstly, deep learning has been introduced to the
fuzzy community from two aspects: statistical results of relevant publications on the fusion of
deep learning and fuzzy systems, and conventional deep learning algorithms. We have found that
this field is an emerging research direction that has received increasing attention. Then, the
fusion of deep learning and fuzzy systems has been comprehensively reviewed. We have constructed
the fusing framework and a graphic overview, and analyzed the current use of several types of
fuzzy techniques in deep learning, some reasons why fuzzy techniques are used in deep learning,
and the application fields of the fusion. After analyzing the recent contributions, we conclude
that fuzzy systems have a great effect on deep learning frameworks in classification,
prediction, NLP, auto-control, etc., and that the fusion has been applied in different fields,
including but not limited to computer science, natural language, medical systems, smart energy
management systems and the machinery industry. In addition, we have discussed the fusion
technology of deep learning and fuzzy systems and the application scenarios of the fusion
methods in detail, and provided some future challenges for each. We have also found that,
although the fusion of deep learning and fuzzy systems indeed improves the prediction or
classification accuracy compared with traditional deep learning models, some limitations still
exist with respect to computational complexity, interpretability, parameter setting and
restrictions on the original data. This discussion not only provides comprehensive knowledge of
the fusion of deep learning and fuzzy systems, but may also point to some future research
directions.

To sum up, the fusion of deep learning and fuzzy systems offers a broad research space in both
theory and application. With the development of science and technology, deep learning with fuzzy
systems will play a more important role in artificial intelligence.

REFERENCES

[1] G. E. Hinton, R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, pp. 504-507, 2006.
[2] M. I. Jordan, T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, pp. 255-260, 2015.
[3] Y. LeCun, Y. Bengio, G. E. Hinton, “Deep learning,” Nature, vol. 521, pp. 436-444, 2015.
[4] Z. R. Wang, J. Wang, Y. R. Wang, “An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition,” Neurocomputing, vol. 310, pp. 213-222, 2018.
[5] E. Arisoy, A. Sethy, B. Ramabhadran, et al., “Bidirectional recurrent neural network language models for automatic speech recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5421-5425, 2015.
[6] S. A. Bello, S. S. Yu, C. Wang, “Review: deep learning on 3D point clouds,” Remote Sensing, doi:10.3390/rs12111729, 2020.
[7] M. Roopaei, P. Rad, M. Jamshidi, “Deep learning control for complex and large scale cloud systems,” Intelligent Automation and Soft Computing, vol. 23, pp. 389-391, 2017.
[8] Y. Y. Xu, Z. Liu, Y. J. Li, et al., “Feature data processing: Making medical data fit deep neural networks,” Future Generation Computer Systems, vol. 109, pp. 149-157, 2020.
[9] J. Huang, J. Y. Chai, S. Cho, “Deep learning in finance and banking: A literature review and classification,” Frontiers of Business Research in China, doi:10.1186/s11782-020-00082-6, 2020.
[10] W. S. McCulloch, W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biology, vol. 5, pp. 115-133, 1943.
[11] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, doi:10.1037/h0042519, 1958.
[12] D. E. Rumelhart, G. E. Hinton, R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533-536, 1986.
[13] V. N. Vapnik, “The nature of statistical learning theory,” Springer, doi:10.1007/978-1-4757-3264-1, 1995.
[14] R. R. Salakhutdinov, G. E. Hinton, “An efficient learning procedure for deep boltzmann machines,” Neural Computation, vol. 24, pp. 1967-2006, 2012.
[15] Y. LeCun, Y. Bengio, “Convolutional networks for images, speech, and time series,” Handbook of Brain Theory & Neural Networks, 1995.
[16] S. E. Hihi, M. Q. Hc-J, Y. Bengio, “Hierarchical recurrent neural networks for Long-Term dependencies,” Advances in Neural Information Processing Systems, vol. 8, pp. 493-499, 1995.
[17] Y. Wang, M. Huang, L. Zhao, et al., “Attention-based LSTM for aspect-level sentiment classification,” Proceedings of conference on empirical methods in natural language processing, pp. 606-615, 2016.
[18] Q. Q. Yuan, Q. Zhang, J. Li, et al., “Hyperspectral image denoising employing a spatial-spectral deep residual convolutional neural network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, pp. 1205-1218, 2019.
[19] M. H. Zhao, S. S. Zhong, X. Y. Fu, et al., “Deep residual shrinkage networks for fault diagnosis,” IEEE Transactions on Industrial Informatics, vol. 16, pp. 4681-4690, 2020.
[20] B. Russell, “Principles of social reconstruction,” Allen & Unwin, doi:10.4324/9780203714393, 1920.
[21] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338-353, 1965.
[22] P. N. Marinos, “Fuzzy logic and its application to switching systems,” IEEE Transactions on Computers, vol. 18, pp. 343-348, 1969.
[23] L. A. Zadeh, “Fuzzy logic and its application to approximate reasoning,” Information Processing, vol. 74, pp. 591-594, 1974.
[24] J. M. Mendel, R. I. John, F. Liu, “Interval type-2 fuzzy logic systems made simple,” IEEE Transactions on Fuzzy Systems, vol. 14, pp. 808-821, 2006.
[25] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning-I,” Information Sciences, vol. 8, pp. 199-249, 1975.
[26] M. Kaur, D. Singh, “Fusion of medical images using deep belief networks,” Cluster Computing, doi:10.1007/s10586-019-02999-x, 2019.
[27] S. Feng, C. L. P. Chen, C. Y. Zhang, “A fuzzy deep model based on fuzzy restricted boltzmann machines for high-dimensional data classification,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1344-1355, 2020.
[28] Y. Tang, R. R. Salakhutdinov, G. E. Hinton, “Robust boltzmann machines for recognition and denoising,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2264-2271, 2012.
[29] B. K. Sohn, G. Zhou, C. Lee, et al., “Learning and selecting features jointly with point-wise gated Boltzmann machines,” International Conference on Machine Learning, pp. 217-225, 2013.
[30] D. Krefl, S. Carrazza, B. Haghighat, et al., “Riemann-theta boltzmann machine,” Neurocomputing, vol. 388, pp. 334-345, 2020.
[31] J. F. Deng, V. P. K. Miriyala, Z. F. Zhu, et al., “Voltage-controlled spintronic stochastic neuron for restricted boltzmann machine with weight sparsity,” IEEE Electron Device Letters, vol. 41, pp. 1102-1105, 2020.
[32] T. Y. Pan, J. L. Chen, J. Pan, “A deep learning network via shunt-wound restricted boltzmann machines using raw data for fault detection,” IEEE Transactions on Instrumentation and Measurement, vol. 69, pp. 4852-4862, 2020.
[33] N. Zhang, S. Ding, J. Zhang, et al., “An overview on restricted boltzmann machines,” Neurocomputing, doi:10.1016/j.neucom.2017.09.065, 2017.
[34] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, pp. 1711-1800, 2000.
[35] I. Sutskever, G. E. Hinton, “Learning multilevel distributed representations for high-dimensional sequences,” International Conference on Artificial Intelligence and Statistics, pp. 544-551, 2007.
[36] I. Goodfellow, Y. Bengio, A. Courville, “Deep learning (Vol. 1),” Cambridge: MIT Press, pp. 326-366, 2016.
[37] J. X. Gu, Z. H. Wang, J. Kuen, et al., “Recent advances in convolutional neural networks,” Pattern Recognition, doi:10.1016/j.patcog.2017.10.013, 2015.
[38] A. Waibel, T. Hanazawa, G. E. Hinton, et al., “Phoneme recognition using time-delay neural networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 328-339, 1989.
[39] Y. LeCun, L. Bottou, Y. Bengio, et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.
[40] R. Y. Xin, J. Zhang, Y. T. Shao, “Complex network classification with convolutional neural network,” Tsinghua Science and Technology, vol. 25, pp. 447-457, 2020.
[41] H. Fujita, D. Cimr, “Computer aided detection for fibrillations and flutters using deep convolutional neural network,” Information Sciences, vol. 486, pp. 231-239, 2019.
[42] A. Abdalla, H. Y. Cen, L. Wan, et al., “Fine-tuning convolutional neural network with transfer learning for semantic segmentation of ground-level oilseed rape images in a field with high weed pressure,” Computers and Electronics in Agriculture, doi:10.1016/j.compag.2019.105091, 2019.
[43] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, et al., “STDP-based spiking deep convolutional neural networks for object recognition,” Neural Networks, vol. 99, pp. 56-67, 2018.
[44] Y. F. Zhang, L. Shi, Y. Wu, et al., “Gesture recognition based on deep deformable 3D convolutional neural networks,” Pattern Recognition, doi:10.1016/j.patcog.2020.107416, 2020.
[45] A. Selim, M. Elgharib, L. Doyle, “Painting style transfer for head portraits using convolutional neural networks,” ACM Transactions on Graphics, vol. 35, pp. 129, 2016.
[46] M. G. Zhang, Z. Y. Yang, “GACOforRec: Session-based graph convolutional neural networks recommendation model,” IEEE Access, vol. 7, pp. 114077-114085, 2019.
[47] W. X. Zhang, C. Witharana, A. K. Liljedahl, et al., “Deep convolutional neural networks for automated characterization of arctic ice-wedge polygons in very high spatial resolution aerial imagery,” Remote Sensing, vol. 10, pp. 1487, 2018.
[48] M. Lan, Y. P. Zhang, L. F. Zhang, et al., “Global context based automatic road segmentation via dilated convolutional neural network,” Information Sciences, vol. 535, pp. 156-171, 2020.
[49] C. Szegedy, W. Liu, Y. Q. Jia, et al., “Going Deeper with Convolutions,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[50] M. I. Jordan, “Serial order: A parallel distributed processing approach (Tech. Rep. No. 8604),” San Diego: University of California, Institute for Cognitive Science, 1986.
[51] J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, pp. 179-211, 1990.
[52] J. Wang, “Speech recognition in English cultural promotion via recurrent neural network,” Personal and Ubiquitous Computing, vol. 24, pp. 237-246, 2020.
[53] H. M. Noaman, S. S. Sarhan, M. A. A. Rashwan, “Enhancing recurrent neural network-based language models by word tokenization,” Human-Centric Computing and Information Sciences, doi:10.1186/s13673-018-0133-x, 2018.
[54] S. K. Mahata, D. Das, S. Bandyopadhyay, “MTIL2017: Machine translation using recurrent neural network on statistical machine translation,” Journal of Intelligent Systems, vol. 28, pp. 447-453, 2019.
[55] W. Waheeb, R. Ghazali, “A novel error-output recurrent neural network model for time series forecasting,” Neural Computing & Applications, vol. 32, pp. 9621-9647, 2020.
[56] S. Hochreiter, J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735-1780, 1997.
[57] Y. Bengio, P. Simard, P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, pp. 157-166, 1994.
[58] M. Schuster, K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, pp. 2673-2681, 1997.
[59] J. S. R. Jang, “ANFIS: Adaptive-Network-Based Fuzzy Inference System,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, pp. 665-685, 1993.
[60] N. L. Roux, Y. Bengio, “Representational power of restricted boltzmann machines and deep belief networks,” Neural Computation, vol. 20, pp. 1631-1649, 2008.
[61] I. Qasim, M. Alam, S. Khan, et al., “A comprehensive review of type-2 fuzzy Ontology,” Artificial Intelligence Review, vol. 53, pp. 1187-1206, 2020.
[62] C. Oscar, A. A. Leticia, R. C. Juan, et al., “A comparative study of type-1 fuzzy logic systems, interval type-2 fuzzy logic systems and generalized type-2 fuzzy logic systems in control problems,” Information Sciences, vol. 354, pp. 257-274, 2016.
[63] Y. H. Zheng, Z. S. Xu, Y. He, et al., “Severity assessment of chronic obstructive pulmonary disease based on hesitant fuzzy linguistic COPRAS method,” Applied Soft Computing, vol. 69, pp. 60-71, 2018.
[64] Y. H. Zheng, Y. He, Z. S. Xu, et al., “Assessment for hierarchical medical policy proposals using hesitant fuzzy linguistic analytic network process,” Knowledge-Based Systems, vol. 161, pp. 254-267, 2018.
[65] V. Cherkassky, “Fuzzy Inference Systems: A Critical Review,” doi:10.1007/978-3-642-58930-0_10, 1998.
[66] Y. P. Zhang, H. Ishibuchi, S. T. Wang, “Deep Takagi–Sugeno–Kang fuzzy classifier with shared linguistic fuzzy rules,” IEEE Transactions on Fuzzy Systems, vol. 26, pp. 1535-1549, 2018.
[67] Y. Y. Hong, C. F. Lee, “A neuro-fuzzy price forecasting approach in deregulated electricity markets,” Electric Power Systems Research, vol. 73, pp. 151-157, 2005.
[68] C. F. Juang, K. C. Ku, “A recurrent fuzzy network for fuzzy temporal sequence processing and gesture recognition,” IEEE Transactions on Systems, Man, Cybernetics-Cybernetics, vol. 35, pp. 646-658, 2005.
[69] S. Panigrahi, H. S. Behera, “A study on leading machine learning techniques for high order fuzzy time series forecasting,” Engineering Applications of Artificial Intelligence, vol. 87, pp. 103245, 2020.
[70] Y. Q. Yang, K. F. Zheng, C. H. Wu, et al., “Building an effective intrusion detection system using the modified density peak clustering algorithm and deep belief networks,” Applied Sciences, doi:10.3390/app9020238, 2019.
[71] J. Zhai, X. Zhou, S. Zhang, et al., “Ensemble RBM-based classifier using fuzzy integral for big data classification,” International Journal of
Machine Learning and Cybernetics, doi:10.1007/s13042-019-00960-3, [94] K. T. Atanassov, “Intuitionistic fuzzy sets,” Fuzzy Sets & Systems, vol.
2019. 20, pp. 87-96, 1986.
[72] H. Julio, P. Guevara, N. Bernal, et al., “Framework for the development [95] V. Torra, “Hesitant fuzzy sets,” International Journal of Intelligent
of data-driven mamdani-type fuzzy clinical decision support systems,” Systems, vol. 25, pp. 529-539, 2010.
Diagnostics, vol. 9, pp. 52, 2019. [96] M. Guzel, I. Kok, D. Akay, et al., “ANFIS and deep learning based
[73] I. Chaturvedi, R. Satapathy, S. Cavallari, et al., “Fuzzy commonsense missing sensor data prediction in IoT,” Concurrency and Computation
reasoning for multimodal sentiment analysis,” Pattern Recognition Practice and Experience, doi:10.1002/cpe.5400, 2019.
Letters, vol. 125, pp. 264-270, 2019. [97] G. Manogaran, R. Varatharajan, M. K. Priyan, “Hybrid recommendation
[74] H. J. Sadaei, D. L. E. S. P. Candid, F. G. Guimaraes, et al., “Short-term system for heart disease diagnosis based on multiple kernel learning
load forecasting by using a combined method of convolutional neural with adaptive neuro-fuzzy inference system,” Multimedia Tools &
networks and fuzzy time series,” Energy, vol. 175, pp. 365-377, 2019. Applications, vol. 77, pp. 4379-4399, 2018.
[75] R. He, X. Li, G. Chen, et al., “Generative adversarial network-based [98] W. L. Tung, C. Quek, “Financial volatility trading using a self-
semi-supervised learning for real-time risk warning of process organising neural-fuzzy semantic network and option straddle-based
industries,” Expert Systems with Applications, approach,” Expert Systems with Applications, vol. 38, pp. 4668-4688,
doi:10.1016/j.eswa.2020.113244, 2020. 2011.
[76] C. J. Lin, C. H. Lin, S. H. Wang, et al., “Multiple convolutional neural [99] A. Puchkov, M. Dli, M. Kireyenkova, “Fuzzy classification on the base
networks fusion using improved fuzzy integral for facial emotion of convolutional neural networks,” Advances in Artificial Systems for
recognition,” Applied Sciences, doi:10.3390/app9132593, 2019. Medicine and Education II, doi:10.1007/978-3-030-12082-5_35, 2020.
[77] F. Colace, V. Loia, S. Tomasiello, “Revising recurrent neural networks from a granular perspective,” Applied Soft Computing, doi:10.1016/j.asoc.2019.105535, 2019.
[78] N. Tak, A. A. Evren, M. Tez, “Recurrent type-1 fuzzy functions approach for time series forecasting,” Applied Intelligence, vol. 48, pp. 68-77, 2018.
[79] A. Beke, T. Kumbasar, “Learning with Type-2 Fuzzy activation functions to improve the performance of Deep Neural Networks,” Engineering Applications of Artificial Intelligence, vol. 85, pp. 372-38, 2019.
[80] M. Gheisarnejad, J. Boudjadar, M. H. Khooban, “A new adaptive type-II fuzzy-based deep reinforcement learning control: fuel cell air-feed sensors control,” IEEE Sensors Journal, vol. 19, pp. 9081-9089, 2019.
[81] A. Mohammadzadeh, O. Kaynak, “A novel general type-2 fuzzy controller for fractional-order multi-agent systems under unknown time-varying topology,” Journal of the Franklin Institute, vol. 356, pp. 5151-5171, 2019.
[82] M. Khodayar, J. H. Wang, M. Manthouri, “Interval deep generative neural network for wind speed forecasting,” IEEE Transactions on Smart Grid, vol. 10, pp. 3974-3989, 2019.
[83] H. Hermessi, O. Mourali, E. Zagrouba, “Deep feature learning for soft tissue sarcoma classification in MR images via transfer learning,” Expert Systems with Applications, vol. 120, pp. 116-127, 2019.
[84] X. J. Chen, D. Li, P. X. Wang, “A deep convolutional neural network with fuzzy rough sets for FER,” IEEE Access, vol. 8, pp. 2772-2779, 2020.
[85] R. S. T. Lee, “Chaotic type-2 transient-fuzzy deep neuro-oscillatory network (CT2TFDNN) for worldwide financial prediction,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 731-745, 2020.
[86] T. C. Lin, C. H. Kuo, V. E. Balas, “Real-time recurrent interval type-2 fuzzy-neural system identification using uncertainty bounds,” IEEE World Congress on Computational Intelligence, doi:10.1109/FUZZ-IEEE.2012.6251356, 2012.
[87] M. M. Ferdaus, M. Pratama, S. G. Anavatti, et al., “PALM: An incremental construction of hyperplanes for data stream regression,” IEEE Transactions on Fuzzy Systems, vol. 27, pp. 2115-2129, 2019.
[88] D. Qiu, H. H. Jiang, S. Q. Chen, “Fuzzy information retrieval based on continuous bag-of-words model,” Symmetry, doi:10.3390/sym12020225, 2020.
[89] C. T. Lin, R. C. Wu, J. Y. Chang, et al., “A novel prosodic-information synthesizer based on recurrent fuzzy neural network for the Chinese TTS system,” IEEE Transactions on Systems, Man, and Cybernetics-Cybernetics, vol. 34, pp. 309-324, 2004.
[90] M. B. Gorzalczany, “A method of inference in approximate reasoning based on interval valued fuzzy sets,” Fuzzy Sets and Systems, vol. 21, pp. 1-17, 1987.
[91] R. A. Aliev, B. Fazlollahi, R. R. Aliev, et al., “Linguistic time series forecasting using fuzzy recurrent neural network,” Soft Computing, vol. 12, pp. 183-190, 2008.
[92] Y. Wen, F. O. Rodriguez, M. A. Moreno-Armendariz, “Hierarchical fuzzy CMAC for nonlinear systems modeling,” IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1302-1314, 2008.
[93] D. R. Wu, Y. Ye, Y. H. Tan, “Optimize TSK fuzzy systems for regression problems: minibatch gradient descent with regularization, dropRule, and adaBound (MBGD-RDA),” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1003-1010, 2020.
[100] M. Hwang, D. Wang, W. C. Jiang, et al., “An adaptive regularization approach to Colonoscopic Polyp detection using a cascaded structure of encoder-decoders,” International Journal of Fuzzy Systems, vol. 21, pp. 2091-2101, 2019.
[101] T. Altameem, “Fuzzy rank correlation-based segmentation method and deep neural network for bone cancer identification,” Neural Computing and Applications, vol. 32, pp. 805-815, 2020.
[102] N. Bendre, N. Ebadi, J. Prevost, “Human action performance using deep neuro-fuzzy recurrent attention model,” IEEE Access, vol. 8, pp. 57749-57761, 2020.
[103] T. L. Nguyen, S. Kavuri, M. Lee, “A multimodal convolutional neuro-fuzzy network for emotion understanding of movie clips,” Neural Networks, vol. 118, pp. 208-219, 2019.
[104] K. M. Jamal, S. A. Diwan, Z. A. Abdulhussein, “Smile detection using convolutional neural network and fuzzy logic,” Journal of Information Science and Engineering, vol. 36, pp. 269-278, 2020.
[105] X. J. Gou, Z. S. Xu, H. C. Liao, et al., “Double hierarchy hesitant fuzzy linguistic MULTIMOORA method for evaluating the implementation status of haze controlling measures,” Information Fusion, vol. 38, pp. 22-34, 2017.
[106] X. X. Zhang, J. Z. Zhou, W. R. Chen, “Data-driven fault diagnosis for PEMFC systems of hybrid tram based on deep learning,” International Journal of Hydrogen Energy, vol. 45, pp. 13483-13495, 2020.
[107] M. M. Yamunadevi, S. S. Ranjani, “Efficient segmentation of the lung carcinoma by adaptive fuzzy–GLCM (AF-GLCM) with deep learning based classification,” Journal of Ambient Intelligence and Humanized Computing, doi:10.1007/s12652-020-01874-7, 2020.
[108] A. Ramer, C. C. Wang, “Fuzzy multisets,” Fuzzy Systems Symposium, doi:10.1109/AFSS.1996.583653, 2002.
[109] M. A. Shehhi, A. Karathanasopoulos, “Forecasting hotel room prices in selected GCC cities using deep learning,” Journal of Hospitality and Tourism Management, vol. 42, pp. 40-50, 2020.
[110] P. Pławiak, M. Abdar, J. Pławiak, et al., “DGHNL: A new deep genetic hierarchical network of learners for prediction of credit scoring,” Information Sciences, vol. 516, pp. 401-418, 2020.
[111] M. Rahimi, R. Davoodi, M. H. Moradi, “Deep fuzzy model for non-linear effective connectivity estimation in the intuition of consciousness correlates,” Biomedical Signal Processing and Control, doi:10.1016/j.bspc.2019.101732, 2020.
[112] D. Ma, B. Sheng, S. Jin, et al., “Fuzzy hybrid framework with dynamic weights for short-term traffic flow prediction by mining spatio-temporal correlations,” IET Intelligent Transport Systems, doi:10.1049/iet-its.2019.0287, 2019.
[113] J. J. Tang, F. Liu, Y. J. Zou, et al., “An improved fuzzy neural network for traffic speed prediction considering periodic characteristic,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, pp. 2340-2350, 2017.
[114] M. Dehghani, H. Riahi-Madvar, F. Hooshyaripor, et al., “Prediction of hydropower generation using grey wolf optimization adaptive neuro-fuzzy inference system,” Energies, doi:10.3390/en12020289, 2019.
[115] Y. P. Zhang, H. Ishibuchi, S. T. Wang, “Deep Takagi-Sugeno-Kang fuzzy classifier with shared linguistic fuzzy rules,” IEEE Transactions on Fuzzy Systems, vol. 26, pp. 1535-1549, 2020.
[116] J. M. Mendel, “General Type-2 fuzzy logic systems made simple: a tutorial,” IEEE Transactions on Fuzzy Systems, vol. 22, pp. 1162-1182, 2014.
[117] O. Rahmati, M. Panahi, S. S. Ghiasi, et al., “Hybridized neural fuzzy ensembles for dust source modeling and prediction,” Atmospheric Environment, doi:10.1016/j.atmosenv.2020.117320, 2020.
[118] H. J. Sadaei, P. C. D. L. E. Silva, F. G. Guimaraes, et al., “Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series,” Energy, vol. 175, pp. 365-377, 2019.
[119] R. Asadi, A. C. Regan, “A spatio-temporal decomposition based deep neural network for time series forecasting,” Applied Soft Computing, doi:10.1016/j.asoc.2019.105963, 2020.
[120] H. Khatter, A. K. Ahlawat, “An intelligent personalized web blog searching technique using fuzzy-based feedback recurrent neural network,” Soft Computing, vol. 24, pp. 9321-9333, 2020.
[121] C. D. Zhang, S. Z. Lu, C. M. Zhang, “A novel hot topic detection framework with integration of image and short text information from twitter,” IEEE Access, vol. 7, pp. 9225-9231, 2019.
[122] P. Yang, D. Wang, X. L. Du, et al., “Evolutionary DBN for the customers' sentiment classification with incremental rules,” Industrial Conference on Data Mining, Springer, doi:10.1007/978-3-319-95786-9_9, 2018.
[123] M. Wang, Z. H. Ning, T. Li, et al., “Information geometry enhanced fuzzy deep belief networks for sentiment classification,” International Journal of Machine Learning and Cybernetics, doi:10.1007/s13042-018-00920-3, 2019.
[124] Q. Li, L. Li, Q. Li, et al., “A comprehensive exploration on spider with fuzzy decision text-to-SQL model,” IEEE Transactions on Industrial Informatics, vol. 16, pp. 2542-2550, 2020.
[125] S. P. Singh, A. Kumar, H. Darbari, et al., “Extract reordering rules of sentence structure using neuro-fuzzy machine learning system,” International Conference on Smart Technologies for Smart Nation, doi:10.1109/SmartTechCon.2017.8358364, IEEE, 2017.
[126] T. Gangavarapu, A. Jayasimha, G. S. Krishnan, et al., “TAGS: Towards automated classification of unstructured clinical nursing notes,” Natural Language Processing and Information Systems, vol. 11608, pp. 195-207, 2019.
[127] X. Zhu, K. U. Rehman, B. Wang, et al., “Modern soft-sensing modeling methods for fermentation processes,” Sensors, doi:10.3390/s20061771, 2020.
[128] K. M. Zheng, Y. M. Hu, B. Wu, “Intelligent fuzzy sliding mode control for complex robot system with disturbances,” European Journal of Control, vol. 51, pp. 95-109, 2020.
[129] R. M. Rodriguez, L. Martinez, F. Herrera, “Hesitant fuzzy linguistic term sets for decision making,” IEEE Transactions on Fuzzy Systems, vol. 20, pp. 109-119, 2012.
[130] J. W. Perng, Y. C. Kuo, K. C. Lu, “Design of the PID controller for hydro-turbines based on optimization algorithms,” International Journal of Control, Automation and Systems, vol. 18, pp. 1-13, 2020.
[131] R. Al-Hmouz, W. Pedrycz, A. Balamash, “Logic-driven autoencoders,” Knowledge-Based Systems, doi:10.1016/j.knosys.2019.104874, 2019.
[132] Q. Pang, H. Wang, Z. S. Xu, “Probabilistic linguistic term sets in multi-attribute group decision making,” Information Sciences, vol. 369, pp. 128-143, 2016.
[133] Y. H. Zheng, Z. S. Xu, Y. He, et al., “A hesitant fuzzy linguistic bi-objective clustering method for large-scale group decision-making,” Expert Systems with Applications, doi:10.1016/j.eswa.2020.114355, 2020.
[134] Y. H. Zheng, Z. S. Xu, Y. He, “A novel weight-derived method and its application in graduate students' physical health assessment,” International Journal of Intelligent Systems, doi:10.1002/int.22297, 2020.
[135] X. X. Wang, Z. S. Xu, X. J. Gou, “Nested probabilistic-numerical linguistic term sets in two-stage multi-attribute group decision making,” Applied Intelligence, vol. 49, pp. 2582-2602, 2019.
[136] Z. S. Xu, “Priority weight intervals derived from intuitionistic multiplicative preference relations,” IEEE Transactions on Fuzzy Systems, vol. 21, pp. 642-654, 2013.
[137] I. Sutskever, O. Vinyals, Q. V. Le, “Sequence to sequence learning with neural networks,” Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2, pp. 3104-3112, 2014.
[138] D. Bahdanau, K. Cho, Y. Bengio, “Neural machine translation by jointly learning to align and translate,” Proceedings of International Conference on Learning Representations (ICLR), arXiv:1409.0473, 2015.
[139] J. Devlin, M. W. Chang, K. Lee, et al., “BERT: pre-training of deep bidirectional transformers for language understanding,” Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 41, pp. 71-86, 2019.
[140] S. Gu, F. L. Chung, S. Wang, “A novel deep fuzzy classifier by stacking adversarial interpretable TSK fuzzy sub-classifiers with smooth gradient information,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1369-1382, 2020.
[141] S. Feng, C. L. P. Chen, C. Y. Zhang, “A fuzzy deep model based on fuzzy restricted Boltzmann machines for high-dimensional data classification,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1344-1355, 2020.
[142] P. Hurtik, V. Molek, J. Hula, “Data preprocessing technique for neural networks based on image represented by a fuzzy function,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1195-1204, 2020.
[143] M. Yeganejou, S. Dick, J. Miller, “Interpretable deep convolutional fuzzy classifier,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1407-1419, 2020.
[144] C. Guan, S. Wang, W. C. Liew, “Lip image segmentation based on a fuzzy convolutional neural network,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1242-1251, 2020.
[145] Z. Zhang, M. Huang, S. Liu, et al., “Fuzzy multilayer clustering and fuzzy label regularization for unsupervised person reidentification,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1356-1368, 2020.
[146] L. Chen, W. Su, M. Wu, et al., “A fuzzy deep neural network with sparse autoencoder for emotional intention understanding in human-robot interaction,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1252-1264, 2020.
[147] A. Sarabakha, E. Kayacan, “Online deep fuzzy learning for control of nonlinear systems using expert knowledge,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1492-1503, 2020.
[148] J. M. Garibaldi, S. Guadarrama, “Constrained type-2 fuzzy sets,” IEEE Symposium on Advances in Type-2 Fuzzy Logic Systems, pp. 66-73, 2011.
[149] Y. Deng, Z. Ren, Y. Kong, et al., “A hierarchical fused fuzzy deep neural network for data classification,” IEEE Transactions on Fuzzy Systems, vol. 25, pp. 1006-1012, 2017.
[150] S. Rajurkar, N. K. Verma, “Developing deep fuzzy network with Takagi Sugeno fuzzy inference system,” IEEE International Conference on Fuzzy Systems, pp. 1-6, 2017.
[151] L. Fortuna, G. Rizzotto, M. Lavorgna, et al., “Neuro-fuzzy networks,” in Proceeding of Soft Computing, pp. 169-178, 2001.
[152] L. Y. Shyu, Y. H. Wu, W. Hu, “Using wavelet transform and fuzzy neural network for VPC detection from the holter ECG,” IEEE Transactions on Biomedical Engineering, vol. 51, pp. 1269-1273, 2004.
[153] P. Liu, “Representation of digital image by fuzzy neural network,” Fuzzy Sets and Systems, vol. 130, pp. 109-123, 2002.
[154] J. S. R. Jang, C. T. Sun, E. Mizutani, “Neuro-fuzzy and soft computing: A computational approach to learning and machine intelligence,” IEEE Transactions on Automatic Control, vol. 42, pp. 1482-1484, 1997.
[155] M. Yeganejou, S. Dick, “Classification via deep fuzzy c-means clustering,” IEEE International Conference on Fuzzy Systems, pp. 1-6, 2018.
[156] K. Chayakrit, K. W. Johnson, R. S. Rosenson, et al., “Deep learning for cardiovascular medicine: a practical primer,” European Heart Journal, vol. 00, pp. 1-15, 2019.
[157] P. P. Groumpos, “Artificial Intelligence: Issues, Challenges, Opportunities and Threats,” Creativity in Intelligent Technologies and Data Science, 2019.
[158] S. Zhang, Z. Sun, M. Wang, et al., “Deep fuzzy echo state networks for machinery fault diagnosis,” IEEE Transactions on Fuzzy Systems, vol. 28, pp. 1205-1218, 2020.
[159] C. L. P. Chen, C. Y. Zhang, L. Chen, et al., “Fuzzy restricted Boltzmann machine for the enhancement of deep learning,” IEEE Transactions on Fuzzy Systems, vol. 23, pp. 2163-2173, 2015.
[160] M. N. M. Salleh, N. Talpur, K. Hussain, “Adaptive neuro-fuzzy inference system: overview, strengths, limitations, and solutions,” Second International Conference on Data Mining and Big Data, 2017.
[161] O. Ciftcioglu, M. S. Bittermann, I. S. Sariyildiz, “A Neural Fuzzy System for Soft Computing,” NAFIPS Meeting of the North American Fuzzy Information Processing Society, pp. 489-495, 2007.
[162] S. Jean, K. Cho, R. Memisevic, et al., “On using very large target vocabulary for neural machine translation,” in Proceedings of 53rd Annual Meeting Association for Computational Linguistics 7th International Joint Conference on Natural Language Processing, pp. 1-10, 2015.

