This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TFUZZ.2015.2406889, IEEE Transactions on Fuzzy Systems
Abstract—In recent years, deep learning has carved out a research wave in machine learning. With outstanding performance, more and more applications of deep learning in pattern recognition, image recognition, speech recognition and video processing have been developed. The restricted Boltzmann machine (RBM) plays an important role in current deep learning techniques, as most existing deep networks are based on or related to it. For the regular RBM, the relationships between the visible units and hidden units are restricted to be constants, which certainly downgrades the representation capability of the RBM. To avoid this flaw and enhance the deep learning capability, the fuzzy restricted Boltzmann machine (FRBM) and its learning algorithm are proposed in this paper, in which the parameters governing the model are replaced by fuzzy numbers. In this way, the original RBM becomes a special case of the FRBM, when there is no fuzziness in the FRBM model. In the process of learning the FRBM, the fuzzy free energy function is defuzzified before the probability is defined. The experimental results based on the Bar-And-Stripe benchmark inpainting and MNIST handwritten digits classification problems show that the representation capability of the FRBM model is significantly better than that of the traditional RBM. Additionally, the FRBM also reveals better robustness than the RBM when the training data are contaminated by noises.

Keywords—Deep learning, Fuzzy restricted Boltzmann machine, RBM, Fuzzy deep networks, Image inpainting, Image classification.

I. INTRODUCTION

RESTRICTED Boltzmann machine (RBM), as illustrated in Fig. 1, is a stochastic graph model that can learn a joint probability distribution over its n visible units x = [x_1, ..., x_n] and m hidden feature units h = [h_1, ..., h_m]. The model is governed by parameters θ denoting connection weights and biases between cross-layer units. The RBM [1] was invented in 1986; however, it received a new birth when G. Hinton and his partners recently proposed several deep networks and corresponding fast learning algorithms, including the deep autoencoder [2], deep belief networks [3], and the deep Boltzmann machine [4]. The RBM and its deep architectures have a large number of applications [5], such as dimensionality reduction [6], classification [7], collaborative filtering [8], feature learning [9], and topic modelling [10]. A detailed knowledge of the RBM and its deep architectures can be found in [11] [12].

Fig. 1: Restricted Boltzmann Machine

Most researchers in the deep learning field focus on deep network design and corresponding fast learning algorithms. Some research works try to improve the deep learning technique from the model representation. For example, Gaussian restricted Boltzmann machines (GRBMs) with Gaussian linear units are proposed to learn representations from real-valued data [13]; the GRBM improves the RBM by replacing binary-valued visible units with Gaussian ones. Deep networks based on the GRBM have also been developed in recent years, such as the Gaussian-Bernoulli deep Boltzmann machine [14] [15]. Conditional versions of RBMs have also been developed for collaborative filtering [16], and temporal RBMs and recurrent RBMs are proposed to model high-dimensional sequence data such as motion capture data [17] and video sequences [18].

For regular RBMs and their existing variants, the parameters that represent the relationships between units in the visible and hidden layers are restricted to be constants. This structure design leads to several problems. Firstly, it constrains the representation capability, since the variables often interact in uncertain ways. Secondly, the RBMs are not very robust when the training data samples are corrupted by noises. Thirdly, the parameter learning process of the RBMs is confined to a relatively small space, which is contrary to the merits of deep learning. All of these constraints are reflected in the fitness of the target joint probability distribution. To overcome these disadvantages and reduce the inaccuracy and distortion induced by linearization of the relationship between cross-layer units, this paper proposes the fuzzy restricted Boltzmann machine (FRBM) and a corresponding learning algorithm, where the parameters governing the model are all fuzzy numbers. After the vagueness in the relationship between the visible units and their high-level features is introduced into the model, the fuzzy RBMs demonstrate competitive performances in both data representation capability and robustness to cope with noisy data.

C. L. Philip Chen, Chun-Yang Zhang and Long Chen are with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China.
Min Gan is with the School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China.
This research is sponsored by the Macau Science and Technology Development Fund under Grant No. 008/2010/A1, UM Multiyear Research Grants, and the National Natural Science Foundation of China under Grant No. 61203106.
1063-6706 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
III. FUZZY RESTRICTED BOLTZMANN MACHINE AND ITS LEARNING ALGORITHM

A. Fuzzy Restricted Boltzmann Machine

The proposed novel fuzzy restricted Boltzmann machine (FRBM) is illustrated in Fig. 3, in which the connection weights and biases are fuzzy parameters denoted by θ̃. There are several merits of the FRBM model. The first is that the FRBM has a much better representation capability than the regular RBM in modelling probabilities over visible and hidden units; specifically, the RBM is only a special case of the FRBM when no fuzziness exists in the FRBM model. The second is that the robustness of the FRBM model surpasses that of the RBM model: the FRBM shows more robustness when it comes to fitting the model with noisy data. All these advantages spring from the fuzzy extension of the relationships between cross-layer variables, and inherit the characteristics of fuzzy models.

Fig. 3: Fuzzy Restricted Boltzmann Machine

Since the FRBM is an extension of the RBM model, the discussion starts from a brief introduction of the RBM model. An RBM is an energy-based probabilistic model, in which the probability distribution is defined through an energy function. Its probability is defined as

P(x, h, θ) = e^{-E(x,h,θ)} / Z,   (7)

Z = Σ_{x̃} Σ_{h̃} e^{-E(x̃,h̃,θ)},   (8)

where E(x, h, θ) is the energy function, θ are the parameters governing the model, Z is the normalizing factor called the partition function, and x̃ and h̃ are two vector variables representing visible and hidden units that are used to traverse and summarize all the configurations of units on the graph. The energy function for the RBM is defined by

E(x, h, θ) = -b^T x - c^T h - h^T W x,   (9)

where b_j and c_i are the offsets, W_{ij} is the connection weight between the j-th visible unit and the i-th hidden unit, and θ = {b, c, W}.

To establish the fuzzy restricted Boltzmann machine, it is necessary to first define the fuzzy energy function for the model. The fuzzy energy function can be extended from Eqn. (9) in accordance with the extension principle as follows:

Ẽ(x, h, θ̃) = -b̃^T x - c̃^T h - h^T W̃ x,   (10)

where Ẽ(x, h, θ̃) is the fuzzified energy function and θ̃ = {b̃, c̃, W̃} are fuzzy parameters. Correspondingly, the fuzzy free energy F̃, which marginalizes the hidden units and maps Eqn. (7) into a simpler one, is deduced as

F̃(x, θ̃) = -log Σ_{h̃} e^{-Ẽ(x,h̃,θ̃)},   (11)

where F̃ is extended from the crisp free energy function F:

F(x, θ) = -log Σ_{h̃} e^{-E(x,h̃,θ)}.   (12)

If the fuzzy free energy function is directly employed to define the probability, it leads to a fuzzy probability [27], and the optimization in the learning process turns into a fuzzy maximum likelihood problem. However, this kind of problem is quite intractable, because the fuzzy objective function is nonlinear and the membership function is difficult to compute, since the computation of its alpha-cuts becomes an NP-hard problem [29]. Therefore, it is necessary to transform the problem into a regular maximum likelihood problem by defuzzifying the fuzzy free energy function (11). The centre of area (centroid) method [30] is employed to defuzzify the fuzzy free energy function F̃(x). Then the likelihood function can be defined by the defuzzified fuzzy free energy function. Consequently, the fuzzy optimization problem becomes a real-valued problem, and conventional optimization approaches can be directly applied to find the optimal solutions. The centroid of the fuzzy number F̃(x) is denoted by F_c(x), and has the following form

F_c(x, θ̃) = ∫ θ F(x, θ) dθ / ∫ F(x, θ) dθ,   θ ∈ θ̃.   (13)

Naturally, after the fuzzy free energy is defuzzified, the probability can be defined as

P_c(x, θ̃) = e^{-F_c(x,θ̃)} / Z,   Z = Σ_{x̃} e^{-F_c(x̃,θ̃)}.   (14)

In the fuzzy RBM model, the objective function is the negative log-likelihood, which is given by

L(θ̃, D) = -Σ_{x∈D} log P_c(x, θ̃),   (15)

where D is the training dataset.

The learning problem is to find optimal solutions for the parameters θ̃ that minimize the objective function L(θ̃, D), i.e.,

min_{θ̃} L(θ̃, D).   (16)

In the following subsection, the detailed procedure to address the dual problem of maximum likelihood by utilizing the stochastic gradient descent method will be investigated.
F̃(x, θ̃)[α] = [F(x, θ_R), F(x, θ_L)].

2) Approximation of Centroid: An approximation of the centroid of the fuzzy free energy function is provided by discretizing the fuzzy output F̃(x, θ̃) and calculating M of its α-cuts. By combining these α-cuts, the approximate centroid is given by

F_c(x, θ̃) ≈ Σ_{i=1}^{M} α_i [F(x, θ_i^L) + F(x, θ_i^R)] / (2 Σ_{i=1}^{M} α_i),   (18)

where α = (α_1, ..., α_M), α ∈ [0, 1]^M, and θ̃[α_i] = [θ_i^L, θ_i^R].

As all the α-cuts are bounded intervals [31], for convenience, we only consider a special case in which the fuzzy numbers are degraded into intervals (α = 1). Let θ̃ = [θ_L, θ_R]. According to Eqn. (18), the free energy function can be written as

F_c(x, θ̃) ≈ (1/2)[F(x, θ_L) + F(x, θ_R)].   (19)

After defuzzifying the fuzzy free energy, the probability defined on it returns to Eqn. (14). Then the problem is transformed into a regular optimization problem, which can be solved by the gradient-descent-based stochastic maximum likelihood method.

3) Gradient Related Optimization: The gradients of the negative log-probability with respect to θ_L then have a particular form (see Appendix A):

-∂ log P_c(x, θ̃)/∂θ_L = ∂F_c(x, θ_L)/∂θ_L - E_P[∂F_c(x, θ_L)/∂θ_L],   (20)

where E_P(·) denotes the expectation over the target probability distribution P. Similarly, the gradients of Eqn. (16) with respect to θ_R can be obtained.

In the commonly studied case of the RBM with binary units, where x_j, h_i ∈ {0, 1}, the probabilistic version of the usual neuron activation functions can be derived from the conditional probabilities. They have the following affine forms

P(h_i = 1|x) = e^{c_i + W_i x} / (1 + e^{c_i + W_i x}) = σ(c_i + W_i x),   (25)

P(x_j = 1|h) = e^{b_j + W_{.j}^T h} / (1 + e^{b_j + W_{.j}^T h}) = σ(b_j + W_{.j}^T h),   (26)

where W_i and W_{.j} denote the i-th row and j-th column of W, respectively, and σ is the logistic sigmoid function

σ(x) = e^x / (e^x + 1) = 1 / (1 + e^{-x}).

For the fuzzy RBM, the conditional probabilities also become fuzzy and can be extended from Eqn. (25) and (26) as

P̃(h_i = 1|x) = σ(c̃_i + W̃_i x),
P̃(x_j = 1|h) = σ(b̃_j + W̃_{.j}^T h).

After defuzzifying the objective function, the Markov chain Monte Carlo (MCMC) method can be employed to sample from these conditional distributions. This process is crucial to approximate the objective function, due to the difficulty of calculating the expectations. For the predefined α, the α-cuts of the fuzzy conditional probabilities are consistent with the target probability distribution. They are given as follows

P̃(h_i = 1|x)[α] = [P_L(h_i = 1|x), P_R(h_i = 1|x)],
P̃(x_j = 1|h)[α] = [P_L(x_j = 1|h), P_R(x_j = 1|h)],

where P_L(h_i|x), P_R(h_i|x), P_L(x_j|h) and P_R(x_j|h) are the conditional probabilities with respect to the lower and upper bounds of the parameters governing the model.
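The centroid approximation of Eqn. (18), and its reduction to the interval average of Eqn. (19) when a single α = 1 cut is used, can be sketched as follows. This is an illustrative fragment assuming NumPy; the sample α-cut values are hypothetical:

```python
import numpy as np

def centroid_free_energy(alphas, F_L, F_R):
    """Approximate centroid of a fuzzy free energy, Eqn (18):
    F_c ~= sum_i alpha_i [F(x, theta_i^L) + F(x, theta_i^R)] / (2 sum_i alpha_i)."""
    alphas = np.asarray(alphas, float)
    F_L = np.asarray(F_L, float)   # free energies at the lower ends of the alpha-cuts
    F_R = np.asarray(F_R, float)   # free energies at the upper ends of the alpha-cuts
    return (alphas * (F_L + F_R)).sum() / (2.0 * alphas.sum())

# With a single alpha = 1 cut, the formula degrades to the interval case of
# Eqn (19): F_c = (F(x, theta_L) + F(x, theta_R)) / 2.
assert np.isclose(centroid_free_energy([1.0], [-3.0], [-1.0]), -2.0)

# M = 3 alpha-cuts of a (hypothetical) triangular fuzzy free energy
alphas = [0.25, 0.5, 1.0]
F_L = [-3.0, -2.5, -2.0]
F_R = [-1.0, -1.5, -2.0]
Fc = centroid_free_energy(alphas, F_L, F_R)   # weighted average of cut midpoints
```

For a symmetric fuzzy number, as in this hypothetical example, every α-cut has the same midpoint, so the centroid coincides with it.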
They have the following forms

P_L(h_i|x) = P(h_i|x; θ_L) = σ(c_i^L + W_i^L x),
P_R(h_i|x) = P(h_i|x; θ_R) = σ(c_i^R + W_i^R x),   (27)

and

P_L(x_j|h) = P(x_j|h; θ_L) = σ(b_j^L + W_{.j}^L h),
P_R(x_j|h) = P(x_j|h; θ_R) = σ(b_j^R + W_{.j}^R h).   (28)

There are six kinds of parameters for visible unit j and hidden unit i in the FRBM model, i.e., the lower bounds of the connection weight W_{ij}^L, visible bias b_j^L, and hidden bias c_i^L, and their upper bounds W_{ij}^R, b_j^R, and c_i^R. For simplicity, the energy function is denoted by a sum of terms each associated with only one hidden unit,

E(x, h) = -µ(x) + Σ_{i=1}^{m} φ_i(x, h_i),   (29)

where

µ(x) = b^T x,   φ_i(x, h_i) = -h_i(c_i + W_i x).   (30)

Then, the free energy of the RBM with binary units can be further simplified explicitly to (see Appendix B)

F(x) = -b^T x - Σ_{i=1}^{m} log(1 + e^{c_i + W_i x}).   (31)

The gradients of the free energy can be calculated explicitly when the RBM has binary units and the energy function has the form of Eqn. (29).

No matter whether the fuzzy parameters in the fuzzy RBM are symmetric or not, their α-cuts are always intervals with lower and upper bounds that need to be learned in the training phase. According to Eqn. (19), (20) and (21), all the gradients (see details in Appendix C) can be obtained. Associated with Eqn. (20) and (31), it is easy to get the following negative log-likelihood gradients for the fuzzy RBM with binary units

-∂ log P_c(x)/∂W_{ij}^L = E_P[P_L(h_i|x) · x_j^L] - P_L(h_i|x) · x_j^L,
-∂ log P_c(x)/∂c_i^L = E_P[P_L(h_i|x)] - P_L(h_i|x),
-∂ log P_c(x)/∂b_j^L = E_P[P_L(x_j|h)] - x_j^L,
-∂ log P_c(x)/∂W_{ij}^R = E_P[P_R(h_i|x) · x_j^R] - P_R(h_i|x) · x_j^R,
-∂ log P_c(x)/∂c_i^R = E_P[P_R(h_i|x)] - P_R(h_i|x),
-∂ log P_c(x)/∂b_j^R = E_P[P_R(x_j|h)] - x_j^R,

where P_c(x) is the centroid probability defined through Eqn. (14), combined with Eqn. (18) and (19).

5) Contrastive Divergence: To approximate the expectations, samples of P_L(x) and P_R(x) can be obtained by running two Markov chains to convergence, using Gibbs sampling as the transition operator. One chain is for the lower bounds, and the other is for the upper bounds. Gibbs sampling of the joint distribution over N random variables S = (S_1, ..., S_N) is done through a sequence of N sampling sub-steps of the form S_i ∼ P(S_i|S_{-i}), where S_{-i} contains the N - 1 other random variables in S excluding S_i.

For both the RBM and the fuzzy RBM model, S consists of the set of visible and hidden units. However, since they are conditionally independent, one can perform block Gibbs sampling. In a step of the Markov chain, visible units are sampled given the hidden units, and hidden units are sampled given the visible units, as illustrated in Fig. 4:

h^{(k+1)} ∼ P(h^{(k+1)}|x^{(k)}),   (32)
x^{(k+1)} ∼ P(x^{(k+1)}|h^{(k+1)}).   (33)

As k → ∞, the samples (x^{(k)}, h^{(k)}) are guaranteed to be accurate samples of P(x, h). However, Gibbs sampling is very time-consuming, as k needs to be large enough. An efficient learning approach, called contrastive divergence (CD) learning [32], was proposed in 2002. It shows that the learning process still performs very well even though only a small number of steps are run in the Markov chain [33]. CD learning uses two tricks to speed up the sampling process: the first is to initialize the Markov chain with a training example, and the second is to obtain samples after only k steps of Gibbs sampling. This is referred to as the CD-k learning algorithm. A lot of experiments show that the performance of the approximation is still very good even when k = 1.

Fig. 4: k-step Gibbs Sampling

Contrastive divergence is a different function from the Kullback-Leibler (KL) divergence for measuring the difference between the approximated distribution and the true distribution. Why is CD learning efficient? Because CD learning provides an approximation of the log-likelihood gradient that has been found to be a successful update rule for training probabilistic models. A variational justification provides a theoretical proof of the convergence of the learning process [11] [34]. Conducting CD-1 learning by using Eqn. (25) and (26), namely x = x^{(0)} → h^{(0)} → x^{(1)} → h^{(1)}, it is easy to get the updating rules for all the parameters (θ_L and θ_R) in the FRBM model. The pseudo-code is demonstrated in Algorithm 1.
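A minimal sketch of one CD-1 step for the interval (α = 1) case, with one chain per parameter bound. This is an illustration assuming NumPy; the helper name cd1_update, the toy sizes, and the initialization are assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n, m, lr = 6, 4, 0.05            # toy visible/hidden sizes; illustrative learning rate

# Interval parameters theta = [theta_L, theta_R]: one set per bound
params = {side: {"W": rng.normal(scale=0.1, size=(m, n)),
                 "b": np.zeros(n), "c": np.zeros(m)} for side in ("L", "R")}

def cd1_update(x, p):
    """One CD-1 step on one bound's chain: x(0) -> h(0) -> x(1) -> h(1)."""
    W, b, c = p["W"], p["b"], p["c"]
    ph0 = sigmoid(c + W @ x)                  # P(h_i = 1 | x), Eqns (25)/(27)-(28)
    h0 = (rng.random(m) < ph0).astype(float)  # h(0) ~ P(h | x), Eqn (32)
    px1 = sigmoid(b + W.T @ h0)               # P(x_j = 1 | h), Eqn (26)
    x1 = (rng.random(n) < px1).astype(float)  # x(1) ~ P(x | h), Eqn (33)
    ph1 = sigmoid(c + W @ x1)
    # Negative log-likelihood gradients approximated by the k = 1 sample:
    # data-dependent term minus model (chain) term
    p["W"] += lr * (np.outer(ph0, x) - np.outer(ph1, x1))
    p["b"] += lr * (x - x1)
    p["c"] += lr * (ph0 - ph1)

x = rng.integers(0, 2, size=n).astype(float)  # one binary training example
for side in ("L", "R"):                       # the two Markov chains: lower and upper bounds
    cd1_update(x, params[side])
```

Looping this update over mini-batches, and keeping the two bound chains separate throughout, is the essence of the FRBM training scheme described above.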
One can recover the benchmark by conducting Gibbs sampling over P(x_1, ..., x_l | x_{l+1}, ..., x_n). The unknown observations are initially assigned to be 0.5.

In the following experiments, conducted to evaluate the FRBM on different energy functions and different types of fuzzy numbers for the RBM's parameters, the energy function defined as Eqn. (10) is regarded as the type-1 energy function (EF-1), and the energy of a joint configuration ignoring the bias terms is regarded as the type-2 energy function (EF-2). The fuzzy RBM with symmetric triangular fuzzy numbers is denoted as FRBM-STFN. Accordingly, the fuzzy RBM with Gaussian membership functions is denoted as FRBM-GMF. In the experiments, the FRBM and RBM models have 25 visible units and 50 hidden units. For the two models, the mini-batch learning scheme (100 samples per batch) and the CD-1 learning approach are employed to speed up the training processes. The biases b and c are initialized with zero values, and the connection weights are generated randomly from a standard Gaussian distribution in the initial stages. The learning rates for updating W, b and c are set to 0.05. The MSEs (the summation of the reconstruction errors over all the 60,000 training samples) produced in the two learning phases (50 hidden units) are shown in Fig. 6. It is easy to see that the FRBM model generates smaller reconstruction errors than the RBM model, which means the FRBM can learn the probability distribution more accurately than the traditional RBM model.

One can conclude that the FRBM has stronger representation capability than the RBM. This advantage springs from the fuzzy relationship between the visible and hidden units. Intuitively, the parameter searching space of the RBM can be regarded as a sub-space of the FRBM's searching space when the relationships are extended.

Fig. 7: BAS Benchmark Inpainting Based on RBM
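The experimental setup above (25 visible and 50 hidden units, zero biases, standard Gaussian weights, learning rate 0.05, mini-batches of 100) can be mirrored in a short sketch. The reconstruction_mse helper is a hypothetical stand-in for the reported reconstruction-error metric, using one deterministic mean-field pass rather than the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 25, 50       # sizes used in the BAS experiments
batch_size, lr = 100, 0.05         # mini-batch size and learning rate from the text

# Initialization described in the text: zero biases, standard Gaussian weights
W = rng.standard_normal((n_hidden, n_visible))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)

def reconstruction_mse(X, W, b, c):
    """Summed reconstruction error after one mean-field up-down pass
    (an assumed progress metric, per batch rather than per epoch)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = sigmoid(X @ W.T + c)       # hidden activations given the visibles
    X_rec = sigmoid(H @ W + b)     # reconstructed visible activations
    return ((X - X_rec) ** 2).sum()

# One synthetic binary mini-batch standing in for BAS patterns
X = rng.integers(0, 2, size=(batch_size, n_visible)).astype(float)
err = reconstruction_mse(X, W, b, c)
```

Summing this quantity over all training samples after each epoch gives a learning curve of the kind plotted in Fig. 6.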
…the FRBM model learn a more accurate distribution than the regular RBM model.

Both the RBM and FRBM models perform much better when the bias terms are taken into account in the energy functions for this case. The energy functions for current variants of the RBM are designed by considering the structures of the variants. In addition, the membership function of the fuzzy parameters also has an effect on the representation capability of the FRBM model. From the results in TABLE I, the FRBM model with Gaussian membership functions represents the BAS dataset better than that with symmetric triangular membership functions. The learning time for the FRBM model rises accordingly, as the number of parameters increases when fuzziness is introduced into the model.
Fig. 11: Learning Processes of RBM and FRBM Based on MNIST Dataset

…increased by relaxing the restrictions on the relationships between cross-layer units. When the number of hidden units is 100, the fuzzy RBM improves the recognition accuracy by 1.44%. This is a significant advancement in MNIST digit recognition, as state-of-the-art machine learning recognition algorithms only produce a 0.5% improvement compared with previous approaches. As the number of hidden units increases, the fuzzy RBMs produce a lower error rate than the RBM, but the improvement of the fuzzy RBM decreases to 1.06%. This is because both the RBMs and the fuzzy RBMs reach their capacities to model the data. As every image in the MNIST dataset is different and has low resolution (noises already exist, see Fig. 12), there is no need to artificially add noises into the images.

Fig. 12: MNIST Samples

V. CONCLUSION

In order to improve the representation capability and robustness of the RBM model, a novel FRBM model is proposed, in which the connection weights and biases between the visible and hidden units are fuzzy numbers. This means that the relationship between the input and its high-level features (the hidden units in the graph) is extended. After relaxing the relationship between the random variables by introducing fuzzy parameters, the RBM is only a special case of the FRBM when no fuzziness exists in the fuzzy model. In other words, the parameter searching space of the RBM is only a sub-space of the FRBM's searching space. Therefore, the FRBM has a more powerful representation capability than the regular RBM. On the other hand, the robustness of the FRBM also surpasses that of the RBM model. These merits are attributed to the fuzziness of the FRBM model.

Both the RBM and the FRBM can be trained in both supervised and unsupervised manners.
TABLE II: MNIST Classification: the error rate is the percentage of incorrectly recognized samples among the 10,000 testing samples; m is the number of hidden units
APPENDIX A
Then the likelihood function can be given by

P(x) = (1/Z) Σ_{h̃} e^{-E(x,h̃)}
     = (1/Z) Σ_{h_1} Σ_{h_2} ⋯ Σ_{h_m} e^{µ(x) - Σ_{i=1}^{m} φ_i(x,h_i)}
     = (1/Z) Σ_{h_1} Σ_{h_2} ⋯ Σ_{h_m} e^{µ(x)} Π_{i=1}^{m} e^{-φ_i(x,h_i)}
     = (e^{µ(x)}/Z) Σ_{h_1} e^{-φ_1(x,h_1)} Σ_{h_2} e^{-φ_2(x,h_2)} ⋯ Σ_{h_m} e^{-φ_m(x,h_m)}
     = (e^{µ(x)}/Z) Π_{i=1}^{m} Σ_{h_i} e^{-φ_i(x,h_i)}.   (37)

For the RBM model with binary-valued units, its free energy can be written as

F(x) = -log P(x) - log Z
     = -µ(x) - Σ_{i=1}^{m} log Σ_{h_i} e^{-φ_i(x,h_i)}
     = -b^T x - Σ_{i=1}^{m} log(1 + e^{c_i + W_i x}).   (38)

APPENDIX C

As the free energy of the RBM model is explicitly deduced as

F(x) = -b^T x - Σ_{i=1}^{m} log(1 + e^{c_i + W_i x}),

corresponding to Eqn. (19) and (30), it is easy to get

F_c(x) = -(1/2) µ(x; θ_L) - (1/2) µ(x; θ_R)
         - (1/2) Σ_{i=1}^{m} log(1 + e^{c_i^L + W_i^L x})
         - (1/2) Σ_{i=1}^{m} log(1 + e^{c_i^R + W_i^R x}).   (39)

In terms of the following denotations

P_L(h_i|x) = σ(c_i^L + W_i^L x),
P_R(h_i|x) = σ(c_i^R + W_i^R x),   (40)

all the gradients can be expressed as

∂F_c(x)/∂W_{ij}^L = -P_L(h_i|x) · x_j^L,
∂F_c(x)/∂c_i^L = -P_L(h_i|x),
∂F_c(x)/∂b_j^L = -x_j^L,
∂F_c(x)/∂W_{ij}^R = -P_R(h_i|x) · x_j^R,
∂F_c(x)/∂c_i^R = -P_R(h_i|x),
∂F_c(x)/∂b_j^R = -x_j^R.   (41)

REFERENCES

[1] P. Smolensky, "Parallel distributed processing: Explorations in the microstructure of cognition." MIT Press, 1986, vol. 1, pp. 194–281.
[2] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," Advances in Neural Information Processing Systems, vol. 19, p. 153, 2007.
[3] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[4] R. Salakhutdinov and G. Hinton, "An efficient learning procedure for deep Boltzmann machines," Neural Computation, vol. 24, no. 8, pp. 1967–2006, 2012.
[5] R. Salakhutdinov, J. Tenenbaum, and A. Torralba, "Learning with hierarchical-deep models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1958–1971, Aug 2013.
[6] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[7] J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ser. CVPR '12. IEEE Computer Society, 2012, pp. 3642–3649.
[8] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," The Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
[9] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[10] R. Salakhutdinov and G. Hinton, "Replicated softmax: an undirected topic model," in Advances in Neural Information Processing Systems, 2010.
[11] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[12] G. Hinton, "A practical guide to training restricted Boltzmann machines," UTML TR 2010-003, 2010.
[13] N. Wang, J. Melchior, and L. Wiskott, "Gaussian-binary restricted Boltzmann machines on modeling natural image statistics," CoRR, vol. abs/1401.5900, 2014.
[14] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Two distributed-state models for generating high-dimensional time series," Journal of Machine Learning Research, vol. 12, pp. 1025–1068, 2011.
[15] K. H. Cho, T. Raiko, and A. Ilin, "Gaussian-Bernoulli deep Boltzmann machine," in Neural Networks (IJCNN), The 2013 International Joint Conference on, Aug 2013, pp. 1–7.
[16] R. Salakhutdinov, A. Mnih, and G. Hinton, "Restricted Boltzmann machines for collaborative filtering," in Proceedings of the 24th International Conference on Machine Learning, ser. ICML '07. New York, NY, USA: ACM, 2007, pp. 791–798.
[17] G. W. Taylor, G. E. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," in Advances in Neural Information Processing Systems. MIT Press, 2007, pp. 1345–1352.
[18] I. Sutskever and G. E. Hinton, "Learning multilevel distributed representations for high-dimensional sequences," in International Conference on Artificial Intelligence and Statistics, ser. JMLR Proceedings, vol. 2. JMLR.org, 2007, pp. 548–555.
[19] H.-F. Wang and R.-C. Tsaur, "Insight of a fuzzy regression model," Fuzzy Sets and Systems, vol. 112, no. 3, pp. 355–369, 2000.
[20] P.-Y. Hao and J.-H. Chiang, "Fuzzy regression analysis by support vector learning approach," IEEE Transactions on Fuzzy Systems, vol. 16, no. 2, pp. 428–441, April 2008.
[21] A. Fischer and C. Igel, "Training restricted Boltzmann machines: An introduction," Pattern Recognition, vol. 47, no. 1, pp. 25–39, 2014.
[22] C.-Y. Zhang and C. Chen, "An automatic setting for training restricted Boltzmann machine," in Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on, 2014, pp. 4037–4041.
[23] M. Aoyagi, "Learning coefficient in Bayesian estimation of restricted Boltzmann machine," Journal of Algebraic Statistics, vol. 4, no. 1, pp. 31–58, 2013.
[24] M. Aoyagi and K. Nagata, "Learning coefficient of generalization error in Bayesian estimation and Vandermonde matrix-type singularity," Neural Computation, vol. 24, no. 6, pp. 1569–1610, 2012.
[25] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice-Hall, 1995.
[26] S. Dutta and M. Chakraborty, "Fuzzy relation and fuzzy function over fuzzy sets: a retrospective," Soft Computing, pp. 1–14, 2014.
[27] J. J. Buckley, Fuzzy Probabilities: New Approach and Applications, ser. Studies in Fuzziness and Soft Computing. Springer, 2009.
[28] W. Pedrycz, A. Skowron, and V. Kreinovich, Handbook of Granular Computing. Wiley-Interscience, 2008.
[29] W. A. Lodwick and J. Kacprzyk, Fuzzy Optimization: Recent Advances and Applications. Springer, 2010.
[30] N. N. Karnik and J. M. Mendel, "Centroid of a type-2 fuzzy set," Information Sciences, vol. 132, no. 1-4, pp. 195–220, 2001.
[31] O. Castillo and P. Melin, Type-2 Fuzzy Logic: Theory and Applications. Springer, 2008.
[32] G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[33] M. A. Carreira-Perpinan and G. E. Hinton, "On contrastive divergence learning," Artificial Intelligence and Statistics, 2005.

C. L. Philip Chen (S'88-M'88-SM'94-F'07) received the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1985 and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1988. After having worked in the U.S. for 23 years as a tenured professor, a department head, and an associate dean in two different universities, he is currently the Dean of the Faculty of Science and Technology, University of Macau, Macau, China, and a Chair Professor of the Department of Computer and Information Science.
Dr. Chen is a Fellow of the IEEE and the AAAS. After being the IEEE SMC Society President (2012-2013), he is currently the Editor-in-Chief of IEEE Transactions on Systems, Man, and Cybernetics: Systems (2014-) and an associate editor of several IEEE Transactions. He is also the Chair of TC 9.1 Economic and Business Systems of IFAC. His research areas are systems, cybernetics, and computational intelligence. In addition, he is a program evaluator for the Accreditation Board of Engineering and Technology Education (ABET) in Computer Engineering, Electrical Engineering, and Software Engineering programs.

Chun-Yang Zhang received the B.S. degree in Mathematics from Beijing Normal University Zhuhai, China, in 2010 and the M.S. degree in Mathematics from the University of Macau, Macau, in 2012. He is currently working toward the Ph.D. degree in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China.
His research interests include computational intelligence, data mining and machine learning algorithms.

Long Chen (M'11) received the B.S. degree in in-
[34] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An formation sciences from Peking University, Beijing,
introduction to variational methods for graphical models,” Machine China in 2000, the M.S.E. degree from Institute
Learning, vol. 37, no. 2, pp. 183–233, 1999. of Automation, Chinese Academy of Sciences in
[35] C. M. Bishop, Pattern Recognition and Machine Learning. Secaucus, 2003, the M.S. degree in computer engineering, from
NJ, USA: Springer-Verlag New York, Inc., 2006. University of Alberta, Canada in 2005, and the Ph.D.
degree in electrical engineering from the University
[36] G. E. Hinton, “Learning multiple layers of representation,” Trends in of Texas at San Antonio, USA in 2010. From 2010 to
Cognitive Sciences, vol. 11, no. 10, pp. 428 – 434, 2007. 2011 he was a postdoctoral fellow in the University
[37] C. P. Chen and C.-Y. Zhang, “Data-intensive applications, challenges, of Texas at San Antonio. Dr. Chen currently is an
techniques and technologies: A survey on big data,” Information Sci- assistant professor with the department of Computer
ences, vol. 275, no. 0, pp. 314 – 347, 2014. and Information Science, University of Macau, China.
His current research interests include computational intelligence, Bayesian
methods, and other machine learning techniques and their applications. Dr.
Chen has been working in publication matters for many IEEE conferences
and was the Publications Co-chair of the IEEE International Conference on
Systems, Man and Cybernetics (SMC) 2009, 2012, and 2014.
1063-6706 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.