Overview of Restricted Boltzmann Machine
Introduction
As promised in my last blog, today I am going to discuss another type of neural network: the Restricted Boltzmann Machine (RBM). It is an unsupervised machine learning technique. There are some scary-looking math formulas behind RBM, but here I will try to give a simple, easy-to-understand explanation without diving too deep into the math.
A Restricted Boltzmann Machine is a neural network that belongs to the family of energy-based models. RBM may not be as well-known a name as CNN or RNN, but it gained popularity recently: in the Netflix Prize competition, RBM achieved state-of-the-art performance.
A Restricted Boltzmann Machine is a special type of Boltzmann Machine, so let us first see what a Boltzmann Machine is.
A Boltzmann Machine has an input layer (visible from outside, so referred to as the visible layer) and one or several hidden layers. It is a neural network with multiple neurons in every layer. Each neuron is connected not only to neurons in other layers but also to neurons within the same layer, and all connections are bidirectional. In a Boltzmann Machine all neurons are the same: there is no difference between an input-layer neuron and a hidden-layer neuron.
Suppose you have a factory with many sophisticated machines, and you are concerned about the safety of the workers and the factory. There are certain parameters that you monitor regularly, such as the presence of smoke, temperature, air quality, etc. If all these parameters are within certain ranges, we can consider the factory safe. You want a system that warns you as soon as there is a change in the normal state of any of these parameters, or of some combination of them. We cannot use supervised learning in this case, as we will not have data for unusual or hazardous states: we are trying to detect something that has not happened yet but could happen.
We should be able to detect when the system is entering a hazardous state even if we have never seen such a state before. This can be done by building a model of the normal state and noticing when a new state differs from it. A Boltzmann Machine can help here. It takes training data as input and adjusts its weights accordingly, learning from the input the possible relations between all these parameters and how they influence each other. In this way it models the normal state of the system (here, the factory).
Now the Boltzmann Machine can be used to monitor our factory and to warn us of any unusual state. It learns how the system behaves in its normal state from a number of good examples.
Training RBM
Forward Pass:
There can be multiple inputs. Each input is multiplied by a weight, the weighted inputs are summed, and the bias is added. The result is then passed through a sigmoid activation function.
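As a rough sketch of this forward pass (using NumPy, with made-up sizes for the input, weights, and bias):

```python
import numpy as np

def sigmoid(x):
    # Squashes each value into (0, 1), interpreted as P(h_j = 1 | v)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
v0 = rng.integers(0, 2, size=6).astype(float)  # binary input vector (6 visible units)
W = rng.normal(0, 0.1, size=(6, 4))            # weight matrix: 6 visible x 4 hidden
a = np.zeros(4)                                 # hidden layer bias

# Forward pass: weighted sum of the inputs plus bias, through the sigmoid
h1_prob = sigmoid(v0 @ W + a)
print(h1_prob)  # four values in (0, 1), one per hidden unit
```

The dimensions (6 visible, 4 hidden) are purely illustrative; real RBMs use as many visible units as input features.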
Reconstruction Phase:
h1 = S(W^T v0 + a) and v1 = S(W h1 + b), where h1 is the hidden layer vector, v0 the input vector, v1 the reconstructed input vector, W the weight matrix, a the hidden layer bias vector, b the visible layer bias vector, and S the sigmoid activation function.
The difference (v0 - v1) can be considered the reconstruction error that we need to reduce in subsequent steps of the training process. The weights are adjusted in each iteration to minimize this error.
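A minimal sketch of the reconstruction step and the resulting error (again NumPy with hypothetical sizes; note that the same weight matrix W is reused, transposed, on the way back):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
v0 = rng.integers(0, 2, size=6).astype(float)  # original input
W = rng.normal(0, 0.1, size=(6, 4))            # shared weight matrix
a = np.zeros(4)                                 # hidden layer bias
b = np.zeros(6)                                 # visible layer bias

h1 = sigmoid(v0 @ W + a)    # hidden activations from the input
v1 = sigmoid(h1 @ W.T + b)  # reconstruction of the input from the hidden layer

# Reconstruction error: how far the reconstruction v1 is from the input v0
recon_error = np.sum((v0 - v1) ** 2)
print(recon_error)
```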
Let us now go deeper into how the error is reduced at each step. Suppose we have two probability distributions, one from the input data (p(x)) and one from the reconstructed input (q(x)). The difference between these two distributions is our error (in the graphical sense), and our goal is to minimize it. KL divergence (Kullback-Leibler divergence) measures the non-overlapping area under the two curves.
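For intuition, here is how the KL divergence between two discrete distributions p and q can be computed; the numbers below are toy values, not taken from any real RBM:

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i); zero exactly when p == q
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

p = [0.1, 0.4, 0.5]  # distribution of the input data (toy values)
q = [0.2, 0.3, 0.5]  # distribution of the reconstruction (toy values)

print(kl_divergence(p, p))  # 0.0: identical distributions, no error
print(kl_divergence(p, q))  # positive: the reconstruction still differs
```

Training drives q(x) toward p(x), shrinking this divergence.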
Restricted Boltzmann Machines are energy-based models. Now what is an energy-based model? Energy is usually associated with physics, so why does it come up in deep learning? Energy-based models are probabilistic models governed by an energy function, which describes the probability of a certain state. Going back to our factory example, our energy-based model can give us the probability of a normal or hazardous state. RBM uses the idea of energy as a metric for measuring the model's quality.
E is the energy of the model for hidden vector h and input vector v. Our aim is to minimize this energy during training. Gibbs Sampling and Contrastive Divergence (CD) are used to train the RBM.
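Using the bias naming from earlier (a for the hidden layer, b for the visible layer), the standard RBM energy E(v, h) = -b.v - a.h - v.W.h can be sketched as follows; the configuration below is randomly generated, just to show the computation:

```python
import numpy as np

def energy(v, h, W, a, b):
    # E(v, h) = -b.v - a.h - v.W.h: lower energy = more probable configuration
    return -b @ v - a @ h - v @ W @ h

rng = np.random.default_rng(2)
v = rng.integers(0, 2, size=6).astype(float)  # a visible configuration
h = rng.integers(0, 2, size=4).astype(float)  # a hidden configuration
W = rng.normal(0, 0.1, size=(6, 4))           # weight matrix
a = np.zeros(4)                                # hidden layer bias
b = np.zeros(6)                                # visible layer bias

print(energy(v, h, W, a, b))  # a scalar; training lowers it for data-like v
```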
Gibbs sampling is a Markov chain Monte Carlo algorithm for obtaining a sequence of observations that are approximately drawn from a specified multivariate probability distribution. It is used where direct sampling is difficult.
Here, the input is represented by v and the hidden values by h. p(h|v) is used to predict the hidden values when the input is known, and p(v|h) is used to predict the regenerated input values.
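One round of block Gibbs sampling in an RBM alternates between these two conditionals: sample h from p(h|v), then sample v from p(v|h). A sketch with hypothetical sizes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
W = rng.normal(0, 0.1, size=(6, 4))           # weight matrix
a = np.zeros(4)                                # hidden layer bias
b = np.zeros(6)                                # visible layer bias

v = rng.integers(0, 2, size=6).astype(float)  # start from some visible state

for step in range(5):                          # a short Gibbs chain
    p_h = sigmoid(v @ W + a)                   # p(h | v)
    h = (rng.random(4) < p_h).astype(float)    # sample binary hidden units
    p_v = sigmoid(h @ W.T + b)                 # p(v | h)
    v = (rng.random(6) < p_v).astype(float)    # sample regenerated input

print(v)  # an approximate sample from the model's distribution over inputs
```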
Using CD, the weights of the RBM are updated. First the gradient (delta) is calculated from the reconstructed input; then delta is added to the old weights to get the new weights.
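The CD-1 update described above can be sketched like this: the gradient is approximated as the difference between data-driven and reconstruction-driven statistics (outer products of visible and hidden activities), scaled by a learning rate and added to the old weights. All names and sizes here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)
v0 = rng.integers(0, 2, size=6).astype(float)  # one training example
W = rng.normal(0, 0.1, size=(6, 4))            # old weights
a, b = np.zeros(4), np.zeros(6)                # hidden and visible biases
lr = 0.1                                       # learning rate

# Positive phase: hidden probabilities driven by the data
h0 = sigmoid(v0 @ W + a)
# Negative phase: reconstruct the input, then recompute hidden probabilities
v1 = sigmoid(h0 @ W.T + b)
h1 = sigmoid(v1 @ W + a)

# CD-1 gradient: <v0 h0> - <v1 h1>, added to the old weights
delta_W = np.outer(v0, h0) - np.outer(v1, h1)
W_new = W + lr * delta_W
```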
For the detailed equations and how they are derived, refer to "A Practical Guide to Training Restricted Boltzmann Machines" by Geoffrey Hinton.
Practical Examples
Below are a few cases where RBM is used:
· Recommender systems: Amazon product suggestions and Netflix movie recommendations are both recommender systems. Good recommender systems are very valuable in today's world.
· Handwritten digit recognition: a very common problem these days, used in a variety of applications such as criminal evidence analysis, office computerization, check verification, and data entry.
Advantages:
· Expressive enough to encode any distribution, while remaining computationally efficient.
· Faster than a traditional Boltzmann Machine because of the restriction on connections between nodes (no connections within a layer).
· Activations of the hidden layer can be used as features by other models to improve performance.
Disadvantages:
· Training is more difficult, as it is hard to calculate the gradient of the energy function.
· The CD-k algorithm (CD with k Gibbs sampling iterations per data point) used in RBMs is not as familiar as backpropagation.
Conclusion
To summarize, Restricted Boltzmann Machines are unsupervised, two-layer neural models that learn the input distribution using CD-k. They are used in many real-world applications, and over time many improvements have been made to the classic RBM, for example Fuzzy RBM and Infinite RBM.
I hope this blog helped you understand and get an idea of this awesome generative algorithm.