
1/30/24, 3:21 AM Overview of Restricted Boltzmann Machine

Overview of Restricted Boltzmann Machine

Introduction
As promised in my last blog, today I am going to discuss another type of neural network: the Restricted Boltzmann Machine (RBM). It is an unsupervised machine learning technique. There are some scary-looking math formulas behind RBMs, but here I try to give a simple, easy-to-understand explanation without diving too deep into the math.

The Restricted Boltzmann Machine is a neural network that belongs to the family of energy-based models. RBM may not be as well-known a name as CNN or RNN, but it has gained popularity: in the Netflix Prize competition, RBMs achieved state-of-the-art performance.

The Restricted Boltzmann Machine is a special type of Boltzmann Machine, so first let's see what a Boltzmann Machine is.

What is a Boltzmann Machine (BM)?


The Boltzmann Machine was invented in 1985 by Geoffrey Hinton and Terrence Sejnowski. It is a generative unsupervised model. Now, what does "generative" mean? Yes, you are right: it generates. And what does it generate? New data that it has not seen before. It learns from the provided input and generates output based on its understanding of that input data.

A Boltzmann Machine has an input layer (visible from outside, so referred to as the visible layer) and one or more hidden layers (referred to as the hidden layer).

A Boltzmann Machine is a neural network with multiple neurons in every layer. Each neuron is connected not only to neurons in other layers but also to neurons within its own layer, and all connections are bidirectional. In a Boltzmann Machine all neurons are the same; there is no structural difference between an input-layer neuron and a hidden-layer neuron.

Suppose you have a factory with many sophisticated machines, and you are concerned about the safety of the workers and the factory. There are certain parameters that you monitor regularly, like the presence of smoke, temperature, air quality, etc. If all these parameters are within certain ranges, we can consider the factory safe. You want a system that warns you as soon as there is some change in the normal state of any of these parameters, or of some combination of them. We cannot use supervised learning in this case, because we will not have data for the unusual/hazardous states. We are trying to detect something that has not happened yet but could happen.

We should be able to detect when the system is going into a hazardous state even if we have never seen such a state before. This can be done by building a model of the normal state and noticing when a new state differs from it. A Boltzmann Machine can help here: it takes the training data as input and adjusts its weights. From this input it learns the possible relations between all these parameters and how they influence each other, so the model comes to represent the normal state of the system (here, the factory).

read://https_medium.com/?url=https%3A%2F%2Fmedium.com%2F%40nibeditadas9%2Foverview-of-restricted-boltzmann-machine-a1981feb37f2%2… 1/6

Now the Boltzmann Machine can be used to monitor our factory and warn us of any unusual state. It learns how the system behaves in its normal state from a number of good examples.

What is a Restricted Boltzmann Machine (RBM)?


RBMs are a special type of Boltzmann Machine. They are called "restricted" because, unlike in the original Boltzmann Machine, no two nodes within the same layer are connected to each other.


How does an RBM work?


An RBM is a stochastic neural network, which means that each neuron behaves randomly when activated. There are two sets of bias units (hidden bias and visible bias) in an RBM; this is one of the things that makes RBMs different from autoencoders. The hidden bias helps the RBM produce the activations on the forward pass, while the visible bias helps the RBM reconstruct the input during the backward pass. The reconstructed input is always different from the actual input, since there are no connections among the visible units and therefore no way for them to transfer information among themselves.

Training an RBM
Forward Pass:

There can be multiple inputs. Each input is multiplied by its weight, the weighted inputs are summed and added to the hidden bias, and the result is passed through the sigmoid activation function.
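As a minimal sketch of that forward pass (using NumPy, with made-up layer sizes; a is the hidden bias, following the notation used below):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                         # illustrative sizes
W = rng.normal(0, 0.1, (n_visible, n_hidden))      # weight matrix
a = np.zeros(n_hidden)                             # hidden bias vector
v0 = rng.integers(0, 2, n_visible).astype(float)   # binary input vector

# Forward pass: p(h = 1 | v) = sigmoid(v0 . W + a)
h_prob = sigmoid(v0 @ W + a)
# Stochastic activation: sample binary hidden states from those probabilities
h1 = (rng.random(n_hidden) < h_prob).astype(float)
```

Sampling the hidden states (rather than using the probabilities directly) is what makes the network stochastic.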

Reconstruction Phase:

It is like the forward pass but in the opposite direction.


The two passes can be written as h1 = S(W·v0 + a) and v1 = S(Wᵀ·h1 + b), where h1 is the hidden layer vector, v0 the input vector, v1 the reconstructed input, W the weight matrix, a the hidden layer bias vector, b the visible layer bias vector, and S the sigmoid activation function.

The difference (v0 − v1) can be considered the reconstruction error that we need to reduce in subsequent steps of the training process. The weights are adjusted in each iteration to minimize this error.
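Putting the two passes together (a NumPy sketch with illustrative sizes; variable names follow the glossary above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))      # weight matrix
a = np.zeros(n_hidden)                             # hidden bias
b = np.zeros(n_visible)                            # visible bias
v0 = rng.integers(0, 2, n_visible).astype(float)   # input vector

h1 = sigmoid(v0 @ W + a)        # forward pass: hidden activations
v1 = sigmoid(h1 @ W.T + b)      # backward pass: reconstruction of the input

# Squared reconstruction error, driven down over training iterations
recon_error = np.sum((v0 - v1) ** 2)
```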

Reconstruction is different from regression or classification. In reconstruction, the model estimates the probability distribution of the original input, whereas in classification or regression we associate a discrete class or a continuous value with an input. So in reconstruction we are trying to guess many values at the same time. This is known as generative learning, as opposed to the discriminative learning that happens in a classification problem (mapping inputs to labels/classes).

Let us now go deeper into how the error is reduced at each step. Suppose we have two probability distributions: one from the input data, p(x), and one from the reconstructed input, q(x). The difference between these two distributions is our error (in the graphical sense), and our goal is to minimize it. The KL divergence (Kullback–Leibler divergence) measures the non-overlapping area under the two curves.
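For discrete distributions, KL(p ∥ q) = Σᵢ pᵢ log(pᵢ/qᵢ). A small sketch with made-up distributions (NumPy):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), for strictly positive q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.4, 0.3, 0.2, 0.1])   # "input" distribution (made up)
q = np.array([0.3, 0.3, 0.2, 0.2])   # "reconstruction" distribution (made up)

print(kl_divergence(p, q))   # positive: the distributions differ
print(kl_divergence(p, p))   # 0.0: identical distributions have zero divergence
```

Training drives q toward p, shrinking this divergence toward zero.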

Restricted Boltzmann Machines are energy-based models. Now, what is an energy-based model?

Energy is usually associated with physics, so why does it appear in deep learning?

Energy-based models are probabilistic models governed by an energy function. This function describes the probability of a certain state. Going back to our factory example, an energy-based model can give us the probability of a normal or a hazardous state. RBMs use the idea of energy as a metric for measuring the model's quality.

E(v, h) is the energy of the model for hidden vector h and input vector v. Our aim is to minimize the energy during training. Gibbs sampling and contrastive divergence are used to train an RBM.
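The energy formula itself is not shown here, but for binary units the standard RBM energy is E(v, h) = −bᵀv − aᵀh − vᵀWh (with a as the hidden bias and b as the visible bias, following the glossary above). A minimal NumPy sketch:

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of a joint state (v, h); lower energy means a more probable state."""
    return -(b @ v) - (a @ h) - (v @ W @ h)

rng = np.random.default_rng(2)
n_visible, n_hidden = 6, 3                         # illustrative sizes
W = rng.normal(0, 0.1, (n_visible, n_hidden))
a = np.zeros(n_hidden)                             # hidden bias
b = np.zeros(n_visible)                            # visible bias
v = rng.integers(0, 2, n_visible).astype(float)
h = rng.integers(0, 2, n_hidden).astype(float)

print(rbm_energy(v, h, W, a, b))
```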


Gibbs sampling is a Markov chain Monte Carlo algorithm for obtaining a sequence of observations drawn approximately from a specified multivariate probability distribution. It is used where direct sampling is difficult.

Here, the input is represented by v and the hidden values by h. p(h|v) is used to predict the hidden values when the input is known, and p(v|h) is used to predict the regenerated input values.
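One Gibbs step alternates between sampling from these two conditionals (a sketch assuming binary units and made-up sizes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One alternation: sample h ~ p(h|v), then a new v ~ p(v|h)."""
    h_prob = sigmoid(v @ W + a)                           # p(h = 1 | v)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)
    v_prob = sigmoid(h @ W.T + b)                         # p(v = 1 | h)
    v_new = (rng.random(v_prob.shape) < v_prob).astype(float)
    return h, v_new

rng = np.random.default_rng(3)
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))
a, b = np.zeros(n_hidden), np.zeros(n_visible)
v = rng.integers(0, 2, n_visible).astype(float)

h, v = gibbs_step(v, W, a, b, rng)   # repeat to walk the Markov chain
```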

Contrastive Divergence (CD) is an approximate maximum-likelihood learning algorithm. It is used to approximate the gradient, the slope representing the relationship between a network's weights and its error. It is used in situations where we cannot evaluate a function or a set of probabilities directly; some inference model is needed to approximate the learning gradient and decide which direction to move in.

Using CD, the weights of the RBM are updated. First the gradient (delta) is calculated from the reconstructed input; then the delta is added to the old weights to obtain the new weights.
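A sketch of one CD-1 weight update (a single Gibbs step; the learning rate and layer sizes are made up, and probabilities are used for the negative phase, one common variant):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr, rng):
    """One step of Contrastive Divergence with k = 1."""
    # Positive phase: hidden probabilities given the data, then a binary sample
    h0_prob = sigmoid(v0 @ W + a)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: reconstruct the input, then recompute hidden probabilities
    v1 = sigmoid(h0 @ W.T + b)
    h1_prob = sigmoid(v1 @ W + a)
    # Gradient estimate <v0 h0> - <v1 h1>, added to the old parameters
    delta_W = lr * (np.outer(v0, h0_prob) - np.outer(v1, h1_prob))
    return W + delta_W, a + lr * (h0_prob - h1_prob), b + lr * (v0 - v1)

rng = np.random.default_rng(4)
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))
a, b = np.zeros(n_hidden), np.zeros(n_visible)
v0 = rng.integers(0, 2, n_visible).astype(float)

W, a, b = cd1_update(v0, W, a, b, lr=0.1, rng=rng)
```

CD-k simply repeats the Gibbs step k times before computing the negative phase.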

For the detailed equations and their derivations, refer to "A Practical Guide to Training Restricted Boltzmann Machines" by Geoffrey Hinton.

Practical Examples
Below are a few cases where RBMs are used:

· Recommender systems: Amazon product suggestions and Netflix movie recommendations are recommender systems. Good recommender systems are very valuable in today's world.

· Handwritten digit recognition: This is a very common problem these days, used in a variety of applications like criminal evidence, office computerization, check verification, data-entry applications, etc.

Advantages and Disadvantages of RBM

Advantages:
· Expressive enough to encode any distribution, while remaining computationally efficient.

· Faster than the traditional Boltzmann Machine, due to the restriction on connections between nodes.

· Activations of the hidden layer can be used as input features to other models to improve their performance.

Disadvantages:
· Training is more difficult, because it is hard to calculate the gradient of the energy function.


· The CD-k algorithm (CD with k Gibbs sampling iterations per data point) used to train RBMs is not as familiar as backpropagation.

Conclusion
To summarize, Restricted Boltzmann Machines are unsupervised, two-layer neural models that learn the input distribution using CD-k. They are used in many real-world applications. Over time, many improvements have been made to the classic RBM, for example the fuzzy RBM and the infinite RBM.

I hope this blog helped you understand and get an idea of this awesome generative algorithm.

