
Boltzmann Machines

https://www.geeksforgeeks.org/types-of-boltzmann-machines/

Deep Learning models are broadly classified into supervised and

unsupervised models.

Supervised DL models:

• Artificial Neural Networks (ANNs)

• Recurrent Neural Networks (RNNs)

• Convolutional Neural Networks (CNNs)

Unsupervised DL models:

• Self Organizing Maps (SOMs)

• Boltzmann Machines

• Autoencoders

Boltzmann Machines:

• It is an unsupervised DL model in which every node is connected to

every other node.

• That is, unlike the ANNs, CNNs, RNNs and SOMs, the Boltzmann

Machines are undirected (or the connections are bidirectional).

• Boltzmann Machine is not a deterministic DL model but

a stochastic or generative DL model.

• Rather than learning a fixed mapping from inputs to outputs, it learns a representation of a certain system.

• There are two types of nodes in the Boltzmann Machine — Visible

nodes — those nodes which we can and do measure, and the Hidden

nodes – those nodes which we cannot or do not measure.


• Although the node types are different, the Boltzmann machine

considers them as the same and everything works as one single

system.

• The training data is fed into the Boltzmann Machine and the weights

of the system are adjusted accordingly.

• Boltzmann machines help us understand abnormalities by learning

about the working of the system in normal conditions.

[Figure: Boltzmann Machine]

Energy-Based Models:
Boltzmann Distribution is used in the sampling distribution of the

Boltzmann Machine. The Boltzmann distribution is governed by the

equation –
Pi = e^(-∈i/kT) / ∑j e^(-∈j/kT)

Pi - probability of the system being in state i

∈i - energy of the system in state i

T - temperature of the system (the term "temperature" in these models is a metaphorical or mathematical concept used to control the level of randomness or uncertainty in the system)

k - Boltzmann constant

∑j e^(-∈j/kT) - sum of the corresponding values over all possible states j of the system (the normalizing term)
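As a minimal sketch (assuming Python with NumPy), the probabilities above can be computed as a softmax over the negative energies; the state energies and temperatures below are made up for illustration:

```python
import numpy as np

def boltzmann_probabilities(energies, T=1.0, k=1.0):
    """P_i = exp(-E_i / kT) / sum_j exp(-E_j / kT)."""
    energies = np.asarray(energies, dtype=float)
    logits = -energies / (k * T)
    logits -= logits.max()          # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Three states with energies 1, 2 and 3: lower energy -> higher probability.
p = boltzmann_probabilities([1.0, 2.0, 3.0], T=1.0)
# p[0] > p[1] > p[2], and the probabilities sum to 1 (up to floating point).

# At a higher temperature the distribution flattens out (more randomness).
p_hot = boltzmann_probabilities([1.0, 2.0, 3.0], T=10.0)
```

Raising `T` shrinks the gap between the probabilities of low- and high-energy states, which matches the "temperature controls randomness" description above.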
• Boltzmann Distribution describes different states of the system

and thus Boltzmann machines create different states of the

machine using this distribution.

• From the above equation, as the energy of a state increases, the

probability of the system being in that state decreases.

• Thus, the system is the most stable in its lowest energy state (a

gas is most stable when it spreads).

• Here, in Boltzmann machines, the energy of the system is defined

in terms of the weights of synapses.

• Once the system is trained and the weights are set, the system

always tries to find the lowest energy state for itself by adjusting

the weights.
Climbing a Hill

Imagine you are on a hiking trip, and you come across a range of hills. Each

hill represents a different state, and the height of the hill represents the

energy of that state. The higher the hill, the more energy a state has.

Now, let's relate this to the Boltzmann distribution:

1. Pi (Probability): Pi represents the likelihood that you are at a specific

hill. This probability is like the chance that you find yourself on a

particular hill.

2. ∈i (Energy): The energy in this scenario corresponds to the height of

the hill. Higher hills have more energy, while lower hills have less

energy.

3. T (Temperature): Temperature stands for the weather conditions on

your hiking trip. On a hot day, you are more likely to find yourself on

higher hills (higher energy states), while on a cold day, you'd prefer to

stay on lower hills (lower energy states).

4. k (Boltzmann Constant): The Boltzmann constant acts as a rule that

links energy and temperature. It determines how much the weather

affects your choice of hills.

5. Σ (Summation): Just like before, this symbol (∑) means adding up the

heights of all possible hills you could climb.

In this hiking analogy:

• The probability (Pi) that you are on a specific hill is determined by how

high that hill is (its energy) relative to the heights of other hills, all

based on the temperature (weather).

• If a hill is very high (high energy), you are less likely to be there on a

cold day (low temperature). You prefer to stay on lower hills, where it's
more stable. But on a hot day (high temperature), you might be more

likely to climb higher because you have more energy to reach the

summit.

• Your goal is to find the most stable hill (the lowest energy state). You

adjust your hiking path based on the temperature and the heights of

other hills until you discover the best hill to explore.

In this way, just as you would choose your hill to climb based on the weather

and the energy of different hills, Boltzmann Machines help a system find its

most stable state (lowest energy) by adjusting its "weights" (connections) in

response to the temperature and the energy of other possible states.

Types of Boltzmann Machines:

• Restricted Boltzmann Machines (RBMs)

• Deep Belief Networks (DBNs)

• Deep Boltzmann Machines (DBMs)

Restricted Boltzmann Machines (RBMs):

In a full Boltzmann machine, each node is connected to every other node,

so the number of connections grows quadratically with the number of nodes.

This is the reason we use RBMs. The restrictions on the node connections in RBMs are as follows –

• Hidden nodes cannot be connected to one another.

• Visible nodes cannot be connected to one another; connections run only

between the visible layer and the hidden layer.

Energy function example for Restricted Boltzmann Machine –

E(v, h) = -∑i ai·vi - ∑j bj·hj - ∑i∑j vi·wij·hj

ai, bj - biases of the visible and hidden nodes - constants

vi, hj - states of visible node i and hidden node j

wij - weight of the connection between visible node i and hidden node j

P(v, h) = e^(-E(v, h)) / Z - probability of the system being in state (v, h)

Z - partition function: the sum of e^(-E(v, h)) over all possible states of the system
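A minimal sketch of this energy function (assuming Python with NumPy; the layer sizes, biases and weights below are made up for illustration):

```python
import numpy as np

def rbm_energy(v, h, a, b, W):
    """E(v, h) = -sum_i a_i*v_i - sum_j b_j*h_j - sum_ij v_i*W_ij*h_j."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
a = rng.normal(size=n_visible)          # visible biases
b = rng.normal(size=n_hidden)           # hidden biases
W = rng.normal(size=(n_visible, n_hidden))

v = np.array([1, 0, 1, 0, 1, 0], dtype=float)   # one visible configuration
h = np.array([1, 1, 0], dtype=float)            # one hidden configuration

E = rbm_energy(v, h, a, b, W)
# Unnormalized probability of this joint state; dividing by the partition
# function Z (a sum over all states, intractable in general) gives P(v, h).
p_unnorm = np.exp(-E)
```

Low-energy configurations get large `exp(-E)` values and are therefore the most probable, which is why training drives the system toward low-energy states.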


Suppose that we are using our RBM to build a recommender system

that works on six (6) movies. The RBM learns how to allocate its hidden nodes

to certain features. Through the process of Contrastive Divergence, we fit the

RBM to our particular set of movies, that is, to our case or scenario. The RBM

identifies which features are important through the training process. Each entry

of the training data is either 0, 1 or missing, based on whether a user liked

that movie (1), disliked that movie (0) or did not watch the movie (missing

data). The RBM automatically identifies the important features.
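As a small illustration (Python with NumPy assumed; the ratings vector is made up), the liked/disliked/missing entries of one user might be encoded like this:

```python
import numpy as np

# One user's feedback on six movies: 1 = liked, 0 = disliked, None = not watched.
ratings = [1, None, 0, 1, 0, None]

# Represent missing entries as NaN so they can be masked out during training.
v = np.array([np.nan if r is None else float(r) for r in ratings])
observed = ~np.isnan(v)   # mask of movies the user actually rated
# v        -> 1., nan, 0., 1., 0., nan
# observed -> True, False, True, True, True, False
```

During training only the observed entries contribute; the unobserved ones are exactly what the trained model is later asked to predict.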

Contrastive Divergence:

• RBM adjusts its weights by this method.

• Using some randomly assigned initial weights, the RBM computes the

hidden nodes, which in turn use the same weights to reconstruct the

input (visible) nodes.

• Each hidden node is computed from all the visible nodes, and each

visible node is reconstructed from all the hidden nodes; because the same

weights are used in both directions, the reconstructed input differs from

the original input.

• The process continues until the reconstructed input matches the

previous input. The process is said to have converged at this stage.

• This entire procedure is known as Gibbs Sampling.
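The visible → hidden → visible pass described above can be sketched as follows (a toy example in Python/NumPy; the sigmoid conditionals are the standard ones for a binary RBM, and all sizes and weights here are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One back-and-forth pass: visible -> hidden -> reconstructed visible."""
    # Sample binary hidden units given the visible units.
    p_h = sigmoid(b + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Reconstruct the visible units from the hidden sample (same weights W).
    p_v = sigmoid(a + h @ W.T)
    v_recon = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_recon

rng = np.random.default_rng(42)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

v0 = np.array([1, 0, 1, 0, 1, 0], dtype=float)
h0, v1 = gibbs_step(v0, W, a, b, rng)
# Early in training v1 generally differs from v0; chaining such steps
# (v0 -> h0 -> v1 -> h1 -> ...) is the Gibbs sampling chain.
```

Running the chain long enough draws samples from the model's own distribution, which is what the "final state" statistics in the gradient below refer to.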


[Figure: Gibbs Sampling]

The Gradient Formula gives the gradient of the log probability of the

certain state of the system with respect to the weights of the system. It is

given as follows –

d/dwij (log P(v0)) = <vi0 · hj0> - <vi∞ · hj∞>

v - visible state, h - hidden state
<vi0 · hj0> - correlation between visible node i and hidden node j in the initial state (driven by the data)
<vi∞ · hj∞> - the same correlation in the final (equilibrium) state of the system
P(v0) - probability that the system is in state v0
wij - weights of the system
The above equation tells us how a change in the weights of the system will

change the log probability of the system being in a particular state. The system

tries to end up in the lowest possible energy state (the most stable one). Instead of

continuing the weight-adjustment process until the reconstructed input matches

the previous one, we can also consider only the first few passes of the chain. That

is sufficient to understand how to adjust the energy curve so as to reach the lowest

energy state. Therefore, we adjust the weights and reshape the energy curve

such that we get the lowest energy for the current position.

This is known as Hinton's shortcut.
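A one-step version of this update (CD-1) might look like the following sketch (Python/NumPy assumed; using a single reconstruction pass instead of the ∞-step equilibrium statistics is exactly the shortcut described above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 weight update: <v0 h0> - <v1 h1> approximates the gradient."""
    p_h0 = sigmoid(b + v0 @ W)                        # hidden probs from the data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(a + h0 @ W.T)                      # one-step reconstruction
    p_h1 = sigmoid(b + p_v1 @ W)                      # hidden probs from reconstruction
    positive = np.outer(v0, p_h0)                     # data-driven statistics
    negative = np.outer(p_v1, p_h1)                   # model-driven statistics
    return W + lr * (positive - negative)             # gradient ascent on log P(v0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))
v0 = np.array([1, 0, 1, 0, 1, 0], dtype=float)
W_new = cd1_update(v0, W, a=np.zeros(6), b=np.zeros(3), lr=0.1, rng=rng)
```

Repeating this update over many training vectors raises the probability (lowers the energy) of the data configurations, without ever computing the intractable equilibrium term.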


[Figure: Hinton's Shortcut]

Imagine you're building a movie recommendation system:

1. Restricted Boltzmann Machine (RBM):

• In this system, you have six movies.

• RBM is like a smart system that learns how to recommend

movies to users based on their preferences.

• It has two types of nodes: visible nodes (representing users'

preferences for movies) and hidden nodes (representing

features that make a movie appealing, like genre, actors, etc.).

• The RBM's job is to find which features (hidden nodes) are most

important for recommending movies based on users'

preferences (visible nodes).

2. Contrastive Divergence:

• This is how RBM learns. It starts with some random guesses for

its recommendations.
• RBM calculates how likely users would like these movies based

on its initial guesses for features.

• It then adjusts its recommendations based on how different its

initial guesses were from users' actual preferences.

• This process continues until RBM's recommendations closely

match what users really like. This is like fine-tuning its

recommendations.

3. Gibbs Sampling:

• During the learning process, RBM uses Gibbs Sampling. This

means it repeatedly tries to guess the important features and

refine them.

• It calculates how different its guesses for features were from

the real features.

• It does this by going back and forth between visible and hidden

nodes, each time adjusting its guesses to get closer to the real

preferences.

4. Gradient Formula:

• The gradient formula tells us how to adjust the RBM's guesses

for features (weights) to make its recommendations more

accurate.

• It calculates how the change in these guesses affects the

system's recommendations (log probability).


• It measures the difference between RBM's initial and final

guesses for features.

The whole point is to make RBM's recommendations as close as possible to

what users actually like, which is done by tweaking its guesses for

important movie features (hidden nodes).

For instance, suppose RBM initially thought that "action" is not important,

but users love action movies. Through Contrastive Divergence and Gibbs

Sampling, it would learn to adjust its guesses and find that "action" is

indeed an important feature for making recommendations. The gradient

formula helps fine-tune these adjustments.

In a simplified way, think of RBM as a movie recommender that initially

guesses what features are important in movies, and through learning and

adjustments, it refines its recommendations to match what users enjoy.

Working of RBM – Illustrative Example –

Consider – Mary watches four movies out of the six available movies and

rates four of them. Say, she watched m1, m3, m4 and m5 and likes m3,

m5 (rated 1) and dislikes the other two, that is m1, m4 (rated 0), whereas the

other two movies – m2, m6 are unrated. Now, using our RBM, we will

recommend one of these movies for her to watch next. Say –

• m3, m5 are of ‘Drama’ genre.

• m1, m4 are of ‘Action’ genre.

• ‘Dicaprio’ played a role in m5.

• m3, m5 have won ‘Oscar.’

• ‘Tarantino’ directed m4.

• m2 is of the ‘Action’ genre.


• m6 is of both the genres ‘Action’ and ‘Drama’, ‘Dicaprio’ acted in it

and it has won an ‘Oscar’.

We have the following observations –

• Mary likes m3, m5 and they are of the ‘Drama’ genre, so she probably

likes ‘Drama’ movies.

• Mary dislikes m1, m4 and they are of the ‘Action’ genre, so she

probably dislikes ‘Action’ movies.

• Mary likes m3, m5 and they have won an ‘Oscar’, so she

probably likes ‘Oscar’-winning movies.

• Since ‘Dicaprio’ acted in m5 and Mary likes it, she will

probably like a movie in which ‘Dicaprio’ acted.

• Mary does not like m4, which is directed by ‘Tarantino’, so she

probably dislikes any movie directed by ‘Tarantino’.

Therefore, based on the observations and the details of m2, m6; our

RBM recommends m6 to Mary (‘Drama’, ‘Dicaprio’ and ‘Oscar’ matches both

Mary’s interests and m6). This is how an RBM works and hence is used in

recommender systems.
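The reasoning above can be caricatured in a few lines of Python (this is only the feature-matching intuition, not an actual trained RBM; the feature sets are taken from the example):

```python
# Features of the two unrated movies, per the example above.
features = {
    "m2": {"Action"},
    "m6": {"Action", "Drama", "Dicaprio", "Oscar"},
}
liked_features = {"Drama", "Oscar", "Dicaprio"}   # inferred from m3, m5
disliked_features = {"Action", "Tarantino"}       # inferred from m1, m4

def score(movie):
    """Liked-feature matches minus disliked-feature matches."""
    f = features[movie]
    return len(f & liked_features) - len(f & disliked_features)

best = max(features, key=score)
print(best)  # m6: three liked features outweigh one disliked one
```

In the actual RBM, these likes and dislikes are not hand-coded rules; the hidden units learn to play the role of the feature sets, and the recommendation falls out of the reconstruction of the unrated visible units.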

[Figure: Working of RBM]

Thus, RBMs are used to build Recommender Systems.
