
Boltzmann Machines

https://www.geeksforgeeks.org/types-of-boltzmann-machines/

Deep Learning models are broadly classified into supervised and

unsupervised models.

Supervised DL models:

• Artificial Neural Networks (ANNs)

• Recurrent Neural Networks (RNNs)

• Convolutional Neural Networks (CNNs)

Unsupervised DL models:

• Self Organizing Maps (SOMs)

• Boltzmann Machines

• Autoencoders

Boltzmann Machines:

• It is an unsupervised DL model in which every node is connected to

every other node.

• That is, unlike the ANNs, CNNs, RNNs and SOMs, the Boltzmann

Machines are undirected (or the connections are bidirectional).

• Boltzmann Machine is not a deterministic DL model but

a stochastic or generative DL model.

• Rather than learning a fixed mapping from inputs to outputs, it learns a representation of a certain system.

• There are two types of nodes in the Boltzmann Machine — Visible

nodes — those nodes which we can and do measure, and the Hidden

nodes – those nodes which we cannot or do not measure.


• Although the node types are different, the Boltzmann machine

considers them as the same and everything works as one single

system.

• The training data is fed into the Boltzmann Machine and the weights

of the system are adjusted accordingly.

• Boltzmann machines help us understand abnormalities by learning

about the working of the system in normal conditions.

[Figure: Boltzmann Machine]

Energy-Based Models:
Boltzmann Distribution is used in the sampling distribution of the

Boltzmann Machine. The Boltzmann distribution is governed by the

equation –
Pi = e^(-∈i/kT) / ∑j e^(-∈j/kT)

Pi - probability of the system being in state i

∈i - energy of the system in state i

T - temperature of the system (the term "temperature" in these models is a metaphorical or mathematical concept used to control the level of randomness or uncertainty in the system)

k - Boltzmann constant

∑j e^(-∈j/kT) - sum of the corresponding values over all possible states j of the system (the normalizing term)
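As a minimal sketch (assuming Python with NumPy), the probabilities above can be computed as a softmax over the negative energies; the state energies and temperatures below are made up for illustration:

```python
import numpy as np

def boltzmann_probabilities(energies, T=1.0, k=1.0):
    """P_i = exp(-E_i / kT) / sum_j exp(-E_j / kT)."""
    energies = np.asarray(energies, dtype=float)
    logits = -energies / (k * T)
    logits -= logits.max()          # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Three states with energies 1, 2 and 3: lower energy -> higher probability.
p = boltzmann_probabilities([1.0, 2.0, 3.0], T=1.0)
# p[0] > p[1] > p[2], and the probabilities sum to 1 (up to floating point).

# At a higher temperature the distribution flattens out (more randomness).
p_hot = boltzmann_probabilities([1.0, 2.0, 3.0], T=10.0)
```

Raising `T` shrinks the gap between the probabilities of low- and high-energy states, which matches the "temperature controls randomness" description above.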
• Boltzmann Distribution describes different states of the system

and thus Boltzmann machines create different states of the

machine using this distribution.

• From the above equation, as the energy of a state increases, the

probability of the system being in that state decreases.

• Thus, the system is the most stable in its lowest energy state (a

gas is most stable when it spreads).

• Here, in Boltzmann machines, the energy of the system is defined

in terms of the weights of synapses.

• Once the system is trained and the weights are set, the system

always tries to find the lowest energy state for itself by adjusting

the weights.
Climbing a Hill

Imagine you are on a hiking trip, and you come across a range of hills. Each

hill represents a different state, and the height of the hill represents the

energy of that state. The higher the hill, the more energy a state has.

Now, let's relate this to the Boltzmann distribution:

1. Pi (Probability): Pi represents the likelihood that you are at a specific

hill. This probability is like the chance that you find yourself on a

particular hill.

2. ∈i (Energy): The energy in this scenario corresponds to the height of

the hill. Higher hills have more energy, while lower hills have less

energy.

3. T (Temperature): Temperature stands for the weather conditions on

your hiking trip. On a hot day, you are more likely to find yourself on

higher hills (higher energy states), while on a cold day, you'd prefer to

stay on lower hills (lower energy states).

4. k (Boltzmann Constant): The Boltzmann constant acts as a rule that

links energy and temperature. It determines how much the weather

affects your choice of hills.

5. Σ (Summation): Just like before, this symbol (∑) means adding up the

heights of all possible hills you could climb.

In this hiking analogy:

• The probability (Pi) that you are on a specific hill is determined by how

high that hill is (its energy) relative to the heights of other hills, all

based on the temperature (weather).

• If a hill is very high (high energy), you are less likely to be there on a

cold day (low temperature). You prefer to stay on lower hills, where it's
more stable. But on a hot day (high temperature), you might be more

likely to climb higher because you have more energy to reach the

summit.

• Your goal is to find the most stable hill (the lowest energy state). You

adjust your hiking path based on the temperature and the heights of

other hills until you discover the best hill to explore.

In this way, just as you would choose your hill to climb based on the weather

and the energy of different hills, Boltzmann Machines help a system find its

most stable state (lowest energy) by adjusting its "weights" (connections) in

response to the temperature and the energy of other possible states.

Types of Boltzmann Machines:

• Restricted Boltzmann Machines (RBMs)

• Deep Belief Networks (DBNs)

• Deep Boltzmann Machines (DBMs)

Restricted Boltzmann Machines (RBMs):

In a full Boltzmann machine, each node is connected to every other node,

so the number of connections grows quadratically with the number of nodes.

This is the reason we use RBMs. The restrictions on the node connections in RBMs are as follows –

• Hidden nodes cannot be connected to one another.

• Visible nodes cannot be connected to one another; connections run only

between the visible layer and the hidden layer.

Energy function example for Restricted Boltzmann Machine –

E(v, h) = -∑i ai·vi - ∑j bj·hj - ∑i∑j vi·wij·hj

ai, bj - biases of the visible and hidden nodes - constants

vi, hj - states of visible node i and hidden node j

wij - weight of the connection between visible node i and hidden node j

P(v, h) = e^(-E(v, h)) / Z - probability of the system being in state (v, h)

Z - partition function: the sum of e^(-E(v, h)) over all possible states of the system
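A minimal sketch of this energy function (assuming Python with NumPy; the layer sizes, biases and weights below are made up for illustration):

```python
import numpy as np

def rbm_energy(v, h, a, b, W):
    """E(v, h) = -sum_i a_i*v_i - sum_j b_j*h_j - sum_ij v_i*W_ij*h_j."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
a = rng.normal(size=n_visible)          # visible biases
b = rng.normal(size=n_hidden)           # hidden biases
W = rng.normal(size=(n_visible, n_hidden))

v = np.array([1, 0, 1, 0, 1, 0], dtype=float)   # one visible configuration
h = np.array([1, 1, 0], dtype=float)            # one hidden configuration

E = rbm_energy(v, h, a, b, W)
# Unnormalized probability of this joint state; dividing by the partition
# function Z (a sum over all states, intractable in general) gives P(v, h).
p_unnorm = np.exp(-E)
```

Low-energy configurations get large `exp(-E)` values and are therefore the most probable, which is why training drives the system toward low-energy states.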


Suppose that we are using our RBM to build a recommender system

that works on six (6) movies. The RBM learns how to allocate its hidden nodes

to certain features. Through the process of Contrastive Divergence, we fit the

RBM to our particular set of movies, that is, to our case or scenario. The RBM

identifies which features are important through the training process. Each entry

of the training data is either 0, 1 or missing, based on whether a user liked

that movie (1), disliked that movie (0) or did not watch the movie (missing

data). The RBM automatically identifies the important features.
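As a small illustration (Python with NumPy assumed; the ratings vector is made up), the liked/disliked/missing entries of one user might be encoded like this:

```python
import numpy as np

# One user's feedback on six movies: 1 = liked, 0 = disliked, None = not watched.
ratings = [1, None, 0, 1, 0, None]

# Represent missing entries as NaN so they can be masked out during training.
v = np.array([np.nan if r is None else float(r) for r in ratings])
observed = ~np.isnan(v)   # mask of movies the user actually rated
# v        -> 1., nan, 0., 1., 0., nan
# observed -> True, False, True, True, True, False
```

During training only the observed entries contribute; the unobserved ones are exactly what the trained model is later asked to predict.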

Contrastive Divergence:

• RBM adjusts its weights by this method.

• Using some randomly assigned initial weights, the RBM computes the

hidden nodes, which in turn use the same weights to reconstruct the

input (visible) nodes.

• Each hidden node is computed from all the visible nodes, and each

visible node is reconstructed from all the hidden nodes; because the same

weights are used in both directions, the reconstructed input differs from

the original input.

• The process continues until the reconstructed input matches the

previous input. The process is said to have converged at this stage.

• This entire procedure is known as Gibbs Sampling.
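The visible → hidden → visible pass described above can be sketched as follows (a toy example in Python/NumPy; the sigmoid conditionals are the standard ones for a binary RBM, and all sizes and weights here are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One back-and-forth pass: visible -> hidden -> reconstructed visible."""
    # Sample binary hidden units given the visible units.
    p_h = sigmoid(b + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Reconstruct the visible units from the hidden sample (same weights W).
    p_v = sigmoid(a + h @ W.T)
    v_recon = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_recon

rng = np.random.default_rng(42)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

v0 = np.array([1, 0, 1, 0, 1, 0], dtype=float)
h0, v1 = gibbs_step(v0, W, a, b, rng)
# Early in training v1 generally differs from v0; chaining such steps
# (v0 -> h0 -> v1 -> h1 -> ...) is the Gibbs sampling chain.
```

Running the chain long enough draws samples from the model's own distribution, which is what the "final state" statistics in the gradient below refer to.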


[Figure: Gibbs Sampling]

The Gradient Formula gives the gradient of the log probability of the

certain state of the system with respect to the weights of the system. It is

given as follows –

d/dwij (log P(v0)) = <vi0 · hj0> - <vi∞ · hj∞>

v - visible state, h - hidden state
<vi0 · hj0> - correlation between visible node i and hidden node j in the initial state (driven by the data)
<vi∞ · hj∞> - the same correlation in the final (equilibrium) state of the system
P(v0) - probability that the system is in state v0
wij - weights of the system
The above equation tells us how a change in the weights of the system will

change the log probability of the system being in a particular state. The system

tries to end up in the lowest possible energy state (the most stable one). Instead of

continuing the weight-adjustment process until the reconstructed input matches

the previous one, we can also consider only the first few passes of the chain. That

is sufficient to understand how to adjust the energy curve so as to reach the lowest

energy state. Therefore, we adjust the weights and reshape the energy curve

such that we get the lowest energy for the current position.

This is known as Hinton's shortcut.
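A one-step version of this update (CD-1) might look like the following sketch (Python/NumPy assumed; using a single reconstruction pass instead of the ∞-step equilibrium statistics is exactly the shortcut described above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 weight update: <v0 h0> - <v1 h1> approximates the gradient."""
    p_h0 = sigmoid(b + v0 @ W)                        # hidden probs from the data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(a + h0 @ W.T)                      # one-step reconstruction
    p_h1 = sigmoid(b + p_v1 @ W)                      # hidden probs from reconstruction
    positive = np.outer(v0, p_h0)                     # data-driven statistics
    negative = np.outer(p_v1, p_h1)                   # model-driven statistics
    return W + lr * (positive - negative)             # gradient ascent on log P(v0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))
v0 = np.array([1, 0, 1, 0, 1, 0], dtype=float)
W_new = cd1_update(v0, W, a=np.zeros(6), b=np.zeros(3), lr=0.1, rng=rng)
```

Repeating this update over many training vectors raises the probability (lowers the energy) of the data configurations, without ever computing the intractable equilibrium term.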


[Figure: Hinton's Shortcut]

Imagine you're building a movie recommendation system:

1. Restricted Boltzmann Machine (RBM):

• In this system, you have six movies.

• RBM is like a smart system that learns how to recommend

movies to users based on their preferences.

• It has two types of nodes: visible nodes (representing users'

preferences for movies) and hidden nodes (representing

features that make a movie appealing, like genre, actors, etc.).

• The RBM's job is to find which features (hidden nodes) are most

important for recommending movies based on users'

preferences (visible nodes).

2. Contrastive Divergence:

• This is how RBM learns. It starts with some random guesses for

its recommendations.
• RBM calculates how likely users would like these movies based

on its initial guesses for features.

• It then adjusts its recommendations based on how different its

initial guesses were from users' actual preferences.

• This process continues until RBM's recommendations closely

match what users really like. This is like fine-tuning its

recommendations.

3. Gibbs Sampling:

• During the learning process, RBM uses Gibbs Sampling. This

means it repeatedly tries to guess the important features and

refine them.

• It calculates how different its guesses for features were from

the real features.

• It does this by going back and forth between visible and hidden

nodes, each time adjusting its guesses to get closer to the real

preferences.

4. Gradient Formula:

• The gradient formula tells us how to adjust the RBM's guesses

for features (weights) to make its recommendations more

accurate.

• It calculates how the change in these guesses affects the

system's recommendations (log probability).


• It measures the difference between RBM's initial and final

guesses for features.

The whole point is to make RBM's recommendations as close as possible to

what users actually like, which is done by tweaking its guesses for

important movie features (hidden nodes).

For instance, suppose RBM initially thought that "action" is not important,

but users love action movies. Through Contrastive Divergence and Gibbs

Sampling, it would learn to adjust its guesses and find that "action" is

indeed an important feature for making recommendations. The gradient

formula helps fine-tune these adjustments.

In a simplified way, think of RBM as a movie recommender that initially

guesses what features are important in movies, and through learning and

adjustments, it refines its recommendations to match what users enjoy.

Working of RBM – Illustrative Example –

Consider – Mary watches four movies out of the six available movies and

rates four of them. Say, she watched m1, m3, m4 and m5 and likes m3,

m5 (rated 1) and dislikes the other two, that is m1, m4 (rated 0), whereas the

other two movies – m2, m6 are unrated. Now, using our RBM, we will

recommend one of these movies for her to watch next. Say –

• m3, m5 are of ‘Drama’ genre.

• m1, m4 are of ‘Action’ genre.

• ‘Dicaprio’ played a role in m5.

• m3, m5 have won ‘Oscar.’

• ‘Tarantino’ directed m4.

• m2 is of the ‘Action’ genre.


• m6 is of both the genres ‘Action’ and ‘Drama’, ‘Dicaprio’ acted in it

and it has won an ‘Oscar’.

We have the following observations –

• Mary likes m3, m5 and they are of the ‘Drama’ genre, so she probably

likes ‘Drama’ movies.

• Mary dislikes m1, m4 and they are of the ‘Action’ genre, so she

probably dislikes ‘Action’ movies.

• Mary likes m3, m5 and they have won an ‘Oscar’, so she

probably likes ‘Oscar’-winning movies.

• Since ‘Dicaprio’ acted in m5 and Mary likes it, she will

probably like a movie in which ‘Dicaprio’ acted.

• Mary does not like m4, which is directed by ‘Tarantino’, so she

probably dislikes any movie directed by ‘Tarantino’.

Therefore, based on the observations and the details of m2, m6; our

RBM recommends m6 to Mary (‘Drama’, ‘Dicaprio’ and ‘Oscar’ matches both

Mary’s interests and m6). This is how an RBM works and hence is used in

recommender systems.
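The reasoning above can be caricatured in a few lines of Python (this is only the feature-matching intuition, not an actual trained RBM; the feature sets are taken from the example):

```python
# Features of the two unrated movies, per the example above.
features = {
    "m2": {"Action"},
    "m6": {"Action", "Drama", "Dicaprio", "Oscar"},
}
liked_features = {"Drama", "Oscar", "Dicaprio"}   # inferred from m3, m5
disliked_features = {"Action", "Tarantino"}       # inferred from m1, m4

def score(movie):
    """Liked-feature matches minus disliked-feature matches."""
    f = features[movie]
    return len(f & liked_features) - len(f & disliked_features)

best = max(features, key=score)
print(best)  # m6: three liked features outweigh one disliked one
```

In the actual RBM, these likes and dislikes are not hand-coded rules; the hidden units learn to play the role of the feature sets, and the recommendation falls out of the reconstruction of the unrated visible units.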

[Figure: Working of RBM]

Thus, RBMs are used to build Recommender Systems.
