Neural Unit 2
Solution 2:
(Worked table of targets $y_t$, outputs $y$, and errors $y_t - y$; the total error is $\sum_{i=1}^{n}(y_{t_i} - y_i)$.)
Hint: Use the uniform distribution, Xavier (normal and uniform), and He initialization techniques, and find the range of the initial weights. A sketch of these schemes follows below.
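As a rough illustration (not from the notes), here is a minimal NumPy sketch of these initialization schemes; the function name init_weights and the fan_in/fan_out parameters are illustrative choices, not a standard API:

```python
import numpy as np

def init_weights(fan_in, fan_out, method="xavier_uniform", seed=0):
    """Draw a (fan_in, fan_out) weight matrix in the range each scheme prescribes."""
    rng = np.random.default_rng(seed)
    if method == "uniform":
        # Plain uniform init in a small symmetric range, e.g. [-1/sqrt(fan_in), 1/sqrt(fan_in)].
        limit = 1.0 / np.sqrt(fan_in)
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    if method == "xavier_uniform":
        # Xavier/Glorot uniform: limit = sqrt(6 / (fan_in + fan_out)).
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    if method == "xavier_normal":
        # Xavier/Glorot normal: std = sqrt(2 / (fan_in + fan_out)).
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return rng.normal(0.0, std, size=(fan_in, fan_out))
    if method == "he":
        # He (Kaiming) normal, suited to ReLU layers: std = sqrt(2 / fan_in).
        std = np.sqrt(2.0 / fan_in)
        return rng.normal(0.0, std, size=(fan_in, fan_out))
    raise ValueError(f"unknown method: {method}")
```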
Associative memory
Question: What is Associative Memory? Explain its types.
It is also called Content Addressable Memory (CAM).
This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by a specific address or location.
How does the memory write a word?
1. When a word is written into associative memory, no address is given.
2. The memory is capable of finding an empty location in which to store the word.
How does the memory read a word?
1. When a word is to be read from associative memory, the content of the word, or part of it, is specified.
2. The memory locates all words that match the specified content and marks them for reading.
It takes relatively less time to find an item based on content than by address.
Example: the key register K selects which bits of the argument register A take part in the comparison (a 1 in K means "compare this bit", a 0 means "ignore it").
A: 101 111100
K: 111 000000
Word 1: 100 111100 → no match (0)
Word 2: 101 000001 → match (1)
Only the first three bits are compared here, so Word 2 matches A and Word 1 does not.
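A minimal Python sketch of this masked matching, assuming bit-string inputs; cam_match is an illustrative name, not a standard API:

```python
def cam_match(argument, key, words):
    """Return indices of stored words that match `argument` on the bits where `key` is 1.

    All inputs are equal-length bit strings; a 0 in `key` means "ignore this bit".
    """
    return [i for i, word in enumerate(words)
            if all(k == "0" or a == w
                   for a, w, k in zip(argument, word, key))]

# The example above: only the first three bits are compared (K = 111 000000).
words = ["100111100", "101000001"]
print(cam_match("101111100", "111000000", words))  # -> [1], i.e. Word 2 matches
```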
The following are the two types of associative memory:
1. Auto Associative Memory
2. Hetero Associative Memory
Auto Associative memory
Training the network with two missing entries:
$x = [0\ 0\ 1\ 1]$
$z = xW = [0\ 0\ 1\ 1]\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = [2\ 2\ 2\ 2]$
$y = [1\ 1\ 1\ 1]$ → Correct response
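A short NumPy sketch of this recall step, assuming the stored pattern is s = [1, 1, 1, 1] and the Hebbian weight matrix is W = sᵀs (variable names are illustrative):

```python
import numpy as np

s = np.array([1, 1, 1, 1])            # stored bipolar pattern
W = np.outer(s, s)                    # Hebbian weight matrix W = s^T s (all ones here)

x = np.array([0, 0, 1, 1])            # test vector with two missing entries
z = x @ W                             # net input: [2, 2, 2, 2]
y = np.where(z > 0, 1, np.where(z < 0, -1, 0))   # bipolar threshold activation
print(y)                              # [1 1 1 1] -> the stored pattern is recalled
```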
Hetero Associative memory
In this case:
1) The training input vector x and the target output vector t are different.
2) Hebb's rule or the delta rule is used to find the weight matrix.
Algorithm:
Step 1: Set all initial weights to zero.
Step 2: For each input vector, adjust the weights by
(1) $w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j$ (Hebb rule)
OR
(2) $W = \sum_i S_i^T t_i$
Step 3: Calculate the net input $z = x_1 w_1 + x_2 w_2 + \dots + x_k w_k$
Step 4: Take the activation function as
$y = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z = 0 \\ -1 & \text{if } z < 0 \end{cases}$
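A minimal NumPy sketch of Steps 1-4, assuming bipolar training pairs stored as the rows of S and T; train_hetero and recall are illustrative names:

```python
import numpy as np

def train_hetero(S, T):
    """Hebbian outer-product training, W = sum_i s_i^T t_i (Steps 1 and 2)."""
    W = np.zeros((S.shape[1], T.shape[1]))   # Step 1: all weights zero
    for s, t in zip(S, T):
        W += np.outer(s, t)                  # Step 2: w_ij += x_i * y_j
    return W

def recall(x, W):
    z = x @ W                                # Step 3: net input z = x W
    return np.where(z > 0, 1, np.where(z < 0, -1, 0))   # Step 4: activation
```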
Example:
Train a hetero-associative network to store the given bipolar input vectors $S = (S_1, S_2, S_3, S_4)$ with the output vectors $t = (t_1, t_2)$. Also test the performance of the network with missing and mistaken data.

S1   S2   S3   S4  |  t1   t2
 1   -1   -1   -1  |  -1    1
 1    1   -1   -1  |  -1    1
-1   -1   -1    1  |   1   -1
-1   -1    1    1  |   1   -1
Solution:
(1) For the pair $S = (1, -1, -1, -1)$ and $t = (-1, 1)$:
$W_1 = S^T t = \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}$
Testing with this pair: $y = (-1, 1)$ → Correct response
Case II: With mistaken data,
$x = [-1\ 1\ 1\ -1]$
Net input:
$z = xW = [-1\ 1\ 1\ -1]\begin{bmatrix} -4 & 4 \\ -2 & 2 \\ 2 & -2 \\ 4 & -4 \end{bmatrix} = [0\ 0]$
$y = [0\ 0]$ → the network gives no decision, so the mistaken input is not recognized.
(Here W is the complete weight matrix, the sum of the outer products of all four training pairs.)
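Reusing the train_hetero and recall sketch above, the whole example can be checked numerically (a sketch, assuming the same four training pairs):

```python
S = np.array([[ 1, -1, -1, -1],
              [ 1,  1, -1, -1],
              [-1, -1, -1,  1],
              [-1, -1,  1,  1]])
T = np.array([[-1,  1],
              [-1,  1],
              [ 1, -1],
              [ 1, -1]])

W = train_hetero(S, T)                      # rows: [-4 4], [-2 2], [2 -2], [4 -4]
print(recall(S[0], W))                      # [-1  1] -> correct response
print(recall(np.array([-1, 1, 1, -1]), W))  # [0 0]  -> mistaken input not recognized
```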
Concept of Optimization
1. Batch Gradient Descent (BGD)
In the BGD algorithm, the whole dataset is used for every weight update, so the descent toward the minimum is less noisy and less random.
2. Stochastic Gradient Descent (SGD)
In SGD, the weights are updated from one training sample at a time (batch size = 1):
$w_i(\text{new}) = w_i(\text{old}) - \alpha \dfrac{\partial E}{\partial w_i}$
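A minimal sketch of SGD on a linear model with squared error, assuming the rows of X are training samples; sgd is an illustrative name, not a library routine:

```python
import numpy as np

def sgd(X, y, alpha=0.01, epochs=100, seed=0):
    """Per-sample SGD on a linear model with squared error E = (x.w - y)^2 / 2."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):    # one randomly chosen sample per update
            grad = (X[i] @ w - y[i]) * X[i]  # dE/dw for this single sample
            w -= alpha * grad                # w_new = w_old - alpha * dE/dw
    return w
```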
3. Mini-Batch Gradient Descent (MBGD)
In this variant of gradient descent, instead of taking all the training data, only a subset of the dataset is used to calculate the loss function. Since a batch of data is used instead of the whole dataset, fewer iterations are needed.
The mini-batch gradient descent algorithm is faster than both the stochastic gradient descent and batch gradient descent algorithms, and it is more efficient and robust than the earlier variants of gradient descent.
Moreover, the cost function in mini-batch gradient descent is noisier than in the batch gradient descent algorithm but smoother than in the stochastic gradient descent algorithm.
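A minimal sketch of the mini-batch update, again on a linear model with squared error; minibatch_gd and batch_size are illustrative choices:

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.01, batch_size=32, epochs=100, seed=0):
    """Mini-batch GD: average the gradient over a small batch per update."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]          # one mini-batch of indices
            grad = (X[b] @ w - y[b]) @ X[b] / len(b)     # batch-averaged dE/dw
            w -= alpha * grad
    return w
```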
Advantages
1. It leads to more stable convergence.
2. Gradient calculations are more efficient.
3. It requires less memory.
Disadvantages
1. Mini-batch gradient descent does not guarantee good convergence.
2. If the learning rate is too small, convergence will be slow; if it is too large, the loss function will oscillate or even diverge around the minimum.
Comparison: Cost Function & Time
Question: Explain the difference between Batch Gradient Descent,
Stochastic Gradient Descent & Mini Batch Gradient Descent.
4. AdaGrad (Adaptive Gradient Descent)
The adaptive gradient descent algorithm is slightly different from the other gradient descent algorithms: the learning rate is adapted for each parameter during training. The more a parameter has been changed (the larger its accumulated gradients), the smaller its learning rate becomes.
Disadvantages
1. If the neural network is deep, the accumulated decay drives the learning rate toward a very small number, which causes the dead neuron problem (those weights effectively stop learning).
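A minimal sketch of the AdaGrad update on the same linear model, showing how the accumulated squared gradients G shrink each parameter's effective learning rate; adagrad is an illustrative name:

```python
import numpy as np

def adagrad(X, y, alpha=0.1, epochs=100, eps=1e-8):
    """AdaGrad on a linear model: per-parameter rates shrink as gradients accumulate."""
    w = np.zeros(X.shape[1])
    G = np.zeros_like(w)                        # running sum of squared gradients
    for _ in range(epochs):
        grad = (X @ w - y) @ X / len(X)         # full-batch dE/dw for simplicity
        G += grad ** 2
        w -= alpha * grad / (np.sqrt(G) + eps)  # heavily-updated params take smaller steps
    return w
```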
Hopfield Network
Question: What is a Hopfield network? Explain its types.
(Figure: a Hopfield network, every unit connected to every other unit with weights $w_{ij}$; the weights are symmetric, e.g. $w_{12} = w_{21}$ and in general $w_{ij} = w_{ji}$.)
Training Algorithm: Discrete Hopfield Network