
Neural Networks: B.Tech 7th Semester Solved Papers, Feb 2022
Paper Code: PC-CSE-401-G

Note: Attempt five questions in all, selecting one question from each Section.
Question No. 1 is compulsory. All questions carry equal marks.
Q.1.(a) Hebbian learning. (3)

Ans. Hebbian Learning Rule: In Hebbian learning, weights between learning nodes are adjusted so that each weight better represents the relationship between the nodes. Nodes which tend to be positive or negative at the same time will have strong positive weights, while those which tend to be opposite will have strong negative weights. Nodes that are uncorrelated will have weights near zero. For example, if two nodes A and B are often simultaneously active, Hebbian learning will increase the connection strength between the two, so that excitation of either one tends to cause excitation of the other. On the other hand, if nodes A and C were of opposite activations at all times, then Hebbian learning would gradually decrease the connection in between below zero, so that an excited A or C would inhibit the other.

For this rule the learning signal is simply equal to the neuron's output. We have

    r = f(w_i^T x)    ...(i)

The increment Δw_i of the weight vector becomes

    Δw_i = c f(w_i^T x) x    ...(ii)

The single weight adjustment uses the following increment:

    Δw_ij = c f(w_i^T x) x_j    ...(iii)

This can be written as

    Δw_ij = c o_i x_j, for j = 1, 2, 3, ..., n    ...(iv)

Hebbian learning has four features:
(1) First, it is unsupervised;
(2) Second, it is a local learning rule, meaning that it can be applied to a network in parallel;
(3) Third, it is simple and therefore requires very little computation;
(4) Fourth, it is biologically plausible.
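As a sketch of equations (ii)-(iv), the update Δw_j = c f(w^T x) x_j can be written out directly. The bipolar sign activation and the learning constant c below are illustrative assumptions, not taken from the paper:

```python
# Hebbian update: dw_j = c * f(w^T x) * x_j  (equations (iii)/(iv) above).
# The sign activation and the constant c are illustrative assumptions.
def sign(net):
    return 1.0 if net >= 0.0 else -1.0

def hebbian_step(w, x, c=0.1):
    o = sign(sum(wj * xj for wj, xj in zip(w, x)))  # neuron output o = f(w^T x)
    return [wj + c * o * xj for wj, xj in zip(w, x)]

# Inputs that are repeatedly active together grow a strong positive weight.
w = [0.0, 0.0]
for _ in range(10):
    w = hebbian_step(w, [1.0, 1.0])
print(w)
```

Presenting the opposite pattern [1.0, -1.0] repeatedly would instead drive the second weight negative, matching the inhibitory case described above.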
Q.1.(b) Reinforcement learning. (3)

Ans. Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behaviour or path they should take in a specific situation. Reinforcement learning differs from supervised learning in that in supervised learning the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer, but the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.
Q.1.(c) Feedforward vs feedback Networks. (3)

Ans. Difference between RNN and feed-forward neural network:
(1) In contrast to feedforward networks, recurrent neural networks share a single set of weight parameters across all network layers, i.e., across time steps. Learning can still be achieved by adjusting these weights using backpropagation and gradient descent.
(2) Feedforward neural networks continuously feed information from input to output, whereas recurrent neural networks constantly feed data back into the input for further processing and final output.
(3) Recurrent neural networks contain a feedback loop that allows data to be recycled back into the input before being forwarded again for further processing and final output, whereas feedforward neural networks just forward data from input to output. Data can only flow in one direction in feedforward networks; data from prior levels can't be saved because of this forward-travelling pattern, hence there is no internal state or memory. An RNN, on the other hand, uses a loop for cycling through the data, allowing it to keep track of both old and new information.
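The contrast in point (3) can be sketched in a few lines: a feedforward unit is stateless, while a recurrent unit reuses the same weights at every step and carries a hidden state. The weights below are illustrative, not from the paper:

```python
import math

def feedforward(x, w):
    # Output depends only on the current input; nothing is remembered.
    return math.tanh(w * x)

def recurrent(xs, w_in, w_rec):
    # Output depends on the whole sequence through the hidden state h,
    # and the same two weights are reused at every time step.
    h = 0.0
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

print(feedforward(1.0, 0.5))
print(recurrent([1.0, 0.0, 1.0], 0.5, 0.8))
```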
Q.1.(d) What is the need of Activation Functions in ANN? (3)

Ans. The activation function is the most important factor in a neural network: it decides whether or not a neuron will be activated and its output transferred to the next layer. This simply means that it will decide whether the neuron's input to the network is relevant or not in the process of prediction. For this reason, it is also referred to as a threshold or transformation for the neurons, and it helps the network converge.
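The gating role described above can be illustrated with a simple threshold activation; the weights and threshold are made-up values for illustration:

```python
# A neuron "fires" only if its weighted net input clears the activation threshold.
def step(net, threshold):
    return 1 if net >= threshold else 0

def neuron(inputs, weights, threshold=0.5):
    net = sum(i * w for i, w in zip(inputs, weights))
    return step(net, threshold)  # activation decides whether the output passes on

print(neuron([1, 0], [0.6, 0.6]))  # net = 0.6, clears 0.5: activated (1)
print(neuron([0, 0], [0.6, 0.6]))  # net = 0.0, below 0.5: stays at rest (0)
```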
Q.1.(e) Explain the term Linear Separability in classification. (3)

Ans. Linear Separability: When a linear hyperplane exists to place the instances of one class on one side and those of the other class on the other side of the plane, this is called Linear Separability. For example, in the case of the OR gate we can draw a linear hyperplane to separate the inputs whose corresponding outputs are 1 on one side and the inputs whose corresponding outputs are 0 on the other side of the plane.

Table: OR Gate
Input A  Input B  Output
0        0        0
0        1        1
1        0        1
1        1        1

[Fig.: The OR Gate. Fig.: The X-OR function]

But not all classification problems are linearly separable. For example, the EX-OR problem is not linearly separable, because EX-OR is a non-equality gate:

Table: The EX-OR Gate
Input A  Input B  Output
0        0        0
0        1        1
1        0        1
1        1        0

In this case we can't draw any hyperplane that divides the space so that the different classes are on different sides. To cope with a problem which is not linearly separable, a Multilayer Perceptron is required.

Section - A

Q.2. What are biological neurons? How do they resemble artificial neuron models? Compare and contrast biological neurons with Artificial Neural Networks. (15)

Ans. Biological Neuron: The elementary nerve cell called a neuron is the fundamental building block of a biological neural network. The three main components of a biological neuron are:
1. A neuron cell body called the soma;
2. Branching extensions called dendrites for receiving input; and
3. An axon that carries the neuron's output to the dendrites of other neurons.

In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses, so that the influence of one neuron on another changes. The neuron is able to respond to the total of its inputs aggregated within a short time interval called the period of latent summation.

Parts of a neuron:
Dendrites: accept inputs.
Soma: processes the inputs.
Axon: turns the processed inputs into outputs.
Synapses: the electrochemical contacts between neurons.

[Fig.: Biological Neuron]

Similarities between biological neural networks and ANNs:
(1) Biological neural networks process information in parallel; this is also true of artificial neural networks.
(2) Learning in biological neural networks is through past experiences, which improve their performance level; this is also true of artificial neural networks.
(3) Learning in biological neural networks involves adjustment of the synaptic connections; learning in artificial neural networks is also by adjustment of weights. A weight in an artificial neural network is similar to a synapse in a biological neural network.
(4) Information transmission in biological neural networks uses electrical signals. In artificial neural networks, electrical signals are also used in information transmission.
(5) Information storage in biological neural networks is at the synapses; in artificial neural networks, information is likewise stored, in the weights matrix.

The comparisons between ANNs and biological networks are as follows:

1. ANN: ANNs have usually been limited to about 10,000 units with hundreds of connections per unit.
   Biological: The human brain is extremely large for a neural network, containing 10^11 neurons with 10^15 interconnections.
2. ANN: The memory has an organized behaviour, and the data retrieved follows an ordered sequence.
   Biological: The memory follows a pattern of distributed representation, and data retrieval depends on the retention capacity of the brain.
3. ANN: Learning takes place by a formulated set of rules, and hence the system is less faulty. The processing elements receive many signals and sum up the weighted inputs. These networks can be retrained in case of significant damage (i.e. loss of data).
   Biological: Learning takes place arbitrarily. Biological networks are fault tolerant, and even after a traumatic loss other neurons can take over the functions of damaged cells.
4. ANN: Redundancy can increase the reliability of the system, allowing it to function even when some of the neural units are destroyed.
   Biological: Redundancy is used: when some cells die, the system appears to perform the same. It can counteract sources of noise in biological systems.
5. ANN: The strength of a neuron depends on whether it is active or inactive, and it has relatively simpler interconnections.
   Biological: The strength of a neuron depends on the active chemicals present, and connections grow stronger or weaker as a result of structure rather than individual synapses.
6. ANN: ANNs focus on learning a single task.
   Biological: Biological systems have broad capabilities and can address many different types of tasks.
7. ANN: Artificial systems are slow to converge and usually require hundreds or thousands of training presentations for learning to take place.
   Biological: Biological systems have the property of being able to learn in as little as one training presentation. A face that is viewed once can be recognized again.

Q.3.(a) Explain various architecture models of ANN. (8)

Ans. There exist five basic types of neuron connection architecture:
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
1. Single-layer feed-forward network:

[Fig.: Single-layer feed-forward network]

In this type of network we have only two layers, the input layer and the output layer, but the input layer does not count because no computation is performed in this layer. The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken. After this, the neurons collectively give the output layer to compute the output signals.

2. Multilayer feed-forward network:

[Fig.: Multilayer feed-forward network]

This network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one or more hidden layers makes the network computationally stronger. It is a feed-forward network because of the information flow through the input function and the intermediate computations used to determine the output z. There are no feedback connections in which outputs of the model are fed back into itself.

3. Single node with its own feedback:

[Fig.: Single node with its own feedback: INPUT feeds the node, whose OUTPUT loops back as FEEDBACK]

When outputs can be directed back as inputs to the same layer or a preceding layer's nodes, the result is a feedback network. Recurrent networks are feedback networks with closed loops. The figure shows a single recurrent network having a single neuron with feedback to itself.

4. Single-layer recurrent network:

[Fig.: Single-layer recurrent network]
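The two feed-forward arrangements above can be sketched as follows; the weight values are arbitrary illustrative numbers, and tanh stands in for whatever activation the layers use:

```python
import math

def layer(inputs, weights):
    # One fully connected layer: each row of `weights` drives one output neuron.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

x = [1.0, -1.0]

# 1. Single-layer feed-forward: the input layer does no computation,
#    only the output layer computes.
single_out = layer(x, [[0.5, 0.3]])

# 2. Multilayer feed-forward: a hidden layer sits between input and output,
#    with no feedback connections anywhere.
hidden = layer(x, [[0.5, 0.3], [-0.2, 0.7]])
multi_out = layer(hidden, [[1.0, -1.0]])

print(single_out, multi_out)
```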
The above network is a single-layer network with a feedback connection, in which the processing element's output can be directed back to itself, to another processing element, or to both. A recurrent neural network is a class of artificial neural network where connections between nodes form a directed graph along a sequence. This allows it to exhibit dynamic temporal behaviour for a time sequence. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.

5. Multilayer recurrent network:

[Fig.: Multilayer recurrent network]

In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network. They perform the same task for every element of a sequence, with the output being dependent on the previous computations. Inputs are not needed at each time step. The main feature of a Recurrent Neural Network is its hidden state, which captures some information about a sequence.

Q.3.(b) What are Activation functions and why we need these functions in ANN? Also write the significance of non-linear functions used in ANN. (7)

Ans. Activation Functions: Activation functions for the hidden units are needed to introduce non-linearity into the network. Without non-linearity, hidden units would not make networks more powerful than plain Perceptrons (which do not have any hidden units, just input and output units). The reason is that a linear function of linear functions is again a linear function. However, it is the non-linearity (i.e., the capability to represent non-linear functions) that makes multilayer networks so powerful. Almost any non-linear function does the job, except polynomials. The differentiable, bounded activation functions are the useful ones: the sigmoid functions such as logistic and tanh, and the Gaussian function, are the most common choices. Functions such as tanh or arctan that produce both positive and negative values tend to yield faster training than functions that produce only positive values such as logistic, because of better numerical conditioning. For hidden units, threshold activation functions are difficult to train because the error function is stepwise constant, hence the gradient either does not exist or is zero, making it impossible to use backpropagation or more efficient gradient-based training methods. Sigmoid units are easier to train than threshold units: with sigmoid units, a small change in the weights will usually produce a change in the outputs, which makes it possible to tell whether that change in the weights is good or bad; with threshold units, a small change in the weights will often produce no change in the outputs.

Why do we need activation functions: A neural network without an activation function is essentially just a linear regression model. The activation function does the non-linear transformation of the input, making the network capable of learning and performing more complex tasks.

Significance: The significance of the activation function lies in making a given model learn and execute difficult tasks. Further, a non-linear activation function allows the stacking of multiple layers of neurons to create a deep neural network, which is required to learn complex data sets with high accuracy.

Section - B

Q.4. Discuss the architecture of the McCulloch-Pitts Neural Network model in detail. Also explain the McCulloch-Pitts model to design logic networks for the AND and OR logic functions. (15)

Ans. McCulloch-Pitts (MCP) Neuron Model: This early model of an artificial neuron was introduced by Warren McCulloch and Walter Pitts in 1943. The MCP neuron:
(i) Has a number of inputs I_1, ..., I_n. These inputs represent the incoming signals received from the neuron's synapses. So I_i can either be 1, which corresponds to the presence of an incoming signal from the ith connection, or 0, which corresponds to the absence of a signal from the ith connection.
(ii) Produces an output y, which can either be 1, corresponding to the neuron sending a signal, or 0, corresponding to the neuron remaining at rest. Mathematically, an MCP neuron is a function which takes an n-tuple of 1's and 0's and produces a 1 or a 0. In order to quantify the influence each synapse has on the MCP neuron, we assign a weight w_i to each input I_i.
(iii) Has weights that are decimal numbers, where the size of the weight corresponds to the amount of influence the associated connection has on the MCP neuron. Positive weights correspond to excitatory synapses, and negative weights correspond to inhibitory synapses.

Each input is multiplied by the corresponding weight, and these weighted inputs are then added together.
The weighted sum and output are

    Sum = Σ_{i=1}^{n} w_i I_i    ...(i)
    y = f(Sum)    ...(ii)

where w_i I_i means w_i multiplied by I_i. This corresponds to the summing of the incoming signals that is assumed to be performed by a neuron. Finally, there is a threshold T, a decimal number that is compared to the weighted sum of the signals. This corresponds to the apparent threshold value that the sum of the received signals must exceed in order to cause a neuron to "fire". If w_1 I_1 + w_2 I_2 + ... + w_n I_n >= T, then the MCP neuron produces an output of 1. If w_1 I_1 + w_2 I_2 + ... + w_n I_n < T, then the MCP neuron produces a 0. So an MCP neuron is completely determined by its weights and threshold. By choosing various combinations of weights and threshold, we can produce several different MCP neurons.

[Fig.(a): Symbolic illustration of a linear threshold gate: inputs I_1, I_2, I_3 with weights w_1, w_2, w_3 feed a summing unit with threshold T and output y]

AND logic function: The truth table for the AND gate is

x_1  x_2  Y (Output)
0    0    0
0    1    0
1    0    0
1    1    1

[Fig.(1): McCulloch-Pitts neuron for the AND function]

A McCulloch-Pitts neuron to implement the AND function is shown in fig.(1). The threshold of the neuron is 2. The output of the neuron Y can be written as Y = f(y_in). The reason for choosing threshold = 2 is explained below.

The net input is given by y_in = Σ weight * input. Since the inputs are I_1, I_2 and the weights are 1 and 1 respectively,

    y_in = 1*I_1 + 1*I_2 = I_1 + I_2

From this the activation function of the output neuron can be formed:

    Y = f(y_in) = 1 if y_in >= 2
                  0 if y_in < 2

Now if I_1 = 1 and I_2 = 1, with w_1 = w_2 = 1, then y_in = 2 and Y = f(y_in) = 1. If I_1 = 1 and I_2 = 0, then y_in = 1 and Y = f(y_in) = 0, since y_in = 1 < 2. Similarly for the other combinations. If T ≠ 2, the correct answer cannot be obtained when both inputs are 1. The threshold value can be chosen according to the application.

XOR function: The truth table for the XOR gate is

x_1  x_2  Y (Output)
0    0    0
0    1    1
1    0    1
1    1    0
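A minimal sketch of the MCP unit just described: the AND neuron uses the weights (1, 1) and threshold 2 from fig.(1), and XOR is built from two layers of such units. The AND NOT weights shown are one consistent choice for a threshold of 2, an assumption rather than values from the paper:

```python
def mcp(inputs, weights, T):
    # McCulloch-Pitts unit: fire (1) iff the weighted sum reaches threshold T.
    return 1 if sum(w * i for w, i in zip(weights, inputs)) >= T else 0

def and_gate(x1, x2):
    return mcp((x1, x2), (1, 1), T=2)       # weights 1, 1 and threshold 2

def xor_gate(x1, x2):
    z1 = mcp((x1, x2), (2, -1), T=2)        # z1 = x1 AND NOT x2
    z2 = mcp((x1, x2), (-1, 2), T=2)        # z2 = x2 AND NOT x1
    return mcp((z1, z2), (2, 2), T=2)       # y  = z1 OR z2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), xor_gate(a, b))
```

No single MCP unit can realize XOR, which is the non-separability point made earlier; a two-layer decomposition is needed.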
XOR can be modeled using AND NOT and OR:

    x_1 XOR x_2 = (x_1 AND NOT x_2) OR (x_2 AND NOT x_1)    ...(1)

[Fig.(2): XOR function realized with two layers of McCulloch-Pitts neurons]

This explains the network shown above: the first layer performs the two AND NOTs, and the second layer performs the OR. Both Z neurons and the Y neuron have a threshold of 2.

Q.5. Explain the perceptron learning rule in detail. (15)

Ans. Perceptron Learning Rule:
1. Perceptrons are trained on examples of desired behavior.
2. This rule is for the supervised learning mode.
3. The desired behavior can be summarized by a set of input-output pairs, where p is an input to the network and d is the corresponding correct (target) output.

The objective is to reduce the error e, which is the difference between the neuron response a and the target vector d. The perceptron learning rule learnp calculates desired changes to the perceptron's weights and biases given an input vector p and the associated error e. The target vector d must contain values of either 0 or 1, as perceptrons (with hard-limiting transfer functions) can only output such values.

Each time learnp is executed, the perceptron has a better chance of producing the correct outputs. The perceptron rule is proven to converge on a solution in a finite number of iterations if a solution exists. If the bias is not used, learnp works to find a solution by altering only the weight vector w to point toward input vectors to be classified as 1, and away from vectors to be classified as 0. This results in a decision boundary that is perpendicular to w and which properly classifies the input vectors. There are three conditions that can occur for a single neuron once an input vector p is presented and the network's response a is calculated:

CASE 1: If an input vector is presented and the output of the neuron is correct (a = d, and e = d - a = 0), then the weight vector w is not altered.

CASE 2: If the neuron output is 0 and should have been 1 (a = 0 and d = 1, and e = d - a = 1), the input vector p is added to the weight vector w. This makes the weight vector point closer to the input vector, increasing the chance that the input vector will be classified as a 1 in the future.

CASE 3: If the neuron output is 1 and should have been 0 (a = 1 and d = 0, and e = d - a = -1), the input vector p is subtracted from the weight vector w. This makes the weight vector point farther away from the input vector, increasing the chance that the input vector is classified as a 0 in the future.

The perceptron learning rule can be written more concisely in terms of the error e = d - a and the change to be made to the weight vector:

Case 1: If e = 0, then make the change Δw equal to 0.
Case 2: If e = 1, then make the change Δw equal to p^T.
Case 3: If e = -1, then make the change Δw equal to -p^T.

All three cases can then be written with a single expression:

    Δw = (d - a) p^T = e p^T    ...(i)

We can get the expression for the change in a neuron's bias by noting that the bias is simply a weight that always has an input of 1:

    Δb = (d - a)(1) = e    ...(ii)

For the case of a layer of neurons we have:

    ΔW = (d - a)(p)^T = e(p)^T    ...(iii)
    Δb = (d - a) = e    ...(iv)

The perceptron learning rule can therefore be summarized as follows:

    W_new = W_old + e p^T    ...(v)
    b_new = b_old + e    ...(vi)

where e = (d - a).
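The rule w_new = w_old + e*p, b_new = b_old + e can be sketched end-to-end. Training it on the OR function is an illustrative choice: the problem is linearly separable, so convergence in a finite number of passes is guaranteed:

```python
def predict(w, b, p):
    # Hard-limiting transfer function: output 1 if w.p + b >= 0, else 0.
    return 1 if sum(wi * pi for wi, pi in zip(w, p)) + b >= 0 else 0

def learnp(w, b, p, d):
    # Perceptron rule: e = d - a, then w <- w + e*p and b <- b + e.
    e = d - predict(w, b, p)
    return [wi + e * pi for wi, pi in zip(w, p)], b + e

# Train on the OR function; a few passes suffice for a separable problem.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(10):
    for p, d in data:
        w, b = learnp(w, b, p, d)
print([predict(w, b, p) for p, _ in data])  # [0, 1, 1, 1]
```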
Section - C

Q.6.(a) Differentiate between supervised and unsupervised learning. (8)

Ans. Difference between supervised and unsupervised learning:

1. Supervised learning algorithms are trained using labelled data. Unsupervised learning algorithms are trained using unlabelled data.
2. A supervised learning model takes direct feedback, to check whether it is predicting the correct output or not. An unsupervised learning model does not take any feedback.
3. A supervised learning model predicts the output. An unsupervised learning model finds the hidden patterns in data.
4. In supervised learning, input data is provided to the model along with the output. In unsupervised learning, only input data is provided to the model.
5. The goal of supervised learning is to train the model so that it can predict the output when it is given new data. The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
6. Supervised learning needs supervision to train the model. Unsupervised learning does not need any supervision to train the model.
7. Supervised learning can be categorized into Classification and Regression problems. Unsupervised learning can be classified into Clustering and Association problems.
8. Supervised learning can be used for those cases where we know the inputs as well as the corresponding outputs. Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
9. A supervised learning model produces an accurate result. An unsupervised learning model may give a less accurate result as compared to supervised learning.
10. Supervised learning is not close to true Artificial Intelligence, as we first train the model for each data point, and only then can it predict the correct output. Unsupervised learning is closer to true Artificial Intelligence, as it learns similarly to how a child learns daily routine things from experience.
11. Supervised learning includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. Unsupervised learning includes algorithms such as Clustering, KNN, and the Apriori algorithm.

Q.6.(b) Explain Gradient Descent in detail. (7)

Ans. Gradient descent is an optimization algorithm which is commonly used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent specifically acts as a barometer, gauging accuracy with each iteration of parameter updates. Until the function is close to or equal to zero, the model will continue to adjust its parameters to yield the smallest possible error. Once machine learning models are optimized for accuracy, they can be powerful tools for artificial intelligence (AI) and computer science applications.

Types of Gradient Descent: There are three types of gradient descent learning algorithms: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.

(1) Batch gradient descent: Batch gradient descent sums the error for each point in a training set, updating the model only after all training examples have been evaluated. This process is referred to as a training epoch. While this batching provides computational efficiency, it can still have a long processing time for large training datasets, as it needs to store all of the data in memory. Batch gradient descent also usually produces a stable error gradient and convergence, but sometimes that convergence point isn't the most ideal, finding the local minimum rather than the global one.

(2) Stochastic gradient descent: Stochastic gradient descent (SGD) runs a training epoch for each example within the dataset, updating the parameters one example at a time. Since only one training example needs to be held at a time, the examples are easier to store in memory. While these frequent updates can offer more detail and speed, they can result in losses in computational efficiency when compared to batch gradient descent. The frequent updates produce noisy gradients, but this can also be helpful in escaping the local minimum and finding the global one.

(3) Mini-batch gradient descent: Mini-batch gradient descent combines concepts from both batch gradient descent and stochastic gradient descent. It splits the training dataset into small batches and performs an update on each of those batches. This approach strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent.
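The three variants differ only in how many examples feed each weight update. A sketch on a one-parameter least-squares fit; the data, learning rate, and epoch count are illustrative assumptions:

```python
# Fit y = w*x by least squares under the three update schedules described above.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]    # generated from y = 2x

def grad(w, batch):
    # Gradient of the mean squared error over one batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batches, w=0.0, lr=0.05, epochs=50):
    for _ in range(epochs):
        for batch in batches:                # one weight update per batch
            w -= lr * grad(w, batch)
    return w

w_batch = train([data])                      # batch: one update per epoch
w_sgd   = train([[pair] for pair in data])   # stochastic: update per example
w_mini  = train([data[:2], data[2:]])        # mini-batch: update per sub-batch
print(w_batch, w_sgd, w_mini)                # each approaches 2.0
```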
Q.7.(a) Explain the Error Backpropagation algorithm in detail. (9)

Ans. Error Backpropagation Training Algorithm: The layered network maps the input vector z into the output vector o as follows:

    o = N(z)

where N denotes a composite nonlinear matrix operator. For the two-layer net, the mapping z -> o can be represented as a mapping within a mapping:

    o = Γ[W Γ[Vz]]

and, related to the hidden layer mapping z -> y, we have Γ[Vz] = y. Note that the right arrows denote the mapping of one space into another. Each of the mappings is performed by a single layer of the layered network. The operator Γ is a nonlinear diagonal operator.

Algorithm: Given are P training pairs {z_1, d_1, z_2, d_2, ..., z_P, d_P}, where z_i is (I x 1), d_i is (K x 1), and i = 1, ..., P. Note that the Ith component of each z_i is of value -1, since the input vectors have been augmented. The size J - 1 of the hidden layer having outputs y is selected. Note that the Jth component of y is of value -1, since the hidden layer outputs have also been augmented; y is (J x 1) and o is (K x 1).

Step 1: η > 0 and E_max are chosen. Weights W and V are initialized at small random values; W is (K x J), V is (J x I). Set q <- 1, p <- 1, E <- 0.

Step 2: The training step starts here. The input is presented and the layers' outputs are computed:
    z <- z_p, d <- d_p
    y_j <- f(v_j^T z), for j = 1, 2, ..., J
where v_j, a column vector, is the jth row of V, and
    o_k <- f(w_k^T y), for k = 1, 2, ..., K
where w_k, a column vector, is the kth row of W.

Step 3: The error value is computed:
    E <- (1/2)(d_k - o_k)^2 + E, for k = 1, 2, ..., K

Step 4: The error signal vectors δ_o and δ_y of both layers are computed; δ_o is (K x 1) and δ_y is (J x 1). The error signal terms of the output layer are
    δ_ok = (1/2)(d_k - o_k)(1 - o_k^2), for k = 1, 2, ..., K
The error signal terms of the hidden layer are
    δ_yj = (1/2)(1 - y_j^2) Σ_{k=1}^{K} δ_ok w_kj, for j = 1, 2, ..., J

Step 5: The output layer weights are adjusted:
    w_kj <- w_kj + η δ_ok y_j, for k = 1, 2, ..., K and j = 1, 2, ..., J

Step 6: The hidden layer weights are adjusted:
    v_ji <- v_ji + η δ_yj z_i, for j = 1, 2, ..., J and i = 1, 2, ..., I

Step 7: If p < P, then p <- p + 1, q <- q + 1, and go to Step 2; otherwise, go to Step 8.

Step 8: The training cycle is completed. For E < E_max, terminate the training session and output the weights W, V, q and E. If E > E_max, then set E <- 0, p <- 1, and initiate a new training cycle by going to Step 2.

[Fig.: Flowchart of the error backpropagation training algorithm: initialize W and V; for each submitted pattern compute the layer responses y = Γ[Vz] and o = Γ[WΓ[Vz]]; accumulate the cycle error E <- E + (1/2)||d - o||^2; calculate the errors δ_o and δ_y; adjust the output layer weights W <- W + η δ_o y^T and the hidden layer weights V <- V + η δ_y z^T.]
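A compact sketch of Steps 2 to 6 for a two-layer network in plain Python. The tanh activation (whose derivative gives the (1 - o^2) factors), the layer sizes, the learning rate, and the XOR training set are all illustrative assumptions; the 1/2 factor in the δ terms is folded into the learning rate here:

```python
import math, random

def forward(V, W, z):
    y = [math.tanh(sum(v * zi for v, zi in zip(row, z))) for row in V]  # hidden
    o = [math.tanh(sum(w * yj for w, yj in zip(row, y))) for row in W]  # output
    return y, o

def train_step(V, W, z, d, eta=0.1):
    y, o = forward(V, W, z)
    # Step 4: error signals (for f = tanh, f' = 1 - f^2).
    do = [(dk - ok) * (1.0 - ok * ok) for dk, ok in zip(d, o)]
    dy = [(1.0 - yj * yj) * sum(do[k] * W[k][j] for k in range(len(W)))
          for j, yj in enumerate(y)]
    # Step 5: adjust output layer weights; Step 6: adjust hidden layer weights.
    for k in range(len(W)):
        for j in range(len(y)):
            W[k][j] += eta * do[k] * y[j]
    for j in range(len(V)):
        for i in range(len(z)):
            V[j][i] += eta * dy[j] * z[i]

# XOR with bipolar targets; each input carries a fixed -1 bias component.
random.seed(0)
pairs = [((x1, x2, -1.0), [1.0 if x1 != x2 else -1.0])
         for x1 in (0.0, 1.0) for x2 in (0.0, 1.0)]
V = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W = [[random.uniform(-1, 1) for _ in range(4)]]
for _ in range(2000):
    for z, d in pairs:
        train_step(V, W, z, d)
print([forward(V, W, z)[1][0] for z, _ in pairs])
```

Each pass drives the squared error downhill; whether XOR is fully learned depends on the random start, since backpropagation offers no convergence guarantee.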
[J. fech 7'" Se111este1; Solved papers. Feb -2022 19
18_ Neural Networks
Since the minimization of the error requires the weight changes to be in the
negativegradient direction, we 1ake
MV, = - 17v' E ...(vi)
where 11 is a +ve equation from (v) and (vi)
t.W, = 11[ d; - f(wf x)]J'<wf x)x ... (vii)
input Recall of Presented Recall of
or, for the single weight the adjustment becomes pattern Associated Dlstortod Associated
presonted Pattern Pattern
t.w,1 = 'l')(d, - o) (' (net,) x,, ...(viii) Pattern
for ; = 1. 2 . ..... 11 Hetero-associa\lve memory
Auto-associative memory
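The single-weight adjustment (viii) can be sketched for one continuous perceptron. The bipolar activation f(net) = 2/(1 + e^-net) - 1, whose derivative is (1/2)(1 - o^2), matches the f'(net) used above; the data values and the function name `delta_rule_step` are assumptions for illustration.

```python
import numpy as np

def f(net):
    return 2.0 / (1.0 + np.exp(-net)) - 1.0   # bipolar continuous activation

def delta_rule_step(w, x, d, eta):
    o = f(w @ x)
    fprime = 0.5 * (1.0 - o ** 2)             # f'(net) for this activation
    return w + eta * (d - o) * fprime * x     # delta_w = eta (d - o) f'(net) x

w = np.zeros(3)                               # assumed initial weights
x = np.array([1.0, -1.0, 0.5])                # assumed input pattern
d = 0.8                                       # assumed desired response
for _ in range(500):
    w = delta_rule_step(w, x, d, eta=0.5)
```

After repeated updates the output f(w^t x) approaches the desired response d, which is the least-squared-error condition the derivation above starts from.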
This delta rule was introduced by McClelland and Rumelhart in 1986. The rule parallels the discrete perceptron training rule and is also called the continuous perceptron training rule.
Q.8.(b) Write storage and Retrieval algorithm for associative memory. (8)
Ans. Storage and Retrieval Algorithm: The storage algorithm is called the encoding process, and the retrieval algorithm is called the decoding process.
Given: P bipolar binary vectors {s^(1), s^(2), ..., s^(P)}, where each s^(m) is n x 1 for m = 1, 2, ..., P, and the initializing vector v^0 is n x 1.
Storage Algorithm:
Step 1: The weight matrix W is (n x n). Initialize W ← 0, m ← 1.
Step 2: Vector s^(m) is stored. The storage rule for updating the weight matrix is
W ← W + s^(m) s^(m)t - I
or, entrywise,
w_ij = (1 - δ_ij) Σ_{m=1}^{P} s_i^(m) s_j^(m)   [δ_ij is the Kronecker delta]
Step 3: If m < P then m ← m + 1 and go to Step 2; otherwise go to Step 4.
Step 4: Storage is complete. Output the weight matrix W.
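Steps 1-4 of the storage (encoding) algorithm can be sketched as follows; the two example patterns and the function name `store` are assumed for illustration.

```python
import numpy as np

def store(patterns):
    # W = sum_m s(m) s(m)^t - P*I, i.e. w_ij = (1 - delta_ij) sum_m s_i s_j
    n = patterns.shape[1]
    W = np.zeros((n, n))                  # Step 1: W <- 0
    for s in patterns:                    # Steps 2-3: one pattern per pass
        W += np.outer(s, s) - np.eye(n)
    return W                              # Step 4: output the weight matrix

S = np.array([[1.0, -1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0, -1.0]])    # assumed bipolar patterns (P=2, n=4)
W = store(S)
```

The subtraction of the identity in each pass zeroes the diagonal, which is exactly the (1 - δ_ij) factor in the entrywise formula.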
Note: If unipolar binary vectors s^(m), for m = 1, 2, ..., P, are used instead of bipolar binary vectors, then the entries of the original unipolar vectors are replaced by 2s_i^(m) - 1, where i = 1, 2, ..., n, so that
w_ij = (1 - δ_ij) Σ_{m=1}^{P} (2s_i^(m) - 1)(2s_j^(m) - 1)
Section-D
Q.8.(a) Explain different types of Associative memories in detail with example. (7)
Ans. There are two classes of associative memory:
- Auto-associative memory
- Hetero-associative memory
(1) Auto-Associative Memory: An auto-associative memory, also known as an auto-associative correlator, is used to retrieve a previously stored pattern that most closely resembles the current pattern.
(2) Hetero-Associative Memory: A hetero-associative memory, also known as a hetero-associative correlator, is used to retrieve a pattern that is, in general, different from the input pattern, not only in content but possibly also in type and format.
Retrieval Algorithm:
(1) The cycle counter k is initialized, k ← 1. Within the cycle, the update counter i is initialized, i ← 1, and the network is initialized as v ← v^0.
(2) The integers 1, 2, ..., n are arranged in an ordered random sequence α_1, α_2, ..., α_n. If a randomized update sequence is not used, then α_1 = 1, α_2 = 2, ..., α_n = n, i.e. α_i = i, at each update cycle.
(3) Neuron α_i is updated by computing
net_αi = Σ_{j=1}^{n} w_{αi j} v_j
v_αi^new = sgn(net_αi)
(4) If i < n then i ← i + 1 and go to step (3); otherwise go to step (5).
(5) If v_αi^new = v_αi for i = 1, 2, ..., n, then no update has occurred and retrieval is complete; the output is k and v_1^new, v_2^new, ..., v_n^new. Otherwise, k ← k + 1 and go to step (2).
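Steps (1)-(5) of the retrieval (decoding) algorithm can be sketched with a non-randomized update sequence (α_i = i); the stored pattern and the function name `retrieve` are assumptions for illustration.

```python
import numpy as np

def retrieve(W, v0, max_cycles=100):
    v = v0.copy()
    for k in range(1, max_cycles + 1):         # step (1): cycle counter k
        changed = False
        for i in range(len(v)):                # steps (2)-(4), alpha_i = i
            net = W[i] @ v
            new = 1.0 if net > 0 else (-1.0 if net < 0 else v[i])
            if new != v[i]:
                v[i] = new
                changed = True
        if not changed:                        # step (5): stable, retrieval done
            return k, v
    return max_cycles, v

s = np.array([1.0, -1.0, 1.0, -1.0])           # assumed stored pattern
W = np.outer(s, s) - np.eye(4)                 # storage rule from Q.8(b)
k, v = retrieve(W, np.array([1.0, -1.0, 1.0, 1.0]))  # last bit distorted
```

With one stored pattern and one distorted bit, the asynchronous updates correct the flipped bit in the first cycle and confirm stability in the second.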

Q.9. Explain bidirectional associative memory architecture, its association encoding and decoding in detail. (15)

Ans. Bidirectional Associative Memory: Bidirectional associative memory is a hetero-associative, content-addressable memory consisting of two layers. It uses the forward and backward information flow to produce an associative search for stored stimulus-response associations (Kosko 1987, 1988). Consider that stored in the memory are p vector association pairs known as
{(a^(1), b^(1)), (a^(2), b^(2)), ..., (a^(p), b^(p))}   ...(i)
When the memory neurons are activated, the network evolves to a stable state of two-pattern reverberation, with one pattern at the output of each layer. The stable reverberation corresponds to a local energy minimum. The network's dynamics involves two layers of interaction. Because the memory processes information in time and involves bidirectional data flow, it differs in principle from a linear associator, although both networks are used to store association pairs. It also differs from the recurrent autoassociative memory in its update mode.
Fig.: Discrete-time bidirectional associative memory expanded diagram (layer A is connected to layer B through W, and layer B back to layer A through W^t).
Association Encoding and Decoding: The coding of information into the bidirectional associative memory is done using the customary outer product rule, or by adding p cross-correlation matrices. The formula for the weight matrix is
W = Σ_{i=1}^{p} a^(i) b^(i)t   ...(ii)
where a^(i) and b^(i) are bipolar binary vectors which are members of the i'th pair, yielding the following weight values:
w_ij = Σ_{m=1}^{p} a_i^(m) b_j^(m)   ...(iii)
Suppose one of the stored patterns, a^(m'), is presented to the memory. The retrieval proceeds as follows:
b = Γ[(Σ_{m=1}^{p} b^(m) a^(m)t) a^(m')]   ...(iv)
which further reduces to
b = Γ[n b^(m') + Σ_{m≠m'} b^(m) a^(m)t a^(m')]   ...(v)
The vector inside the brackets in Equation (v) contains a single term n b^(m') additive with the noise term η of value
η = Σ_{m≠m'} b^(m) (a^(m)t a^(m'))   ...(vi)
Assuming temporarily the orthogonality of the stored patterns a^(m), for m = 1, 2, ..., p, the noise term η reduces to zero. Therefore, immediate stabilization and the exact association b = b^(m') occur within only a single pass through layer B. If the input vector is a distorted version of pattern a^(m'), however, the stabilization at b^(m') is not imminent and depends on many factors such as the HD (Hamming distance) between the key vector and the prototype vectors, as well as on the orthogonality or HD between the vectors b^(i), for i = 1, 2, ..., p.
To gain better insight into the memory performance, let us look at the noise term η in (vi) as a function of the HD between the stored prototypes a^(m), for m = 1, 2, ..., p. Note that
two vectors containing ±1 elements are orthogonal if and only if they differ in exactly n/2 bits. Therefore, if HD(a^(m), a^(m')) = n/2 for m = 1, 2, ..., p, m ≠ m', then η = 0 and perfect retrieval in a single pass is guaranteed.
If a^(m), for m = 1, 2, ..., p, and the input vector a^(m') are somewhat similar, so that HD(a^(m), a^(m')) < n/2 for m = 1, 2, ..., p, m ≠ m', the scalar products in parentheses in Equation (vi) tend to be positive, and a positive contribution to the entries of the noise vector η is likely to occur. For this to hold, we need to assume the statistical independence of the vectors b^(m), for m = 1, 2, ..., p. Pattern b^(m') thus tends to be positively amplified in proportion to the similarity between the prototype patterns a^(m) and a^(m'). If the patterns are dissimilar rather than similar, and the HD value is above n/2, then the negative contributions in parentheses in Equation (vi) negatively amplify the pattern b^(m'). Thus, a complement -b^(m') may result under the conditions described.
The summary of the thresholding bit updates for the outputs of layer A can be written as
Δa_i = 2 for Σ_{j=1}^{m} w_ij b_j > 0; 0 for Σ_{j=1}^{m} w_ij b_j = 0; -2 for Σ_{j=1}^{m} w_ij b_j < 0, for i = 1, 2, ..., n   ...[(x)a]
and for the outputs of layer B they result as
Δb_j = 2 for Σ_{i=1}^{n} w_ij a_i > 0; 0 for Σ_{i=1}^{n} w_ij a_i = 0; -2 for Σ_{i=1}^{n} w_ij a_i < 0, for j = 1, 2, ..., m   ...[(x)b]

Stability Considerations: Let us look at the stability of the updates within the bidirectional associative memory. In terms of a recursive update mechanism, the retrieval consists of the following steps:
First forward pass: a^1 = Γ[W b^0]
First backward pass: b^2 = Γ[W^t a^1]
Second forward pass: a^3 = Γ[W b^2]
...
k'th backward pass: b^k = Γ[W^t a^(k-1)]   ...(vii)
As the updates in (vii) continue and the memory comes to its equilibrium at the k'th step, we have a^k → b^(k+1) → a^(k+2), and a^(k+2) = a^k. In such a case, the memory is said to be bidirectionally stable. This corresponds to the energy function reaching one of its minima, after which any further decrease of its value is impossible. Let us propose the energy function for minimization by this system in transition as
E(a, b) = -(1/2) a^t W b - (1/2) b^t W^t a   ...(viii)
The reader may easily verify that this expression reduces to
E(a, b) = -a^t W b   ...(ix)
The gradients of the energy (ix) with respect to a and b can be computed, respectively, as
∇_a E(a, b) = -W b   ...[(xi)a]
∇_b E(a, b) = -W^t a   ...[(xi)b]
Let us evaluate the energy changes during a single pattern recall. The bitwise update expressions (x) translate into the following energy changes due to the single-bit increments Δa_i and Δb_j:
ΔE_ai(a, b) = -(Σ_{j=1}^{m} w_ij b_j) Δa_i, for i = 1, 2, ..., n   ...[(xii)a]
ΔE_bj(a, b) = -(Σ_{i=1}^{n} w_ij a_i) Δb_j, for j = 1, 2, ..., m   ...[(xii)b]
Inspecting the right-hand sides of equations (xii) and comparing them with the update rules in (x) leads to the conclusion that ΔE ≤ 0. As with the recurrent autoassociative memory, the energy changes are nonpositive. Since E is a function bounded from below according to the inequality
E(a, b) ≥ -Σ_{i=1}^{n} Σ_{j=1}^{m} |w_ij|
the memory converges to a stable point. The point is a local minimum of the energy function, and the memory is said to be bidirectionally stable. Moreover, no restrictions
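The nonpositivity of the energy changes can be checked numerically. This sketch is an illustration of the claim ΔE ≤ 0, not code from the text: it assumes two random bipolar pairs, synchronous layer-by-layer updates, and a thresholding rule in which a tie keeps the previous bit.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.sign(rng.standard_normal((2, 6)))       # assumed bipolar pairs (p = 2)
B = np.sign(rng.standard_normal((2, 4)))
W = A.T @ B                                     # outer-product encoding, eq. (ii)

def sgn(x, prev):
    # thresholding with ties keeping the previous bit
    return np.where(x > 0, 1.0, np.where(x < 0, -1.0, prev))

a = np.sign(rng.standard_normal(6))             # arbitrary bipolar start vector
b = sgn(W.T @ a, np.ones(4))
energies = [-a @ W @ b]                         # E(a, b) = -a^t W b, eq. (ix)
for _ in range(10):
    a = sgn(W @ b, a)                           # forward pass
    b = sgn(W.T @ a, b)                         # backward pass
    energies.append(-a @ W @ b)
```

Each layer update picks the sign that minimizes its term of -a^t W b (ties leave the bit unchanged), so the recorded energies form a non-increasing sequence.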
exist regarding the choice of the matrix W, so any arbitrary real n x m matrix W will result in a bidirectionally stable memory.
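The encoding rule (ii) and the bidirectional decoding passes (vii) can be sketched end to end. The example pairs, the tie-keeping sgn, and the function names `encode`/`decode` are assumptions for illustration.

```python
import numpy as np

def encode(A, B):
    # eq. (ii): W = sum_i a(i) b(i)^t, with the pairs as rows of A and B
    return A.T @ B

def sgn(x, prev):
    return np.where(x > 0, 1.0, np.where(x < 0, -1.0, prev))  # tie keeps bit

def decode(W, a, max_passes=20):
    b = sgn(W.T @ a, np.ones(W.shape[1]))       # first pass through layer B
    for _ in range(max_passes):
        a_new = sgn(W @ b, a)                   # forward pass: a = G[W b]
        b_new = sgn(W.T @ a_new, b)             # backward pass: b = G[W^t a]
        if np.array_equal(a_new, a) and np.array_equal(b_new, b):
            break                               # bidirectionally stable
        a, b = a_new, b_new
    return a, b

A = np.array([[1.0, -1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0, -1.0]])          # assumed layer-A patterns
B = np.array([[1.0, 1.0], [-1.0, 1.0]])         # assumed layer-B patterns
W = encode(A, B)
a_out, b_out = decode(W, A[0].copy())
```

Presenting the stored key a^(1) recalls its associated pair b^(1) in a single pass through layer B, and the subsequent passes confirm bidirectional stability.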
