
Unit I

Architecture of Neural Network


Introduction
 Deep Learning is the most exciting and powerful branch of
Machine Learning. It's a technique that teaches computers to do
what comes naturally to humans: learn by example.

 Deep learning is a key technology behind driverless cars,


enabling them to recognize a stop sign. It is the key to voice
control in consumer devices like phones, tablets, TVs, and
hands-free speakers.

 In deep learning, a computer model learns to perform


classification tasks directly from images, text, or sound.

 Deep Learning models can be used for a variety of complex


tasks:
Introduction
1) Artificial Neural Networks (ANN) for clustering, classification
& pattern recognition.
2) Convolutional Neural Networks (CNN) for image classification.
3) Recurrent Neural Networks (RNN) for time series analysis.

 In this unit, we’ll try to cover everything related to Artificial


Neural Networks or ANN.

 The human brain is far more complex and, unfortunately, many
of its cognitive functions are still not well understood. But the
more we learn about the human brain, the better the computational
models that can be developed and put to practical use.

 ANN tries to replicate and implement the most basic functions
of the human brain.
Introduction
 The first mathematical model of a neuron was proposed by
McCulloch and Pitts in 1943.
Why Artificial Neural Networks?
There are two basic reasons why we are interested in building
artificial neural networks (ANNs):

Technical viewpoint: Some problems such as character recognition


or the prediction of future states of a system require massively
parallel and adaptive processing.

Biological viewpoint: ANNs can be used to replicate and simulate


components of the human (or animal) brain, thereby giving us insight
into natural information processing.
How does our brain work?
Question: Explain the working of biological neural network.
Dendrite: It receives signals from other
neurons.

Soma (cell body): It sums all the incoming
signals to generate the net input.

Axon: When the sum reaches a threshold
value, the neuron fires, and the signal
travels down the axon to the other neurons.

Synapses: The point of interconnection of
one neuron with other neurons. The
amount of signal transmitted depends
upon the strength (synaptic weights) of the
connections.
Artificial Neural Networks
An artificial neuron, also known as a perceptron, is the basic unit of
the neural network. In simple terms, it is a mathematical
function based on a model of biological neurons. It can also
be seen as a simple logic gate with binary outputs.

Each artificial neuron has the following main functions:

1) Takes inputs from the input layer

2) Weighs them separately and sums them up

3) Passes this sum through a nonlinear function to produce the
output.
Definition: Artificial Neural Networks
An artificial neural network consists of a pool of simple
processing units which communicate by sending signals to each
other over a large number of weighted connections.

Correlation between Biological & Artificial Neuron

Biological Neuron        Artificial Neuron
Neuron                   Neuron / Node
Dendrites                Input
Synapse                  Input Weight
Soma (or cell body)      Net Input
Axon                     Output
Model of an Artificial Neuron
Question: Explain the model of Artificial Neural Network
Consider a simple artificial neural network,

 Here x_1, x_2, ..., x_n are the n inputs to the artificial neuron

 w_1, w_2, ..., w_n are the weights attached to the input links

[In the case of a biological neuron, all inputs are received by the
dendrites, the soma sums them, and the neuron produces an output if the
sum is greater than a threshold value. The signal travels through the
axon and passes through synapses, which may accelerate or retard it.]
Model of an Artificial Neuron
 In this case, the acceleration and retardation of the input signals is
modelled by the weights.

 The net input z received at the soma (cell body) of the artificial
neuron is

z = x_1 w_1 + x_2 w_2 + ... + x_n w_n = Σ_{i=1}^{n} w_i x_i

 To generate the final output y, the sum is passed through an activation
function (also called a transfer function or squash function), which
releases the output:

y = φ(z)
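To make the model concrete, here is a minimal NumPy sketch of a single artificial neuron; the inputs, weights, and step threshold are made-up values for illustration.

```python
import numpy as np

def artificial_neuron(x, w, phi):
    """Compute y = phi(z), where z = x1*w1 + ... + xn*wn."""
    z = np.dot(w, x)           # net input received at the soma
    return phi(z)              # activation / transfer / squash function

# Made-up example: three inputs, three weights, step activation (threshold 0)
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.9])
step = lambda z: 1 if z > 0 else 0
print(artificial_neuron(x, w, step))   # z = 0.4 + 0.15 - 0.18 = 0.37 -> 1
```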
Architecture of Artificial Neural Network
Question: Explain the architecture of Artificial Neural Network
Consider a simple artificial neural network,

 Artificial Neural Networks can be viewed as weighted directed graphs


in which artificial neurons are nodes, and directed edges with weights
are connections between neuron outputs and neuron inputs.

 The Artificial Neural Network receives information from the external
world in the form of patterns and images, represented as vectors.
Working of Artificial Neural Network
 Each input is multiplied by its corresponding weight. Weights are the
information used by the neural network to solve a problem. Typically,
a weight represents the strength of the interconnection between neurons
inside the Neural Network.

 The weighted inputs are all summed up inside the computing unit
(artificial neuron). In case the weighted sum is zero, a bias is added to
make the output non-zero or to scale up the system response. The bias
has a weight, and its input is always equal to 1.

 The sum can take any numerical value, from zero up to
infinity. To limit the response and arrive at the desired value, a
threshold value is set up. For this, the sum is passed forward through
an activation function.

 The activation function serves as the transfer function that produces
the desired output. There are linear as well as nonlinear activation
functions.
Important terms associated with model of Artificial Neuron
 Input Nodes: These are the nodes to which input signals are
applied. For example: 𝑥1 , 𝑥2 ,…𝑥𝑛 are input nodes.
 Output Nodes: These are the nodes from which final output is
taken. For example: 𝑦1 , 𝑦2 ,…𝑦𝑛 are output nodes.
 Weights: Each neuron in the network is interconnected with other
neurons by direct communication links. Each link is associated
with a weight, which contains information about the signal
flowing through the link.
 Hidden Layers: These are the nodes that lie in between input and
output node layers.
 Bias: The bias is a constant signal value which is added to a
neuron. It is just like another weighted link with a constant signal
value, generally taken as 1.
 Activation function: Each neuron or node has an activation
function associated with it, which determines the input-output
relationship for the neuron. It can be linear or nonlinear.
Differences between ANN and BNN
Question: Explain the difference between ANN & BNN
ANN                                                     BNN

It is short for Artificial Neural Network.              It is short for Biological Neural Network.

Processing speed is fast compared to a Biological       They are slow in processing information.
Neural Network.

Allocation of storage to a new process is strictly      Allocation of storage to a new process is easy,
irreplaceable, as the old location is saved for the     as it is added just by adjusting the
previous process.                                       interconnection strengths.

Processes operate in sequential mode.                   Processes can operate in massively parallel
                                                        fashion.

If any information gets corrupted in the memory,        Information is distributed throughout the network
it cannot be retrieved.                                 into sub-nodes; even if it gets corrupted, it can
                                                        be retrieved.

The activities are continuously monitored by a          There is no control unit to monitor the
control unit.                                           information being processed in the network.
Bias
 Bias is a constant which helps the model in a way that it can fit best
for the given data.
 Each neuron has Bias.
 Bias increases flexibility of the model.
 In the case of bias b, the net input is given by,
z = x_1 w_1 + x_2 w_2 + ... + x_n w_n + b
Bias
 Bias can also be added if the net input is zero.
For example:

[Figure: the same neuron shown without bias and with bias b = 1]
Importance of Activation Function
Question: Define Activation function and explain any three activation functions.

 Definition: Activation function is an internal state of a neuron used to
convert the input signal of a node of an ANN into an output signal.

 It is also called a Transfer function. It can also be attached
between two Neural Networks.

 Activation functions are important for an ANN to learn and understand
complex patterns.

 The main function is to introduce non-linear properties
into the network.

 The activation function decides whether to fire a particular neuron
or not.

 The main purpose is to convert the input signal of a node into an
output signal. That output signal is then used as input for the next
layer in the stack.

 The non-linear activation function helps the model to understand
the complexity and give accurate results.

https://youtu.be/icZItWxw7AI
Difficulties in ANN without an Activation Function
 If we do not apply an activation function, the output signal will
simply be a linear function.

 A linear function is just a polynomial of degree one. Linear equations
are easy to solve, but they are limited in their complexity.

 A neural network without an activation function would simply be a
linear regression model, which has limited power and does not
perform well most of the time.

 Also, without an activation function our neural network would not be
able to learn and model other complicated kinds of data such as images,
videos, audio, speech, etc.
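The core claim here, that layers without activation functions collapse into a single linear model, is easy to verify numerically. A small sketch (the random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((4, 4)) for _ in range(3))
x = rng.standard_normal(4)

# Three "layers" with no activation function in between...
deep = W3 @ (W2 @ (W1 @ x))
# ...are exactly one linear layer: the network gains no expressive power.
shallow = (W3 @ W2 @ W1) @ x
print(np.allclose(deep, shallow))   # True
```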
Desirable features of an activation function
 Vanishing Gradient problem: If the slope of the function is (near)
zero, the gradients vanish and learning becomes slow; this is called the
vanishing gradient problem, and a good activation function should avoid it.

 Zero-Centered: The output of the activation function should be
symmetrical at zero so that the gradients do not shift in a particular
direction.

 Computational Expense: Activation functions are applied after every
layer and need to be calculated millions of times in deep networks.
Hence, they should be computationally inexpensive to calculate.

 Differentiable: As mentioned, neural networks are trained using the
gradient descent process; hence the layers in the model need to be
differentiable, or at least differentiable in parts. This is a necessary
requirement for a function to work as an activation function layer.
Important Activation Functions
1) Step function (or Heaviside function / Thresholding function)

φ(z) = 1  if z > θ
       0  if z ≤ θ

where θ is called the threshold value.

[Plot: step function; its flat regions illustrate the vanishing gradient problem]

 In this case the output signal is either 1 or 0, resulting in the
neuron being on or off.
Important Activation Functions
2) Sigmoidal function

φ(z) = 1 / (1 + e^{-αz})

[Plot: sigmoid function; its saturating tails illustrate the vanishing gradient problem]

 In this case the output signal varies between 0 and 1.
Important Activation Functions
3) Hyperbolic tangent function

φ(z) = tanh(z) = (e^{z} - e^{-z}) / (e^{z} + e^{-z})

[Plot: tanh function; its saturating tails illustrate the vanishing gradient problem]

 In this case the output signal varies between -1 and 1.
Important Activation Functions
4) ReLU function

ReLU(z) = max(0, z)

[Plot: ReLU function; its zero slope for z < 0 illustrates the vanishing gradient problem]
Important Activation Functions
5) Leaky ReLU function

Leaky ReLU(z) = max(0.1z, z)
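All five activation functions above can be written directly in NumPy. This sketch assumes θ = 0 for the step function and uses the 0.1 slope quoted above for Leaky ReLU:

```python
import numpy as np

def step(z, theta=0.0):
    return np.where(z > theta, 1, 0)      # output is either 1 or 0

def sigmoid(z, alpha=1.0):
    return 1 / (1 + np.exp(-alpha * z))   # output varies between 0 and 1

def tanh(z):
    return np.tanh(z)                     # output varies between -1 and 1

def relu(z):
    return np.maximum(0, z)               # zero for negative net input

def leaky_relu(z):
    return np.maximum(0.1 * z, z)         # small slope instead of a flat zero

z = np.array([-2.0, 0.0, 2.0])
for f in (step, sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))
```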


Vanishing Gradient Problem
Question: What is the vanishing gradient problem? Explain one activation
function which is used to avoid this problem.
[Hint: Consider the ReLU function, explained on the next slide]
Vanishing Gradient problem:
 Neural Networks are trained using the gradient descent process.
 Gradient descent includes the backward propagation step, which is
basically the chain rule applied to get the change in weights needed to
reduce the loss after every iteration.
 The chain rule passes through multiple layers during backpropagation.
If the derivative of the activation function lies between 0 and 1, then
several such values get multiplied together to calculate the gradient of
the initial layers. This reduces the value of the gradient for the
initial layers, and those layers are not able to learn properly.
 In other words, their gradients tend to vanish. This is called the
vanishing gradient problem.
 Graphically, if the slope of the tangent is zero, the gradient
vanishes; this is the vanishing gradient problem.
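The effect is easy to demonstrate: the sigmoid derivative y(1 - y) is at most 0.25, and the chain rule multiplies roughly one such factor per layer. A sketch with a made-up net input:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Derivative of the sigmoid: phi'(z) = y(1 - y), which is at most 0.25
z = 0.5                                   # made-up net input
factor = sigmoid(z) * (1 - sigmoid(z))    # about 0.235

# The chain rule multiplies roughly one such factor per layer,
# so the gradient reaching the early layers shrinks exponentially.
for n_layers in (2, 5, 10, 20):
    print(n_layers, factor ** n_layers)
```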
Importance of ReLU function
Question: Why ReLU is the most used Activation Function?
 ReLU stands for Rectified Linear Unit and it
is defined as ReLU(z) = max(0, z).

 The main advantage of using the ReLU
function over other activation functions is
that it does not activate all the neurons at the
same time: a neuron is deactivated only
when its net input is less than zero.

 Computational Simplicity: The rectifier function is trivial to implement,


requiring only a max() function.
 Representational Sparsity: An important benefit of the rectifier function is that
it is capable of outputting a true zero value.

 Linear Behavior: A neural network is easier to optimize when its behavior is


linear or close to linear.
Important remark about Activation Functions
 For hidden layers if you are not sure which activation function to use,
just use ReLU as your default choice.
 If the slope of the tangent is zero, learning becomes slow; this is
called the vanishing gradient problem. Except for Leaky ReLU, all of the
functions above have this vanishing gradient problem.
 Comparison between the Step function and the Sigmoid function on one
particular example:

[Figure: result from the Step function vs result from the Sigmoid function]
Types of Artificial Neural Network
Question: Explain the types of Artificial Neural Network

Question: Explain Feed forward Neural Network.

Artificial neural networks are computational models that work


similarly to the functioning of a human nervous system.

There are several kinds of artificial neural networks based on the


mathematical operations and a set of parameters required to
determine the output.

There are 3 types of Artificial Neural Network:

1. Feedforward Neural Network
2. Backpropagation Neural Network
3. Recurrent (Feedback) Network
Feedforward Neural Network

This neural network is one of the simplest forms of ANN, where


the data or the input travels in one direction.

This neural network may or may not have hidden layers.

In simple words, it has a front propagated wave and no


backpropagation.

Example:
Consider a single layer feed-forward
network. Here, the net input is
calculated and fed to the output.
Feedforward Neural Network

Applications

1. Feedforward neural networks are used in computer vision and


speech recognition

2. These kinds of neural networks are responsive to noisy data
and easy to maintain.

There are two types of Feed Forward Neural Network

1. Single Layer Feed Forward Neural Network

2. Multi Layer Feed Forward Neural Network


Single Layer Feed Forward Network
 In this type of network, we have
only two layers, i.e. input layer
and output layer but the input
layer does not count because no
computation is performed in this
layer.
 The output layer is formed when
different weights are applied on the
input nodes and the cumulative
effect per node is taken.
 After this, the neurons
collectively produce the output
signals of the output layer.
Multilayer Feed Forward Network
 This network has a hidden
layer that is internal to the
network and has no direct
contact with the external
layer.
 The existence of one or
more hidden layers enables
the network to be
computationally stronger.
 There are no feedback
connections in which
outputs of the model are
fed back into itself.
Backpropagation Neural Network

 Backpropagation in neural network is a short form for


“backward propagation of errors.” It is a standard method of
training artificial neural networks.

 It is the method of fine-tuning the weights of a neural network
based on the error rate obtained in the previous iteration.

 Proper tuning of the weights allows you to reduce error rates


and make the model reliable by increasing its generalization.

 This method helps calculate the gradient of a loss function with


respect to all the weights in the network.
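As a minimal sketch of the gradient computation that backpropagation performs, consider a single sigmoid neuron with a squared-error loss; the training pair and starting weights below are made-up:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up training pair and initial parameters
x, y_t = np.array([1.0, 0.5]), 1.0
w, b, alpha = np.array([0.2, -0.4]), 0.0, 0.5

# Forward pass
z = np.dot(w, x) + b
y = sigmoid(z)

# Backward pass: chain rule on the loss E = 0.5 * (y_t - y)^2
dE_dw = -(y_t - y) * y * (1 - y) * x
dE_db = -(y_t - y) * y * (1 - y)

# Gradient-descent step: fine-tune the weights to reduce the error
w, b = w - alpha * dE_dw, b - alpha * dE_db
print(w, b)
```

Repeating this forward/backward/update cycle over the training set is the fine-tuning of weights described above.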
Recurrent or Feedback Network

 An RNN, or feedback neural network, is a kind of ANN
model in which the outputs from neurons are used as feedback
to the neurons of the previous layer.

 In other words, the current output is considered as an input for


the next output.

 It is mainly used for dynamic information processing like time
series prediction, process control, and so on.

 Hopfield network and perceptron with feedback are the popular


types of this network.
Practice Examples
Q.1 Consider a single layer feed-forward network
having the weights and activation function shown in the figure.

Calculate what will be the output value y of the unit for each of the
following input patterns:
Examples
Q.2 Consider a single layer feed-forward network having weights w_1 = 1,
w_2 = 1 and the following activation function:

a) Test how the neural AND function works.

b) Suggest how to change either the weights or the threshold level of this
single unit in order to implement the logical OR function.
c) Suggest how to change either the weights or the threshold level of this
single unit in order to implement the logical XOR function.
[Hint: f(x, y) = x + y - 2xy]
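A quick way to check parts (a) and (b) is to run the single unit over all four input pairs; the thresholds below (1.5 for AND, 0.5 for OR) are one workable choice, not the only one.

```python
# Single unit with weights w1 = w2 = 1 and a step activation
def unit(x1, x2, theta, w1=1, w2=1):
    return 1 if x1 * w1 + x2 * w2 > theta else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # columns: inputs, AND output (theta = 1.5), OR output (theta = 0.5)
    print(x1, x2, unit(x1, x2, theta=1.5), unit(x1, x2, theta=0.5))
```

For part (c), no single choice of weights and threshold works, since XOR is not linearly separable; this is why the hint rewrites XOR as f(x, y) = x + y - 2xy.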
Examples
Q.3 Suppose we input the values 10, 30, 20 into the three
input units, and the activation function is the sigmoidal function
with learning rate α = 1. Then find the outputs.

Input units      Hidden units                      Output units
Unit  Output     Unit  Weighted Sum Input  Output  Unit  Weighted Sum Input  Output
I1    10         H1     7                  0.999   O1    1.0996              0.750
I2    30         H2    -5                  0.0067  O2    3.1047              0.957
I3    20
Supervised Learning, Unsupervised &
Reinforcement Learning
Question: Explain Supervised, Unsupervised and Reinforcement
Learning.
Supervised Learning
a) Happens in the presence of a supervisor.
b) The output is already known.
c) The machine is fed with a lot of input and output data, called a
labelled dataset.
d) A model is built to predict the outcomes.
e) It is a fast learning mechanism with high accuracy.
f) It includes regression and classification problems.
g) Examples: SVM, KNN, ANN, Decision Trees
Supervised Learning, Unsupervised &
Reinforcement Learning
Unsupervised Learning

a) Happens without the help of a supervisor.
b) There is no output mapping for the input.
c) It uses an unlabeled dataset.
d) It is used to detect hidden patterns or associations among data
items.
e) It is an independent learning process.
f) Used in clustering and association rule algorithms.
Supervised Learning, Unsupervised &
Reinforcement Learning
Reinforcement learning
a) Learns from feedback and past experience.
b) It is a long-term iterative process.
c) More feedback ⇒ more accurate system.
d) It is often modelled as a Markov Decision Process.
e) Used in robot training, self-driving cars, etc.
f) It is driven by rewards and penalties rather than labelled data.
Learning Rules
Question: Explain various learning rules with appropriate uses.
[Hint: Write theory of Hebb learning Rule, Perceptron learning Rule, Delta
Learning Rule & Competitive learning rules]
 As stated earlier, ANN is completely inspired by the way the biological
nervous system, i.e. the human brain, works. The most impressive
characteristic of the human brain is its ability to learn, and the same
feature is acquired by ANN.

 A learning rule, or learning process, is a method or a mathematical
logic that improves the Artificial Neural Network's performance when
applied over the network. Learning rules update the
weights and bias levels of a network as the network simulates a
specific data environment.

 Applying a learning rule is an iterative process. It helps a neural
network to learn from the existing conditions and improve its
performance.
Some Important Rules
 We know that, to change the input and output behavior of ANN
we need to adjust weights of the ANN model. Hence the
methods which modifies the weights are called Learning rules,
which are simply algorithms or equations.

 Following are some learning rules for the neural network


a) Hebbian Learning Rule
b) Perceptron Learning Rule
c) Delta Learning Rule (Widrow-Hoff Rule)
d) Competitive Learning Rule (Winner-takes-all)
Hebbian Learning Rule
 The Hebbian rule was the first learning rule introduced by
Donald Hebb in 1949.
 It is a kind of feed-forward, unsupervised learning.

 Hebb learning rule:


a) If two neighboring neurons are activated/deactivated at the same time,
then the weight connecting these neurons increases.

b) If two neurons are not activated/deactivated at the same time,
then the weight connecting these neurons decreases.

 Hence connections between two neurons might be strengthened


if the neurons fire at the same time and might weaken if they fire
at different times.
Hebbian Learning Rule
 Step 1: Initially, the weights are set to zero, i.e. 𝑤𝑖 = 0 for all
inputs 𝑖 = 1 to 𝑛 and 𝑛 is the total number of input neurons.

 Step 2: The updated weight and bias by the Hebb rule are given by,
w_i(new) = w_i(old) + α x_i y_t
b(new) = b(old) + y_t
where α is the learning rate, varying between 0 and 1, and the term
α x_i y_t is the change in weight.
 Step 3: Repeat Step 2 for each input vector.

 Note: The rule needs modification: if the product of input and output
always has the same sign, the weight keeps increasing without bound.
Example
Q. Implement Bipolar AND function with bias using Hebb
Learning Rule. [Take 𝜶 = 𝟏]
Solution:
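A minimal NumPy sketch of the computation, assuming the standard bipolar AND truth table with inputs and targets in {-1, +1}:

```python
import numpy as np

# Bipolar AND: inputs and target outputs in {-1, +1}
X  = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
yt = np.array([1, -1, -1, -1])

w, b, alpha = np.zeros(2), 0.0, 1.0   # Step 1: weights start at zero

for x, y in zip(X, yt):               # Steps 2-3: one pass over the inputs
    w = w + alpha * x * y             # w(new) = w(old) + a * x * y
    b = b + y                         # b(new) = b(old) + y
    print(x, y, w, b)                 # ends at w = [2, 2], b = -2
```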
Example
Q.1 Classify the given two-dimensional input patterns using the Hebb
Rule. Draw the network architecture.
[Given: the output of pattern 1 is 1 and the output of pattern 2 is -1,
α = 1]

Pattern 1:          Pattern 2:
-1 -1 -1            -1 -1 -1
-1  1  1             1 -1  1
-1 -1 -1             1 -1  1

Solution:
Pattern 1 input can be written as,
x_1 = [-1 -1 -1 -1 1 1 -1 -1 -1]

Pattern 2 input can be written as,
x_2 = [-1 -1 -1 1 -1 1 1 -1 1]

Consider the initial weight vector as,
w_0 = [0 0 0 0 0 0 0 0 0]

By the Hebb Rule,
w_i(new) = w_i(old) + α(x_i y)

w_1 = w_0 + 1 · (x_1^T · 1)   (consider the transpose so that the matrix
product is possible)

w_1 = [-1 -1 -1 -1 1 1 -1 -1 -1]^T

w_2 = w_1 + 1 · (x_2^T · y_2)

w_2 = [-1 -1 -1 -1 1 1 -1 -1 -1]^T + [-1 -1 -1 1 -1 1 1 -1 1]^T · (-1)

w_2 = [0 0 0 -2 2 0 -2 0 -2]^T

Hence the final network architecture is
[Figure: nine input nodes connected to a single output node with weights w_2]
Perceptron Learning Rule
 This rule is an error-correcting supervised learning algorithm
for single layer feedforward networks with a threshold activation
function, introduced by Rosenblatt.

 Being supervised in nature, to calculate the error there has to
be a comparison between the desired/target output and the actual
output. If any difference is found, then a change must be
made to the weights of the connections.
Perceptron Learning Rule
Mathematical Model:
 Let X be the input vector along with the target output y_t.

 Now the actual output y can be calculated by the following activation
function:

y = 1  if z = x_1 w_1 + x_2 w_2 + ... + x_n w_n > θ
    0  if z = x_1 w_1 + x_2 w_2 + ... + x_n w_n ≤ θ

where θ is the threshold.

 The updating of weights can be done in the following two cases:

Case I: when y_t ≠ y, then
w_i(new) = w_i(old) + α y_t x_i
b(new) = b(old) + α y_t

Case II: when y_t = y, then
there is no change in weights.
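A sketch of the rule as a training loop. To match the bipolar worked example that follows, the unit here outputs -1 instead of 0 below the threshold; the demonstration data is the bipolar AND function:

```python
import numpy as np

def train_perceptron(X, yt, alpha=1.0, theta=0.0, epochs=10):
    """Perceptron learning rule with bipolar targets in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, yt):
            z = np.dot(w, x) + b
            y = 1 if z > theta else -1     # actual (bipolar) output
            if t != y:                     # Case I: adjust weights and bias
                w = w + alpha * t * x
                b = b + alpha * t
            # Case II (t == y): no change in weights
    return w, b

# Bipolar AND as demonstration data
X  = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
yt = np.array([1, -1, -1, -1])
print(train_perceptron(X, yt))   # converges to w = [1, 1], b = -1
```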
Example
Q. Implement Bipolar AND function with bias using Perceptron
Learning Rule. [Take 𝛼 = 1]
Solution:
Delta Learning Rule
 It was developed by Widrow & Hoff.

 It is also called the LMS (least mean square) learning rule.

 It is used in supervised training models.

 It can be used with any differentiable activation function.

 For a single layer, the delta rule is the update rule of the
single layer perceptron learning rule.
Mathematical Model of Delta Learning Rule
 Δw = -α (dE/dw_i), where α is the learning rate

 E is the error between the target output y_t and the actual output y:

E = (1/2)(y_t - y)^2

Differentiating w.r.t. w_i:

dE/dw_i = (dE/dy) · (dy/dw_i) = -(y_t - y) φ'(z) x_i

∴ Δw = α (y_t - y) φ'(z) x_i
i.e. Δw = α · (target - actual) · φ'(z) · input

w_i(new) = w_i(old) + α (y_t - y) φ'(z) x_i
b(new) = b(old) + α (y_t - y) φ'(z)
Mathematical Model of Delta Learning Rule
Case I: φ(z) = z
w_i(new) = w_i(old) + α (y_t - y) x_i
b(new) = b(old) + α (y_t - y)

Case II: y = φ(z) = 1 / (1 + e^{-z})

∴ φ'(z) = e^{-z} / (1 + e^{-z})^2
        = [1 / (1 + e^{-z})] · [1 - 1 / (1 + e^{-z})]
∴ φ'(z) = y(1 - y)

w_i(new) = w_i(old) + α (y_t - y) y(1 - y) x_i
b(new) = b(old) + α (y_t - y) y(1 - y)
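A sketch of one delta-rule update for the sigmoid case (Case II above); the input, target, and starting weights below are made-up values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up sample and initial parameters
x, yt = np.array([1.0, 0.167]), 0.93
w, b, alpha = np.array([0.5, 0.5]), 0.0, 1.0

z = np.dot(w, x) + b
y = sigmoid(z)                          # actual output
delta = alpha * (yt - y) * y * (1 - y)  # a(yt - y) phi'(z), with phi' = y(1 - y)

w = w + delta * x                       # w(new) = w(old) + a(yt - y)y(1 - y)x
b = b + delta                           # b(new) = b(old) + a(yt - y)y(1 - y)
print(y, w, b)
```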
Examples on Delta Learning Rule
Q. Implement the Bipolar OR function with bias using the Delta Learning
Rule. Perform 1 epoch. [Take α = 0.1, φ(z) = z]
Solution:

Q. Find the weights of the following ANN model using the Delta Learning
Rule. [Take α = 1, φ(z) = 1 / (1 + e^{-z})]

x_1     x_2      y_t
1       0.167    0.93
0.83    0.583    0.57
1       0.916    0.3
Competitive Learning
 In competitive learning, the
neural network consists of a
single layer of output neurons.

 All the output neurons are


fully connected to the input
neurons.

 As the name suggests, here all


the output neurons compete
against each other for the right
to get fired or activated.
Winning Neuron
 For an output neuron to win the competition, its net input must
be maximum, among all the other output neurons.

 All the output neurons are fully connected to the input neurons.

 The output of the winning neuron is set to 1, while that of others


is set to 0.

Mathematically,

y_k = 1  if z_k > z_j for all j ≠ k
      0  otherwise

where z_k is the net input of the k-th neuron.
 This rule is called "Winner-takes-all" because only the winning
neuron is updated and the rest of the neurons are left unchanged.
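The winner-takes-all computation reduces to an argmax; in this sketch the weights and input are made-up random values:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((3, 4))          # 3 output neurons fully connected to 4 inputs
x = rng.random(4)

z = W @ x                       # net input z_k of every output neuron
y = np.zeros_like(z)
y[np.argmax(z)] = 1             # the winner outputs 1, all others stay 0
print(z, y)
```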
Thank You
