
Unit I

Artificial Neural Networks


1.1. Introduction, 1.2. Models of Neuron Network - Architectures, Knowledge representation,
Artificial Intelligence and Neural networks, 1.3. Learning process - Error correction learning,
Hebbian learning, Competitive learning, Boltzmann learning, Supervised learning,
Unsupervised learning, Reinforcement learning, Learning tasks.

References:
1. Rajasekaran, S. and Vijayalakshmi Pai, G. A., Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications.
2. Internet sources.
[Figure: overlapping circles showing Neural Networks (NN), Fuzzy Logic (FL), and Genetic Algorithms (GA), with the hybrid systems NN-FL, NN-GA, GA-FL, and NN-FL-GA in the overlaps.]
Fundamentals of Neural Networks
What is a Neural Net?
• A neural net is an artificial representation of the human brain that tries to simulate its learning process.
• An artificial neural network (ANN) is often called a "Neural Network" or simply a Neural Net (NN).
• Traditionally, the term neural network referred to a network of biological neurons in the nervous system that process and transmit information.
• An artificial neural network is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on a connectionist approach to computation.
• Artificial neural networks are made of interconnected artificial neurons, which may share some properties of biological neural networks.
• An artificial neural network is a network of simple processing elements (neurons) which can exhibit complex global behavior, determined by the connections between the processing elements and the element parameters.
1. Introduction
• Neural computers mimic certain processing capabilities of the human brain.
• Neural computing is an information processing paradigm, inspired by biological systems, composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
• Artificial Neural Networks (ANNs), like people, learn by example.
• An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process.
• Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.
1.1. Why Neural Networks
 Neural networks follow a different paradigm for computing.
• Conventional computers are good at fast arithmetic and at doing what the programmer programs them to do.
• Conventional computers are not so good at interacting with noisy data or data from the environment, massive parallelism, fault tolerance, and adapting to circumstances.
• Neural network systems help where we cannot formulate an algorithmic solution, or where we can get lots of examples of the behavior we require.

Inspiration from Neurobiology
 What are animals so good at, and what are computers not so good at? Largely the same things:

Animals are good at / computers are not so good at:
• Dealing with noisy data
• Dealing with unknown data
• Massive parallelism
• Fault tolerance
• Adapting to circumstances
• Knowledge storage
• Neural networks follow a different paradigm for computing:
 von Neumann machines are based on the processing/memory abstraction of human information processing;
 neural networks are based on the parallel architecture of biological brains.
• Neural networks are a form of multiprocessor computer system, with
 simple processing elements,
 a high degree of interconnection,
 simple scalar messages, and
 adaptive interaction between elements.
1.2. Research History & Literature
• The history is relevant because for nearly two decades the future of neural networks remained uncertain. McCulloch and Pitts (1943) are generally recognized as the designers of the first neural network. They combined many simple processing units, which together could lead to an overall increase in computational power. They suggested many ideas, for instance: a neuron has a threshold level, and once that level is reached the neuron fires. This is still the fundamental way in which ANNs operate. The McCulloch and Pitts network had a fixed set of weights.
McCulloch-Pitts Neuron
It is well known that the most fundamental unit of deep neural networks is the artificial neuron, or perceptron.
The very first step towards the perceptron we use today was taken in 1943 by McCulloch and Pitts, who mimicked the functionality of a biological neuron.
This first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
The heart of the McCulloch and Pitts idea is that, given the all-or-none character of neural activity, the behavior of the nervous system can be described by means of propositional logic.
To understand this, let's examine the main components of biological neurons:
1. The cell body, containing the nucleus and metabolic machinery,
2. An axon that transmits information via its synaptic terminals, and
3. The dendrites that receive inputs from other neurons via synapses.
McCulloch–Pitts neuron model
The main elements of the McCulloch-Pitts model can be summarized as follows:
1. Neuron activation is binary: a neuron either fires or does not fire.
2. For a neuron to fire, the weighted sum of inputs has to be equal to or larger than a predefined threshold.
3. If one or more inputs are inhibitory, the neuron will not fire.
4. It takes a fixed, single time step for a signal to pass through a link.
5. Neither the structure nor the weights change over time.
McCulloch and Pitts decided on this architecture based on what was known at the time about the function of biological neurons.
Naturally, they also wanted to abstract away most details and keep what they thought were the fundamental elements needed to represent computation in biological neurons.
Mathematical Definition
McCulloch and Pitts developed a mathematical formulation known as the linear threshold gate, which describes the activity of a single neuron with two states, firing or not-firing.
In its simplest form, the mathematical formulation is as follows:

Sum = x1·w1 + x2·w2 + … + xN·wN = Σ xi·wi   (aggregation)

f(Sum) = 1, if Sum ≥ T
f(Sum) = 0, otherwise   (threshold)

[Figure: the McCulloch-Pitts neuron. Inputs x1, x2, …, xN with weights w1, w2, …, wN feed an aggregation unit whose output Sum (also written y_in or Net) passes through the threshold function f.]

Where
x1, x2, …, xN are binary input values,
w1, w2, …, wN are the weights associated with each input,
Sum is the weighted sum of the inputs, and
T is a predefined threshold value for neuron activation.
• The figure shows a graphical representation of the McCulloch-Pitts artificial neuron.
• An input is considered excitatory when its contribution to the weighted sum is positive, for instance x1·w1 = 1·1 = 1,
• whereas an input is considered inhibitory when its contribution to the weighted sum is negative, for instance x1·w1 = 1·(−1) = −1.
• If the value of Sum ≥ T, the neuron fires; otherwise, it does not.
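To make the linear threshold gate concrete, here is a minimal Python sketch; the function name and the AND gate chosen for the usage example are illustrative, not part of the original formulation:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts linear threshold gate.

    Fires (returns 1) when the weighted sum of the binary
    inputs reaches the threshold T, otherwise returns 0.
    """
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# AND gate: both excitatory weights equal 1, threshold T = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcp_neuron([x1, x2], [1, 1], threshold=2))
```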
Let's take a real-world example:
A bank wants to decide whether it can sanction a loan or not. There are two parameters to decide on: salary and credit score. So there are four scenarios to assess:
1. High salary and good credit score
2. High salary and bad credit score
3. Low salary and good credit score
4. Low salary and bad credit score
• Let X1 = 1 denote high salary, X1 = 0 denote low salary, X2 = 1 denote good credit score, and X2 = 0 denote bad credit score.
• Let the threshold value be 2. The truth table is as follows.
Cont.…,
The truth table shows when the loan should be approved, considering all the scenarios:

X1  X2  X1+X2  Loan approved
1   1   2      1
1   0   1      0
0   1   1      0
0   0   0      0

In this case, the loan is approved only if the salary is high and the credit score is good.
The McCulloch-Pitts model of the neuron was mankind's first attempt at mimicking the human brain, and it was a fairly simple one too. It is no surprise it had many limitations:
1. The model failed to capture and compute cases of non-binary inputs; it could only compute with 0 and 1.
2. The threshold had to be decided beforehand by manual computation, instead of the model determining it itself.
3. Functions that are not linearly separable (such as XOR) could not be computed by a single unit.
The McCulloch-Pitts model of the neuron is illustrated below by implementing:
1. the AND function truth table,
2. the XOR function truth table, and
3. the ANDNOT function truth table.
1. Implement the AND function using a McCulloch-Pitts neuron
 The McCulloch-Pitts neuron was the earliest neural network, proposed in 1943; it is usually called the M-P neuron.
 Since firing is decided by a threshold, the activation function here is defined as

f(y_in) = 1, if y_in ≥ θ
f(y_in) = 0, if y_in < θ

 The threshold value should satisfy the condition θ ≥ n·w − p.
 Consider the truth table for the AND function:

X1  X2  Y
1   1   1
1   0   0
0   1   0
0   0   0

 The M-P neuron has no particular training algorithm; in the M-P neuron only analysis is performed.
 Hence, assume the weights to be W1 = 1 and W2 = 1.

[Figure: inputs X1 and X2 feed output neuron Y with weights W1 = 1 and W2 = 1.]

Calculating the net inputs y_in = x1·w1 + x2·w2:
(1,1): y_in = 1×1 + 1×1 = 2
(1,0): y_in = 1×1 + 0×1 = 1
(0,1): y_in = 0×1 + 1×1 = 1
(0,0): y_in = 0×1 + 0×1 = 0
• Cont.…,
• The threshold value is set equal to 2, i.e. θ = 2.
• This can also be obtained from θ ≥ n·w − p.
• Here n = 2, w = 1 (excitatory weights) and p = 0 (no inhibitory weights).
• Substituting these values in the above equation gives θ ≥ 2×1 − 0, i.e. θ ≥ 2. This is one more method to find θ.
• Thus the output of neuron Y can be written as

y = f(y_in) = 1, if y_in ≥ 2
y = f(y_in) = 0, if y_in < 2
2. Implement the XOR function using McCulloch-Pitts neurons
 Consider the truth table for the XOR function:

X1  X2  Y
0   0   0
0   1   1
1   0   1
1   1   0

 The M-P neuron has no particular training algorithm; only analysis is performed.

[Figure: a two-layer network. Inputs X1 and X2 connect to intermediate nodes Z1 and Z2 through weights W11, W12, W21, W22; Z1 and Z2 connect to the output node Y through weights V1 and V2.]

Where
X1 and X2 are the inputs,
Z1 and Z2 are the intermediate nodes,
Y is the output node,
W11, W12, W21 and W22 are the weights from the input neurons, and
V1 and V2 are the weights from Z1 and Z2 respectively.
Cont.…,
The XOR function cannot be represented by a single, simple logic function; it is represented as

y = x1·x̄2 + x̄1·x2
y = z1 + z2

Where
z1 = x1·x̄2 (function 1)
z2 = x̄1·x2 (function 2)
y = z1 OR z2 (function 3)

A single-layer net is not sufficient to represent the XOR function; an intermediate layer is necessary. For each function we identify a suitable threshold value.
Case-1
 First function: z1 = x1·x̄2
 The truth table for function z1:

X1  X2  Z1
0   0   0
0   1   0
1   0   1
1   1   0

 Assume the weights are initialized to W11 = 1 and W21 = 1.
Calculate the net inputs:
(0,0): z1_in = 0×1 + 0×1 = 0
(0,1): z1_in = 0×1 + 1×1 = 1
(1,0): z1_in = 1×1 + 0×1 = 1
(1,1): z1_in = 1×1 + 1×1 = 2

Now find a threshold value for the activation function

f(z_in) = 1, if z_in ≥ θ
f(z_in) = 0, if z_in < θ

 If θ = 0 or 1, then three or two of the input patterns fire the neuron.
 If θ = 2, then only the last pattern (1,1) fires the neuron.
 Hence it is not possible to obtain function Z1 using the weights W11 = W21 = 1.
Case-1 cont…
 First function: z1 = x1·x̄2
 The truth table for function z1:

X1  X2  Z1
0   0   0
0   1   0
1   0   1
1   1   0

 Now assume the weights are initialized to W11 = 1 and W21 = −1.
Calculate the net inputs:
(0,0): z1_in = 0×1 + 0×(−1) = 0
(0,1): z1_in = 0×1 + 1×(−1) = −1
(1,0): z1_in = 1×1 + 0×(−1) = 1
(1,1): z1_in = 1×1 + 1×(−1) = 0

Now find a threshold value for the activation function:
 If θ = 0 or −1, the neuron also fires for the wrong patterns.
 If θ = 1, the neuron fires only for input (1,0), as desired.
Hence W11 = 1 and W21 = −1.
Case-2
 Second function: z2 = x̄1·x2
 The truth table for function z2:

X1  X2  Z2
0   0   0
0   1   1
1   0   0
1   1   0

 Assume the weights are initialized to W12 = W22 = 1.
 Calculate the net inputs:
(0,0): z2_in = 0×1 + 0×1 = 0
(0,1): z2_in = 0×1 + 1×1 = 1
(1,0): z2_in = 1×1 + 0×1 = 1
(1,1): z2_in = 1×1 + 1×1 = 2

 Now find a threshold value for the activation function

f(z_in) = 1, if z_in ≥ θ
f(z_in) = 0, if z_in < θ
Cont.….,
 If θ = 0 or 1, the neuron also fires for the wrong patterns; if θ = 2, only (1,1) fires.
 Hence it is not possible to obtain function Z2 using the weights W12 = W22 = 1.
 So now take the weights W12 = −1 and W22 = 1.
 Calculate the net inputs:
(0,0): z2_in = 0×(−1) + 0×1 = 0
(0,1): z2_in = 0×(−1) + 1×1 = 1
(1,0): z2_in = 1×(−1) + 0×1 = −1
(1,1): z2_in = 1×(−1) + 1×1 = 0
 With the activation function f(z_in) = 1 if z_in ≥ θ, 0 if z_in < θ: if θ = 0 or −1 the neuron fires for the wrong patterns, while with θ = 1 the neuron fires only for input (0,1), as desired.
Hence W12 = −1 and W22 = 1.
Case-3
 Third function: y = Z1 (OR) Z2
 The truth table for function y:

X1  X2  Y  Z1  Z2
0   0   0  0   0
0   1   1  0   1
1   0   1  1   0
1   1   0  0   0

 The net input is y_in = z1·v1 + z2·v2.
 Assume the weights are initialized to v1 = v2 = 1.
 Calculate the net inputs:
(z1,z2) = (0,0): y_in = 0×1 + 0×1 = 0
(z1,z2) = (0,1): y_in = 0×1 + 1×1 = 1
(z1,z2) = (1,0): y_in = 1×1 + 0×1 = 1
(z1,z2) = (0,0): y_in = 0×1 + 0×1 = 0 (for input (1,1))

 Now find a threshold value for the activation function f(y_in) = 1 if y_in ≥ θ, 0 if y_in < θ:
if θ = 0, the neuron also fires when it should not; if θ = 1, the neuron fires exactly as desired.
Hence v1 = v2 = 1 with θ = 1.
The final conclusion for the XOR function:
 From case-1: W11 = 1, W21 = −1
 From case-2: W12 = −1, W22 = 1
 From case-3: v1 = v2 = 1

[Figure: the final XOR network. X1 connects to Z1 with weight 1 and to Z2 with weight −1; X2 connects to Z1 with weight −1 and to Z2 with weight 1; Z1 and Z2 connect to Y, each with weight 1.]
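Putting the three cases together, here is a minimal Python sketch of this two-layer XOR construction, following the weighted-sum analysis above (the function names are illustrative):

```python
def mcp_neuron(inputs, weights, threshold):
    # Fires (1) when the weighted sum reaches the threshold.
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def xor(x1, x2):
    # Hidden layer: z1 = x1 AND NOT x2, z2 = NOT x1 AND x2
    z1 = mcp_neuron([x1, x2], [1, -1], threshold=1)   # case-1 weights
    z2 = mcp_neuron([x1, x2], [-1, 1], threshold=1)   # case-2 weights
    # Output layer: y = z1 OR z2
    return mcp_neuron([z1, z2], [1, 1], threshold=1)  # case-3 weights

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # reproduces the XOR truth table
```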
3. Implement the ANDNOT function using a McCulloch-Pitts neuron
• Consider the truth table for the ANDNOT function (Y = X1 AND NOT X2):

X1  X2  Y
0   0   0
0   1   0
1   0   1
1   1   0

• The M-P neuron has no particular training algorithm; only analysis is performed.
• Hence, assume the weights to be W1 = 1 and W2 = 1.
• Calculate the net inputs y_in = x1·w1 + x2·w2:
(1,1): y_in = 1×1 + 1×1 = 2
(1,0): y_in = 1×1 + 0×1 = 1
(0,1): y_in = 0×1 + 1×1 = 1
(0,0): y_in = 0×1 + 0×1 = 0

[Figure: inputs X1 and X2 feed output neuron Y with weights W1 = 1 and W2 = 1.]

• Now find a threshold value for the activation function

f(y_in) = 1, if y_in ≥ θ
f(y_in) = 0, if y_in < θ
Cont.….,
For any choice of θ, the neuron cannot be made to fire for the desired input alone: from the calculated net inputs, it is not possible to fire the neuron for input (1,0) only. Hence the weights W1 = W2 = 1 are not suitable.
Hence, assume the weights to be W1 = 1 and W2 = −1, with y_in = x1·w1 + x2·w2.
Calculate the net inputs:
(1,1): y_in = 1×1 + 1×(−1) = 0
(1,0): y_in = 1×1 + 0×(−1) = 1
(0,1): y_in = 0×1 + 1×(−1) = −1
(0,0): y_in = 0×1 + 0×(−1) = 0

[Figure: inputs X1 and X2 feed output neuron Y with weights W1 = 1 and W2 = −1.]

From the calculated net inputs, it is now possible to fire the neuron for input (1,0) only, by fixing a threshold of 1 for the Y unit, i.e. θ ≥ 1.
Learning - overview
What is meant by learning?
The ability of the neural network (NN) to learn from its environment and to improve its performance through learning:
 the NN is stimulated by an environment,
 the NN undergoes changes in its free parameters, and
 the NN responds in a new way to the environment.

Definition of learning: Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.
Basic types of learning rules
1. Error-correction learning - optimum filtering
2. Hebbian learning - neurobiological
3. Competitive learning - neurobiological
4. Boltzmann learning - statistical mechanics
5. Supervised learning - labelled training data
6. Unsupervised learning - unlabelled data
7. Reinforcement learning - learning optimal behavior in an environment so as to obtain maximum reward
8. Learning tasks - the assumption that new expertise is built through experience
Error correction learning
• The error correction learning procedure is simple enough in conception. The procedure is as follows:
• During training, an input is put into the network and flows through the network, generating a set of values on the output units.
• Then the actual output is compared with the desired target, and the mismatch (error) is computed.
• If the output and target match, no change is made to the net.
• However, if the output differs from the target, a change must be made to some of the connections.
Block diagram of a neural network, highlighting the only neuron in the output layer

[Figure: a multilayer feedforward network. The input vector x(n) feeds one or more layers of hidden neurons, which produce the signal vector z(n) feeding output neuron k. The output y_k(n) is subtracted from the desired response d_k(n) at a summing junction to give the error signal e_k(n).]

Where
 x(n): input vector
 z(n): signal vector produced by the hidden neurons
 n: discrete time step of the iterative process involved in adjusting the synaptic weights of neuron k
 y_k(n): output signal of neuron k
 d_k(n): desired response / target output
 error signal = desired response − output signal:
e_k(n) = d_k(n) − y_k(n)
[Figure: signal-flow graph of output neuron k. Input signals x_1(n), x_2(n), …, x_j(n), …, x_m(n) are multiplied by the synaptic weights w_k1(n), …, w_km(n) and summed to give the induced local field v_k(n), which passes through the activation function φ(·) to give y_k(n); the error e_k(n) = d_k(n) − y_k(n) is formed at the output.]

 Input signal x_j: presynaptic signal
 Induced local field v_k: postsynaptic signal
Error correction learning
• error signal = desired response − output signal:
e_k(n) = d_k(n) − y_k(n)
• e_k(n) actuates a control mechanism that makes the output signal y_k(n) come closer to the desired response d_k(n) in a step-by-step manner.
Error correction learning - continued
A cost function
E(n) = ½·e_k²(n)
is the instantaneous value of the error energy; learning continues until the system reaches a steady state.
Minimization of this cost function leads to a learning rule commonly referred to as the delta rule or Widrow-Hoff rule.
Let w_kj(n) denote the value of the synaptic weight of neuron k excited by element x_j(n) of the signal vector x(n) at time step n.
According to the delta rule, the adjustment ∆w_kj(n) applied to the synaptic weight at time step n is defined by
∆w_kj(n) = η·e_k(n)·x_j(n),
where η is the learning-rate parameter.
The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question.
w_kj(n+1) = w_kj(n) + ∆w_kj(n)
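Here is a minimal Python sketch of training a single linear neuron with the delta rule; the function name, NumPy usage, and the toy data are illustrative assumptions, not from the source:

```python
import numpy as np

def delta_rule_train(X, d, eta=0.1, epochs=50):
    """Single linear neuron trained with the delta (Widrow-Hoff) rule.

    X : (n_samples, n_inputs) input vectors
    d : (n_samples,) desired responses
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            y_n = np.dot(w, x_n)      # output signal y_k(n)
            e_n = d_n - y_n           # error e_k(n) = d_k(n) - y_k(n)
            w += eta * e_n * x_n      # delta rule: eta * e_k(n) * x_j(n)
    return w

# toy usage: learn d = 2*x1 - x2 from examples (illustrative data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
d = 2 * X[:, 0] - X[:, 1]
print(delta_rule_train(X, d))   # approaches [2, -1]
```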
Main features
 Error-correction learning is local.
 The learning rate η determines the stability or convergence; it is a positive constant that determines the rate of learning as we proceed from one step in the learning process to another.
 The delta rule presumes that the error signal is directly measurable.
 The choice of η also has a profound influence on the accuracy and other aspects of the learning process.
Examples of error correction learning
 The least-mean-square (LMS) algorithm (Widrow and Hoff), also called the delta rule,
 and its generalization known as the back propagation (BP) algorithm.
 Error correction learning is an example of a closed-loop feedback system.

Real-life applications
• Automated medical diagnosis:
Neural networks play an important role in medical decision-making, helping physicians to provide a fast and accurate diagnosis.
In many research papers, researchers discuss diagnosis using the back propagation algorithm, which is a use of error correction learning.
With the development of the back propagation algorithm, it has now become possible to successfully attack problems requiring neural networks with high degrees of nonlinearity and high precision.
Hebbian Learning
 It is based on correlative adjustment of weights.
 The input and output pattern pairs are associated with a weight matrix W:
W = X·Yᵀ
 where X represents the input and Y the output; the transpose of the output is taken for weight adjustment.
 The Hebb network was proposed by Donald Hebb in 1949. According to Hebb's rule, the weights increase proportionally to the product of input and output.
 This means that in a Hebb network, if two interconnected neurons are active together, the weights associated with these neurons are increased through changes in the synaptic gap.
 This network is suitable for bipolar data.
 The Hebbian learning rule is generally applied to logic gates.
 The weights are updated as:
W(new) = W(old) + x·y
Training Algorithm for the Hebbian Learning Rule
The training steps of the algorithm are as follows:
1. Initially, the weights are set to zero, i.e. w_i = 0 for all inputs i = 1 to n, where n is the total number of input neurons.
2. For each training pair (s, t), set the input activations; the activation function for the inputs is the identity function, x_i = s_i.
3. Set the activation for the output to the target: y = t.
4. Adjust the weights and bias:
W(new) = W(old) + x·y
b(new) = b(old) + y
The change in weights is
∆W = x·y
⇒ W(new) = W(old) + ∆W
Steps 2 to 4 are repeated for each input vector and its target.
Example-1:
Hebbian Learning Rule - implementation of the logical AND function
Let us implement the logical AND function with bipolar inputs using Hebbian learning.
X1 and X2 are the inputs, b is the bias (taken as 1), and the target value y is the output of the logical AND operation over the inputs:

Inputs        Bias  Target
X1    X2      b     y
1     1       1     1
1     -1      1     -1
-1    1       1     -1
-1    -1      1     -1

STEP-1: Initially, the weights and the bias are set to zero:
W1 = W2 = b = 0
STEP-2: The first input vector is taken as [x1 x2 b] = [1 1 1] and the target value is 1.
The new weights will be:
W(new) = W(old) + x·y
W1(new) = W1(old) + x1·y = 0 + 1×1 = 1
W2(new) = W2(old) + x2·y = 0 + 1×1 = 1
b(new) = b(old) + y = 0 + 1 = 1
The change in weights is:
∆W1 = x1·y = 1
∆W2 = x2·y = 1
∆b = y = 1
STEP-3: The above weights are the final new weights for this pattern. When the second input is passed, these become the initial weights.
STEP-4: Take the second input = [1 -1 1]. The target is -1.
The weight vector is [W1 W2 b] = [1 1 1].
The change in weights is: ∆W1 = x1·y = -1, ∆W2 = x2·y = 1, ∆b = y = -1.
The new weights will be:
W1(new) = W1 + ∆W1 = 1 + (-1) = 0
W2(new) = W2 + ∆W2 = 1 + 1 = 2
b(new) = b + ∆b = 1 + (-1) = 0
STEP-5: Similarly, the remaining inputs and weights are calculated.
Cont.…,
STEP-6: The table below shows all the inputs:

Inputs (X1 X2 b)   Target y   Weight changes (∆W1 ∆W2 ∆b)   New weights (W1 W2 b)
 1   1   1          1          1   1   1                     1   1   1
 1  -1   1         -1         -1   1  -1                     0   2   0
-1   1   1         -1          1  -1  -1                     1   1  -1
-1  -1   1         -1          1   1  -1                     2   2  -2

STEP-7: Hebb net for the AND function:

[Figure: inputs X1 and X2 feed output neuron Y with final weights W1 = 2 and W2 = 2, and a bias connection of -2.]
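A minimal Python sketch of the Hebb training steps, reproducing the AND table above (the function name is illustrative):

```python
def hebb_train(samples):
    """Hebb rule: w(new) = w(old) + x*y, b(new) = b(old) + y."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for x, y in samples:
        for i in range(n):
            w[i] += x[i] * y          # weight change: dW_i = x_i * y
        b += y                        # bias change:   db = y
        print(x, y, w, b)             # one row of the STEP-6 table
    return w, b

# bipolar AND data from the table above
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
hebb_train(and_samples)   # final weights [2, 2], bias -2
```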
Hebb Net Solved Numerical Example-2
 Using the Hebb rule, find the weights required to perform the following classifications of the given input patterns, shown in the figure.
 Each pattern is shown as a 3x3 matrix of squares; the "+" symbols represent the value +1 and empty squares indicate -1. The two patterns are 'I' and 'O'.

Pattern  X1  X2  X3  X4  X5  X6  X7  X8  X9  b   Target y
I        1   1   1   -1  1   -1  1   1   1   1   1
O        1   1   1   1   -1  1   1   1   1   1   -1
Cont.…,
 Set the initial weights and bias to zero, i.e.
w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = b = 0
 Presenting the first input pattern (I), we calculate the changes in weights, ∆w_i = x_i·y for i = 1 to 9:
∆w1 = x1·y = 1×1 = 1
∆w2 = x2·y = 1×1 = 1
∆w3 = x3·y = 1×1 = 1
∆w4 = x4·y = -1×1 = -1
∆w5 = x5·y = 1×1 = 1
∆w6 = x6·y = -1×1 = -1
∆w7 = x7·y = 1×1 = 1
∆w8 = x8·y = 1×1 = 1
∆w9 = x9·y = 1×1 = 1
∆b = y = 1
 Setting the old weights as the initial weights, we obtain w_i(new) = w_i(old) + ∆w_i:
w1(new) = 0 + 1 = 1, w2(new) = 0 + 1 = 1, w3(new) = 0 + 1 = 1,
and similarly w4(new) = -1, w5(new) = 1, w6(new) = -1,
w7(new) = 1, w8(new) = 1, w9(new) = 1,
b(new) = 1
Cont...,
 The weights after presenting the first input pattern (I) are
w1 = w2 = w3 = w7 = w8 = w9 = 1, w4 = w6 = -1, w5 = 1, and b = 1.
These are the modified weights.
 Presenting the second input pattern (O), with target y = -1, we again apply w_i(new) = w_i(old) + ∆w_i, where ∆w_i = x_i·y:
w1(new) = 1 + 1×(-1) = 0
w2(new) = 1 + 1×(-1) = 0
w3(new) = 1 + 1×(-1) = 0
w4(new) = -1 + 1×(-1) = -2
w5(new) = 1 + (-1)×(-1) = 2
w6(new) = -1 + 1×(-1) = -2
w7(new) = 1 + 1×(-1) = 0
w8(new) = 1 + 1×(-1) = 0
w9(new) = 1 + 1×(-1) = 0
b(new) = 1 + (-1) = 0
 The weights after presenting the second input pattern are
w1 = w2 = w3 = w7 = w8 = w9 = 0, w5 = 2, w4 = w6 = -2, and b = 0.
Cont...,
 The final Hebb network:

[Figure: inputs X1 through X9 feed output neuron Y with the final weights w1 = w2 = w3 = 0, w4 = -2, w5 = 2, w6 = -2, w7 = w8 = w9 = 0.]
Competitive Learning
 Competitive learning is unsupervised learning.
 One neuron wins over all the others.
 Only the winning neuron (or its close associates) learns:
 Hard learning - the weights of only the winner are updated.
 Soft learning - the weights of the winner and its close associates are updated.
 A simple competitive network is basically composed of two sub-networks, each specializing in a different function:
1. Hamming net - measures how much the input vector resembles the weight vector of each perceptron.
2. Maxnet - finds the perceptron with the maximum value.
• Competitive learning is concerned with unsupervised training in which the output nodes compete with each other to represent the input pattern.
• To understand this learning rule we first have to understand the competitive net, which is explained as follows:
Basic Concept of a Competitive Network:
• This network is just like a single-layer feed-forward network, but with feedback connections between the outputs.
• The connections between the outputs are of the inhibitory type (shown by dotted lines), which means the competitors never support themselves.

Basic Concept of the Competitive Learning Rule:
As said earlier, there is competition among the output nodes, so the main concept is: during training, the output unit that has the highest activation for a given input pattern is declared the winner.
This rule is also called winner-takes-all, because only the winning neuron is updated and the rest of the neurons are left unchanged.
Mathematical Formulation
The following are the three important factors for the mathematical formulation of this learning rule:

Rule-1: Condition to be a winner
 Let a set of sensors all feed into three different nodes, so that every node is connected to every sensor, and let the weights that each node gives to its sensors be set randomly between 0.0 and 1.0.
 Let the output of each node be the sum, over all its sensors, of each sensor's signal strength multiplied by its weight.
 If a neuron y_k wants to be the winner, its induced local field (the output of the summation unit) v_k must be the largest among all the other neurons in the network:

y_k = 1, if v_k > v_j for all j, j ≠ k
y_k = 0, otherwise
Rule-2: Condition on the sum total of the weights
When the net is shown an input, the node with the highest output is deemed the winner, and the input is classified as being within the cluster corresponding to that node.
Another constraint of the competitive learning rule is that the sum total of the weights to a particular output neuron is 1. For example, for neuron k:

Σ_j w_kj = 1, for all k
Rule-3: Change of weights for the winner
The winner updates each of its weights, moving weight from the connections that gave it weaker signals to the connections that gave it stronger signals.
 Thus, as more data are received, each node converges on the center of the cluster that it has come to represent, activating more strongly for inputs in this cluster and more weakly for inputs in other clusters.
 If a neuron does not respond to the input pattern, no learning takes place in that neuron. However, if a particular neuron wins, the corresponding weights are adjusted as follows:

∆w_kj = η·(x_j − w_kj), if neuron k wins
∆w_kj = 0, if neuron k loses

• Here η is the learning rate.
 This clearly shows that we favor the winning neuron by adjusting its weights; if a neuron loses, we need not bother to readjust its weights.
Example
• Construct a competitive learning neural network for the input vectors [0 0 1 1], [1 0 0 0], [0 1 1 0] and [0 0 0 1], assuming the learning rate η = 0.5 and that the input vectors fall into 2 clusters.
• Number of clusters = 2.
• Step 1: Initialize the weight matrix W_ij to random values:

W_ij = | 0.2  0.9 |
       | 0.4  0.7 |
       | 0.6  0.5 |
       | 0.8  0.3 |

• Step 2: Calculate the induced outputs V1 and V2 of O1 and O2 for the input vector [0 0 1 1] (x1 x2 x3 x4):
V1 = x1·W11 + x2·W21 + x3·W31 + x4·W41 = 0 + 0 + 1×0.6 + 1×0.8 = 1.4
Cont.…,
V2 = x1·W12 + x2·W22 + x3·W32 + x4·W42 = 0 + 0 + 1×0.5 + 1×0.3 = 0.8
Since V1 > V2, the output of O1 wins: Y1 = 1 and Y2 = 0.
Calculate the change in weights, using ∆W_ij = η·(x_i − w_ij) if y_j is the winner, 0 otherwise:
∆W11 = η·(x1 − w11) = 0.5(0 − 0.2) = -0.1
∆W21 = η·(x2 − w21) = 0.5(0 − 0.4) = -0.2
∆W31 = η·(x3 − w31) = 0.5(1 − 0.6) = 0.2
∆W41 = η·(x4 − w41) = 0.5(1 − 0.8) = 0.1
∆W12 = ∆W22 = ∆W32 = ∆W42 = 0
Cont.…,
Update the weight matrix, W_ij = W_ij + ∆W_ij (e.g. 0.2 − 0.1 = 0.1):

W_ij = | 0.1  0.9 |
       | 0.2  0.7 |
       | 0.8  0.5 |
       | 0.9  0.3 |

Step 3: Calculate the induced outputs V1 and V2 of O1 and O2 for the input vector [1 0 0 0] (x1 x2 x3 x4):
V1 = x1·W11 + x2·W21 + x3·W31 + x4·W41 = 1×0.1 + 0 + 0 + 0 = 0.1
V2 = x1·W12 + x2·W22 + x3·W32 + x4·W42 = 1×0.9 + 0 + 0 + 0 = 0.9
Cont.…,
Since V2 > V1: Y1 = 0 and Y2 = 1.
Calculate the change in weights, ∆W_ij = η·(x_i − w_ij) if y_j is the winner, 0 otherwise:
∆W11 = ∆W21 = ∆W31 = ∆W41 = 0
∆W12 = η·(x1 − w12) = 0.5(1 − 0.9) = 0.05
∆W22 = η·(x2 − w22) = 0.5(0 − 0.7) = -0.35
∆W32 = η·(x3 − w32) = 0.5(0 − 0.5) = -0.25
∆W42 = η·(x4 − w42) = 0.5(0 − 0.3) = -0.15
Update the weight matrix, W_ij = W_ij + ∆W_ij:

W_ij = | 0.1  0.95 |
       | 0.2  0.35 |
       | 0.8  0.25 |
       | 0.9  0.15 |
Cont..,
Step 4: Calculate the induced outputs V1 and V2 of O1 and O2 for the input vector [0 1 1 0] (x1 x2 x3 x4):
V1 = x1·W11 + x2·W21 + x3·W31 + x4·W41 = 0 + 1×0.2 + 1×0.8 + 0 = 1.0
V2 = x1·W12 + x2·W22 + x3·W32 + x4·W42 = 0 + 1×0.35 + 1×0.25 + 0 = 0.60
Since V1 > V2: Y1 = 1 and Y2 = 0.


Cont.…,
Calculate the change in weights and update the weight matrix, W_ij = W_ij + ∆W_ij:
∆W11 = η·(x1 − w11) = 0.5(0 − 0.1) = -0.05
∆W21 = η·(x2 − w21) = 0.5(1 − 0.2) = 0.4
∆W31 = η·(x3 − w31) = 0.5(1 − 0.8) = 0.1
∆W41 = η·(x4 − w41) = 0.5(0 − 0.9) = -0.45
∆W12 = ∆W22 = ∆W32 = ∆W42 = 0

W_ij = | 0.05  0.95 |
       | 0.6   0.35 |
       | 0.9   0.25 |
       | 0.45  0.15 |
Cont.…,
Step 5: Calculate the induced outputs V1 and V2 of O1 and O2 for the input vector [0 0 0 1] (x1 x2 x3 x4):
V1 = x1·W11 + x2·W21 + x3·W31 + x4·W41 = 0 + 0 + 0 + 1×0.45 = 0.45
V2 = x1·W12 + x2·W22 + x3·W32 + x4·W42 = 0 + 0 + 0 + 1×0.15 = 0.15
Since V1 > V2: Y1 = 1 and Y2 = 0.


Cont.…,
Calculate the change in weights and update the weight matrix, W_ij = W_ij + ∆W_ij:
∆W11 = η·(x1 − w11) = 0.5(0 − 0.05) = -0.025
∆W21 = η·(x2 − w21) = 0.5(0 − 0.6) = -0.3
∆W31 = η·(x3 − w31) = 0.5(0 − 0.9) = -0.45
∆W41 = η·(x4 − w41) = 0.5(1 − 0.45) = 0.275
∆W12 = ∆W22 = ∆W32 = ∆W42 = 0

W_ij = | 0.025  0.95 |
       | 0.3    0.35 |
       | 0.45   0.25 |
       | 0.725  0.15 |
From the example - the competitive network:

[Figure: inputs X1-X4 connect to output units O1 (output Y1) and O2 (output Y2) through weights W11, W12, W21, W22, W31, W32, W41, W42; O1 and O2 compute the induced outputs V1 and V2 and compete, the winner producing output 1.]
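A minimal NumPy sketch of the winner-takes-all procedure, reproducing the worked example above (variable names are illustrative; the final first column matches the Step 5 matrix):

```python
import numpy as np

# weight matrix W[i, j]: weight from input i to output cluster j
W = np.array([[0.2, 0.9],
              [0.4, 0.7],
              [0.6, 0.5],
              [0.8, 0.3]])
eta = 0.5
inputs = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]

for x in inputs:
    x = np.array(x, dtype=float)
    v = x @ W                       # induced outputs V1, V2
    k = int(np.argmax(v))           # winner-takes-all
    W[:, k] += eta * (x - W[:, k])  # update only the winner's weights
    print("input", x, "-> winner O%d" % (k + 1))

print(W)   # final weight matrix, cf. the Step 5 matrix above
```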
Boltzmann learning
 Boltzmann learning is a stochastic learning process with a recurrent structure, and it is the basis of some of the earliest optimization techniques used in ANNs.
 The Boltzmann Machine was invented by Geoffrey Hinton and Terry Sejnowski in 1985. More clarity can be found in the words of Hinton on the Boltzmann Machine:
 "A surprising feature of this network is that it uses only locally available information. The change of weight depends only on the behavior of the two units it connects, even though the change optimizes a global measure." (Ackley, Hinton and Sejnowski, 1985)
 A Hopfield net consisting of binary stochastic neurons with hidden units is called a Boltzmann Machine.
 A Boltzmann Machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off.
• The Boltzmann learning rule is a stochastic learning algorithm derived from ideas rooted in statistical mechanics.
• A neural network designed on the basis of this rule is called a Boltzmann machine.
• In a Boltzmann machine, the neurons constitute a recurrent structure and operate in a binary manner:
- the on state is denoted by +1,
- the off state is denoted by -1.
Energy Function
The machine is characterized by an energy function:

E = -½ · Σ_j Σ_k w_kj·x_j·x_k,   j ≠ k

Where
x_j and x_k are the states of neurons j and k,
w_kj is the synaptic weight from neuron j to neuron k, and
j ≠ k indicates that none of the neurons has self-feedback.
Probability
The machine operates by choosing a neuron at random, say neuron k, at some step of the learning process, and flipping its state from x_k to -x_k at some temperature T with probability

P(x_k → -x_k) = 1 / (1 + exp(-∆E_k / T))

Where
∆E_k is the energy change resulting from the flip, and
T is a pseudo-temperature.
If this rule is applied repeatedly, the machine reaches thermal equilibrium.
Types of neuron and operations
• The neuron are partitioned into two functional groups
• Visible: They provide the interface between the network and
the environment.
• Hidden: They always operate freely
There are two modes of operation
• Clamped condition: in which the visible neurons are all
clamped onto specific states determined by the environment.
• Free running condition: in which all the neurons are allowed
to operate freely.
Weight Update
• ρ⁺_kj is the correlation between the states of neurons j and k in the clamped condition.
• ρ⁻_kj is the correlation between the states of neurons j and k in the free-running condition.
• Both correlations are averaged over the machine in thermal equilibrium, and both take values from -1 to +1.
• According to the Boltzmann learning rule, the change in weight is

∆w_kj = η·(ρ⁺_kj − ρ⁻_kj)

where η is the learning rate.
Some important points about the Boltzmann Machine
• It uses a recurrent structure.
• It consists of stochastic neurons, which have one of two possible states, either 1 or 0.
• Some of the neurons are adaptive (free state) and some are clamped (frozen state).
• If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine.
Objective of the Boltzmann Machine
• The main purpose of the Boltzmann Machine is to optimize the solution of a problem: it is the work of the Boltzmann Machine to optimize the weights and the quantity related to that particular problem.
Architecture
 The following describes the architecture of the Boltzmann machine. It is a two-dimensional array of units U_{i,j}.

[Figure: an n x n array of units U_{1,1} … U_{n,n}; each unit is connected to the others with weight -p and has a self-connection of weight b.]

Here, the weights on the interconnections between units are -p, where p > 0, and the weights of the self-connections are given by b, where b > 0.
Training Algorithm
As we know, Boltzmann machines have fixed weights, so there is no training algorithm as such, since we do not need to update the weights in the network. However, to test the network, we have to set the weights and evaluate the consensus function.
The Boltzmann machine has a set of units U_i and U_j with bi-directional connections between them:
1. We consider fixed weights, say w_ij.
2. w_ij ≠ 0 if U_i and U_j are connected.
3. There is symmetry in the weighted interconnections, i.e. w_ij = w_ji.
4. w_ii also exists, i.e. there are self-connections between units.
5. For any unit U_i, its state u_i is either 1 or 0.
The main objective of the Boltzmann Machine is to maximize the consensus function CF, given by the following relation:

CF = Σ_i Σ_{j≤i} w_ij·u_i·u_j
• Now, when the state of a unit changes from either 1 to 0 or from 0 to 1, the change in consensus is given by the following relation:

∆CF = (1 − 2u_i) · ( w_ii + Σ_{j≠i} w_ij·u_j )

Here u_i is the current state of U_i.
The variation in the coefficient (1 − 2u_i) is given by the following relation:

(1 − 2u_i) = +1, if U_i is currently off
(1 − 2u_i) = -1, if U_i is currently on
Generally, unit U_i does not change its state, but if it does, the relevant information is local to the unit. With such a change, there would also be an increase in the consensus of the network.
The probability that the network accepts the change in the state of the unit is given by the following relation:

AF(i, T) = 1 / (1 + exp(-∆CF(i) / T))

Here, T is the controlling parameter (temperature). It decreases as CF reaches its maximum value.
Testing Algorithm
Step 1 - Initialize the following to start the testing:
 the weights representing the constraints of the problem, and
 the control parameter T.
Step 2 - Continue steps 3 to 8 while the stopping condition is not true.
Step 3 - Perform steps 4 to 7.
Step 4 - Assume that one of the states will change its value, and choose integers I, J as random values between 1 and n (selecting unit U_{I,J}).
Step 5 - Calculate the change in consensus:

∆CF = (1 − 2u_i) · ( w_ii + Σ_{j≠i} w_ij·u_j )

Step 6 - Calculate the probability that the network accepts the change in state:

AF(i, T) = 1 / (1 + exp(-∆CF(i) / T))

Step 7 - Accept or reject this change as follows:
Case I - if R < AF, accept the change.
Case II - if R ≥ AF, reject the change.
Here, R is a random number between 0 and 1.
Step 8 - Reduce the control parameter (temperature) as follows:
T(new) = 0.95·T(old)
Step 9 - Test for the stopping conditions, which may be as follows:
 the temperature reaches a specified value, or
 there is no change in state for a specified number of iterations.


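A minimal Python sketch of the testing loop above, for a small symmetric weight matrix; the cooling schedule constants, parameter values, and the tiny usage matrix are illustrative assumptions:

```python
import math
import random

def boltzmann_test(W, u, T=10.0, T_min=0.5, alpha=0.95, iters_per_T=20):
    """Sketch of the testing algorithm: W is a symmetric weight matrix
    (W[i][i] holds the self-connection b); u is a binary 0/1 state list."""
    n = len(u)
    while T > T_min:                        # step 2: stopping condition
        for _ in range(iters_per_T):
            i = random.randrange(n)         # step 4: pick a unit at random
            # step 5: change in consensus if u_i flips
            dCF = (1 - 2 * u[i]) * (W[i][i] +
                  sum(W[i][j] * u[j] for j in range(n) if j != i))
            # step 6: acceptance probability AF(i, T)
            AF = 1.0 / (1.0 + math.exp(-dCF / T))
            # step 7: accept the flip if R < AF
            if random.random() < AF:
                u[i] = 1 - u[i]
        T *= alpha                          # step 8: reduce the temperature
    return u

# illustrative usage: b = 1 on the diagonal, -p = -2 off the diagonal
W = [[1, -2, -2], [-2, 1, -2], [-2, -2, 1]]
print(boltzmann_test(W, [1, 1, 0]))
```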
Supervised Learning
 Supervised learning is the type of machine learning in which machines are trained using well-labelled training data, and on the basis of that data, machines predict the output.
 Labelled data means that the input data is already tagged with the correct output.
 In supervised learning, the training data provided to the machines works as a supervisor that teaches the machines to predict the output correctly.
 It applies the same concept as a student learning under the supervision of a teacher.
 Supervised learning is a process of providing input data as well as correct output data to the machine learning model.
 The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
 In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.


Steps Involved in Supervised Learning (a minimal sketch of these steps is shown below):
1. First, determine the type of training dataset.
2. Collect/gather the labelled training data.
3. Split the dataset into a training dataset, a test dataset, and a validation dataset.
4. Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
5. Determine the suitable algorithm for the model, such as a support vector machine, a decision tree, etc.
6. Execute the algorithm on the training dataset. Sometimes we also need validation sets to tune the control parameters; these are subsets of the training dataset.
7. Evaluate the accuracy of the model by providing the test set. If the model predicts the correct outputs, the model is accurate.

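A minimal sketch of steps 2 to 7, assuming scikit-learn and its built-in iris dataset as illustrative stand-ins for a labelled dataset and a chosen algorithm:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# step 2: labelled data (inputs X tagged with correct outputs y)
X, y = load_iris(return_X_y=True)

# step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# steps 5-6: choose an algorithm and execute it on the training set
model = DecisionTreeClassifier().fit(X_train, y_train)

# step 7: evaluate the accuracy of the model on the test set
print(accuracy_score(y_test, model.predict(X_test)))
```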

How Supervised Learning Works
 In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data.
 Once the training process is completed, the model is tested on the basis of test data (a subset of the training set), and it then predicts the output.

The working of supervised learning can be easily understood by the following example:
 Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles and polygons. The first step is to train the model on each shape:
 If the given shape has four sides, and all the sides are equal, it will be labelled as a square.
 If the given shape has three sides, it will be labelled as a triangle.
 If the given shape has six equal sides, it will be labelled as a hexagon.
 After training, we test our model using the test set, and the task of the model is to identify each shape.
 The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides, and predicts the output.
Types of supervised machine learning algorithms:
Supervised learning can be further divided into two types of problems: regression and classification.

Regression:
 Regression algorithms are used if there is a relationship between the input variable and the output variable.
 Regression is used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
 Some popular regression algorithms which come under supervised learning are:
 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression

Classification:
 Classification algorithms are used when the output variable is categorical, which means there are classes such as Yes-No, Male-Female, True-False, etc. (as in spam filtering).
 Popular classification algorithms include:
 Random Forest
 Decision Trees
 Logistic Regression
 Support Vector Machines
Advantages and Disadvantages of supervised learning:

Advantages of supervised learning:
 With the help of supervised learning, the model can predict the output on the basis of prior experience.
 In supervised learning, we can have an exact idea about the classes of objects.
 Supervised learning models help us to solve various real-world problems, such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:
 Supervised learning models are not suitable for handling complex tasks.
 Supervised learning cannot predict the correct output if the test data is different from the training dataset.
 Training requires a lot of computation time.
 In supervised learning, we need sufficient knowledge about the classes of objects.
Unsupervised Learning
• As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset.
• Instead, the models themselves find the hidden patterns and insights in the given data.
• It can be compared to the learning which takes place in the human brain while learning new things.
• It can be defined as: Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
• Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data.
• The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
Why use Unsupervised Learning?
Below are the main reasons which describe the importance of unsupervised learning:
• Unsupervised learning is helpful for finding useful insights in data.
• Unsupervised learning is much like the way a human learns to think from their own experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
• In the real world, we do not always have input data with corresponding outputs, so to solve such cases we need unsupervised learning.
Example:
 Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs.
 The algorithm is never trained on the given dataset, which means it does not have any prior idea about the features of the dataset.
 The task of the unsupervised learning algorithm is to identify the image features on its own.
 The unsupervised learning algorithm will perform this task by clustering the image dataset into groups according to the similarities between images.

Working of Unsupervised Learning
 Here, we take unlabeled input data, meaning it is not categorized and no corresponding outputs are given.
 This unlabeled input data is fed to the machine learning model in order to train it.
 First, the model interprets the raw data to find hidden patterns, and then applies a suitable algorithm, such as k-means clustering.
 Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
Types of Unsupervised Learning Algorithms:
Unsupervised learning algorithms can be further categorized into two types of problems: clustering and association.

Clustering:
 Clustering is a method of grouping objects into clusters such that the objects with the most similarities remain in a group and have few or no similarities with the objects of another group.
 Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.

Association:
 An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database.
 It determines the sets of items that occur together in the dataset.
 Association rules make marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam).
 A typical application of association rules is Market Basket Analysis.
Unsupervised Learning algorithms
Below is a list of some popular unsupervised learning algorithms:
 K-means clustering
 KNN (k-nearest neighbours)
 Hierarchical clustering
 Anomaly detection
 Neural networks
 Principal Component Analysis
 Independent Component Analysis
 Apriori algorithm
 Singular value decomposition
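A minimal NumPy sketch of k-means clustering, the first algorithm in the list above (the function name and toy data are illustrative):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: assign points to the nearest centroid, recompute."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # group each object with its most similar (nearest) centroid
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centroids[None, :], axis=2), axis=1)
        # move each centroid to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# toy usage: two well-separated blobs (illustrative data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```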
Advantages and Disadvantages of Unsupervised Learning

Advantages of unsupervised learning:
 Unsupervised learning can be used for more complex tasks compared to supervised learning, because it does not require labeled input data.
 Unsupervised learning is preferable because it is easier to get unlabeled data than labeled data.

Disadvantages of unsupervised learning:
 Unsupervised learning is intrinsically more difficult than supervised learning, as it does not have corresponding outputs.
 The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.
Difference between Supervised and Unsupervised Learning

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
The supervised learning model takes direct feedback to check if it is predicting the correct output or not. | The unsupervised learning model does not take any feedback.
The supervised learning model predicts the output. | The unsupervised learning model finds the hidden patterns in the data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights in an unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into classification and regression problems. | Unsupervised learning can be classified into clustering and association problems.
Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for cases where we have only input data and no corresponding output data.
A supervised learning model produces an accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
Supervised learning is not close to true artificial intelligence, as we first train the model on each datum and only then can it predict the correct output. | Unsupervised learning is closer to true artificial intelligence, as it learns similarly to how a child learns daily routine things from his experiences.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machines, Multi-class Classification, Decision Trees, Bayesian Logic, etc. | It includes various algorithms such as clustering, KNN, and the Apriori algorithm.
Reinforcement Learning
 Reinforcement learning is defined as a machine learning method that is concerned with how software agents should take actions in an environment.
 Reinforcement learning is a part of deep learning methods that helps you to maximize some portion of the cumulative reward.
 This neural network learning method helps you to learn how to attain a complex objective or maximize a specific dimension over many steps.
 Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.
 In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions, and learn through trial and error.


Applications of Reinforcement Learning
Here are some applications of reinforcement learning:
 Robotics for industrial automation
 Business strategy planning
 Machine learning and data processing
 Creating training systems that provide custom instruction and materials according to the requirements of students
 Aircraft control and robot motion control

Challenges of Reinforcement Learning
Here are the major challenges you will face while doing reinforcement learning:
 Feature/reward design, which can be very involved
 Parameters may affect the speed of learning
 Realistic environments can have partial observability
 Too much reinforcement may lead to an overload of states, which can diminish the results
 Realistic environments can be non-stationary

Why use Reinforcement Learning?
Here are the prime reasons for using reinforcement learning:
 It helps you to find which situation needs an action
 It helps you to discover which action yields the highest reward over the longer period
 Reinforcement learning also provides the learning agent with a reward function
 It also allows the agent to figure out the best method for obtaining large rewards
Important Components of the Deep Reinforcement Learning Method
Here are some important terms used in reinforcement learning:
 Agent: an assumed entity which performs actions in an environment to gain some reward.
 Environment (e): the scenario that an agent has to face.
 Reward (R): an immediate return given to an agent when it performs a specific action or task.
 State (s): the current situation returned by the environment.
 Policy (π): a strategy applied by the agent to decide the next action based on the current state.
 Value (V): the expected long-term return with discount, as compared to the short-term reward.
 Value Function: specifies the value of a state, i.e. the total amount of reward an agent should expect, beginning from that state.
 Model of the environment: mimics the behavior of the environment. It helps you to make inferences and to determine how the environment will behave.
 Model-based methods: methods for solving reinforcement learning problems which use a model of the environment.
 Q-value or action value (Q): the Q-value is quite similar to the value V; the only difference between the two is that it takes the current action as an additional parameter.
How Reinforcement Learning works?
 Let's look at a simple example which helps to illustrate the reinforcement learning mechanism.
 Consider the scenario of teaching new tricks to your cat.
 As the cat doesn't understand English or any other human language, we can't tell her directly what to do. Instead, we follow a different strategy.
 We emulate a situation, and the cat tries to respond in many different ways. If the cat's response is the desired one, we give her fish.
 Now whenever the cat is exposed to the same situation, it executes a similar action even more enthusiastically, in expectation of getting more reward (food).
 That's like the cat learning "what to do" from positive experiences.
 At the same time, the cat also learns what not to do when faced with negative experiences.
Example of Reinforcement Learning
In this case:
 Your cat is an agent that is exposed to the environment: your house. An example of a state could be your cat sitting, while you use a specific word to get the cat to walk.
 Our agent reacts by performing an action, a transition from one "state" to another "state".
 For example, your cat goes from sitting to walking.
 The reaction of an agent is an action, and the policy is a method of selecting an action given a state, in expectation of better outcomes.
 After the transition, the cat may get a reward or a penalty in return.
Reinforcement Learning Algorithms
There are three approaches to implementing a reinforcement learning algorithm.
Value-based:
In a value-based reinforcement learning method, you try to maximize a value function V(s). In this method, the agent expects a long-term return of the current states under policy π.
Policy-based:
In a policy-based RL method, you try to come up with a policy such that the action performed in every state helps you to gain maximum reward in the future.
Two types of policy-based methods are:
• Deterministic: for any state, the same action is produced by the policy π.
• Stochastic: every action has a certain probability, determined by the stochastic policy
π(a|s) = P[A_t = a | S_t = s].
Model-based:
In this reinforcement learning method, you create a virtual model for each environment, and the agent learns to perform in that specific environment.
Characteristics of Reinforcement Learning
Here are the important characteristics of reinforcement learning:
 There is no supervisor, only a reward signal (a real number)
 Sequential decision making
 Time plays a crucial role in reinforcement learning problems
 Feedback is often delayed, not instantaneous
 The agent's actions determine the subsequent data it receives
Types of Reinforcement Learning
There are two important learning models in reinforcement
learning:
1. Markov Decision Process
2. Q learning
1. Markov Decision Process
The following parameters are used to get a solution:
 Set of actions - A
 Set of states - S
 Reward - R
 Policy - π
 Value - V
The mathematical framework for mapping a solution in
reinforcement learning is known as a Markov Decision
Process (MDP).
2. Q-Learning
Q-learning is a value-based method for learning which action an
agent should take in a given state.
Let’s understand this method by the following example:
 There are five rooms in a building which are connected by doors.
 Each room is numbered 0 to 4
 The outside of the building can be thought of as one big outside area (5)
 Doors number 1 and 4 lead into the building from area 5
Next, you need to associate a reward value with each door:
 Doors which lead directly to the goal have a reward of 100
 Doors which are not directly connected to the target room give zero reward
 As doors are two-way, two arrows are assigned for each room
 Every arrow carries an instant reward value
Explanation:
 In this representation, each room represents a state
 The agent's movement from one room to another represents
an action
 A state is described as a node, while the arrows
show the actions.
For example, an agent can traverse from room number 2 towards 5 via the following connections:
 Initial state = state 2
 State 2 -> state 3
 State 3 -> states (2, 1, 4)
 State 4 -> states (0, 5, 3)
 State 1 -> states (5, 3)
 State 0 -> state 4
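The classic way to solve this room example is the Q-learning update Q(s, a) = R(s, a) + gamma * max Q(next state, all actions). The sketch below is a minimal illustration, not the text's own code: the reward matrix is one possible encoding of the doors described above, and the discount factor and episode count are arbitrary choices.

import numpy as np

# Reward matrix: rows are current states, columns are next states.
# -1 marks "no door"; doors leading into the goal state 5 are worth 100.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],   # room 0 connects to room 4
    [-1, -1, -1,  0, -1, 100],   # room 1 connects to 3 and outside (5)
    [-1, -1, -1,  0, -1,  -1],   # room 2 connects to 3
    [-1,  0,  0, -1,  0,  -1],   # room 3 connects to 1, 2 and 4
    [ 0, -1, -1,  0, -1, 100],   # room 4 connects to 0, 3 and outside (5)
    [-1,  0, -1, -1,  0, 100],   # state 5 (outside) is the goal
])

Q = np.zeros_like(R, dtype=float)
gamma = 0.8                           # discount factor (assumed value)
rng = np.random.default_rng(0)

for _ in range(1000):                 # random-exploration episodes
    s = rng.integers(0, 6)            # start from a random state
    doors = np.where(R[s] >= 0)[0]    # actions available from state s
    a = rng.choice(doors)             # pick a random door
    # Q-learning update: immediate reward plus discounted best future value.
    Q[s, a] = R[s, a] + gamma * Q[a].max()

print(np.round(Q / Q.max() * 100))    # normalized Q table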
Learning tasks
Learning tasks for neural networks can be classified according to the
source of information for them.
In this way, some learning tasks that learn from data sets are function
regression, pattern recognition and time series prediction.
The following figure shows the learning tasks for neural networks
described in this section.
As we can see, they are capable of dealing with a great range of
applications.
Each of these learning tasks can be formulated as a variational problem.
All of them are solved using the three-step approach described in the
previous section. Modelling and classification are the most traditional
learning tasks.
[Figure: learning tasks for neural networks. Function regression and pattern recognition both learn from data sets.]
Function regression
Function regression is the most popular learning task for
neural networks.
It is also called modelling.
The function regression problem can be regarded as the
problem of approximating a function from a data set
consisting of input-target instances.
The targets are a specification of what the response to the
inputs should be.
While input variables might be quantitative or qualitative,
in function regression, target variables are quantitative.
• Loss indices for function regression are usually based on a
sum of errors between the outputs from the neural
network and the targets in the training data.
• As the training data is usually deficient, a regularization
term might be required to solve the problem correctly.
• An example is to design an instrument that can determine
serum cholesterol levels from measurements of the spectral
content of a blood sample.
• There are a number of patients for which there are
measurements of several wavelengths of the spectrum.
• For the same patients, there are also measurements of
several cholesterol levels based on serum separation.
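A minimal function regression sketch (an illustrative assumption, not taken from the text: the synthetic data and every parameter value are chosen only for demonstration) can be written with scikit-learn; the alpha parameter plays the role of the regularization term mentioned above:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))            # quantitative input variable
t = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # quantitative targets (noisy)

# alpha is an L2 regularization term, useful when the training data is deficient.
net = MLPRegressor(hidden_layer_sizes=(20,), alpha=1e-3,
                   max_iter=5000, random_state=0)
net.fit(X, t)          # minimizes a loss based on the sum of squared errors

print("Prediction at x = 1.0:", net.predict([[1.0]]))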
Pattern recognition
The learning task of pattern recognition is central to
artificial intelligence.
That problem can be stated as the process whereby a received
pattern, characterized by a distinct set of features, is
assigned to one of a prescribed number of classes.
Pattern recognition is also known as classification.
Here the neural network learns from knowledge represented
by a training data set consisting of input-target instances.
The inputs include a set of features that characterize a
pattern, and they can be quantitative or qualitative.
• The targets specify the class that each pattern belongs to and
therefore are qualitative.
• Classification problems can be formulated as
modelling problems.
• As a consequence, the loss index used here is also based on the sum of
squared errors.
• However, the learning task of pattern recognition is more challenging
to solve than that of function regression.
• This means that a good knowledge of the state of the art is
recommended for success.
• A typical example is to distinguish hand-written versions of
characters.
• Images of the characters might be captured and fed to a computer.
• An algorithm then seeks a model which can distinguish as reliably as
possible between the characters.
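Along the same lines, a minimal pattern recognition sketch (an illustrative assumption, using scikit-learn's bundled 8x8 digit images as the hand-written characters; the network size is arbitrary):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                    # 8x8 images of digits 0 to 9
X_train, X_test, t_train, t_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Inputs are pixel features; targets are qualitative class labels.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
net.fit(X_train, t_train)

print("Test accuracy:", net.score(X_test, t_test))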
Perceptron Learning
 In Machine Learning and Artificial Intelligence, the Perceptron is one of the most commonly
used terms.
 It is a primary step in learning Machine Learning and Deep Learning technologies, and it
consists of a set of weights, input values or scores, and a threshold.
 The Perceptron is a building block of an Artificial Neural Network.
 In the mid-20th century, Frank Rosenblatt invented the Perceptron for performing certain
calculations to detect capabilities in input data.
 The Perceptron is a linear Machine Learning algorithm used for supervised learning of various
binary classifiers.
 This algorithm enables neurons to learn, processing the training elements one by one.
What is a Perceptron?
 A perceptron is the smallest element of a neural network.
 A perceptron network is a group of simple logical statements that come together to create
an array of complex logical statements, known as the neural network.
 The perceptron is a mathematical model of the biological neuron.
 It produces binary outputs from input values while taking into consideration weights and
threshold values.
 Perceptrons use a step activation function to give a positive or negative output
based on a specific threshold value.
Components of a Perceptron
Each perceptron comprises five different parts:
1. Input Values: A set of values or a dataset for
predicting the output value. They are also
described as the dataset's features.
2. Weights: The real value of each feature is known
as its weight. It tells the importance of that feature
in predicting the final value.
3. Bias: The activation function is shifted towards
the left or right using the bias. You may understand
it simply as the y-intercept in the line equation.
4. Summation Function: The summation function
binds the weights and inputs together. It is a
function to find their sum.
5. Activation Function: It introduces non-linearity
in the perceptron model.
[Figure: a perceptron. Inputs X1…Xn with weights W1…Wn and a bias b feed the summation function Σ, whose result passes through the activation function f to produce Ypred.]
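A minimal sketch of these five components in Python (the input, weight, and bias values below are illustrative assumptions):

import numpy as np

def perceptron_forward(x, w, b):
    """Forward pass of a single perceptron."""
    y = np.dot(w, x) + b          # summation function: weighted sum plus bias
    return 1 if y >= 0 else 0     # step activation function: binary output

x = np.array([1.0, 0.0])          # input values
w = np.array([0.6, 0.6])          # weights
b = -0.5                          # bias
print(perceptron_forward(x, w, b))  # prints 1, since 0.6 - 0.5 >= 0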
Perceptron Learning Algorithm
 Perceptron Networks are single-layer feed-forward networks. These are also called Single
Perceptron Networks.
 The Perceptron consists of an input layer connected directly to an output layer; there is
no hidden layer.
 The input layer is connected to the output layer through weights which may be inhibitory,
excitatory, or zero (-1, +1 or 0).
 The input layer uses the identity activation function, while the output layer uses a binary
step function.
The output is
Y = f(y)
 The activation function is:
F(y) =  1,  if y > threshold
        0,  if -threshold ≤ y ≤ threshold
       -1,  if y < -threshold
 The weight updation takes place between the input layer and the output layer so as to
match the target output.
 The error is calculated based on the actual output and the desired output.
The weights are adjusted according to the learning rule with learning rate α as
W(new) = w(old) + α*t*x
B(new) = b(old) + α*t
where α is the learning rate and t is the target output.
 If the output matches the target, then no weight updation takes place.
 The weights are typically initialized to 0 or small values and adjusted successively till an optimal
solution is found; in general, the weights in the network can be set to any values initially.
 Perceptron learning will converge to a weight vector that gives the correct output for all input
training patterns, and this learning happens in a finite number of steps (provided the training
patterns are linearly separable).
 The Perceptron rule can be used for both binary and bipolar inputs.
Learning Rule for Single Output Perceptron
1. Let there be "n" training input vectors x(n) with associated target values t(n).
2. Initialize the weights and bias. Set them to zero for easy calculation.
3. Let the learning rate be 1.
4. The input layer has the identity activation function, so x(i) = s(i).
5. Calculate the net input of the network:
y = b + Σ xi*wi for all inputs i from 1 to n.
6. The activation function is applied over the net input to obtain the output:
F(y) =  1,  if y > threshold
        0,  if -threshold ≤ y ≤ threshold
       -1,  if y < -threshold
7. Now based on the output, compare the desired target value (t) with the actual output:
if y ≠ t then weight updating takes place:
W(new) = w(old) + α*t*x
B(new) = b(old) + α*t
if y = t then
W(new) = w(old)
B(new) = b(old)
8. Continue the iteration until there is no weight change. Stop once this condition is
achieved.
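A minimal sketch of this rule in Python (an illustrative assumption: the training set, threshold, and learning rate below are chosen only for demonstration; the bipolar AND function is used as the example task):

import numpy as np

def train_perceptron(X, t, alpha=1.0, threshold=0.2, epochs=100):
    """Single-output perceptron learning rule with a bipolar step activation."""
    w = np.zeros(X.shape[1])              # step 2: weights start at zero
    b = 0.0                               # step 2: bias starts at zero
    for _ in range(epochs):
        changed = False
        for x, target in zip(X, t):
            y_in = b + np.dot(x, w)       # step 5: net input
            y = 1 if y_in > threshold else (-1 if y_in < -threshold else 0)
            if y != target:               # step 7: update only on a mismatch
                w = w + alpha * target * x
                b = b + alpha * target
                changed = True
        if not changed:                   # step 8: stop when weights settle
            break
    return w, b

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])  # bipolar inputs
t = np.array([1, -1, -1, -1])                       # bipolar AND targets
print(train_perceptron(X, t))                       # converges to w = [1, 1], b = -1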
Learning Rule for Multiple Output Perceptron
1. Let there be "n" training input vectors x(n) with associated target values t(n).
2. Initialize the weights and bias. Set them to zero for easy calculation.
3. Let the learning rate be 1.
4. The input layer has the identity activation function, so x(i) = s(i).
5. Calculate the net input for each output unit j = 1 to m:
yj = bj + Σ xi*wij for all inputs i from 1 to n, for every j.
6. The activation function is applied over the net input to obtain the output:
F(y) =  1,  if y > threshold
        0,  if -threshold ≤ y ≤ threshold
       -1,  if y < -threshold
7. Now based on the output, compare the desired target value (t) with the actual output and make weight
adjustments:
if y ≠ t then weight updating takes place:
W(new) = w(old) + α*t*x
B(new) = b(old) + α*t
if y = t then
W(new) = w(old)
B(new) = b(old)
where w is the weight vector of the connection link between the ith input and the jth output neuron, and t is the
target output for output unit j.
8. Continue the iteration until there is no weight change. Stop once this condition is achieved.
Examples of the Perceptron Learning Rule
Implementation of the OR function using a Perceptron network for binary inputs and output.
The input pattern will be (A, B). Let the initial weights be W1 = 0.6, W2 = 0.6. The threshold is
set to 1 and the learning rate is η = 0.5.
Truth table of the OR function:
A B Y = A OR B
0 0 0
0 1 1
1 0 1
1 1 1
Case-1: A=0, B=0 and Target=0
ΣWi.Xi = 0*0.6 + 0*0.6 = 0
This is not greater than the threshold of 1, so the output = 0 (since Target = Output)
Case-2: A=0, B=1 and Target=1
ΣWi.Xi = x1w1 + x2w2 = 0*0.6 + 1*0.6 = 0.6
This is not greater than the threshold of 1, so the output = 0 (since Target ≠ Output)
Modified weights:
Wi = wi + η(t - o)xi
W1 = 0.6 + 0.5(1 - 0)*0 = 0.6
W2 = 0.6 + 0.5(1 - 0)*1 = 1.1
Check:
From Case-1: wi.xi = 0*0.6 + 0*1.1 = 0 (since Target = Output)
From Case-2: wi.xi = 0*0.6 + 1*1.1 = 1.1
This is greater than the threshold of 1, so the output = 1 (since Target = Output)
Case-3: A=1, B=0 and Target=1
Wi.xi = 1*0.6 + 0*1.1 = 0.6
This is not greater than the threshold of 1, so the output = 0 (since Target ≠ Output)
Modified weights:
Wi = wi + η(t - o)xi
W1 = 0.6 + 0.5(1 - 0)*1 = 1.1
W2 = 1.1 + 0.5(1 - 0)*0 = 1.1
Check:
From Case-1: wi.xi = 0*1.1 + 0*1.1 = 0
This is not greater than the threshold of 1, so the output = 0 (since Target = Output)
From Case-2: wi.xi = 0*1.1 + 1*1.1 = 1.1
This is greater than the threshold of 1, so the output = 1 (since Target = Output)
From Case-3: wi.xi = 1*1.1 + 0*1.1 = 1.1
This is greater than the threshold of 1, so the output = 1 (since Target = Output)
Case-4: A=1, B=1 and Target=1
Wi.xi = 1*1.1 + 1*1.1 = 2.2
This is greater than the threshold of 1, so the output = 1 (since Target = Output)
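The following short script (a sketch that assumes the same initial weights, threshold, and learning rate as above) reproduces this weight-update trace:

# OR-function worked example: threshold = 1, eta = 0.5,
# initial weights (0.6, 0.6), update rule wi = wi + eta*(t - o)*xi.
cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.6, 0.6]
eta, theta = 0.5, 1.0

changed = True
while changed:                        # repeat passes until no weight changes
    changed = False
    for (x1, x2), t in cases:
        s = x1 * w[0] + x2 * w[1]     # weighted sum
        o = 1 if s > theta else 0     # step output with threshold 1
        if o != t:                    # mismatch: apply the learning rule
            w[0] += eta * (t - o) * x1
            w[1] += eta * (t - o) * x2
            changed = True
print("Final weights:", w)            # converges to [1.1, 1.1]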
Network diagram:
[Figure: the final network. Inputs A and B with weights W1 = 1.1 and W2 = 1.1 feed a weighted sum Σ with threshold Θ = 1 to produce the output.]
Thank You