Learning in Neural Networks

Sibel KAPLAN

İZMİR-2004
Program...

- History
- Introduction
- Models of Processing Elements (neurons)
- Models of Synaptic Interconnections
- Learning Rules for Updating the Weights
  - Supervised Learning
    - Perceptron Learning
    - Adaline Learning
    - Back-Propagation (an example)
  - Reinforcement Learning
  - Unsupervised Learning
    - Hebbian Learning Rule

History...

- 1943: ANNs were first studied by McCulloch and Pitts
- 1949: Donald Hebb proposed a learning rule for storing knowledge in neural networks
- 1958: Rosenblatt introduced the perceptron, which began to be used for pattern recognition
- 1960: Widrow and Marcian Hoff developed ADALINE
- 1982: J. J. Hopfield published his study of a single-layer feedback neural network with symmetric weights
- 1990s: ANNs were applied to many different kinds of problems, and software tools came into use

The Real and Artificial Neurons

- ANNs are systems constructed to make use of some organizational principles resembling those of the human brain.
- ANNs are good at tasks such as:
  - pattern matching and classification
  - function approximation
  - optimization
  - vector quantization
  - data clustering

The Neuron (Processing Element)

[Figure: input values x1, x2, ..., xm enter through weights w1, w2, ..., wm; a summing function combines them with the bias b into the net input (local field) v; an activation function φ(·) maps v to the output y.]
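A minimal sketch of this processing element in Python (the function name, the sample values, and the choice of the unipolar sigmoid as the activation φ are illustrative assumptions):

    import math

    def neuron_output(inputs, weights, bias):
        # Net input (local field): v = sum_i(w_i * x_i) + b
        v = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Activation function phi(.): unipolar sigmoid assumed here
        return 1.0 / (1.0 + math.exp(-v))

    # Example: y = phi(0.6*0.2 + 0.7*0.1 + 0) = phi(0.19), about 0.547
    y = neuron_output([0.6, 0.7], [0.2, 0.1], 0.0)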
Models of ANNs are specified by three basic entities:
- models of the neurons themselves,
- models of synaptic interconnections and structures,
- training or learning rules for updating the connecting weights.

There are three important parameters describing a neuron:

I. An integration function associated with the input of a neuron, used to calculate the net input (for the McCulloch-Pitts neuron):

$$f_i:\quad net_i = \sum_{j=1}^{m} w_{ij}\, x_j(t) - \theta_i$$

II. A set of links describing the neuron inputs, with weights $W_1, W_2, \ldots, W_m$.
III. Activation Function

Activation functions a(f) output an activation value as a function of the neuron's net input. Some commonly used activation functions, sketched in code below, are:
- step function
- hard limiter (threshold function)
- ramp function
- unipolar sigmoid function
- bipolar sigmoid function
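A sketch of these functions in Python (the common textbook forms are assumed; `lam` is an assumed steepness parameter for the sigmoids):

    import math

    def step(f):
        return 1.0 if f >= 0 else 0.0                    # step function

    def hard_limiter(f):
        return 1.0 if f >= 0 else -1.0                   # hard limiter (threshold function)

    def ramp(f):
        return max(0.0, min(1.0, f))                     # linear between 0 and 1, clipped outside

    def unipolar_sigmoid(f, lam=1.0):
        return 1.0 / (1.0 + math.exp(-lam * f))          # output in (0, 1)

    def bipolar_sigmoid(f, lam=1.0):
        return 2.0 / (1.0 + math.exp(-lam * f)) - 1.0    # output in (-1, 1)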

Connections

An ANN consists of a set of highly interconnected neurons such that each neuron's output is connected through weights to other neurons or to itself.

Basic types of connection geometries (Nelson and Illingworth, 1991):

[Figure: a single-layer feedforward network (inputs x1, ..., xm connected through weights to outputs y1, ..., yn), a single node with feedback to itself, and a single-layer recurrent network.]

Basic types of connection geometries (Nelson and Illingworth, 1991):

[Figure: a multilayer feedforward network (input layer, hidden layers, output layer) and a multilayer recurrent network, both mapping inputs x1, ..., xm to outputs y1, ..., yn.]

Learning Rules

Generally, we can classify learning in ANNs into two broad classes:

- Parameter learning, which is concerned with updating the connecting weights.
- Structure learning, which focuses on changes in the network structure, including the number of PEs and their connection types.

These two kinds of learning can be performed simultaneously or separately.
In weight learning, we have to develop learning rules that efficiently guide the weight matrix W toward a desired matrix yielding the desired network performance. In general, learning rules are classified into three categories:

- Supervised learning (learning with a teacher)
- Reinforcement learning (learning with a critic)
- Unsupervised learning

In supervised learning, when an input is applied to an ANN, the corresponding desired response of the system is given: the ANN is supplied with a sequence (x1, d1), (x2, d2), ..., (xk, dk) of desired input-output pairs.

In reinforcement learning, less detailed information is available than in supervised learning: there is only a single bit of feedback indicating whether the output is right or wrong. That is, the feedback says only how good or how bad a particular output is and provides no hint as to what the right answer should be.
In unsupervised learning, there is no teacher to provide any feedback information. The network must discover for itself patterns, features, regularities, correlations, or categories in the input data and code for them in the output. While discovering these features, the network undergoes changes in its parameters; this process is called self-organization.

Three Categories of Learning

[Figure: block diagrams contrasting supervised learning, reinforcement learning, and unsupervised learning.]

Machine Learning

Supervised
- Data: labeled examples (input, desired output)
- Problems: classification, pattern recognition, regression
- NN models: perceptron, adaline, back-propagation

Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Problems: clustering, content-addressable memory
- NN models: self-organizing maps (SOM), Hamming networks, Hopfield networks, hierarchical networks

Knowledge about the learning task is given in the form of examples called training examples.

The aim of the training process is to obtain an NN that generalizes well, that is, one that behaves correctly on new instances of the learning task.

A general form of the weight learning rule indicates that the increment of the weight vector $w_i$ produced by the learning step at time t is proportional to the product of the learning signal r and the input x(t):

$$\Delta w_i(t) = \eta\, r\, x(t)$$

where the learning signal is $r = f_r(w_i, x, d_i)$.

Hence, the updated weight vector is:

$$w_i^{(t+1)} = w_i^{(t)} + \eta\, f_r\!\left(w_i^{(t)}, x^{(t)}, d_i^{(t)}\right) x^{(t)}$$
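A sketch of this general rule in Python, with the learning signal $f_r$ passed in as a function (the names and the default learning constant are illustrative):

    def weight_update(w, x, d, learning_signal, eta=0.1):
        # r = f_r(w, x, d): the learning signal chosen by the particular rule
        r = learning_signal(w, x, d)
        # w(t+1) = w(t) + eta * r * x(t), applied component-wise
        return [wi + eta * r * xi for wi, xi in zip(w, x)]

Substituting different learning signals into this form yields the perceptron, Adaline, and Hebbian rules discussed next.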

Perceptron Learning Rule

- The perceptron is used for binary classification.
- Training is done by an error-correction rule that adjusts the weights whenever an input is misclassified (a training sketch follows this list).
- For simple perceptrons with linear threshold units (LTUs), the desired outputs $d_i^{(k)}$ can take only the values ±1. The goal of training is then:

$$y_i^{(k)} = \mathrm{sgn}\!\left(w_i^T x^{(k)}\right) = d_i^{(k)}$$

- It is necessary to find a hyperplane that divides the inputs that have positive targets (or outputs) from those that have negative ones.
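A minimal training sketch in Python (the AND-gate data, learning constant, and epoch limit are illustrative; each input carries a constant 1 so that w[0] plays the role of the bias weight w0):

    def sgn(v):
        return 1 if v >= 0 else -1

    def train_perceptron(samples, eta=0.1, epochs=100):
        w = [0.0] * len(samples[0][0])
        for _ in range(epochs):
            errors = 0
            for x, d in samples:
                y = sgn(sum(wi * xi for wi, xi in zip(w, x)))
                if y != d:
                    # Error-correction step: w <- w + eta * (d - y) * x
                    w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
                    errors += 1
            if errors == 0:            # every pattern classified correctly
                break
        return w

    # I1 AND I2 with +/-1 coding is linearly separable, so this converges;
    # on I1 XOR I2 the loop would never reach zero errors (next slide).
    and_data = [([1, -1, -1], -1), ([1, -1, 1], -1), ([1, 1, -1], -1), ([1, 1, 1], 1)]
    w = train_perceptron(and_data)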
Linear Separability in Perceptron

[Figure: in the (I1, I2) plane, a decision boundary separates the two classes C1 and C2 for I1 AND I2 and for I1 OR I2, but no single line can separate the classes for I1 XOR I2.]

The decision boundary is the hyperplane:

$$\sum_{i=1}^{n} w_i x_i + w_0 = 0$$

which for two inputs is the line $w_1 x_1 + w_2 x_2 + w_0 = 0$.
Whether a pattern classification problem is solvable by a simple perceptron depends on whether the problem is linearly separable. If a classification problem is linearly separable, a simple perceptron can find weights such that:

$$w_i^T x > 0 \;\;\text{(desired output } +1\text{)}, \qquad w_i^T x < 0 \;\;\text{(desired output } -1\text{)}$$

Adaline (Widrow-Hoff) Learning Rule (1962)

When the two classes are not linearly separable, it may be desirable to obtain a linear separator that minimizes the mean squared error.

Adaline (Adaptive Linear Element):
- uses a linear neuron model
- uses the Least-Mean-Square (LMS) learning algorithm
- is useful for robust linear classification and regression

To update the weights, we define a cost function E(w) which measures the system's performance error:

$$E(w) = \frac{1}{2} \sum_{k=1}^{p} \left(d^{(k)} - y^{(k)}\right)^2$$

The weights are adjusted by gradient descent, $\Delta w = -\eta \nabla_w E(w)$, which gives:

$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{k=1}^{p} \left(d^{(k)} - w^T x^{(k)}\right) x_j^{(k)}$$
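A sketch of the Widrow-Hoff (LMS) update in Python, in its per-sample form rather than the batch sum above (the learning constant, epoch count, and data format are illustrative):

    def train_adaline(samples, eta=0.05, epochs=50):
        # samples: list of (x, d) pairs, x including a constant 1 to absorb the bias
        w = [0.0] * len(samples[0][0])
        for _ in range(epochs):
            for x, d in samples:
                y = sum(wi * xi for wi, xi in zip(w, x))   # linear neuron: y = w^T x
                # LMS step: w <- w + eta * (d - y) * x
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        return w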

Back-Propagation

One of the most important historical developments in ANNs. It is applied to multilayer feedforward networks consisting of processing elements with continuous, differentiable activation functions.

Process:
- The input pattern is propagated from the input layer to the output layer, producing an actual output.
- The error signals resulting from the difference between dk and yk are back-propagated from the output layer to the previous layers, which update their weights accordingly.

Three-Layer Back-Propagation Network

[Figure: inputs $x_j$ feed the hidden units $z_q$ through weights $v_{qj}$; the hidden units feed the outputs $y_i$ through weights $w_{iq}$.]

$$net_q = \sum_{j=1}^{m} v_{qj} x_j, \qquad j = 1, 2, \ldots, m$$

$$z_q = a(net_q) = a\!\left(\sum_{j=1}^{m} v_{qj} x_j\right), \qquad q = 1, 2, \ldots, l$$

$$y_i = a(net_i) = a\!\left(\sum_{q=1}^{l} w_{iq} z_q\right) = a\!\left(\sum_{q=1}^{l} w_{iq}\, a\!\left(\sum_{j=1}^{m} v_{qj} x_j\right)\right), \qquad i = 1, 2, \ldots, n$$

These equations describe the forward propagation of input signals through the layers of neurons.

A cost function is defined as in the Adaline learning rule:

$$E(w) = \frac{1}{2} \sum_{i=1}^{n} (d_i - y_i)^2 = \frac{1}{2} \sum_{i=1}^{n} \left[d_i - a(net_i)\right]^2$$

Then, according to the gradient-descent method, the weights of the hidden-to-output connections are updated by:

$$\Delta w_{iq} = -\eta \frac{\partial E}{\partial w_{iq}}$$

which gives:

$$\Delta w_{iq} = \eta \left[d_i - y_i\right] a'(net_i)\, z_q = \eta\, \delta_{oi}\, z_q$$
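A sketch of this hidden-to-output update in Python, assuming the unipolar sigmoid as a(·), so that $a'(net_i) = y_i(1 - y_i)$ (the names and the learning constant are illustrative):

    def update_output_weights(w, z, d, y, eta=0.3):
        # w[i][q]: hidden-to-output weights; z: hidden outputs; d, y: desired and actual outputs
        for i in range(len(y)):
            # delta_oi = (d_i - y_i) * a'(net_i) = (d_i - y_i) * y_i * (1 - y_i) for the sigmoid
            delta = (d[i] - y[i]) * y[i] * (1.0 - y[i])
            for q in range(len(z)):
                w[i][q] += eta * delta * z[q]   # dw_iq = eta * delta_oi * z_q
        return w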
Hebb's Learning Law (1949)

According to the law, when an axonal input from neuron A causes neuron B to immediately emit a pulse (fire), and this happens repeatedly or persistently, then the ability of that axonal input to help neuron B fire in the future is somehow increased. Hebb's law also underlies a number of other learning rules.

According to the Hebbian hypothesis, the learning signal is:

$$r = a\!\left(w_i^T x\right) = y_i$$

where a(.) is the activation function of the neuron. In this equation, the learning signal r is simply set to the neuron's current output. The increment in the weights is then:

$$\Delta w_{ij} = \eta\, a\!\left(w_i^T x\right) x_j = \eta\, y_i\, x_j, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, m$$
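A sketch of the Hebbian update in Python (the sigmoid stands in for the activation a(.), and the learning constant is illustrative):

    import math

    def hebbian_update(w, x, eta=0.1):
        # Learning signal r = a(w^T x) = y_i: the neuron's own output
        y = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        # dw_ij = eta * y_i * x_j: strengthen weights where input and output are active together
        return [wi + eta * y * xi for wi, xi in zip(w, x)]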
Tips about important learning factors

- Wessels and Barnard (1992) suggest choosing the initial weights of a neuron i in the range $\left(-3/\sqrt{k_i},\; 3/\sqrt{k_i}\right)$, where $k_i$ is the number of inputs (fan-in) of the neuron (a sketch follows this list).
- The learning constant $\eta$ is usually chosen experimentally for each problem. A larger value can speed up convergence but might result in overshooting. Values ranging from $10^{-3}$ to 10 have been used successfully in many applications.
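A sketch of this initialization in Python, assuming the Wessels-Barnard range is read as $\pm 3/\sqrt{k_i}$ with $k_i$ the fan-in:

    import random

    def init_weights(fan_in):
        # Wessels-Barnard heuristic: uniform in (-3/sqrt(k_i), 3/sqrt(k_i))
        bound = 3.0 / fan_in ** 0.5
        return [random.uniform(-bound, bound) for _ in range(fan_in)]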

An example of back-propagation...

[Figure: a 2-3-2 network with inputs I1, I2, three hidden neurons, and outputs Q1, Q2.]

Input-to-hidden weights (i = 1, 2; j = 1, 2, 3):
  w11 = 0.2, w21 = 0.1
  w12 = 0.4, w22 = 0.2
  w13 = 0.7, w23 = 0.2

Inputs: I1 = 0.6, I2 = 0.7

Hidden-to-output weights (j = 1, 2, 3; k = 1, 2):
  Gw11 = 0.1, Gw21 = 0.3, Gw31 = 0.4
  Gw12 = 0.2, Gw22 = 0.5, Gw32 = 0.3

Activation function (sigmoid):

$$f_1 = \frac{1}{1 + e^{-Net_1}}$$
$$Net_j = \sum_{i=1}^{2} I_i w_{ij} + B_j \quad \text{(for } j = 1\text{)}$$

$$Net_1 = \sum_{i=1}^{2} I_i w_{i1} + B_{11} = I_1 w_{11} + I_2 w_{21} \quad (B_{11} = 0)$$

$$= 0.6 \times 0.2 + 0.7 \times 0.1 = 0.19$$

$$A_1 = \frac{1}{1 + e^{-0.19}} = 0.547$$

$$Net_1 = \sum_{j=1}^{3} A_j\, Gw_{j1} = A_1 Gw_{11} + A_2 Gw_{21} + A_3 Gw_{31}$$

$$= 0.547 \times 0.1 + 0.593 \times 0.3 + 0.636 \times 0.4 = 0.487$$

$$Q_1 = \frac{1}{1 + e^{-0.487}} = 0.619 \quad \text{(final output)}$$

Error of the first output:

$$E_1 = Q_{1,\text{desired}} - Q_{1,\text{actual}} = 0.5 - 0.619 = -0.119$$

Total cost (error), with the second output's error $E_2$ (magnitude 0.5449) computed in the same way:

$$E = \frac{1}{2}\left(E_1^2 + E_2^2\right) = \frac{1}{2}\left[(0.119)^2 + (0.5449)^2\right] = 0.1555$$
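A short Python check of this worked example, using the weights and inputs from the slides above (the value of E2 is taken from the slide rather than recomputed, since the second desired output is not stated):

    import math

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    I = [0.6, 0.7]
    W = [[0.2, 0.1], [0.4, 0.2], [0.7, 0.2]]     # input-to-hidden, one row per hidden neuron
    Gw = [[0.1, 0.3, 0.4], [0.2, 0.5, 0.3]]      # hidden-to-output, one row per output

    A = [sigmoid(sum(w * x for w, x in zip(row, I))) for row in W]   # ~[0.547, 0.593, 0.636]
    Q = [sigmoid(sum(g * a for g, a in zip(row, A))) for row in Gw]  # Q[0] ~ 0.619

    E1 = 0.5 - Q[0]                      # ~ -0.119 (desired output 0.5, as on the slide)
    E2 = 0.5449                          # taken from the slide for the second output
    total = 0.5 * (E1 ** 2 + E2 ** 2)    # ~ 0.1555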
