Learning in Neural Networks

Sibel KAPLAN

İZMİR-2004
Program...

- History
- Introduction
- Models of Processing Elements (neurons)
- Models of Synaptic Interconnections
- Learning Rules for Updating the Weights
  - Supervised Learning
    - Perceptron Learning
    - Adaline Learning
    - Back-Propagation (an example)
  - Reinforcement Learning
  - Unsupervised Learning
    - Hebbian Learning Rule

History...

- 1943: ANNs were first studied by McCulloch and Pitts
- 1949: Donald Hebb proposed a learning rule for storing knowledge in neural networks
- 1958: Rosenblatt introduced the perceptron, which began to be used for pattern recognition
- 1960: Widrow and Marcian Hoff developed ADALINE
- 1982: J. J. Hopfield published his study of a single-layer feedback neural network with symmetric weights
- 1990s: ANNs were applied to many different kinds of problems, and software tools came into use

The Real and Artificial Neurons

- ANNs are systems constructed to make use of some organizational principles resembling those of the human brain.
- ANNs are good at tasks such as:
  - pattern matching and classification
  - function approximation
  - optimization
  - vector quantization
  - data clustering

The Neuron (Processing Element)

[Figure: input values x1, x2, ..., xm enter through weights w1, w2, ..., wm; a summing function combines them with the bias b into the net input (local field) v; an activation function φ(·) maps v to the output y.]
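A minimal sketch of this processing element in Python (the function name, the sample values, and the choice of the unipolar sigmoid as the activation φ are illustrative assumptions):

    import math

    def neuron_output(inputs, weights, bias):
        # Net input (local field): v = sum_i(w_i * x_i) + b
        v = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Activation function phi(.): unipolar sigmoid assumed here
        return 1.0 / (1.0 + math.exp(-v))

    # Example: y = phi(0.6*0.2 + 0.7*0.1 + 0) = phi(0.19), about 0.547
    y = neuron_output([0.6, 0.7], [0.2, 0.1], 0.0)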
Models of ANNs are specified by three basic entities:
- models of the neurons themselves,
- models of synaptic interconnections and structures,
- training or learning rules for updating the connecting weights.

There are three important parameters describing a neuron:

I. An integration function associated with the input of a neuron, used to calculate the net input (for the McCulloch-Pitts neuron):

$$f_i:\quad net_i = \sum_{j=1}^{m} w_{ij}\, x_j(t) - \theta_i$$

II. A set of links describing the neuron inputs, with weights $W_1, W_2, \ldots, W_m$.
III. Activation Function

Activation functions a(f) output an activation value as a function of the neuron's net input. Some commonly used activation functions, sketched in code below, are:
- step function
- hard limiter (threshold function)
- ramp function
- unipolar sigmoid function
- bipolar sigmoid function
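A sketch of these functions in Python (the common textbook forms are assumed; `lam` is an assumed steepness parameter for the sigmoids):

    import math

    def step(f):
        return 1.0 if f >= 0 else 0.0                    # step function

    def hard_limiter(f):
        return 1.0 if f >= 0 else -1.0                   # hard limiter (threshold function)

    def ramp(f):
        return max(0.0, min(1.0, f))                     # linear between 0 and 1, clipped outside

    def unipolar_sigmoid(f, lam=1.0):
        return 1.0 / (1.0 + math.exp(-lam * f))          # output in (0, 1)

    def bipolar_sigmoid(f, lam=1.0):
        return 2.0 / (1.0 + math.exp(-lam * f)) - 1.0    # output in (-1, 1)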

Connections

An ANN consists of a set of highly interconnected neurons such that each neuron's output is connected through weights to other neurons or to itself.

Basic types of connection geometries (Nelson and Illingworth, 1991):

[Figure: a single-layer feedforward network (inputs x1, ..., xm connected through weights to outputs y1, ..., yn), a single node with feedback to itself, and a single-layer recurrent network.]

Basic types of connection geometries (Nelson and Illingworth, 1991):

[Figure: a multilayer feedforward network (input layer, hidden layers, output layer) and a multilayer recurrent network, both mapping inputs x1, ..., xm to outputs y1, ..., yn.]

Learning Rules

Generally, we can classify learning in ANNs into two broad classes:

- Parameter learning, which is concerned with updating the connecting weights.
- Structure learning, which focuses on changes in the network structure, including the number of PEs and their connection types.

These two kinds of learning can be performed simultaneously or separately.
In weight learning, we have to develop learning rules that efficiently guide the weight matrix W toward a desired matrix yielding the desired network performance. In general, learning rules are classified into three categories:

- Supervised learning (learning with a teacher)
- Reinforcement learning (learning with a critic)
- Unsupervised learning

In supervised learning, when an input is applied to an ANN, the corresponding desired response of the system is given: the ANN is supplied with a sequence (x1, d1), (x2, d2), ..., (xk, dk) of desired input-output pairs.

In reinforcement learning, less detailed information is available than in supervised learning: there is only a single bit of feedback indicating whether the output is right or wrong. That is, the feedback says only how good or how bad a particular output is and provides no hint as to what the right answer should be.
In unsupervised learning, there is no teacher to provide any feedback information. The network must discover for itself patterns, features, regularities, correlations, or categories in the input data and code for them in the output. While discovering these features, the network undergoes changes in its parameters; this process is called self-organization.

Three Categories of Learning

[Figure: block diagrams contrasting supervised learning, reinforcement learning, and unsupervised learning.]

Machine Learning

Supervised
- Data: labeled examples (input, desired output)
- Problems: classification, pattern recognition, regression
- NN models: perceptron, adaline, back-propagation

Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Problems: clustering, content-addressable memory
- NN models: self-organizing maps (SOM), Hamming networks, Hopfield networks, hierarchical networks

Knowledge about the learning task is given in the form of examples called training examples.

The aim of the training process is to obtain an NN that generalizes well, that is, one that behaves correctly on new instances of the learning task.

A general form of the weight learning rule indicates that the increment of the weight vector $w_i$ produced by the learning step at time t is proportional to the product of the learning signal r and the input x(t):

$$\Delta w_i(t) = \eta\, r\, x(t)$$

where the learning signal is $r = f_r(w_i, x, d_i)$.

Hence, the updated weight vector is:

$$w_i^{(t+1)} = w_i^{(t)} + \eta\, f_r\!\left(w_i^{(t)}, x^{(t)}, d_i^{(t)}\right) x^{(t)}$$
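A sketch of this general rule in Python, with the learning signal $f_r$ passed in as a function (the names and the default learning constant are illustrative):

    def weight_update(w, x, d, learning_signal, eta=0.1):
        # r = f_r(w, x, d): the learning signal chosen by the particular rule
        r = learning_signal(w, x, d)
        # w(t+1) = w(t) + eta * r * x(t), applied component-wise
        return [wi + eta * r * xi for wi, xi in zip(w, x)]

Substituting different learning signals into this form yields the perceptron, Adaline, and Hebbian rules discussed next.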

Perceptron Learning Rule

- The perceptron is used for binary classification.
- Training is done by an error-correction rule that adjusts the weights whenever an input is misclassified (a training sketch follows this list).
- For simple perceptrons with linear threshold units (LTUs), the desired outputs $d_i^{(k)}$ can take only the values ±1. The goal of training is then:

$$y_i^{(k)} = \mathrm{sgn}\!\left(w_i^T x^{(k)}\right) = d_i^{(k)}$$

- It is necessary to find a hyperplane that divides the inputs that have positive targets (or outputs) from those that have negative ones.
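A minimal training sketch in Python (the AND-gate data, learning constant, and epoch limit are illustrative; each input carries a constant 1 so that w[0] plays the role of the bias weight w0):

    def sgn(v):
        return 1 if v >= 0 else -1

    def train_perceptron(samples, eta=0.1, epochs=100):
        w = [0.0] * len(samples[0][0])
        for _ in range(epochs):
            errors = 0
            for x, d in samples:
                y = sgn(sum(wi * xi for wi, xi in zip(w, x)))
                if y != d:
                    # Error-correction step: w <- w + eta * (d - y) * x
                    w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
                    errors += 1
            if errors == 0:            # every pattern classified correctly
                break
        return w

    # I1 AND I2 with +/-1 coding is linearly separable, so this converges;
    # on I1 XOR I2 the loop would never reach zero errors (next slide).
    and_data = [([1, -1, -1], -1), ([1, -1, 1], -1), ([1, 1, -1], -1), ([1, 1, 1], 1)]
    w = train_perceptron(and_data)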
Linear Separability in Perceptron

[Figure: in the (I1, I2) plane, a decision boundary separates the two classes C1 and C2 for I1 AND I2 and for I1 OR I2, but no single line can separate the classes for I1 XOR I2.]

The decision boundary is the hyperplane:

$$\sum_{i=1}^{n} w_i x_i + w_0 = 0$$

which for two inputs is the line $w_1 x_1 + w_2 x_2 + w_0 = 0$.
Whether a pattern classification problem is solvable by a simple perceptron depends on whether the problem is linearly separable. If a classification problem is linearly separable, a simple perceptron can find weights such that:

$$w_i^T x > 0 \;\;\text{(desired output } +1\text{)}, \qquad w_i^T x < 0 \;\;\text{(desired output } -1\text{)}$$

Adaline (Widrow-Hoff) Learning Rule (1962)

When the two classes are not linearly separable, it may be desirable to obtain a linear separator that minimizes the mean squared error.

Adaline (Adaptive Linear Element):
- uses a linear neuron model
- uses the Least-Mean-Square (LMS) learning algorithm
- is useful for robust linear classification and regression

To update the weights, we define a cost function E(w) which measures the system's performance error:

$$E(w) = \frac{1}{2} \sum_{k=1}^{p} \left(d^{(k)} - y^{(k)}\right)^2$$

The weights are adjusted by gradient descent, $\Delta w = -\eta \nabla_w E(w)$, which gives:

$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{k=1}^{p} \left(d^{(k)} - w^T x^{(k)}\right) x_j^{(k)}$$
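A sketch of the Widrow-Hoff (LMS) update in Python, in its per-sample form rather than the batch sum above (the learning constant, epoch count, and data format are illustrative):

    def train_adaline(samples, eta=0.05, epochs=50):
        # samples: list of (x, d) pairs, x including a constant 1 to absorb the bias
        w = [0.0] * len(samples[0][0])
        for _ in range(epochs):
            for x, d in samples:
                y = sum(wi * xi for wi, xi in zip(w, x))   # linear neuron: y = w^T x
                # LMS step: w <- w + eta * (d - y) * x
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        return w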

Back-Propagation

One of the most important historical developments in ANNs. It is applied to multilayer feedforward networks consisting of processing elements with continuous, differentiable activation functions.

Process:
- The input pattern is propagated from the input layer to the output layer, producing an actual output.
- The error signals resulting from the difference between dk and yk are back-propagated from the output layer to the previous layers, which update their weights accordingly.

Three-Layer Back-Propagation Network

[Figure: inputs $x_j$ feed the hidden units $z_q$ through weights $v_{qj}$; the hidden units feed the outputs $y_i$ through weights $w_{iq}$.]

$$net_q = \sum_{j=1}^{m} v_{qj} x_j, \qquad j = 1, 2, \ldots, m$$

$$z_q = a(net_q) = a\!\left(\sum_{j=1}^{m} v_{qj} x_j\right), \qquad q = 1, 2, \ldots, l$$

$$y_i = a(net_i) = a\!\left(\sum_{q=1}^{l} w_{iq} z_q\right) = a\!\left(\sum_{q=1}^{l} w_{iq}\, a\!\left(\sum_{j=1}^{m} v_{qj} x_j\right)\right), \qquad i = 1, 2, \ldots, n$$

These equations describe the forward propagation of input signals through the layers of neurons.

A cost function is defined as in the Adaline learning rule:

$$E(w) = \frac{1}{2} \sum_{i=1}^{n} (d_i - y_i)^2 = \frac{1}{2} \sum_{i=1}^{n} \left[d_i - a(net_i)\right]^2$$

Then, according to the gradient-descent method, the weights of the hidden-to-output connections are updated by:

$$\Delta w_{iq} = -\eta \frac{\partial E}{\partial w_{iq}}$$

which gives:

$$\Delta w_{iq} = \eta \left[d_i - y_i\right] a'(net_i)\, z_q = \eta\, \delta_{oi}\, z_q$$
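A sketch of this hidden-to-output update in Python, assuming the unipolar sigmoid as a(·), so that $a'(net_i) = y_i(1 - y_i)$ (the names and the learning constant are illustrative):

    def update_output_weights(w, z, d, y, eta=0.3):
        # w[i][q]: hidden-to-output weights; z: hidden outputs; d, y: desired and actual outputs
        for i in range(len(y)):
            # delta_oi = (d_i - y_i) * a'(net_i) = (d_i - y_i) * y_i * (1 - y_i) for the sigmoid
            delta = (d[i] - y[i]) * y[i] * (1.0 - y[i])
            for q in range(len(z)):
                w[i][q] += eta * delta * z[q]   # dw_iq = eta * delta_oi * z_q
        return w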
Hebb's Learning Law (1949)

According to the law, when an axonal input from neuron A causes neuron B to immediately emit a pulse (fire), and this happens repeatedly or persistently, then the ability of that axonal input to help neuron B fire in the future is somehow increased. Hebb's law also underlies a number of other learning rules.

According to the Hebbian hypothesis, the learning signal is:

$$r = a\!\left(w_i^T x\right) = y_i$$

where a(.) is the activation function of the neuron. In this equation, the learning signal r is simply set to the neuron's current output. The increment in the weights is then:

$$\Delta w_{ij} = \eta\, a\!\left(w_i^T x\right) x_j = \eta\, y_i\, x_j, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, m$$
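A sketch of the Hebbian update in Python (the sigmoid stands in for the activation a(.), and the learning constant is illustrative):

    import math

    def hebbian_update(w, x, eta=0.1):
        # Learning signal r = a(w^T x) = y_i: the neuron's own output
        y = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        # dw_ij = eta * y_i * x_j: strengthen weights where input and output are active together
        return [wi + eta * y * xi for wi, xi in zip(w, x)]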
Tips about important learning factors

- Wessels and Barnard (1992) suggest choosing the initial weights of a neuron i in the range $\left(-3/\sqrt{k_i},\; 3/\sqrt{k_i}\right)$, where $k_i$ is the number of inputs (fan-in) of the neuron (a sketch follows this list).
- The learning constant $\eta$ is usually chosen experimentally for each problem. A larger value can speed up convergence but might result in overshooting. Values ranging from $10^{-3}$ to 10 have been used successfully in many applications.
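A sketch of this initialization in Python, assuming the Wessels-Barnard range is read as $\pm 3/\sqrt{k_i}$ with $k_i$ the fan-in:

    import random

    def init_weights(fan_in):
        # Wessels-Barnard heuristic: uniform in (-3/sqrt(k_i), 3/sqrt(k_i))
        bound = 3.0 / fan_in ** 0.5
        return [random.uniform(-bound, bound) for _ in range(fan_in)]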

An example of back-propagation...

[Figure: a 2-3-2 network with inputs I1, I2, three hidden neurons, and outputs Q1, Q2.]

Input-to-hidden weights (i = 1, 2; j = 1, 2, 3):
  w11 = 0.2, w21 = 0.1
  w12 = 0.4, w22 = 0.2
  w13 = 0.7, w23 = 0.2

Inputs: I1 = 0.6, I2 = 0.7

Hidden-to-output weights (j = 1, 2, 3; k = 1, 2):
  Gw11 = 0.1, Gw21 = 0.3, Gw31 = 0.4
  Gw12 = 0.2, Gw22 = 0.5, Gw32 = 0.3

Activation function (sigmoid):

$$f_1 = \frac{1}{1 + e^{-Net_1}}$$
$$Net_j = \sum_{i=1}^{2} I_i w_{ij} + B_j \quad \text{(for } j = 1\text{)}$$

$$Net_1 = \sum_{i=1}^{2} I_i w_{i1} + B_{11} = I_1 w_{11} + I_2 w_{21} \quad (B_{11} = 0)$$

$$= 0.6 \times 0.2 + 0.7 \times 0.1 = 0.19$$

$$A_1 = \frac{1}{1 + e^{-0.19}} = 0.547$$

$$Net_1 = \sum_{j=1}^{3} A_j\, Gw_{j1} = A_1 Gw_{11} + A_2 Gw_{21} + A_3 Gw_{31}$$

$$= 0.547 \times 0.1 + 0.593 \times 0.3 + 0.636 \times 0.4 = 0.487$$

$$Q_1 = \frac{1}{1 + e^{-0.487}} = 0.619 \quad \text{(final output)}$$

Error of the first output:

$$E_1 = Q_{1,\text{desired}} - Q_{1,\text{actual}} = 0.5 - 0.619 = -0.119$$

Total cost (error), with the second output's error $E_2$ (magnitude 0.5449) computed in the same way:

$$E = \frac{1}{2}\left(E_1^2 + E_2^2\right) = \frac{1}{2}\left[(0.119)^2 + (0.5449)^2\right] = 0.1555$$
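A short Python check of this worked example, using the weights and inputs from the slides above (the value of E2 is taken from the slide rather than recomputed, since the second desired output is not stated):

    import math

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    I = [0.6, 0.7]
    W = [[0.2, 0.1], [0.4, 0.2], [0.7, 0.2]]     # input-to-hidden, one row per hidden neuron
    Gw = [[0.1, 0.3, 0.4], [0.2, 0.5, 0.3]]      # hidden-to-output, one row per output

    A = [sigmoid(sum(w * x for w, x in zip(row, I))) for row in W]   # ~[0.547, 0.593, 0.636]
    Q = [sigmoid(sum(g * a for g, a in zip(row, A))) for row in Gw]  # Q[0] ~ 0.619

    E1 = 0.5 - Q[0]                      # ~ -0.119 (desired output 0.5, as on the slide)
    E2 = 0.5449                          # taken from the slide for the second output
    total = 0.5 * (E1 ** 2 + E2 ** 2)    # ~ 0.1555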
