
Unit-I

Neuro Fuzzy and Genetic Programming

Dr. G. Paavai Anand
Fuzzy
• In fuzzy mathematics, fuzzy logic is a form of
many-valued logic in which the truth values of
variables may be any real number between 0
and 1 both inclusive. It is employed to handle
the concept of partial truth, where the truth
value may range between completely true and
completely false.
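
As a small, illustrative sketch (not from the slides): partial truth can be represented directly in code, with truth values in [0, 1] and the common min/max operators for fuzzy AND/OR. The "tall" membership function and its cut-off points below are made-up examples.

```python
# Illustrative sketch only: fuzzy truth values are real numbers in [0, 1].
# The "tall" membership function and its cut-offs are made-up examples.

def tall(height_cm):
    """Degree to which a height is 'tall' (linear ramp from 160 cm to 190 cm)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

def fuzzy_and(a, b):   # common min-based conjunction
    return min(a, b)

def fuzzy_or(a, b):    # common max-based disjunction
    return max(a, b)

print(tall(175))                   # 0.5 -> partially true
print(fuzzy_and(tall(175), 0.9))   # 0.5
print(fuzzy_or(tall(175), 0.9))    # 0.9
```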
Neural networks
• A neural network is a network or circuit
of neurons, or in a modern sense, an artificial
neural network, composed of artificial
neurons or nodes
Neuro Fuzzy
• A neuro-fuzzy system is a fuzzy system that
uses a learning algorithm derived from or
inspired by neural network theory to
determine its parameters (fuzzy sets and fuzzy
rules) by processing data samples.
Genetic Programming
• Genetic programming is a domain-
independent method that genetically breeds a
population of computer programs to solve a
problem.
• Specifically, genetic programming iteratively
transforms a population of computer
programs into a new generation of programs
by applying analogs of naturally
occurring genetic operations.
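
To make the idea of "analogs of genetic operations" concrete, here is a toy sketch of subtree crossover on programs represented as expression trees. The tuple encoding and helper functions are illustrative assumptions, not a complete genetic programming system.

```python
import random

# Toy sketch of one genetic operation: subtree crossover on expression trees.
# A program is either a leaf ("x" or a number) or a tuple (op, left, right).

def all_paths(tree, path=()):
    """Paths (tuples of child indices) of every subtree in the expression."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(all_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(parts[path[0]], path[1:], new)
    return tuple(parts)

def crossover(parent1, parent2, rng):
    """Create a child by grafting a random subtree of parent2 into parent1."""
    cut = rng.choice(all_paths(parent1))
    donor = get_subtree(parent2, rng.choice(all_paths(parent2)))
    return replace_subtree(parent1, cut, donor)

parent_a = ("+", ("*", "x", "x"), 1)   # represents x*x + 1
parent_b = ("*", ("+", "x", 2), "x")   # represents (x + 2) * x
print(crossover(parent_a, parent_b, random.Random(0)))
```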
History of neural networks
• 1943: McCulloch and Pitts model neural networks
based on their understanding of neurology.
– Neurons embed simple logic functions:
• a or b
• a and b
• 1950s:
– Farley and Clark
• IBM group that tried to model biological behavior
• Consulted neuro-scientists at McGill whenever stuck
– Rochester, Holland, Haibt and Duda
History
• Perceptron (Rosenblatt 1958)
– Three layer system:
• Input nodes
• Output node
• Association layer
– Can learn to connect or associate a given input to a random
output unit
– In 1959, Bernard Widrow and Marcian Hoff of Stanford
developed models they called ADALINE and MADALINE. These models
were named for their use of Multiple ADAptive LINear Elements.
– MADALINE was the first neural network to be applied to a real-world
problem: an adaptive filter which eliminates echoes on phone lines.
This neural network is still in commercial use.
• Minsky and Papert
– Showed that a single-layer perceptron cannot learn the XOR of
two binary inputs
– Led to a loss of interest (and funding) in the field
History
• Perceptron (Rosenblatt 1958)
– Association units A1, A2, … extract features from user input
– Output is weighted and associated
– Function fires if weighted sum of input exceeds a
threshold.
History
• Back-propagation learning method (Werbos 1974)
– Three layers of neurons
• Input, Output, Hidden
– Better learning rule for generic three layer networks
– Regenerated interest in the 1980s
• Successful applications in medicine, marketing, risk
management, … (1990s)
• In need of another breakthrough.
The Brain vs. Computer

Brain                          Computer
1. 10 billion neurons          1. Faster switching than a neuron
2. 60 trillion synapses           (10^-9 sec vs. ~10^-3 sec for a neuron)
3. Distributed processing      3. Central processing
4. Nonlinear processing        4. Arithmetic operation (linearity)
5. Parallel processing         5. Sequential processing
Why Artificial Neural Networks?
There are two basic reasons why we are interested in
building artificial neural networks (ANNs):

• Technical viewpoint: Some problems such as
character recognition or the prediction of future
states of a system require massively parallel and
adaptive processing.

• Biological viewpoint: ANNs can be used to
replicate and simulate components of the human
(or animal) brain, thereby giving us insight into
natural information processing.
Artificial Neural Networks
• The “building blocks” of neural networks are the
neurons.
• In technical systems, we also refer to them as units or nodes.
• Basically, each neuron
 receives input from many other neurons.
 changes its internal state (activation) based on the current
input.
 sends one output signal to many other neurons, possibly
including its input neurons (recurrent network).
Artificial Neural Networks
• Information is transmitted as a series of electric
impulses, so-called spikes.

• The frequency and phase of these spikes encode the
information.

• In biological systems, one neuron can be connected to as
many as 10,000 other neurons.

• Usually, a neuron receives its information from other
neurons in a confined area, its so-called receptive field.
How do ANNs work?
 An artificial neural network (ANN) is either a hardware
implementation or a computer program which strives to
simulate the information processing capabilities of its biological
exemplar. ANNs are typically composed of a great number of
interconnected artificial neurons. The artificial neurons are
simplified models of their biological counterparts.
 ANN is a technique for solving problems by constructing software
that works like our brains.
How do our brains work?
▪ The brain is a massively parallel information processing system.
▪ Our brains are a huge network of processing elements. A typical brain contains a
network of 10 billion neurons.
How do our brains work?
▪ A processing element

Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output
How do our brains work?
▪ A processing element

A neuron is connected to other neurons through about 10,000
synapses.
How do our brains work?
▪ A processing element

A neuron receives input from other neurons. Inputs are combined.


How do our brains work?
▪ A processing element

Once input exceeds a critical level, the neuron discharges a spike:
an electrical pulse that travels from the body, down the axon, to
the next neuron(s).
How do our brains work?
▪ A processing element

The axon endings almost touch the dendrites or cell body of the
next neuron.
How do our brains work?
▪ A processing element

Transmission of an electrical signal from one neuron to the next is
effected by neurotransmitters.
How do our brains work?
▪ A processing element

Neurotransmitters are chemicals which are released from the first neuron
and which bind to the second.
How do our brains work?
▪ A processing element

This link is called a synapse. The strength of the signal that
reaches the next neuron depends on factors such as the amount of
neurotransmitter available.
How do ANNs work?

An artificial neuron is an imitation of a human neuron


How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
How do ANNs work?
Inputs: x1, x2, …, xm

Processing: summation
y = x1 + x2 + … + xm

Output: y
How do ANNs work?
Not all inputs are equal
Inputs: x1, x2, …, xm
Weights: w1, w2, …, wm

Processing: weighted summation
y = x1·w1 + x2·w2 + … + xm·wm

Output: y
How do ANNs work?
The signal is not passed down to the
next neuron verbatim
Inputs: x1, x2, …, xm
Weights: w1, w2, …, wm

Processing: weighted summation, then a transfer function
(activation function) f(vk)

Output: y = f(x1·w1 + x2·w2 + … + xm·wm)

The output is a function of the inputs, the weights, and the
transfer function.
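
A minimal sketch of this neuron model, assuming a logistic transfer function (one common choice): the inputs are multiplied by the weights, summed, and squashed.

```python
import math

# Minimal sketch of the artificial neuron described above:
# a weighted sum of the inputs passed through a transfer (activation) function.
# The logistic function used here is one common choice, assumed for illustration.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, transfer=sigmoid):
    v = sum(x * w for x, w in zip(inputs, weights))   # v = x1*w1 + ... + xm*wm
    return transfer(v)                                # y = f(v)

print(neuron_output([1.0, 0.5, -1.0], [0.4, 0.3, 0.2]))
```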
Artificial Neural Networks
 An ANN can:
1. compute any computable function, by the appropriate
selection of the network topology and weights values.
2. learn from experience!
▪ Specifically, by trial‐and‐error
From Biological Neuron to
Artificial Neuron

Dendrite → Cell Body → Axon
(corresponding to input → processing → output in the artificial neuron)

From Biology to
Artificial Neural Networks
Three types of layers: Input, Hidden, and
Output
Types of Layers
• The input layer.
– Introduces input values into the network.
– No activation function or other processing.
• The hidden layer(s).
– Perform classification of features
– Two hidden layers are sufficient to solve any problem
– Features imply more layers may be better
• The output layer.
– Functionally just like the hidden layers
– Outputs are passed on to the world outside the neural
network.
Perceptron Training - Threshold

1 if  wixi >t
output=
{ i=0
0 otherwise

 Linear threshold is used.


 W - weight value
 t - threshold value

NN - Bias
• Bias is a constant input that helps the model fit the
given data by shifting the activation threshold.
• In other words, the bias gives the neuron extra freedom;
it is learned along with the other weights, as the weight
on a constant input.

1 if  wixi >t
AND with a Biased input
{
output= i=0
0 otherwise
-1
W1 = 1.5

X W2 = 1 t = 0.0

W3 = 1
36 Y
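
A quick check of this biased AND unit (bias input -1 with weight 1.5, inputs X and Y each with weight 1, threshold 0); a short illustrative sketch:

```python
# Quick check of the biased AND unit above: bias input -1 with weight 1.5,
# inputs X and Y each with weight 1, threshold t = 0.

def and_unit(x, y, t=0.0):
    weighted_sum = (-1) * 1.5 + x * 1 + y * 1
    return 1 if weighted_sum > t else 0

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", and_unit(x, y))   # only (1, 1) fires
```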
Activation functions
• Transforms neuron’s input into output.
• Features of activation functions:
• A squashing effect is required
• Prevents accelerating growth of activation
levels through the network.
• Simple and easy to calculate

Standard activation functions

• The hard-limiting threshold function


– Corresponds to the biological paradigm
• either fires or not
• Sigmoid functions ('S'-shaped curves)
– The logistic function f(x) = 1 / (1 + e^(-ax))
– The hyperbolic tangent (symmetrical)
– Both functions have a simple differential
– Only the shape is important
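
A small sketch of the two squashing functions above, including the simple differential of the logistic; the slope parameter a is as in the formula.

```python
import math

# The two standard squashing activation functions mentioned above.

def logistic(x, a=1.0):
    """f(x) = 1 / (1 + e^(-a*x)); output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_derivative(x, a=1.0):
    """Simple differential: f'(x) = a * f(x) * (1 - f(x))."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

def tanh(x):
    """Hyperbolic tangent: symmetrical, output in (-1, 1)."""
    return math.tanh(x)

print(logistic(0.0), logistic_derivative(0.0), tanh(0.0))   # 0.5 0.25 0.0
```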
Parameter setting

• Number of layers
• Number of neurons
• too many neurons require more training time
• Learning rate
• from experience, value should be small ~0.1
• Momentum term
• ..

Over-fitting

• With sufficient nodes, a network can classify any
training set exactly
• May have poor generalisation ability.
• Cross-validation with some patterns
– Typically 30% of training patterns
– Validation set error is checked each epoch
– Stop training if validation error goes up
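
A schematic sketch of this early-stopping idea. The training and evaluation routines are passed in as plain functions because the slides do not specify a particular model; the toy usage at the bottom is purely illustrative.

```python
# Schematic sketch of early stopping with a held-out validation set.
# `train_one_epoch` and `validation_error` are passed in as functions,
# since no particular model or library is specified here.

def train_with_early_stopping(model, data, train_one_epoch, validation_error):
    split = int(0.7 * len(data))              # hold out ~30% of the patterns
    train_part, val_part = data[:split], data[split:]
    best_error = float("inf")
    while True:
        train_one_epoch(model, train_part)
        err = validation_error(model, val_part)
        if err >= best_error:                 # validation error went up: stop
            return model
        best_error = err

# Purely illustrative usage: the "model" is one number nudged toward 3.0.
model = {"w": 0.0}
data = [3.0] * 10
train_with_early_stopping(
    model, data,
    train_one_epoch=lambda m, d: m.update(w=m["w"] + 0.5),
    validation_error=lambda m, d: abs(sum(d) / len(d) - m["w"]),
)
print(model)
```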

Training time

• How many epochs of training?


– Stop if the error fails to improve (has reached a
minimum)
– Stop if the rate of improvement drops below a
certain level
– Stop if the error reaches an acceptable level
– Stop when a certain number of epochs have
passed

Learning algorithm
While epoch produces an error
    Present network with next inputs from epoch
    Error = T – O
    If Error <> 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While

Learning algorithm
Epoch : Presentation of the entire training set to the neural
network.
In the case of the AND function an epoch consists
of four sets of inputs being presented to the
network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value
output by the network differs from the target
value. For example, if we required the network to
output 0 and it output a 1, then Error = -1

Learning algorithm
Target Value, T : When we are training a network we not
only present it with the input but also with a value
that we require the network to produce. For
example, if we present the network with [1,1] for
the AND function the target value will be 1
Output , O : The output value from the neuron
Ij : Inputs being presented to the neuron
Wj : Weight from input neuron (Ij) to the output neuron
LR : The learning rate. This dictates how quickly the
network converges. It is set by a matter of
experimentation. It is typically 0.1
Training Perceptrons
For AND

Network: bias input of -1 with weight W1, input x with weight W2,
input y with weight W3, threshold t = 0.0

A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

• What are the weight values?
• Initialize with random weight values
Training Perceptrons
For AND

Network: bias input of -1 with weight W1 = 0.3, input x with
weight W2 = 0.5, input y with weight W3 = -0.4, threshold t = 0.0

A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

I1 I2 I3 Summation Output
-1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0
-1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0
-1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1
-1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0
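
Putting the learning rule and this worked example together, a runnable sketch: the initial weights (0.3, 0.5, -0.4), learning rate 0.1, bias input of -1 and threshold 0 are taken from the slides; the loop simply applies Wj = Wj + LR * Ij * Error until an epoch produces no errors.

```python
# Runnable sketch of the learning algorithm above, applied to AND.
# Initial weights (0.3, 0.5, -0.4), bias input -1, threshold t = 0.0
# and learning rate LR = 0.1 follow the slides.

training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # ((A, B), T)
W = [0.3, 0.5, -0.4]   # W1 (bias weight), W2, W3
LR = 0.1
t = 0.0

def fire(a, b):
    I = [-1, a, b]                                  # I1 is the bias input
    total = sum(w * i for w, i in zip(W, I))
    return (1 if total > t else 0), I

error_in_epoch = True
while error_in_epoch:                               # one epoch = one pass over the set
    error_in_epoch = False
    for (a, b), T in training_set:
        O, I = fire(a, b)
        error = T - O
        if error != 0:
            error_in_epoch = True
            for j in range(len(W)):
                W[j] += LR * I[j] * error           # Wj = Wj + LR * Ij * Error

print("learned weights:", W)
for (a, b), T in training_set:
    print((a, b), "->", fire(a, b)[0], "target", T)
```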

Learning in Neural Networks
 Learn values of weights from I/O pairs
 Start with random weights
 Load training example’s input
 Observe computed output
 Modify weights to reduce difference
 Iterate over all training examples
 Terminate when weights stop changing OR when error is
very small

Decision boundaries

• In simple cases, divide feature space by
drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values
on opposite sides of the boundary (a straight line, in two dimensions).
• Problems which can be thus classified are
linearly separable.

Decision Surface of a Perceptron
(Figure: left, a set of + and - points in the (x1, x2) plane that a
single straight line can separate; right, a set where + and - points
alternate across the plane, which no single line can separate.)

Linearly separable                Non-linearly separable

• Perceptron is able to represent some useful functions


• AND(x1,x2) choose weights w0=-1.5, w1=1, w2=1
• But functions that are not linearly separable (e.g. XOR)
are not representable

Linear Separability

(Figure: two classes, A and B, plotted in the (X1, X2) plane and
separated by a straight-line decision boundary.)
Rugby players & Ballet dancers

(Figure: scatter plot of height (m, roughly 1 to 2) against weight
(kg, roughly 50 to 120); rugby players cluster at high weight and
height, ballet dancers at low weight, so a line can separate the two
groups.)
Hyperplane partitions

• A single perceptron (i.e. output unit) with
connections from each input can perform,
and learn, a linear separation.
• Perceptrons have a step-function activation.

Hyperplane partitions

• An extra layer models a convex hull


– “An area with no dents in it”
– A perceptron network (step activation) can model this, but cannot learn it
– With sigmoid activations, such convex hulls can be learned
– Two layers add convex hulls together
– Sufficient to classify anything “sane”.
• In theory, further layers add nothing
• In practice, extra layers may be better

Different Non-Linearly
Separable Problems
Structure     | Types of Decision Regions
Single-Layer  | Half plane bounded by a hyperplane
Two-Layer     | Convex open or closed regions
Three-Layer   | Arbitrary (complexity limited by no. of nodes)

(The original table also sketches, for each structure, how it handles
the exclusive-OR problem, classes with meshed regions, and the most
general region shapes it can form.)
Multilayer Perceptron (MLP)
(Figure: input signals (external stimuli) enter the input layer, pass
through adjustable weights to the output layer, which produces the
output values.)
Solving the XOR Problem
Network topology: 2 hidden nodes (o1, o2) and 1 output (y).
Inputs x1, x2 feed both hidden nodes (weights w11, w21, w12, w22);
the hidden outputs feed the output node (weights w13, w23); each node
also has a bias input of -1 with weights w01, w02, w03.

Desired behavior:
x1 x2 | o1 o2 | y
0  0  | 0  0  | 0
1  0  | 0  1  | 1
0  1  | 0  1  | 1
1  1  | 1  1  | 0

Weights:
w11 = w12 = 1
w21 = w22 = 1
w01 = 3/2; w02 = 1/2; w03 = 1/2
w13 = -1; w23 = 1
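
A quick sketch that plugs these weights into step-threshold units (the bias weights w01, w02, w03 act as thresholds via the -1 inputs) and reproduces the desired-behavior table.

```python
# Check of the network above: plug the given weights into step units.
w11 = w12 = 1.0
w21 = w22 = 1.0
w01, w02, w03 = 1.5, 0.5, 0.5
w13, w23 = -1.0, 1.0

def step(v):
    return 1 if v > 0 else 0

print("x1 x2 o1 o2 y")
for x1 in (0, 1):
    for x2 in (0, 1):
        o1 = step(w11 * x1 + w21 * x2 - w01)   # acts like AND
        o2 = step(w12 * x1 + w22 * x2 - w02)   # acts like OR
        y = step(w13 * o1 + w23 * o2 - w03)    # "OR but not AND" = XOR
        print(x1, x2, o1, o2, y)
```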
How it works?
 Set initial values of the weights randomly.
 Input: truth table of the XOR
 Do
▪ Read input (e.g. 0, and 0)
▪ Compute an output (e.g. 0.60543)
▪ Compare it to the expected output. (Diff= 0.60543)
▪ Modify the weights accordingly.
 Loop until a condition is met
▪ Condition: certain number of iterations
▪ Condition: error threshold
Design Issues
 Initial weights (small random values ∈[‐1,1])
 Transfer function (How the inputs and the weights are
combined to produce output?)
 Error estimation
 Weights adjusting
 Number of neurons
 Data representation
 Size of training set
Transfer Functions
 Linear: The output is proportional to the total
weighted input.
 Threshold: The output is set at one of two values,
depending on whether the total weighted input is
greater than or less than some threshold value.
 Non‐linear: The output varies continuously but not
linearly as the input changes.
Error Estimation
 The root mean square error (RMSE) is a frequently-
used measure of the differences between values
predicted by a model or an estimator and the values
actually observed from the thing being modeled or
estimated
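
In symbols, RMSE = sqrt( (1/n) Σ (predicted_i − observed_i)² ); a minimal sketch:

```python
import math

# Root mean square error between predicted and observed values.
def rmse(predicted, observed):
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    )

print(rmse([0.9, 0.2, 0.8], [1, 0, 1]))   # close predictions -> small RMSE
```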
Weights Adjusting
 After each iteration, weights should be adjusted to
minimize the error.
– Exhaustively searching all possible weights
– Back propagation
Architecture
Feedforward Network
Feedforward networks often have one or more hidden layers of sigmoid neurons followed
by an output layer of linear neurons.
Multiple layers of neurons with nonlinear transfer functions allow the network to learn
nonlinear and linear relationships between input and output vectors.
The linear output layer lets the network produce values outside the range -1 to +1. On the
other hand, if you want to constrain the outputs of a network (such as between 0 and 1),
then the output layer should use a sigmoid transfer function (such as logsig).
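
A minimal sketch of this architecture, with a sigmoid ("logsig"-style) hidden layer followed by a linear output layer; the weights below are random, only to illustrate the forward pass, not a trained network.

```python
import numpy as np

# Minimal forward pass for the architecture described above:
# one sigmoid ("logsig"-style) hidden layer, then a linear output layer.
# The weights are random, only to illustrate the computation.

rng = np.random.default_rng(0)

def logsig(v):
    return 1.0 / (1.0 + np.exp(-v))

n_in, n_hidden, n_out = 3, 4, 2
W1, b1 = rng.normal(size=(n_hidden, n_in)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), rng.normal(size=n_out)

def forward(x):
    hidden = logsig(W1 @ x + b1)   # hidden outputs squashed into (0, 1)
    return W2 @ hidden + b2        # linear output: values may fall outside [-1, +1]

print(forward(np.array([0.5, -1.0, 2.0])))
```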
Difference between Hebb rule and
perceptron learning rule?
• In a perceptron, no connection weights are modified when the
network responds correctly, whereas in Hebb learning the weights
are modified for every input, whether or not the response is
correct.
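
A side-by-side sketch of the two update rules for a single unit, illustrating this difference; the learning rate and variable names are illustrative.

```python
# Side-by-side sketch of the two update rules for a single unit.
# w = weights, x = input vector, t = target, o = actual output, lr = learning rate.

def hebb_update(w, x, t, lr=0.1):
    # Hebb rule: weights change for every presented input,
    # even when the network already responds correctly.
    return [wj + lr * xj * t for wj, xj in zip(w, x)]

def perceptron_update(w, x, t, o, lr=0.1):
    # Perceptron rule: weights change only when the output is wrong.
    return [wj + lr * xj * (t - o) for wj, xj in zip(w, x)]

w = [0.2, -0.1]
print(hebb_update(w, [1, 1], t=1))             # always updates
print(perceptron_update(w, [1, 1], t=1, o=1))  # correct response -> unchanged
```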
Ch2: Adaline and Madaline
Adaline : Adaptive Linear neuron
Madaline : Multiple Adaline
2.1 Adaline (Bernard Widrow, Stanford Univ.)
Neuron:

Neuron model:

y = ( wT x )

Adaline: Neuron model with linear active function


( x ) = x  y = (wT x ) = wT x
106
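
A small sketch of Widrow-Hoff (LMS) training for an Adaline, using the linear output y = wᵀx and the squared-error gradient; the data and learning rate below are illustrative.

```python
import numpy as np

# Sketch of Widrow-Hoff (LMS) training for an Adaline: linear output y = w.x,
# weights nudged along the negative gradient of the squared error.
# The data (targets d = 1 + 2*x) and learning rate are illustrative.

rng = np.random.default_rng(1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column = bias input
d = np.array([1.0, 3.0, 5.0, 7.0])

w = rng.normal(size=2)
lr = 0.05
for _ in range(200):
    for x, target in zip(X, d):
        y = w @ x                      # linear activation
        w += lr * (target - y) * x     # LMS / delta rule

print(w)   # approaches [1, 2]
```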
2.4 Madaline: Many Adalines

○ XOR function
This problem cannot be solved by a single Adaline.

Reason: w1·x1 + w2·x2 = θ specifies a line in the (x1, x2) plane,
and no single line can separate the two XOR classes.
The two neurons in the hidden layer provide two
lines that separate the plane into three regions.
The two regions containing (0,0) and (1,1) are
associated with the network output of 0. The central
region is associated with the network output of 1.
