
University of Khartoum
Department of Electronics & Electrical Engineering
Software & Control Engineering

EEE52511: NEURAL NETWORKS & FUZZY SYSTEMS
By: Hiba Hassan Sayed
Lecture 2
17/4/2018 Hiba Hassan: U of K 2

Complexity of Human Neural System


• Biological information processing is robust and
fault-tolerant:
• Early on in life, we have our greatest number of
neurons, then daily thousands of them are lost.
Nevertheless, we continue to function for many
years without an associated decline in our
capabilities.
• Biological information processors are flexible:
• We do not need to be reprogrammed when we go
into a new environment; we adapt to the new
environment, i.e. we learn.

HNS Cont.
• Computer programs can handle fuzzy, probabilistic, noisy and inconsistent data the way we do only under specific circumstances:
• with highly sophisticated programming, and
• when the context of such data has been analyzed in detail.
• We have a native ability to handle uncertainty.
• The biological processing unit, the brain, is highly
parallel, small, compact and dissipates little power.
11/7/2017 Ustaza: Hiba Hassan 4

Neural Networks Approach


• How to formulate neural network solutions:
1) Understand and specify your problem in terms of
given inputs and desired outputs.
2) Take the simplest form of network you think might
be used to solve your problem, e.g. a simple
Perceptron.
3) Try to find appropriate connection weights
(including neuron thresholds) so that the network
produces the right outputs for each input in its
training data.

Cont.
4) Use different sets of data: the network is trained
on a set of training data, and its generalization
ability is tested on a separate set of testing data.
5) If the network doesn’t perform well enough, go
back to stage 3 and try harder.
6) If the network still doesn’t perform well enough,
go back to stage 2 and try harder.
7) If the network still doesn’t perform well enough,
go back to stage 1 and try harder.
8) Problem solved – move on to next problem.

Cont.
• There are two important aspects of the network’s
operation to consider:
• Learning: The network must learn decision
boundaries from a set of training patterns so that
these training patterns are classified correctly.
• Generalization: After training, the network must
also be able to generalize, i.e. correctly classify test
patterns it has never seen before.
• Usually we want the neural network to learn in a
way that produces good generalization.

Cont.
• Sometimes, the training data may contain
errors (e.g., noise in the experimental
determination of the input values, or
incorrect classifications).
• In this case, learning the training data
perfectly may make the generalization
worse. There is an important tradeoff
between learning and generalization that
arises quite often.

Neuron Models
• When the input is a vector, the individual element
inputs are multiplied (dot product) by weights and
the weighted values are fed to the summing
junction.
• Then the output y is given by:

• It has the following neuron model.
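
The computation described on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the usual form y = f(w · p + b); the function names `neuron_output` and `logsig` and the numeric values are hypothetical and not taken from the lecture.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function (one possible choice of f)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron_output(p, w, b, f):
    """Return f(w . p + b) for input vector p, weight vector w and bias b."""
    # Summing junction: each input element is multiplied by its weight,
    # the products are added, and the bias is added to the sum.
    n = sum(wi * pi for wi, pi in zip(w, p)) + b
    return f(n)

# Hypothetical values, chosen only for illustration.
y = neuron_output(p=[1.0, 2.0], w=[0.5, -0.25], b=0.1, f=logsig)
print(y)
```

Passing the transfer function `f` as an argument keeps the weighted sum separate from the choice of activation, which the later slides vary.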



Neuron model with vector input



A layer of neurons:

Multiple Layers Neurons



General FeedForward Artificial Neural Networks Architecture (FFANN)
• FeedForward ANNs allow signals to travel one way only, from input to output.
• FeedForward ANNs tend to be straightforward
networks that associate inputs with outputs. They
are extensively used in pattern recognition.
• The figure above shows the architecture of a Multi-Layer
FeedForward neural network of log-sigmoid neurons; it is a
counterpart to the Multi-Layer Perceptron (MLP) network.

Multiple Layers Neurons (cont.)


The above example has $R^1$ inputs, $S^1$ neurons in
the first layer, $S^2$ neurons in the second layer, etc. It
is common for different layers to have different
numbers of neurons. The output of the previous figure
is defined as follows:

$a^3 = f^3\left(LW^{3,2}\, f^2\left(LW^{2,1}\, f^1\left(IW^{1,1} p + b^1\right) + b^2\right) + b^3\right)$

The layer that produces the network output is the
output layer and all the middle layers are called
the hidden layers.
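
The nested expression for the three-layer output can be read as three successive layer evaluations. The sketch below is a minimal illustration in plain Python, assuming log-sigmoid neurons in every layer; the layer sizes and all weight and bias values are hypothetical.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))

def layer(W, b, x, f):
    """One layer: a = f(W x + b), with W given as a list of rows."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def forward(p, IW11, b1, LW21, b2, LW32, b3, f=logsig):
    """Three-layer feedforward pass, evaluated from the innermost term outward."""
    a1 = layer(IW11, b1, p, f)    # first (hidden) layer
    a2 = layer(LW21, b2, a1, f)   # second (hidden) layer
    a3 = layer(LW32, b3, a2, f)   # output layer
    return a3

# Hypothetical network: 2 inputs, then 3, 2 and 1 neurons.
IW11 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b1 = [0.0, 0.1, -0.1]
LW21 = [[0.2, -0.1, 0.3], [0.5, 0.4, -0.2]]
b2 = [0.05, -0.05]
LW32 = [[0.7, -0.6]]
b3 = [0.0]

a3 = forward([1.0, 2.0], IW11, b1, LW21, b2, LW32, b3)
print(a3)
```

Each weight matrix has one row per neuron in its layer and one column per input to that layer, which is why the layers may have different sizes.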

TRANSFER (ACTIVATION)
FUNCTIONS

1- Linear neurons
• These are simple but computationally limited.
• If we can make them learn, we may get insight
into more complicated neurons.

$y = b + \sum_i x_i w_i$

[Figure: output y plotted against the total input $b + \sum_i x_i w_i$]

2- Binary threshold neurons


• Introduced by McCulloch and Pitts (1943); also called
the hard-limiter transfer function.

[Figure: step function; the output is 0 below the threshold and 1 at or above it, plotted against the weighted input]



2- Binary threshold neurons (cont.)


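• There are two equivalent ways to write the
equations for a binary threshold neuron:

Using a threshold $\theta$: $z = \sum_i x_i w_i$, with $y = 1$ if $z \ge \theta$ and $y = 0$ otherwise.

Using a bias $b$, where $\theta = -b$: $z = b + \sum_i x_i w_i$, with $y = 1$ if $z \ge 0$ and $y = 0$ otherwise.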
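
A short check of the equivalence between the threshold form and the bias form can be sketched as follows. The function names and the example weights are hypothetical; the only fact used from the slide is the relation θ = -b.

```python
from itertools import product

def threshold_form(x, w, theta):
    """y = 1 if sum_i x_i w_i >= theta, else 0."""
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= theta else 0

def bias_form(x, w, b):
    """y = 1 if b + sum_i x_i w_i >= 0, else 0."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= 0 else 0

# Hypothetical neuron: two binary inputs, unit weights, threshold 2.
w, theta = [1, 1], 2
for x in product([0, 1], repeat=2):
    # The two forms agree when the bias is the negative of the threshold.
    print(x, threshold_form(x, w, theta), bias_form(x, w, -theta))
```

Moving the threshold into a bias term lets the bias be learned like any other weight, which is why the bias form is the one used on the following slides.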

3- Rectified Linear Neurons


(linear threshold neurons)
• They compute a linear weighted sum of their inputs.
• The output is a non-linear function of the total input.

$z = b + \sum_i x_i w_i$

$y = z$ if $z > 0$, and $y = 0$ otherwise.

[Figure: y plotted against z; y is 0 for z ≤ 0 and equal to z for z > 0]

4- Sigmoid neurons

• They give a real-valued output that is a smooth and bounded function of their total input.
• Typically they use the logistic function.
• They have positive derivatives, which make learning easy.

$z = b + \sum_i x_i w_i$, $\quad y = \dfrac{1}{1 + e^{-z}}$

[Figure: logistic curve; y rises from 0 to 1 and equals 0.5 at z = 0]

5- Softmax Transfer Function

• When we have several independent binary attributes by which to classify the data, we need to use a network with multiple logistic outputs.
• Then we have n output neurons, each one corresponding to one class, and the target values are 1 for the correct class, and 0 otherwise.

Cont.
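• With independent logistic outputs, each output neuron produces a value
between 0 and 1, for example 0.3, 0.7, 0.8, 0.9, …, so the outputs do not
sum to 1 and cannot be read as probabilities of mutually exclusive classes.
• To solve this problem, a generalization of the logistic
sigmoid was developed: the softmax activation function.
• The softmax function normalizes the outputs so that they sum to 1; when
one total input is much larger than the others, its output is close to 1
and the rest are close to 0.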

Softmax Transfer Function (cont.)

$y_i = \dfrac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$

• Where $z_i$ is the value of output node i.


• A suitable cost function to use with softmax is the
negative log probability of the right answer.
• This is called the cross-entropy cost function, and it is
given by

$C = -\sum_j t_j \log y_j$
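
The softmax output and the cross-entropy cost can be sketched together as below. This is a minimal illustration: the function names are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the slide (it leaves the result unchanged).

```python
import math

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j) over one group of output nodes."""
    m = max(z)  # assumption: shift by the maximum to avoid overflow
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(t, y):
    """C = -sum_j t_j log y_j; terms with t_j = 0 contribute nothing."""
    return -sum(tj * math.log(yj) for tj, yj in zip(t, y) if tj > 0)

# Hypothetical output-node values and a one-hot target for class 2.
z = [1.0, 2.0, 3.0]
t = [0, 0, 1]
y = softmax(z)
print(y, sum(y))
print(cross_entropy(t, y))
```

With a one-hot target, the cost reduces to the negative log of the output assigned to the correct class, which is the "negative log probability of the right answer" named on the slide.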

6. Radial Basis and Triangular Basis transfer functions:

Radial basis: $a(n) = \exp(-n^2)$

Triangular basis: $a(n) = 1 - |n|$ if $-1 \le n \le 1$, and $a(n) = 0$ otherwise.
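
Both transfer functions are one-liners in code. The sketch below borrows the names `radbas` and `tribas` from the MATLAB toolbox convention as an assumption; they are not defined in the slides.

```python
import math

def radbas(n):
    """Radial basis transfer function: a(n) = exp(-n^2)."""
    return math.exp(-n * n)

def tribas(n):
    """Triangular basis transfer function: 1 - |n| on [-1, 1], else 0."""
    return 1.0 - abs(n) if -1.0 <= n <= 1.0 else 0.0

for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(n, radbas(n), tribas(n))
```

Both functions peak at n = 0 with value 1 and fall off symmetrically, so the neuron responds most strongly when its net input is near zero.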

In-class Assignment
• A single-input neuron has a weight of 2.3
and a bias of -3. For an input of 2, calculate the
output produced by the following transfer
functions:
I. Hard limit
II. Linear
III. Log-sigmoid
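
One way to check the arithmetic for this assignment is the short script below. It is a sketch that assumes the hard limit outputs 1 when the net input is greater than or equal to 0, matching the binary threshold definition given earlier; the variable names are hypothetical.

```python
import math

w, b, p = 2.3, -3.0, 2.0

# Net input: n = w*p + b = 2.3*2 - 3, i.e. about 1.6
n = w * p + b

hard_limit = 1 if n >= 0 else 0            # I.   hard limit
linear = n                                 # II.  linear
log_sigmoid = 1.0 / (1.0 + math.exp(-n))   # III. log-sigmoid

print(n, hard_limit, linear, log_sigmoid)
```

All three answers come from the same net input; only the transfer function applied to it changes.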

Learning in Artificial Neural Network:


• Learning aims to improve the performance of a
neural network.
• The memorization of patterns and the
subsequent response of the network can be
categorized into two general paradigms:
• Associative mapping: the network produces a particular pattern in response to a pattern on the input units, and
• Regularity detection: the network discovers patterns from regularities in the units' inputs.

Definition of Learning in Artificial Neural Network:
• Learning in the context of neural networks is defined
as follows: learning is a process by which the free
parameters of a neural network are adapted through
a process of stimulation by the environment
in which the network is embedded. The type of
learning is determined by the manner in which the
parameter changes take place.

Learning Algorithm
• The learning algorithm is a prescribed set of well-defined
rules for the solution of a learning problem.
• In every learning algorithm, we must specify the
cost function.
• The cost function is a way of using the training
data to determine parameter values that make the
network's output function as accurate as
possible.
• The learning paradigm is a model of the
environment in which the neural network
operates.
• There are three major learning paradigms.

1- Supervised Learning
• A teacher is present during the learning process
& the desired output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference
between the network’s computed output and the
expected output.

2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system undergoes self learning by discovering
and adapting to the structural features in the input
patterns.
• The cost function is determined by the task
formulation.
• Most applications fall within the domain of
estimation problems such as statistical modeling,
compression, filtering, blind source separation and
clustering.

2- Unsupervised learning (cont.)


• Unsupervised or self-organised learning; the neural
network is presented with input data only; no target.
• It should discover significant features, or structure, in
the different input patterns.
• Thus they learn to classify the input data into
appropriate categories, or clusters.
• Unsupervised learning tends to follow the neuro-
biological organisation of the brain.

3- Reinforcement Learning
• There is a teacher, but no expected output is presented to the
network.
• The teacher helps only by indicating whether a computed
output is right or wrong.
• A reward is given for a right output and a penalty is
given for a wrong one.
• Data is usually not given, but generated by an
agent's interactions with the environment.

3- Reinforcement Learning (cont.)

• At each point in time, the agent performs an action,
and the environment generates an observation and
an instantaneous cost according to some dynamics.
• The aim is to discover a policy for selecting actions
that minimizes some measure of long-term cost, i.e.
the expected cumulative cost.
• Equivalently, the goal is to map situations to actions so as
to maximize a numerical reward signal (the reward being the negative of the cost).
• The environment's dynamics and the long-term cost
of each policy are usually unknown, but can be
estimated.

Cont.
• Tasks that fall within the paradigm of reinforcement
learning are control problems, games and other
sequential decision making tasks.

Two types of supervised learning


• Each training case consists of an input vector x
and a target output t.
• Regression: The target output is a real number or
a whole vector of real numbers.
• The price of a stock in 6 months' time.
• The temperature at noon tomorrow.
• Classification: The target output is a class label.
• The simplest case is a choice between 1 and 0.
• We can also have multiple alternative labels.

Supervised Learning Example


• Here is an example of a Regression supervised
learning problem.

Example
• "Given this data, a friend has a 750-square-foot house.
How much can they expect to get for it?"
There are different approaches that can be used
to solve this:
• A straight line through the data
• Maybe $150 000
• A second-order polynomial
• Maybe $200 000
• Each of these approaches represents a way of
doing supervised learning.
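
The straight-line approach can be sketched with an ordinary least-squares fit. The data below are made up for illustration (the slide's data set is a figure that is not reproduced here), and the function name `fit_line` is hypothetical; a second-order polynomial would be fitted the same way with one more coefficient.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: house size (square feet) and price ($1000s).
sizes = [500, 1000, 1500, 2000]
prices = [100, 200, 300, 400]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 750 + intercept  # predicted price for a 750 sq ft house
print(predicted)
```

The training pairs (size, price) play the role of the labelled examples; the fitted line is the learned model used to predict the price of a house that was not in the data.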

Cont.
• So, a training data set is provided in which the actual
price of each house is known.
• The algorithm uses this to learn to predict the prices
of houses in any other set of data.
• We call this a regression problem because
• it predicts a continuous-valued output (price), and
• the output is not restricted to a discrete set of values.

Example 2
• The following graph shows the number of times a
breast tumor is benign or malignant vs its tumor
size:

Example 2 (cont.)
• The graph shows that we have 5 tumors of each kind.
• We want a trained network that classifies a tumor as
benign or malignant.
• Can you estimate the diagnosis based on tumor size?
• This is an example of a classification problem:
• Classify data into one of two discrete classes,
malignant or not.
• In classification problems, the output may take one of
a discrete number of possible values,
e.g. 0 - benign, 1 - type 1, 2 - type 2, 3 - type 3.
• In classification problems we can plot data in different
ways.

Classification Example (cont.)


• Notice that only the size attribute was used there.
• There may be other attributes to be used such as
age.

Cont.
• Based on that data, you can try and define separate
classes by,
• Drawing a straight line between the two groups
• Using a more complex function to define the two
groups.
• Then, when you have an individual with a
specific tumor size and who is a specific age, you
can use that information to place them into one of
your classes
• You might have many features to consider
• Clump thickness, Uniformity of cell size, Uniformity
of cell shape…etc.

Supervised Learning
• A programmer specifies number of units in each layer
and connectivity between units, so the only unknown is
the set of weights associated with the connections.

Supervised Learning (Cont.)


Algorithm:
• Initialize the weights in the network (usually with
random values).
• Repeat until the stopping criterion is met:
  • For each example in the training set do:
    • O = neural network output
    • T = desired output (Teacher or Target)
    • Update the weights
Note: Each pass through all of the training examples
is called an epoch.
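
The loop structure of this algorithm can be sketched as a generic function. This is only an outline under stated assumptions: `train`, `predict` and `update` are hypothetical names, the stopping criterion is assumed to be "an epoch with no errors, or a maximum number of epochs", and the caller supplies the initial weights instead of drawing them at random.

```python
def train(examples, weights, predict, update, max_epochs=100):
    """Generic supervised training loop.

    examples: list of (input, target) pairs
    predict(weights, x) -> network output O
    update(weights, x, t, o) -> new weights
    Returns the final weights and the number of epochs run.
    """
    for epoch in range(1, max_epochs + 1):     # one epoch = one full pass
        total_error = 0.0
        for x, t in examples:
            o = predict(weights, x)            # O = neural network output
            weights = update(weights, x, t, o) # update using target T
            total_error += abs(t - o)
        if total_error == 0:                   # assumed stopping criterion
            return weights, epoch
    return weights, max_epochs

# Hypothetical one-weight threshold unit trained on a single example.
predict = lambda w, x: 1 if w * x >= 1 else 0
update = lambda w, x, t, o: w + (t - o) * x
print(train([(1, 1)], 0, predict, update))
```

Keeping `predict` and `update` as parameters reflects the slide's point that the architecture is fixed by the programmer and only the weights are learned; different learning rules plug into the same loop.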

Learning Rules:
• A learning rule, also known as a training algorithm, is defined as a procedure for modifying the weights and biases of a network.
• The learning rule is applied to train the network to perform some particular task.



Learning Rules
• These learning types may use different learning
rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even
further according to the rule used.

Perceptrons – the first NNs


• Perceptrons were the first neural networks, introduced in
the 1950s by Frank Rosenblatt along with other
researchers.
• The perceptron was developed to perform pattern recognition;
hence it is a classifier.
• It is a fast and reliable network.
• It can be single-layer or multi-layered.
• It has limited applications.

Perceptrons (cont.)
• It is made up of only input neurons and output neurons
• Input neurons, usually, have two states: ON and OFF
• A simple threshold activation function is used for the
output neurons.
• It uses supervised training
• Example:

Cont.
• Based on that simple example, now we can
develop the learning rule for a perceptron.
• The perceptron usually uses a hard-limit
activation function, as shown in the following
figures.

Perceptrons
One perceptron neuron

One Perceptron layer



A layer of Perceptrons

Multilayer Perceptron

Perceptron Learning Rule


• First, we define the perceptron error e:
$e = t - a$,
where t = target and a = output.
• Hence, we update the weights via the following
rule:
$w^{new} = w^{old} + e\,p = w^{old} + (t - a)\,p$
For the bias: $b^{new} = b^{old} + e$
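
A single application of this rule can be sketched as follows. The function names and the example values are hypothetical, and the hard limit is assumed to output 1 when the net input is greater than or equal to 0.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def perceptron_step(w, b, p, t):
    """Apply one perceptron update for input p with target t.

    Returns the new weights, the new bias and the error e = t - a.
    """
    a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)  # current output
    e = t - a                                              # perceptron error
    w_new = [wi + e * pi for wi, pi in zip(w, p)]          # w_new = w_old + e*p
    b_new = b + e                                          # b_new = b_old + e
    return w_new, b_new, e

# Hypothetical example: zero weights misclassify p = [1, 1] with target 0.
print(perceptron_step([0, 0], 0, [1, 1], 0))
```

When the output is already correct, e is 0 and the weights are left unchanged; otherwise the weight vector moves toward p (e = 1) or away from it (e = -1).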

In-class Assignment
• Train a network to sort oranges from apples based
on 3 features; shape, texture and weight. Prototype
oranges (p1) and apples (p2) are:

• Assume the following weights & bias:



Perceptron Learning Rule (Convergence Theorem)
• Perceptrons are trained on examples of desired
behavior. The desired behavior can be
summarized by a set of input/output pairs.

• In each pair, p is a network input and t is the corresponding
target. The objective is to reduce the error e = t - a
between the neuron response a and the target
vector t.

Cont.
• The perceptron learning rule (e.g. learnp in Matlab)
calculates desired changes to the perceptron's weights
and biases given an input vector p, and the associated
error e.
• The target vector t must contain values of either 0 or 1,
as perceptrons (with hardlim transfer functions) can only
output such values.
• Each time learnp is executed, the perceptron has a better
chance of producing the correct outputs, so running more
epochs brings it closer to the target values and hence to
convergence.

The Decision Boundary


• The decision boundary is a line in the input space
(vector space); on one side of the line, the network
output is 0 while on the other side, the network output
is 1.
• Decision boundary example: Suppose that we have
a 2-input perceptron with one neuron, as shown in the
next figure, and we want to calculate its decision
boundary.
• The decision boundary is determined by the input
vectors for which the net input n is zero:

Example

• We assume the following values for the


weights:

Example(cont.)
• Then,
n = p1 + p2 - 1 = 0
• Set p1 = 0: then p2 = 1.
• Set p2 = 0: then p1 = 1.
• Now, we can test one point to determine which side
of the boundary corresponds to a decision of 1.

• Consider the input $p = [2, 0]^T$.

The decision boundary, in blue, is orthogonal to
the weight vector, $_1w$. That means that our classes
are linearly separable.
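
The decision boundary example can be checked numerically. In this sketch the weights w = [1, 1] and bias b = -1 are read off from the boundary equation n = p1 + p2 - 1 = 0 given in the example; the function names are hypothetical, and the hard limit is assumed to output 1 when n is greater than or equal to 0.

```python
# Weights and bias inferred from n = p1 + p2 - 1.
w = [1.0, 1.0]
b = -1.0

def net_input(p):
    """n = w1*p1 + w2*p2 + b for a 2-input perceptron."""
    return w[0] * p[0] + w[1] * p[1] + b

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

# Intercepts of the boundary n = 0 with the two axes.
p2_intercept = -b / w[1]   # set p1 = 0
p1_intercept = -b / w[0]   # set p2 = 0

# Test point p = [2, 0]^T to see which side gives a decision of 1.
a = hardlim(net_input([2.0, 0.0]))
print(p1_intercept, p2_intercept, a)
```

The test point lies on the side of the boundary that the weight vector points toward, which is the side where the net input is positive.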

Decision Boundary (cont.)



Perceptron Implementation
• Orthogonal means that the weight vector makes a 90°
angle with the decision boundary.
• Example: implement an AND logic gate.
• Answer: It has the following input/target pairs:

Cont.
• First we need to select a decision boundary.
• Then, we choose a weight vector orthogonal to the
decision boundary.
• Then we choose any weight vector that lies along this
direction, for example:

• That leads us to this graph.



Perceptron Learning Rule (Summary)


1. Choose initial weights randomly.
2. Present a randomly chosen pattern x.
3. Update weights using the Delta rule:
$w_{ij}(t+1) = w_{ij}(t) + e_i x_j$, where $e_i = \text{target}_i - \text{output}_i$.
4. Repeat steps 2 and 3 until the stopping criterion
(convergence or a maximum number of iterations) is
reached.

Cont.
• The process of finding new weights (and biases)
can be repeated until there are no errors.
• Note that the perceptron learning rule is guaranteed
to converge in a finite number of steps for all
problems that can be solved by a perceptron.
• These include all classification problems that are
"linearly separable".
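
As an illustration of convergence on a linearly separable problem, the sketch below trains a single perceptron on the AND gate. It makes several assumptions for reproducibility that differ from the summary slide: weights start at zero instead of random values, patterns are presented in a fixed order instead of a random one, and the hard limit outputs 1 when the net input is greater than or equal to 0. The function names are hypothetical.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def output(w, b, p):
    """Perceptron output a = hardlim(w . p + b)."""
    return hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)

def train_perceptron(patterns, max_epochs=100):
    """Repeat the perceptron rule until an epoch produces no errors."""
    w, b = [0, 0], 0                    # assumption: zero initial weights
    for _ in range(max_epochs):
        errors = 0
        for p, t in patterns:
            e = t - output(w, b, p)     # e = target - output
            if e != 0:
                w = [wi + e * pi for wi, pi in zip(w, p)]
                b = b + e
                errors += 1
        if errors == 0:                 # converged: every pattern correct
            break
    return w, b

# Truth table of the AND gate as input/target pairs.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print(w, b, [output(w, b, p) for p, _ in AND])
```

The learned weights define one of many valid decision boundaries for AND; a different initialization or presentation order would generally stop at a different, equally correct line.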
