Lect 2

University of Khartoum
Department of Electronics & Electrical Engineering
Software & Control Engineering
EEE52511: NEURAL NETWORKS & FUZZY SYSTEMS
By: Hiba Hassan Sayed
Lecture 2
17/4/2018 Hiba Hassan: U of K 2
Complexity of Human Neural System

• Biological information processing is robust and
fault-tolerant:
• Early on in life, we have our greatest number of
neurons, then daily thousands of them are lost.
Nevertheless, we continue to function for many
years without an associated decline in our
capabilities.
• Biological information processors are flexible:
• We do not need to be reprogrammed when we go
into a new environment; we adapt to the new
environment, i.e. we learn.
HNS Cont.
• Computer programs can handle fuzzy, probabilistic, noisy and
inconsistent data the way we do only under specific circumstances:
• with highly sophisticated programming, and
• when the context of such data has been analyzed in detail.
• By contrast, we have a native ability to handle uncertainty.
• The biological processing unit, the brain, is highly
parallel, small, compact and dissipates little power.
11/7/2017 Ustaza: Hiba Hassan 4
Neural Networks Approach

• How to formulate neural network solutions:
1) Understand and specify your problem in terms of
given inputs and desired outputs.
2) Take the simplest form of network you think might
be used to solve your problem, e.g. a simple
Perceptron.
3) Try to find appropriate connection weights
(including neuron thresholds) so that the network
produces the right outputs for each input in its
training data.
Cont.
4) Use different sets of data: the network is trained
on a set of training data, and its generalization
ability is tested on new test data.
5) If the network doesn’t perform well enough, go
back to stage 3 and try harder.
6) If the network still doesn’t perform well enough,
go back to stage 2 and try harder.
7) If the network still doesn’t perform well enough,
go back to stage 1 and try harder.
8) Problem solved – move on to next problem.
Cont.
• There are two important aspects of the network’s
operation to consider:
• Learning: The network must learn decision
boundaries from a set of training patterns so that
these training patterns are classified correctly.
• Generalization: After training, the network must
also be able to generalize, i.e. correctly classify test
patterns it has never seen before.
• Usually we want the neural network to learn in a
way that produces good generalization.
Cont.
• Sometimes, the training data may contain
errors (e.g., noise in the experimental
determination of the input values, or
incorrect classifications).
• In this case, learning the training data
perfectly may make the generalization
worse. There is an important tradeoff
between learning and generalization that
arises quite often.
Neuron Models
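• When the input is a vector, each input element is
multiplied by its weight and the weighted values are fed
to the summing junction (together, a dot product of the
input and weight vectors).
• The output y is then obtained by passing this sum, plus
the bias b, through the transfer function f:
$$y = f\Big(b + \sum_i x_i w_i\Big)$$
• It has the following neuron model.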

Neuron model with vector input
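
As a minimal sketch of the vector-input neuron described above, the weighted sum and transfer function can be written in a few lines of Python. The names `neuron_output` and `logsig`, the bias term, and the sample numbers are illustrative assumptions, not values from the lecture.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron_output(x, w, b, f):
    """Single neuron: dot product of inputs and weights, plus bias, then f."""
    n = sum(xi * wi for xi, wi in zip(x, w)) + b  # summing junction
    return f(n)

# Illustrative values only (chosen so that the net input is exactly 0)
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.75
y = neuron_output(x, w, b, logsig)
```

Any other transfer function can be passed in place of `logsig`, which is why `f` is kept as a parameter here.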

A layer of neurons:
Multiple Layers Neurons

General FeedForward Artificial Neural Networks Architecture (FFANN)
• FeedForward ANNs allow signals to travel one way
only, from input to output.
• FeedForward ANNs tend to be straightforward
networks that associate inputs with outputs. They
are extensively used in pattern recognition.
• The figure above shows the architecture of a multi-layer
feedforward neural network of log-sigmoid neurons; it is
a counterpart to the multi-layer perceptron (MLP) network.
Multiple Layers Neurons (cont.)

The above example has R1 inputs, S1 neurons in
the first layer, S2 neurons in the second layer, etc. It
is common for different layers to have different
numbers of neurons. The output of previous figure
is defined as follows :
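$$\mathbf{a}^3 = f^3\big(\mathbf{LW}^{3,2}\, f^2\big(\mathbf{LW}^{2,1}\, f^1\big(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1\big) + \mathbf{b}^2\big) + \mathbf{b}^3\big)$$
The layer that produces the network output is the
output layer, and all the middle layers are called
the hidden layers.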
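
The nested formula for the three-layer output can be sketched as three successive layer evaluations. This is only an illustration: the helper names `layer` and `forward`, the layer sizes, and all weight values are made up, and log-sigmoid is assumed for every layer.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))

def layer(W, b, p, f):
    """One layer: a = f(W p + b), with W given as a list of rows."""
    return [f(sum(wij * pj for wij, pj in zip(row, p)) + bi)
            for row, bi in zip(W, b)]

def forward(p, IW11, b1, LW21, b2, LW32, b3, f=logsig):
    """a3 = f(LW32 f(LW21 f(IW11 p + b1) + b2) + b3)."""
    a1 = layer(IW11, b1, p, f)   # first (hidden) layer
    a2 = layer(LW21, b2, a1, f)  # second (hidden) layer
    a3 = layer(LW32, b3, a2, f)  # output layer
    return a3

# Hypothetical sizes: 2 inputs, S1 = 3, S2 = 2, S3 = 1; made-up weights
p = [1.0, -1.0]
IW11 = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.5]]
b1 = [0.0, 0.1, -0.1]
LW21 = [[0.3, -0.2, 0.1], [0.0, 0.4, -0.4]]
b2 = [0.05, -0.05]
LW32 = [[1.0, -1.0]]
b3 = [0.0]
a3 = forward(p, IW11, b1, LW21, b2, LW32, b3)
```

Each layer's output becomes the next layer's input, which is exactly the nesting in the formula.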
TRANSFER (ACTIVATION)
FUNCTIONS
1- Linear neurons
• These are simple but computationally limited
• If we can make them learn we may get insight
into more complicated neurons.
$$y = b + \sum_i x_i w_i$$
[Figure: output y plotted against the total input b + Σ_i x_i w_i]
2- Binary threshold neurons

• Developed by McCulloch and Pitts (1943); also called
the hard-limit transfer function.
[Figure: output (0 or 1) versus weighted input; the output steps from 0 to 1 at the threshold]

2- Binary threshold neurons (cont.)

• There are two equivalent ways to write the
equations for a binary threshold neuron:
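$$z = \sum_i x_i w_i, \qquad y = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
$$z = b + \sum_i x_i w_i, \qquad y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
The two forms agree when $\theta = -b$.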
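
A small sketch can confirm that the threshold form and the bias form give the same outputs when the bias is the negative of the threshold. The function names, weights, and threshold below are illustrative assumptions.

```python
def threshold_neuron_theta(x, w, theta):
    """Form 1: compare the weighted sum with a threshold theta."""
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= theta else 0

def threshold_neuron_bias(x, w, b):
    """Form 2: add a bias b to the weighted sum and compare with zero."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= 0 else 0

w = [1.0, 1.0]   # illustrative weights
theta = 1.5      # illustrative threshold
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]

outputs_theta = [threshold_neuron_theta(x, w, theta) for x in inputs]
outputs_bias = [threshold_neuron_bias(x, w, -theta) for x in inputs]  # b = -theta
```

The bias form is usually more convenient for learning, because the bias can be treated like one more weight.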
3- Rectified Linear Neurons (linear threshold neurons)
They compute a linear weighted sum of their
inputs.
The output is a non-linear function of the total
input.
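$$z = b + \sum_i x_i w_i, \qquad y = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}$$
[Figure: y versus z; y = 0 for z ≤ 0 and y = z for z > 0]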
4- Sigmoid neurons
• They give a real-valued output that is a smooth and
bounded function of their total input.
• Typically they use the logistic function:
$$z = b + \sum_i x_i w_i, \qquad y = \frac{1}{1 + e^{-z}}$$
• They have positive derivatives, which make learning easy.
[Figure: y versus z; an S-shaped curve rising from 0 to 1 and passing through 0.5 at z = 0]
5- Softmax Transfer Function
• When we have several independent binary
attributes by which to classify the data, we need
to use a network with multiple logistic outputs.
• Then we have n output neurons, each one
corresponding to one class, and the target values
are 1 for the correct class, and 0 otherwise.
Cont.
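• With logistic outputs, each output neuron produces a value
between 0 and 1 independently of the others, for example
0.3, 0.7, 0.8, 0.9, so the outputs do not sum to 1 and
cannot be read as class probabilities.
• To solve this problem, a generalization of the logistic
sigmoid was developed: the softmax activation function.
• The softmax outputs all lie between 0 and 1 and sum to 1.
The largest input gets the largest output, which approaches 1
(with the rest approaching 0) as that input grows relative
to the others.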
Softmax Transfer Function (cont.)
$$y_i = \frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$$
• Where z is the value of each output node.

• A suitable cost function to use with softmax is the
negative log probability of the right answer.
• This is called cross entropy cost function, and it is
given by,
$$C = -\sum_j t_j \log y_j$$
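
The softmax output and the cross-entropy cost can be sketched together as follows. The input values `z` and the one-hot target `t` are illustrative assumptions. Subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result; it is not part of the slide's formula.

```python
import math

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j), computed in a numerically stable way."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(t, y):
    """C = -sum_j t_j log(y_j); terms with t_j = 0 contribute nothing."""
    return -sum(tj * math.log(yj) for tj, yj in zip(t, y) if tj > 0)

z = [1.0, 2.0, 3.0]  # illustrative total inputs of three output nodes
t = [0, 0, 1]        # one-hot target: the third class is correct
y = softmax(z)
C = cross_entropy(t, y)
```

With a one-hot target, the cost reduces to the negative log of the probability assigned to the correct class.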
6. Radial Basis and Triangular Basis transfer functions:
Radial basis:
$$a(n) = \exp(-n^2)$$
Triangular basis:
$$a(n) = \begin{cases} 1 - |n| & \text{if } -1 \le n \le 1 \\ 0 & \text{otherwise} \end{cases}$$
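
Both transfer functions translate directly into code. The names `radbas` and `tribas` are borrowed here from common toolbox naming and are used only as labels for this sketch.

```python
import math

def radbas(n):
    """Radial basis: a = exp(-n^2), peaking at 1 when n = 0."""
    return math.exp(-n * n)

def tribas(n):
    """Triangular basis: a = 1 - |n| for -1 <= n <= 1, and 0 otherwise."""
    return 1.0 - abs(n) if -1.0 <= n <= 1.0 else 0.0
```

Both functions respond most strongly when the net input is zero and fall off on either side; the triangular basis reaches exactly zero, while the radial basis only approaches it.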
In-class Assignment
• A single-input neuron has a weight of 2.3 and a
bias of -3. For an input of 2, calculate the output
produced by each of the following transfer
functions:
I. Hard limit
II. Linear
III. Log-sigmoid
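
One way to check an answer to this assignment is to compute the net input once and pass it through each transfer function. This sketch assumes the hard limit outputs 1 when the net input is greater than or equal to zero, matching the bias form of the binary threshold neuron.

```python
import math

w, b, p = 2.3, -3.0, 2.0
n = w * p + b                          # net input: 2.3 * 2 - 3 = 1.6

a_hardlim = 1 if n >= 0 else 0         # I. hard limit
a_linear = n                           # II. linear
a_logsig = 1.0 / (1.0 + math.exp(-n))  # III. log-sigmoid, about 0.832
```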
Learning in Artificial Neural Network:

• Learning aims to improve the performance of a
neural network.
• The memorization of patterns and the
subsequent response of the network can be
categorized into two general paradigms:
• Associative mapping (the network produces a particular pattern
from the input units), and
• Regularity detection (the network discovers the pattern from
the regularity of the units).
Definition of Learning in Artificial Neural Network:
• In the context of neural networks, learning is defined
as a process by which the free parameters of a neural
network are adapted through a process of stimulation
by the environment in which the network is embedded.
The type of learning is determined by the manner in
which the parameter changes take place.
Learning Algorithm
• The learning algorithm is a prescribed set of well-defined
rules for the solution of a learning problem.
• In every learning algorithm, we must specify the
cost function.
• Cost function: a measure, computed on the training
data, of how far the network's output is from the desired
output. The parameter values are chosen to minimize it, so
that the output function is as accurate as possible.
• The learning paradigm is a model of the
environment in which the neural network
operates.
• There are three major learning paradigms.
1- Supervised Learning
• A teacher is present during the learning process
& the desired output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference
between the network’s computed output and the
expected output.
2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system undergoes self learning by discovering
and adapting to the structural features in the input
patterns.
• The cost function is determined by the task
formulation.
• Most applications fall within the domain of
estimation problems such as statistical modeling,
compression, filtering, blind source separation and
clustering.
2- Unsupervised learning (cont.)

• In unsupervised (self-organised) learning, the neural
network is presented with input data only, with no targets.
• It should discover significant features, or structure, in
the different input patterns.
• Thus it learns to classify the input data into
appropriate categories, or clusters.
• Unsupervised learning tends to follow the neuro-
biological organisation of the brain.
3- Reinforcement Learning
• There is a teacher.
• There is no expected outcome presented to the
network.
• The teacher helps by indicating whether a computed
output is right or wrong.
• A reward is given for the right one & a penalty is
given for the wrong one.
• Data is usually not given, but generated by an
agent's interactions with the environment.
3- Reinforcement Learning (cont.)

• At each point in time, the agent performs an action
and the environment generates an observation and
an instantaneous cost according to some dynamics.
• The aim is to discover a policy for selecting actions
that minimizes some measure of long-term cost, i.e.
the expected cumulative cost.
• Equivalently, treating reward as negative cost, the goal
is to map situations to actions so as to maximize a
numerical reward signal.
• The environment's dynamics and the long-term cost
for each policy are usually unknown, but can be
estimated.
Cont.
• Tasks that fall within the paradigm of reinforcement
learning are control problems, games and other
sequential decision making tasks.
Two types of supervised learning

• Each training case consists of an input vector x
and a target output t.
• Regression: The target output is a real number or
a whole vector of real numbers.
• The price of a stock in six months' time.
• The temperature at noon tomorrow.
• Classification: The target output is a class label.
• The simplest case is a choice between 1 and 0.
• We can also have multiple alternative labels.
Supervised Learning Example

• Here is an example of a Regression supervised
learning problem.
Example
• "Given this data, a friend has a 750-square-foot house.
How much can they expect to get for it?"
There are different approaches that can be used
to solve this:
• A straight line through the data
• Maybe $150,000
• A second-order polynomial
• Maybe $200,000
• Each of these approaches represents a way of
doing supervised learning.
Cont.
• So, training data is provided in which the actual
price of each house is known.
• The algorithm uses this to learn to predict the prices
of other houses.
• We call this a regression problem because:
• It predicts a continuous-valued output (price).
• The output has no natural division into discrete classes.
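
The "straight line through the data" approach can be sketched as an ordinary least-squares fit. The training data below is entirely made up for illustration (the original graph is not reproduced here), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: size in square feet, price in thousands of dollars
sizes = [500, 1000, 1500, 2000]
prices = [100, 200, 300, 400]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 750 + intercept  # predicted price for a 750 sq ft house
```

A second-order polynomial would be fitted in the same spirit, with one extra parameter for the squared term, and could give a different prediction for the same house.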
Example 2
• The following graph shows whether breast tumors
are benign or malignant versus tumor size:
Example 2 (cont.)
• The graph shows that we have 5 tumors of each kind.
• We want to find a way to classify whether a tumor is
benign or malignant according to our trained network!
• Can you estimate diagnosis based on tumor size?
• This is an example of a classification problem
• Classify data into one of two discrete classes -
malignant or not.
• In classification problems, the output may take one of
a discrete number of possible values,
e.g. 0 - benign, 1 - type 1, 2 - type 2, 3 - type 3.
• In classification problems we can plot data in different
ways.
Classification Example (cont.)

• Notice that only the size attribute was used there.
• There may be other attributes to be used such as
age.
Cont.
• Based on that data, you can try and define separate
classes by,
• Drawing a straight line between the two groups
• Using a more complex function to define the two
groups.
• Then, when you have an individual with a
specific tumor size and who is a specific age, you
can use that information to place them into one of
your classes
• You might have many features to consider
• Clump thickness, Uniformity of cell size, Uniformity
of cell shape…etc.
Supervised Learning
• A programmer specifies the number of units in each layer
and the connectivity between units, so the only unknown is
the set of weights associated with the connections.
Supervised Learning (Cont.)

Algorithm:
• Initialize the weights in the network (usually with
random values).
• Repeat until the stopping criterion is met:
• For each example in the training set do:
• O = neural network output
• T = desired output (teacher or target)
• Update the weights
Note: Each pass through all of the training examples
is called an epoch.
Learning Rules:
• A learning rule, also known as a training
algorithm, is defined as a procedure for
modifying the weights and biases of a network.
• The learning rule is applied to train the network to
perform some particular task.

Learning Rules
• These learning types may use different learning
rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even
further according to the rule used.
Perceptrons – the first NNs

• Perceptrons were the first neural networks, introduced in
the 1950s by Frank Rosenblatt along with other
researchers.
• The perceptron was developed to perform pattern
recognition; hence it is a classifier.
• It is a fast and reliable network.
• It can be single-layer or multi-layer.
• It has limited applications.
Perceptrons (cont.)
• It is made up of only input neurons and output neurons
• Input neurons, usually, have two states: ON and OFF
• A simple threshold activation function is used for the
output neurons.
• It uses supervised training
• Example:
Cont.
• Based on that simple example, now we can
develop the learning rule for a perceptron.
• The perceptron usually uses a hard-limit
activation function, as shown in the following
figures.
Perceptrons
One perceptron neuron
One Perceptron layer

A layer of Perceptrons
Multilayer Perceptron
Perceptron Learning Rule

• First, we define the perceptron error e:
$$e = t - a$$
where t is the target and a is the output.
• Hence, we update the weights via the following
rule:
$$\mathbf{w}^{new} = \mathbf{w}^{old} + e\,\mathbf{p} = \mathbf{w}^{old} + (t - a)\,\mathbf{p}$$
• For the bias:
$$b^{new} = b^{old} + e$$
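
A single application of this update rule can be sketched as follows. The helper name `perceptron_step` and the starting weights, input, and target are illustrative assumptions, not the values of any assignment in these slides.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def perceptron_step(w, b, p, t):
    """Apply one perceptron update for a single input/target pair."""
    a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)  # current output
    e = t - a                                              # error
    w_new = [wi + e * pi for wi, pi in zip(w, p)]          # w_new = w_old + e p
    b_new = b + e                                          # b_new = b_old + e
    return w_new, b_new, e

# Illustrative starting point: zero weights and bias, target 0
w, b = [0.0, 0.0], 0.0
w, b, e = perceptron_step(w, b, p=[1.0, 2.0], t=0)
```

When the output already equals the target, the error is zero and the weights and bias are left unchanged.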
In-class Assignment
• Train a network to sort oranges from apples based
on 3 features; shape, texture and weight. Prototype
oranges (p1) and apples (p2) are:
• Assume the following weights & bias:

Perceptron Learning Rule (Convergence Theorem)
• Perceptrons are trained on examples of desired
behavior. The desired behavior can be
summarized by a set of input/target pairs (p, t),
where p is a network input and t is the corresponding
target.
• The objective is to reduce the error e = t - a
between the target t and the neuron response a.
Cont.
• The perceptron learning rule (e.g. learnp in Matlab)
calculates desired changes to the perceptron's weights
and biases given an input vector p, and the associated
error e.
• The target vector t must contain values of either 0 or 1,
as perceptrons (with hardlim transfer functions) can only
output such values.
• Each time learnp is executed, i.e. with each additional
epoch, the perceptron has a better chance of producing
the target values, and hence of converging.
The Decision Boundary

• The decision boundary is a line in the input space
(vector space); on one side of the line, the network
output is 0 while on the other side, the network output
is 1.
• Decision boundary example: Suppose that we have
a two-input perceptron with one neuron, as shown in the
next figure, and we want to calculate its decision
boundary.
• The decision boundary is determined by the input
vectors for which the net input n is zero:
$$n = w_{1,1}\,p_1 + w_{1,2}\,p_2 + b = 0$$
Example
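• We assume the following values for the weights and bias
(as implied by the net input n = p1 + p2 - 1 used in this example):
$$w_{1,1} = 1, \qquad w_{1,2} = 1, \qquad b = -1$$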
Example(cont.)
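• Then,
$$n = p_1 + p_2 - 1 = 0$$
• Setting $p_1 = 0$ gives $p_2 = 1$.
• Setting $p_2 = 0$ gives $p_1 = 1$.
• Now, we can test one point to determine which side
of the boundary corresponds to a decision of 1.
• Consider the input $\mathbf{p} = [2 \;\; 0]^T$: here
$n = 2 + 0 - 1 = 1 \ge 0$, so the output is 1.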
The decision boundary, in blue, is orthogonal to the weight vector, ${}_1\mathbf{w}$.
That means that our classes are linearly separable.
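
The decision-boundary example can be checked numerically. This sketch assumes the weights 1 and 1 and the bias -1 that are implied by n = p1 + p2 - 1, and a hard limit that outputs 1 when n is greater than or equal to zero; the function names are illustrative.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

w = [1.0, 1.0]  # weights implied by n = p1 + p2 - 1
b = -1.0        # bias implied by the same expression

def net_input(p):
    """n = w1 * p1 + w2 * p2 + b for a two-element input p."""
    return w[0] * p[0] + w[1] * p[1] + b

a_test = hardlim(net_input([2.0, 0.0]))  # test point p = [2, 0]^T
# The two axis intercepts of the boundary should give a net input of zero
on_boundary = [net_input([0.0, 1.0]), net_input([1.0, 0.0])]
```

Since the test point gives an output of 1, the side of the boundary that the weight vector points toward is the side with output 1.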
Decision Boundary (cont.)

Perceptron Implementation
• Orthogonal means that the weight vector is at a 90°
angle to the decision boundary.
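• Example: implement an AND logic gate.
• Answer: It has the following input/target pairs:
$$\{\mathbf{p}_1 = [0\;\,0]^T,\ t_1 = 0\},\quad \{\mathbf{p}_2 = [0\;\,1]^T,\ t_2 = 0\},\quad \{\mathbf{p}_3 = [1\;\,0]^T,\ t_3 = 0\},\quad \{\mathbf{p}_4 = [1\;\,1]^T,\ t_4 = 1\}$$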
Cont.
• First we need to select a decision boundary.
• Then, we choose a weight vector orthogonal to the
decision boundary.
• Then we choose any weight that falls in this vector,
for example;
• That leads us to this graph.
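
As a sketch of the design steps above, one possible choice of parameters can be checked against the AND truth table. The weight vector [2, 2] and bias -3 are assumptions chosen for illustration (they place the boundary at p1 + p2 = 1.5); many other choices would work equally well.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

# AND truth table as input/target pairs
pairs = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

# Assumed parameters: w is orthogonal to the boundary p1 + p2 = 1.5
w = [2.0, 2.0]
b = -3.0

outputs = [hardlim(w[0] * p[0] + w[1] * p[1] + b) for p, _ in pairs]
targets = [t for _, t in pairs]
```

The bias is found by requiring the net input to be zero at any point on the chosen boundary, for example at p = [1.5, 0].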

Perceptron Learning Rule (Summary)

1. Choose initial weights randomly.
2. Present a randomly chosen pattern x.
3. Update weights using the Delta rule:
$$w_{ij}(t+1) = w_{ij}(t) + e_i\,x_j$$
where $e_i = \text{target}_i - \text{output}_i$.
4. Repeat steps 2 and 3 until the stopping criterion
(convergence or a maximum number of iterations) is
reached.
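
The four steps above can be sketched as a short training loop for a single hard-limit neuron. The function name `train_perceptron`, the fixed random seed, the epoch limit, and the use of the AND gate as training data are all illustrative assumptions. Patterns are shuffled once per epoch, which is one simple way of presenting them in random order.

```python
import random

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def train_perceptron(patterns, max_epochs=100, seed=0):
    """Train one hard-limit neuron; returns (weights, bias, epochs used)."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    w = [rng.uniform(-1, 1) for _ in range(n_inputs)]  # step 1: random weights
    b = rng.uniform(-1, 1)
    for epoch in range(max_epochs):
        order = patterns[:]
        rng.shuffle(order)                             # step 2: random order
        errors = 0
        for x, target in order:
            output = hardlim(sum(wj * xj for wj, xj in zip(w, x)) + b)
            e = target - output
            if e != 0:
                # step 3: w(t+1) = w(t) + e * x, and the same for the bias
                w = [wj + e * xj for wj, xj in zip(w, x)]
                b = b + e
                errors += 1
        if errors == 0:                                # step 4: stop when error-free
            return w, b, epoch + 1
    return w, b, max_epochs

and_patterns = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, epochs = train_perceptron(and_patterns)
predictions = [hardlim(sum(wj * xj for wj, xj in zip(w, x)) + b)
               for x, _ in and_patterns]
```

Because the AND problem is linearly separable, the loop stops after a finite number of updates with every pattern classified correctly.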
Cont.
• The process of finding new weights (and biases)
can be repeated until there are no errors.
• Note that the perceptron learning rule is guaranteed
to converge in a finite number of steps for all
problems that can be solved by a perceptron.
• These include all classification problems that are
"linearly separable".
