Weight Space
• For any given network, there is a fixed number of
connections with associated weights.
• So, if there are n weights, then each configuration
of weights that defines an instance of the network
is a vector, W, of length n.
• W can be considered a point in an n-dimensional
weight space, where each axis is associated with one
of the connections in the network (see the sketch below).
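As a minimal sketch of this idea (the 2-3-1 layer sizes and the NumPy representation are assumptions for illustration, not from the slides), the following flattens every connection weight of a small network into a single vector W, i.e. one point in n-dimensional weight space:

    import numpy as np

    # A small feedforward network: 2 inputs -> 3 hidden -> 1 output.
    W1 = np.random.randn(3, 2)   # input-to-hidden weights (6 values)
    W2 = np.random.randn(1, 3)   # hidden-to-output weights (3 values)

    # Flatten all connection weights into one vector W of length n = 9;
    # each axis of the 9-dimensional weight space is one connection.
    W = np.concatenate([W1.ravel(), W2.ravel()])
    print(W.shape)   # (9,)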
Learning (cont.)
• The performance of the neural network is graded
using various metrics computed on the testing set,
such as the mean square error (MSE) and the
signal-to-noise ratio (SNR).
• Another method of estimating the error rate of the
neural network is Resampling.
• The idea is to repeat the training and testing
processes multiple times.
• Two main techniques are used:
• Cross-validation and bootstrapping (both sketched below).
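A minimal sketch of both resampling techniques in plain NumPy (the sample count of 100 and the fold count k = 5 are arbitrary assumptions; training and error evaluation are left as placeholders):

    import numpy as np

    def k_fold_indices(n_samples, k=5, seed=0):
        # Shuffle the sample indices and split them into k folds;
        # each fold serves once as the test set (cross-validation).
        rng = np.random.default_rng(seed)
        return np.array_split(rng.permutation(n_samples), k)

    def bootstrap_indices(n_samples, seed=0):
        # Draw n samples with replacement for training (bootstrapping);
        # the samples never drawn form the test set.
        rng = np.random.default_rng(seed)
        train = rng.integers(0, n_samples, size=n_samples)
        test = np.setdiff1d(np.arange(n_samples), train)
        return train, test

    folds = k_fold_indices(100, k=5)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Train on train_idx, measure the error on test_idx,
        # then average the k error estimates.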
Overfitting (Cont.)
• Overfitting can also occur if a “good” training set
is not chosen.
• A “good” training set must consist of:
• Samples that represent the general population.
• Samples that contain members of each class.
• Samples in each class that cover a wide range of
variations or noise effects (see the sampling sketch
after this list).
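A minimal sketch of building such a training set by stratified sampling, so that every class appears in the training set in its population proportion (the labels and the 70% training fraction are assumptions for illustration):

    import numpy as np

    def stratified_split(y, train_frac=0.7, seed=0):
        # Take train_frac of the samples from *each* class so the
        # training set represents every class in the population.
        rng = np.random.default_rng(seed)
        train_idx = []
        for c in np.unique(y):
            members = np.flatnonzero(y == c)
            rng.shuffle(members)
            train_idx.extend(members[:int(train_frac * len(members))])
        return np.array(train_idx)

    y = np.array([0] * 50 + [1] * 30 + [2] * 20)  # imbalanced toy labels
    train_idx = stratified_split(y)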
Assignment
• Write a report explaining a single overfitting-reduction
technique:
• Explain overfitting, its causes, and its effects.
• Explain the chosen reduction technique in detail.
• Use Matlab, or another appropriate language, to
illustrate your chosen technique by applying it to
data sets of your choice.
• Students should work in pairs. At most five pairs
may investigate the same technique; however, each
pair must use different data sets.
Perceptron Networks
• Which of these problems may be solved by a perceptron
network? Justify your answer.
Example
• We have a classification problem with four classes
of input vectors.
• The four classes are class 1, class 2, class 3, and
class 4 [the input vectors of each class were given in
a figure on the original slide].
Solution
• To solve a problem with four classes of input vectors
we need a perceptron with at least two neurons, since
two binary outputs can encode 2² = 4 distinct classes.
Solution (cont.)
• The light circles indicate class 1 vectors, the light
squares indicate class 2 vectors, the dark circles
indicate class 3 vectors, and the dark squares
indicate class 4 vectors.
Solution (cont.)
• We try to divide the input space into the four
categories.
Solution (cont.)
• The weight vectors should be orthogonal to the decision
boundaries and point toward the regions where the
neuron outputs are 1.
• We can choose the target classes, for example, as:
class 1 → (0, 0), class 2 → (0, 1),
class 3 → (1, 0), class 4 → (1, 1).
Cont.
• Hence, we select weight vectors orthogonal to the chosen
decision boundaries [the specific vectors were given in a
figure on the original slide].
Final Solution
• Hence, to solve our problem we need a two-neuron
perceptron network with the following weights and
biases [the values were given in a figure on the original
slide; a hedged code sketch of the same construction
follows].
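Because the slide's actual vectors, weights, and biases were given only in figures, the sketch below is a hedged reconstruction with made-up 2-D data (not the slide's values): a two-neuron perceptron trained with the perceptron rule, each class receiving one of the four target pairs chosen above:

    import numpy as np

    # Hypothetical input vectors, two per class (illustrative only).
    X = np.array([[1, 1], [1, 2],        # class 1
                  [2, -1], [2, 0],       # class 2
                  [-1, 2], [-2, 1],      # class 3
                  [-1, -1], [-2, -2]])   # class 4
    T = np.array([[0, 0], [0, 0],        # class 1 -> (0, 0)
                  [0, 1], [0, 1],        # class 2 -> (0, 1)
                  [1, 0], [1, 0],        # class 3 -> (1, 0)
                  [1, 1], [1, 1]])       # class 4 -> (1, 1)

    W = np.zeros((2, 2))   # one weight vector per neuron (one per row)
    b = np.zeros(2)

    for epoch in range(50):
        for x, t in zip(X, T):
            o = (W @ x + b >= 0).astype(int)   # hardlim activation
            W += np.outer(t - o, x)            # perceptron rule: w += (t - o) x
            b += t - o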
GRADIENT DESCENT
LEARNING
Cont.
• For a target (t) and an actual output (o), the error is
given by the mean square error cost function:
E = ½ (t − o)²
• Gradient descent on this cost gives the weight update:
ΔW_i = η (t − o) x_i
• This update is also called the Least Mean Square (LMS)
method (a sketch follows below).
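A minimal sketch of the LMS rule on a toy linear problem (the data, the three weights, and η = 0.01 are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))     # toy input patterns
    w_true = np.array([2.0, -1.0, 0.5])
    t = X @ w_true                        # targets from a known linear map

    w = np.zeros(3)                       # weights to be learned
    eta = 0.01                            # learning rate

    for epoch in range(20):
        for x, target in zip(X, t):
            o = w @ x                     # actual (linear) output
            w += eta * (target - o) * x   # delta rule: dw_i = eta (t - o) x_i

    print(w)   # converges toward w_true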
Cont.
• An adaptive linear system responds to changes in
its environment as it is operating.
• These networks are often used in error
cancellation, signal processing, and control
systems. For example, they are used by many
long distance phone lines for echo
cancellation.
• The pioneering work in this field was done
by Widrow and Hoff, who gave the name
ADALINE to adaptive linear elements.
Cont.
• A multiple-layer ADALINE network is called a MADALINE.
• The Widrow-Hoff rule can only train single-layer
linear networks. This is not much of a disadvantage,
since single-layer linear networks are just as capable
as multilayer linear networks:
• For every multilayer linear network, there is an
equivalent single-layer linear network (verified
numerically in the sketch below).
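A quick numerical check of that equivalence (the layer sizes are arbitrary): composing two linear layers, W2 (W1 x), is identical to the single layer (W2 W1) x:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 3))   # first linear layer
    W2 = rng.standard_normal((2, 4))   # second linear layer
    x = rng.standard_normal(3)

    two_layer = W2 @ (W1 @ x)          # multilayer linear network
    one_layer = (W2 @ W1) @ x          # equivalent single-layer network
    print(np.allclose(two_layer, one_layer))   # True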
BACKPROPAGATION
ALGORITHM
BackPropagation Algorithm
• The objective of this algorithm is to minimize the
error between the target and the actual output by
finding the weight updates Δw.
• The error is calculated at every iteration and is
back propagated through the layers of the ANN to
adapt the weights.
• The weights are adapted such that the error is
minimized.
• Once the error has reached a justified minimum
value, the training is stopped.
Cont.
• The configuration for training a neural network using
the BP algorithm is shown in the figure below.
Cont.
• We need the following rule to adapt the weights
between the output (k) and hidden (j) layers:
ΔW_kj = η δ_k o_j
• where, for logistic activations, the output-layer error
signal is
δ_k = (t_k − o_k) o_k (1 − o_k)
Cont.
• Adaptation between the input (i) and hidden (j) layers:
ΔW_ji = η δ_j o_i
• Where the hidden-layer error signal is
δ_j = o_j (1 − o_j) Σ_k δ_k W_kj
• And o_i is the output of input neuron i (a numeric
sketch follows).
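As a small numeric sketch of these two update rules (every value below is made up for illustration; η = 0.5 is an assumption):

    import numpy as np

    eta = 0.5
    o_i = np.array([1.0, 0.0])      # input-layer outputs (assumed)
    o_j = np.array([0.6, 0.4])      # hidden-layer outputs (assumed)
    W_kj = np.array([0.3, -0.2])    # hidden-to-output weights (assumed)
    t_k, o_k = 1.0, 0.55            # target and actual output (assumed)

    # Output layer: delta_k = (t_k - o_k) o_k (1 - o_k)
    delta_k = (t_k - o_k) * o_k * (1 - o_k)
    dW_kj = eta * delta_k * o_j                 # increments, output <- hidden

    # Hidden layer: delta_j = o_j (1 - o_j) sum_k delta_k W_kj
    delta_j = o_j * (1 - o_j) * delta_k * W_kj
    dW_ji = eta * np.outer(delta_j, o_i)        # increments, hidden <- input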
Backpropagation Algorithm
• The following ANN model is used to derive the
backpropagation algorithm:
BP (cont.)
• The backpropagation algorithm has two steps:
• Forward propagation, and
• Backward propagation.
• Our ANN model has the following assumptions:
• A two-layer multilayer NN model, i.e. one with a
single hidden layer of neurons.
• Neurons in layer i are fully connected to layer j and
neurons in layer j are fully connected to layer k.
• Input layer neurons have linear activation functions
and hidden and output layer neurons have logistic
activation functions (sigmoids).
Cont.
• The sigmoid gain coefficient (the “firing” slope) used
here is c = 1.
• Bias weights are used with bias signals of 1 for hidden
(j) and output layer (k) neurons.
• In many ANN models, bias weights (θ) with bias signals
of 1 are used to speed up the convergence process.
• The learning parameter, η, is usually fixed at a value
between 0 and 1; however, many applications nowadays
use an adaptive η.
• Usually η is set large in the initial stage of learning and
reduced to a small value at the final stage.
• A momentum term α is also used in the generalized delta
rule (G.D.R.) to help avoid local minima, by adding a
fraction of the previous weight change to the current one:
ΔW(n) = η δ o + α ΔW(n − 1)
(a combined sketch follows).
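A hedged sketch of one weight update combining both ideas, a decaying η and a momentum term α (the decay schedule and all constants are assumptions):

    def update(w, grad, prev_dw, epoch, eta0=0.9, alpha=0.8, decay=0.01):
        # Learning rate: large in early epochs, small in later ones.
        eta = eta0 / (1.0 + decay * epoch)
        # Momentum: reuse a fraction of the previous weight change.
        dw = -eta * grad + alpha * prev_dw
        return w + dw, dw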
Steps of BP Algorithm
• Step 1: Obtain a set of training patterns.
• Step 2: Set up neural network model: No. of Input
neurons, Hidden neurons, and Output Neurons.
• Step 3: Set learning rate η and momentum rate α
• Step 4: Initialize all connection weights, Wji and Wkj,
and bias weights, θj and θk, to random values.
• Step 5: Set minimum error, Emin
• Step 6: Start training by applying input patterns one
at a time, propagating them through the layers, and
then calculating the total error.
Cont.
• Step 7: Backpropagate error through output and
hidden layer and adapt weights.
• Step 8: Backpropagate error through hidden and
input layer and adapt weights.
• Step 9: Check if Error < Emin.
• If not, repeat Steps 6-9; if yes, stop training.
(A complete sketch of Steps 1-9 follows.)
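Putting Steps 1-9 together, here is a minimal NumPy sketch of the algorithm for the XOR example that follows (the 2-2-1 architecture, η = 0.5, α = 0.9, Emin = 0.01, and the random seed are assumptions; biases are folded in as extra weights fed a constant signal of 1, as described earlier):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Step 1: training patterns (the XOR truth table).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    # Steps 2-5: network size, rates, random weights, minimum error.
    rng = np.random.default_rng(0)
    eta, alpha, E_min = 0.5, 0.9, 0.01
    W_ji = rng.uniform(-1, 1, (2, 3))   # hidden weights (last column = theta_j)
    W_kj = rng.uniform(-1, 1, (1, 3))   # output weights (last column = theta_k)
    dW_ji = np.zeros_like(W_ji)
    dW_kj = np.zeros_like(W_kj)

    for epoch in range(20000):
        E = 0.0
        for x, t in zip(X, T):
            # Step 6: forward propagation (bias signal of 1 appended).
            o_i = np.append(x, 1.0)
            o_j = np.append(sigmoid(W_ji @ o_i), 1.0)
            o_k = sigmoid(W_kj @ o_j)
            E += 0.5 * np.sum((t - o_k) ** 2)

            # Steps 7-8: error signals for the output and hidden layers.
            delta_k = (t - o_k) * o_k * (1 - o_k)
            delta_j = o_j[:-1] * (1 - o_j[:-1]) * (W_kj[:, :-1].T @ delta_k)

            # Adapt the weights (with momentum alpha).
            dW_kj = eta * np.outer(delta_k, o_j) + alpha * dW_kj
            W_kj += dW_kj
            dW_ji = eta * np.outer(delta_j, o_i) + alpha * dW_ji
            W_ji += dW_ji

        # Step 9: stop once the total error is small enough.
        if E < E_min:
            break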
Cont.
• The training patterns of this ANN are those of the XOR
example, given in the following table:

x1  x2 | t
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 0
Cont.
• The ANN model and its initial weights [given in a figure
on the original slide].
Cont.
• This error is now backpropagated through the layers
using the error-signal equations given as follows:
• Between the output (k) and hidden (j) layers:
δ_k = (t_k − o_k) o_k (1 − o_k)
• Between the hidden (j) and input (i) layers:
δ_j = o_j (1 − o_j) Σ_k δ_k W_kj = −0.0035
Cont.
• Now we have calculated the error signal between
layers (k) and (j)
= -0.0064
Cont.
• This is the increment of the weight after the first
iteration for the weight between layers k and j.
• Now this change in weight is added to the actual
weight as follows
Cont.
• Similarly, the weights between layers j and i are
adapted as:
W_ji(new) = W_ji(old) + ΔW_ji