
Recognition Based on Artificial Neural Networks (1)

Artificial neural networks


• Ideas stem from the operation of human neural
networks.
• Networks of interconnected nonlinear computing
elements called neurons.

Recognition Based on Artificial Neural Networks (2)
• Perceptron:

A perceptron computes a sum of products of an input pattern with a set of weights, adds a bias, and outputs the sign of the result:

    o(x) = +1 if wTx + wn+1 > 0
    o(x) = −1 if wTx + wn+1 < 0

w and x are n-dimensional column vectors and wTx is the dot product of the two vectors. We refer to w as a weight vector and, as above, to wn+1 as a bias. Given any pattern vector x from a vector population, the task is to find a set of weights with the property:

    wTx + wn+1 > 0 if x belongs to class c1
    wTx + wn+1 < 0 if x belongs to class c2
Recognition Based on Artificial Neural Networks (3)

If the two classes are linearly separable, the perceptron algorithm for finding (or training) w is as simple as the following:

Let α > 0 denote a correction increment (also called the learning increment or the learning rate), let w(1) be a vector with arbitrary values, and let wn+1(1) be an arbitrary constant. Then, do the following for k = 2, 3, …: for a pattern vector, x(k), at step k,

    if x(k) ∈ c1 and wT(k)x(k) + wn+1(k) ≤ 0, let
        w(k+1) = w(k) + αx(k) and wn+1(k+1) = wn+1(k) + α;
    if x(k) ∈ c2 and wT(k)x(k) + wn+1(k) ≥ 0, let
        w(k+1) = w(k) − αx(k) and wn+1(k+1) = wn+1(k) − α;
    otherwise, let
        w(k+1) = w(k) and wn+1(k+1) = wn+1(k).

Recognition Based on Artificial Neural Networks (5)
The notation in the previous equations can be simplified by appending a 1 at the end of every pattern vector and including the bias in the weight vector. That is, we define

    x = [x1, x2, …, xn, 1]T
and
    w = [w1, w2, …, wn, wn+1]T

Then, the decision rule becomes

    wTx > 0 if x belongs to class c1
    wTx < 0 if x belongs to class c2
Recognition Based on Artificial Neural Networks (6)
With this notation, the perceptron algorithm can be restated as follows. For any pattern vector, x(k), at step k:

    if x(k) ∈ c1 and wT(k)x(k) ≤ 0, let w(k+1) = w(k) + αx(k);
    if x(k) ∈ c2 and wT(k)x(k) ≥ 0, let w(k+1) = w(k) − αx(k);
    otherwise, let w(k+1) = w(k).
Recognition Based on Artificial Neural Networks (7)
For nonseparable pattern classes:
Let r denote the response we want the perceptron to have for any pattern during training; r is either +1 or −1. We want to find the augmented weight vector, w, that minimizes the mean squared error (MSE) between the desired and actual responses of the perceptron. The algorithm for finding w is based on the least-mean-squared-error (LMSE) rule:

    w(k+1) = w(k) + α[r(k) − wT(k)x(k)]x(k)

A typical range for α is 0.1 < α < 1.0; w(1) is arbitrary.
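A corresponding sketch (again with illustrative names; the random initialization stands in for the arbitrary w(1)):

    import numpy as np

    def train_lmse(X, labels, alpha=0.1, epochs=100):
        """Least-mean-squared-error (delta-rule) training; works for
        nonseparable classes. X: (num_patterns, n); labels in {+1, -1}."""
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented patterns
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.5, 0.5, Xa.shape[1])         # w(1) is arbitrary
        for _ in range(epochs):
            for x, r in zip(Xa, labels):
                w += alpha * (r - w @ x) * x            # w(k+1) = w(k) + α[r(k) − wT(k)x(k)]x(k)
        return w

Unlike the fixed-increment rule above, this one does not stop on its own; it is run for a fixed number of epochs or until the MSE stops decreasing.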

Recognition Based on Artificial Neural Networks (8)
• Artificial Neuron:
Neural networks are interconnected perceptron-like computing elements called artificial neurons. These neurons perform the same computation as the perceptron, but with a different activation function.

Activation function:
Recognition Based on Artificial Neural Networks (10)

• Forward pass through a fully connected feedforward neural network
[Figure: a neural network with 4 layers]
Recognition Based on Artificial Neural Networks (11)
The outputs of layer 1 are the components of the input vector x, where n = n1 is the dimensionality of x:

    ai(1) = xi,  i = 1, 2, …, n

The computation performed by neuron i in layer l is given by

    zi(l) = Σj wij(l) aj(l−1) + bi(l),  j = 1, 2, …, nl−1

for i = 1, 2, …, nl and l = 2, …, L. Quantity zi(l) is called the net (or total) input to neuron i in layer l. The reason for this terminology is that zi(l) is formed using all outputs from layer (l−1).
The output (activation value) of neuron i in layer l is given by

    ai(l) = h(zi(l))
Recognition Based on Artificial Neural Networks (12)
Example: (Example 12.10, pp. 948-949, [1])

Recognition Based on Artificial Neural Networks (13)

Matrix form for the forward pass:

Implementing the previous equations with matrix operations is computationally faster.
The number of outputs in layer 1 is always the same as the dimension of an input pattern, x, so its vector form is

    a(1) = x

The matrix W(l) contains all the weights in layer l; each row contains the weights for one of the nodes in layer l:

    W(l) = [ w11(l)    w12(l)    …  w1,nl−1(l)
             w21(l)    w22(l)    …  w2,nl−1(l)
             …
             wnl,1(l)  wnl,2(l)  …  wnl,nl−1(l) ]
Recognition Based on Artificial Neural Networks (14)

Then, we can obtain all the sum-of-products computations, zi(l), for layer l simultaneously:

    z(l) = W(l)a(l−1) + b(l)

where a(l−1) is a column vector of dimension nl−1×1 containing the outputs of layer l−1, b(l) is a column vector of dimension nl×1 containing the bias values of all the neurons in layer l, and z(l) is an nl×1 column vector containing the net input values, zi(l), i = 1, 2, …, nl, to all the nodes in layer l.

Recognition Based on Artificial Neural Networks (15)

Because the activation function is applied to each net input independently of the others, the outputs of the network at any layer can be expressed in vector form as

    a(l) = h(z(l))
Recognition Based on Artificial Neural Networks (16)
Example: (Example 12.11, p. 951, [1])

Recognition Based on Artificial Neural Networks (17)

For multiple pattern vectors, begin by arranging all input pattern vectors as columns of a single matrix, X, of dimension n × np, where n is the dimensionality of the pattern vectors and np is the number of pattern vectors. It follows that

    A(1) = X

where each column of matrix A(1) contains the initial activation values (i.e., the vector values) for one pattern. Then the net input matrix for all neurons and all pattern vectors at layer l is

    Z(l) = W(l)A(l−1) + B(l)

where W(l) is given as before and B(l) is an nl × np matrix whose columns are duplicates of b(l), the bias vector containing the biases of the neurons in layer l.
Recognition Based on Artificial Neural Networks (18)

The outputs at layer l follow as

    A(l) = h(Z(l))

where the activation function h is applied to each element of matrix Z(l).

Summarizing the dimensions in our matrix formulation: X and A(1) are of size n × np, Z(l) is of size nl × np, W(l) is of size nl × nl−1, A(l−1) is of size nl−1 × np, B(l) is of size nl × np, and A(l) is of size nl × np.

Recognition Based on Artificial Neural Networks (19)

The equations above are used to classify each pattern in a set into one of nL pattern classes. Each column of output matrix A(L) contains the activation values of the nL output neurons for a specific pattern vector. The class membership of that pattern is given by the location of the output neuron with the highest activation value.
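A sketch of this batched forward pass and argmax decision (sigmoid assumed; broadcasting the column vector b(l) across the columns of the product plays the role of B(l)):

    import numpy as np

    def classify(X, weights, biases, h=lambda z: 1.0 / (1.0 + np.exp(-z))):
        """X: n x np matrix, one pattern per column. Returns, for each
        pattern, the index of the output neuron with the highest activation."""
        A = X                              # A(1) = X
        for W, b in zip(weights, biases):
            A = h(W @ A + b)               # Z(l) = W(l)A(l-1) + B(l); A(l) = h(Z(l))
        return np.argmax(A, axis=0)        # class membership per column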

It is assumed in this section that we know the weights and biases of the network.
These are obtained during training using backpropagation.

Recognition Based on Artificial Neural Networks (20)

• Backpropagation to train deep neural networks

• Training a neural network refers to using one or more sets of training patterns to estimate the network parameters (weights and biases) in a multilayer network. In this section, we develop the steps for training by backpropagation.

Recognition Based on Artificial Neural Networks (21)

• Training by backpropagation involves four basic steps:

(1) inputting the pattern vectors;
(2) a forward pass through the network to classify all the patterns of the training set and determine the classification error;
(3) a backward (backpropagation) pass that feeds the output error back through the network to compute the changes required to update the parameters; and
(4) updating the weights and biases in the network.

One pass through these four steps over the training set is called an epoch; epochs are repeated until the error reaches an acceptable level.

Recognition Based on Artificial Neural Networks (22)

• Equations of backpropagation
Given a set of training patterns and a multilayer feedforward neural network architecture, the approach is to find the network parameters that minimize an error (also called cost or objective) function.

The error function for a neural network is defined in terms of the differences between desired and actual responses. Let r denote the desired response for a given pattern vector, x, and let a(L) denote the actual response of the network to that input.

Recognition Based on Artificial Neural Networks (23)
The activation value of neuron j in the output layer is aj(L). The error of that neuron is defined as

    Ej = (1/2)(rj − aj(L))²

for j = 1, 2, …, nL, where rj is the desired response of output neuron j for a given pattern x. The output error with respect to a single x is defined as

    E = Σj Ej = (1/2) Σj (rj − aj(L))²,  j = 1, 2, …, nL

The total network output error over all training patterns is defined as the sum of the errors of the individual patterns. The job is to find the weights that minimize this total error.
Recognition Based on Artificial Neural Networks (24)

The key objective is to find a scheme to adjust all weights in a network using
training patterns. In order to do this, it needs to know how E changes with
respect to the weights and bias in the network in terms of quantities can be
computed
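The equations that followed on this slide did not survive extraction; in the standard development (see [1]) they take the form below, with δi(l) denoting ∂E/∂zi(l):

    δj(L) = h′(zj(L))(aj(L) − rj)                       (output layer)
    δi(l) = h′(zi(l)) Σj wji(l+1) δj(l+1),  l = L−1, …, 2
    ∂E/∂wij(l) = δi(l) aj(l−1)
    ∂E/∂bi(l) = δi(l)

That is, the δ's are computed backward from the output layer, and every partial derivative of E is then available from quantities already produced by the forward pass.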

Recognition Based on Artificial Neural Networks (25)
Finally, we update the network parameters using gradient descent:

    wij(l) = wij(l) − α δi(l) aj(l−1)

and

    bi(l) = bi(l) − α δi(l)

for l = 2, …, L, where the a's are computed in the forward pass, and the δ's are computed during backpropagation (backward, for l = L−1, L−2, …, 2). As with the perceptron, α is the learning rate constant. A typical range for α is 0.1 < α < 1.0.
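A per-layer sketch of this update, vectorized as an outer product (names illustrative):

    import numpy as np

    def update_layer(W, b, delta, a_prev, alpha):
        """One gradient-descent step for layer l.
        delta: the nl delta values of layer l; a_prev: the nl-1 outputs of layer l-1."""
        W -= alpha * np.outer(delta, a_prev)   # wij(l) -= α δi(l) aj(l-1)
        b -= alpha * delta                     # bi(l)  -= α δi(l)
        return W, b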

Recognition Based on Artificial Neural Networks (26)

Matrix form for backpropagation:

Arrange all the pattern vectors as columns of matrix X, and package the weights of layer l as matrix W(l). Using D(l) to denote the matrix equivalent of δ(l), the vector containing the errors in layer l, we begin at the network output and proceed backward.

Recognition Based on Artificial Neural Networks (27)

Then, rewriting the output-layer error in vector form:

    δ(L) = h′(z(L)) ⊙ (a(L) − r)

This nL × 1 column vector δ(L) contains the error terms of all the output neurons for one pattern vector (⊙ denotes elementwise multiplication). To account for all np patterns simultaneously, we form a matrix D(L) whose columns are the δ(L) vectors:

    D(L) = h′(Z(L)) ⊙ (A(L) − R)

Each column of A(L) is the network output for one pattern. Similarly, each column of R is a binary vector with a 1 in the location corresponding to the class of a particular pattern vector, and 0's elsewhere. All matrices are of size nL × np.
Recognition Based on Artificial Neural Networks (28)

Similarly, for layer l,

    D(l) = h′(Z(l)) ⊙ [(W(l+1))T D(l+1)],  l = L−1, L−2, …, 2

Finally, the equations for updating the network parameters (weights and biases) at layer l are

    W(l) = W(l) − α D(l)(A(l−1))T

and

    b(l) = b(l) − α Σk δk(l),  k = 1, 2, …, np

where δk(l) is the kth column of matrix D(l). The matrix B(l) of size nl × np is formed by concatenating b(l) np times in the horizontal direction:

    B(l) = [b(l) b(l) … b(l)]
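Pulling the matrix equations together, here is a self-contained training sketch (sigmoid activation assumed, so h′(Z) = A ⊙ (1 − A); the small random initialization follows slide (29); function and variable names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_nn(X, R, sizes, alpha=0.5, epochs=1000, seed=0):
        """Backpropagation training of a fully connected feedforward net.
        X: n x np patterns (one per column); R: nL x np desired responses;
        sizes: [n1, ..., nL] neurons per layer."""
        rng = np.random.default_rng(seed)
        Ws = [rng.uniform(-0.5, 0.5, (sizes[l], sizes[l - 1]))
              for l in range(1, len(sizes))]
        bs = [rng.uniform(-0.5, 0.5, (sizes[l], 1)) for l in range(1, len(sizes))]

        for _ in range(epochs):
            # Forward pass: A(1) = X; Z(l) = W(l)A(l-1) + B(l); A(l) = h(Z(l)).
            As = [X]
            for W, b in zip(Ws, bs):
                As.append(sigmoid(W @ As[-1] + b))   # broadcasting plays the role of B(l)

            # Backward pass: D(L), then D(l) = h'(Z(l)) * (W(l+1))T D(l+1).
            Ds = [As[-1] * (1 - As[-1]) * (As[-1] - R)]           # D(L)
            for l in range(len(Ws) - 1, 0, -1):
                Ds.append(As[l] * (1 - As[l]) * (Ws[l].T @ Ds[-1]))
            Ds.reverse()                                          # Ds[i] pairs with Ws[i]

            # Updates: W(l) -= α D(l)(A(l-1))T; b(l) -= α (sum of columns of D(l)).
            for l in range(len(Ws)):
                Ws[l] -= alpha * Ds[l] @ As[l].T
                bs[l] -= alpha * Ds[l].sum(axis=1, keepdims=True)
        return Ws, bs

The returned lists pair with the classify sketch shown earlier: Ws[i] and bs[i] hold W(l) and b(l) for l = i + 2 in the slide numbering.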
Recognition Based on Artificial Neural Networks (29)

The matrix formulation for training a feedforward, fully connected multilayer neural network using backpropagation is summarized below. Steps 1–4 constitute one epoch of training. X, R, and the learning rate parameter α are provided to the network for training.

The network is initialized by specifying the weights, W(l), and biases, b(l), as small random numbers.

During training, steps 1–4 are repeated for a specified number of epochs, or until a predefined measure of error is deemed small enough.

Recognition Based on Artificial Neural Networks (30)

The measure of error is the mean squared error (MSE), which is based on actual values of E. This value is obtained (for one pattern) by squaring the elements of a column of the matrix (A(L) − R), adding them, and dividing the result by 2. Repeating this operation for all columns and dividing the result by the number of patterns in X gives the MSE over the entire training set.
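The same computation as a short sketch (matching the train_nn names used above):

    import numpy as np

    def mse(A_L, R):
        """Mean squared error over the whole training set.
        A_L, R: nL x np matrices (network outputs and desired responses)."""
        per_pattern = np.sum((A_L - R) ** 2, axis=0) / 2.0   # one value per column
        return per_pattern.mean()                            # average over np patterns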
Recognition Based on Artificial Neural Networks (31)

• XOR gate by a neural network
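The slide's XOR material did not survive extraction. As a usage example of the train_nn and classify sketches above (the 2-2-2 architecture is one assumption that works; XOR is the classic example of classes a single perceptron cannot separate):

    # XOR patterns as columns of X; R is one-hot by class.
    X = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1]], dtype=float)
    R = np.array([[1, 0, 0, 1],                 # class 1: XOR = 0
                  [0, 1, 1, 0]], dtype=float)   # class 2: XOR = 1
    Ws, bs = train_nn(X, R, sizes=[2, 2, 2], alpha=1.0, epochs=20000)
    print(classify(X, Ws, bs))                  # expected: [0 1 1 0]

Convergence on XOR can depend on the random initialization; rerunning with a different seed is the usual fix if training stalls.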
