
Recognition Based on Artificial Neural Networks (1)

Artificial neural networks


• Ideas stem from the operation of human neural
networks.
• Networks of interconnected nonlinear computing
elements called neurons.

Recognition Based on Artificial Neural Networks (2)
• Perceptron:

A perceptron computes a sum of products of an input pattern with a set of weights, adds a bias, and outputs the sign of the result:

    o(x) = +1 if wTx + wn+1 > 0
    o(x) = −1 if wTx + wn+1 < 0

w and x are n-dimensional column vectors and wTx is the dot product of the two vectors. We refer to w as a weight vector and, as above, to wn+1 as a bias. Given any pattern vector x from a vector population, the task is to find a set of weights with the property:

    wTx + wn+1 > 0 if x belongs to class c1
    wTx + wn+1 < 0 if x belongs to class c2
Recognition Based on Artificial Neural Networks (3)

If the two classes are linearly separable, the perceptron algorithm for finding (or training) w is as simple as the following:

Let α > 0 denote a correction increment (also called the learning increment or the learning rate), let w(1) be a vector with arbitrary values, and let wn+1(1) be an arbitrary constant. Then, do the following for k = 2, 3, …: for a pattern vector, x(k), at step k,

    if x(k) ∈ c1 and wT(k)x(k) + wn+1(k) ≤ 0, let
        w(k+1) = w(k) + αx(k) and wn+1(k+1) = wn+1(k) + α;
    if x(k) ∈ c2 and wT(k)x(k) + wn+1(k) ≥ 0, let
        w(k+1) = w(k) − αx(k) and wn+1(k+1) = wn+1(k) − α;
    otherwise, let
        w(k+1) = w(k) and wn+1(k+1) = wn+1(k).

Recognition Based on Artificial Neural Networks (5)
The notation in the previous equations can be simplified by appending a 1 at the end of every pattern vector and including the bias in the weight vector. That is, we define

    x = [x1, x2, …, xn, 1]T
and
    w = [w1, w2, …, wn, wn+1]T

Then, the decision rule becomes

    wTx > 0 if x belongs to class c1
    wTx < 0 if x belongs to class c2
Recognition Based on Artificial Neural Networks (6)
With this notation, the perceptron algorithm can be restated as follows. For any pattern vector, x(k), at step k:

    if x(k) ∈ c1 and wT(k)x(k) ≤ 0, let w(k+1) = w(k) + αx(k);
    if x(k) ∈ c2 and wT(k)x(k) ≥ 0, let w(k+1) = w(k) − αx(k);
    otherwise, let w(k+1) = w(k).
Recognition Based on Artificial Neural Networks (7)
For nonseparable pattern classes:
Let r denote the response we want the perceptron to have for any pattern during training; r is either +1 or −1. We want to find the augmented weight vector, w, that minimizes the mean squared error (MSE) between the desired and actual responses of the perceptron. The algorithm for finding w is based on the least-mean-squared-error (LMSE) rule:

    w(k+1) = w(k) + α[r(k) − wT(k)x(k)]x(k)

A typical range for α is 0.1 < α < 1.0; w(1) is arbitrary.
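A corresponding sketch (again with illustrative names; the random initialization stands in for the arbitrary w(1)):

    import numpy as np

    def train_lmse(X, labels, alpha=0.1, epochs=100):
        """Least-mean-squared-error (delta-rule) training; works for
        nonseparable classes. X: (num_patterns, n); labels in {+1, -1}."""
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented patterns
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.5, 0.5, Xa.shape[1])         # w(1) is arbitrary
        for _ in range(epochs):
            for x, r in zip(Xa, labels):
                w += alpha * (r - w @ x) * x            # w(k+1) = w(k) + α[r(k) − wT(k)x(k)]x(k)
        return w

Unlike the fixed-increment rule above, this one does not stop on its own; it is run for a fixed number of epochs or until the MSE stops decreasing.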

Recognition Based on Artificial Neural Networks (8)
• Artificial Neuron:
Neural networks are interconnected perceptron-like computing elements called artificial neurons. These neurons perform the same computation as the perceptron, but with a different activation function.

Activation function:
Recognition Based on Artificial Neural Networks (10)

• Forward pass through a fully connected feedforward neural network
[Figure: a neural network with 4 layers]
Recognition Based on Artificial Neural Networks (11)
The outputs of layer 1 are the components of the input vector x, where n = n1 is the dimensionality of x:

    ai(1) = xi,  i = 1, 2, …, n

The computation performed by neuron i in layer l is given by

    zi(l) = Σj wij(l) aj(l−1) + bi(l),  j = 1, 2, …, nl−1

for i = 1, 2, …, nl and l = 2, …, L. Quantity zi(l) is called the net (or total) input to neuron i in layer l. The reason for this terminology is that zi(l) is formed using all outputs from layer (l−1).
The output (activation value) of neuron i in layer l is given by

    ai(l) = h(zi(l))
Recognition Based on Artificial Neural Networks (12)
Example: (Example 12.10, pp. 948-949, [1])

Recognition Based on Artificial Neural Networks (13)

Matrix form for the forward pass:

Implementing the previous equations with matrix operations is computationally faster.
The number of outputs in layer 1 is always the same as the dimension of an input pattern, x, so its vector form is

    a(1) = x

The matrix W(l) contains all the weights in layer l; each row contains the weights for one of the nodes in layer l:

    W(l) = [ w11(l)    w12(l)    …  w1,nl−1(l)
             w21(l)    w22(l)    …  w2,nl−1(l)
             …
             wnl,1(l)  wnl,2(l)  …  wnl,nl−1(l) ]
Recognition Based on Artificial Neural Networks (14)

Then, we can obtain all the sum-of-products computations, zi(l), for layer l simultaneously:

    z(l) = W(l)a(l−1) + b(l)

where a(l−1) is a column vector of dimension nl−1×1 containing the outputs of layer l−1, b(l) is a column vector of dimension nl×1 containing the bias values of all the neurons in layer l, and z(l) is an nl×1 column vector containing the net input values, zi(l), i = 1, 2, …, nl, to all the nodes in layer l.

Recognition Based on Artificial Neural Networks (15)

Because the activation function is applied to each net input independently of the others, the outputs of the network at any layer can be expressed in vector form as

    a(l) = h(z(l))
Recognition Based on Artificial Neural Networks (16)
Example: (Example 12.11, p. 951, [1])

Recognition Based on Artificial Neural Networks (17)

For multiple pattern vectors, begin by arranging all input pattern vectors as columns of a single matrix, X, of dimension n × np, where n is the dimensionality of the pattern vectors and np is the number of pattern vectors. It follows that

    A(1) = X

where each column of matrix A(1) contains the initial activation values (i.e., the vector values) for one pattern. Then the net input matrix for all neurons and all pattern vectors at layer l is

    Z(l) = W(l)A(l−1) + B(l)

where W(l) is given as before and B(l) is an nl × np matrix whose columns are duplicates of b(l), the bias vector containing the biases of the neurons in layer l.
Recognition Based on Artificial Neural Networks (18)

The outputs at layer l follow as

    A(l) = h(Z(l))

where the activation function h is applied to each element of matrix Z(l).

Summarizing the dimensions in our matrix formulation: X and A(1) are of size n × np, Z(l) is of size nl × np, W(l) is of size nl × nl−1, A(l−1) is of size nl−1 × np, B(l) is of size nl × np, and A(l) is of size nl × np.

Recognition Based on Artificial Neural Networks (19)

The equations above are used to classify each pattern in a set into one of nL pattern classes. Each column of output matrix A(L) contains the activation values of the nL output neurons for a specific pattern vector. The class membership of that pattern is given by the location of the output neuron with the highest activation value.
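A sketch of this batched forward pass and argmax decision (sigmoid assumed; broadcasting the column vector b(l) across the columns of the product plays the role of B(l)):

    import numpy as np

    def classify(X, weights, biases, h=lambda z: 1.0 / (1.0 + np.exp(-z))):
        """X: n x np matrix, one pattern per column. Returns, for each
        pattern, the index of the output neuron with the highest activation."""
        A = X                              # A(1) = X
        for W, b in zip(weights, biases):
            A = h(W @ A + b)               # Z(l) = W(l)A(l-1) + B(l); A(l) = h(Z(l))
        return np.argmax(A, axis=0)        # class membership per column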

It is assumed in this section that we know the weights and biases of the network.
These are obtained during training using backpropagation.

Recognition Based on Artificial Neural Networks (20)

• Backpropagation to train deep neural networks

• Training a neural network refers to using one or more sets of training patterns to estimate the network parameters (weights and biases) in a multilayer network. In this section, we develop the steps for training by backpropagation.

Recognition Based on Artificial Neural Networks (21)

• Training by backpropagation involves four basic steps:

(1) inputting the pattern vectors;
(2) a forward pass through the network to classify all the patterns of the training set and determine the classification error;
(3) a backward (backpropagation) pass that feeds the output error back through the network to compute the changes required to update the parameters; and
(4) updating the weights and biases in the network.

One pass through these four steps over the training set is called an epoch; epochs are repeated until the error reaches an acceptable level.

Recognition Based on Artificial Neural Networks (22)

• Equations of backpropagation
Given a set of training patterns and a multilayer feedforward neural network architecture, the approach is to find the network parameters that minimize an error (also called cost or objective) function.

The error function for a neural network is defined in terms of the differences between desired and actual responses. Let r denote the desired response for a given pattern vector, x, and let a(L) denote the actual response of the network to that input.

Recognition Based on Artificial Neural Networks (23)
The activation value of neuron j in the output layer is aj(L). The error of that neuron is defined as

    Ej = (1/2)(rj − aj(L))²

for j = 1, 2, …, nL, where rj is the desired response of output neuron j for a given pattern x. The output error with respect to a single x is defined as

    E = Σj Ej = (1/2) Σj (rj − aj(L))²,  j = 1, 2, …, nL

The total network output error over all training patterns is defined as the sum of the errors of the individual patterns. The job is to find the weights that minimize this total error.
Recognition Based on Artificial Neural Networks (24)

The key objective is to find a scheme to adjust all weights in a network using
training patterns. In order to do this, it needs to know how E changes with
respect to the weights and bias in the network in terms of quantities can be
computed
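The equations that followed on this slide did not survive extraction; in the standard development (see [1]) they take the form below, with δi(l) denoting ∂E/∂zi(l):

    δj(L) = h′(zj(L))(aj(L) − rj)                       (output layer)
    δi(l) = h′(zi(l)) Σj wji(l+1) δj(l+1),  l = L−1, …, 2
    ∂E/∂wij(l) = δi(l) aj(l−1)
    ∂E/∂bi(l) = δi(l)

That is, the δ's are computed backward from the output layer, and every partial derivative of E is then available from quantities already produced by the forward pass.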

Recognition Based on Artificial Neural Networks (25)
Finally, we update the network parameters using gradient descent:

    wij(l) = wij(l) − α δi(l) aj(l−1)

and

    bi(l) = bi(l) − α δi(l)

for l = 2, …, L, where the a's are computed in the forward pass, and the δ's are computed during backpropagation (backward, for l = L−1, L−2, …, 2). As with the perceptron, α is the learning rate constant. A typical range for α is 0.1 < α < 1.0.
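A per-layer sketch of this update, vectorized as an outer product (names illustrative):

    import numpy as np

    def update_layer(W, b, delta, a_prev, alpha):
        """One gradient-descent step for layer l.
        delta: the nl delta values of layer l; a_prev: the nl-1 outputs of layer l-1."""
        W -= alpha * np.outer(delta, a_prev)   # wij(l) -= α δi(l) aj(l-1)
        b -= alpha * delta                     # bi(l)  -= α δi(l)
        return W, b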

Recognition Based on Artificial Neural Networks (26)

Matrix form for backpropagation:

Arrange all the pattern vectors as columns of matrix X, and package the weights of layer l as matrix W(l). Using D(l) to denote the matrix equivalent of δ(l), the vector containing the errors in layer l, we begin at the network output and proceed backward.

Recognition Based on Artificial Neural Networks (27)

Then, rewriting the output-layer error in vector form:

    δ(L) = h′(z(L)) ⊙ (a(L) − r)

This nL × 1 column vector δ(L) contains the error terms of all the output neurons for one pattern vector (⊙ denotes elementwise multiplication). To account for all np patterns simultaneously, we form a matrix D(L) whose columns are the δ(L) vectors:

    D(L) = h′(Z(L)) ⊙ (A(L) − R)

Each column of A(L) is the network output for one pattern. Similarly, each column of R is a binary vector with a 1 in the location corresponding to the class of a particular pattern vector, and 0's elsewhere. All matrices are of size nL × np.
Recognition Based on Artificial Neural Networks (28)

Similarly, for layer l,

    D(l) = h′(Z(l)) ⊙ [(W(l+1))T D(l+1)],  l = L−1, L−2, …, 2

Finally, the equations for updating the network parameters (weights and biases) at layer l are

    W(l) = W(l) − α D(l)(A(l−1))T

and

    b(l) = b(l) − α Σk δk(l),  k = 1, 2, …, np

where δk(l) is the kth column of matrix D(l). The matrix B(l) of size nl × np is formed by concatenating b(l) np times in the horizontal direction:

    B(l) = [b(l) b(l) … b(l)]
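Pulling the matrix equations together, here is a self-contained training sketch (sigmoid activation assumed, so h′(Z) = A ⊙ (1 − A); the small random initialization follows slide (29); function and variable names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_nn(X, R, sizes, alpha=0.5, epochs=1000, seed=0):
        """Backpropagation training of a fully connected feedforward net.
        X: n x np patterns (one per column); R: nL x np desired responses;
        sizes: [n1, ..., nL] neurons per layer."""
        rng = np.random.default_rng(seed)
        Ws = [rng.uniform(-0.5, 0.5, (sizes[l], sizes[l - 1]))
              for l in range(1, len(sizes))]
        bs = [rng.uniform(-0.5, 0.5, (sizes[l], 1)) for l in range(1, len(sizes))]

        for _ in range(epochs):
            # Forward pass: A(1) = X; Z(l) = W(l)A(l-1) + B(l); A(l) = h(Z(l)).
            As = [X]
            for W, b in zip(Ws, bs):
                As.append(sigmoid(W @ As[-1] + b))   # broadcasting plays the role of B(l)

            # Backward pass: D(L), then D(l) = h'(Z(l)) * (W(l+1))T D(l+1).
            Ds = [As[-1] * (1 - As[-1]) * (As[-1] - R)]           # D(L)
            for l in range(len(Ws) - 1, 0, -1):
                Ds.append(As[l] * (1 - As[l]) * (Ws[l].T @ Ds[-1]))
            Ds.reverse()                                          # Ds[i] pairs with Ws[i]

            # Updates: W(l) -= α D(l)(A(l-1))T; b(l) -= α (sum of columns of D(l)).
            for l in range(len(Ws)):
                Ws[l] -= alpha * Ds[l] @ As[l].T
                bs[l] -= alpha * Ds[l].sum(axis=1, keepdims=True)
        return Ws, bs

The returned lists pair with the classify sketch shown earlier: Ws[i] and bs[i] hold W(l) and b(l) for l = i + 2 in the slide numbering.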
Recognition Based on Artificial Neural Networks (29)

The matrix formulation for training a feedforward, fully connected multilayer neural network using backpropagation is summarized below. Steps 1–4 constitute one epoch of training. X, R, and the learning rate parameter α are provided to the network for training.

The network is initialized by specifying the weights, W(l), and biases, b(l), as small random numbers.

During training, steps 1–4 are repeated for a specified number of epochs, or until a predefined measure of error is deemed small enough.

Recognition Based on Artificial Neural Networks (30)

The measure of error is the mean squared error (MSE), which is based on actual values of E. This value is obtained (for one pattern) by squaring the elements of a column of the matrix (A(L) − R), adding them, and dividing the result by 2. Repeating this operation for all columns and dividing the result by the number of patterns in X gives the MSE over the entire training set.
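The same computation as a short sketch (matching the train_nn names used above):

    import numpy as np

    def mse(A_L, R):
        """Mean squared error over the whole training set.
        A_L, R: nL x np matrices (network outputs and desired responses)."""
        per_pattern = np.sum((A_L - R) ** 2, axis=0) / 2.0   # one value per column
        return per_pattern.mean()                            # average over np patterns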
Recognition Based on Artificial Neural Networks (31)

• XOR gate by a neural network
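The slide's XOR material did not survive extraction. As a usage example of the train_nn and classify sketches above (the 2-2-2 architecture is one assumption that works; XOR is the classic example of classes a single perceptron cannot separate):

    # XOR patterns as columns of X; R is one-hot by class.
    X = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1]], dtype=float)
    R = np.array([[1, 0, 0, 1],                 # class 1: XOR = 0
                  [0, 1, 1, 0]], dtype=float)   # class 2: XOR = 1
    Ws, bs = train_nn(X, R, sizes=[2, 2, 2], alpha=1.0, epochs=20000)
    print(classify(X, Ws, bs))                  # expected: [0 1 1 0]

Convergence on XOR can depend on the random initialization; rerunning with a different seed is the usual fix if training stalls.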
