Unsupervised learning
Vector quantization
Exercises on the past lecture

In a neural network model, given the input-to-hidden weight matrix:

W1 =
0.14 0.25 0.35 0.47
0.15 0.81 0.20 0.35
0.26 0.24 0.25 0.83
0.84 0.93 0.62 0.59

and the hidden-to-output weight matrix:

W2 =
0.5 0.3 0.8 0.6
0.9 0.8 0.4 0.1

Assume the activation function in the hidden layer is a threshold function with threshold 0.7, and the activation function in the output layer is the sigmoid. For the input vector

x = [0.1, 0.5, 0.8, 0.9]

calculate the outputs on the output nodes.
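A minimal feedforward sketch of this exercise in Python. The orientation of the matrices (one row of weights per receiving unit) is an assumption, chosen to be consistent with the worked examples later in these slides:

```python
import math

W1 = [[0.14, 0.25, 0.35, 0.47],   # input -> hidden weights, one row per hidden unit
      [0.15, 0.81, 0.20, 0.35],
      [0.26, 0.24, 0.25, 0.83],
      [0.84, 0.93, 0.62, 0.59]]
W2 = [[0.5, 0.3, 0.8, 0.6],       # hidden -> output weights, one row per output unit
      [0.9, 0.8, 0.4, 0.1]]
x = [0.1, 0.5, 0.8, 0.9]

def net(W, v):
    # Net input of each unit: one dot product per weight-matrix row.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

# Hidden layer: threshold activation with threshold 0.7.
h = [1.0 if n > 0.7 else 0.0 for n in net(W1, x)]
# Output layer: sigmoid activation.
y = [1.0 / (1.0 + math.exp(-n)) for n in net(W2, h)]
print(h)  # [1.0, 1.0, 1.0, 1.0] -- every hidden net input exceeds 0.7
print(y)  # both output net inputs are 2.2, so y = sigmoid(2.2) ~ 0.900
```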
Given two input patterns (one per column):

x(1) x(2)
 2    2
 1    1
 2    2
 3    0

with the corresponding desired outputs (one per column):

d(1) d(2)
 0    0
 0    1

Assume threshold theta = 0.2 and learning rate eta = 0.4. Apply perceptron training on node number 1.
For x = [2, 1, 2, 3]:

net output = 2.5
Actual output yi = 1 (net > theta)
Error e = di - yi = 0 - 1 = -1
DW = eta * e * x = 0.4 * (-1) * [2, 1, 2, 3] = [-0.8, -0.4, -0.8, -1.2]
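This update step can be sketched in Python. The slide does not show the starting weights for node 1, so w below is a hypothetical vector chosen only so that the net input matches the 2.5 shown above:

```python
theta, eta = 0.2, 0.4
x = [2.0, 1.0, 2.0, 3.0]
d = 0.0                               # desired output for node 1 on this pattern
w = [0.5, 0.5, 0.5, 0.0]              # hypothetical weights giving net = 2.5

net = sum(wi * xi for wi, xi in zip(w, x))     # 2.5
y = 1.0 if net > theta else 0.0                # threshold unit fires: y = 1.0
e = d - y                                      # error: -1.0
DW = [eta * e * xi for xi in x]                # eta*e*x = [-0.8, -0.4, -0.8, -1.2]
w_new = [wi + dwi for wi, dwi in zip(w, DW)]   # apply the correction
```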
Given the input patterns:

X1 X2
 1  1
 0  1
 1  0
 0  1

Assume the corresponding desired outputs d1, d2 for each pattern, and a learning rate of 0.2.
Given the input x = [1, 0, 1]^T:

net output = W * x = [0.870000, 0.330000]
Actual output yi = [0.704746, 0.581759]
e on the output layer = yi * (1 - yi) * (di - yi) = [0.061436, -0.141551]

DW = eta * e * x =
 0.012  0.000  0.012
-0.028 -0.000 -0.028

W(new) = W(old) + DW =
0.352 0.790 0.542
0.132 0.310 0.142
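The step above can be reproduced numerically. The slide does not restate W(old), d or eta, so the values below are inferred from the numbers shown (W(old) = W(new) - DW, d = [1, 0], eta = 0.2); treat them as a reconstruction, not as part of the original exercise:

```python
import math

eta = 0.2                          # inferred: each DW entry equals 0.2 * e * x
x = [1.0, 0.0, 1.0]
d = [1.0, 0.0]                     # inferred from the signs of the error signals
W = [[0.34, 0.79, 0.53],           # W(old), recovered as W(new) - DW
     [0.16, 0.31, 0.17]]

net = [sum(w * a for w, a in zip(row, x)) for row in W]      # [0.87, 0.33]
y = [1.0 / (1.0 + math.exp(-n)) for n in net]                # sigmoid outputs
e = [yi * (1 - yi) * (di - yi) for yi, di in zip(y, d)]      # delta-rule error signal
DW = [[eta * ei * xi for xi in x] for ei in e]
W_new = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W, DW)]
```

Printing W_new reproduces the matrix on the slide to three decimal places.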
Given the input x = [1, 1, 0]^T:

net output = W * x = [1.142287, 0.441690]
Actual output yi = [0.758099, 0.608662]
e on the output layer = yi * (1 - yi) * (di - yi) = [-0.139024, 0.093214]

DW = eta * e * x =
-0.028 -0.028 -0.000
 0.019  0.019  0.000

W(new) = W(old) + DW =
0.324 0.762 0.542
0.150 0.329 0.142
Layers and learning

Multi-layer perceptrons
The possible advantages of such a system were obvious to Minsky and
Papert, as well as to other theorists of the time.
The units of the intermediate layers can act as feature detectors;
The geometry of the way in which a multi-layer system can partition
the input space becomes much richer and more complex.
In fact, it can be shown that – given the right number of layers and
units – any partitioning of the input space is possible.
Obviously this opens the door to much more powerful perceptron
classifiers.
The units in the intermediate layers of a multi-layer system are not
directly part of the input or the output of the system: they are purely
internal.
For this reason, they are usually referred to as the hidden units or the
hidden layers.
In 1986, Rumelhart, Hinton and Williams presented a new algorithm, published in the Parallel Distributed Processing (PDP) volumes, known as error backpropagation or the generalized delta rule, for the training of multi-layer perceptrons.
Multi-layer perceptrons: structure
Input layer. This can contain as many units as are necessary to represent
the input pattern. In some practical applications, inputs of many
thousands of units can be found.
Hidden layers. There may be as many hidden layers as required.
Practically speaking, however, it is never necessary to use more than
two, and one hidden layer is generally sufficient for nearly all
problems.
Hidden layers may contain any number of units.
Output layer. As in the two-layer perceptron, the final layer represents
the output of the system. It can contain as many units as required.
Units. All the units in the hidden and output layers are required to have
continuous activation functions, such as the unipolar sigmoid of
Equation 2.10 or the bipolar version of Equation 2.14.
The units of the input layer, being only dummies, do not strictly require
an activation function.
Weights and connections. Like the single-layer perceptron, multi-layer
systems invariably have a fully connected topology: all the units of a
layer are connected to all the units in the succeeding layer, with no
recurrent connections or links between units within a layer.
Backpropagation: the learning rules
As you’ve already learned, the regime most commonly used to train
multi-layer perceptrons is error backpropagation, a version of the delta
rule, which we’ve already covered in some detail.
Let us consider the perceptron illustrated in Figure 2.36, which has o
output units, m hidden units and n input units.
All the hidden and output units have the same activation function, the unipolar
sigmoid of Equation 2.10.
Now recall that the purpose of a supervised training rule, as in all perceptrons,
is to alter the network weights so as to bring the actual output closer and closer
to the desired output.
In vector-matrix terms each step of the process has to calculate a new matrix by
building a matrix ΔW of the changes to the weight matrix W and then applying
these changes by adding ΔW to W.
But instead of a single weight matrix W, as in all the perceptrons you’ve seen
so far, the network has two: W1, an m × n matrix containing the weights
connecting the input layer with the hidden layer, and W2, an o × m matrix of
the weights between the hidden and output layers.
Training these two matrices requires somewhat different rules; we will consider
each separately, starting with W2.
Training W2
Rather than consider all the units in the output layer of our model network, let’s
first just consider one, unit ui, along with the vector of weights wi that feed into
it – row i in W2.
The training rule has to calculate a vector Δwi of changes to this set of weights
and then apply them to wi. How?
All we have to do is apply the delta rule in exactly the same way as we did with
the single-layer perceptrons above. ui is a member of the output layer, so we
know what its desired output is supposed to be – it is given as part of the
training set.
So the error signal ei on ui is simply:

ei = yi * (1 - yi) * (di - yi)

and the vector of changes to ui's weight vector is:

Dwi = eta * ei * xh

where xh is the vector of hidden-layer activations. The new weight vector wi(t + 1) for ui after this training step is just:

wi(t + 1) = wi(t) + Dwi
Training W1
Consider the unit uj illustrated in the figure. But uj is a hidden unit, so we
have no ready way of calculating the error signal on it.
It has been proved mathematically that the error signal ej on a unit uj in the
hidden layer is given by:

ej = xj * (1 - xj) * SUMi (ei * wij)

where xj is the activation of uj, the sum runs over the units ui of the output
layer, ei is the error signal on output unit ui, and wij is the weight
connecting uj to ui.
As each pattern is presented, the errors the network makes for each are added
together to give a cumulative error for the entire set Enet.
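A sketch of the hidden-layer error computation, ej = xj * (1 - xj) * SUMi(ei * wij). The activations, output errors and weights here are hypothetical numbers chosen only to illustrate the shapes involved; none of them come from the slides:

```python
h = [0.6, 0.3]                  # hypothetical activations of m = 2 hidden units
e_out = [0.05, -0.10]           # hypothetical error signals on o = 2 output units
W2 = [[0.4, 0.7],               # W2[i][j]: weight from hidden unit j to output unit i
      [0.3, 0.9]]

# Each hidden unit's error is its share of the output errors, propagated back
# through the weights that connect it forward, scaled by the sigmoid derivative.
e_hidden = [h[j] * (1 - h[j]) * sum(e_out[i] * W2[i][j] for i in range(len(e_out)))
            for j in range(len(h))]
```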
Backpropagation: the process
Stopping condition. Again as with the delta rule, the network will almost
certainly never be able to give an absolutely perfect response, so in
backpropagation the stopping condition also depends on the cumulative
network error Enet for the entire training set.
After each cycle, Enet is compared to a certain pre-decided maximum
acceptable error Emax: if Enet has fallen below Emax then training stops; if
not, then the patterns x1, x2, x3 ... xn are presented again.
Backpropagation: the training algorithm
1. Set all the elements of W1 and W2 to random values.
2. Set a counter j to 1; set Enet to 0.
3. Extract xj from the training set and clamp the activations of the input units of
the perceptron to the values of its elements.
4. Feedforward phase:
1. calculate the net inputs of all the hidden units;
2. calculate the vector xh of activations of all the hidden units by applying Equation 2.10 to
each unit;
3. calculate the net inputs of all the output units;
4. calculate the vector xo of activations of all the output units by applying Equation 2.10 to
each unit.
5. Calculate the network error Ej for this cycle, using either Equation 2.21 or
2.22. Add it to the cumulative network error Enet.
6. Feedback phase:
1. calculate the error signal on each output unit;
2. calculate the error signal on each hidden unit;
3. calculate the weight adjustment vector Dwj for the incoming weight vector of each
unit in the output layer;
4. adjust the weight matrix W2 by adding each vector Dwj to the appropriate row in W2;
5. calculate the weight adjustment vector Dwj for the incoming weight vector of each
unit in the hidden layer;
6. adjust the weight matrix W1 by adding each vector Dwj to the appropriate row in W1.
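The whole algorithm above can be sketched as a self-contained Python function. Everything specific here (layer sizes, the uniform weight initialisation, eta, Emax, the example data) is an illustrative assumption; the sigmoid stands in for Equation 2.10 and the summed squared error for Equation 2.21:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train(patterns, targets, n, m, o, eta=0.5, E_max=0.05, max_cycles=2000):
    """One-hidden-layer backpropagation, following steps 1-6 above."""
    rnd = random.Random(0)
    # Step 1: random initial weights; W1 is m x n, W2 is o x m.
    W1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(m)]
    W2 = [[rnd.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(o)]
    errors = []
    for _ in range(max_cycles):
        E_net = 0.0                                     # step 2: reset cumulative error
        for x, d in zip(patterns, targets):             # step 3: clamp each pattern
            # Step 4: feedforward phase.
            xh = [sigmoid(sum(w * a for w, a in zip(row, x))) for row in W1]
            xo = [sigmoid(sum(w * a for w, a in zip(row, xh))) for row in W2]
            # Step 5: add this pattern's squared error to E_net.
            E_net += 0.5 * sum((di - yi) ** 2 for di, yi in zip(d, xo))
            # Step 6: feedback phase -- error signals first, then weight changes.
            e_o = [y * (1 - y) * (di - y) for y, di in zip(xo, d)]
            e_h = [xh[j] * (1 - xh[j]) * sum(e_o[i] * W2[i][j] for i in range(o))
                   for j in range(m)]
            for i in range(o):
                for j in range(m):
                    W2[i][j] += eta * e_o[i] * xh[j]
            for j in range(m):
                for k in range(n):
                    W1[j][k] += eta * e_h[j] * x[k]
        errors.append(E_net)
        if E_net < E_max:                               # stopping condition
            break
    return W1, W2, errors

# Example: a trivially learnable mapping (output = first input); the
# cumulative error should fall as training proceeds.
W1, W2, errs = train([[0, 0], [0, 1], [1, 0], [1, 1]],
                     [[0], [0], [1], [1]], n=2, m=2, o=1)
```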
Predicting Wine Type

model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=1, verbose=1)
y_pred = model.predict(X_test)
print(y_pred[:5])
print(y_test[:5])
[[9.1432929e-03]
 [9.9844098e-01]
 [1.2862682e-04]
 [1.6250061e-06]
 [2.4732523e-09]]
[0 1 0 0 0]
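The raw sigmoid outputs above are probabilities; thresholding at 0.5 turns them into hard labels comparable with y_test. A sketch using the five predictions shown:

```python
# Threshold the sigmoid probabilities at 0.5 to get hard class labels.
y_pred = [[9.1432929e-03], [9.9844098e-01], [1.2862682e-04],
          [1.6250061e-06], [2.4732523e-09]]   # first five predictions from above
y_test = [0, 1, 0, 0, 0]
labels = [1 if p[0] > 0.5 else 0 for p in y_pred]
accuracy = sum(l == t for l, t in zip(labels, y_test)) / len(y_test)
print(labels, accuracy)  # [0, 1, 0, 0, 0] 1.0 -- all five match
```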
Unsupervised learning in layers and lattices

Compare unsupervised learning with supervised learning:
In supervised learning, there is a set of patterns or exemplars, each one
accompanied by a teaching input that indicates the correct, or desired,
answer.
Learning takes place by adjusting the system according to some measure
of the difference between the expected, correct answer and the actual
answer, for each exemplar.
In unsupervised learning a set of exemplars is also provided, but no
desired response.
At first sight, one might ask how unsupervised learning can possibly work.
If there is no indication of what the right or wrong response to an input
pattern is, how can a system improve its performance by making changes
in the ‘right’ direction?
For example, in a classification task, how can we correctly assign an input
pattern to the right category without teaching feedback?
Nature offers inspiration: creatures do possess some sort of elementary
system for categorizing the world and making sensible responses to it.
One clue comes from considering any large set of data. In normal
circumstances, this will always contain regularities of some sort.
Let’s take a very simple example: the height and weight data for a large
set of people. Set out in a table, this would look something like this:
What general types of regularity might one find inside our height and weight
data set?
Clusters of closely related patterns. For example, men are generally
heavier and taller than women, so one might expect to find a cluster
corresponding to each of these two groups.
Frequency. One might discover that tall people occurred much more
frequently in the set than short people, or vice versa.
Relative variance. There might be a much greater range of weights in the
set than heights.
Ordering. The patterns may be ordered in various ways.
The task of unsupervised learning is to detect these regularities without
any teaching input or prior knowledge of the set.
To illustrate this, let's stick with our set of height and weight data and just
concentrate on the idea of clustering.
We can illustrate the idea graphically by going back to our tried and tested
picture of an input or pattern space.
Our height and weight vectors x1, x2, x3 ... xn will obviously lie dotted
around a two-dimensional space, illustrated in the next figure, and if
clusters exist they should be clearly visible.
Now consider the new data pattern x, illustrated in the figure as a vector.
Which cluster should it be assigned to?
We would naturally want to put x in the cluster it was closest to – or, to
put it another way, it is the shortest distance from.
From the figure, x obviously belongs in Cluster 1: this is absolutely
straightforward, surely.
To us, distance seems an intuitive concept: I'm relatively near to one
thing and far away from another.
But in mathematics there are many different ways of measuring distance,
some of them far from simple. Two common measures are Euclidean distance
and angular distance.
The Euclidean distance d between two vectors x1 and x2 is given by:

d(x1, x2) = sqrt( SUMk (x1k - x2k)^2 )
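A sketch of Euclidean distance and of nearest-cluster assignment; the cluster centres and the query point are hypothetical height/weight pairs, not values from the slides:

```python
import math

def euclidean(a, b):
    # d(a, b) = sqrt(sum_k (a_k - b_k)^2)
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))

# Assign a new pattern x to the cluster centre it is closest to.
centres = {"cluster 1": (1.62, 58.0),   # hypothetical (height m, weight kg) centres
           "cluster 2": (1.80, 82.0)}
x = (1.65, 61.0)
nearest = min(centres, key=lambda c: euclidean(centres[c], x))
print(nearest)  # cluster 1 -- x lies far nearer the first centre
```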