MATLAB Project I
Single Layer Perceptrons
Michael J. Knapp
I. Introduction
The overriding theme of this project is to train and test single layer
perceptrons using Rosenblatt’s training algorithm. Since the project is very
intensive on vector and matrix mathematics, MATLAB was chosen since it was
originally designed as a matrix laboratory. MATLAB also includes powerful
built-in graphing features that make it ideal for visualizing data. This also
removes any cumbersome code from the project and allows the focus to be
placed on the SLP itself.
For the run-time metric, I specifically chose CPU time over system
time. This was done because, in today's multi-tasking computing environments,
there is no guarantee of how much CPU time an application will receive from
the kernel. As such, MATLAB's cputime feature was used to track how much CPU
time the MATLAB process received during an interval, rather than just the
elapsed system time.
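The same distinction can be illustrated outside MATLAB; below is a minimal Python sketch of my own (using the standard time module, not part of the project code) contrasting the two clocks:

```python
import time

# Wall-clock (system) time advances even while the process is suspended
# or other applications hold the CPU; process_time counts only the CPU
# time the kernel actually granted to this process.
wall_start = time.time()
cpu_start = time.process_time()

total = sum(i * i for i in range(100000))   # some CPU-bound work

wall_elapsed = time.time() - wall_start
cpu_elapsed = time.process_time() - cpu_start

# On a loaded machine wall_elapsed can greatly exceed cpu_elapsed, which
# is why the CPU-time figure is the fairer run-time metric.
print(cpu_elapsed <= wall_elapsed + 1e-3)
```

MATLAB's cputime plays the role of process_time here.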
II. Part A.
a) Statement of the Problem
To approach this problem, I decided to break it into two parts.
The first part is the general logical flow of the program, as follows:
a. Translate the uni-polar matrices of ‘A’, ‘B’, ‘C’ and ‘D’ into
vectors by a row-major translation.
b. Set the input to be A, B, C and D and the desired output to 1, 0, 0,
0 respectively
c. Take the current CPU time and start the SLP function
d. Output the time after the function returns
e. Verify the weights classify the data by setting the calculated
output Y equal to one if the dot product of the weight vector and
the input vector is greater than or equal to zero, zero otherwise.
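The steps above can be sketched in outline; the following Python illustration is my own (a toy 3x3 glyph stands in for the project's 18x18 characters, and the names are invented for illustration):

```python
# a. Row-major translation of a uni-polar character matrix into a vector.
glyph_A = [[0, 1, 0],
           [1, 1, 1],
           [1, 0, 1]]
x_A = [bit for row in glyph_A for bit in row]

# e. Verification: Y = 1 if the dot product of the weight vector and the
#    input vector is greater than or equal to zero, zero otherwise.
def classify(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

print(x_A)                       # [0, 1, 0, 1, 1, 1, 1, 0, 1]
print(classify([1.0] * 9, x_A))  # 1
```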
The second part is the generic SLP training algorithm function that was
described in the introduction and covered in detail in Appendix I.
c) Results
For this portion, the important information chosen was just the total
number of epochs the algorithm ran, the CPU time it took to run, and which
characters were successfully classified by the network after training. Since
all of these trials run for so few epochs, and the algorithm's convergence
guarantees a final error of zero, error was not deemed important for this
part. (The program itself prints detailed per-epoch data should that
information be desired.)
For α = 0.85
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 9 | 0.046875 | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 9 | 0.046875 | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 6 | 0.046875 | 0.072991 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 10 | 0.015625 | 0.116489 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 12 | 0.093750 | 0.092258 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 9.2 | 0.0500 | 0.0887 | ‘A’ ‘B’ ‘C’ ‘D’
For α = 0.75
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 6 | 0.031250 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 6 | 0.015625 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 6 | 0.0375 | 0.0568 | ‘A’ ‘B’ ‘C’ ‘D’
For α = 0.5
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 7 | 0.062500 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 8 | 0.046875 | 0.035385 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 7 | 0.031250 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 7 | 0.046875 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 9 | 0.062500 | 0.040342 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 7.6 | 0.0500 | 0.0377 | ‘A’ ‘B’ ‘C’ ‘D’
For α = 0.2
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 8 | 0.046875 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 10 | 0.062500 | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 8 | 0.046875 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 10 | 0.046875 | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 8 | 0.062500 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 8.8 | 0.0531 | 0.0082 | ‘A’ ‘B’ ‘C’ ‘D’
For α = 0.1
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 14 | 0.046875 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 12 | 0.062500 | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 14 | 0.093750 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 14 | 0.078125 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 12 | 0.062500 | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 13.2 | 0.0688 | 0.0027 | ‘A’ ‘B’ ‘C’ ‘D’
d) Conclusion
The first point to note is how short all of the trials lasted. In the best
case, where α = 0.75, only 1.5 passes through the input data were required,
and in the worst case, where α = 0.1, barely over 3 passes were needed. This
corresponds to roughly a factor-of-two increase in epochs run. As one would
expect, the elapsed time increased by a factor of just over 1.8 as well,
which is consistent with epochs and elapsed time being linearly related.
One thing that is easily derived from the algorithm, but worth mentioning,
is that the mean squared error of ΔW is always zero for a training point the
last time the algorithm passes through it. This is intuitive, since the
algorithm has converged once a full pass can be made through the training
points with no change in weights and hence no error. Here this meant that
upwards of half the epochs in this part had no error, since the trials were
so short in duration.
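This observation falls directly out of Rosenblatt's update rule: a correctly classified point produces ΔW = 0 and hence zero error. A small Python check of my own (illustrative weights and inputs, not project data):

```python
def update(w, x, d, alpha):
    # Rosenblatt update: dW = alpha * (d - y) * x, with y the step output.
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
    return [alpha * (d - y) * xi for xi in x]

# These weights already classify the point correctly (w . x = 0.25 >= 0,
# so y = 1 = d), hence the update, and its squared error, are zero.
dw = update([0.5, -0.25], [1, 1], 1, 0.85)
print(dw)   # [0.0, 0.0]
```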
Finally, note that many of the reported error values appear identical; the
limited display precision forces distinct errors into the same observed
range, and if a higher precision were used, the differences would be seen.
III. Part B.
a) Statement of the Problem
This part asks us to generate a relatively large set of points that are
separated by, and thus not falling on, a line. Since a line is linear, the
two classes are linearly separable. Hence, we are designing a three-input,
one-output single layer perceptron, where one of the inputs is, as usual,
x0 = 1. Once the network is trained, we generate a series of test points
and make sure they are appropriately classified. If not, we add them to our
training set and repeat.
To display the results, I employed MATLAB's plot feature. I first plot all
of the original training points. I then plot all of the test points
separately, regardless of whether they failed classification and became
training points. I then plot the true separating line and finally plot the
learned separating line according to the definition w0 + w1*x + w2*y = 0.
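Converting the learned weights into the plotted line follows directly from w0 + w1*x + w2*y = 0; here is a hypothetical Python helper (the function name and example weights are my own) solving for slope and intercept:

```python
def weights_to_line(w0, w1, w2):
    # w0 + w1*x + w2*y = 0  =>  y = -(w1/w2)*x - (w0/w2), assuming w2 != 0
    return -w1 / w2, -w0 / w2

# Example: weights that encode the boundary y = 0.5*x + 2.
slope, intercept = weights_to_line(2.0, 0.5, -1.0)
print(slope, intercept)   # 0.5 2.0
```

The "Equation Learned" column in the tables below reports the line in exactly this slope-intercept form.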
For this run, I felt that the mean squared error was not as important an
error measure as the number of training points. Since 1000 points are used
per the project specification, any count over 1000 indicates a test point
that was misclassified and forced the network to be retrained with the new
set of points, which gives a more tangible feel for the error than just
listing a numerical mean squared error.
c) Results
For α = 0.50
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 | 1445560 | 55.86 | 1000 | y = 0.508015*x + 1.517851
2 |16056640 | 565.33 | 1000 | y = 0.500945*x + 1.952659
3 | 617589 | 26.58 | 1000 | y = 0.500915*x + 1.928097
---------+---------+------------------+-----------+--------------------------
Average | 6039929 | 224.78 | 1000.00 | y = 0.503292*x + 1.799536
For α = 0.30
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |29803146 | 1037.84 | 1000 | y = 0.500858*x + 1.958729
2 | 2087857 | 78.02 | 1000 | y = 0.500538*x + 1.980784
3 |10634854 | 373.00 | 1000 | y = 0.501288*x + 1.956244
---------+---------+------------------+-----------+--------------------------
Average |14175286 | 496.29 | 1000.00 | y = 0.500895*x + 1.965252
For α = 0.20
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |18960794 | 678.81 | 1005 | y = 0.501190*x + 1.887784
2 |13613850 | 485.11 | 1000 | y = 0.501521*x + 1.893265
3 | 8795781 | 314.70 | 1000 | y = 0.500414*x + 1.986367
---------+---------+------------------+-----------+--------------------------
Average |13790142 | 492.87 | 1001.67 | y = 0.501042*x + 1.922472
For α = 0.10
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |12409023 | 440.02 | 1000 | y = 0.502540*x + 1.873927
2 |10242057 | 365.23 | 1000 | y = 0.502869*x + 1.826852
3 | 2954272 | 109.16 | 1000 | y = 0.505318*x + 1.754559
---------+---------+------------------+-----------+--------------------------
Average | 8535117 | 304.80 | 1000.00 | y = 0.503576*x + 1.818446
For α = 0.05
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 | 6086848 | 218.53 | 1000 | y = 0.502416*x + 1.839083
2 | 4968506 | 177.94 | 1000 | y = 0.502180*x + 1.847443
3 | 1005184 | 40.60 | 1000 | y = 0.507828*x + 1.509208
---------+---------+------------------+-----------+--------------------------
Average | 4020179 | 145.69 | 1000.00 | y = 0.504141*x + 1.731911
d) Conclusion
Regarding the error, the most surprising thing was how few of the networks
had to be retrained due to failing to classify the test points. At first I
thought this was an error in my code, so I tried starting with only 100
points and found that my code was indeed correct. Given this, I feel that
for a problem of this nature, where there are only two linearly separable
classes, 1000 training points are far more than needed. Also, although it
is obvious, having to retrain a network is undesirable, as it approximately
doubles the training time since the original run was fruitless.
IV. Part C.
a) Statement of the Problem
For testing the data, I felt it would better show the weaknesses of an
artificial neural network (ANN) if, instead of stopping the noise-tolerance
tests when one class failed, I stopped when all classes failed. I felt this
would show whether the ANN had a particular weakness, and might bring to
light class encodings that circumvent such weaknesses.
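A noise-tolerance test of this form could be sketched as follows (pure Python; the glyph size, flip count, and function names are placeholders of my own, not the project's actual code):

```python
import random

def add_noise(x, n_flips, rng):
    # Flip n_flips distinct, randomly chosen uni-polar pixels.
    noisy = list(x)
    for i in rng.sample(range(len(noisy)), n_flips):
        noisy[i] = 1 - noisy[i]
    return noisy

def classify(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

# Increase the noise until every class fails rather than stopping at the
# first failure, to expose any class-specific weakness.
rng = random.Random(0)
x = [1, 0, 1, 1, 0, 0, 1, 0, 1]
noisy = add_noise(x, 2, rng)
print(sum(a != b for a, b in zip(x, noisy)))   # 2
```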
c) Results
d) Conclusion
The most interesting point I found was how the last element in the
training set was the most susceptible to noise; that is, it was usually the
first element to fail classification with increasing levels of noise. I
surmise this is due to it being the last element to be accounted for on the
first pass, and that fact propagates through and manifests itself as stated.
Also of note is that the second element is consistently the least
susceptible to noise, which does not follow from the converse of the logic
used for the last element's susceptibility.
One very interesting thing I found was that sometimes a character would
fail to classify at one noise level and then later classify at a higher
noise level. I suspect this is due to where the noise is located. Looking at
the matrices for the four characters, we see that many of the points stay
the same between some classes. This intuitively means that the neural
network has to classify on fewer than the full 18 x 18 points. So,
classification is far more likely to fail if the noise is clustered in areas
that are distinct between classes, as opposed to areas that are the same
between classes.
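The overlap argument can be made concrete by counting the pixels two glyphs share; a toy Python example of my own (3x3 stand-ins for the real 18x18 characters):

```python
def shared_pixels(a, b):
    # Positions where two uni-polar glyph vectors agree carry no
    # discriminating information; only the remaining pixels do.
    return sum(ai == bi for ai, bi in zip(a, b))

glyph_B = [1, 1, 0, 1, 1, 0, 1, 1, 1]
glyph_C = [1, 1, 1, 1, 0, 0, 1, 1, 1]
same = shared_pixels(glyph_B, glyph_C)
distinct = len(glyph_B) - same
print(same, distinct)   # 7 2
```

With only 2 of 9 pixels distinct here, noise landing on those 2 pixels matters far more than noise landing on the 7 shared ones.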
I also noticed that these trials, even more so than in Part A, had a
tendency to take the same number of epochs for a given α; however, as α
decreases, the variance in epochs between trials increases.
I also noticed that trials with more epochs tended to leave the classes
less susceptible to noise. This follows intuitively, since each epoch
brings us closer to the solution; if more epochs bring us closer to the
solution, then we have more room for noise. Specifically, looking at
α = 0.40, trials 1 and 2 have low noise tolerance, but trial 3 has a much
higher noise tolerance.
Once again, run-time and epochs seem linearly related, as one would
expect. Also, as α decreases the average mean squared error decreases, and
not in correlation with the number of epochs. This seems counter-intuitive
and will need further investigation.
V. Conclusion
VI. References
VII. Appendix I
The core SLP function’s MATLAB source code is provided here as a reference.
Since this is the core functionality, it should be trivial to design the
calling programs to use this function, which implements Rosenblatt’s training
algorithm.
function W = slp(X, D, alpha)
% SLP  Train a single layer perceptron using Rosenblatt's algorithm.
%   X     - n-by-k matrix of training inputs, one column per point
%   D     - 1-by-K vector of desired (uni-polar) outputs
%   alpha - learning rate
% 1. Check the inputs and initialize the weights and counters;
[n, k] = size(X);
K = length(D);
if (K ~= k)
    disp('Error: X and D do not match.');
    return;
end
W = zeros(1, n);     % initial weight vector
t = 0;               % epoch counter
i = 0;               % index of the current training point
not_done = k;        % points remaining without a weight update
% 2. For each pair (xk,dk) from the training set, do Steps 3-5;
time = cputime;      % Get the current CPU time
totalmse = 0;        % total the M.S.E.
maxrun = 0;          % longest run without updating W
while (not_done > 0)
    i = mod(i, k) + 1;
    x = X(:, i);
    % 3. Compute the output and the weight update;
    y = ((W * x) >= 0);
    dW = alpha * (D(i) - y) * x';
    W = W + dW;
    % 4. Increment t=t+1;
    t = t + 1;
    totalmse = totalmse + mse(dW);
    % 5. Converged once a full pass is made with no change in W;
    if (nnz(dW) == 0)
        not_done = not_done - 1;
    else
        not_done = k;
    end
    showstats = false;
    if (t <= 25)
        showstats = true;
    end
    if (maxrun < (k - not_done))
        maxrun = (k - not_done);
        showstats = true;
    end
    if (showstats == true)
        disp(sprintf('%d\t\t%f\t\t%f\t\t%d', ...
            t, mse(dW), (cputime - time), (k - not_done)));
    end
end
VIII. Appendix II
- Processor(s)
Number of processors 1
Number of cores 2 per processor
Number of threads 2 (max 2) per processor
Name Intel Core 2 Duo E6600
Code Name Conroe
Specification Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Package Socket 775 LGA
Family/Model/Stepping 6.F.6
Extended Family/Model 6.F
Core Stepping B2
Technology 65 nm
Core Speed 2397.6 MHz
Multiplier x Bus speed 9.0 x 266.4 MHz
Rated Bus speed 1065.6 MHz
Stock frequency 2400 MHz
Instruction sets MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 4096 KBytes, 16-way set associative, 64-byte line size
- System
Mainboard Vendor Intel Corporation
Mainboard Model DG965SS
BIOS Vendor Intel Corp.
BIOS Version MQ96510J.86A.1176.2006.0906.1633
BIOS Date 09/06/2006
- Memory SPD
Module 1 DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin
Module 2 DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin
- Software
Windows Version Microsoft Windows XP Professional Service Pack 2 (Build 2600)
DirectX Version 9.0c
MATLAB Version 7.2.0.232 (R2006a)