

MATLAB Project I
Single Layer Perceptrons

Michael J. Knapp

CAP6615, Neural Networks for Computing


Department of Computer and Information Science and
Engineering
University of Florida

Instructor: Dr. G. Ritter


Department of Computer and Information Science and
Engineering
University of Florida, Gainesville, FL 32611

Date: 4 October 2006



I. Introduction

The overriding theme of this project is to train and test single layer
perceptrons (SLPs) using Rosenblatt’s training algorithm. Since the project
relies heavily on vector and matrix mathematics, MATLAB was chosen, as it was
originally designed as a matrix laboratory. MATLAB also includes powerful
built-in graphing features that make it ideal for visualizing data. This
removes cumbersome plotting and bookkeeping code from the project and keeps
the focus on the SLP itself.

Since Rosenblatt’s algorithm can be generalized from a single-output,
single layer perceptron to a multi-output, single layer perceptron, I wanted
to write a core SLP function that accepts the input data X and the desired
output D and produces the matrix of weights based on the dimensions of the
parameters. Since each part requires its own program, using this centralized
function lets each of the three parts of the project focus on the process it
asks for, rather than maintaining three disparate incarnations of the SLP
algorithm.

I also wanted to explore how varying α affects the number of epochs
required by the algorithm. Therefore, before collecting data for the results
sections, I stepped α from 1.0 down to 0.01 to gauge where the run time
decreased and where it began increasing again, and then chose α values that
illustrate this behavior.

For the run-time metric, I specifically chose CPU time over elapsed
system time. In today’s multi-tasking computing environments there is no
guarantee of how much CPU time an application will receive from the kernel,
so elapsed time alone is unreliable. MATLAB’s cputime function was therefore
used to track how much CPU time the MATLAB process received during an
interval, rather than just the elapsed system time.
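
A minimal, self-contained illustration of this bracketing pattern (the matrix
multiply below is just a made-up stand-in workload):

t0 = cputime;                        % CPU seconds consumed by MATLAB so far
A = rand(1000) * rand(1000);         % stand-in workload to be timed
fprintf('CPU time used: %f s\n', cputime - t0);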

For the error during a training session, I looked at two interesting
measures. The first is the mean squared error (M.S.E.) of ΔW, which indicates
how much, on average (squared), the weights were adjusted during that epoch.
This is a good measure of how far we are from convergence since it reflects
the level of activity in that epoch. The second is the average of the M.S.E.
over all epochs, which is a strong indicator of how fast we converged: if we
spend a long time with a small M.S.E., the average M.S.E. becomes smaller.
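
As a concrete illustration with made-up ΔW values, the per-epoch measure is
simply the mean of the squared entries of ΔW (which is what the mse call in
Appendix I computes), and the second measure is its running average:

dWs = {[0.2 -0.1 0.3], [0.1 0.0 -0.1], [0 0 0]};   % made-up updates for three epochs
totalmse = 0;
for t = 1:numel(dWs)
    thismse  = mean(dWs{t}(:).^2);                 % per-epoch M.S.E. of dW
    totalmse = totalmse + thismse;
    fprintf('epoch %d: M.S.E. = %f, avg. M.S.E. = %f\n', t, thismse, totalmse / t);
end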

Depending on the behavior of each trial, different information was deemed
important, and only that information is presented in this report. However,
detailed information for each run of each trial at each epoch is available,
and a soft copy can be provided to the reader on request. The complete code
listing of the program for each part can be provided as well.

II. Part A.
a) Statement of the Problem

This part is to design a single layer, single output perceptron and to
train it with Rosenblatt’s algorithm so that it classifies the binary image
representation of the character ‘A’ into one class and the characters ‘B’,
‘C’ and ‘D’ into the class not ‘A’. After training is completed, we must
verify that the learned weights classify the data correctly by checking that
each of the characters is classified correctly.

b) Approach & Algorithm

To approach this problem, I decided to break it into two parts. The first
part is the general logical flow of the program, as follows:

a. Translate the uni-polar matrices of ‘A’, ‘B’, ‘C’ and ‘D’ into
vectors by a row-major translation.
b. Set the input to be A, B, C and D and the desired output to 1, 0, 0,
0 respectively
c. Take the current CPU time and start the SLP function
d. Output the time after the function returns
e. Verify the weights classify the data by setting the calculated
output Y equal to one if the dot product of the weight vector and
the input vector is greater than or equal to zero, zero otherwise.

The second part is the generic SLP training algorithm function that was
described in the introduction and covered in detail in Appendix I.
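
A rough sketch of this flow is shown below. The 3 x 3 character patterns are
made up purely for illustration (the real project uses the full binary
character images), and slp is the function listed in Appendix I:

% Hypothetical 3x3 uni-polar character patterns standing in for the real images.
charA = [0 1 0; 1 1 1; 1 0 1];
charB = [1 1 0; 1 1 1; 1 1 0];
charC = [1 1 1; 1 0 0; 1 1 1];
charD = [1 1 0; 1 0 1; 1 1 0];

% a. Row-major translation of each matrix into a column vector.
X = [reshape(charA', [], 1), reshape(charB', [], 1), ...
     reshape(charC', [], 1), reshape(charD', [], 1)];

% b. Desired output: 1 for 'A', 0 for 'B', 'C' and 'D'.
D = [1 0 0 0];

% c./d. Time the training run using CPU time.
t0 = cputime;
W = slp(X, D, 0.75);                 % slp as listed in Appendix I
fprintf('CPU time: %f s\n', cputime - t0);

% e. Verify: y = 1 iff w . x >= 0, with x augmented by x0 = 1.
Xaug = [ones(1, size(X, 2)); X];
Y = (W' * Xaug >= 0)                 % should equal D if training converged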

c) Results

For this portion, the important information chosen was just the total
number of epochs the algorithm ran, the CPU time it took to run, and which
characters were successfully classified by the network after training. Since
all of these trials ran for so few epochs, and convergence guarantees a final
error of zero, error was not deemed important for this part. Detailed
information about the data from each epoch is available on request.

For α = 0.85
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 9 | 0.046875 | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 9 | 0.046875 | 0.080772 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 6 | 0.046875 | 0.072991 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 10 | 0.015625 | 0.116489 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 12 | 0.093750 | 0.092258 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 9.2 | 0.0500 | 0.0887 | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.75
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 6 | 0.031250 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 6 | 0.015625 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 6 | 0.046875 | 0.056827 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 6 | 0.0375 | 0.0568 | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.5
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 7 | 0.062500 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 8 | 0.046875 | 0.035385 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 7 | 0.031250 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 7 | 0.046875 | 0.037582 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 9 | 0.062500 | 0.040342 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 7.6 | 0.0500 | 0.0377 | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.2
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 8 | 0.046875 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 10 | 0.062500 | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 8 | 0.046875 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 10 | 0.046875 | 0.008738 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 8 | 0.062500 | 0.007892 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 8.8 | 0.0531 | 0.0082 | ‘A’ ‘B’ ‘C’ ‘D’

For α = 0.1
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | Characters Classified
---------+--------+------------------+----------+-----------------------
1 | 14 | 0.046875 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
2 | 12 | 0.062500 | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
3 | 14 | 0.093750 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
4 | 14 | 0.078125 | 0.002688 | ‘A’ ‘B’ ‘C’ ‘D’
5 | 12 | 0.062500 | 0.002631 | ‘A’ ‘B’ ‘C’ ‘D’
---------+--------+------------------+----------+-----------------------
Average | 13.2 | 0.0688 | 0.0027 | ‘A’ ‘B’ ‘C’ ‘D’

d) Conclusion

The first point to note is how short all of the trials were. In the best
case, where α = 0.75, only 1.5 passes through the input data were required,
and in the worst case, where α = 0.1, barely over 3 passes were needed. This
corresponds to roughly a factor of two increase in epochs run. As one would
expect, the elapsed time increased by a factor of just over 1.8 as well,
which is in line with epochs and elapsed time being linearly related.

One thing that is easily derived from the algorithm, but worth mentioning,
is that the mean squared error of ΔW is always zero for a training point the
last time the algorithm passes through it. This is intuitive since the
algorithm has converged once a full pass can be made through the training
points with no change in weights and hence no error. Here this meant that
upwards of half the epochs in this part had no error, since the trials were
so short in duration.

Here, the error proved to be of particular interest. For each α, in every
case where the number of epochs to converge was the same, the average mean
squared error was also the same. This is counter-intuitive, since the random
initialization of the weights is supposed to add non-determinism to the
algorithm, yet this appears to be a sign of some determinism. I hypothesize
that, because the total number of epochs is so few, initial weights falling
within a certain range produce both the observed epoch counts and the
observed error values; if higher precision were used, the differences would
likely become visible.

Finally, the idea introduced in class that choosing α is somewhat of an
art is reinforced here. The optimal α was found to be approximately 0.75,
with a sharper increase in epochs when varying α larger and a more gradual
increase when varying it smaller.

III. Part B.
a) Statement of the Problem

This part asks us to generate a relatively large set of points that are
separated by, and thus do not fall on, a line. Since a line is linear, the
two classes are linearly separable. Hence, we are designing a three-input,
one-output single layer perceptron, where one of the inputs is, as usual,
x0 = 1. Once the network is trained, we are to generate a series of test
points and make sure they are appropriately classified. If not, we add them
to our training set and repeat.

b) Approach & Algorithm

My approach was to first generate the 1000 points by generating the x
values ~uniform [0,1] and then mapping them to [b1,b2). Once the x values
were generated, I generated the y values uniformly about the separating
plane. This yields a healthy random population enclosed in a rectangle about
the separating plane. I then test each pair against the equation of the
separating plane, classifying it as one if it lies above the plane and zero
if it lies below, and finally train the network.
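
A sketch of this generation step follows. The separating line y = 0.5x + 2,
the bounds b1 and b2, and the one-unit rectangle margin are illustrative
assumptions (the learned equations in the tables below suggest the true line
was close to this):

b1 = 0; b2 = 10; npts = 1000;                 % made-up bounds and point count
sep = @(x) 0.5*x + 2;                          % assumed separating plane (a line in 2D)

x = b1 + (b2 - b1) * rand(1, npts);            % x ~ uniform [b1, b2)
ylo = min(sep(b1), sep(b2)) - 1;               % rectangle enclosing the line...
yhi = max(sep(b1), sep(b2)) + 1;               % ...with an arbitrary one-unit margin
y = ylo + (yhi - ylo) * rand(1, npts);         % y ~ uniform over that rectangle
d = (y > sep(x));                              % class 1 above the plane, 0 below

X = [x; y];                                    % two inputs per point (x0 added in slp)
W = slp(X, d, 0.20);                           % train with the Appendix I function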

Once the network is trained, I generate a set of test points with x
~uniform [b1,b2) and corresponding y values ~uniform [separating plane - 0.1,
separating plane + 0.1], since the most difficult points to classify are near
the plane. If at least one point fails classification, I add the entire test
set to the training data and repeat the entire training process; if all
classify correctly, I display the results.
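
A sketch of this test-and-retrain loop, continuing the assumptions of the
previous sketch (the test-batch size of 5 is a guess suggested by the 1005
training points appearing in one of the tables below):

ntest = 5;                                      % test-batch size (a guess)
while true
    xt = b1 + (b2 - b1) * rand(1, ntest);       % test x ~ uniform [b1, b2)
    yt = sep(xt) + 0.2 * rand(1, ntest) - 0.1;  % y ~ uniform [sep - 0.1, sep + 0.1]
    dt = (yt > sep(xt));                        % true class of each test point
    Yt = (W' * [ones(1, ntest); xt; yt] >= 0);  % network's classification
    if all(Yt == dt)
        break;                                  % every test point classified: done
    end
    X = [X, [xt; yt]];  d = [d, dt];            % add the whole test batch
    W = slp(X, d, 0.20);                        % and retrain
end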

To display the results, I employed MATLAB’s plot feature. I first plot all
of the original training points. I then plot all of the test points
separately, regardless of whether they failed classification and became
training points. I then plot the separating plane and finally the learned
separating plane according to the definition w0 + w1*x + w2*y = 0.
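
A plotting sketch along these lines, again continuing the assumed variables
from the sketches above; the learned plane is drawn by solving
w0 + w1*x + w2*y = 0 for y:

plot(X(1, d==1), X(2, d==1), 'b.'); hold on;   % training points above the plane
plot(X(1, d==0), X(2, d==0), 'r.');            % training points below the plane
plot(xt, yt, 'g+');                            % the last test batch, kept distinct
xs = [b1 b2];
plot(xs, sep(xs), 'k-');                       % true separating plane
plot(xs, -(W(1) + W(2)*xs) / W(3), 'm--');     % learned plane: y = -(w0 + w1*x)/w2
legend('class 1', 'class 0', 'test points', 'true plane', 'learned plane');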

For this part, I felt that the mean squared error was not as important an
error measure as the number of training points. Since 1000 points are used
per the project specification, any point beyond 1000 is a test point that was
misclassified and forced the network to be retrained with the new set of
points, which gives a better feel for the error than simply listing a
numerical mean squared error.

c) Results

For reference, I have included an example output from trial 1, where α =
0.20. This is a good example because it not only shows how the data was
represented visually, but also shows how, when more than 5 test points are
needed, they are kept distinct from the original training data in the
display.

For α = 0.50
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 | 1445560 | 55.86 | 1000 | y = 0.508015*x + 1.517851
2 |16056640 | 565.33 | 1000 | y = 0.500945*x + 1.952659
3 | 617589 | 26.58 | 1000 | y = 0.500915*x + 1.928097
---------+---------+------------------+---------- +--------------------------
Average | 6039929 | 224.78 | 1000.00 | y = 0.503292*x + 1.799536

For α = 0.30
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |29803146 | 1037.84 | 1000 | y = 0.500858*x + 1.958729
2 | 2087857 | 78.02 | 1000 | y = 0.500538*x + 1.980784
3 |10634854 | 373.00 | 1000 | y = 0.501288*x + 1.956244
---------+---------+------------------+---------- +--------------------------
Average |14175286 | 496.29 | 1000.00 | y = 0.500895*x + 1.965252

For α = 0.20
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |18960794 | 678.81 | 1005 | y = 0.501190*x + 1.887784
2 |13613850 | 485.11 | 1000 | y = 0.501521*x + 1.893265
3 | 8795781 | 314.70 | 1000 | y = 0.500414*x + 1.986367
---------+---------+------------------+---------- +--------------------------
Average |13790142 | 492.87 | 1001.67 | y = 0.501042*x + 1.922472

For α = 0.10
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 |12409023 | 440.02 | 1000 | y = 0.502540*x + 1.873927
2 |10242057 | 365.23 | 1000 | y = 0.502869*x + 1.826852
3 | 2954272 | 109.16 | 1000 | y = 0.505318*x + 1.754559
---------+---------+------------------+---------- +--------------------------
Average | 8535117 | 304.80 | 1000.00 | y = 0.503576*x + 1.818446

For α = 0.05
Trial # | Epochs | Elapsed Time (s) | Trn. Pts. | Equation Learned
---------+---------+------------------+-----------+--------------------------
1 | 6086848 | 218.53 | 1000 | y = 0.502416*x + 1.839083
2 | 4968506 | 177.94 | 1000 | y = 0.502180*x + 1.847443
3 | 1005184 | 40.60 | 1000 | y = 0.507828*x + 1.509208
---------+---------+------------------+---------- +--------------------------
Average | 4020179 | 145.69 | 1000.00 | y = 0.504141*x + 1.731911

d) Conclusion

This part really illuminated the effect of α on the network. Depending on
the α chosen, the network converged in longer or shorter times on average.
Once again, we see what appears to be a non-linear effect of α on the run
time. Also, based on the varying times for each α, it appears that the
initial values of W do affect the convergence as well; this was something not
seen previously. The relation between α and the learned y-intercept seems to
be parabolic, with a maximum at α = 0.30. The converse appears true for the
slope of the line: while it still appears parabolic, the minimum is at
α = 0.30.

Regarding the error, the most surprising thing was how few of the networks
had to be retrained because they failed to classify the test points. At first
I thought this was an error in my code, so I tried starting with only 100
points and found that my code was indeed correct. Given this, I feel that for
a problem of this nature, where there are only two linearly separable
classes, 1000 training points are far more than needed. Also, although it is
obvious, having to retrain a network is undesirable, as it approximately
doubles the training time since the original run was fruitless. Possibly a
genetic algorithm could be used in these cases to attempt to use the
resultant weights as the input to the next training attempt.

Finally, this is an excellent exercise for seeing visually, in two
dimensions, how SLPs classify data. The neural network initially starts with
random weights, having no idea where the separating plane lies, and after
some finite number of epochs it is able to choose a plane that linearly
separates the data. While simply finding a line may not seem impressive, I
find it quite remarkable that, knowing nothing other than some data points,
nearly the exact function used to generate the data can be found.

IV. Part C.
a) Statement of the Problem

The problem here is to design a single layer, multi-output perceptron that
classifies the characters ‘A’, ‘B’, ‘C’ and ‘D’ into four disparate classes
using the generalized Rosenblatt algorithm. Once the network converged for
the four classes, we were to add noise by flipping an increasing percentage
of bits until the algorithm no longer classified the data.

b) Approach & Algorithm

For my approach, I studied the generalized Rosenblatt algorithm and read
it as training one set of inputs fed to m single layer, single output
perceptrons, where m is the number of output nodes. For this problem I chose
to keep the same representation of the data outlined in Part A. I also chose
to take the problem at face value and have one output node per class rather
than employing an encoding to define the classes, which left the network with
four output nodes.

For testing the data, I felt it would better show the weaknesses of an
artificial neural network (ANN) if, instead of stopping the noise-tolerance
tests when one class failed, I stopped when all classes failed. I felt this
would show whether the ANN had a particular weakness and might suggest class
encodings to circumvent such weaknesses.

The overall flow of the program was as follows (a MATLAB sketch appears after
the list):

a. Create row-major vectors of uni-polar character matrices and create
class vectors for desired output D.
b. Obtain the weight vector by running the SLP function on the
aforementioned input, output pairs.
c. Initialize noise level to 0.00%
d. Create a random mask, ~uniform [0,1], of the input vector’s size; any
value below the noise level sets the mask bit to one, zero otherwise.
e. Exclusive-or (XOR) the input vector with the mask to toggle the selected
bits, i.e. simulate noise at the specified level.
f. Take the dot product of the weights with the noisy input and apply the
hard-limiter to arrive at the output class values.
g. Verify pattern classification for each character and display which
characters classified correctly. If no characters classified correctly,
then exit; otherwise increase the noise level by 5.0% and go to d. Note
the noise mask is kept constant within each noise level.
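
The sketch below illustrates steps c through g under some assumptions: X
holds the four row-major character vectors as columns, D is the 4 x 4 matrix
of desired class outputs (one column per character, e.g. the identity), W is
the weight matrix returned by slp(X, D, alpha), and the mask comparison
follows the direction given in step d:

[n, k] = size(X);
noise = 0.0;                                    % c. start with 0% noise
while true
    mask = (rand(n, k) < noise);                % d. flip each bit with probability 'noise'
    Xn = xor(X, mask);                          % e. toggle the selected bits
    Y = (W' * [ones(1, k); double(Xn)] >= 0);   % f. hard-limited multi-output SLP
    ok = all(Y == D, 1);                        % g. per-character: all outputs correct?
    fprintf('noise %.2f: %d of %d characters classified\n', noise, sum(ok), k);
    if ~any(ok), break; end                     % stop once every character fails
    noise = noise + 0.05;                       % otherwise raise the noise level by 5%
end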

c) Results

For α = 0.80 (the four character columns give the maximum noise tolerance)
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’ | ‘B’ | ‘C’ | ‘D’
---------+--------+------------------+----------+------+------+------+------
1 | 23 | 0.140625 | 0.083542 | 0.30 | 0.45 | 0.35 | 0.20
2 | 23 | 0.171875 | 0.083542 | 0.45 | 0.45 | 0.40 | 0.35
3 | 19 | 0.109375 | 0.092943 | 0.15 | 0.45 | 0.35 | 0.00
4 | 19 | 0.140625 | 0.092943 | 0.50 | 0.40 | 0.35 | 0.35
5 | 23 | 0.140625 | 0.083542 | 0.40 | 0.45 | 0.40 | 0.05
---------+--------+------------------+----------+------+------+------+------
Average | 21.4 | 0.140625 | 0.087302 | 0.36 | 0.44 | 0.37 | 0.19

For α = 0.65 (the four character columns give the maximum noise tolerance)
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’ | ‘B’ | ‘C’ | ‘D’
---------+--------+------------------+----------+------+------+------+------
1 | 19 | 0.140625 | 0.060912 | 0.30 | 0.25 | 0.25 | 0.15
2 | 19 | 0.093750 | 0.060912 | 0.35 | 0.35 | 0.15 | 0.15
3 | 19 | 0.140625 | 0.060912 | 0.40 | 0.30 | 0.15 | 0.15
4 | 19 | 0.109375 | 0.060912 | 0.20 | 0.30 | 0.20 | 0.15
5 | 19 | 0.125000 | 0.060912 | 0.30 | 0.30 | 0.20 | 0.15
---------+--------+------------------+----------+------+------+------+------
Average | 19.0 | 0.121875 | 0.060912 | 0.31 | 0.30 | 0.19 | 0.15

For α = 0.50 (the four character columns give the maximum noise tolerance)
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’ | ‘B’ | ‘C’ | ‘D’
---------+--------+------------------+----------+------+------+------+------
1 | 23 | 0.109375 | 0.031421 | 0.10 | 0.40 | 0.25 | 0.10
2 | 27 | 0.125000 | 0.029017 | 0.10 | 0.40 | 0.30 | 0.10
3 | 23 | 0.140625 | 0.031421 | 0.05 | 0.30 | 0.30 | 0.05
4 | 23 | 0.187500 | 0.028344 | 0.00 | 0.10 | 0.15 | 0.05
5 | 27 | 0.156250 | 0.023960 | 0.25 | 0.40 | 0.25 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average | 24.6 | 0.143750 | 0.028833 | 0.10 | 0.32 | 0.25 | 0.06

For α = 0.40 (the four character columns give the maximum noise tolerance)
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’ | ‘B’ | ‘C’ | ‘D’
---------+--------+------------------+----------+------+------+------+------
1 | 17 | 0.093750 | 0.017759 | 0.25 | 0.20 | 0.00 | 0.00
2 | 13 | 0.046875 | 0.025723 | 0.15 | 0.05 | 0.00 | 0.15
3 | 27 | 0.156250 | 0.017605 | 0.30 | 0.30 | 0.30 | 0.20
4 | 17 | 0.109375 | 0.020047 | 0.20 | 0.15 | 0.00 | 0.20
5 | 26 | 0.140625 | 0.021889 | 0.10 | 0.25 | 0.20 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average | 20.0 | 0.109375 | 0.020605 | 0.20 | 0.19 | 0.10 | 0.11

For α = 0.30 (the four character columns give the maximum noise tolerance)
Trial # | Epochs | Elapsed Time (s) | Avg. MSE | ‘A’ | ‘B’ | ‘C’ | ‘D’
---------+--------+------------------+----------+------+------+------+------
1 | 23 | 0.093750 | 0.015026 | 0.05 | 0.10 | 0.45 | 0.00
2 | 27 | 0.140625 | 0.014992 | 0.05 | 0.25 | 0.05 | 0.00
3 | 23 | 0.171875 | 0.015417 | 0.35 | 0.40 | 0.35 | 0.35
4 | 23 | 0.140625 | 0.015541 | 0.00 | 0.05 | 0.05 | 0.00
5 | 19 | 0.140625 | 0.017038 | 0.00 | 0.00 | 0.00 | 0.00
---------+--------+------------------+----------+------+------+------+------
Average | 23.0 | 0.1375 | 0.015603 | 0.09 | 0.16 | 0.18 | 0.07

d) Conclusion

The most interesting point I found was that the last element in the
training set was the most susceptible to noise; it was usually the first
element to fail classification as the noise level increased. I surmise this
is because it is the last element to be accounted for on the first pass, and
that fact propagates through training and manifests itself as stated. Also of
note is that the second element is consistently the least susceptible to
noise, which does not follow from the converse of the logic used for the last
element’s susceptibility.

One very interesting thing I found was that sometimes a character would
not classify at one noise level and then later classify at a higher noise
level. I suspect this is due to where the noise is located. Looking at the
matrices for the four characters, we see that many of the points stay the
same between some classes. Intuitively, this means the neural network has to
classify on less than the full 18 x 18 points. So, if the noise is clustered
in areas that are distinct between classes, as opposed to areas that are the
same between classes, classification is more likely to fail.

I also noticed that these trials, even more so than in Part A, had a
tendency to take the same number of epochs for a given α; however, as α
decreases, the variance in epochs between trials increases.

I also noticed that trials with more epochs tended to be less susceptible
to noise. This follows intuitively, since each epoch brings us closer to the
solution, and being closer to the solution leaves more room for noise.
Looking specifically at α = 0.40, trials 1 and 2 have low noise tolerance,
but trial 3 has a much higher noise tolerance.

Once again, run time and epochs appear linearly related, as one would
expect. Also, as α decreases the average mean squared error decreases, and
not in correlation with the number of epochs. This seems counter-intuitive
and will need further investigation.

V. Conclusion

Neural networks bring something completely new to the table:
non-determinism as well as non-linearity. They are able to start with no
knowledge of the function used to generate the training points and train to
behave as a linear approximation to that function. Also, once the network is
trained, only a vector dot product followed by a Boolean condition is
required to generate the output from the input. This is extremely powerful,
as a multiply-accumulate (MAC) unit can be driven at extreme speed, while
implementing the actual function with more traditional methods cannot. This
allows data to flow forward in real time.

VI. References

[1] Ritter, Gerhard X. (personal communication, August 30, 2006)



VII. Appendix I

The core SLP function’s MATLAB source code is provided here as a reference.
Since this is the core functionality, it should be trivial to design the
calling programs to use this function, which implements Rosenblatt’s training
algorithm.
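
A typical invocation, with hypothetical X and D and an α of 0.75, might look
like the line below; the actual calling programs for Parts A through C are
available on request, as noted in the introduction.

W = slp(X, D, 0.75);   % X: [n,k] input vectors, D: [m,k] desired outputs, alpha = 0.75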

% This is a function to train an SLP. It takes as input a unipolar matrix X,
% which is an [n,k] matrix where n is the number of inputs to the SLP and k
% is the number of training sets. D is an [m,k] unipolar matrix of desired
% outputs, one row per output node, and alpha is the learning rate parameter.

function W = slp(X, D, alpha)

% Add x0 = 1 to input and get values for n, k, m
[n,k] = size(X);
X = [ones(1,k); X];
[n,K] = size(X);
[m,k] = size(D);

if (K ~= k)
disp('Error: X and D do not match.');
return;
end

% 1. Initialize t=0, the learning rate parameter alpha, and set the weights
% W(0) to arbitrary values, i.e. wi(0)=arbitrary for i=0,...,n;
t = 0;
W = rand(n,m); % n x m weight matrix (n already includes the bias row)

% Prepare for the loop (Steps 2-5)
disp(sprintf('Epoch#\tM.S.E.\t\tElapsed Time (s)\tIterations w/o change'));
not_done = k; % not_done = k since we need k iterations where weights don't change

% 2. For each pair (xk,dk) from the training set, do Steps 3–5;
time = cputime; % Get the current CPU time
totalmse = 0; % total the M.S.E.
maxrun = 0; % longest run without updating W

while (not_done > 0)

i = mod(t,k) + 1; % current index

% 3. Compute dW(t) = alpha*e_k*x_k, i.e. dw_i(t) = alpha*e_k*x_ik for
%    i=0,...,n, where e_k = d_k - y_k and y_k = f(sum_{i=0..n} w_i(t)*x_ik);
if (m > 1)
y = ((W' * X(:,i)) >= 0);
e = (D(:,i) - y);
dW = (alpha * e * X(:,i)')';
else
y = (dot(W, X(:,i)) >= 0); % optimized for speed
dW = alpha * (D(i) - y) * X(:,i);
end

% 4. Increment t=t+1;
t = t + 1;

% 5. Update W(t) = W(t-1) + dW(t-1), i.e. wi(t) = wi(t-1) + dwi(t-1)
%    for i=0,...,n;
W = W + dW;

% 6. If no weight changes occurred during the last epoch (iteration of
%    Steps 3-5), or another stopping condition is true, then stop;
%    otherwise, repeat from Step 2.

if (nnz(dW) == 0)
not_done = not_done - 1;
else
not_done = k;
end

if (cputime - time) > 3600
disp(sprintf('Giving up, over one hour of CPU time used.\n'));
not_done = 0;
end

% Display data about this iteration
thismse = mse(dW); % !!! Comment these out if runtime too long
totalmse = totalmse + thismse;

showstats = false;
if (t <= 25)
showstats = true;
end
if (maxrun < (k - not_done))
maxrun = (k - not_done);
showstats = true;
end

if (showstats == true)
disp(sprintf('%d\t\t%f\t\t%f\t\t%d', t, mse(dW), (cputime - time), (k - not_done)));
end

end

disp(sprintf('Average M.S.E. is %f for %d epochs', (totalmse / t), t)); % Display mean M.S.E.
disp(sprintf('Total CPU Time: %f (seconds).', (cputime - time))); % Display elapsed time

VIII. Appendix II

Experimental test bench specifications, which were partially generated using
CPUID’s CPU-Z 1.37, available at http://www.cpuid.com/cpuz.php:

- Processor(s)
Number of processors 1
Number of cores 2 per processor
Number of threads 2 (max 2) per processor
Name Intel Core 2 Duo E6600
Code Name Conroe
Specification Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Package Socket 775 LGA
Family/Model/Stepping 6.F.6
Extended Family/Model 6.F
Core Stepping B2
Technology 65 nm
Core Speed 2397.6 MHz
Multiplier x Bus speed 9.0 x 266.4 MHz
Rated Bus speed 1065.6 MHz
Stock frequency 2400 MHz
Instruction sets MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 4096 KBytes, 16-way set associative, 64-byte line size

- Chipset & Memory


Northbridge Intel P965/G965 rev. C2
Southbridge Intel 82801HB (ICH8) rev. 02
Memory Type DDR2
Memory Size 1024 MBytes

- System
Mainboard Vendor Intel Corporation
Mainboard Model DG965SS
BIOS Vendor Intel Corp.
BIOS Version MQ96510J.86A.1176.2006.0906.1633
BIOS Date 09/06/2006

- Memory SPD
Module 1 DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin
Module 2 DDR2, PC2-4300 (266 MHz), 512 MBytes, Mushkin

- Software
Windows Version Microsoft Windows XP Professional Service Pack 2 (Build 2600)
DirectX Version 9.0c
MATLAB Version 7.2.0.232 (R2006a)
