Artificial Neural Networks
In this Exercise
1. Introduction
2. ANNs
3. ANNs and Matlab
4. Part #1 (Fitting a Function / Regression)
5. Part #2 (Pattern Recognition / Classification)
6. Part #3 (Clustering)
References
Introduction
An Artificial Neural Network (ANN from now on) is a mathematical structure consisting of interconnected artificial neurons that mimics, on a much smaller scale, the way a biological neural network (or brain) works. An ANN has the ability to learn from data, either in a supervised or an unsupervised fashion, and can be used in tasks such as regression, classification, clustering and more.
ANNs
A typical artificial neuron is depicted in Figure 1. The scalar inputs pi are transmitted through connections that multiply them by the scalar weights wi to form the products wipi, again scalars. All the weighted inputs wipi are summed, and to the sum Σwipi we also add the scalar bias b. The result is the argument of the transfer function f, which produces the output a. The bias is much like a weight, except that it has a constant input of 1.
Figure 1: A typical artificial neuron. Its output is a = f(Σwipi + b), where the bias b has a constant input of 1.
Note that the weights wi and the bias b are both adjustable scalar parameters of the neuron. The central idea of neural networks is that such parameters can be adjusted so that the network exhibits some desired or
interesting behavior. Thus, you can train the network to do a particular job by adjusting the weight or
bias parameters, or perhaps the network itself will adjust these parameters to achieve some desired
end. The most commonly used transfer functions are the hard-limit (or step), the linear and the sig-
moid (or logistic), all depicted in Figure 2.
Figure 2: Commonly used transfer functions: hard-limit (left), linear (middle) and sigmoid (right).
The most common ANN architecture consists of many neurons organized in layers. Each neuron in a layer is connected to all the neurons of the next layer. We can distinguish between input, hidden and output layers. There is only one input and one output layer, whereas more than one hidden layer is allowed. Note also that it is common for the number of inputs to a layer to be different from the number of neurons. Some other definitions you should be aware of are:
- learning rules: procedures for modifying the weights and biases of a network, that is, methods of deriving the next changes that might be made in an ANN
- training: a procedure whereby a network is actually adjusted to do a particular job
There are two broad categories of learning rules. In supervised learning the network is provided
with a set of examples (the training set) of proper network behavior (pairs of input and known/cor-
rect output (target)). As the inputs are applied to the network, the calculated network outputs are
compared to the targets. The learning rule is then used to adjust the weights and biases of the net-
work in order to move the network outputs closer to the targets. In unsupervised learning, the
weights and biases are modified in response to network inputs only. There are no target outputs
available.
For the purpose of training, the input data are divided into three sets:
- training: used for adjusting the weights and biases
- validation: used to decide when to stop the training process, to avoid overfitting – a situation where the network memorizes the training data rather than learning the law that governs them
- testing: used to measure the performance of the trained network; it is important that these data do not participate in the training process
The most common forms of ANNs are the multilayer perceptrons, the self-organizing maps (SOM, or Kohonen networks) and the associative memories (also known as Hopfield networks).
ANNs and Matlab
Matlab (R2008a -v.7.6- was used for this tutorial) provides the Neural Network Toolbox, a set of tools that includes GUIs, wizards and functions that allow users of any level (from novice to expert) to experiment with neural networks with minimal effort. The easiest way to use the toolbox is through the GUIs that perform certain tasks. In this tutorial we will see the following tasks:
- fit a function / regression
- recognize patterns / classification
- cluster data
The second easiest way to use the toolbox is through basic command-line operations. The com-
mand-line operations offer more flexibility than the GUIs, but with some added complexity. In this
tutorial we will use both GUIs and command line operations.
In the following exercises we will see all the steps involved in using ANNs to solve regression, classification and clustering problems. These steps include: collecting data, creating the network, configuring the network, initializing the weights and biases, training the network, validating the network and using the network. Note though that you cannot use any type of ANN for any type of problem.
The datasets used in the following were taken from the Bren School of Information and Computer
Science at the University of California, Irvine (Repository Of Machine Learning Databases) [1]. They
are available at:
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/
The following exercises are adaptations of Matlab's tutorials to local needs.
Part #1 (Fitting a Function / Regression)
Neural networks are good at fitting functions. In fact, there is a theoretical result (the universal approximation theorem) stating that a fairly simple neural network can fit any practical function.
Problem: Using data from a housing application [1] we want to design a network that can predict the
value of a house (in $1000s), given 13 pieces of geographical and real estate information. The dataset
consists of 506 example homes (records) for which we have 13 items of data and their associated
market values. The dataset is available at:
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/housing/
The file housing.data contains the data, whereas housing.names describes what the data is about. Take your time to familiarize yourself with the dataset: the better you know your data, the more insight you will have during ANN modeling.
2. Create the network. The command below creates a fitting network with 20 neurons (the number is arbitrary) in one hidden layer. The network has one output neuron, because there is only one target value associated with each input vector.
net = newfit(houseInputs,houseTargets,20);
More neurons require more computation, but they allow the network to solve more compli-
cated problems. More layers require more computation, but their use might result in the network
solving complex problems more efficiently. Beware though that adding too many neurons com-
pared to the dataset size might result in overfitting phenomena (more on this later).
3. Train the network. The network uses the default Levenberg-Marquardt algorithm for training
(backpropagation-like). The application randomly divides input vectors and target vectors into
three sets as follows:
- 60% are used for training,
- 20% are used to validate that the network is generalizing, and to stop training before overfitting,
- 20% are used as a completely independent test of network generalization.
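These percentages come from the network's data division settings. If you want different ratios, you can adjust them before training; a minimal sketch, assuming the toolbox's default 'dividerand' division function:
net.divideFcn = 'dividerand';        % split the samples at random
net.divideParam.trainRatio = 0.60;   % 60% adjust the weights and biases
net.divideParam.valRatio   = 0.20;   % 20% watch for overfitting (early stopping)
net.divideParam.testRatio  = 0.20;   % 20% are held out for the final, independent test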
To train the network, enter:
net=train(net,houseInputs,houseTargets)
During training, the training window (right) opens.
This window displays training progress and allows
you to interrupt training at any point by clicking Stop
Training.
The train function presents all the input vectors to
the network at once in a batch. Alternatively, you can
present the input vectors one at a time using the
adapt function.
As you can see in the screenshot, the training
stopped when the validation error increased for six
iterations, which occurred at iteration (or epoch) 13.
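After training you can put the network to work. A minimal sketch of simulating the trained network and measuring its error, using the variables defined above:
outputs = sim(net, houseInputs);   % network predictions for every example
errors  = houseTargets - outputs;  % prediction errors
perf    = mse(errors)              % mean squared error over the whole set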
Improving Results
This example demonstrated some simple commands you can use to solve many types of problems.
However, if your first attempt does not meet your needs or expectations, do not hesitate to retry. If the network is not sufficiently accurate, you can try initializing the network and training again. Each time you initialize a feed-forward network, the network parameters are different and might produce different solutions.
To re-initialize the network type: net = init(net); where net is your neural network.
To re-train the network type: net = train(net,houseInputs,houseTargets);
As a second approach, you can increase the number of hidden neurons above 20. Larger numbers of neurons in the hidden layer give the network more flexibility, because the network has more parameters it can optimize. You should increase the layer size gradually. If you make the hidden layer too large, you might cause the problem to be under-characterized: the network must optimize more parameters than there are data vectors to constrain them. Your network will be overfitted: it will memorize the training examples and, although it will demonstrate high accuracy on the training data, it will work poorly on unseen data. A better approach is to use additional training data. Providing additional data for the network is more likely to produce a network that generalizes well to new data.
If you are dissatisfied with the network's performance on the original or new data, you can take any of the following steps (using the buttons on the left):
- Train it again.
- Increase the number of neurons.
- Get a larger training data set.
8. If you are satisfied with the network performance, click Next to go to the Save Results window.
You can use the buttons on this screen to save your results. You can also click Generate M-File to
create an M-file that can be used to reproduce all of the previous steps from the command line.
Creating an M-file can be helpful if you want to learn how to use the command-line functionality
of the toolbox to customize the training process.
9. When you have saved your results, click Finish.
Part #2 (Pattern Recognition / Classification)
Neural networks are also good at recognizing patterns. In this exercise we will classify a tumor as benign or malignant, based on uniformity of cell size, clump thickness, mitosis, etc. [3]. Our dataset consists of 699 example cases for which we have 9 items of data and the correct classification as benign or malignant. This breast cancer database was originally obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. It is available online at:
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/
Note that this dataset contains some missing values (the numbers are replaced with question mark characters, '?'). For the needs of this tutorial, we have replaced the question marks with the number 3.5. The dataset is also available in Matlab but, apart from the missing values issue, all numbers are divided by 10. Finally, the class designators are 2 and 4 in the original dataset but 0 and 1 in Matlab's dataset. We will also use these designators (0 and 1). More on this in the following text.
For convenience, we write this dataset the way it would be written in a text file. The inputs will be:
0 1 0 1
0 0 1 1
whereas the target classes will be written either as:
0 1 1 0 (direct indication of the class)
or as:
1 0 0 1 (1st class – we put 1 if the sample belongs to it, 0 otherwise)
0 1 1 0 (2nd class – we put 1 if the sample belongs to it, 0 otherwise)
The next section demonstrates how to train a network from the command line, after you have
defined the problem.
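As a preview, creating and training such a network from the command line looks much like the fitting case; a minimal sketch, where the variable names cancerInputs and cancerTargets are hypothetical placeholders for the loaded data:
net = newpr(cancerInputs, cancerTargets, 20);   % pattern recognition network, 20 hidden neurons
net = train(net, cancerInputs, cancerTargets);  % inputs are again divided 60/20/20 by default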
The train function presents all the input vectors to the network at once in a batch. Alternatively, you can present the input vectors one at a time using the adapt function. See Matlab's Help System for more on this.
This training stopped when the validation error increased for 6 iterations, which occurred at epoch 21.
4. To find the validation error, click 'Performance' in the training window. A plot of the training er-
rors, validation errors and test errors appears, as shown in the figure on the right. The best valida-
tion performance occurred at iteration 15 and the network at this iteration (epoch) is returned.
5. To analyze the network response, click 'Confusion'
in the training window. A display of the confusion
matrix appears (figure on the right) that shows vari-
ous types of errors that occurred for the final trained
network.
There are 4 tables: each one displays the network
response for the training, validation, testing and all
datasets. The diagonal (green) cells in each table
show the number of cases that were correctly classi-
fied, and the off-diagonal cells (red) show the mis-
classified cases. The blue cell in the bottom right
shows the total percent of correctly classified cases
(in green text) and the total percent of misclassified
cases (in red text). The results for all three data sets (training, validation and testing) show very
good recognition (that is, the network response is satisfactory). If you needed even more accurate
results, you could try any of the following approaches:
- Reset the initial network weights and biases to new values with init (re-initialize) and train again.
- Increase the number of hidden neurons.
- Increase the number of training vectors.
- Increase the number of input values, if more relevant information is available.
- Try a different training algorithm.
To get more experience in command-line operations you can open a plot window (such as the
performance plot) during training and watch it animate.
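The same plots can also be produced directly from the command line; a short sketch, assuming a trained network net and the hypothetical variable names used above:
outputs = sim(net, cancerInputs);       % network responses for all cases
plotconfusion(cancerTargets, outputs)   % confusion matrix plot
plotroc(cancerTargets, outputs)         % ROC curves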
The lower-right blue squares illustrate the overall accuracies. In these squares, green text is used for the correct responses and red for the incorrect.
9. In the Neural Network Toolbox Pat-
tern Recognition Tool under the Plots
pane, click 'Receiver Operating Char-
acteristic' (ROC). The colored lines
(green and blue) in each axis repre-
sent the ROC curves for each of the
two categories of this problem. The
ROC curve is another visualization of
the quality of our network. It is a plot
of the true positive rate (sensitivity)
versus the false positive rate (1 -
specificity) as the threshold is varied.
A perfect test would show points in
the upper-left corner, with 100%
sensitivity and 100% specificity. For
this simple problem, the network performs almost perfectly.
10. In the Neural Network Toolbox Pattern Recognition Tool, click 'Next' to evaluate the network. At
this point, you can test the network against new data. If you are dissatisfied with the network's
performance on the original or new data, you can train it again, increase the number of neurons,
or perhaps get a larger training data set.
11. When you are satisfied with the network performance, click 'Next'. Use the buttons on this screen
to save your results in the workspace. If you click Generate M-File, the tool creates an M-file, with
commands that recreate the steps that you have just performed from the command line. Gener-
ating an M-file is a good way to learn how to use the command-line operations of the Neural
Network Toolbox software.
Part #3 (Clustering)
Clustering data is another excellent application for neural networks. This process involves grouping data by similarity: for example, market segmentation by grouping people according to their buying patterns, or data mining by partitioning data into related subsets.
Note: It is easy to transpose an array (turn rows into columns). Open the dataset in MS Excel, copy the array and use Paste Special, selecting Transpose. You can do the transposition in Matlab as well; a one-liner follows.
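For example (the variable name data is hypothetical):
data = data.';   % .' transposes the array: rows become columns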
4. During training, the training window opens (see figure on the right) and displays the training pro-
gress. To interrupt training at any point, click the 'Stop Training' button.
5. For SOM training, the weight vector associated with each neuron moves to become the center of
a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology
should also move close to each other in the input space. The default topology is hexagonal. To
view it, click 'SOM Topology' from the network training window.
In this figure (left), each of the hexagons represents a neu-
ron. The grid is 6-by-6, so there are a total of 36 neurons in this
network. There are four elements in each input vector, so the
input space is four-dimensional. The weight vectors (cluster cen-
ters) fall within this space.
Because this SOM has a two-dimensional topology, you can
visualize in two dimensions the relationships among the four-
dimensional cluster centers. One visualization tool for the SOM
is the weight distance matrix (also called the U-matrix).
6. To view the U-matrix, click 'SOM Neighbor Dis-
tances' in the training window. In this figure
(right), the blue/purple hexagons represent
the neurons. The red lines connect neighboring
neurons. The colors in the regions containing
the red lines indicate the distances between
neurons. The darker colors represent larger
distances, and the lighter colors represent
smaller distances.
A band of dark segments crosses from the
lower-center region to the upper-right region.
The SOM network appears to have clustered
the flowers into two distinct groups, each one
located on one side of this dark band.
7. Now click the 'SOM Sample Hits' button to see
another visualization (see figure on the right).
The plot displayed calculates the classes for
each flower and shows the number of flowers
in each class. Areas of neurons with large
numbers of hits indicate classes representing
similar highly populated regions of the feature
space. Areas with few hits indicate sparsely
populated regions of the feature space. Summing up all the numbers gives 150. (Why?)
As mentioned earlier in this tutorial, during training you can keep a plot window open (such as the SOM weight position plot) and watch it animate as the training proceeds.
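All the SOM plots shown above are also available as commands; a short sketch, assuming a trained SOM net and a hypothetical input variable irisInputs:
plotsomtop(net)                % the SOM topology
plotsomnd(net)                 % neighbor distances (the U-matrix)
plotsomhits(net, irisInputs)   % sample hits per neuron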
Note: If you have past experience with the IRIS dataset, you probably know that it contains three classes: Iris-Setosa, Iris-Versicolor and Iris-Virginica (you can verify this by looking into the dataset). You may wonder why our clustering network identified only two. To better understand the situation (and the value of familiarizing yourself with the data), check some 2D visualizations of the data (figure on the left), using two of the four available dimensions at a time.
As you can see, only one of the classes is linearly separable from the others (the one displayed with blue points). The other two are not separable, so our ANN failed to distinguish them and considered them a single class (cluster).
Unlike clustering (which fails to identify the three clusters in the IRIS data), you can train a highly accurate (>97%) classification ANN if you work with the IRIS data as in Part #2 of this tutorial. You will also need target data, since this is supervised training; they are already prepared for your convenience in the file irisTargets_x.dat. Try it yourself as an exercise.
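To get you started, a minimal sketch; the input file name irisInputs_x.dat is an assumption, and the data may need transposing so that each column is one sample:
irisInputs  = load('irisInputs_x.dat')';   % hypothetical input file, transposed
irisTargets = load('irisTargets_x.dat')';  % the prepared targets (see above)
net = newpr(irisInputs, irisTargets, 20);  % pattern recognition network, 20 hidden neurons
net = train(net, irisInputs, irisTargets);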
The situation discussed here does not in any way diminish the usefulness of ANNs for identifying clusters in unknown datasets. For this particular dataset, you can find other techniques in the literature (for example, fuzzy clustering) that produce better clustering results.
This simple example can be solved with the Neural Network Pattern Recognition Tool (nprtool) or the Clustering Tool (nctool). Here we follow the latter approach. You can try the former as an exercise.
3. In the Select Data window click 'Next' to continue to the Network Size window. The size of the two-dimensional map is set to 10. This value defines one side of a two-dimensional grid, so the total number of neurons is 100. You can change this number in another run if you want.
4. Click Next. The Train Network window appears.
5. Click Train. The training runs for the maxi-
mum number of epochs, which is 200.
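For reference, the command-line equivalent of these two steps is short; a sketch, assuming the newsom signature of this toolbox version and a hypothetical input variable inputs:
net = newsom(inputs, [10 10]);   % self-organizing map on a 10-by-10 grid
net = train(net, inputs);        % trains for the default 200 epochs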
6. Investigate some of the visualization tools
for the SOM. Under the Plots pane, click
SOM Sample Hits. This figure (right) shows
how many of the training data are associat-
ed with each of the neurons (cluster cen-
ters). The topology is a 10-by-10 grid, so
there are 100 neurons. The maximum number of hits associated with any neuron is 32, so there are 32 input vectors in that cluster. We have marked the four classes with red circles. As you can see, the two classes on the right are not well separable.
7. You can also visualize the SOM by displaying weight planes (also referred to as component planes). Click SOM Weight Planes in the Neural Network Toolbox Clustering Tool. This figure shows a weight plane for each element of the input vector (two, in this case). They are visualizations of the weights that connect each input to each of the neurons. (Darker colors represent larger weights.) If the connection patterns of two inputs are very similar, you can assume that the inputs are highly correlated. In this case, input 1 has connections that are very different from those of input 2.
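The corresponding command is plotsomplanes; a one-line sketch, assuming the trained net from above:
plotsomplanes(net)   % one weight plane per input element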
8. In the Neural Network Toolbox Clustering Tool, click Next to evaluate the network. At this point
you can test the network against new data. If you are dissatisfied with the network's performance
on the original or new data, you can increase the number of neurons, or perhaps get a larger
training data set.
9. When you are satisfied with the network performance, click Next. Use the buttons on this screen
to save your results. You can export them into the Matlab's workspace (Save Results button) or
use the Generate M-file button to create an M-file with commands that recreate the steps that
you have just performed from the command line. Generating an M-file is a good way to learn how
to use the command-line operations of the Neural Network Toolbox software.
10. When you have saved your results, click Finish.
References
1. UCI Repository of Machine Learning Databases: ftp://ftp.ics.uci.edu/pub/machine-learning-databases/
2. D. Harrison and D.L. Rubinfeld, "Hedonic prices and the demand for clean air", Journal of Environmental Economics and Management, Vol. 5, 1978, pp. 81-102.
3. O.L. Mangasarian and W.H. Wolberg, "Cancer diagnosis via linear programming", SIAM News, Vol. 23, No. 5, September 1990, pp. 1 & 18.
4. R.A. Fisher, "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7, Part II, 1936, pp. 179-188; also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
5. Matlab Help.