1. Objective
The objective of this exercise is to test the applicability of ANNs to short-term hydrologic
forecasting and to other problems. The forecasting problem concerns the basin of the Sieve,
a tributary of the River Arno upstream of Florence (Italy). This exercise builds on the earlier
exercise MT-1 (model trees).
2. Software used
In this exercise the following software will be used:
MS-Excel for data analysis, data preparation and visualization
NeuralMachine neural network software
Weka data mining tool for building M5 model trees
Java Runtime Environment – needed to run Weka.
Note: to use Excel’s Data Analysis options, select the Tools/Add-ins option and check the
Analysis ToolPak option.
Such experiments have already been performed for this model. For this exercise we accept
that the final form of the model (referred to as Model 1) has 9 inputs and can be
written as follows:
Q(t+1) = F(REt, REt-1, REt-2, REt-3, REt-4, REt-5, Qt, Qt-1, Qt-2)    (Model 1)
(Later, we will see that some of the variables have little influence on the output and can
therefore be removed. The model that you will have to build yourself as an assignment will be
referred to as Model 2.)
It is assumed that the header in ARFF files has the following form:
@attribute REt real
@attribute REt-1 real
@attribute REt-2 real
@attribute REt-3 real
@attribute REt-4 real
@attribute REt-5 real
@attribute Qt real
@attribute Qt-1 real
@attribute Qt-2 real
@attribute Qt+1 real
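The row layout implied by Model 1 and this header can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `@relation` name and the demo numbers are hypothetical, and the data files for the exercise are the ones prepared in MT-1, not files produced by this sketch.

```python
RE_LAGS = 6   # REt ... REt-5
Q_LAGS = 3    # Qt ... Qt-2
ATTRIBUTES = (
    ["REt"] + [f"REt-{k}" for k in range(1, RE_LAGS)]
    + ["Qt"] + [f"Qt-{k}" for k in range(1, Q_LAGS)]
    + ["Qt+1"]
)


def make_instances(re, q):
    """Return rows [REt, ..., REt-5, Qt, Qt-1, Qt-2, Qt+1] for every usable t."""
    if len(re) != len(q):
        raise ValueError("re and q must have the same length")
    first = max(RE_LAGS, Q_LAGS) - 1          # earliest t with all lags available
    rows = []
    for t in range(first, len(q) - 1):        # the last usable t needs q[t + 1] as target
        inputs = [re[t - k] for k in range(RE_LAGS)]
        inputs += [q[t - k] for k in range(Q_LAGS)]
        rows.append(inputs + [q[t + 1]])
    return rows


def to_arff(rows, relation="SieveQ1"):
    """Format rows as ARFF text; the relation name is an assumed placeholder."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} real" for name in ATTRIBUTES]
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Made-up numbers, only to show the row layout (these are not Sieve data).
re_demo = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
q_demo = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
demo_rows = make_instances(re_demo, q_demo)
print(to_arff(demo_rows))
```

Each row holds the nine inputs followed by the target Qt+1, in the same order as the attributes in the header.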
Specify the file Sieve-Q1-Train.ARFF as the training file. Use the verification file
Sieve-Q1-Verif.ARFF for both verification and cross-validation. Specify the project title; it will
appear in the reports.
Click OK when you are done. You will be asked to specify the project name. Use “SieveQ1”.
If you click the “Tabular and 3D display of data” tab you can view the data and plot it.
Click the ‘Set model’ button on the left. You can specify the number of hidden nodes and other
learning parameters. Consult the Lecture Notes, if necessary, to understand their meaning. Set
the parameter Kappa to zero.
Click the “Train model” button on the left.
[Screenshot of the “Train model” screen. Callouts: “To improve speed, set this to 20”; “Click to
bring the control panel here”.]
Start training the network by clicking the Start button. First use the default values of the
learning parameters. To see the control panel on the “Train model” screen and change the learning
parameters, click the ‘Place control panel here’ button.
If the training (cross-validation) error is not decreasing, try changing the learning parameters
so that the training error starts to decrease. Try a slight increase of Kappa. If ANN training is
still not progressing, also try using the Shake button to reinitialize the weights, and reducing
the number of hidden nodes. Try to achieve a mean squared error of around 30.
When you succeed and the network is trained, click the Stop button. (NeuralMachine stops after
20000 iterations.)
To test the performance of the trained network, click the Verify button. The calculated
output values and the errors will be placed in the RPT file, which you can view and import into Excel.
On the “Set model” tab page choose “Radial basis function”. At first, set Sigma (the
parameter of the Gaussian function) to 10. If the network does not perform well,
select the option “Varying” for Sigma and train the network again. Observe the final value of the
Sigma parameter.
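To see what Sigma controls, the following minimal Python sketch evaluates one Gaussian basis function. It assumes the common form exp(-d^2 / (2 Sigma^2)), where d is the distance between the input vector and the centre of the basis function. NeuralMachine's exact parameterization of Sigma is not given here and may differ (for example in the factor 2), and the function name is hypothetical.

```python
import math


def gaussian_rbf(x, centre, sigma):
    """Response of one Gaussian basis function (assumed form, see the text above)."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


# Response at a fixed distance of 10 from the centre for several widths.
for sigma in (2.0, 10.0, 50.0):
    print(sigma, gaussian_rbf([10.0, 0.0], [0.0, 0.0], sigma))
```

A small Sigma makes each basis function respond only to inputs very close to its centre, while a large Sigma makes the responses of neighbouring basis functions overlap strongly. This is one reason why a fixed value may need to be replaced by a varying one.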
7. Building an artificial neural network with Weka tool
Use the data files prepared for the MT-1 exercise (SIEVE-Q1-ANN-TRAIN.PAT and
SIEVE-Q1-ANN-VERIF.ARFF, respectively).
Click on More to display a dialog box with information about the parameters. It also explains
how to add hidden nodes to the network.
Choose the number of hidden nodes and the learning parameters. For the time being, you may
simply accept the default values. Set the GUI parameter to True.
Click OK, and then Start. The following network topology will be displayed.
Click on Start. In a minute or two the network will be trained.
Use "delimited" and "comma" options for importing data. Arrange data in columns in the
following way:
Column No.   Column header   Contents
------------------------------------------------------------
1            Time            the instance number (time)
2            Measured        the measured Q(t+1)
3            Predicted       the Q(t+1) predicted by the ANN
Create a chart using the "Line" option. Specify the necessary headers and the thickness and color
of the lines so that the chart prints well on a black-and-white printer (an example chart is given
in the DDM\Ex-MT1 folder in the file ExcelChartToPrint-Example.XLS). The following gives an
example of a plot.
[Figure: “Predicting discharge: using ANNs”. Line chart of discharge Q(t+1) against time [hrs],
with the series Measured, Method 1 and Method 2.]
When all modelling results are ready, we can calculate measures of closeness between the
different approaches, and the errors. The errors were already calculated by NeuralMachine or
Weka, but if needed you can use Excel to calculate additional error measures in order to compare
the results better.
The MT-1 exercise provides several quality measures used in hydrology. Calculate the RMSE
using Excel.
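As a cross-check for the Excel calculation, the RMSE (the square root of the mean of the squared differences between measured and predicted values) can be sketched in Python. The function name and the sample numbers are hypothetical and are not results of the exercise.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between two equally long series."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values, only to show the call.
print(rmse([100.0, 150.0, 200.0], [110.0, 140.0, 205.0]))
```

In Excel, with the measured values in column B and the predicted values in column C, one way to obtain the same quantity is =SQRT(SUMXMY2(B2:B301,C2:C301)/COUNT(B2:B301)); the row range here is only an assumed example.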
x1 x2 Y (output)
----------------------
0 0 false
0 1 true
1 0 true
1 1 false
This is the so-called XOR (exclusive OR) problem (the result is true if and only if the two inputs
are different). In mathematical logic it is written like this:
Y = x1 XOR x2
If you plot the data in the coordinates (x1, x2) you will see that the examples are not linearly
separable.
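The lack of linear separability can be illustrated with a small Python sketch that searches a grid of candidate lines w1*x1 + w2*x2 + b = 0 for one that puts the true examples on the positive side and the false examples on the other. The grid, the function name and the OR data set used for contrast are assumptions made for this illustration. A grid search is not a proof, but the result agrees with a short argument: XOR would need b <= 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b <= 0, and adding the two middle conditions gives w1 + w2 + 2b > 0, which together with the last condition forces b > 0, a contradiction.

```python
from itertools import product

# Examples as ((x1, x2), y) with false encoded as 0 and true as 1.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # separable, for contrast

GRID = [k / 2 for k in range(-8, 9)]   # candidate values -4.0 ... 4.0 in steps of 0.5


def separating_lines(data, grid):
    """Return all (w1, w2, b) on the grid that classify every example correctly."""
    found = []
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in data):
            found.append((w1, w2, b))
    return found


print("lines separating OR: ", len(separating_lines(OR, GRID)))
print("lines separating XOR:", len(separating_lines(XOR, GRID)))
```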
Use NeuralMachine with linear neurons to build a classification model for this training data
set (do not use any verification set). Draw a conclusion about the ability of the linear model to
solve this problem.
Use NeuralMachine with sigmoidal neurons to build such a classification model. Try different
numbers of hidden nodes. Explain why the ANN with non-linear functions is able to solve the
problem. Explain how an MLP ANN, which basically handles real numbers, is able to solve a
classification problem where the output is not a real number.
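One possible way to think about these questions is sketched below in Python: a 2-2-1 network with sigmoidal neurons whose weights are chosen by hand (they are not weights trained by NeuralMachine). It assumes that false and true are encoded as 0 and 1, and that the real-valued output is turned into a class by comparing it with a threshold of 0.5. All names and weight values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def xor_net(x1, x2):
    """2-2-1 sigmoidal network with hand-picked (not trained) weights."""
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)     # close to 1 if at least one input is 1
    h_and = sigmoid(20 * x1 + 20 * x2 - 30)    # close to 1 only if both inputs are 1
    return sigmoid(20 * h_or - 20 * h_and - 10)  # high for "OR but not AND"


def classify(x1, x2, threshold=0.5):
    """Map the real-valued network output to a class label."""
    return xor_net(x1, x2) > threshold


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2), 4), classify(x1, x2))
```

The two hidden nodes each draw one line in the (x1, x2) plane, and the output node combines them, which a single linear neuron cannot do. The network output stays a real number between 0 and 1; the class label only appears after thresholding.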
Create a project file in NeuralMachine to build a classification model for this data. Draw a
conclusion about the performance of the ANN.
Part 2. Model for Q(t+1) with fewer input parameters (Model 2) – Submit report
using the pruned M5 tree (built in the MT-1 exercise) as a basis, propose a simplified structure
for the ANN model for Q(t+1);
train and verify this model (both MLP and RBF);
draw conclusions about choosing the variables for the model and its performance for
predicting Q(t+1).
11. Report
Submit a report either on the Model Tree exercise or on the ANN exercise.
The report has to contain a brief description of the data-driven modelling experiments (max 4 A4
pages) for Model 2 (with fewer than 9 inputs), and contain among other things:
for NeuralMachine experiments, the printout of the NeuralMachine screens showing the
training and the verification plots;
one Excel plot comparing the measured and predicted flow;
the RMSE calculated in the Excel sheet;
conclusions.