
Hydroinformatics

Data-driven modelling. Exercise ANN-1.


Prediction of flow using artificial neural networks (ANN)

D.P. Solomatine, B. Bhattacharya

1. Objective
The objective of this exercise is to test the applicability of ANNs to short-term hydrologic
forecasting and to other problems. The forecasting problem concerns the Sieve basin,
a tributary of the River Arno upstream of Florence (Italy). This exercise builds on the earlier
exercise MT-1 (model trees).

2. Software used
In this exercise the following software will be used:
• MS Excel for data analysis, data preparation and visualization
• NeuralMachine neural network software
• Weka data mining tool for building M5 model trees
• Java Runtime Environment (needed to run Weka).

3. Preparing the scene


Create the working folder H:\DDM\EX-ANN-1 and copy to it all files and folders from
\\Edu1\HYDROINF\DDM\EX-ANN-1.

Install NeuralMachine from the server at \\edu1\hydroinf\NeuralMachine. Read the accompanying
text file to find out how to upgrade the license.

Note: to use Excel's Data Analysis options, select Tools/Add-ins and check the Analysis ToolPak
option.

4. Choice of variables (based on MT-1 exercise)


The data analysis performed in the MT-1 exercise revealed inter-dependencies between the variables
and is the basis for selecting the input variables for the initial model setup. The model should
then be trained and verified; depending on the results, it may be necessary to change the set of
variables. The model should then be trained and verified again, probably several times (adding and
removing different variables), until the modelling results are satisfactory.

Such experiments have already been performed for this model, and for this exercise it is accepted
that the final form of the model (referred to as Model 1) has 9 input variables and can be
written as follows:

Q(t+1) = F(RE_t, RE_{t-1}, RE_{t-2}, RE_{t-3}, RE_{t-4}, RE_{t-5}, Q_t, Q_{t-1}, Q_{t-2})    (Model 1)

(Later we will see that some of the variables have little influence on the output and can
therefore be removed; the model that you will have to build yourself as an assignment will be
referred to as Model 2.)
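
The structure of Model 1 can be made concrete with a short sketch. The Python below assumes that
the effective rainfall and the discharge are available as two equally long hourly lists; the
function name, the argument names and the toy values are hypothetical and are not part of the
exercise data. Each row holds the nine inputs followed by the target Q(t+1).

```python
def build_model1_rows(rain, flow, n_rain_lags=6, n_flow_lags=3, lead=1):
    """Return rows [RE_t, ..., RE_{t-5}, Q_t, Q_{t-1}, Q_{t-2}, Q_{t+lead}]."""
    if len(rain) != len(flow):
        raise ValueError("rain and flow must have the same length")
    first_t = max(n_rain_lags, n_flow_lags) - 1   # earliest t with all lags available
    last_t = len(flow) - lead - 1                 # latest t with a known target
    rows = []
    for t in range(first_t, last_t + 1):
        inputs = [rain[t - k] for k in range(n_rain_lags)]
        inputs += [flow[t - k] for k in range(n_flow_lags)]
        rows.append(inputs + [flow[t + lead]])
    return rows


# Toy series (hypothetical values, for illustration only)
rain = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
flow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
rows = build_model1_rows(rain, flow)
```

Dropping a variable for Model 2 then amounts to removing the corresponding column from every row.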

5. Preparing the data


You can use the data prepared for the MT-1 exercise.
Splitting the data into training and verification data sets. With a total of 2154 examples,
the last 1854 examples will be used for training and the first 300 examples for verification. This
split gives a reasonable representation of low and high flows in both data sets.
NeuralMachine can read ARFF files.
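
The split described above can be sketched in a few lines of Python. This is only an illustration:
the function name is hypothetical, and the integers stand in for the 2154 examples.

```python
def split_first_n_for_verification(rows, n_verif=300):
    """Return (training, verification): the first n_verif rows are held out."""
    if not 0 < n_verif < len(rows):
        raise ValueError("n_verif must be between 1 and len(rows) - 1")
    return rows[n_verif:], rows[:n_verif]


examples = list(range(2154))          # placeholders for the 2154 examples
train, verif = split_first_n_for_verification(examples)
```

The order of the examples is preserved, so both sets remain continuous stretches of the record.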

It is assumed that the header in ARFF files has the following form:
@attribute REt real
@attribute REt-1 real
@attribute REt-2 real
@attribute REt-3 real
@attribute REt-4 real
@attribute REt-5 real
@attribute Qt real
@attribute Qt-1 real
@attribute Qt-2 real
@attribute Qt+1 real
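
As an illustration of how such a file could be produced outside Excel, the sketch below writes
the attribute lines above followed by comma-separated data rows. The `@relation` and `@data` lines
are standard parts of the ARFF format; the relation name `sieve_q1`, the function name and the
sample row are assumptions made for this example only.

```python
ATTRIBUTES = ["REt", "REt-1", "REt-2", "REt-3", "REt-4", "REt-5",
              "Qt", "Qt-1", "Qt-2", "Qt+1"]


def to_arff(relation, attributes, rows):
    """Return ARFF text with one real-valued attribute per column."""
    lines = ["@relation " + relation, ""]
    lines += ["@attribute {} real".format(name) for name in attributes]
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match the number of attributes")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# One made-up example row: nine inputs followed by the output Qt+1
arff_text = to_arff("sieve_q1", ATTRIBUTES, [[0.0] * 9 + [12.5]])
```

The text can then be saved with `open(filename, "w").write(arff_text)`.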

6. Building an artificial neural network with NeuralMachine


In this exercise we will use the NeuralMachine and Weka tools. You will be able to use a more
advanced tool, NeuroSolutions, during the MSc studies.

6.1.1. Multi-layer perceptron

Study the Help of NeuralMachine.


Start NeuralMachine. Click on ‘Start new project’ and create the new project with the ARFF input
files that contain the data on Sieve catchment. The screen you will have to fill looks like this:

[Screenshot: NeuralMachine new project screen; callouts: "Specify ARFF", "Specify filenames"]

Specify the file Sieve-Q1-Train.ARFF as the training file. Use the verification file
Sieve-Q1-Verif.ARFF for both verification and cross-validation. Specify the project title; it will
appear in the reports.

Click OK when you are done. You will be asked to specify the project name. Use “SieveQ1”.

If you click the “Tabular and 3D display of data” tab you can view the data and plot it.

Click the ‘Set model’ button on the left. You can specify the number of hidden nodes and other
learning parameters. Consult the Lecture Notes, if necessary, to understand their meaning. Set
the parameter Kappa to zero.

[Screenshot: "Set model" screen; callouts: "Select the model type here", "Set the initial values
for the learning parameters here"]

Click the “Train model” button on the left.

[Screenshot: "Train model" screen; callouts: "To improve speed, set this to 20", "Click to bring
the control panel here"]

Start training the network by clicking the Start button. First use the default values of the
learning parameters. To see the control panel on the “Train model” screen and change the learning
parameters, click the ‘Place control panel here’ button.

If the training (cross-validation) error is not decreasing, change the learning parameters until
the training error starts to decrease. Try a slight increase of Kappa. If ANN training is still
not progressing, also try using the Shake button to reinitialize the weights, and reducing the
number of hidden nodes. Try to achieve a mean square error of around 30.
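
Why the choice of learning parameters decides whether the error decreases can be seen on a toy
problem. The sketch below is not NeuralMachine's algorithm: it applies plain gradient descent to
a single linear neuron y = w·x, and the generic `learning_rate` argument is an assumption that
does not correspond to any particular NeuralMachine setting. A small step makes the mean square
error shrink, while a step that is too large makes it grow.

```python
def train_linear_neuron(xs, ys, learning_rate, n_iterations=50):
    """Gradient descent on the MSE of y = w * x; returns the MSE after each step."""
    w = 0.0
    n = len(xs)
    history = []
    for _ in range(n_iterations):
        # Gradient of mean((w*x - y)^2) with respect to w
        grad = 2.0 / n * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= learning_rate * grad
        history.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n)
    return history


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                  # generated by y = 2x, so the ideal weight is 2
small_step = train_linear_neuron(xs, ys, learning_rate=0.05)   # error decreases
large_step = train_linear_neuron(xs, ys, learning_rate=0.30)   # error grows
```

Plotting the two `history` lists against the iteration number gives curves comparable to the
error plot shown during training.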

When you succeed and the network is trained, click the Stop button. (NeuralMachine stops after
20000 iterations.)

To test the performance of the trained network, click the Verify button. The calculated
output values and the errors will be placed in the RPT file, which you can view and import into
Excel.

Answer the following questions:

• What conclusion can you draw about the convergence of MLP ANNs in general?
• Which parameters, in your view, should be changed in order to push the backpropagation
algorithm towards convergence?
If necessary, consult the Lecture Notes.

6.1.2. Radial basis function network

NeuralMachine makes it possible to build RBF networks as well.

On the “Set model” tab page choose “Radial basis function”. At first, choose Sigma (that is, the
σ parameter in the Gaussian function) equal to 10. If the network does not perform well,
select the option “Varying” for Sigma and train the network again. Observe the final value of the
Sigma parameter.
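
The role of Sigma can be illustrated with a common form of the Gaussian basis function,
φ(x) = exp(-||x - c||² / (2σ²)), where c is the centre of the hidden node. Whether NeuralMachine
uses exactly this form is an assumption; the function and argument names below are hypothetical.

```python
import math


def gaussian_rbf(x, centre, sigma):
    """Gaussian basis function: 1 at the centre, decaying with distance."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


at_centre = gaussian_rbf([0.0, 0.0], [0.0, 0.0], sigma=10.0)
narrow = gaussian_rbf([10.0, 0.0], [0.0, 0.0], sigma=5.0)
wide = gaussian_rbf([10.0, 0.0], [0.0, 0.0], sigma=20.0)
```

At the same distance from the centre, a larger Sigma gives a larger response, so each hidden node
covers a wider region of the input space.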

Answer the following questions:

• What conclusion can you draw about the convergence of RBF ANNs in general?
• Which parameters should be changed in order to improve the accuracy of the RBF network?
If necessary, consult the Lecture Notes.

7. Building an artificial neural network with Weka tool
Use the data files prepared for the MT-1 exercise (SIEVE-Q1-ANN-TRAIN.PAT and
SIEVE-Q1-ANN-VERIF.ARFF, for training and verification respectively).

In Weka, in the classifier area, select Neural network.

[Screenshot; callout: "Click to set the parameters"]

Weka can build only MLP networks.

[Screenshot; callout: "Click to read more about the parameters"]

Click on More to display a dialog box with the information about the parameters. It also explains
how to add hidden nodes to the network.

Choose the number of hidden nodes and the learning parameters. For the time being, you may
simply use the accepted values. Set the GUI parameter to True.

Click OK, and then Start. The following network topology will be displayed.

Click on Start. In a minute or two the network will be trained.

Save the results buffer and the predicted values.

8. Using MS Excel to plot and evaluate the modelling results


Start MS Excel and open the saved TST file with the NeuralMachine modelling results, and the
ARFF file.

Use the "delimited" and "comma" options for importing data. Arrange the data in columns in the
following way:

Column No.   Column header   Contents
--------------------------------------------------------------
1            Time            the instance number (time)
2            Measured        the measured Q(t+1)
3            Predicted       the Q(t+1) predicted by the ANN

8.1. Assessing the quality of prediction

8.1.1. Visual inspection

Create a Chart using the "Line" option; specify the necessary headers and the thickness and color
of the lines so that the chart prints well on a black-and-white printer (an example chart is given
in the DDM\Ex-MT1 folder in the file ExcelChartToPrint-Example.XLS). The following is an
example of a plot.

[Figure: "Predicting discharge: using ANNs". Y-axis: Discharge Q(t+1), 0 to 350; X-axis:
Time [hrs], 0 to 350; series: Measured, Method 1, Method 2]

When all modelling results are ready, we can calculate measures of closeness between the
different approaches, and the errors. The errors have in fact already been calculated by
NeuralMachine or Weka but, if needed, you can use Excel to calculate additional error measures
in order to compare the results better.

8.1.2. Formal measures of the prediction quality

The MT-1 exercise provides several quality measures used in hydrology. Calculate the root mean
square error (RMSE) using Excel.
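
For cross-checking the spreadsheet, the RMSE is the square root of the mean of the squared
differences between measured and predicted values. A minimal Python sketch follows; the function
name and the sample values are illustrative only.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equally long sequences."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration: errors of 3 and 4 give sqrt((9 + 16) / 2)
example_rmse = rmse([0.0, 0.0], [3.0, 4.0])
```

In Excel the same quantity can be obtained with a formula such as
=SQRT(SUMXMY2(B2:B301,C2:C301)/COUNT(B2:B301)), assuming one header row and the 300 measured and
predicted values in columns B and C.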

8.2. Comparing the ANN and the M5 tree


Compare the performance of the ANN and the M5 model tree: read the RPT or ARFF results files into
Excel and plot the results against the measured data and the results of the model tree. Draw a
conclusion about the applicability of ANNs and M5 model trees in flow forecasting.

9. Using ANN in classification

9.1. XOR problem


Consider the following data:

x1 x2 Y (output)
----------------------
0 0 false
0 1 true
1 0 true
1 1 false

This is the so-called XOR (exclusive OR) problem (the result is true if and only if two inputs are
different). In mathematical logic it is written like this:

Y = x1 XOR x2

If you plot the data in the coordinates (x1, x2) you will see that the examples are not linearly
separable.

Use NeuralMachine with linear neurons to build a classification model for this training data
set (do not use any verification set). Draw a conclusion about the ability of the linear model to
solve this problem.

Use NeuralMachine with sigmoidal neurons to build the same classification model. Try different
numbers of hidden nodes. Explain why the ANN with non-linear functions is able to solve the
problem. Explain how an MLP ANN, which basically handles real numbers, is able to solve a
classification problem where the output is not a real number.
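
The two experiments can be mimicked outside NeuralMachine. The sketch below is an illustration
under stated assumptions, not the tool's training procedure: the linear neuron is checked with a
coarse grid search over its weights, and the MLP uses hand-picked (not trained) weights for two
sigmoidal hidden nodes. In both cases the real-valued output is turned into a class by comparing
it with 0.5.

```python
import math
from itertools import product

XOR_DATA = [((0, 0), False), ((0, 1), True), ((1, 0), True), ((1, 1), False)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def count_correct(predict):
    """Number of XOR examples for which (predict(x1, x2) > 0.5) equals the label."""
    return sum((predict(x1, x2) > 0.5) == label for (x1, x2), label in XOR_DATA)


# 1) Linear neuron y = w1*x1 + w2*x2 + b: grid search, weights in [-2, 2], step 0.25
grid = [i * 0.25 for i in range(-8, 9)]
best_linear = max(
    count_correct(lambda x1, x2: w1 * x1 + w2 * x2 + b)
    for w1, w2, b in product(grid, repeat=3)
)


# 2) MLP with two sigmoidal hidden nodes and hand-picked weights
def xor_mlp(x1, x2):
    h_or = sigmoid(20.0 * (x1 + x2) - 10.0)     # close to 1 if at least one input is 1
    h_and = sigmoid(20.0 * (x1 + x2) - 30.0)    # close to 1 only if both inputs are 1
    return sigmoid(20.0 * h_or - 20.0 * h_and - 10.0)


mlp_correct = count_correct(xor_mlp)
```

The linear neuron never classifies more than three of the four examples correctly, whereas the
network with hidden nodes classifies all four: the hidden nodes build the intermediate features
"OR" and "AND", and the output node separates them linearly.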

9.2. Solving classification problem


In the exercise folder you will find files with the data for predicting Q(t+6). The training data
set contains 1854 examples and the verification set 300 examples. There are 2 inputs (Pt and Qt)
and one output, the class of Q(t+6). There are 3 classes: Low, Medium and High.

Create a project file in NeuralMachine to build a classification model for these data. Draw a
conclusion about the performance of the ANN.
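
One common way for a network with real-valued outputs to handle three classes is to use one
output node per class and to pick the class with the largest output. Whether NeuralMachine
encodes classes in exactly this way is an assumption; the helper names and the sample outputs
below are hypothetical.

```python
CLASSES = ["Low", "Medium", "High"]


def one_hot(label):
    """Target vector with 1.0 at the position of the class and 0.0 elsewhere."""
    return [1.0 if name == label else 0.0 for name in CLASSES]


def decode(outputs):
    """Class whose output node has the largest value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best_index]


def accuracy(true_labels, predicted_labels):
    """Fraction of examples classified correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Made-up network outputs for three examples
predicted = [decode(o) for o in ([0.8, 0.1, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3])]
score = accuracy(["Low", "Medium", "High"], predicted)
```

Counting the errors per class (for example, how often High is predicted as Medium) is usually
more informative than the overall accuracy alone when drawing the conclusion.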

10. Assignment: summary


Part 1. Model for Q(t+1) presented above (Model 1) (Do not submit a report on Part 1)
• build MLP and RBF ANN models representing the rainfall-runoff relationship, as described
above;
• verify their performance using the supplied test (verification) set;
• analyze the results and draw conclusions about the model performance in solving the
presented problem of flow prediction.

Part 2. Model for Q(t+1) with fewer input parameters (Model 2) (Submit a report)
• using the pruned M5 tree (built in the MT-1 exercise) as a basis, propose a simplified structure
for the ANN model for Q(t+1);
• train and verify this model (both MLP and RBF);
• draw conclusions about choosing the variables for the model and its performance for
predicting Q(t+1).

Optional: Part 3. Model for Q(t+3)

• build the ANN model for predicting Q(t+3) (that is, the discharge 3 hours ahead). Analysis
performed for this case study has shown that it is reasonable to use the following 6 input
parameters (the last attribute, Qt+3, is the output):
@attribute REt real
@attribute REt-1 real
@attribute REt-2 real
@attribute REt-3 real
@attribute Qt real
@attribute Qt-1 real
@attribute Qt+3 real

Optional: Part 4

• for Q(t+3), try using a data transformation (e.g., Box-Cox or logarithmic) in order to improve
the performance of this model.
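
As a reminder of what these transformations do, the Box-Cox transform of a positive value x is
(x^λ - 1)/λ for λ ≠ 0 and ln(x) for λ = 0, so the logarithmic transformation is its special case.
The sketch below is illustrative: the function names and the value λ = 0.3 are assumptions, and a
suitable λ has to be found by experiment. Predictions made in the transformed space must be
transformed back before the errors are calculated.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value; lam = 0 gives the natural logarithm."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


discharge = 50.0                                  # made-up discharge value
transformed = box_cox(discharge, lam=0.3)
restored = inverse_box_cox(transformed, lam=0.3)  # approximately 50.0 again
```

The transformation compresses high flows more than low flows, which may make the training
data less dominated by a few peaks.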

11. Report
Submit a report on either the Model Tree exercise or the ANN exercise.

The report has to contain a brief description of the data-driven modelling experiments (max 4 A4
pages) for Model 2 (with fewer than 9 inputs), and contain among other things:
• for NeuralMachine experiments, the printout of the NeuralMachine screens showing the
training and the verification plots;
• one Excel plot comparing the measured and predicted flow;
• the RMSE calculated in the Excel sheet;
• conclusions.
