

INTRODUCTION

River floods can cause loss of life, devastating damage to properties and adverse economic and
environmental impacts. Although flood risks cannot be eliminated completely, real time flood forecasting
models, as an important and integral part of a flood warning service, can help to provide timely flood
warnings with an adequate lead time for the public to minimize flood damages. Therefore, correct and
reliable flood forecasting is a major task for flood hazard mitigation. Two main approaches are currently employed in flood forecasting. The first is based on modelling the physical relationship between rainfall and runoff, known simply as physical modelling. However, it
is difficult to develop a fully physically based forecasting model due to the complex nature of floods and
the varied responses to them. The second approach is based on modelling statistical relationship between
the hydrologic input and output. However, the relationship between rainfall and runoff is notoriously
nonlinear during storm events. Therefore, it is also difficult to construct a statistical model using
conventional regression techniques. To provide an alternative approach for accurate flood forecasting,
artificial neural networks (ANNs) have been suggested, as they have a powerful capability to model nonlinear
and complex systems without a clear physical explanation. Back propagation neural networks (BPNN) have
been used to forecast stream flow and, to achieve better forecasting accuracy and efficiency, radial basis
function neural networks (RBFNN) have been used to provide 3-hour and 6-hour ahead forecasts. Besides
BPNN, the extreme learning machine, a single-hidden-layer feedforward neural network, has also been used
for flood forecasting, as has the self-organizing map (SOM) network. Beyond neural networks, support vector
machines (SVM) have also been applied to flood forecasting in the literature. Although ANNs have been
successfully applied in flood forecasting, they rarely go beyond one or two hidden layers because of the
problematic non-convex optimization. In recent years,
deep learning has become a successful technique for solving complex problems by using a series of
multilayer architectures. Deep neural networks can deliver better performance, which has renewed interest in
ANNs. However, until now the typical autoencoder-based deep neural network, namely the stacked
autoencoder (SAE), has not been considered for flood forecasting. Under these circumstances, this paper
proposes a deep learning algorithm that combines an SAE with a back propagation neural network (BPNN),
simultaneously taking advantage of the powerful feature representation capability of the SAE and the
superior predictive capacity of the BPNN. To further improve the capability to model nonlinearity, all the
data are first classified into several categories by a K-means clustering module. Then, multiple SAE-BP
modules are used to model their corresponding categories of data, which weakens the nonlinearity of the
data and generates more accurate forecasting results.


BASIC CONCEPT
ARTIFICIAL NEURAL NETWORKS:
An Artificial Neural Network (ANN) is an information processing paradigm that is
inspired by the way biological nervous systems, such as the brain, process information. It is composed of a
large number of highly interconnected processing elements (neurons) working in unison to solve specific
problems.

Basic Structure of Artificial Neural Network

The idea of ANNs is based on the belief that the working of the human brain, which makes the right
connections, can be imitated using silicon and wires in place of living neurons and dendrites. The human
brain is composed of about 86 billion nerve cells called neurons. They are connected to thousands of other
cells by axons. Stimuli from the external environment, or inputs from sensory organs, are accepted by
dendrites. These inputs create electric impulses, which quickly travel through the neural network. A neuron
can then either send the message on to other neurons to handle the issue or not forward it at all.

ANNs are composed of multiple nodes, arranged in layers, which imitate the biological neurons of the
human brain. The neurons are connected by links and interact with each other. Each node can take input
data and perform simple operations on it, and the result of these operations is passed on to other neurons.
The output at each node is called its activation or node value. The most common type of artificial neural
network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of
"hidden" units, which in turn is connected to a layer of "output" units.

 Input Layer
 Hidden Layer
 Output Layer


The activity of the input units represents the raw information that is fed into the network.

The activity of each hidden unit is determined by the activities of the input units and the weights on the
connections between the input and the hidden units.

The behavior of the output units depends on the activity of the hidden units and the weights between the
hidden and output units.

For the above general model of an artificial neural network, the net input can be calculated as follows:

yin = x1.w1 + x2.w2 + x3.w3 + … + xm.wm

i.e., net input yin = Σ (i = 1 to m) xi.wi


The output can be calculated by applying the activation function over the net input.
Y = F(yin)


Output = function (net input calculated)

Y = f ( X1.w1 + X2.w2 + b )

Each input is associated with a weight 'w'. The neuron applies a function 'f' to the weighted sum
of its inputs. Activation function: a function used to transform the activation level of a unit (neuron) into an
output signal.
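
As an illustrative sketch only (not part of the original paper), the following minimal Python/NumPy listing computes the net input and output of a single neuron with two inputs, matching Y = f(X1.w1 + X2.w2 + b); all numeric values are placeholders:

import numpy as np

def sigmoid(z):
    # activation function: maps the net input to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical example values (for illustration only)
x = np.array([0.5, 0.8])      # inputs X1, X2
w = np.array([0.4, -0.2])     # weights w1, w2
b = 0.1                       # bias

net_input = np.dot(x, w) + b  # y_in = X1*w1 + X2*w2 + b
output = sigmoid(net_input)   # Y = f(y_in)
print(net_input, output)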

Back Propagation Neural Network (BPNN):


Backpropagation is a method used in artificial neural networks to calculate the error contribution
of each neuron after a batch of data is processed. This is used by an enveloping optimization algorithm to
adjust the weight of each neuron, completing the learning process for that case. Technically, it calculates
the gradient of the loss function with respect to the network weights. It is commonly used by the gradient
descent optimization algorithm. It is
also called backward propagation of errors, because the error is calculated at the output and distributed
back through the network layers.
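
As a hedged illustration (not the exact network or data used in the paper), the sketch below shows one backpropagation step for a tiny one-hidden-layer network with sigmoid activations and a squared-error loss; sizes, data and learning rate are arbitrary assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# tiny network: 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
x, t = rng.normal(size=3), np.array([0.7])   # one training sample (input, target)
lr = 0.1                                     # learning rate

# forward pass
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# backward pass: the error is computed at the output and propagated back
delta2 = (y - t) * y * (1 - y)               # output-layer error term
delta1 = (W2.T @ delta2) * h * (1 - h)       # hidden-layer error term

# gradient-descent weight updates
W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1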

Radial Basis Function Network:


Radial basis function network is an artificial neural network that uses radial basis functions
as activation functions. The output of the network is a linear combination of radial basis functions of the
inputs and neuron parameters. Radial basis function networks have many uses, including function
approximation, time series prediction, classification, and system control.
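
The following sketch is an assumption-laden illustration (not the configuration used in the paper) of a small RBF network: Gaussian basis functions around fixed centres, with the linear output weights fitted by least squares on toy data:

import numpy as np

def rbf_design_matrix(X, centres, sigma):
    # Gaussian radial basis functions: phi_j(x) = exp(-||x - c_j||^2 / (2*sigma^2))
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))             # toy 1-D inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # toy targets

centres = np.linspace(-3, 3, 10).reshape(-1, 1)   # fixed RBF centres
Phi = rbf_design_matrix(X, centres, sigma=0.8)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # linear output weights

y_hat = rbf_design_matrix(X, centres, 0.8) @ w    # output = linear combination of RBFs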

Dept of CSE, GMRIT Page 4


Term Paper 2017

Self-Organizing Map (SOM):


A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural
network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-
dimensional), discretized representation of the input space of the training samples, called a map, and is
therefore a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural
networks as they apply competitive learning as opposed to error-correction learning (such as back
propagation with gradient descent), and in the sense that they use a neighborhood function to preserve
the topological properties of the input space.
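
A minimal sketch of SOM training (illustrative only, assuming a small 2-D map, a Gaussian neighbourhood and toy data) shows the competitive step and the neighbourhood update:

import numpy as np

rng = np.random.default_rng(2)
grid_h, grid_w, dim = 5, 5, 3
weights = rng.uniform(size=(grid_h, grid_w, dim))  # map of codebook vectors
coords = np.argwhere(np.ones((grid_h, grid_w)))    # grid coordinates of each node
data = rng.uniform(size=(100, dim))                # toy training samples

lr, radius = 0.5, 2.0
for x in data:
    # competitive step: find the best-matching unit (BMU)
    dists = np.linalg.norm(weights.reshape(-1, dim) - x, axis=1)
    bmu = coords[np.argmin(dists)]
    # neighbourhood function preserves the topology of the input space
    grid_d2 = ((coords - bmu) ** 2).sum(axis=1)
    h = np.exp(-grid_d2 / (2 * radius ** 2)).reshape(grid_h, grid_w, 1)
    # move the BMU and its neighbours towards the sample
    weights += lr * h * (x - weights)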

Support Vector Machine (SVM):


Support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-
dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good
separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of
any class (so-called functional margin), since in general the larger the margin the lower the generalization
error of the classifier.
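
For flood forecasting the regression variant (support vector regression) is typically used. A hedged sketch using scikit-learn follows, assuming that library is available; the features, kernel and parameters below are placeholders rather than the paper's setup:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(size=(500, 8))                              # placeholder predictor factors
y = X @ rng.uniform(size=8) + 0.05 * rng.normal(size=500)   # placeholder target flow

model = SVR(kernel="rbf", C=10.0, epsilon=0.01)  # RBF kernel induces the high-dimensional space
model.fit(X[:400], y[:400])                      # train on the first 400 samples
pred = model.predict(X[400:])                    # forecast the held-out samples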

Extreme Learning Machine:


Extreme learning machines are feedforward neural networks for classification, regression,
clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of
hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden
nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they
are random projection but with nonlinear transforms), or can be inherited from their ancestors without
being changed. In most cases, the output weights of hidden nodes are usually learned in a single step,
which essentially amounts to learning a linear model.
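
A minimal sketch of the ELM idea (illustrative only, with toy data): hidden-layer weights are assigned randomly and never updated, and only the linear output weights are learned in a single least-squares step:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
X = rng.uniform(size=(300, 5))                 # toy inputs
y = np.sin(X.sum(axis=1))                      # toy targets

n_hidden = 50
W_in = rng.normal(size=(5, n_hidden))          # random, fixed input-to-hidden weights
b_in = rng.normal(size=n_hidden)               # random, fixed hidden biases

H = sigmoid(X @ W_in + b_in)                   # random nonlinear projection of the inputs
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights learned in one linear step

y_hat = sigmoid(X @ W_in + b_in) @ beta        # ELM prediction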


Advantages of neural networks:


 Neural network models require less formal statistical training to develop.
 Any data set that can be analyzed with conventional or logistic regression can also be used to
develop a neural-network-based prediction model.
 Neural networks can be trained using both continuous and categorical input and output variables.
 Networks tend to work best when the data have been normalized.
 Neural network models can implicitly detect complex nonlinear relationships between independent
and dependent variables.
 Neural network models have the ability to detect all possible interactions between predictor
variables.
 Neural networks can be developed using multiple different training algorithms.


DESIGN

Architecture of SAE-BP:
A stacked autoencoder is a neural network consisting of multiple layers of
sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer.
Autoencoders can be stacked to form a deep network by feeding the latent representation (output code) of
the autoencoder found on the layer below as input to the current layer. The unsupervised pre-training of
such an architecture is done one layer at a time. Each layer is trained as a denoising autoencoder by
minimizing the error in reconstructing its input (which is the output code of the previous layer). Once the
first k layers are trained, we can train the (k+1)-th layer because we can now compute the code or latent
representation from the layer below.
Once all layers are pre-trained, the network goes through a second stage of training called fine-
tuning. Here we consider supervised fine-tuning where we want to minimize prediction error on a supervised
task. For this, we first add a logistic regression layer on top of the network (more precisely on the output
code of the output layer). We then train the entire network as we would train a multilayer perceptron. At this
point, we only consider the encoding parts of each auto-encoder. This stage is supervised, since now we use
the target class during training.
This can be easily implemented, for example in Theano, using a class defined for the denoising autoencoder.
We can see the stacked autoencoder as having two facades: a list of autoencoders, and an MLP. During pre-training
we use the first facade, i.e., we treat our model as a list of autoencoders, and train each autoencoder
separately. In the second stage of training, we use the second facade. These two facades are linked because:

 the autoencoders and the sigmoid layers of the MLP share parameters, and
 the latent representations computed by intermediate layers of the MLP are fed as input to the
autoencoders.

The architecture of the SAE-BP integrated algorithm consists of one input layer, one output layer and K
hidden layers. However, this kind of deep network easily gets stuck in poor solutions when it is trained by
the traditional back propagation algorithm with random initialization. Therefore, during the implementation
process, we use the SAE to obtain a better initialization.
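
As an illustration only (not the authors' implementation), the following NumPy sketch shows the two-stage idea: greedy layer-wise pretraining of autoencoders to initialize the hidden layers, followed by a prediction head. The data and training details are assumptions, the autoencoders are plain rather than sparse or denoising, and the full supervised fine-tuning of all layers is only indicated in a comment:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    # train one (non-sparse, for brevity) autoencoder by plain gradient descent
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_in)); b2 = np.zeros(n_in)
    m = X.shape[0]
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # encode
        R = sigmoid(H @ W2 + b2)              # decode (reconstruction)
        dR = (R - X) * R * (1 - R) / m        # output-layer error term
        dH = (dR @ W2.T) * H * (1 - H)        # hidden-layer error term
        W2 -= lr * H.T @ dR; b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1

# toy data standing in for the flood prediction factors (illustration only)
rng = np.random.default_rng(5)
X = rng.uniform(size=(500, 30))
y = X @ rng.uniform(size=30)

# greedy layer-wise pretraining: each hidden layer is an autoencoder trained on
# the codes produced by the layer below
layer_sizes = [20, 15]
params, A = [], X
for size in layer_sizes:
    W, b = train_autoencoder(A, size)
    params.append((W, b))
    A = sigmoid(A @ W + b)

# attach the BP prediction head and fine-tune the whole stack with
# backpropagation -- omitted here; in a minimal form the head alone can be
# fitted by least squares on the top-level code:
w_out, *_ = np.linalg.lstsq(np.column_stack([A, np.ones(len(A))]), y, rcond=None)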


METHODOLOGY
A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in
which the outputs of each layer are wired to the inputs of the successive layer. An autoencoder is an
unsupervised network that aims to extract non-linear features. Specifically, an autoencoder is composed of
three layers: an input layer, a hidden layer and an output layer. For the k-th sparse autoencoder of the SAE,
we denote the network parameters as

(W, b) = (W(k,1), W(k,2), b(k,1), b(k,2))

which correspond to the parameters W(1), W(2), b(1), b(2) of that autoencoder. Here W(1)ij connects the
j-th unit in the input layer with the i-th unit in the hidden layer of the autoencoder. Similarly, W(2)ij
represents the weight connecting the j-th unit in the hidden layer with the i-th unit in the output layer;
b(1)i is the bias of the i-th unit in the hidden layer and b(2)i is the bias of the i-th unit in the output layer.

Encoding step of each layer in forward order:

a(l) = f(z(l))

z(l+1) = W(l,1) a(l) + b(l,1)

where a(l)i is the activation of the i-th unit in layer l and z(l)i is the net input of that unit.

The decoding step of each autoencoder in reverse order:

a(n+l) = f(z(n+l))

z(n+l+1) = W(n−l,2) a(n+l) + b(n−l,2)

where a(n+l)i is the activation of the i-th unit in layer n+l and z(n+l)i is the net input of that unit.

Where f(.):R→R is the sigmoid function:

f(z)=1/(1 + exp(−z))
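
For concreteness, a small sketch of the forward (encoding) pass through a stack of encoder layers using the sigmoid f above; the weights here are random placeholders, not trained parameters:

import numpy as np

def f(z):
    # sigmoid activation f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
# placeholder encoder parameters W(k,1), b(k,1) for a 30 -> 20 -> 15 stack
encoder_params = [(rng.normal(size=(30, 20)), np.zeros(20)),
                  (rng.normal(size=(20, 15)), np.zeros(15))]

a = rng.uniform(size=30)            # a(1): the raw input vector
for W, b in encoder_params:
    z = a @ W + b                   # z(l+1) = W(l,1) a(l) + b(l,1)
    a = f(z)                        # a(l+1) = f(z(l+1))
# 'a' is now the deepest latent representation (the code)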


Sigmoid function:
A sigmoid function is a bounded differentiable real function that is defined for all real input values
and has a non-negative derivative at each point.

In general, a sigmoid function is real-valued, monotonic, and differentiable, having a non-negative first
derivative which is bell shaped. A sigmoid function is constrained by a pair of horizontal asymptotes as
z → ±∞.

If we define a training set as {(x(1), x′(1)), ..., (x(N), x′(N))}, where for an autoencoder the target x′(i) is the
input x(i) itself, the square error cost function J(W,b) over the N training samples is:

J(W,b) = (1/N) Σ (i = 1 to N) (1/2) ||hW,b(x(i)) − x′(i)||²

To obtain sparse features, a sparsity penalty is added to this cost, which penalizes the average activation
ρ̂j of each hidden unit j for deviating from a small sparsity parameter ρ:

Jsparse(W,b) = J(W,b) + β Σ (j = 1 to m) KL(ρ ‖ ρ̂j)

m - number of units of layer L (the hidden layer)

ρ - sparsity parameter

Kullback-Leibler Divergence:

KL(ρ ‖ ρ̂j) = ρ log(ρ/ρ̂j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂j))
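
As a numerical illustration of the reconstruction cost and the KL sparsity penalty (toy values; β, ρ, the activations and the reconstructions are placeholders, not quantities from the paper):

import numpy as np

def kl(rho, rho_hat):
    # Kullback-Leibler divergence between target and actual average activations
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

rng = np.random.default_rng(7)
X = rng.uniform(size=(100, 10))               # toy inputs (targets equal the inputs)
H = rng.uniform(0.01, 0.99, size=(100, 6))    # pretend hidden activations
R = rng.uniform(size=(100, 10))               # pretend reconstructions

rho, beta = 0.05, 3.0                         # sparsity parameter and penalty weight
rho_hat = H.mean(axis=0)                      # average activation of each hidden unit

J = 0.5 * ((R - X) ** 2).sum(axis=1).mean()   # squared-error cost
J_sparse = J + beta * kl(rho, rho_hat).sum()  # cost with the sparsity penalty added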

This way, the hidden layer of the autoencoder yields a non-linear compact representation of the
input layer. A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise
training. Once the parameters W(1,1), W(1,2), b(1,1), b(1,2) of the first sparse autoencoder are trained, the
parameters W(1,1), b(1,1) of the first layer are used to transform the raw input into non-linear compact
features. Then, these features are used as the input of the second autoencoder. This is repeated for
subsequent layers, using the output of each layer as the input of the next one. This method trains the
parameters of each layer individually while freezing the parameters for the remainder of the model.

As the prediction process is also nonlinear, we add a BP network to the SAE rather than directly making
predictions from the outputs of the SAE. As shown in Fig. 1, after the K − 1 hidden layers have been trained
by the SAE, the (K − 1)-th hidden layer of the deep network is directly connected to a back propagation
neural network, which is responsible for making the prediction. Finally, the back propagation algorithm is
used to produce better results by tuning the parameters of all layers. Considering the imbalance of the data
distribution, we first classify all the data into several categories by K-means clustering. Then, multiple
SAE-BP models are respectively trained on each category of data, which further weakens the non-linearity
of the data and yields more accurate forecasting results.
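
A hedged sketch of this clustering-then-modelling scheme follows; scikit-learn's KMeans is assumed to be available, and a simple linear regressor stands in for the actual SAE-BP network:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
X = rng.uniform(size=(1000, 12))   # placeholder prediction factors
y = X @ rng.uniform(size=12)       # placeholder flow values

K = 4                              # number of clusters, as in the paper
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# one model per category of data; a linear model stands in for SAE-BP here
models = {}
for c in range(K):
    mask = km.labels_ == c
    models[c] = LinearRegression().fit(X[mask], y[mask])

# at prediction time, route each sample to the model of its cluster
x_new = rng.uniform(size=(1, 12))
c_new = km.predict(x_new)[0]
forecast = models[c_new].predict(x_new)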

Data Preparation:

The problem addressed in this paper is to predict the stream flow for the next 6 hours by using data
gathered in previous years. To be more specific, we chose the flow data in flood periods from 1998 to
2010; the sampling rate of the data is one hour. 6482 samples from 1998 to 2008 are selected as training
samples and the 1676 samples in the remaining two years are used as testing samples. We utilize two
hidden layers in the SAE-BP model, with 20 and 15 units respectively. The number of clusters K for
K-means is 4. The selection of prediction factors is very important for intelligent flood forecasting methods
and should follow two principles. First, the selected factors should be those that affect the flooding process
significantly, such as basin rainfall and upstream flow. Second, the values of these factors should be easily
obtainable before flooding. According to these two principles, we choose the rainfall of the earlier 4-7
hours from six upstream stations and the current station as candidate prediction factors. In addition, the
runoffs of the earlier 4-7 hours from the current station and the runoffs of all stations in the last 4 hours are
also selected as prediction factors. In order to avoid the effects of scale in the deep network architecture,
we normalize these prediction factor variables to the [0, 1] interval. The normalization method is shown in
the following equation:

Vi = (αi − min αi) / (max αi − min αi)

αi - the value of the i-th variable to normalize

min αi - the minimum value of the i-th variable in the training set

max αi - the maximum value of the i-th variable in the training set
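
A small illustrative sketch of this min-max normalization (toy arrays; the minimum and maximum are computed on the training set only and then re-used for the test set):

import numpy as np

rng = np.random.default_rng(9)
train = rng.uniform(50, 500, size=(6482, 10))   # placeholder training factors
test = rng.uniform(50, 500, size=(1676, 10))    # placeholder testing factors

a_min = train.min(axis=0)                       # min(alpha_i) over the training set
a_max = train.max(axis=0)                       # max(alpha_i) over the training set

train_norm = (train - a_min) / (a_max - a_min)  # V_i in [0, 1]
test_norm = (test - a_min) / (a_max - a_min)    # same transform applied to the test data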


Evaluation:
To evaluate the performance of the proposed model, we use the mean squared error (MSE)
criterion and the deterministic coefficient (DC), which are defined as follows:

MSE = (1/n) Σ (i = 1 to n) (y0(i) − yc(i))²

DC = 1 − [ Σ (i = 1 to n) (y0(i) − yc(i))² ] / [ Σ (i = 1 to n) (y0(i) − ȳ0)² ]

where y0(i) is the actual measured value, yc(i) is the predicted value, ȳ0 is the mean of the actual values
and n is the total number of samples used for forecasting. In the field of hydrological forecasting, the DC is
generally used as one of the evaluation measures; its range is [0, 1], and the closer it is to 1, the higher the
prediction precision.
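
The two criteria can be computed directly from the observed and predicted series, for example (illustrative NumPy sketch with toy values, not the paper's data):

import numpy as np

def mse(y_obs, y_pred):
    # mean squared error
    return np.mean((y_obs - y_pred) ** 2)

def dc(y_obs, y_pred):
    # deterministic coefficient: 1 - sum of squared errors / variance of observations
    return 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

# toy example values
y_obs = np.array([120.0, 240.0, 560.0, 830.0, 610.0])
y_pred = np.array([130.0, 225.0, 590.0, 800.0, 640.0])
print(mse(y_obs, y_pred), dc(y_obs, y_pred))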


RESULTS
The results of the SAE-BP deep network were compared with those of the following methods: BPNN,
RBFNN, extreme learning machine (ELM) and SVM, as can be observed in Table I. The table shows
the average MSE and DC for the best configuration of each implemented method. From the table, we can
see that the SAE-BP model achieves the best result. As the random initialization of BPNN and ELM leads
to different results every time, we conducted the experiment ten times and took the average.

Table I. Average MSE and DC of each model

MODEL                        MSE     DC
SVM                          4930    0.816
BP Neural Network            6999    0.707
RBF Neural Network           7295    0.695
Extreme Learning Machine     6807    0.715
SAE-BP                       3644    0.848
SAE-BP + K-means             2877    0.880

The MSE and DC can only evaluate the overall prediction results of the models. To analyse the
prediction results in more detail, we illustrate them with line charts. The results indicate that the SVM
model and the RBF network model perform better in the low flow section, while the BP network achieves
better performance in the high flow section. Meanwhile, the ELM model is not stable and shows large
deviations. Compared with these traditional models, SAE-BP makes more accurate predictions, especially
in the high flow section.


CONCLUSION
This paper has presented a deep learning approach based on a stacked autoencoder (SAE) and a back
propagation neural network to predict the stream flow for the next 6 hours. The proposed
approach has been compared with other state-of-the-art methods. The results show that the
integrated algorithm based on SAE and BPNN outperforms the other approaches in terms of MSE and DC
and produces a better line chart. However, there is still much room for improvement of the SAE-BP model,
because the imbalance of the data distribution has not yet been fully taken into consideration.


REFERENCES

1. C. Cheng, W. Niu, Z. Feng, J. Shen and K. Chau, "Daily reservoir runoff forecasting method using
   artificial neural network based on quantum-behaved particle swarm optimization," Water, vol. 7,
   no. 8, pp. 4232-4246.
2. P. Cheeseman and J. Stutz, "Bayesian classification: Theory and results," in U. M. Fayyad,
   G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (Eds.), Advances in Knowledge Discovery and
   Data Mining, AAAI Press / MIT Press, 1996.
3. C. Yang and C. Chen, "Application of integrated backpropagation network and self organizing map
   for flood forecasting," Hydrological Processes.
4. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco,
   2001.
5. F. Liu, F. Xu and S. Yang, "A Flood Forecasting Model Based on Deep Learning Algorithm via
   Integrating Stacked Autoencoders with BP Neural Network," 2017 IEEE Third International
   Conference on Multimedia Big Data (BigMM), Laguna Hills, CA, USA, 2017, pp. 58-61.
   doi: 10.1109/BigMM.2017.29
6. G. Huang, Q. Zhu and C. Siew, "Extreme learning machine: theory and applications,"
   Neurocomputing, vol. 70, no. 1, pp. 489-501, 2006.
7. G. Lin and L. Chen, "A non-linear rainfall-runoff model using radial basis function network,"
   Journal of Hydrology, vol. 289, no. 1, pp. 1-8, 2004.
8. G. Lin and G. Chen, "A systematic approach to the input determination for neural network
   rainfall-runoff models," Hydrological Processes, vol. 22, no. 14, pp. 2524-2530, 2008.
9. G. Lin and M. Wu, "An RBF network with a two-step learning algorithm for developing a reservoir
   inflow forecasting model," Journal of Hydrology, vol. 405, no. 3, pp. 439-450, 2011.


10. M. Wu, G. Lin and H. Lin, "Improving the forecasts of extreme streamflow by support vector
    regression with the data extracted by self-organizing map," Hydrological Processes, vol. 28, no. 2,
    pp. 386-397, 2014.

