
Journal Pre-proofs

Research papers

Long lead-time daily and monthly streamflow forecasting using machine learning methods

M. Cheng, F. Fang, T. Kinouchi, I.M. Navon, C.C. Pain

PII: S0022-1694(20)30836-2
DOI: https://doi.org/10.1016/j.jhydrol.2020.125376
Reference: HYDROL 125376

To appear in: Journal of Hydrology

Received Date: 21 April 2020


Accepted Date: 31 July 2020

Please cite this article as: Cheng, M., Fang, F., Kinouchi, T., Navon, I.M., Pain, C.C., Long lead-time daily and monthly streamflow forecasting using machine learning methods, Journal of Hydrology (2020), doi: https://doi.org/10.1016/j.jhydrol.2020.125376

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.


Long lead-time daily and monthly streamflow forecasting
using machine learning methods
M. Cheng^a, F. Fang^{a,∗}, T. Kinouchi^b, I.M. Navon^c, C.C. Pain^a

^a Applied Modelling and Computation Group, Department of Earth Science and Engineering, Imperial College London, SW7 2BP, UK
^b School of Environment and Society, Tokyo Institute of Technology, Yokohama, Japan
^c Department of Scientific Computing, Florida State University, Tallahassee, FL, 32306-4120, USA

Abstract

Long lead-time streamflow forecasting is of great significance for water resources planning and management in both the short and long terms. Although some studies have applied machine learning methods to streamflow forecasting, only a few have explored the long lead-time forecasting capabilities of these methods or systematically compared model forecasting performance in both the short and long terms. In this work, an artificial neural network (ANN) and a long short-term memory (LSTM) network, a powerful tool for learning long-term temporal dependencies and capturing nonlinear relationships, are adopted to forecast streamflow at daily and monthly scales over a long lead-time period. For long lead-time streamflow forecasting, a recursive forecasting procedure, which takes the last one-step-ahead forecast as a new input for the next-step-ahead forecast, is used in the ANN and LSTM forecasting systems. The two models are trained and validated for streamflow forecasting using the rainfall and runoff datasets collected from the Nan River Basin, Thailand, covering the period 1974 to 2014. To further explore the impact of parameter settings on model performance, two parameters, i.e. the length of the time lag and the maximum number of epochs, are examined in the ANN and LSTM models. The main findings are as follows. First, with an optimal setting of model parameters, both the ANN and LSTM models can provide accurate daily forecasts (up to 20 days ahead). Second, in comparison to the ANN model, the LSTM model exhibits better performance in long lead-time daily forecasting, but less satisfactory performance in multi-monthly forecasting owing to the lack of a large monthly training dataset. Third, the selection of the time lag length and the maximum number of epochs used in both ANN and LSTM modelling is key to long lead-time streamflow forecasting at daily and monthly scales. These findings suggest that the LSTM could be advantageous in daily streamflow forecasting and thus helpful in supporting strategic decisions in water resource management.

∗ Corresponding author. Email address: f.fang@imperial.ac.uk (F. Fang)
Preprint submitted to Elsevier, August 6, 2020


Keywords: Streamflow forecast, Artificial neural network, Long short-term memory, Long lead-time, Machine learning

1. Introduction

Accurate streamflow forecasting plays a crucial role in timely and effective water resource management, irrigation management decisions, flood risk evaluation, release scheduling and many other applications (Ni et al., 2019; Fathian et al., 2019; Tongal and Booij, 2018; Shafizadeh-Moghadam et al., 2018; Yaseen et al., 2015). Streamflow forecasts at hourly, daily, monthly or even longer time scales are important for optimizing the system or planning for future expansion or reduction in the short and long terms (Kisi and Cimen, 2011; Liang et al., 2019). As streamflow exhibits a strong nonlinear dependence on hydrometeorological and anthropogenic factors, it involves tremendous spatial and temporal variability and is difficult to forecast in both the short and long terms (Milly et al., 2005; Nourani and Komasi, 2013; Xiao et al., 2019).

Numerous hydrological models have been developed for forecasting streamflow. However, owing to the numerous sources of uncertainty involved in streamflow forecasting, these models have limited capability in capturing the non-stationary and non-linear characteristics of hydrologic datasets (Nourani et al., 2011; Shortridge et al., 2016; Yaseen et al., 2017; Cheng et al., 2017). To deal with uncertainties in streamflow forecasting, ensemble streamflow forecasting is widely used in hydrological forecasting (Cloke and Pappenberger, 2009; Pappenberger et al., 2012; Fan et al., 2014; Duan et al., 2019). One advantage of ensemble streamflow forecasting is that it contains a number of forecast scenarios for explicitly characterizing forecast uncertainties. An ensemble prediction system can recognise the uncertainty in the initial conditions and perturb them to produce several initial states (Emerton et al., 2016; Boelee et al., 2019). This allows uncertainty-driven streamflow forecasts to be assessed and forecast lead times to be extended (Boelee et al., 2019, 2017). For example, Alfieri et al. (2013) proposed the Global Flood Awareness System (GloFAS), which detects the probabilistic exceedance of warning thresholds and uses ensemble streamflow predictions to provide upcoming flood forecasts in large world river basins. However, it is still difficult to evaluate probabilistic forecasts by relying on a large number of flood scenarios, and this unavoidably raises computational challenges (Cloke and Pappenberger, 2009). Therefore, accurate streamflow forecasting in both the short and long terms remains a challenging task.

In recent decades, data-driven machine learning techniques, such as neural networks (NNs) (Nourani and Komasi, 2013; Noori and Kalin, 2016), support vector machines (SVMs) (Sudheer et al., 2014; Rasouli et al., 2012; Yaseen et al., 2016; Adnan et al., 2019), fuzzy logic (Alvisi and Franchini, 2011), and wavelet transform (WT) (Kisi and Cimen, 2011; Fang et al., 2019), have received considerable attention in streamflow forecasting applications. Various studies have shown that machine learning methods are capable of capturing non-linear processes numerically with no knowledge of the underlying physical processes involved (Yaseen et al., 2015; Rathinasamy et al., 2013; Prasad et al., 2017; Alvisi and Franchini, 2011).

Among these machine learning methods, the artificial neural network (ANN), as a self-learning and self-adaptive function approximator, has shown great ability in modelling nonlinear hydrologic datasets (Nourani and Komasi, 2013). ANNs can recognize the nonlinear relationships between inputs and outputs and reproduce strongly nonlinear relationships well, especially when these relationships are not known or cannot be made explicit a priori (Alvisi and Franchini, 2011). ANNs have been extensively applied to forecast streamflow at various lead-times, such as forecasting one-day-ahead river flow in a semiarid mountain region by He et al. (2014), forecasting daily lake levels up to 3 days ahead by Kisi et al. (2012), and predicting river flow 5 days ahead by Badrzadeh et al. (2013). Although ANNs are widely used, some associated drawbacks, such as over-fitting (Shortridge et al., 2016; Sun et al., 2014) and convergence to local minima (Guo et al., 2011; Kalra et al., 2013), make it difficult to achieve satisfactory forecasting performance at long lead-times when dealing with hydrological time series.

Most recently, long short-term memory (LSTM) has gained significant attention among hydrologists. LSTM, introduced by Hochreiter and Schmidhuber (1997), has proved to be a powerful tool for addressing time-series prediction problems (Hu et al., 2019). Compared to classical NNs, LSTMs are able to capture both the periodic and chaotic behaviours of time series data, and learn their long-range dependencies with greater accuracy (Mouatadid et al., 2019). For example, Kratzert et al. (2018) successfully adopted the LSTM model to describe the rainfall–runoff behaviour of a large number of complex catchments at the daily scale. Ni et al. (2019) developed two hybrid models, based on the traditional LSTM model, for monthly streamflow and rainfall forecasting; their results showed that LSTM was applicable to time series prediction. In addition, Hu et al. (2018) demonstrated that the LSTM model outperformed the ANN model for flood forecasting up to 6 hours ahead. Similarly, Le et al. (2019) explored the capabilities of the ANN and LSTM models for forecasting the one-day, two-day, and three-day ahead flowrate at Hoa Binh. The results revealed that the LSTM model could learn long-term dependencies between sequential data series and exhibited good performance in flood forecasting.

While research on the ANN and LSTM models in the field of streamflow forecasting has developed rapidly, some shortcomings persist. First, most research focuses on streamflow forecasting at a specific time scale, such as daily or monthly, and lacks a systematic comparison of model forecasting performance in both the short and long terms (Yaseen et al., 2015). Second, to obtain streamflow forecasts at different lead-times, most studies have to construct multiple models with different pairs of inputs and outputs at a few successive lags (Hu et al., 2018; Nourani and Partoviyan, 2018). For example, Nourani (2017) constructed three kinds of relationships in three models to obtain forecasts at lead-times of 2, 4 and 7 days, respectively. Although streamflow forecasts up to 6-8 days ahead or 1-2 months ahead were achieved in some research (Fathian et al., 2019), building multiple models is time consuming, and model performance at longer lead-times remains unknown. Third, parameter setting is a key step in the development of machine learning models, especially given the impact of the time lag length on streamflow forecasting (Dehghani et al., 2015). However, the parameter settings of those models have rarely been investigated.

The purpose of this paper is to identify the more robust modelling approach of two popular machine learning methods, namely ANN and LSTM, especially in terms of long lead-time forecasting capability, by assessing both modelling accuracy and precision at daily and monthly scales. The two models are applied to two realistic cases, the Nan River Basin (NRB) and the Ping River Basin (PRB), two of the main subbasins of the Chao Phraya River Basin (CPRB), the heart of Thailand. The main objectives of this paper are: (1) to investigate the impact of parameter settings on model forecasting performance at daily and monthly scales, and enable a screening of the best parameter settings to attain an accurate model; (2) to explore the forecasting capabilities of machine learning methods for a long lead-time at daily and monthly scales; and (3) to compare model forecasting results at daily and monthly scales, provide a deep insight into the quality of model forecasts and explore the way in which various processes are represented.

The remainder of the paper is organized as follows. In section 2, the governing equations of the multi-step-ahead streamflow forecast strategy are briefly formulated, and the daily and monthly forecast models (ANN and LSTM) are introduced in detail. Section 3 presents the study area, the Nan River Basin, and the model development for streamflow forecasting. The forecasting performance of the two models at daily and monthly scales is described in section 4. Finally, conclusions are presented in section 5.
2. Methodology

In this paper, long lead-time streamflow forecasting is conducted using both ANN and LSTM at daily and monthly scales. The ANN and LSTM models are developed to forecast streamflow at several lead-times, such as 1-20 days and 1-12 months. In the model development, the model for one-step-ahead forecasting is first obtained in the training process. A recursive strategy is then applied to the trained models so that the multi-step-ahead forecast can be achieved. The following sections provide a detailed description of the one-step-ahead and multi-step-ahead daily/monthly streamflow forecasting strategy.

2.1. Multi-step-ahead daily and monthly streamflow forecast strategy

2.1.1. One-step-ahead daily and monthly streamflow forecast

In general, a streamflow forecasting model $F$ is used to provide the predictive streamflow $Q_{t+1}$ at a specified time scale, such as daily or monthly, based on historical climatic records. It is generally known that streamflow generation processes are influenced by many factors, including rainfall, evaporation, temperature, etc. (Guo et al., 2011). As rainfall, a natural process with a high degree of variability in both time and space, is the key factor for streamflow, this paper focuses on developing a streamflow forecasting model that depends on rainfall. Given a series of rainfall observations $\{R_{t-M+1}, \ldots, R_t, R_{t+1}\}$ (where $M$ is the length of the time lag), a one-step-ahead streamflow forecast can be given as:

$$\hat{Q}_{t+1} = F(R_{t-M+1}, \ldots, R_t, R_{t+1}), \tag{1}$$

where $\hat{Q}_{t+1}$ represents the predictive streamflow at time $(t+1)$, $R$ denotes the rainfall dataset, and $F$ is the forecasting model.

To obtain satisfactory forecast accuracy, a key problem is how to choose the length of lagged inputs $M$ in Eq. (1). A sufficient number of high-quality observed inputs enables the detection of patterns between rainfall and runoff; however, in practice the length of the inputs is limited by the availability of measurements. In addition, as the time lag length $M$ increases, so too do the number of inputs and the complexity of the forecasting model (Bowden et al., 2005). In this paper, the impact of the time lag length on model forecasting performance is investigated in section 4.1.1 and section 4.2.1.
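As an illustration of how the inputs of Eq. (1) can be assembled, the following minimal Python sketch builds lagged rainfall windows and their one-step-ahead streamflow targets. The function name `make_lagged_samples` and the toy numbers are assumptions introduced here for illustration; they are not taken from the study.

```python
def make_lagged_samples(rainfall, streamflow, lag):
    """Pair each window of lag + 1 rainfall values with the next streamflow.

    The window ending at index t + 1 holds R_{t-M+1}, ..., R_{t+1} (M = lag)
    and its target is Q_{t+1}, mirroring the inputs and output of Eq. (1).
    """
    if len(rainfall) != len(streamflow):
        raise ValueError("rainfall and streamflow must have the same length")
    if lag < 0 or lag >= len(rainfall):
        raise ValueError("lag must satisfy 0 <= lag < len(rainfall)")
    inputs, targets = [], []
    for end in range(lag, len(rainfall)):
        inputs.append(list(rainfall[end - lag:end + 1]))
        targets.append(streamflow[end])
    return inputs, targets


# Toy series (hypothetical values, not data from the Nan River Basin).
rain = [0.0, 5.0, 2.0, 0.0, 8.0, 1.0]
flow = [10.0, 12.0, 15.0, 13.0, 11.0, 18.0]
X, y = make_lagged_samples(rain, flow, lag=2)
print(X[0], y[0])  # first rainfall window and its streamflow target
```

A series of length N yields N - M samples, so a larger time lag gives each sample more inputs but leaves fewer samples to train on.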

2.1.2. Multi-step-ahead daily and monthly streamflow forecast

Multi-step-ahead time series forecasting can be described as the estimation of the future time series $\hat{Q}_{t+h}$ (where $h = 1, \ldots, H$, and $H$ is the total number of predictive steps, also called the lead-time). The most popular forecasting strategy is the recursive (also called iterated) strategy (Taieb et al., 2010; Bontempi et al., 2012). In this study, to achieve the multi-step-ahead streamflow forecast, a single forecasting model $F$ is trained to perform the one-step-ahead forecast described in section 2.1.1. After the learning process, the multi-step-ahead streamflow can be predicted using:

$$
\hat{Q}_{t+h} =
\begin{cases}
F(R_{t-M+1}, \ldots, R_t, R_{t+1}), & h = 1, \\
F(R_{t-M+h}, \ldots, R_{t+1}, \hat{Q}_{t+1}, \ldots, \hat{Q}_{t+h-1}), & h \in \{2, \ldots, M+1\}, \\
F(\hat{Q}_{t-M+h-1}, \ldots, \hat{Q}_{t+h-1}), & h \in \{M+2, \ldots, H\},
\end{cases}
\tag{2}
$$
where $H$ is the total number of predictive steps (the length of the lead-time) and $M$ is the length of the time lag. The last case in Eq. (2) is used for streamflow forecasting when rainfall data are not available during the predictive period. In this paper, the multi-step-ahead streamflow forecast $\hat{Q}_{t+h}$ at daily and monthly scales is obtained through the following steps:

(1) Splitting the rainfall and runoff series into training datasets $(X_{training}, Y_{training})$ in the training period $(T_0, T_p)$ and validation datasets $(X_{validated}, Y_{validated})$ in the validation period $(T_p, T_N)$. Let $X_{training} = \{R_{t-M+1}, \ldots, R_t, R_{t+1}\}$, with $(t+1) \in (T_0, T_p)$, be the inputs and $Y_{training} = Q_{t+1}$ the targeted outputs.

(2) Selecting the time lag length ($M$, the number of lagged rainfall records to use as inputs) to create a combination of inputs and outputs.

(3) Training the forecasting model $F$ with different parameter combinations, e.g. the length of the time lag and the maximum epoch number, and selecting the best parameter combination for one-step-ahead streamflow forecasting.

(4) Selecting the length of the lead-time $H$ in Eq. (2).

(5) Using $X_{validated} = \{R_{t-M+1}, \ldots, R_t, R_{t+1}\}$, with $(t-M) \in (T_p, T_N)$, as new inputs to forecast the next-step-ahead streamflow with the model $F$ trained in step (3).

(6) Combining the one-step-ahead streamflow forecast from step (5) with the available rainfall records as new inputs, and repeating step (5) to obtain the second-step-ahead streamflow forecast.

(7) Repeating steps (5) and (6) to forecast the $H$-step-ahead streamflow in Eq. (2) during $(T_p, T_N)$.
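As a worked illustration of Eq. (2), consider the hypothetical setting $M = 2$ and $H = 5$ (values chosen here only for illustration). Each forecast uses $M + 1 = 3$ inputs, and the rainfall records are progressively replaced by earlier forecasts:

```latex
\begin{align*}
h = 1: \quad & \hat{Q}_{t+1} = F(R_{t-1}, R_t, R_{t+1}), \\
h = 2: \quad & \hat{Q}_{t+2} = F(R_t, R_{t+1}, \hat{Q}_{t+1}), \\
h = 3: \quad & \hat{Q}_{t+3} = F(R_{t+1}, \hat{Q}_{t+1}, \hat{Q}_{t+2}), \\
h = 4: \quad & \hat{Q}_{t+4} = F(\hat{Q}_{t+1}, \hat{Q}_{t+2}, \hat{Q}_{t+3}), \\
h = 5: \quad & \hat{Q}_{t+5} = F(\hat{Q}_{t+2}, \hat{Q}_{t+3}, \hat{Q}_{t+4}).
\end{align*}
```

The first line is the first case of Eq. (2), the lines for $h = 2, 3$ follow the second case, and from $h = M + 2 = 4$ onwards the inputs consist of forecasts only, as in the third case.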

In this work, the ANN and LSTM models are selected as the forecasting model $F$ in Eqs. (1) and (2) for daily and monthly streamflow forecasting with a recursive strategy (as shown in Fig. 1, with details in Algorithm 1). The two models are introduced in detail in section 2.2 and section 2.3, respectively.

figures/forecast_map.png

Fig. 1. Recursive model forecasting strategy at daily and monthly scales.

Algorithm 1 Multi-step-ahead forecast using ANN and LSTM at daily and monthly scales.
ANN and LSTM are used for learning the input-output relationship $F$ in Eq. (2).

(1) Parameter optimization in the training process.
    • Obtain the optimal parameters in the training process.
    • Select the optimal time lag $M$.
    • Select the optimal maximum epoch $E$.

(2) Multi-step-ahead streamflow forecast process.
    • Input $X_{validated}$: sample $M+1$ steps of the rainfall dataset $\{R_{t-M+1}, \ldots, R_t, R_{t+1}\}$.
    • Output $Y_{validated}$: sample one step of the runoff dataset $Q_{t+1}$.
    • Select the forecasting step $H$.
    for h = 1 to H do
        if h = 1 then
            • Predict the next-step-ahead streamflow using the trained function $F$ in Eq. (2):
              $\hat{Q}_{t+h} = F(R_{t-M+h}, \ldots, R_{t+1})$.
        else if h ≤ M + 1 then
            • Predict the next-step-ahead streamflow using the trained function $F$ in Eq. (2):
              $\hat{Q}_{t+h} = F(R_{t-M+h}, \ldots, R_{t+1}, \hat{Q}_{t+1}, \ldots, \hat{Q}_{t+h-1})$.
        else
            • Predict the next-step-ahead streamflow using the trained function $F$ in Eq. (2):
              $\hat{Q}_{t+h} = F(\hat{Q}_{t-M+h-1}, \ldots, \hat{Q}_{t+h-1})$.
        end if
    end for
    • Obtain the length-$H$ forecast $\tilde{Q}$.
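The forecast loop of Algorithm 1 can be sketched in Python as a sliding window over the model inputs. This is a minimal sketch, not the authors' implementation: `recursive_forecast` and `toy_model` are hypothetical names, and the toy model (a plain mean of its inputs) merely stands in for a trained ANN or LSTM.

```python
def recursive_forecast(model, rainfall_window, horizon):
    """Return `horizon` recursive forecasts starting from M + 1 rainfall inputs.

    `model` is any callable mapping a list of M + 1 inputs to one forecast.
    After each step the oldest input is dropped and the newest forecast is
    appended, which reproduces the three cases of Eq. (2) in turn.
    """
    window = list(rainfall_window)  # copy so the caller's data is untouched
    forecasts = []
    for _ in range(horizon):
        q_hat = model(window)
        forecasts.append(q_hat)
        window = window[1:] + [q_hat]
    return forecasts


def toy_model(window):
    """Stand-in for a trained ANN or LSTM: the mean of the inputs."""
    return sum(window) / len(window)


# M = 2, so the initial window holds three (made-up) rainfall values.
forecasts = recursive_forecast(toy_model, [1.0, 2.0, 3.0], horizon=5)
print(forecasts)
```

Because the window always keeps M + 1 entries, a single trained one-step model serves every lead-time, at the cost of feeding forecast errors back into later inputs.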

2.2. Artificial neural network for daily and monthly streamflow forecast

The ANN, a mathematical model of a biologically inspired system, provides a novel and appealing solution to the problem of relating input and output variables in complex systems (Basheer and Hajmeer, 2000). It requires no information about the underlying complex physical process, and instead constructs black-box models of the complex and nonlinear relationships between the input and output variables. In general, an ANN model consists of three typical layers. Each layer contains a number of neurons, which are fully connected with those of the following layer. As illustrated in Fig. S1 (in the supplementary material), the first layer is the input layer, which receives the input data. The hidden layer is the information processing section, which uses weights on the connection links to achieve a nonlinear transformation that determines the output. The output layer receives the processed information from the last hidden layer and then outputs the result. The ANN model can be mathematically formulated as:

$$Y_k = \sigma_k\left[\sum_{j=1}^{m} w_{kj}\, \sigma_j\left(\sum_{i=1}^{n} w_{ij} X_i + b_j\right) + b_k\right], \tag{3}$$

where $X_i$ represents the input value at neuron $i$, $Y_k$ is the output value at neuron $k$, and $\sigma_j$ and $\sigma_k$ denote the activation functions of the hidden and output layers respectively. $n$ and $m$ are the numbers of neurons in the input and hidden layers respectively, $w_{ij}$ is the weight between input neuron $i$ and hidden neuron $j$, $w_{kj}$ is the weight between hidden neuron $j$ and output neuron $k$, and $b_j$ and $b_k$ are the biases of the $j$th neuron in the hidden layer and the $k$th neuron in the output layer, respectively.

To achieve appropriate ANN performance, the parameters, for example the number of neurons, the weights and biases between layers and the number of iterations, should be optimized. In this study, a four-layer ANN consisting of one input layer, two hidden layers and one output layer is established. The sigmoid function ($f(a) = 1/(1 + e^{-a})$) and the hyperbolic tangent function ($f(a) = \tanh(a)$) are employed between the layers. The Levenberg-Marquardt algorithm (Asadi et al., 2013; Alizadeh et al., 2017; Zhang et al., 2018) is selected as the optimization algorithm for training the ANN model, i.e. for adjusting the weights $w$ and biases $b$ between layers. Following the multi-step-ahead forecast steps in section 2.1.2, the ANN forecasting approach at daily and monthly scales (as shown in Fig. 1) is summarized in Algorithm 1.
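The forward pass of Eq. (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a single hidden layer as written in Eq. (3) (the study itself uses two), tanh in the hidden layer and the sigmoid at the output (the text does not say which function is used where), randomly initialised weights, and hypothetical layer sizes. Training with the Levenberg-Marquardt algorithm is not shown.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))


def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Evaluate Eq. (3) for one input vector x.

    w_hidden has shape (m, n) and w_out has shape (k, m), where n, m and k
    are the numbers of input, hidden and output neurons.
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)  # inner sum and sigma_j
    return sigmoid(w_out @ hidden + b_out)     # outer sum and sigma_k


# Hypothetical sizes: 11 inputs (time lag M = 10), 8 hidden neurons, 1 output.
rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 11, 8, 1
w_hidden = rng.normal(size=(n_hidden, n_inputs))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_outputs, n_hidden))
b_out = np.zeros(n_outputs)

x = rng.uniform(size=n_inputs)  # inputs assumed already scaled to 0-1
y = ann_forward(x, w_hidden, b_hidden, w_out, b_out)
print(y.shape)
```

With a sigmoid at the output, the forecast stays within the 0-1 range, so it would need to be rescaled to physical units afterwards.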

2.3. Long short-term memory model (LSTM) for daily and monthly streamflow forecast

The LSTM architecture is composed of special units (called memory blocks) in the recurrent hidden layers. Each memory block contains self-connected memory cells and multiplicative units. The memory cells are used to store the temporal state of the network. The multiplicative units, comprising the input, output and forget gates, control the flow of information between the cells. The input gate controls the flow of inputs into the memory cell, while the output gate controls the output flow of cell activations. The forget gate scales the internal state of the cell. Fig. S2 (in the supplementary material) shows the information flow through an LSTM cell.

The LSTM transition equations are written below:

$$f_s = \sigma(W_f [h_{s-1}, x_s] + b_f), \tag{4}$$

$$i_s = \sigma(W_i [h_{s-1}, x_s] + b_i), \tag{5}$$

$$\tilde{C}_s = \tanh(W_c [h_{s-1}, x_s] + b_c), \tag{6}$$

$$C_s = f_s \odot C_{s-1} + i_s \odot \tilde{C}_s, \tag{7}$$

$$O_s = \sigma(W_o [h_{s-1}, x_s] + b_o), \tag{8}$$

$$h_s = O_s \odot \tanh(C_s), \tag{9}$$

where $i_s$, $O_s$ and $f_s$ are the input, output and forget gates respectively, $W_i$, $W_o$, $W_f$ and $W_c$ represent the weights of the gates and of the candidate cell state, $b_i$, $b_o$, $b_f$ and $b_c$ are the bias terms, $[h_{s-1}, x_s]$ denotes the concatenation of the previous output and the current input, $\sigma$ denotes the logistic sigmoid function, $\tanh$ is the hyperbolic tangent function, $\odot$ is the element-wise (Hadamard) product of two vectors, $C_s$ represents the cell state, $\tilde{C}_s$ is the candidate cell state, and $x_s$ and $h_s$ are the cell input and output respectively.
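To make the transition equations concrete, the following NumPy sketch advances a single LSTM cell by one time step, one line per equation. The layer sizes, the random weights and the short input sequence are assumptions made here for illustration; the sketch does not include training and is not the network configuration used in the study.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_s, h_prev, c_prev, p):
    """Advance one LSTM cell by one time step following Eqs. (4)-(9)."""
    z = np.concatenate([h_prev, x_s])            # [h_{s-1}, x_s]
    f_s = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate, Eq. (4)
    i_s = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate, Eq. (5)
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate state, Eq. (6)
    c_s = f_s * c_prev + i_s * c_tilde           # cell state, Eq. (7)
    o_s = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate, Eq. (8)
    h_s = o_s * np.tanh(c_s)                     # cell output, Eq. (9)
    return h_s, c_s


# Hypothetical sizes: 1 input feature and 4 hidden units.
n_in, n_hidden = 1, 4
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "c", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for value in [0.2, 0.5, 0.1]:  # a short, made-up input sequence
    h, c = lstm_step(np.array([value]), h, c, params)
print(h.shape, c.shape)
```

The `*` operator on NumPy arrays is element-wise, which matches the $\odot$ product in Eqs. (7) and (9).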

In this study, the LSTM forecasting approach consists of one input layer, two hidden layers and one output layer. The LSTM is trained using truncated back propagation through time (BPTT), which uses back propagation to update the parameters iteratively (Werbos, 1990). Following the multi-step-ahead forecast steps in section 2.1.2, the LSTM forecasting approach at daily and monthly scales, which is similar to the ANN forecasting approach, is described in Fig. 1 and Algorithm 1.

3. Application

3.1. Study area

The Chao Phraya River Basin (CPRB), located in the heart of Thailand, is the center of rice production and the regional economy (Wichakul et al., 2013). The Nan River Basin (NRB) and the Ping River Basin (PRB) were selected as two study sites, as shown in Fig. 2. The first study site, the NRB, is one of the major subbasins of the CPRB and covers an area of 11,950 km² (Kinouchi et al., 2018). This basin includes the Sirikit Dam Reservoir, the third largest dam reservoir in Thailand, as shown in Fig. 2. The Sirikit Dam Reservoir supplies water for domestic use, irrigation and power generation. The river flows through the center of Nan province and drains into the reservoir. The basin is in a subtropical monsoon region, with relatively abundant rainfall and a humid climate from May to September. The average annual precipitation over the basin is 900-2000 mm. To determine the relationship between rainfall and runoff, streamflow into the Sirikit Dam Reservoir and rainfall data from gauges (as shown in Fig. 2) were collected at daily and monthly scales from 1974 to 2014. The introduction and results of streamflow forecasts at the second study site, the PRB, are provided in the supplementary material.

figures/map.png

Fig. 2. Location of the Nan River Basin (NRB) and Ping River Basin (PRB).

3.2. Daily and monthly model development

In this study, the ANN and LSTM models are employed to forecast streamflow at daily and monthly scales. The daily and monthly datasets are split into training and validation datasets as shown in Table 1. The model inputs are the available observed rainfall and the predicted streamflow (output) from the previous time levels $(t-M+1, \ldots, t)$ (as shown in Eq. (2)). The targeted output is the streamflow at time level $t+1$. All datasets were scaled to the range 0-1 for consistency with the machine learning-based models, and rescaled back to the original values after the model simulation (Zhang et al., 2018).
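The scaling and rescaling step can be sketched as follows. The text only states that the data were scaled to the range 0-1; min-max normalisation is assumed here as one common way to do this, and the function names and numbers are hypothetical.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used to map a series onto 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Map values onto the range 0-1 using the fitted bounds."""
    return [(v - lo) / (hi - lo) for v in values]


def rescale(scaled, lo, hi):
    """Invert `scale`, returning values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up streamflow values in m3/s, not observations from the study.
train_flow = [120.0, 80.0, 400.0, 240.0]
lo, hi = fit_min_max(train_flow)
scaled = scale(train_flow, lo, hi)
restored = rescale(scaled, lo, hi)
print(scaled, restored)
```

One possible design choice, assumed rather than taken from the paper, is to fit the bounds on the training period only and reuse them for the validation period, so that no information from the validation data enters the scaling.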

Table 1: The training and validation datasets at daily and monthly scales.

                Training period                       Validation period
Daily scale     1 January 1974 to 17 November 2007    18 November 2007 to 20 February 2011
Monthly scale   January 1974 to March 2004            April 2004 to December 2014

To evaluate the impact of the model parameters, different parameter setups are employed in the ANN and LSTM models during the training period in both daily and monthly forecasting. The parameter settings used in the two models are summarized in Table 2.

Table 2: The parameter settings of the ANN and LSTM models for daily and monthly forecasts.

                   Daily forecast (day)                   Monthly forecast (month)
Time lag length    1, 3, 5, 10, 15, 20, 25, 30, 35, 40    1, 2, 3, 4, 5
Epochs (ANN)       100, 200, 300, 400, 500                100, 200, 300, 400, 500
Epochs (LSTM)      10, 20, 30, 40, 50                     10, 20, 30, 40, 50
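Screening the settings in Table 2 amounts to an exhaustive search over pairs of time lag and epoch number. The sketch below shows one way to organise such a search in Python for the daily ANN settings. The function `grid_search` and the placeholder score `toy_rmse` are assumptions introduced here; in a real run the score would come from training a model and computing its validation RMSE, and the placeholder's minimum is arbitrary and does not reproduce any result of the study.

```python
from itertools import product

# Daily settings for the ANN, copied from Table 2.
DAILY_LAGS = [1, 3, 5, 10, 15, 20, 25, 30, 35, 40]
ANN_EPOCHS = [100, 200, 300, 400, 500]


def grid_search(lags, epochs, evaluate):
    """Score every (lag, epochs) pair and return the pair with the lowest score."""
    scores = {(lag, n): evaluate(lag, n) for lag, n in product(lags, epochs)}
    best = min(scores, key=scores.get)
    return best, scores


def toy_rmse(lag, n_epochs):
    """Placeholder score with an arbitrary minimum at lag 15 and 300 epochs."""
    return abs(lag - 15) + abs(n_epochs - 300) / 100.0


best, scores = grid_search(DAILY_LAGS, ANN_EPOCHS, toy_rmse)
print(best, len(scores))
```

Keeping the full `scores` dictionary, rather than only the best pair, makes it straightforward to draw summaries such as box plots per time lag or per epoch number.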

3.3. Model performance evaluation

The performance of the ANN and LSTM forecasting models developed in this study is assessed using standard statistical performance evaluation criteria. The statistical measures used here are the root mean squared error (RMSE), the Nash-Sutcliffe efficiency coefficient (NSE) (Nash and Sutcliffe, 1970), the coefficient of correlation (CC) (Taylor, 1990), and the mean absolute error (MAE) (Legates and McCabe Jr, 1999). These indicators are defined as follows:

(1) Root Mean Squared Error (RMSE):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Q_i^s - Q_i^o)^2}{n}}, \tag{10}$$

(2) Nash–Sutcliffe Efficiency Coefficient (NSE):

$$NSE = 1 - \left[\frac{\sum_{i=1}^{n} (Q_i^o - Q_i^s)^2}{\sum_{i=1}^{n} (Q_i^o - \overline{Q^o})^2}\right], \tag{11}$$

(3) Coefficient of Correlation (CC):

$$CC = \frac{\sum_{i=1}^{n} (Q_i^s - \overline{Q^s})(Q_i^o - \overline{Q^o})}{\sqrt{\sum_{i=1}^{n} (Q_i^s - \overline{Q^s})^2 \sum_{i=1}^{n} (Q_i^o - \overline{Q^o})^2}}, \tag{12}$$

(4) Mean Absolute Error (MAE):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |Q_i^s - Q_i^o|, \tag{13}$$

where $Q_i^s$ and $Q_i^o$ are the $i$th forecast and observed streamflow respectively, $\overline{Q^s}$ and $\overline{Q^o}$ are the average forecast and observed streamflow, and $n$ is the number of data points.
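The four indicators translate directly into code. The following plain-Python sketch follows the standard definitions of RMSE, NSE, CC and MAE; the function names and the toy series are assumptions made here for illustration.

```python
import math


def rmse(sim, obs):
    """Root mean squared error, Eq. (10)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def nse(sim, obs):
    """Nash-Sutcliffe efficiency coefficient, Eq. (11)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def cc(sim, obs):
    """Coefficient of correlation, Eq. (12)."""
    mean_sim = sum(sim) / len(sim)
    mean_obs = sum(obs) / len(obs)
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov / math.sqrt(var_sim * var_obs)


def mae(sim, obs):
    """Mean absolute error, Eq. (13)."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


# Toy series: the forecast has a constant bias of +1 (made-up values).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(rmse(simulated, observed), nse(simulated, observed),
      cc(simulated, observed), mae(simulated, observed))
```

In this toy case the constant bias leaves CC at 1 while NSE drops to about 0.2, which illustrates why the indicators are used together rather than in isolation.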

4. Results and Discussions

In this section, the observed streamflow into the Sirikit Dam Reservoir is compared with the streamflow forecast by the ANN and LSTM models under different model parameters at daily and monthly scales over the validation period.

4.1. Daily streamflow forecast

4.1.1. Modelling parameter optimization in daily streamflow forecast

A number of ANN and LSTM modelling cases have been set up with varying parameters (the length of the time lag and the maximum epoch number). The impact of these modelling parameters on the ANN and LSTM results is demonstrated in Fig. 3 and Fig. 4. As shown in Fig. 3, in ANN modelling the average RMSE in the boxes gradually decreases as the time lag increases from 1 day to 10 days and then remains almost the same (220 m³/s) when the time lag exceeds 10 days, while the length of the time lag has only a slight effect on the results of LSTM modelling. Fig. 4 shows that the average RMSE of both the ANN and LSTM models exhibits a rising trend as the maximum epoch number increases.

The impact of the number of lagged days and epochs used in ANN and LSTM modelling is further assessed using the map of the RMSE of the daily streamflow forecasts shown in Fig. S3 (in the supplementary material, where the boxes are coloured from green to blue in Fig. S3 (a) and (b)). It is observed that the LSTM model achieves better forecast performance than the ANN model. The lowest and highest RMSE values (outlined in red) are 149.25 m³/s and 271.51 m³/s respectively for ANN modelling, and 118.51 m³/s and 199.22 m³/s respectively for LSTM modelling. The optimal settings of the lagged days and the maximum epoch number are provided in Table 3. The best statistical result of ANN modelling in the one-step-ahead forecast is achieved with a time lag of 10 days and 200 epochs. For LSTM modelling, the best statistical results are obtained with a time lag of 25 days and 10 epochs.

Overall, the ANN and LSTM models exhibit different forecasting performance under varying parameter settings. It can be concluded that (1) an optimal time lag can be selected to improve the ANN performance, (2) the effect of the time lag length on the LSTM model is uncertain in the one-day-ahead forecast, and (3) increasing the number of epochs deteriorates the ANN and LSTM performance in the one-day-ahead forecast.
figures/RMSE_lag_daily.png

Fig. 3. Impact of the time lag length on daily streamflow forecasting: RMSE of the streamflow results using (a) ANN and (b) LSTM.
figures/RMSE_epoch_daily.png

Fig. 4. Impact of the maximum number of epochs on daily streamflow forecasting: RMSE of the streamflow results using (a) ANN and (b) LSTM.

Table 3: The optimal parameter settings in daily forecasting.

        Time lag (days)   Epochs   RMSE (m³/s)
ANN     10                200      149.56
LSTM    25                20       118.51
4.1.2. One-day-ahead streamflow forecast

Fig. 5 shows the daily hydrographs at Sirikit Dam Reservoir for the one-day-ahead forecast from 2007 to 2010, obtained using the optimal parameters in Table 3. The forecast streamflow exhibits good agreement with the observed streamflow. Both the ANN and LSTM models capture the daily variability, but they underestimate the streamflow in the dry season and overestimate it in the wet season. Compared with the ANN model, the LSTM model reduces the overestimation in the wet season and matches the observations well (Fig. 5). A comparison between the two models indicates that the LSTM model performs better than the ANN model in daily streamflow forecasting. For example, for the 2008 streamflow forecast, the R² is 0.86 for LSTM modelling but 0.76 for ANN modelling. In addition, the number of epochs required for convergence of the LSTM model is much smaller than that for the ANN model, while the length of the time lag has little effect on the LSTM results. This indicates that the LSTM model is more robust and can extract the nonlinear characteristics of the data more efficiently (Zhang et al., 2018) than the ANN model.

figures/Prediction_daily.png

Fig. 5. Comparison of forecasting and observed daily streamflow using the ANN and LSTM models with the
optimal parameters in Table 3.

4.1.3. Multi-day-ahead streamflow forecast

The ANN and LSTM models have been further applied to multi-day-ahead streamflow forecasting. Comparative plots of the forecasting results obtained from the ANN and LSTM models at lead-times of 1, 2, 3, 4 and 5 days are shown in Fig. 6. A relatively good agreement between the observed and forecast streamflow, especially for the streamflow peak, is achieved at lead-times of 1 and 2 days. As the lead-time increases, the arrival time of the forecast streamflow peaks becomes slightly delayed, which is caused by the accumulation of forecast errors. In contrast to the low streamflow simulated by the LSTM model from 5 October 2008 to 26 October 2008, the streamflow peaks of the ANN model are generally higher than the observed streamflow. In addition, the streamflow predicted by the ANN model exhibits abnormal fluctuations.
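
Error accumulation of this kind is characteristic of recursive multi-step forecasting, in which each forecast is fed back as an input for the next step. The sketch below illustrates the idea under stated assumptions: `one_step_model` stands for any trained one-step-ahead predictor (an ANN or LSTM in this study), and the toy predictor used here is purely hypothetical.

```python
import numpy as np

def recursive_forecast(one_step_model, history, lag, horizon):
    """Forecast `horizon` steps ahead by feeding each prediction back as input."""
    window = list(np.asarray(history, dtype=float)[-lag:])
    forecasts = []
    for _ in range(horizon):
        next_value = float(one_step_model(np.array(window)))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value and append the new forecast.
        # Any error in next_value is therefore carried into all later steps.
        window = window[1:] + [next_value]
    return np.array(forecasts)

# Hypothetical stand-in for a trained model: last value plus the last increment.
def toy_model(window):
    return window[-1] + (window[-1] - window[-2])

preds = recursive_forecast(toy_model, history=[1.0, 2.0, 3.0], lag=2, horizon=3)
```

Because only the first step sees purely observed inputs, forecasts at longer lead-times depend increasingly on earlier forecasts rather than on observations.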

Four indicators, RMSE, NSE, CC and MAE, are chosen to evaluate the multi-day-ahead forecast performance during the validation period (from 18 November 2007 to 20 February 2011). The corresponding evaluation of the forecast streamflow at lead-times of 1-20 days is illustrated in Fig. 7. The LSTM model generally outperforms the ANN model, with smaller RMSE and MAE (Fig. 7 (a) and (d)) and higher NSE and CC (Fig. 7 (b) and (c)), especially at the longer lead-times. The forecast accuracy of both models decays as the forecasting horizon extends further in time (i.e. as the lead-time increases). As the lead-time increases from 1 to 20 days, the RMSE and MAE increase by approximately 254% and 232%, respectively, in ANN modelling, and by approximately 291% and 215%, respectively, in LSTM modelling. The NSE ranges between 0.1 and 0.68 for the ANN model and between 0.33 and 0.78 for the LSTM model. The CC decreases from 0.83 to 0.59 for the ANN model and from 0.89 to 0.64 for the LSTM model as the lead-time increases.
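
For reference, the four indicators can be computed as in the sketch below. It assumes the standard definitions (root mean square error, Nash-Sutcliffe efficiency, Pearson correlation coefficient and mean absolute error). The function names and the toy arrays are illustrative assumptions rather than the study's code or data.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(sim - obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def cc(obs, sim):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(obs, sim)[0, 1])

def percent_increase(first, last):
    """Relative growth of an error measure between two lead-times, in percent."""
    return 100.0 * (last - first) / first

# Toy observed and simulated values (hypothetical).
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.0, 2.0, 3.0, 6.0])
```

Under this definition of percentage increase, a rise of 254% means that the error at the longest lead-time is about 3.54 times its value at the shortest one.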

The results suggest that the LSTM model performs better than the ANN model in long lead-time daily forecasting, which may be explained by the model structure. The LSTM model processes a time series as a sequence, taking one element as input at a time, and stores past temporal information in its memory cell. This helps the LSTM model to capture trends in the data and to exhibit more powerful forecasting capability than the ANN model (Zhang et al., 2018; Kratzert et al., 2018; Le et al., 2019). The memory mechanism allows the network to use information from a few past steps to decide whether or not this information should be passed along to the next iteration (Mouatadid et al., 2019). As the LSTM model processes the input data over many time steps, the input data are used to update a number of parameters in the LSTM internal memory cell states at every step during the training period. During the prediction period, the memory cell states depend only on the input at a specific time step and the states from the last time step (Kratzert et al., 2018). In contrast, the ANN model has no temporal memory and its inputs are assumed to be independent of each other, so it is difficult for it to recognise temporal changes (de la Fuente et al., 2019). Therefore, the memory cells in its structure help the LSTM model to capture trends in the data and to forecast more accurately than the ANN model.
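
The dependence of the cell state on only the current input and the previous states can be illustrated with a single LSTM step. The sketch below assumes the commonly used LSTM formulation with forget, input and output gates. The weight layout, the gate ordering and all names are illustrative assumptions and need not match the implementation used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*H, D), U: (4*H, H), b: (4*H,); gate order f, i, o, g."""
    H = h_prev.size
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])          # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to store
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate memory content
    c_t = f * c_prev + i * g     # new cell state from current input and previous states only
    h_t = o * np.tanh(c_t)       # new hidden state
    return h_t, c_t

# Toy dimensions and zero weights (hypothetical), chosen so the result is easy to check by hand.
D, H = 3, 2
W, U, b = np.zeros((4 * H, D)), np.zeros((4 * H, H)), np.zeros(4 * H)
h1, c1 = lstm_step(np.ones(D), np.zeros(H), np.ones(H), W, U, b)
```

With all weights set to zero, every gate equals 0.5 and the candidate is 0, so the new cell state is simply half of the previous one. A trained network learns gate values that keep or discard memory depending on the input.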

figures/Daily_lead.png

Fig. 6. Forecast performance at the lead-times 1, 2, 3, 4 and 5 days: (a) with the ANN model and (b) with the
LSTM model.

figures/Daily_forecast.png

Fig. 7. Forecast performance with the ANN and LSTM models over the validation period of year 2007 to
2011 at the lead-times of 1-20 days: (a) RMSE, (b) NSE, (c) CC and (d) MAE.

4.2. Monthly streamflow forecast

4.2.1. Modelling parameter optimization in monthly streamflow forecast

To find the optimal modelling parameters for the ANN and LSTM models, we have undertaken a number of test cases with varying time-lag length and epoch number. The error estimates obtained with different modelling parameters are plotted in Fig. 8 and Fig. 9. As shown in Fig. 8, the average RMSE in the boxes diminishes rapidly as the time lag rises from 1 month to 5 months in ANN modelling, while the time-lag length has no obvious impact on the results in LSTM modelling. In contrast, Fig. 9 shows that the average RMSE exhibits a descending trend as the maximum number of epochs increases in LSTM modelling, whereas the number of epochs has no obvious effect on ANN modelling.

Fig. S4 (in the supplementary material) shows the map of the RMSE of the forecast monthly streamflow in ANN and LSTM modelling. Comparing Fig. S4 (a) and (b), it is evident that the ANN model is sensitive to the time-lag selection, while the LSTM model is more sensitive to the maximum epoch number. For example, in ANN modelling, the RMSE decreases from 3800 m³/s (blue box) to 1500 m³/s (yellow box) when the time lag rises from 1 to 5 months with a fixed maximum epoch number in Fig. S4 (a). Similarly, in Fig. S4 (b), the performance of the LSTM model improves as the maximum epoch number gradually increases from 10 to 50 with time lag = 1 month. The lowest and highest RMSE values (outlined in red) are 1660.16 m³/s and 3738.63 m³/s for the ANN model, and 1698.97 m³/s and 2535.57 m³/s for the LSTM model. The highest accuracy statistics of the ANN and LSTM models at the monthly scale are given in Table 4. The best one-month-ahead result for the ANN model is achieved with time lag = 4 months and epochs = 500. For the LSTM model, the best time lag and maximum epoch number are 2 months and 50, respectively. The results indicate that the ANN model is comparable to the LSTM model in monthly streamflow forecasting.

Overall, regarding the influence of different parameter settings on the ANN and LSTM model performance, it can be concluded that (1) increasing the time lag could improve the ANN model accuracy in one-month-ahead forecasting, while its effect on the LSTM model is inconspicuous, and (2) increasing the maximum number of epochs could improve the LSTM accuracy, while its effect on the ANN model remains uncertain in one-month-ahead forecasting.
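
The parameter search described above amounts to evaluating the validation RMSE on a grid of time lags and epoch numbers and keeping the combination with the lowest error. A minimal sketch follows. The grid values and the RMSE array are made-up placeholders for illustration only and are not results from this study.

```python
import numpy as np

def best_setting(time_lags, epoch_counts, rmse_grid):
    """Return (time_lag, epochs, rmse) for the grid cell with the lowest RMSE."""
    rmse_grid = np.asarray(rmse_grid, dtype=float)
    if rmse_grid.shape != (len(time_lags), len(epoch_counts)):
        raise ValueError("rmse_grid must have one row per time lag and one column per epoch count")
    # argmin works on the flattened grid; unravel_index maps it back to (row, column).
    row, col = np.unravel_index(np.argmin(rmse_grid), rmse_grid.shape)
    return time_lags[row], epoch_counts[col], float(rmse_grid[row, col])

# Placeholder grid: rows are time lags (months), columns are maximum epoch numbers.
time_lags = [1, 2, 3]
epoch_counts = [10, 30, 50]
rmse_grid = [[9.0, 8.0, 7.0],
             [6.0, 5.0, 5.5],
             [6.5, 4.5, 5.0]]
lag, epochs, err = best_setting(time_lags, epoch_counts, rmse_grid)
```

Reading the grid row by row shows sensitivity to the epoch number, and reading it column by column shows sensitivity to the time lag, which is the comparison made between panels (a) and (b) of Fig. S4.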

figures/RMSE_lag_monthly.png

Fig. 8. Impact of the time lag on monthly streamflow forecasting: RMSE of the streamflow results using (a) the ANN model and (b) the LSTM model.

figures/RMSE_epoch_monthly.png

Fig. 9. Impact of the maximum number of epochs on monthly streamflow forecasting: RMSE of the streamflow results using (a) the ANN model and (b) the LSTM model.

Table 4: The optimal parameter settings in monthly forecasting.

Model   Time lag (months)   Epochs   RMSE (m³/s)
ANN     4                   500      1660.16
LSTM    2                   50       1598.97

4.2.2. One-month-ahead streamflow forecast

Fig. 10 displays the monthly hydrographs at Sirikit Dam Reservoir for the one-month-ahead forecast from 2004 to 2012, obtained using the optimal parameters in Table 4. The forecast streamflow exhibits good agreement with the observed streamflow. The ANN and LSTM models not only capture the monthly streamflow variability, but also successfully predict the low streamflow in the dry season. During the validation period (from April 2004 to December 2013), the R² is 0.94 and 0.95 for the ANN and LSTM models, respectively.

figures/Prediction_monthly.png

Fig. 10. Comparison of forecasting and observed monthly streamflow using the ANN and LSTM models
with the optimal parameters in Table 4.

4.2.3. Multi-month-ahead streamflow forecasting

The ANN and LSTM models have been further applied to multi-month-ahead streamflow forecasting. Fig. 11 gives comparative plots of the forecasting results obtained from the ANN and LSTM models at lead-times of 1, 2 and 3 months. A relatively good match between the observed and forecast streamflow is obtained from both models in the first- and second-month-ahead forecasts. Low streamflow is accurately forecast, and high streamflow events are also properly captured, although the arrival time of the forecast streamflow peaks is slightly delayed in multi-month-ahead forecasting.

figures/Monthly_lead.png

Fig. 11. Forecast performance at lead-times 1, 2 and 3 months: (a) with the ANN model and (b) with the
LSTM model.

Again, four indicators are used to evaluate the multi-month-ahead forecast performance during the validation period (from April 2004 to December 2014). The results for lead-times of 1-12 months are illustrated in Fig. 12. As the lead-time increases from 1 to 12 months, the RMSE and MAE increase by approximately 499% and 457%, respectively, in ANN modelling, and by approximately 442% and 425%, respectively, in LSTM modelling. Bearing in mind that NSE ≤ 0 indicates that the model forecasts are unreliable, the LSTM model can forecast streamflow accurately only for a one-month lead period (afterwards, NSE ≤ 0), while the ANN model is able to forecast streamflow up to 2 months ahead (Fig. 12(b)). The CC of the forecast streamflow in both the ANN and LSTM models decreases rapidly from 0.97 to < 0.20 over the first 3-month predictive period. Both model performance and forecast reliability diminish as the lead-time increases, as shown in Fig. 12. The main reason for this could be that the autocorrelation of the time series decreases with increasing lead-time, making the series much less predictable (Liu et al., 2014).
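
The decay of autocorrelation with lag can be checked directly on a streamflow series. The sketch below uses the common sample autocorrelation estimator. The function name and the toy series are illustrative assumptions, and no values from the study are used.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a non-negative integer lag."""
    x = np.asarray(series, dtype=float)
    if lag < 0 or lag >= x.size:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    d = x - x.mean()
    # Sum of products between the series and its lagged copy, normalised by the lag-0 sum of squares.
    return float(np.sum(d[:x.size - lag] * d[lag:]) / np.sum(d ** 2))

# Toy alternating series (hypothetical values).
toy = [1.0, -1.0, 1.0, -1.0]
r0 = autocorrelation(toy, 0)
r1 = autocorrelation(toy, 1)
```

A series whose autocorrelation falls quickly toward zero as the lag grows carries little information about values several steps ahead, which limits forecasts at long lead-times.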

The LSTM model increasingly fails to capture flow peaks on some occasions in the third-month-ahead forecast. In contrast to its good performance in long lead-time daily forecasting, the LSTM model fails in multi-month-ahead forecasting. The reason for this is that, in this case study, the amount of observed monthly data available is much smaller than that of daily data. For accurate prediction, the LSTM model requires a large amount of data to learn the long-term dependencies between the input and output datasets during the training process. This finding is similar to that of Kratzert et al. (2018), who found that the data-intensive nature of LSTMs was a potential barrier to applying them in data-scarce problems.

figures/Monthly_forecast.png

Fig. 12. Forecast performance with the ANN and LSTM models at lead-times of 1-12 months: (a) RMSE,
(b) NSE, (c) CC and (d) MAE.

5. Conclusion

In this study, a recursive forecasting framework has been developed to explore the long lead-time forecasting capabilities of the ANN and LSTM models at daily and monthly scales. The impact of parameter selection (the length of the time lag and the maximum number of epochs) on model performance is also explored in ANN and LSTM modelling. The proposed models have been applied to a realistic case study, the Nan River Basin (NRB), one of the major subbasins of the Chao Phraya River Basin (CPRB) in the heart of Thailand. The main findings are as follows:

(1) In daily streamflow forecasting, the LSTM model outperforms the ANN model up to 20 days ahead. The forecast streamflow achieves a relatively good agreement with the observed streamflow, especially for the streamflow peak.

(2) In monthly streamflow forecasting, the situation is reversed: the forecasts of the ANN model turn out to be better than those of the LSTM model as the lead-time increases up to 12 months ahead. High streamflow events could be properly captured by the ANN model, except for a slight delay in the flow peak.

(3) The length of the time lag and the number of epochs exert a great impact on model forecasting performance at daily and monthly scales. This indicates that parameter optimization at different time scales could be effective in enhancing the long lead-time forecasting accuracy of the machine learning models in the short and long terms.

Overall, our study shows that the LSTM model is superior to the ANN model in daily streamflow forecasting for a long lead-time. This could help water resource managers to infer daily river discharge ahead of time in realistic hydrological applications. For monthly streamflow forecasting, however, the accuracy of both models is still limited beyond two or three months ahead. This could be a subject of future research efforts. Providing additional climatic factors, such as temperature and humidity, to the presented models, or developing hybrid models, such as machine learning methods combined with data assimilation, could improve model accuracy in monthly streamflow forecasting for a long lead-time.

Acknowledgments

This work was supported by EPSRC (MAGIC (EP/N010221/1) and INHALE (EP/T003189/1)) and the Royal Society (IEC/NSFC/170563) in the UK. We would like to thank Eishi Kitano for sharing data for this article. The authors acknowledge the reviewers and the Editor for their in-depth, perspicacious comments, which contributed to improving the presentation of this paper.

References

Adnan, R.M., Liang, Z., Heddam, S., Zounemat-Kermani, M., Kisi, O., Li, B., 2019. Least
square support vector machine and multivariate adaptive regression splines for stream-
flow prediction in mountainous basin using hydro-meteorological data as inputs. Journal
of Hydrology , 124371.

Alfieri, L., Burek, P., Dutra, E., Krzeminski, B., Muraro, D., Thielen, J., Pappenberger, F.,
2013. Glofas-global ensemble streamflow forecasting and flood early warning. Hydrol-
ogy and Earth System Sciences 17, 1161.

Alizadeh, M.J., Kavianpour, M.R., Kisi, O., Nourani, V., 2017. A new approach for simu-
lating and forecasting the rainfall-runoff process within the next two months. Journal of
hydrology 548, 588–97.

Alvisi, S., Franchini, M., 2011. Fuzzy neural networks for water level and discharge fore-
casting with uncertainty. Environmental Modelling & Software 26, 523–37.

Asadi, S., Shahrabi, J., Abbaszadeh, P., Tabanmehr, S., 2013. A new hybrid artificial neural
networks for rainfall–runoff process modeling. Neurocomputing 121, 470–80.

Badrzadeh, H., Sarukkalige, R., Jayawardena, A., 2013. Impact of multi-resolution analysis
of artificial intelligence models inputs on multi-step ahead river flow forecasting. Journal
of Hydrology 507, 75–85.

Basheer, I.A., Hajmeer, M., 2000. Artificial neural networks: fundamentals, computing,
design, and application. Journal of microbiological methods 43, 3–31.

Boelee, L., Lumbroso, D., Samuels, P., Stephens, E., Cloke, H., 2017. A review of the
understanding of uncertainty in a flood forecasting system and the available methods of
dealing with it .

Boelee, L., Lumbroso, D.M., Samuels, P.G., Cloke, H.L., 2019. Estimation of uncertainty
in flood forecasts—a comparison of methods. Journal of Flood Risk Management 12,
e12516.

Bontempi, G., Taieb, S.B., Le Borgne, Y.A., 2012. Machine learning strategies for time
series forecasting, in: European business intelligence summer school, Springer. pp. 62–
77.

Bowden, G.J., Dandy, G.C., Maier, H.R., 2005. Input determination for neural network
models in water resources applications. part 1—background and methodology. Journal
of Hydrology 301, 75–92.

Cheng, M., Wang, Y., Engel, B., Zhang, W., Peng, H., Chen, X., Xia, H., 2017. Per-
formance assessment of spatial interpolation of precipitation for hydrological process
simulation in the three gorges basin. Water 9, 838.

Cloke, H., Pappenberger, F., 2009. Ensemble flood forecasting: A review. Journal of
hydrology 375, 613–26.

Dehghani, M., Saghafian, B., Rivaz, F., Khodadadi, A., 2015. Monthly stream flow fore-
casting via dynamic spatio-temporal models. Stochastic environmental research and risk
assessment 29, 861–74.

Duan, Q., Pappenberger, F., Wood, A., Cloke, H.L., Schaake, J., 2019. Handbook of
Hydrometeorological Ensemble Forecasting. Springer.

Emerton, R.E., Stephens, E.M., Pappenberger, F., Pagano, T.C., Weerts, A.H., Wood, A.W.,
Salamon, P., Brown, J.D., Hjerdt, N., Donnelly, C., et al., 2016. Continental and global
scale flood forecasting systems. Wiley Interdisciplinary Reviews: Water 3, 391–418.

Fan, F.M., Collischonn, W., Meller, A., Botelho, L.C.M., 2014. Ensemble streamflow
forecasting experiments in a tropical basin: The são francisco river case study. Journal
of Hydrology 519, 2906–19.

Fang, W., Huang, S., Ren, K., Huang, Q., Huang, G., Cheng, G., Li, K., 2019. Examining
the applicability of different sampling techniques in the development of decomposition-
based streamflow forecasting models. Journal of hydrology 568, 534–50.

Fathian, F., Mehdizadeh, S., Sales, A.K., Safari, M.J.S., 2019. Hybrid models to improve
the monthly river flow prediction: Integrating artificial intelligence and non-linear time
series models. Journal of Hydrology 575, 1200–13.

de la Fuente, A., Meruane, V., Meruane, C., 2019. Hydrological early warning system
based on a deep learning runoff model coupled with a meteorological forecast. Water
11, 1808.

Guo, J., Zhou, J., Qin, H., Zou, Q., Li, Q., 2011. Monthly streamflow forecasting based on
improved support vector machine model. Expert Systems with Applications 38, 13073–
81.

He, Z., Wen, X., Liu, H., Du, J., 2014. A comparative study of artificial neural network,
adaptive neuro fuzzy inference system and support vector machine for forecasting river
flow in the semiarid mountain region. Journal of Hydrology 509, 379–86.

Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9,
1735–80.

Hu, C., Wu, Q., Li, H., Jian, S., Li, N., Lou, Z., 2018. Deep learning with a long short-term
memory networks approach for rainfall-runoff simulation. Water 10, 1543.

Hu, R., Fang, F., Pain, C., Navon, I., 2019. Rapid spatio-temporal flood prediction and
uncertainty quantification using a deep learning method. Journal of Hydrology .

Kalra, A., Ahmad, S., Nayak, A., 2013. Increasing streamflow forecast lead time for
snowmelt-driven catchment based on large-scale climate patterns. Advances in Water
Resources 53, 150–62.

Kinouchi, T., Yamamoto, G., Komsai, A., Liengcharernsit, W., 2018. Quantification of
seasonal precipitation over the upper chao phraya river basin in the past fifty years based
on monsoon and el niño/southern oscillation related climate indices. Water 10, 800.

Kisi, O., Cimen, M., 2011. A wavelet-support vector machine conjunction model for
monthly streamflow forecasting. Journal of Hydrology 399, 132–40.

Kisi, O., Shiri, J., Nikoofar, B., 2012. Forecasting daily lake levels using artificial intelli-
gence approaches. Computers & Geosciences 41, 169–80.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K., Herrnegger, M., 2018. Rainfall–runoff
modelling using long short-term memory (lstm) networks. Hydrol. Earth Syst. Sci 22,
6005–22.

Le, X.H., Ho, H.V., Lee, G., Jung, S., 2019. Application of long short-term memory (lstm)
neural network for flood forecasting. Water 11, 1387.

Legates, D.R., McCabe Jr, G.J., 1999. Evaluating the use of “goodness-of-fit” measures in
hydrologic and hydroclimatic model validation. Water resources research 35, 233–41.

Liang, Z., Xiao, Z., Wang, J., Sun, L., Li, B., Hu, Y., Wu, Y., 2019. An improved chaos
similarity model for hydrological forecasting. Journal of Hydrology 577, 123953.

Liu, Z., Zhou, P., Chen, G., Guo, L., 2014. Evaluating a coupled discrete wavelet transform
and support vector regression for daily and monthly streamflow forecasting. Journal of
hydrology 519, 2822–31.

Milly, P.C., Dunne, K.A., Vecchia, A.V., 2005. Global pattern of trends in streamflow and
water availability in a changing climate. Nature 438, 347–50.

Mouatadid, S., Adamowski, J.F., Tiwari, M.K., Quilty, J.M., 2019. Coupling the maximum
overlap discrete wavelet transform and long short-term memory networks for irrigation
flow forecasting. Agricultural Water Management 219, 72–85.

Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part
i—a discussion of principles. Journal of hydrology 10, 282–90.

Ni, L., Wang, D., Singh, V.P., Wu, J., Wang, Y., Tao, Y., Zhang, J., 2019. Streamflow and
rainfall forecasting by two long short-term memory-based models. Journal of Hydrology
, 124296.

Noori, N., Kalin, L., 2016. Coupling swat and ann models for enhanced daily streamflow
prediction. Journal of Hydrology 533, 141–51.

Nourani, V., 2017. An emotional ann (eann) approach to modeling rainfall-runoff process.
Journal of Hydrology 544, 267–77.

Nourani, V., Kisi, Ö., Komasi, M., 2011. Two hybrid artificial intelligence approaches for
modeling rainfall–runoff process. Journal of Hydrology 402, 41–59.

Nourani, V., Komasi, M., 2013. A geomorphology-based anfis model for multi-station
modeling of rainfall–runoff process. Journal of Hydrology 490, 41–55.

Nourani, V., Partoviyan, A., 2018. Hybrid denoising-jittering data pre-processing ap-
proach to enhance multi-step-ahead rainfall–runoff modeling. Stochastic environmental
research and risk assessment 32, 545–62.

Pappenberger, F., Dutra, E., Wetterhall, F., Cloke, H.L., 2012. Deriving global flood hazard
maps of fluvial floods through a physical model cascade. Hydrology and Earth System
Sciences 16, 4143–56.

Prasad, R., Deo, R.C., Li, Y., Maraseni, T., 2017. Input selection and performance opti-
mization of ann-based streamflow forecasts in the drought-prone murray darling basin
region using iis and modwt algorithm. Atmospheric Research 197, 42–63.

Rasouli, K., Hsieh, W.W., Cannon, A.J., 2012. Daily streamflow forecasting by machine
learning methods with weather and climate inputs. Journal of Hydrology 414, 284–93.

Rathinasamy, M., Adamowski, J., Khosa, R., 2013. Multiscale streamflow forecasting
using a new bayesian model average based ensemble multi-wavelet volterra nonlinear
method. Journal of Hydrology 507, 186–200.

Shafizadeh-Moghadam, H., Valavi, R., Shahabi, H., Chapi, K., Shirzadi, A., 2018. Novel
forecasting approaches using combination of machine learning and statistical models for
flood susceptibility mapping. Journal of environmental management 217, 1–11.

Shortridge, J.E., Guikema, S.D., Zaitchik, B.F., 2016. Machine learning methods for em-
pirical streamflow simulation: a comparison of model accuracy, interpretability, and un-
certainty in seasonal watersheds. Hydrology & Earth System Sciences 20.

Sudheer, C., Maheswaran, R., Panigrahi, B.K., Mathur, S., 2014. A hybrid svm-pso model
for forecasting monthly streamflow. Neural Computing and Applications 24, 1381–9.

Sun, A.Y., Wang, D., Xu, X., 2014. Monthly streamflow forecasting using gaussian process
regression. Journal of Hydrology 511, 72–81.

Taieb, S.B., Sorjamaa, A., Bontempi, G., 2010. Multiple-output modeling for multi-step-
ahead time series forecasting. Neurocomputing 73, 1950–7.

Taylor, R., 1990. Interpretation of the correlation coefficient: a basic review. Journal of
diagnostic medical sonography 6, 35–9.

Tongal, H., Booij, M.J., 2018. Simulation and forecasting of streamflows using machine
learning models coupled with base flow separation. Journal of hydrology 564, 266–82.

Werbos, P.J., 1990. Backpropagation through time: what it does and how to do it. Proceed-
ings of the IEEE 78, 1550–60.

Wichakul, S., Tachikawa, Y., Shiiba, M., Yorozu, K., 2013. Development of a flow routing
model including inundation effect for the extreme flood in the chao phraya river basin,
thailand 2011. Journal of Disaster Research 8, 415.

Xiao, Z., Liang, Z., Li, B., Hou, B., Hu, Y., Wang, J., 2019. New flood early warning and
forecasting method based on similarity theory. Journal of Hydrologic Engineering 24,
04019023.

Yaseen, Z.M., Ebtehaj, I., Bonakdari, H., Deo, R.C., Mehr, A.D., Mohtar, W.H.M.W., Diop,
L., El-Shafie, A., Singh, V.P., 2017. Novel approach for streamflow forecasting using a
hybrid anfis-ffa model. Journal of Hydrology 554, 263–76.

Yaseen, Z.M., El-Shafie, A., Jaafar, O., Afan, H.A., Sayl, K.N., 2015. Artificial intelligence
based models for stream-flow forecasting: 2000–2015. Journal of Hydrology 530, 829–
44.

Yaseen, Z.M., Kisi, O., Demir, V., 2016. Enhancing long-term streamflow forecasting and
predicting using periodicity data component: application of artificial intelligence. Water
resources management 30, 4125–51.

Zhang, D., Lin, J., Peng, Q., Wang, D., Yang, T., Sorooshian, S., Liu, X., Zhuang, J.,
2018. Modeling and simulating of reservoir operation using the artificial neural network,
support vector regression, deep learning algorithm. Journal of hydrology 565, 720–36.
