
Journal of Hydrology: Regional Studies 46 (2023) 101357


Application of deep learning algorithms to confluent flow-rate forecast with multivariate decomposed variables
Njogho Kenneth Tebong a,d,*, Théophile Simo d,e, Armand Nzeukou Takougang d, Alain Tchakoutio Sandjon b,c,d, Ntanguen Patrick Herve a,d

a Research Unit Condensed Matter, Electronics and Signal Processing, Department of Physics, Faculty of Sciences, University of Dschang, PO Box 67, Dschang, Cameroon
b Department of Computer Science including Basic Sciences, Higher Technical Teachers' Training College Kumba, University of Buea, P.O. Box 249, Buea Road, Kumba, Cameroon
c Laboratory of Environmental Modeling and Atmospheric Physics, University of Yaoundé 1, Yaoundé, Cameroon
d Laboratory of Industrial Systems and Environmental Engineering, Fotso Victor University Institute of Technology, University of Dschang, Bandjoun, Cameroon
e Institut Universitaire de Technologie Fotso Victor de Bandjoun, B.P. 134, Bandjoun, Cameroon

* Corresponding author at: Research Unit Condensed Matter, Electronics and Signal Processing, Department of Physics, Faculty of Sciences, University of Dschang, PO Box 67, Dschang, Cameroon. E-mail address: kennethtebong@yahoo.com (N.K. Tebong).

https://doi.org/10.1016/j.ejrh.2023.101357
Received 29 November 2022; Received in revised form 18 February 2023; Accepted 2 March 2023; Available online 4 March 2023.
2214-5818/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Confluent, Flow rate, Time series, Forecast, Deep learning, Statistical indexes

ABSTRACT

Study region: The Song bengue confluent in Cameroon regulates the river flow rate for hydro energy production with input from four upstream reservoirs.

Study focus: Deep learning models forecast the one-day-ahead flow rate of the Song bengue confluent. Decomposed multivariate time-series variables of the confluent flow rate and precipitation, together with the upstream reservoir inflows, outflows, and precipitation, are used. Different windows and horizons for the forecast are analyzed using deep learning models, and a comparative study among the models is carried out. The input parameters are decomposed, and different partitions of the decomposed components are used as scenarios to identify the best partition.

New hydrological insights: A 7-day window with a 1-day forecast horizon yields the lowest error. The dense model is the best among the models, followed by the long short-term memory (LSTM) model and, lastly, the one-dimensional convolutional neural network (Conv1D), based on mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and Nash-Sutcliffe efficiency (NSE). The scenario using all decomposed variables produces the best result, with about a 50% difference in error margin. The second-best result is obtained using only undecomposed data. The remainder component should not be ignored, as it contains important hydrological information.

1. Introduction

The river flow rate is essential for the efficient management of a river basin (Patel et al., 2015). It equally helps in decision-making
for maximum water utilization. Understanding the flow rate of a river is necessary for the maximization of hydro energy production
(Ahmad and Hossain, 2019). Machine learning methods have been applied to study river flow rate and reservoir inflow forecasting to
maximize hydroelectricity production (Jothiprakash et al., 2012, Wang et al., 2014, Cheng et al., 2015, Rezaie-Balf et al., 2019,
Bashir et al., 2019, Pini et al., 2020, Bordin et al., 2020, Zhou et al., 2022, Sharifi et al., 2022). Recurrent neural networks (RNN) with
ordinary differential equations have been applied to forecast reservoir inflow rate with better results obtained compared to traditional
neural networks (Zhou and Li, 2021). RNN models produce well-correlated results over short intervals, but they become less efficient as the interval increases. This limitation has been addressed by Long Short-Term Memory (LSTM) neural networks, which give good results even over longer intervals (Debasish, 2022).
Ensemble methods, which exploit the strengths of different models, have been applied in machine learning to increase the accuracy of the predicted parameter. The advantages of ensemble models for forecasting include reduced overfitting and better generalization (Bai et al., 2016; Yang et al., 2017), improved uncertainty handling (Zhao et al., 2018), robustness (Yu et al., 2018), better performance with limited data (Rezaie-Balf et al., 2019), and the ability to combine multiple models with different architectures or parameters (Kishore et al., 2020; Hong et al., 2020; Luo et al., 2020; Li et al., 2021; Zhang et al., 2021; Gupta et al., 2022).
Efficient preprocessing of the input data before fitting the model, in addition to good model building, improves the correlation between predicted and measured parameters (Yousefi et al., 2022). Qi et al. (2019) applied an ensemble learning model to each component of the decomposed input variable and used an LSTM deep learning model to forecast each component before combining the results into a day-ahead reservoir-inflow forecast. Data preprocessing and efficient model building greatly improve the correlation between measured and predicted flow rates used to maximize hydroelectricity production (Ghayekhloo et al., 2015; Jaramillo-Morán et al., 2021; Meydani et al., 2022; Yousefi et al., 2022).
The electrical energy sector of Cameroon is highly hydro-dominant, with hydropower supplying 96% of the electricity generated in the country (Djiela et al., 2021). Despite this high potential, blackouts remain frequent, even though the nation's hydro potential, if fully exploited, could meet its energy demand and that of its neighboring countries (Simo et al., 2006). Good management of the Sanaga watershed, the largest watershed in the nation, is therefore essential to meet its energy needs (Tengeleng et al., 2014). Tengeleng et al. (2014) used a neural network to forecast the monthly river flow of the Song Bengue River and showed that a window size of two months gave a strong correlation between the forecast and the measured values.
Several time-series decomposition methods exist for preprocessing, such as classical decomposition, X11 decomposition, and SEATS (Seasonal Extraction in ARIMA Time Series) decomposition. Although classical decomposition is widely used, it has drawbacks: the trend-cycle estimate is unavailable for the first and last few observations, and its seasonal period is fixed at one year, which is not the case for all time-series data. The X11 decomposition method overcomes these shortcomings and is robust to outliers, but it can handle only monthly and quarterly data. Like X11, the SEATS decomposition method cannot work with daily, hourly, or weekly seasonality, as it is restricted to monthly and quarterly data. Because of these limitations, we apply Seasonal-Trend decomposition using LOESS (STL) for time-series decomposition. LOESS (locally estimated scatterplot smoothing) is a method for estimating nonlinear relationships. STL handles any kind of seasonality, the smoothness of the trend cycle can be managed by the user, and the seasonal component is allowed to change over time, with the rate of change controlled by the user (Hyndman and Athanasopoulos, 2018). Other decomposition methods also exist: empirical mode decomposition (EMD) is a data-driven method that decomposes a time series into intrinsic mode functions (IMFs), which are defined by their extrema and zero-crossings; EMD extracts the high-frequency IMFs first and the low-frequency ones last. Variational mode decomposition (VMD) is a non-linear, non-parametric method that decomposes a signal into a sum of IMFs; it is similar to EMD but uses variational optimization, is more robust to noise, and can extract more modes than EMD (Qi et al., 2019; Liu et al., 2022). The strong seasonality of the input data led to the application of STL for decomposition in this study.
A common limitation of the above-cited models is that they do not decompose the input variables and partition the decomposed components, which would allow a better understanding of the relationships among the decomposed variables and a more efficient flow-rate forecast. This preprocessing approach helps in understanding the hydrological correlation between each decomposed component and its impact on the forecast variable. As a contribution to solving this problem, different window sizes and horizons are first tested to determine a good correlation between measured and forecast values. A comparative study among deep learning algorithms is then carried out to identify the one with the smallest statistical errors. Next, the input data are decomposed using Seasonal-Trend decomposition with LOESS (STL), and the forty-two decomposed variables are partitioned and used as inputs to the deep learning models to study the correlation between each parameter and the predicted flow rate. The approach is applied to the flow rate of the Song Bengue confluent, which has four upstream reservoirs regulating its flow rate and two downstream hydro production plants.
This study is organized as follows: Section 1 gives the introduction, Section 2 describes the materials and study area, Section 3 presents the methodology, Section 4 summarizes the findings, Section 5 discusses the results, and Section 6 concludes the work.

2. Materials

Song bengue is located in the Sanaga watershed in Cameroon. It is a confluent that receives water from four upstream reservoirs, namely Mbakaou, Lom pangar, Mape, and Bamenjin, situated 637 km, 466 km, 372 km, and 394 km from the confluent, respectively. The study zone lies between the geographical coordinates (3.51°N, 10.10°E) and (6.33°N, 14.42°E). Songloulou and Edea are hydropower plants located downstream of Song bengue; they supply energy to the entire Southern Interconnected Grid (SIG) of Cameroon. All the data used for this study were collected from Eneo (Energy of Cameroon), the main electricity distribution company in Cameroon. Daily historical data from 2015 to 2020 were used for training and testing the models.
The fourteen undecomposed variables used in this study for the multivariate flow-rate forecast are the flow rate (m3/s) and precipitation (mm) of Song bengue; the inflows (m3/s) of the four upstream reservoirs, namely the Bamenjin, Lom pangar, Mape, and Mbakaou inflows; their outflows (m3/s), namely the Bamenjin, Lom pangar, Mape, and Mbakaou outflows; and their precipitation (mm), namely the Bamenjin, Lom pangar, Mape, and Mbakaou precipitation. The reservoirs serve to regulate the flow rate in the Sanaga watershed for hydroelectricity production. The flow rate and precipitation at the study site are shown in Fig. 1, and Fig. 2 represents the study site. Fig. 3, Fig. 4, and Fig. 5 represent the upstream reservoir inflows, outflows, and precipitation, respectively. A strong seasonality can be seen in Fig. 1, Fig. 3, Fig. 4, and Fig. 5. The Song bengue confluent receives input from the four upstream reservoirs, which regulate its flow rate for hydroelectricity production. The seasonality in the figures reflects the alternation of the rainy and dry seasons in Cameroon and the stochastic nature of the demand for hydroelectricity from the SIG (Simo et al., 2007): the inflows and precipitation follow the seasonal cycle, while the reservoir outflows depend on the demand for hydroelectricity.

Fig. 1. Daily (a) flow rate and (b) precipitation of Song Bengue from 2015 to 2020.
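To make the data layout concrete, the following sketch shows one way the fourteen daily series for 2015-2020 could be organized before modeling; the file name and column names are placeholders, since the Eneo dataset is not publicly available.

```python
import pandas as pd

# Hypothetical CSV with one row per day (2015-2020) and the 14 variables described above.
cols = (
    ["flow_rate_sbg", "precip_sbg"]
    + [f"inflow_{r}" for r in ("bamenjin", "lom_pangar", "mape", "mbakaou")]
    + [f"outflow_{r}" for r in ("bamenjin", "lom_pangar", "mape", "mbakaou")]
    + [f"precip_{r}" for r in ("bamenjin", "lom_pangar", "mape", "mbakaou")]
)
data = pd.read_csv("sanaga_daily_2015_2020.csv", parse_dates=["date"], index_col="date")[cols]
print(data.shape)  # expected roughly (2192 days, 14 variables)
```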

3. Methodology

Deep learning models (DLMs) are a subset of machine learning models that use multiple layers of artificial neural networks to learn representations of data. These layers, called hidden layers, allow DLMs to learn hierarchical and therefore more powerful representations of the data. DLMs are typically trained with a variant of the backpropagation algorithm, which adjusts the parameters of the model to minimize the error between the model's forecast and the actual training data (Goodfellow et al., 2016; Xu et al., 2017; Anubala et al., 2018; Mezzini et al., 2019; Schons et al., 2018). The following deep learning models were used for forecasting the flow rate.

3.1. Dense model

For a neural network (NN), a dense layer is a layer whose neurons are all connected to those of the previous layer. The neurons in a dense layer receive as inputs the outputs of the preceding layer and perform a vector-matrix multiplication, in which the dimension of the input vector from the preceding layer must match the corresponding dimension of the dense layer's weight matrix (Goodfellow, 2016; Luo et al., 2020). The dense network is shown in Fig. 6.
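As an illustration, a dense network matching the hyperparameters reported in Table 1 (two hidden layers of 1000 and 100 ReLU neurons, one output neuron, Adam optimizer, MAE loss, 100 epochs, batch size 128) could be defined as below. This is a sketch rather than the authors' exact code, and the flattened 7-day window of 14 variables as input shape is an assumption based on Sections 2 and 4.1.

```python
import tensorflow as tf

WINDOW, N_FEATURES = 7, 14  # assumed: 7 past days of the 14 undecomposed variables

# Dense model following the hyperparameters reported in Table 1.
dense_model = tf.keras.Sequential([
    tf.keras.layers.Dense(1000, activation="relu",
                          input_shape=(WINDOW * N_FEATURES,)),  # first hidden layer
    tf.keras.layers.Dense(100, activation="relu"),              # second hidden layer
    tf.keras.layers.Dense(1),                                   # one-day-ahead flow rate
])
dense_model.compile(optimizer="adam", loss="mae")
# dense_model.fit(X_train, y_train, epochs=100, batch_size=128, verbose=0)
```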


Fig. 2. Study area of Song Bengue confluent with the upstream reservoirs and downstream hydro plants.

3.2. One-dimensional convolutional neural network (1D CNN) model

Convolutional neural networks (CNNs) are deep learning tools with many applications, depending on the input data. In a 1D CNN, the kernel slides in one direction over two-dimensional data; this variant is mostly applied to time-series prediction. In a 2D CNN, the kernel slides in two directions over three-dimensional data and is mostly used on image data. In a 3D CNN, the kernel slides in three directions over four-dimensional data such as CT scans. A CNN typically has two kinds of layers, namely the convolution layers and the multi-layer feed-forward (MLFF) layers, as shown in Fig. 7 (Qazi et al., 2022). The input layer precedes the convolution layer, which precedes the MLFF layer. Convolutions (sliding the kernel over the input causally or non-causally) and other feature-extraction operations are performed in the convolution layer, while the MLFF (dense-like) layers act as the decision block (Goodfellow, 2016; Anubala et al., 2018; Mezzini et al., 2019). In this work, a 1D CNN is used for the flow-rate time-series forecasting, implemented with TensorFlow (a free, open-source, end-to-end library for numerous machine learning functions) and Keras (a high-level neural network library that runs on top of TensorFlow).
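A Conv1D model consistent with the hyperparameters of Table 1 (128 filters, kernel size 5, causal padding) could be sketched as follows; the Flatten layer and the dense decision block after the convolution layer, as well as the input shape, are assumptions in line with the MLFF description above, not a verbatim reproduction of the authors' code.

```python
import tensorflow as tf

WINDOW, N_FEATURES = 7, 14  # assumed input shape: 7 past days of 14 variables

# Conv1D model following Table 1: 128 filters, kernel size 5, causal padding.
conv1d_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=128, kernel_size=5, padding="causal",
                           activation="relu", input_shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.Flatten(),                      # feed extracted features to the MLFF block
    tf.keras.layers.Dense(100, activation="relu"),  # dense-like decision block (assumed)
    tf.keras.layers.Dense(1),                       # one-day-ahead flow rate
])
conv1d_model.compile(optimizer="adam", loss="mae")
```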


Fig. 3. Upstream reservoir inflows in (m3/s): (a) Bamenjin, (b) Lom pangar, (c) Mape, and (d) Mbakaou.

3.3. Long short-term memory model

Long Short-Term Memory (LSTM) is an improved version of the recurrent neural network that addresses the long-term memory problem of RNNs: RNNs learn efficiently over short intervals, but their efficiency drops when the intervals become long. LSTM is applied in time-series data processing, prediction, and classification (Debasish, 2022). An LSTM consists of a cell, an input gate, a forget gate, and an output gate. The three gates control the flow of information into and out of the cell, and the cell remembers values over arbitrary time intervals. Fig. 8 shows the basic structure of an LSTM cell.
The input gate determines which of the input values should be used to update the memory. The sigmoid function decides whether values pass through the gate (outputs between 0 and 1), and the tanh function weights the supplied data on a scale from −1 to 1. Eq. (1) and Eq. (2) give the input gate and the outcome of the cell, respectively:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)   (1)

C_t = tanh(W_c · [h_{t−1}, x_t] + b_c)   (2)

The forget gate determines which details are removed from the block using the sigmoid function. It is characterized by Eq. (3):

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)   (3)

For the output gate, the sigmoid function again outputs values between 0 and 1, and the tanh function weights the values on a scale from −1 to 1. The gate and the layer output are characterized by Eq. (4) and Eq. (5):

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)   (4)

h_t = o_t ∗ tanh(C_t)   (5)

where i_t, o_t, and f_t represent the input, output, and forget gates at time t; W_i, W_f, W_o, and W_c are the weights connecting the hidden-layer input to the input, output, forget, and cell gates; b_i, b_f, b_o, and b_c are bias vectors; and C_t and h_t are the outcome of the cell and the outcome of the layer, respectively (Goodfellow, 2016; Li et al., 2021; Kim et al., 2022).

Fig. 4. Upstream reservoir outflows in (m3/s): (a) Bamenjin, (b) Lom pangar, (c) Mape, and (d) Mbakaou.

Fig. 5. Upstream reservoir precipitation in (mm): (a) Bamenjin, (b) Lom pangar, (c) Mape, and (d) Mbakaou.
LSTM was implemented in Python for the flow-rate forecast of the Song Bengue confluent.
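To make the gate equations concrete, the following numpy sketch performs one LSTM cell step corresponding to Eqs. (1)-(5). It is purely illustrative: the weights are random, the dimensions are hypothetical, and the cell-state update (which the equations above do not write out explicitly) is included for completeness; the forecasts in this work use the Keras LSTM layer with the hyperparameters of Table 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step. W and b hold the weights/biases of the input (i),
    forget (f), output (o), and cell (c) gates; z is the concatenated [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. (1): input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # Eq. (2): candidate cell outcome
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. (3): forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # Eq. (4): output gate
    c_t = f_t * c_prev + i_t * c_hat        # cell-state update (added for completeness)
    h_t = o_t * np.tanh(c_t)                # Eq. (5): layer output
    return h_t, c_t

# Tiny example with hypothetical sizes: 14 input features, 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 14, 8
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```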


Fig. 6. A NN with two dense layers in the hidden layer.

Fig. 7. A typical Convolution Neural Network structure.

Fig. 8. LSTM architecture.

3.4. Models parameters

The models were implemented in Python using TensorFlow with Keras and possess the general and model-specific parameters shown in Table 1. TensorFlow is an open-source machine-learning library for numerical computation and large-scale machine learning. Keras is a high-level neural network library written in Python that runs on top of TensorFlow for building and training deep learning models. Table 1 presents the parameters and hyperparameters of the deep models.

3.5. Time series decomposition

From the plot of the historical flow rate in Fig. 1(a), we observed that the amplitude of the seasonal component is almost constant over time. For this reason, we chose the additive time-series decomposition over the multiplicative decomposition (Mills, 2019). Fig. 9 shows the proposed methodology, and Fig. 10 shows the time-series decomposition of the flow-rate data into seasonal, trend, and random (remainder) components.


Table 1
Model parameters and hyperparameters.

General parameters for the dense model, Conv1D, and LSTM:
Activation function: ReLU. Optimizer: Adam. Number of hidden layers: 2. Epochs: 100. Loss function: mean absolute error. Batch size: 128.

Dense model:
Neurons in the first hidden layer: 1000. Neurons in the second hidden layer: 100. Neurons in the output layer: 1.

Conv1D model:
Filters: 128. Kernel size: 5. Padding: causal.

LSTM:
Neurons in the first hidden layer: 1000. Neurons in the second hidden layer: 100. Verbose: 0.

Fig. 9. Overview of the proposed framework.

Eq. (6) represents the additive decomposition of the time series into its seasonal, trend, and random components:

y_x = S_x + T_x + R_x   (6)

where y_x is the data and S_x, T_x, and R_x are its seasonal, trend, and random components, respectively. All the reservoir inflows, outflows, and precipitation, together with the flow rate and precipitation of Song Bengue, were decomposed with STL, as shown in Fig. 10. STL was chosen for its versatility and robustness: it handles any type of seasonality, and the rate of change of the seasonal and trend components can be controlled by the forecaster. Following the proposed methodology of Fig. 9, the input data are decomposed with STL, each component is partitioned, and the partitions are used as inputs to the deep learning models for forecasting the river flow rate; the scenario with the least error is chosen as the optimum for flow-rate forecasting. The objective is to minimize the error between the model's forecast and the actual flow-rate data. Table 2 shows the maximum, minimum, standard deviation (STD), and mean values of the decomposed input variables.
Fig. 10 presents the flow-rate time series of Song bengue (Fig. 10(a)) decomposed into its random, seasonal, and trend components, shown in Fig. 10(b), Fig. 10(c), and Fig. 10(d), respectively. All fourteen input variables are decomposed into these three components and grouped into scenarios for forecasting, and Table 2 gives the statistical characteristics of the resulting forty-two decomposed variables. For the flow rate, the mean, standard deviation, minimum, and maximum values are 1810.99 m3/s, 1005.92 m3/s, 820 m3/s, and 7911 m3/s, respectively. Each partition of the decomposed components is combined with the flow rate before fitting the multivariate data to the deep learning model, and the partition with the minimum error is considered the optimum set of input variables for the flow-rate forecast. 80% of the data were used to train the models and 20% to test them.
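The scenario construction described above can be sketched as follows: every input variable is STL-decomposed, the components are grouped into a partition (for example, seasonal only), and an 80/20 chronological split is applied. The column-naming scheme and helper functions below are illustrative assumptions, not the authors' code.

```python
import pandas as pd
from statsmodels.tsa.seasonal import STL

def decompose_all(data: pd.DataFrame, period: int = 365) -> pd.DataFrame:
    """STL-decompose every input column into trend/seasonal/random components."""
    parts = {}
    for col in data.columns:
        res = STL(data[col], period=period, robust=True).fit()
        parts[f"{col}_trend"] = res.trend
        parts[f"{col}_seasonal"] = res.seasonal
        parts[f"{col}_random"] = res.resid
    return pd.DataFrame(parts, index=data.index)

def build_scenario(decomposed: pd.DataFrame, keep: tuple) -> pd.DataFrame:
    """Keep only the requested components, e.g. keep=('seasonal',) or ('trend', 'random')."""
    cols = [c for c in decomposed.columns if c.rsplit("_", 1)[-1] in keep]
    return decomposed[cols]

# Example usage, where 'data' is the 14-column daily DataFrame from Section 2:
# decomposed = decompose_all(data)                       # 42 decomposed variables
# scenario = build_scenario(decomposed, ("seasonal",))   # "Seasonal only" scenario
# split = int(0.8 * len(scenario))                       # 80% train / 20% test, chronological
# train, test = scenario.iloc[:split], scenario.iloc[split:]
```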

3.6. Evaluation criteria for the deep learning models

In this section, four evaluation criteria are used to assess the forecasting ability of the different models and of the partitioned inputs: the Mean Absolute Error (MAE), the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE) (Samadrita Ghosh, 2022), and the Nash-Sutcliffe efficiency (NSE) (McCuen et al., 2006):
MAE = (1/N) ∑_{i=1}^{N} |y_i − ŷ_i|   (7)

MSE = (1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²   (8)

RMSE = √[ (1/N) ∑_{i=1}^{N} (y_i − ŷ_i)² ]   (9)

NSE = 1 − ∑_{i=1}^{N} (y_i − ŷ_i)² / ∑_{i=1}^{N} (y_i − ȳ)²   (10)

where y_i is the observed value, ŷ_i the forecast value, ȳ the mean of the observed values, and N the sample size.

Fig. 10. Time series decomposition of the flow rate of Song Bengue into seasonal, trend, and random components using STL.
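For reference, a straightforward numpy implementation of Eqs. (7)-(10) could look like the sketch below; it assumes the observed and forecast series are aligned one-dimensional arrays.

```python
import numpy as np

def evaluate(y_obs, y_hat):
    """MAE, MSE, RMSE (Eqs. 7-9) and Nash-Sutcliffe efficiency (Eq. 10)."""
    y_obs, y_hat = np.asarray(y_obs, float), np.asarray(y_hat, float)
    err = y_obs - y_hat
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    nse = 1.0 - np.sum(err ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "NSE": nse}
```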

4. Results

4.1. Justification of window size (input) and horizon (output)

We used a univariate dense model with window sizes and horizons of seven past days and one day ahead, thirty past days and one day ahead, and thirty past days and seven days ahead. These choices cover forecasting intervals from short-term (one day to a week) to medium-term (one month). Evaluated with the dense model, these window-horizon combinations give MAEs of 61.68 m3/s, 82.89 m3/s, and 222.74 m3/s, respectively.
The error increases with increasing window size and increasing forecast horizon, in line with the findings of Bogner et al. (2019). Therefore, a window size of seven days and a one-day-ahead forecast horizon were used for the three deep learning models.
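A sliding-window construction consistent with this choice, seven past days as input and one day ahead as target, could look like the following sketch; the function is an illustrative assumption about how the supervised windows were formed, with the flow rate assumed to be the first column of the input array.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 7, horizon: int = 1):
    """Turn a (time, features) array into supervised pairs:
    X holds `window` past days, y the flow rate `horizon` days ahead."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1, 0])  # column 0 assumed to be the flow rate
    return np.array(X), np.array(y)

# Example: windows, targets = make_windows(values, window=7, horizon=1)
```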

Table 2
Statistical metrics (mean, standard deviation (STD), minimum, and maximum) of the decomposed input variables. Flows are in m3/s and precipitation in mm; SBG denotes Song Bengue.

Trend components (mean / STD / min / max):
Inflow Bamendjin: 63.69 / 6.85 / 55.88 / 76.44
Inflow Lom pangar: 257.40 / 19.62 / 222.98 / 301.99
Inflow Mape: 110.08 / 12.57 / 71.94 / 132.11
Inflow Mbakaou: 384.95 / 75.58 / 206.60 / 533.65
Inflow SBG: 7.17 / 0.96 / 6.13 / 9.99
Outflow Bamendjin: 44.77 / 10.12 / 16.04 / 65.32
Outflow Lom pangar: 220.12 / 38.84 / 151.86 / 278.36
Outflow Mape: 80.59 / 17.02 / 42.05 / 109.15
Outflow Mbakaou: 361.45 / 65.99 / 192.00 / 450.88
Precipitation Bamenjin: 5.70 / 0.50 / 4.79 / 6.76
Precipitation Lom pangar: 4.38 / 0.35 / 3.42 / 4.85
Precipitation Mape: 5.10 / 0.52 / 4.53 / 6.78
Precipitation Mbakaou: 5.86 / 0.98 / 3.23 / 8.42
Precipitation SBG: 7.17 / 0.96 / 6.13 / 9.99

Seasonal components (mean / STD / min / max):
Inflow Bamendjin: 1.37 / 54.22 / -59.96 / 129.05
Inflow Lom pangar: 9.01 / 193.47 / -209.67 / 442.38
Inflow Mape: 4.43 / 99.46 / -106.97 / 239.56
Inflow Mbakaou: 12.64 / 368.51 / -366.33 / 1769.05
Inflow SBG: 48.36 / 891.03 / -742.50 / 2747.33
Outflow Bamendjin: -0.57 / 43.06 / -48.32 / 126.61
Outflow Lom pangar: -4.49 / 164.89 / -206.07 / 336.18
Outflow Mape: -1.83 / 88.40 / -85.87 / 233.31
Outflow Mbakaou: 7.82 / 217.09 / -250.68 / 577.27
Precipitation Bamenjin: -0.00 / 5.97 / -5.80 / 25.76
Precipitation Lom pangar: 0.07 / 5.44 / -4.49 / 22.68
Precipitation Mape: 0.05 / 6.04 / -5.23 / 31.41
Precipitation Mbakaou: 0.05 / 7.06 / -6.30 / 34.19
Precipitation SBG: 0.14 / 7.62 / -7.53 / 40.82

Random components (mean / STD / min / max):
Inflow Bamendjin: -0.07 / 21.87 / -76.44 / 118.70
Inflow Lom pangar: -0.02 / 74.23 / -226.46 / 535.12
Inflow Mape: 0.46 / 42.16 / -173.24 / 282.86
Inflow SBG: 0.79 / 407.00 / -1800.74 / 3419.49
Inflow Mbakaou: 2.36 / 263.70 / -1956.57 / 6734.58
Outflow Bamendjin: 0.82 / 44.10 / -164.86 / 199.57
Outflow Lom pangar: -0.76 / 110.23 / -324.05 / 266.03
Outflow Mape: 1.37 / 63.93 / -213.24 / 448.78
Outflow Mbakaou: 3.80 / 136.93 / -587.15 / 819.72
Precipitation Bamenjin: -0.01 / 9.17 / -28.86 / 63.47
Precipitation Lom pangar: 0.01 / 8.79 / -25.83 / 84.28
Precipitation Mape: -0.04 / 9.07 / -32.35 / 75.64
Precipitation Mbakaou: 0.01 / 10.09 / -37.93 / 91.02
Precipitation SBG: -0.05 / 12.40 / -46.68 / 77.17

Table 3
Results of evaluation criteria for the different univariate models.
Model MAE (m3/s) MSE (m3/s)2 RMSE (m3/s) NSE

Dense model 61.678 12,785.538 113.073 0.992


Conv1D 73.061 17,608.414 132.697 0.989
LSTM 65.986 14,723.166 121.339 0.990

Fig. 11. A plot of the different deep learning models for the univariate time series forecast.

Table 4
Evaluation criteria for the different partitions for the dense model.
Dense model MAE (m3/s) MSE (m3/s)2 RMSE (m3/s) NSE

Total decomposed data 28.277 1743.524 41.756 0.998


No Random 49.624 7240.706 85.092 0.991
No seasonal 49.849 6875.273 82.917 0.991
No trend 47.937 4662.384 68.282 0.994
Random only 49.805 7496.825 86.584 0.990
Seasonal only 48.116 6921.815 83.197 0.991
Trend only 48.726 6855.770 82.800 0.991
Total undecomposed data 46.798 7333.450 85.636 0.990

Table 5
Evaluation criteria for the different partitions for the Conv1D model.
Conv1D model MAE (m3/s) MSE (m3/s)2 RMSE (m3/s) NSE

Total decomposed data 31.658 2164.872 46.528 0.997


No Random 53.833 8821.959 93.925 0.989
No seasonal 55.326 9056.107 95.164 0.988
No trend 54.758 7309.122 85.493 0.991
Random only 56.282 9139.604 95.601 0.988
Seasonal only 53.461 8516.412 92.284 0.989
Trend only 56.637 9430.051 97.108 0.988
Total undecomposed data 49.553 8652.418 93.018 0.989

Table 6
Evaluation criteria for the different partitions for the LSTM model.
LSTM model MAE (m3/s) MSE (m3/s)2 RMSE (m3/s) NSE

Total decomposed data 30.606 2030.643 45.063 0.997


No Random 50.965 7623.405 87.312 0.990
No seasonal 51.316 7489.804 86.544 0.990
No trend 53.633 6124.419 78.259 0.992
Random only 52.234 8091.730 89.954 0.989
Seasonal only 51.276 8032.682 89.625 0.990
Trend only 52.709 8830.853 93.973 0.989
Total undecomposed data 47.386 7487.015 86.528 0.990


Fig. 12. LSTM multivariate forecast compared with confluent inflows from upstream reservoirs: (a) total decomposed data, (b) no random, (c) no seasonal, (d) no trend.

4.2. Model results for the univariate time series forecast

Forecasting on the test data was done using the three models, namely the dense model, the Conv1D model, and the LSTM model, for the univariate time series, and the evaluation criteria were computed to choose the best model. Table 3 summarizes the results obtained: the dense model is the best, with the lowest evaluation criteria, followed by the LSTM model and lastly the Conv1D model. Fig. 11 shows a plot of the forecasts of the different models, with an offset of 300, on the test data.


Fig. 13. LSTM multivariate forecast compared with confluent inflows from upstream reservoirs (e) random only, (f) seasonal only, (g) trend only,
(h) undecomposed data.


Fig. 14. Conv1D multivariate forecast compared with confluent inflows from upstream reservoirs: (a) total decomposed data, (b) no random, (c) no seasonal, (d) no trend.

4.3. Multivariate prediction with different decomposed partition sets

From the results in Table 3, the dense model gave the best result for the univariate time series. The models were further used to forecast the flow rate with the different decomposed partitions of the multivariate time series, and the results are shown in Table 4, Table 5, and Table 6. The corresponding plots are shown in Fig. 12, Fig. 13, Fig. 14, Fig. 15, Fig. 16, and Fig. 17, and the bar plots of the MAE and RMSE for the three models over all the scenarios are depicted in Fig. 18. Since the mean squared error is the square of the root mean squared error, MSE is left out of the bar plots to better show the differences between the partitions.


Fig. 15. Conv1D multivariate forecast compared with confluent inflows from upstream reservoirs (e) random only, (f) seasonal only, (g) trend only,
(h) undecomposed data.

Fig. 1, Fig. 3, Fig. 4, and Fig. 5 show a strong and predictable seasonality in the input data, and this is reflected in the results of Table 4, Table 5, and Table 6. When the random-only, seasonal-only, and trend-only partitions were used as scenarios, the seasonal-only scenario produced the least error among the three for all three models, in accordance with the strong seasonality of the input components. Likewise, the greatest MAE for the dense model, 49.849 m3/s, was obtained when the forecast was made without the seasonal term, which strongly indicates the importance of the seasonal term for forecasting the Song bengue flow rate. The random or remainder component of the flow rate, shown in Fig. 10(b), represents the stochasticity or unexplained pattern in the time series. Forecasting the flow rate with the random-only scenario produces an interesting result that is not too different
from the result obtained with the trend-only scenario, as shown in Table 4, Table 5, and Table 6. This indicates the importance of the remainder term in the time series of the Song bengue confluent. The scenario with the total decomposed data produces the best result for all three deep learning models, as shown in Table 4, Table 5, and Table 6; this indicates the importance and correlation of all the decomposed variables in forecasting the Song bengue confluent flow rate. Furthermore, the scenario with the total undecomposed data produces the next-best result. As such, all the input data, namely the inflows, outflows, and precipitation, are important for forecasting the flow rate of the confluent.

Fig. 16. Dense multivariate forecast compared with confluent inflows from upstream reservoirs: (a) total decomposed data, (b) no random, (c) no seasonal, (d) no trend.


Fig. 17. Dense multivariate forecast compared with confluent inflows from upstream reservoirs (e) random only, (f) seasonal only, (g) trend only,
(h) undecomposed data.

5. Discussion

This study examined forecasting the flow rate of Song Bengue using different window sizes and horizons. It found that, as the forecasting horizon and window size increased, the efficiency of the model for forecasting the Song bengue confluent decreased, which aligns with the findings of Bogner et al. (2019), who studied the impact of window size and horizon on model efficiency.
The deep models used in this study, namely the dense model, the Conv1D model, and the LSTM model, all had NSE values greater than 0.9, indicating a very good fit according to Nash and Sutcliffe (1970), who defined NSE = 1 as a perfect fit, with NSE > 0.75 regarded as a very good fit. This good fit is due to the choice of hyperparameters obtained through tuning and experimentation, sufficient training data, and the suitable architecture of the models.


Fig. 18. MAE and RMSE evaluation criteria for the different partitions with (a) LSTM model, (b) Conv1D model, and (c) Dense model.

The study also found that the lowest error occurred when all decomposed variables were used as input, emphasizing the importance of preprocessing input data for time-series forecasting. This is supported by Ghayekhloo et al. (2015), who studied the impact of data preprocessing on time-series forecasting; Jaramillo-Morán et al. (2021), who improved forecasting results by decomposing the time series into its trend and fluctuations and into intrinsic mode functions (IMFs) using empirical mode decomposition (EMD); Meydani et al. (2022), who preprocessed data for reservoir-inflow forecasting by downscaling raw precipitation and temperature forecasts, which increased model accuracy; and Yousefi et al. (2022), who used causal empirical decomposition to improve model efficiency for forecasting. This study specifically used STL decomposition as preprocessing because of its ability to handle data with strong seasonality. The forecasting errors for the daily flow-rate forecast are shown in Table 4, Table 5, and Table 6.
Additionally, the study found that the random component of the time series should not be disregarded, as the lowest error occurred when all the components of the time series were used as inputs to the forecasting model (the total decomposed data scenario).
Furthermore, the study found that decomposing the input time series before using it as input for prediction reduces the prediction error, and that univariate forecasting resulted in a larger MAE than its multivariate counterpart.
Overall, the study highlights the importance of utilizing all components of the time-series data and the benefits of preprocessing the input data for time-series forecasting of the Song bengue confluent. This is consistent with the findings of other studies in the field, which have shown that preprocessing and including all components of a time series in the model can lead to improved forecasting results.

6. Conclusion

In conclusion, this study examined the forecast of the Song Bengue flow rate using different window sizes and horizons. The results revealed that the forecasting efficiency decreased as the forecasting horizon and window size increased, in line with previous studies showing that models tend to become less efficient as the window size and horizon grow. The deep learning models used all reached NSE values greater than 0.9, indicating a very good fit according to Nash and Sutcliffe. The lowest error occurred when all decomposed variables were used as input, which highlights the importance of preprocessing the input data and the relevance and correlation of these hydrological inputs for forecasting the flow rate of Song bengue. This is consistent with the findings of other studies, which have shown that preprocessing and including all components of a time series in the model can lead to improved forecasting results. Univariate forecasting with the deep models resulted in larger errors than their multivariate counterparts in forecasting the Song bengue confluent flow rate, and the random or remainder component of the time series should not be neglected, since the least error occurred when all the decomposed components were used as input. This study therefore emphasizes the importance of including all components of the time-series data and the benefits of preprocessing the input data for time-series forecasting. The proposed preprocessing method improves model forecasting efficiency, but we acknowledge as a limitation that no uncertainty analysis of the deep learning models was performed. Applying ensemble learning to each decomposed component for forecasting constitutes future work.

CRediT authorship contribution statement

Njogho Kenneth Tebong: Writing – original draft preparation, Visualization, Resources, Software, Formal analysis, Investigation. Théophile Simo: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Project administration. Armand Nzeukou Takougang: Supervision, Validation, Formal analysis, Resources, Project administration. Alain Tchakoutio Sandjon: Writing – review & editing, Visualization. Ntanguen Patrick Herve: Resources, Formal analysis, Editing, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Data Availability

Data will be made available on request.

References

Ahmad, S.K., Hossain, F., 2019. A generic data-driven technique for forecasting of reservoir inflow: application for hydropower maximization. Environ. Model. Softw.
119, 147–165.
Anubala, V.P., Muthukumaran, N., Nikitha, R., 2018. Performance Analysis of Hookworm Detection Using Deep Convolutional Neural Network. In 2018 International
Conference on Smart Systems and Inventive Technology (ICSSIT). IEEE,, pp. 348–354 (December).
Bai, Y., Chen, Z., Xie, J., Li, C., 2016. Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models. J. Hydrol. 532, 193–206.
Bashir, A., Shehzad, M.A., Hussain, I., Rehmani, M.I.A., Bhatti, S.H., 2019. Reservoir inflow prediction by ensembling wavelet and bootstrap techniques to multiple
linear regression model. Water Resour. Manag. 33 (15), 5121–5136.
Bogner, K., Pappenberger, F., Zappa, M., 2019. Machine learning techniques for predicting the energy consumption/production and its uncertainties driven by
meteorological observations and forecasts. Sustainability 11 (12), 3328.
Bordin, C., Skjelbred, H.I., Kong, J., Yang, Z., 2020. Machine learning for hydropower scheduling: state of the art and future research directions. Procedia Comput. Sci.
176, 1659–1668.
Cheng, C.T., Feng, Z.K., Niu, W.J., Liao, S.L., 2015. Heuristic methods for reservoir monthly inflow forecasting: a case study of Xinfengjiang Reservoir in Pearl River,
China. Water 7 (8), 4477–4495.
Djiela, R.H.T., Kapen, P.T., Tchuen, G., 2021. Techno-economic design and performance evaluation of photovoltaic/diesel/batteries system through simulation of the
energy flow using generated solar radiation data. Energy Convers. Manag. 248, 114772.
Ghayekhloo, M., Menhaj, M.B., Ghofrani, M., 2015. A hybrid short-term load forecasting with a new data preprocessing framework. Electr. Power Syst. Res. 119,
138–148.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Gupta, A., Kumar, A., 2022. Two-step daily reservoir inflow prediction using ARIMA-machine learning and ensemble models. J. Hydro-Environ. Res. 45, 39–52.
Hong, J., Lee, S., Bae, J.H., Lee, J., Park, W.J., Lee, D., Lim, K.J., 2020. Development and evaluation of the combined machine learning models for the prediction of
dam inflow. Water 12 (10), 2927.
Hyndman, R.J., Athanasopoulos, G., 2018. Forecasting: Principles and Practice. OTexts.
Jaramillo-Morán, M.A., Fernández-Martínez, D., García-García, A., Carmona-Fernández, D., 2021. Improving artificial intelligence forecasting models performance
with data preprocessing: european union allowance prices case study. Energies 14 (23), 7845.
Jothiprakash, V., Magar, R.B., 2012. Multi-time-step ahead daily and hourly intermittent reservoir inflow prediction by artificial intelligent techniques using lumped
and distributed data. J. Hydrol. 450, 293–307.
Kishore, S., Prasad, B.S., 2020. Reservoir inflow prediction using multi-model ensemble system. 2020 International Conference on Communication, Computing and
Industry 4.0 (C2I4). IEEE,, pp. 1–6 (December).
Li, F., Ma, G., Chen, S., Huang, W., 2021. An ensemble modeling approach to forecast daily reservoir inflow using bidirectional long-and short-term memory (Bi-
LSTM), variational mode decomposition (VMD), and energy entropy method. Water Resour. Manag. 35, 2941–2963.
Liu, T., Ma, X., Li, S., Li, X., Zhang, C., 2022. A stock price prediction method based on meta-learning and variational mode decomposition. Knowl. -Based Syst. 252,
109324.
Luo, B., Fang, Y., Wang, H., Zang, D., 2020. Reservoir inflow prediction using a hybrid model based on deep learning. In. In: IOP Conference Series: Materials Science
and Engineering, Vol. 715. IOP Publishing,, 012044.
McCuen, R.H., Knight, Z., Cutter, A.G., 2006. Evaluation of the Nash–Sutcliffe efficiency index. J. Hydrol. Eng. 11 (6), 597–602.
Meydani, A., Dehghanipour, A., Schoups, G., Tajrishy, M., 2022. Daily reservoir inflow forecasting using weather forecast downscaling and rainfall-runoff modeling:
Application to Urmia Lake basin, Iran. J. Hydrol.: Reg. Stud. 44, 101228.
Mezzini, M., Bonavolontà, G., Agrusti, F., 2019. Predicting university dropout by using convolutional neural networks. In INTED2019 Proceedings. IATED,,
pp. 9155–9163.
Mills, T.C., 2019. Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting. Academic Press.


Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I-A Discussion of principles. J. Hydrol. 10 (1970), 282–290.
Patel, S.S., Ramachandran, P., 2015. A comparison of machine learning techniques for modeling river flow time series: the case of upper Cauvery river basin. Water
Resour. Manag. 29, 589–602.
Pini, M., Scalvini, A., Liaqat, M.U., Ranzi, R., Serina, I., Mehmood, T., 2020. Evaluation of machine learning techniques for inflow prediction in Lake Como, Italy.
Procedia Comput. Sci. 176, 918–927.
Qazi, E.U.H., Almorjan, A., Zia, T., 2022. A one-dimensional convolutional neural network (1D-CNN) based deep learning system for network intrusion detection.
Appl. Sci. 12 (16), 7986.
Qi, Y., Zhou, Z., Yang, L., Quan, Y., Miao, Q., 2019. A decomposition-ensemble learning model based on LSTM neural network for daily reservoir inflow forecasting.
Water Resour. Manag. 33, 4123–4139.
Rezaie-Balf, M., Naganna, S.R., Kisi, O., El-Shafie, A., 2019. Enhancing streamflow forecasting using the augmenting ensemble procedure coupled machine learning
models: case study of Aswan High Dam. Hydrol. Sci. J. 64 (13), 1629–1646.
Samadrita Ghosh, 2022. The ultimate guide to evaluation and selection of models in machine learning. 〈https://neptune.ai/blog/the-ultimate-guide-to-evaluation-and-selection-of-models-in-machine-learning〉 (accessed 14 November 2022).
Schons, T., Moreira, G.J., Silva, P.H., Coelho, V.N., Luz, E.J., 2018. Convolutional network for EEG-based biometric. In: In Progress in Pattern Recognition, Image
Analysis, Computer Vision, and Applications: 22nd Iberoamerican Congress, CIARP 2017, Valparaíso, Chile, November 7–10, 2017, Proceedings, 22. Springer
International Publishing,, pp. 601–608.
Sharifi, M.R., Akbarifard, S., Madadi, M.R., Akbarifard, H., Qaderi, K., 2022. Comprehensive assessment of 20 state-of-the-art multi-objective meta-heuristic
algorithms for multi-reservoir system operation. J. Hydrol. 613, 128469.
Simo, T., Kenfack, F., Ngundam, J.M., 2007. Contribution to the long-term generation scheduling of the Cameroonian electricity production system. Electr. Power
Syst. Res. 77 (10), 1265–1273.
Tengeleng, S.I.D.D.I., Armand, N.Z.E.U.K.O.U., Armel, K.A.P.T.U.E., Alain, T.S., Théophile, S.I.M.O., Cedrigue, D., 2014. Monthly predicted flow values of the Sanaga
River in Cameroon using neural networks applied to GLDAS, MERRA and GPCP data. J. Water Resour. Ocean Sci. 3 (2), 22–29.
Wang, Y., Guo, S., Chen, H., Zhou, Y., 2014. Comparative study of monthly inflow prediction methods for the Three Gorges Reservoir. Stoch. Environ. Res. Risk Assess.
28, 555–570.
Xu, K., Roussel, P., Csapó, T.G., Denby, B., 2017. Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode
ultrasound images. J. Acoust. Soc. Am. 141 (6), EL531–EL537.
Yang, T., Asanjan, A.A., Welles, E., Gao, X., Sorooshian, S., Liu, X., 2017. Developing reservoir monthly inflow forecasts using artificial intelligence and climate
phenomenon information. Water Resour. Res. 53 (4), 2786–2812.
Yousefi, M., Cheng, X., Gazzea, M., Wierling, A.H., Rajasekharan, J., Helseth, A., Arghandeh, R., 2022. Day-ahead inflow forecasting using causal empirical
decomposition. J. Hydrol. 613, 128265.
Zhang, W., Wang, H., Lin, Y., Jin, J., Liu, W., An, X., 2021. Reservoir inflow predicting model based on machine learning algorithm via multi-model fusion: a case
study of Jinshuitan river basin. IET Cyber Robot. 3 (3), 265–277.
Zhao, T., Minsker, B., Salas, F., Maidment, D., Diev, V., Spoelstra, J., Dhingra, P., 2018. Statistical and hybrid methods implemented in a web application for
predicting reservoir inflows during flood events. JAWRA J. Am. Water Resour. Assoc. 54 (1), 69–89.
Zhou, F., Li, L., 2021. Forecasting reservoir inflow via recurrent neural odes. Proc. AAAI Conf. Artif. Intell. Vol. 35 (No. 17), 15025–15032.
Zhou, F., Wang, Z., Chen, D., Zhang, K., 2022. Reservoir inflow forecasting in hydropower industry: a generative flow-based approach. IEEE Trans. Ind. Inform. 19 (2),
1196–1206.
