
Development of Precipitation Forecast Model Based on

Artificial Intelligence and Subseasonal Clustering


Laleh Parviz 1 and Kabir Rasouli 2

Downloaded from ascelibrary.org by MCMASTER UNIVERSITY on 09/29/19. Copyright ASCE. For personal use only; all rights reserved.

Abstract: Forecasting precipitation remains challenging because of its large spatial and temporal variability, and the uncertainty in precipitation
forecast leads to an important source of uncertainty in the prediction of other components of a hydrological system. In this study, in
order to forecast subseasonal precipitation and better characterize the temporal variability of precipitation, a hybrid precipitation forecast
model was developed based on (1) temporal clustering of subseasonal precipitation; and (2) coupling an improved seasonal autoregressive
integrated moving average (ISARIMA) model to an artificial neural network (ANN) model to take the advantages of both models and capture
precipitation persistence and statistics in each cluster. The performance of the proposed model was compared against different variations of
conventional statistical models in the Rasht station with a humid climate and Gorgan station with a Mediterranean climate, both located
south of the Caspian Sea in northern Iran. The model evaluation criteria indicated that the hybrid model can remarkably improve forecast
accuracy. The root-mean square error score of the forecasted precipitation by the hybrid model against observations decreased 48% and 24%
in the Rasht and Gorgan stations, respectively, when compared with the seasonal autoregressive integrated moving average (SARIMA)
model and the index of agreement increased 32% and 17%, respectively, when compared with the ANN models. The proposed hybrid model
can be a useful tool for forecasting subseasonal precipitation in humid and arid climates with persistent and nonpersistent precipitation
patterns. DOI: 10.1061/(ASCE)HE.1943-5584.0001862. © 2019 American Society of Civil Engineers.
Author keywords: Subseasonal precipitation; Forecast; Hybrid model; Artificial neural network (ANN); Clustering analysis.

1 Assistant Professor, Faculty of Agriculture, Azarbaijan Shahid Madani Univ., Tabriz 53714-161, Iran. Email: laleh_parviz@yahoo.com
2 Research Physical Scientist, Dept. of Geoscience, Univ. of Calgary, 2500 University Dr. NW, Calgary, AB, Canada T2N 1N4 (corresponding author). ORCID: https://orcid.org/0000-0002-8176-2132. Email: kabir.rasouli@ucalgary.ca
Note. This manuscript was submitted on April 22, 2018; approved on August 5, 2019; published online on September 28, 2019. Discussion period open until February 28, 2020; separate discussions must be submitted for individual papers. This paper is part of the Journal of Hydrologic Engineering, © ASCE, ISSN 1084-0699.

Introduction

Accurate precipitation forecasts, especially in subseasonal time scales, are essential for short-term planning of agricultural activities and efficient water resources management. There are currently three main forecasting approaches: dynamical (knowledge-driven, e.g., Michalakes 1999), statistical (data-driven, e.g., Valipour et al. 2013), and a combination of statistical and dynamical approaches (e.g., Carter et al. 1989; Hamill and Whitaker 2006; Slater et al. 2017; Sarkodie and Strezov 2018). In the third category, also known as the model output statistics technique (Glahn and Lowry 1972), dynamical model forecasts are first archived and then a statistical linear or nonlinear model is applied to determine the relationship between numerical model outputs and weather variables, e.g., precipitation. However, these models still show large forecast biases in precipitation estimation (Cannon et al. 2015) compared to local observations. Therefore, a skillful precipitation forecast remains a challenging task due to its high spatial and temporal variability and nonlinear and nonstationary characteristics (Sarhadi et al. 2016). Improvement in precipitation estimations is needed, especially in arid and semiarid regions where slight changes in precipitation can markedly change runoff mechanisms (Fekete et al. 2004). With uncertainties in precipitation forecast, estimation of all other components of a hydrological cycle, including infiltration, evapotranspiration, surface and subsurface runoff, and groundwater recharge, remains uncertain (Fekete et al. 2004). Extensive data collection is usually required for a knowledge-driven modeling experiment to represent governing atmospheric and near-surface hydrologic processes. Data collection through field experiments is expensive and challenging, particularly in remote areas. Simplicity and application of the data-driven modeling approaches in different geographical regions make these tools desirable for meteorological and hydrological forecasts (Ouyang et al. 2016).

The data-driven modeling methods such as autoregressive integrated moving average (ARIMA) (Ouyang et al. 2016) and machine learning approaches such as artificial neural network (ANN) (Moustris et al. 2011; Tealab et al. 2017) have been widely used in Earth sciences. The ARIMA models, which are based on the best fit to the historical values of hydrometeorological time series, were first proposed by Box and Jenkins (1976). These models were implemented to analyze short-term and long-term variation of fluctuations in the atmospheric and hydrologic time series (Sun and Koch 2001). For instance, Dabral and Murry (2017) used a seasonal ARIMA called SARIMA to forecast monthly, weekly, and daily monsoon rainfalls in humid regions of India. Statistical model assessment measures showed a reasonably acceptable performance of the SARIMA models in the literature (Eni and Adeyeye 2015; Chang et al. 2012). Osarumwense (2013) used the Box-Jenkins SARIMA model for rainfall forecast in Nigeria. Narasimha Murthy et al. (2018) investigated rainfall patterns using SARIMA in northeastern India.

The ANN models, which emulate the structure of human neural systems, were introduced by McCulloch and Pitts (1943). The back-propagation neural network models were used in different applications. For instance, these models were used for subseasonal rainfall forecast in Indonesia by Mislan et al. (2015). They developed the back-propagation ANN methods with different optimization techniques to improve precipitation estimations by optimizing the structure and training period, which may affect the accuracy of

© ASCE 04019053-1 J. Hydrol. Eng.

J. Hydrol. Eng., 2019, 24(12): 04019053


subseasonal and seasonal forecasts. Moustris et al. (2011) used the preceding 4 months of precipitation and Sulaiman and Wahab (2018) used the preceding 1–6 months of precipitation to forecast monthly precipitation. Marzano et al. (2006) compared the accuracy of precipitation intensity estimated by ANN with previously developed regression methods. In comparison to the regression approaches, ANN provides more skillful forecasts. Karamouz et al. (2009) developed a hybrid drought index using a probabilistic ANN to forecast drought on a monthly time scale. Diagnosing the performance of the SARIMA and ANN models over the last decades and in different scientific fields has resulted in the improvements of these models through (1) combining them and taking the advantages of these two models (Pannakkong et al. 2016; Zhang 2003), and (2) overcoming the weakness of each model by preprocessing the training data (Wang et al. 2014). Wang et al. (2014) improved the accuracy of monthly precipitation forecast using an improved version of the SARIMA model, called ISARIMA. However, the application of linear regression in this model for capturing nonlinear features of precipitation can lead to uncertainties in precipitation forecasts. Both linearity and nonlinearity of time series need to be considered in precipitation forecasts (Chen and Wang 2007; Tosun et al. 2016). Satisfactory results of the linear models are possible when the linear component of the time series dominates the nonlinear component. Also, with a highly nonlinear component of the time series, nonlinear models can substantially improve the forecast skill (Yolcu et al. 2013). It is well established in the literature that ANN can significantly outperform linear regression models in environmental modeling (Zhang 2003; Chen and Wang 2007; Zeynoddin et al. 2018).

One of the advantages of the data-driven models is their ability to capture temporal variability, especially seasonal and subseasonal variations needed for better estimation of temporal variability in hydrological processes and better managing water resources (Dabral and Murry 2017; Jiang et al. 2010). Modeling temporal variability should be carried out with care to avoid losing intermonthly variations (Wang et al. 2014). In the conventional SARIMA model structure intermonthly variations are usually neglected, which can affect the simulation performance. Capturing both interannual and intermonthly variations in the hydrological modeling can substantially improve the forecasting skill (Wang et al. 2014). In coupling data-driven models, either temporal variability with different time scales (e.g., annual and monthly) is captured, or two separate linear and nonlinear signals of the given phenomenon are captured (Zeynoddin et al. 2018). For instance, ARIMA is used to capture the linear signals while ANN is widely used to capture the nonlinear signals (e.g., Chen and Wang 2007; Ozozen et al. 2016; Lee et al. 2018). Moeeni and Bonakdari (2017) coupled SARIMA to ANN, and Moeeni et al. (2017) coupled SARIMA to a neuro-fuzzy model to forecast monthly streamflow. In contrast to other hybrid models in the literature, both temporal variability and two types of linear and nonlinear signals can be captured by the proposed hybrid model in this study to further improve the precipitation estimations. To overcome the limitation of the ISARIMA model, the ANN model, instead of a regression model, was used in this paper to capture the nonlinear relationships between main statistics of monthly precipitation clusters and associated precipitation intensities. Therefore, a hybrid model is developed in this study by integrating ISARIMA and ANN models to estimate monthly precipitation in the future. The objectives of this study are to cluster the temporal patterns of precipitation, improve monthly estimations of precipitation in two weather stations south of the Caspian Sea, and evaluate the forecast performance of the proposed hybrid method against SARIMA, ANN, and their improved versions. This paper is organized as follows. First, the study area is introduced and the proposed hybrid model is developed. Next, precipitation forecasts by data-driven models are compared with the forecasts of the hybrid model and performance of the models is evaluated based on assessment measures. Finally, discussions of the findings and conclusions of this paper are described.

Methodology

Study Site Description

The precipitation time series in the Rasht weather station in Gilan Province, and the Gorgan weather station in Golestan Province in

[Map of Iran and surroundings, 45° E to 60° E and 25° N to 40° N, showing the study area, the Caspian Sea, the Persian Gulf, the Gulf of Oman, and province borders; scale bar 0 to 700 km.]

Fig. 1. Location of weather stations south of the Caspian Sea in northern Iran. The Rasht station is located in the Gilan Province and the Gorgan
station is located in the Golestan Province.



Iran (Fig. 1) were used to investigate the performance of the developed hybrid model in forecasting monthly precipitation. These two stations are located within the Khazar Basin, which is one of the five main river basins in the Iranian Plateau. The climates of the Rasht and Gorgan stations were classified as wet and humid, respectively, according to the precipitation effectiveness index (Thornthwaite 1948). The Köppen (1936) climate classification subtype is “Cfa” (i.e., humid subtropical climate) in the Rasht station and “Csa” (hot summer Mediterranean climate) in the Gorgan station. Changes in seasonal precipitation are expected under the effect of climate change in this area (Coles et al. 2017).

ARIMA Model

The conventional ARIMA models were formulated with a linear function of time series in the past and random shocks (Jiang et al. 2010). The general notation of ARIMA is described as ARIMA(p, d, q), where p, d, and q are the orders of the autoregressive, differencing, and moving average parameters, respectively (Narasimha Murthy et al. 2018). The ARIMA models were modified to account for the seasonal variability. The seasonal ARIMAs are called SARIMA(p, d, q) × $(P, D, Q)_s$, in which $(P, D, Q)_s$ represents the seasonal variations. The mathematical form of a multiplicative seasonal model can be written as the following equation (Jiang et al. 2010):

$$\varphi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,Z_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$

with

$$\varphi_p(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$$
$$\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$$
$$\Phi_P(B^s) = 1 - \Phi_s B^s - \cdots - \Phi_{Ps} B^{Ps}$$
$$\Theta_Q(B^s) = 1 - \Theta_s B^s - \cdots - \Theta_{Qs} B^{Qs} \tag{1}$$

where p = autoregressive order; d = number of differencing operations; q = moving average order; P = order of seasonal autoregressive; D = number of seasonal differencing; Q = order of seasonal moving average; s = season length; B = backward (time difference) operator; ε = Gaussian white-noise term; t = operator for representation of time; φ = autoregressive parameter; θ = moving average parameter; Φ = seasonal autoregressive parameter; and Θ = seasonal moving average parameter (Chen and Wang 2007). The ARIMA model proposed by Box and Jenkins (1976) includes three steps, namely (1) identifying the structure of the model, (2) estimation of the model parameters, and (3) performing goodness-of-fit tests on the forecasted time series (Chen and Wang 2007).

ANN Model

ANN can be used to establish a relationship between predictors (inputs) and predictands (outputs) without previous assumptions. The structure of an ANN model is characterized by three layers (input, hidden, and output); the number of neurons in the hidden layer; and the type of transfer functions relating model inputs to the hidden layer, and the hidden layer to the output layer (Pannakkong et al. 2016). Eq. (2) shows the relationship between ANN input and output variables:

$$z_t = b_h + f\left(\sum_{h=1}^{R} w_h\, g\left(b_{i,h} + \sum_{i=1}^{Q} w_{i,h}\, p_i\right)\right) \tag{2}$$

where $z_t$ = simulated model output at time t; $w_{i,h}$ and $w_h$ = connection weights of the layers; Q and R = number of neurons in the input and hidden layers; $b_{i,h}$ and $b_h$ = biases; f and g = transfer functions; and $p_i$ = ith input to the neuron (Pannakkong et al. 2016).

ISARIMA Model with Clustering Analysis

The SARIMA model can be applied to time series with seasonal patterns with a subseasonal time step, e.g., week or month. The SARIMA model accounts for the interannual variation of monthly time series and does not include intermonthly variations. Therefore, ISARIMA was proposed to improve the conventional SARIMA models by incorporating both interannual and intermonthly variations in environmental fluxes (Wang et al. 2014). The modeling procedure of ISARIMA includes clustering analysis in which monthly time series are classified into clusters with similar statistics (Wang and Yao 2018). There are several partitioning clustering algorithms such as the k-means (Lloyd 1982), subtractive and hierarchical clustering (Jain and Dubes 1998), and some soft computing ones such as fuzzy c-means (Bezdek 1981) and self-organizing neural network (Kohonen 1989; Sfetsos and Siriopoulos 2004). The hierarchical cluster analysis is the most commonly used method (Tamilselvi et al. 2015), which uses an algorithm to produce a dendrogram that assembles variables or objects into a single tree, allowing users to visualize the similarities of the samples. This method is usually used to evaluate intragroup and intergroup similarities and contrasts (Caesar et al. 2018). A hierarchical clustering method introduced by Ward (1963) is used in this study to create groups with minimum variance (Eszergár-Kiss and Caesar 2017). This method is based on the error sum of squares in identifying and merging clusters (Zhang et al. 2018). The Ward method is complex but performs reasonably well. The basis of the method is to calculate the distance of the clusters from the grand average of the sample (Belgaman et al. 2017). In comparison to other methods, the Ward method performs better in hierarchical clustering, especially in climatological studies, as it minimizes the variance between the elements within a cluster (Blashfield 1976; Hands and Everitt 1987; Eszergár-Kiss and Caesar 2017; Farrelly et al. 2017). Therefore, the Ward method is selected for clustering in this study. After clustering analysis, 12 linear regressions (LR) for 12 months are established. For a given month, the LR model is trained based on the main statistics of the cluster of that month and precipitation values.

Proposed Hybrid Model

After identifying the clusters by the Ward method, the main statistics of each cluster, such as maximum, minimum, and truncated mean, are calculated to provide more information about the time series. The statistics of clusters summarize the main signals in precipitation distribution and can be forecasted better than individual precipitation events by ANN models in the hybrid method with cluster statistics as input and monthly precipitation values as output. Therefore, 12 ANN models with the capacity to capture nonlinear characteristics of precipitation, instead of 12 LR models with linear characteristics, are trained for 12 months. For a given month, the ANN model is trained by main statistics of the cluster of that month and precipitation values in that month. For example, if the three months of January, February, and November are clustered into Cluster 1, and two months of March and April are clustered into Cluster 2, then five ANNs for 5 months linking the main statistics of two clusters to a monthly time series are established. The next step is to train the ISARIMA model for each cluster and to predict the statistics of the clusters in the validation period. In the final step, the forecasted main statistics are applied to the ANNs discussed



[Fig. 2 flowchart]
Observed monthly precipitation
ANN: (1) Train ANN, with optimal neurons & activation function; inputs: antecedent precipitation; outputs: precipitation. (2) Test ANN; inputs: antecedent precipitation.
SARIMA: (1) Model identification. (2) Parameter estimation. (3) Diagnostic checking.
ISARIMA (cluster 1 … cluster m): (1) Obtain cluster statistics. (2) Train linear regression (LR) for each month; inputs: cluster statistics; outputs: precipitation values. (3) Forecast cluster statistics by ARIMA. (4) Test LR for each month; inputs: cluster statistics.
Hybrid model (cluster 1 … cluster m): (1) Obtain cluster statistics. (2) Train ANN for each month; inputs: cluster statistics; outputs: precipitation values. (3) Forecast cluster statistics by ISARIMA. (4) Test ANN for each month; inputs: cluster statistics.
Model output: forecasted precipitation for next 36 months

Fig. 2. Comparison of the four statistical precipitation forecast models used in this study. The hybrid ISARIMA-ANN model is developed based on
the identified precipitation clusters and coupled ISARIMA and ANN models.
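The first step shared by the ISARIMA and hybrid branches of Fig. 2, obtaining cluster statistics, can be illustrated with a minimal Python sketch. It assumes that one minimum, maximum, and truncated-mean value is computed per cluster and per year; the data values, cluster membership, trimming rule, and function names below are hypothetical and are not taken from the study, which does not state how the truncated mean is trimmed.

```python
from statistics import mean


def truncated_mean(values, trim=1):
    """Mean after dropping `trim` smallest and largest values.

    The trimming rule is an assumption for illustration; when too few
    values remain to trim, the plain mean is returned.
    """
    ordered = sorted(values)
    if len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return mean(ordered)


def cluster_statistics(monthly, clusters):
    """Build one (year, min, max, truncated mean) row per cluster and year.

    monthly  : {year: [12 monthly precipitation totals]}
    clusters : {cluster name: [month indices, 0 = January]}
    """
    stats = {}
    for name, months in clusters.items():
        rows = []
        for year in sorted(monthly):
            vals = [monthly[year][m] for m in months]
            rows.append((year, min(vals), max(vals), truncated_mean(vals)))
        stats[name] = rows
    return stats


# Hypothetical monthly totals (mm) and a hypothetical two-cluster grouping.
monthly = {
    2001: [120, 90, 80, 60, 40, 30, 35, 50, 110, 160, 150, 130],
    2002: [100, 95, 70, 55, 45, 25, 30, 60, 120, 170, 140, 125],
}
clusters = {"c1": [0, 9], "c2": [4, 5, 10]}
stats = cluster_statistics(monthly, clusters)
```

In the workflow described in the text, series like these would be the inputs to the monthly LR or ANN models and the targets of the ARIMA-type forecasts; those modeling steps are left out of the sketch.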

earlier and monthly precipitation values are estimated. In summary, the steps of the hybrid model development are as follows:
• Step 1 involves establishing a relationship between cluster statistics and precipitation values using ANN over the training period

$$P^{c}_{t,T} = f(S^{c}_{T}), \quad t = 1, 2, 3, \ldots, 12; \quad S = 1{:}\min,\ 2{:}\max,\ 3{:}\text{mean}; \quad c = 1, 2, \ldots, m \tag{3}$$

where $P^{c}_{t,T}$ is monthly precipitation at month t that is classified by the Ward method into cluster c; $S^{c}_{T}$ are statistics of the cluster c in the training period (T); m = total number of clusters; and f is the ANN model as in Eq. (2).
• Step 2 involves estimating statistics for each cluster over the assessment period

$$S^{c}_{A} = g(P^{c}_{t}) \tag{4}$$

where $S^{c}_{A}$ is the estimated statistics of cluster c over the assessment period (A); and g is the ISARIMA model.
• Step 3 consists of forecasting precipitation for each month over the assessment period

$$P^{c}_{t,A} = f(S^{c}_{A}) \tag{5}$$

where $P^{c}_{t,A}$ = forecasted precipitation for month t within cluster c over the assessment period (A); and f is the ANN model as in Step 1. The basic structure of the hybrid model can be summarized as follows: (1) clustering analysis for grouping monthly time series with similar variations, (2) calculation of main statistics for each cluster, and (3) coupling ANN to ISARIMA for capturing nonlinear and linear signals. Fig. 2 shows the structure of the hybrid model in comparison to the other models.

Measures for Modeling Performance

For evaluating the performance of models, six statistical measures were used to assess the forecast accuracies, namely the root-mean square error (RMSE), relative root-mean square error (RRMSE), mean absolute error (MAE), mean relative error (MRE), mean square error (MSE), and agreement index (AI), which are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(O_i - P_i)^2} \tag{6}$$

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{O}} \tag{7}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - O_i\right| \tag{8}$$



$$\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{P_i - O_i}{O_i}\right| \tag{9}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N}(P_i - O_i)^2}{N} \tag{10}$$

$$\mathrm{AI} = 1 - \frac{\sum_{i=1}^{N}(P_i - O_i)^2}{\sum_{i=1}^{N}\left(\left|P_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2} \tag{11}$$

where $P_i$ = predicted value; $O_i$ = observed value; and $\bar{O}$ = mean of observed value. A small value of error criteria indicates the proximity of the simulations to observations. Willmott (1981) proposed AI to overcome the insensitivity of the Nash-Sutcliffe efficiency and coefficient of determination to differences between observation and simulation statistics (e.g., mean and variance). The AI score is a dimensionless measure that ranges between 0 and 1. High values of AI indicate a good agreement of the observed and simulated values (Pannakkong et al. 2016; Moustris et al. 2011). In this paper, the performance of the models is compared during a 3-year validation period similar to many other studies. We found that increasing the validation or forecast period to one-third of the entire observation period increases the uncertainty in the precipitation forecasts. The length of validation period in the literature varies from 30% to less than 10% of the entire simulation period for the ANN model (e.g., Huesca et al. 2014; Nascimento Camelo et al. 2018). The selection of the length of training and validation periods depends on the length of observations recorded and the purpose of the modeling. For the forecast purpose in this study, we found that the ANN forecasts are uncertain beyond 3 years. Note that this model and other models used in this study are trained only with the precipitation data or statistics of the precipitation data as inputs and not other atmospheric variables. Furthermore, if the entire simulation period is split into 70% for training and 30% for forecasting or if the k-fold validation is used, the ANN model outputs will not be comparable to the results of the SARIMA and ISARIMA models, which are based solely on autoregression and moving average parameters and their performance degrades as the forecast lead time increases.

Results

The monthly time series of precipitation in the Rasht and Gorgan stations for the period of 1976–2013 are used to train the following models: conventional SARIMA, ANN, ISARIMA based on clustering analysis and regression analysis, improved ANN based on clustering and regression analysis (IANN), and the proposed hybrid of ISARIMA and ANN based on clustering analysis (ISARIMA-ANN). The performance of these models in forecasting precipitation is assessed over the period of 2014–2016.

SARIMA Model

Testing randomness and seasonal autocorrelation of time series is the first step in developing a statistical model. The precipitation time series over the training period and autocorrelation functions (ACFs) are plotted in Fig. 3. The seasonal cycle in time series is repeated every 12 months as shown with ACF values [Figs. 3(b–f)]. The crossing of ACF spikes outside of the confidence limit [Figs. 3(b–f)] indicates that the precipitation time series in the two study stations are not random. The normality test shows that the time series are not normally distributed. Therefore, time series must be transformed to a normal distribution and normally distributed time series are used to develop the models.

The seasonal Mann-Kendall test (Mann 1945; Kendall 1975; Hirsch and Slack 1984) showed a significant decreasing trend in the Gorgan precipitation but no significant changes in the Rasht precipitation. The p-values of the detected trends in the Gorgan and Rasht precipitations are smaller and larger than the 5% significance level, respectively. Because of the periodicity of statistics such as mean and standard deviation, the seasonality is considered in modeling. Time series with a seasonal cycle and nonstationary characteristics are more complex. Transformation can remove the nonstationarity from complex time series and provide stationary time series. For example, the augmented Dickey-Fuller (ADF) nonstationarity test gave a p-value of 0.086 for the monthly time series of precipitation in the Rasht station, which is slightly larger than the significance level, α = 0.05, confirming the nonstationarity of the time series. However, after transformation the ADF value dropped to less than 0.001, indicating that transformed data are stationary. For selecting the SARIMA model order, a range of 0 to 3 (four cases) is applied to each parameter of the model including seasonal moving average (SMA), seasonal autoregressive (SAR), autoregressive (AR), and moving average (MA) (i.e., four parameters). From 4 × 4 × 4 × 4 = 256 possible cases, only those that passed the diagnostic check (normality and white noise of the model residuals) are selected and those that are not feasible are discarded. In order to identify the best model, the Schwarz-Bayesian criterion (SBC) is used. The lower the SBC score, the better the model performance. The best models are SARIMA(3, 0, 2) × $(1, 1, 2)_{12}$, where AR(1) = 0.32, AR(2) = −0.16, AR(3) = −0.08, SAR(12) = −0.59, MA(1) = 0.3, MA(2) = −0.1, SMA(12) = 0.41, and SMA(24) = 0.5 with an SBC score of −2.4 for the Rasht station; and SARIMA(0, 1, 1) × $(1, 1, 2)_{12}$, where SAR(12) = −0.66, MA(1) = 0.95, SMA(12) = 0.28, and SMA(24) = 0.59 with an SBC score of 3.4 for the Gorgan station. The Kolmogorov–Smirnov normality test of model residuals verifies the normality of the residuals. The significance level of the test is equal to 0.2 for both time series, which is larger than α = 0.05.

ANN Model

The inputs of the ANN model are determined by calculating the correlation coefficient between precipitation values at the current time step, t, and four previous time steps, t − 1, t − 2, t − 3, and t − 4. The maximum correlation coefficient was found between precipitation at the current time step and that at time step t − 1, and it was 0.33 for the Rasht station and 0.17 for the Gorgan station. In this study, a back-propagation neural network algorithm is used. Modeling steps such as selection of the optimal number of neurons in the hidden layer and types of the activation functions are necessary to appropriately train the ANN model. The activation functions that are used in this research include logistic sigmoid, tangent sigmoid, and pure linear. The performance criteria to obtain the optimal number of the neurons in the hidden layer are shown in Fig. 4. The optimal number of neurons in the hidden layer is found to be 7 and the activation functions of hidden and output layers are found to perform better with tangent sigmoid and logistic sigmoid functions.

ISARIMA and IANN

The improved versions of the ANN and SARIMA models are obtained when these models are applied to different clusters with similar precipitation patterns instead of a single model for the entire time series. In fact, the ANN and SARIMA models are not directly applied to time series, but they are applied to the statistics of each cluster. The Ward clustering method discussed in the “Methodology”



[Fig. 3 panels: (a, d) monthly precipitation (mm) versus month (1976−2013); (b, e) ACF versus lag (month); (c, f) partial ACF versus lag (month).]

Fig. 3. Monthly precipitation over the period of 1976–2013 and autocorrelation function (ACF) and partial autocorrelation function (PACF)
values for different lag times for (a–c) Rasht station; and (d–f) Gorgan station. The horizontal lines in ACF and PACF plots represent the 95%
confidence intervals. Not all of the autocorrelation values are within the confidence intervals. This shows that these time series are not random
and have a seasonal cycle.
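The randomness check summarized in Fig. 3 can be reproduced in outline with the short Python sketch below. It computes the sample ACF and compares each lag with the approximate 95% white-noise limit of ±1.96/√N. The series is synthetic with an imposed 12-month cycle, not the Rasht or Gorgan record, and the function names are hypothetical; the limit formula is the usual large-sample approximation and is an assumption about how the confidence lines in the figure were drawn.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def white_noise_limit(n, z=1.96):
    """Approximate 95% limit for the ACF of a random series of length n."""
    return z / math.sqrt(n)


# Synthetic monthly series with a 12-month cycle (illustrative only).
series = [100 + 50 * math.cos(2 * math.pi * t / 12) for t in range(120)]
r = acf(series, 24)
limit = white_noise_limit(len(series))

# Lags whose autocorrelation falls outside the limit.
outside = [k for k in range(1, 25) if abs(r[k]) > limit]
```

For a seasonal series such as this one, lags near 12 and 24 show strong positive autocorrelation and lags near 6 and 18 strong negative autocorrelation, which is the pattern the text interprets as nonrandom behavior with a 12-month cycle.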

section is applied to classify the monthly precipitation as shown in Fig. 5. A dendrogram is a tree diagram that presents the distance and sequence at which the observations are clustered (Belgaman et al. 2017). The number of clusters can be chosen from the dendrogram by the number of vertical lines in the dendrogram cut by a horizontal line that can transverse the maximum distance vertically without intersecting a cluster. The cutting point is based on an accrued distance. Based on the clustering analysis and the



[Fig. 5 panels: distance (y-axis) versus month; leaf order (a) Rasht: Jan, Oct, May, Jun, Nov, Feb, Mar, Apr, Aug, Jul, Dec, Sep; (b) Gorgan: Jun, Oct, Feb, Aug, Jan, Dec, Sep, Mar, Jul, May, Apr, Nov.]

[Fig. 4 panels: (a) error criteria (RRMSE, MRE) versus number of neurons (2 to 9); (b) error criteria (RMSE) versus number of neurons (2 to 9).]

Fig. 4. Identifying the optimal number of neurons in the hidden layer of the ANN model based on assessment measures of (a) RRMSE, MRE; and (b) RMSE in the Rasht station.
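The error criteria of Eqs. (6)–(11), three of which are plotted in Fig. 4, can be computed with a few lines of Python. The sketch below is a minimal illustration with made-up observed and predicted values; the function name and the dictionary keys are hypothetical, and RMSE is taken in its standard form as the square root of the MSE in Eq. (10).

```python
import math


def forecast_scores(obs, pred):
    """Return RMSE, RRMSE, MAE, MRE, MSE, and AI for paired series.

    MRE divides by each observation, so it is undefined when any
    observed value is zero (e.g., a dry month).
    """
    n = len(obs)
    o_bar = sum(obs) / n
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    mse = sq_err / n
    rmse = math.sqrt(mse)
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    mre = sum(abs((p - o) / o) for o, p in zip(obs, pred)) / n
    # Agreement index: 1 minus squared error over "potential error".
    potential = sum(
        (abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred)
    )
    ai = 1 - sq_err / potential
    return {
        "RMSE": rmse,
        "RRMSE": rmse / o_bar,
        "MAE": mae,
        "MRE": mre,
        "MSE": mse,
        "AI": ai,
    }


# Made-up example values (mm), not station data.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = forecast_scores(observed, predicted)
```

Lower RMSE, RRMSE, MAE, MRE, and MSE and an AI closer to 1 indicate a better forecast, which is how the models are ranked in the text.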

Fig. 5. Dendrogram of monthly precipitation in (a) Rasht; and (b) Gorgan stations. The y-axis shows the Euclidean distance.

dendrogram in Fig. 5, four clusters of monthly precipitation for the Rasht station (Cluster 1 = January, October; Cluster 2 = June, November, May; Cluster 3 = July, December, September; and Cluster 4 = February, March, April, August) and three clusters for the Gorgan station (Cluster 1 = February, June, August, October; Cluster 2 = January, September, December; and Cluster 3 = April, March, May, July, November) are obtained. After clustering monthly precipitation, the statistics of each cluster, including maximum, minimum, and truncated mean, are calculated. Based on clustering analysis, the number of maximum, minimum, and truncated mean time series is 4, 4, and 4 in the Rasht station and it is 3, 3, and 3 in the Gorgan station. In the next step, 12 LR models are developed for each station with the main statistics of each cluster as independent variables (model inputs) and monthly precipitation of each cluster as dependent variables (model outputs). More specifically, 4, 3, and 5 LR models for three clusters in the Gorgan station, and 2, 3, 3, and 4 LR models for four clusters in the Rasht station are obtained based on the cluster statistics. Then, ARIMA and ANN are used to estimate the main statistics of each cluster in ISARIMA and IANN models, respectively. For example, ARIMA(1, 0, 3) and ARIMA(3, 0, 3) for time series of minima in the

functions in February from Cluster 1 and January from Cluster 2. Minimum RMSE was found when activation functions of tangent sigmoid-tangent sigmoid for the January cluster and logistic sigmoid-tangent sigmoid for the February cluster are chosen. The RMSE score varies substantially with the selection of activation function and ranges from 19 to 53 mm (Fig. 6). This shows the importance of optimally selecting the activation function for reducing the modeling error. The autocorrelation of residuals, which can display the errors in the timing of peaks or systematic overestimation and underestimation, is plotted in Fig. 7 for the hybrid model. The ACF spikes are within the confidence limit; therefore, the residuals are random in nature. This indicates that the hybrid model passes the diagnostic check.

The hybrid model had minimum values of RMSE, RRMSE, MSE, MRE, and MAE (Table 1). The model with the smallest MSE has the smallest magnitude of error and the smallest MAE
first cluster and maxima in the third cluster are obtained for the
has the smallest average of error magnitudes. The goodness-of-
Gorgan station. The established LR models are finally applied to
the estimated statistics in each cluster to back-estimate forecasted fit for high precipitation values is better reflected in the RMSE
precipitation values in the validation period. score (Pannakkong et al. 2016; Hamidi et al. 2015). Fig. 8 shows
the observed and forecasted precipitation time series for all of the
models used in this study for the validation period (2014–2016).
Hybrid of ISARIMA-ANN Based on Clustering Analysis The hybrid model was able to obtain better forecasts than the other
A simple ANN is used to simulate monthly precipitation and models and the R2 of the fitted line in the hybrid model has the
statistics of each cluster. In fact, ANN inputs are the statistics of maximum value as that it increased 59% in the Rasht station and
each cluster. To find the optimal activation functions of the hidden 40% in the Gorgan station relative to that for the ISARIMA model.
and output layers, combinations of multiple functions are tested. The comparison of the SARIMA and hybrid models in forecasting
The error criteria are shown in Fig. 6 for different activation precipitation in each month is shown in Fig. 9. The RRMSE values
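The dendrogram cut described in this section, and the effect of moving the cutting height up or down, can be illustrated with a minimal sketch. It uses agglomerative clustering with Euclidean distance and stops merging once the closest pair of clusters is farther apart than the cutting height. Complete linkage, the function name `cut_clusters`, and the one-dimensional sample points are assumptions made for illustration only; they are not the linkage or the precipitation data used in this study.

```python
import math


def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cut_clusters(points, cut_height):
    """Agglomerative clustering (complete linkage, an assumption here).

    Merging stops when the closest pair of clusters is farther apart
    than cut_height, which corresponds to cutting the dendrogram with
    a horizontal line at that height.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: largest pairwise distance between members.
        return max(euclid(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical one-dimensional "monthly statistics" (not data from the paper).
points = [[0.0], [0.5], [5.0], [5.4], [10.0]]
low_cut = cut_clusters(points, 1.0)
high_cut = cut_clusters(points, 6.0)
print(len(low_cut), len(high_cut))
```

In this toy example, raising the cutting height from 1.0 to 6.0 reduces the number of clusters from three to two, which mirrors the behavior of the cutting point in Fig. 5.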



[Fig. 6 plot omitted: y-axis, RMSE (0–60); x-axis, type of activation function (logs-logs, logs-purl, logs-tangs, purl-logs, purl-purl, purl-tangs, tangs-logs, tangs-purl, tangs-tangs); series, class 2 (January) and class 1 (February).]

Fig. 6. Sensitivity of ANN performance in forecasting monthly precipitation to the type of activation function at the Gorgan station.
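The nine labels on the x-axis of Fig. 6 are the pairings of three candidate activation functions for the hidden and output layers. A minimal sketch of how such a sensitivity test can be enumerated is given below. It assumes that logs, tangs, and purl denote the logistic sigmoid, tangent sigmoid, and pure linear functions, respectively; the network size, weights, and inputs are hypothetical placeholders, not the trained model of this study.

```python
import math
from itertools import product

# Candidate activation functions (label meanings are an assumption).
ACTIVATIONS = {
    "logs": lambda x: 1.0 / (1.0 + math.exp(-x)),  # logistic sigmoid
    "purl": lambda x: x,                           # pure linear
    "tangs": math.tanh,                            # tangent sigmoid
}


def forward(inputs, w_hidden, w_out, hidden_fn, output_fn):
    """Forward pass of a one-hidden-layer network (bias terms omitted)."""
    hidden = [
        hidden_fn(sum(w * x for w, x in zip(row, inputs)))
        for row in w_hidden
    ]
    return output_fn(sum(w * h for w, h in zip(w_out, hidden)))


# All (hidden, output) pairings, in the same order as the Fig. 6 labels.
pairs = list(product(ACTIVATIONS, repeat=2))

# Hypothetical inputs (e.g., scaled cluster statistics) and weights.
inputs = [0.2, 0.5, 0.9]
w_hidden = [[0.1, -0.3, 0.5], [0.4, 0.2, -0.1]]
w_out = [0.7, -0.6]

outputs = {
    f"{h}-{o}": forward(inputs, w_hidden, w_out, ACTIVATIONS[h], ACTIVATIONS[o])
    for h, o in pairs
}
for label, value in outputs.items():
    print(label, round(value, 4))
```

In a full sensitivity test, each pairing would be trained separately and scored with RMSE on held-out data; the sketch only shows how the nine combinations arise and how the choice changes the network output for fixed weights.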

are large from May to August in the Rasht station, especially for the SARIMA model. The hybrid model decreases RRMSE in the Rasht station to 1.8 in May and below 0.7 in other months [Fig. 9(a)]. The minimum RRMSE of the forecasts by the hybrid model is calculated in February for the Rasht station [Fig. 9(a)] with a long-term average precipitation of 117 mm [Fig. 9(b)], and in October for the Gorgan station [Fig. 9(a)] with a long-term average precipitation of 55 mm [Fig. 9(b)]. To assess the model performance in forecasting seasonal precipitation, which is crucial for water resources management and agricultural practices in the study region, the monthly time series in the validation period were aggregated to obtain the seasonal time series (Fig. 10). The seasonal precipitation forecasted by the hybrid model has the smallest error and largest AI score, and the seasonal precipitation forecasted by the SARIMA model has the largest error and smallest AI score (Fig. 10). The maximum values of observed and forecasted precipitation by the hybrid model were 597 and 584 mm in the Rasht station and 382 and 386 mm in the Gorgan station, which occurred in autumn.

Discussion

Coupling clustering analysis and linear regression to SARIMA and ANN models improved the simulation accuracy, and the RMSE score decreased 21% by ISARIMA relative to SARIMA and 11% by IANN relative to ANN for the Rasht station and 5% and 15% by ISARIMA and IANN, respectively, for the Gorgan station. The improvements of the ISARIMA and IANN models indicate that the clustering analysis on subseasonal precipitation has an important role in the accuracy of the forecasts in the study areas. The clustering analysis extracts more information from data, and therefore increases the forecast accuracy. The number of clusters is chosen from the dendrogram of the observations (Fig. 5). If the location of a cutting point, a horizontal line in the dendrogram, moves up (e.g., Euclidean distance of 7–7.5) (Fig. 5), the number of clusters becomes low, leading to challenges in classification; however, when the location of a cutting point moves down (e.g., Euclidean distance of 5) (Fig. 5), the number of clusters becomes large, leading to complexity in finding months with similar precipitation patterns. The performance of improved models is consistent with the study of Wang et al. (2014) for the precipitation forecast. Wang et al. (2014) used only the ARIMA model but in this study, in addition to the ARIMA model, the improvement of ANN performance with the clustering analysis was investigated and we found that the clustering analysis can improve the performance of the ANN and ARIMA forecast models by dividing data into groups with similar characteristics. The smallest error criteria and largest AI values were obtained for the hybrid of the ISARIMA and ANN models based on the clustering analysis. For example, the hybrid model improved the RRMSE and MRE scores from 0.67 to 0.50 and from 1.47 to 1.00 relative to the SARIMA model scores in the Gorgan station (Table 1). The outperformance of ANN, when compared with the linear regression model, is well established in the literature (e.g., Soares dos Santos et al. 2016; Tosun et al. 2016; Rasouli et al. 2012; Hung et al. 2009). Soares dos Santos et al. (2016) suggested that ANN can represent nonlinear processes that might not be captured by the linear regression model. Also, the ANN model can be more descriptive than the linear regression model. The empirical relationships between predictors and predictands in the linear regression model are assumed to be known, while in reality atmospheric processes are complex and we cannot detect the exact relationships between the variables. Therefore, linear regressions have insufficient capacity to forecast phenomena with a nonlinear nature, e.g., precipitation (Adamowski and Karapataki 2010). The results in this paper showed that the hybrid model outperformed the other models and that coupling the ARIMA model to the hybrid model successfully improved the forecast skills even better than those reported in previous studies (e.g., Zeynoddin et al. 2018; Pannakkong et al. 2016; Wang et al. 2014, 2015). One advantage of ANN is its capacity to incorporate temporal variability, concurrent values, and various predictive values as input without any additional effort (Soares dos Santos et al. 2016). The minimum values of RRMSE in the Rasht and Gorgan stations belong to February (0.17) and March (0.13), which are similar to the results reported by Wang et al. (2014). The scatter diagram of observed and simulated precipitation in Fig. 8 shows that monthly precipitation values are overestimated slightly by the hybrid model and substantially by the other models. The prediction interval for the hybrid model is provided by the shaded area around the precipitation forecast (Fig. 8). Wang et al. (2014) showed an overestimation of precipitation, especially in October, and similarly, in this study, precipitation is slightly overestimated in the Rasht and Gorgan stations in the validation period. The maximum and minimum values of the observed Rasht precipitation time series in the validation period occurred in October and June in 2014, which are consistent with maximum and minimum values



[Fig. 7 plot omitted: panels (a) and (c), ACF; panels (b) and (d), partial ACF; x-axis, lag (month).]

Fig. 7. Autocorrelation and partial autocorrelation functions of the residuals of the hybrid model in (a and b) Rasht; and (c and d) Gorgan stations.
Both ACF and PACF values are within the 95% confidence intervals, which show the randomness of the residuals.
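The diagnostic check summarized in Fig. 7 can be sketched as follows: compute the sample autocorrelation of the residuals and flag any lag whose value falls outside the 95% limits. The sketch assumes the common large-sample white-noise bound of ±1.96/√n, which may differ from the exact limits drawn in the figure, and it uses synthetic residuals rather than the residuals of the hybrid model.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% limit for the ACF of a random series of length n."""
    return 1.96 / math.sqrt(n)


# Synthetic residuals (assumption): 36 values, e.g., three years of months.
random.seed(0)
residuals = [random.gauss(0.0, 1.0) for _ in range(36)]

bound = white_noise_bound(len(residuals))
outside = [k for k, r in enumerate(acf(residuals, 12), start=1) if abs(r) > bound]
print("bound:", round(bound, 3), "lags outside the limits:", outside)
```

Applied to a strongly seasonal series instead of residuals, the same function returns a lag-12 autocorrelation well above the bound, which is the situation described for the raw precipitation series.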

of precipitation forecasted by the hybrid model. The maximum monthly precipitation was observed in October 2014 (370 mm) for the Rasht station and in March 2014 (99 mm) for the Gorgan station. The maximum observed and forecasted precipitation values are in good agreement. The maximum RRMSE is calculated in May for the Rasht station [Fig. 9(a)] with a long-term average precipitation of 48 mm [Fig. 9(b)], and in May for the Gorgan station [Fig. 9(a)] with a long-term average precipitation of 39 mm [Fig. 9(b)]. The RMSE scores in the Rasht station are relatively smaller in the wet season (October to January) than in the spring and summer months [Fig. 9(a)]. This shows that the precipitation forecast in wet months is more skillful than in the relatively dry months, especially in the Rasht station with a humid climate. This is further evidence that the proposed hybrid model performs better than the SARIMA model. The forecast accuracy measures also show that the ANN model, in comparison to the SARIMA



Table 1. Comparison of the performance of the five forecast models and associated error scores in the Rasht and Gorgan stations

Rasht
Model         RMSE (mm)   RRMSE   MRE    MSE (mm²)   MAE (mm)   AI
SARIMA        88.6        0.87    2.96   7,855       84.6       0.69
ISARIMA       69.7        0.68    0.88   4,867       47.0       0.71
ANN           63.3        0.62    1.47   4,011       43.7       0.62
IANN          56.6        0.55    1.00   3,209       41.4       0.87
ISARIMA-ANN   45.7        0.44    0.68   2,087       35.4       0.91

Gorgan
Model         RMSE (mm)   RRMSE   MRE    MSE (mm²)   MAE (mm)   AI
SARIMA        24.1        0.67    1.47   583         19.8       0.66
ISARIMA       22.9        0.64    1.44   526         17.0       0.68
ANN           23.4        0.65    1.46   549         18.3       0.45
IANN          19.9        0.55    1.10   396         14.7       0.66
ISARIMA-ANN   18.2        0.50    1.00   333         12.3       0.77
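The percentage improvements quoted in the text can be reproduced directly from the RMSE columns of Table 1. The short sketch below recomputes them; the helper name `pct_decrease` is introduced here for illustration, and the values are copied from the table.

```python
# RMSE values (mm) copied from Table 1.
rmse = {
    "Rasht": {"SARIMA": 88.6, "ISARIMA": 69.7, "ANN": 63.3,
              "IANN": 56.6, "ISARIMA-ANN": 45.7},
    "Gorgan": {"SARIMA": 24.1, "ISARIMA": 22.9, "ANN": 23.4,
               "IANN": 19.9, "ISARIMA-ANN": 18.2},
}


def pct_decrease(reference, improved):
    """Relative decrease of a score with respect to a reference, in percent."""
    return 100.0 * (reference - improved) / reference


improvements = {}
for station, scores in rmse.items():
    improvements[station] = {
        "ISARIMA vs SARIMA": pct_decrease(scores["SARIMA"], scores["ISARIMA"]),
        "IANN vs ANN": pct_decrease(scores["ANN"], scores["IANN"]),
        "Hybrid vs SARIMA": pct_decrease(scores["SARIMA"], scores["ISARIMA-ANN"]),
    }

for station, values in improvements.items():
    for label, value in values.items():
        print(f"{station:7s} {label:18s} {value:5.1f}%")
```

Rounded to whole percentages, these values agree with the 21%, 11%, and 48% reported for the Rasht station and the 5%, 15%, and 24% reported for the Gorgan station.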

[Fig. 8 plot omitted: panels (a) and (b), time series of monthly precipitation [mm] for 2014–2016 (Obs, SARIMA, ISARIMA, ANN, IANN, and Hybrid); panels (c) and (d), forecasted versus observed precipitation [mm].]

Fig. 8. Forecasted monthly precipitation in (a and c) Rasht; and (b and d) Gorgan stations for the period of January 2014 through December 2016. The shaded areas in (a and b) show the prediction interval for the proposed hybrid model. For computing the prediction interval, a Gaussian distribution with conditional mean equal to the hybrid model predictions and constant variance equal to the variance of the prediction residuals is used.
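The prediction interval described in the caption of Fig. 8 can be sketched as a Gaussian band centered on each forecast, with a constant width set by the variance of the prediction residuals. The 95% level (z = 1.96) is an assumption, because the caption does not state the coverage, and the forecast and observation values below are hypothetical.

```python
import math


def gaussian_prediction_interval(forecasts, observations, z=1.96):
    """Constant-variance Gaussian interval around each forecast.

    The variance is the sample variance of the residuals
    (observation minus forecast); z=1.96 gives a nominal 95% band.
    """
    residuals = [o - f for o, f in zip(observations, forecasts)]
    n = len(residuals)
    mean_res = sum(residuals) / n
    variance = sum((r - mean_res) ** 2 for r in residuals) / (n - 1)
    half_width = z * math.sqrt(variance)
    return [(f - half_width, f + half_width) for f in forecasts]


# Hypothetical monthly values in mm (not data from the paper).
forecasts = [10.0, 20.0, 30.0, 40.0]
observations = [12.0, 18.0, 33.0, 37.0]

intervals = gaussian_prediction_interval(forecasts, observations)
for f, (lo, hi) in zip(forecasts, intervals):
    print(f"forecast {f:5.1f} mm, interval [{lo:6.2f}, {hi:6.2f}] mm")
```

Because the band is symmetric and has the same width for every month, its lower limit can fall below zero for small forecasts, which is a known limitation of a constant-variance Gaussian assumption for precipitation.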

model, has smaller errors (Table 1). Better performance of the ANN model relative to the ARIMA model is reported in the literature (e.g., Sulaiman and Wahab 2018; Ouyang et al. 2016; Kumar Nanda et al. 2013; Somvanshi et al. 2006). Nonlinear characteristics of the ANN models allow them to capture nonlinear atmospheric processes better than the linear ARIMA model (Sulaiman and Wahab 2018).

Conclusion

Despite improvements in theory and computational capacities, precipitation forecast still remains one of the most challenging tasks in water resources management and planning. One of the widely used statistical models to forecast precipitation is SARIMA, which combines the values and noise so that the forecast values are very



similar to the observations. In order to account for the interannual and intermonthly variations, the SARIMA and ANN models are improved based on temporal precipitation clusters. The improved versions of these models are called ISARIMA and IANN. In this study, the ISARIMA model is further improved and a hybrid model of ISARIMA and ANN is proposed based on subseasonal precipitation groups obtained from hierarchical clustering analysis. The intermonthly variation in the precipitation fluctuations is also incorporated into the hybrid model. The model is evaluated in the Gorgan and Rasht weather stations located near the Caspian Sea. The results show that when the clustering analysis is conducted on precipitation data, the forecast skill improves substantially and the RMSE score decreases 21% by ISARIMA relative to SARIMA and 11% by IANN relative to ANN in the Rasht station. The clustering analysis improves the performance of the forecast models by searching for analogous precipitation patterns over the long observation period (1976–2013) and grouping them into clusters with similar characteristics. When ISARIMA is coupled to ANN in the hybrid model, the forecast skill is further improved and the RMSE score of the forecasted precipitation against the observations decreased by 48% for the Rasht station and 24% for the Gorgan station relative to the SARIMA forecasts. This is because of the added value of the ANN model by capturing temporal variability. Application of the ANN model, instead of a regression model, captures the nonlinear relationships between main statistics of monthly precipitation clusters and associated precipitation intensities. More skillful monthly precipitation forecasts were obtained from the hybrid model in comparison to the other statistical models, which is reflected in the forecast skill of the seasonal precipitation. This suggests that the hybrid model can be reliably used for forecasting precipitation and other complex phenomena in similar climates and regions and across temporal scales and for better management of water resources and agricultural activities.

Fig. 9. Performance of the SARIMA and hybrid models in monthly precipitation forecasting in the validation period with two statistical measures: (a) RRMSE; and (b) RMSE. [Plot omitted: x-axis, months (Jan–Dec); series, SARIMA and Hybrid for the Gorgan and Rasht stations; panel (b) also shows the 1976–2016 mean precipitation for each station.]

Fig. 10. Forecast accuracy of seasonal precipitation based on (a) RRMSE, MRE, and AI; and (b) RMSE and MAE for the Gorgan and Rasht stations. [Plot omitted: x-axis, models (SARIMA, ISARIMA, ANN, IANN, ISARIMA-ANN); y-axis, error criteria.]

Acknowledgments

The research was funded by Azarbaijan Shahid Madani University. Kabir Rasouli was supported by the Natural Sciences and Engineering Research Council, NSERC’s Postdoctoral fellowship. The manuscript benefitted from the comments and suggestions of the editor and the anonymous reviewers on a previous version.

References

Adamowski, J., and C. Karapataki. 2010. “Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: Evaluation of different ANN learning algorithms.” J. Hydrol. Eng. 15 (10): 729–743. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000245.
Belgaman, H. A., K. Ichiyanagi, R. Suwarman, M. Edvin Aldrian, A. I. D. Utami, and D. A. Kusumaningtyas. 2017. “Characteristics of seasonal precipitation isotope variability in Indonesia.” Hydrol. Res. Lett. 11 (2): 92–98. https://doi.org/10.3178/hrl.11.92.
Bezdek, J. C. 1981. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Blashfield, R. K. 1976. “Mixture model tests of cluster analysis: Accuracy of four agglomerative hierarchical methods.” Psychol. Bull. 83 (3): 377–388. https://doi.org/10.1037/0033-2909.83.3.377.
Box, G. E. P., and G. M. Jenkins. 1976. Time series analysis: Forecasting and control. Englewood Cliffs, NJ: Prentice-Hall.
Caesar, L. K., O. M. Kvalheim, and N. B. Cech. 2018. “Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.” Anal. Chim. Acta 1021 (Aug): 69–77. https://doi.org/10.1016/j.aca.2018.03.013.



Cannon, A. J., S. R. Sobie, and T. Q. Murdock. 2015. “Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes?” J. Clim. 28 (17): 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1.
Carter, G. M., J. P. Dallavalle, and H. R. Glahn. 1989. “Statistical forecasts based on the National Meteorological Center’s numerical weather prediction system.” Weather Forecasting 4 (3): 401–412. https://doi.org/10.1175/1520-0434(1989)004<0401:SFBOTN>2.0.CO;2.
Chang, X., M. Gao, Y. Wang, and X. Hou. 2012. “Seasonal autoregressive integrated moving average model for precipitation time series.” J. Math. Stat. 8 (4): 500–505. https://doi.org/10.3844/jmssp.2012.500.505.
Chen, K. Y., and C. H. Wang. 2007. “A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan.” Expert Syst. Appl. 32 (1): 254–264. https://doi.org/10.1016/j.eswa.2005.11.027.
Coles, A. E., B. G. McConkey, and J. J. McDonnell. 2017. “Climate change impacts on hillslope runoff on the northern Great Plains, 1962-2013.” J. Hydrol. 550 (Jul): 538–548. https://doi.org/10.1016/j.jhydrol.2017.05.023.
Dabral, P. P., and M. Z. Murry. 2017. “Modelling and forecasting of rainfall time series using SARIMA.” Environ. Processes 4 (2): 399–419. https://doi.org/10.1007/s40710-017-0226-y.
Eni, D., and F. J. Adeyeye. 2015. “Seasonal ARIMA modeling and forecasting of rainfall in Warri Town, Nigeria.” J. Geosci. Environ. Prot. 3 (6): 91–98. https://doi.org/10.4236/gep.2015.36015.
Eszergár-Kiss, D., and B. Caesar. 2017. “Definition of user groups applying Ward’s method.” Transp. Res. Procedia 22: 25–34. https://doi.org/10.1016/j.trpro.2017.03.004.
Farrelly, C. M., S. J. Schwartz, A. L. Amodeo, D. J. Feaster, D. L. Steinley, A. Meca, and S. Picariello. 2017. “The analysis of bridging constructs with hierarchical clustering methods: An application to identity.” J. Res. Personality 70 (Oct): 93–106. https://doi.org/10.1016/j.jrp.2017.06.005.
Fekete, B. M., C. J. Vörösmarty, J. O. Roads, and C. J. Willmott. 2004. “Uncertainties in precipitation and their impacts on runoff estimates.” J. Clim. 17 (2): 294–304. https://doi.org/10.1175/1520-0442(2004)017<0294:UIPATI>2.0.CO;2.
Glahn, H. R., and D. A. Lowry. 1972. “The use of model output statistics (MOS) in objective weather forecasting.” J. Appl. Meteorol. 11 (8): 1203–1211. https://doi.org/10.1175/1520-0450(1972)011%3C1203:TUOMOS%3E2.0.CO;2.
Hamidi, O., J. Poorolajal, M. Sadeghifar, H. Abbasi, Z. Maryanaji, H. R. Faridi, and L. Tapak. 2015. “A comparative study of support vector machines and artificial neural network for predicting precipitation in Iran.” Theor. Appl. Climatol. 119 (3–4): 723–731. https://doi.org/10.1007/s00704-014-1141-z.
Hamill, T. M., and J. S. Whitaker. 2006. “Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application.” Mon. Weather Rev. 134 (11): 3209–3229. https://doi.org/10.1175/MWR3237.1.
Hands, S., and B. Everitt. 1987. “A Monte Carlo study of the recovery of cluster structure in binary data by hierarchical clustering techniques.” Multivariate Behav. Res. 22 (2): 235–243. https://doi.org/10.1207/s15327906mbr2202_6.
Hirsch, R. M., and J. R. Slack. 1984. “A nonparametric trend test for seasonal data with serial dependence.” Water Resour. Res. 20 (6): 727–732. https://doi.org/10.1029/WR020i006p00727.
Huesca, M., J. Litago, S. Merino-de-Miguel, V. Cicuendez-López-Ocaña, and A. Palacios-Orueta. 2014. “Modeling and forecasting MODIS-based fire potential index on a pixel basis using time series models.” Int. J. Appl. Earth Obs. Geoinf. 26 (Feb): 363–376. https://doi.org/10.1016/j.jag.2013.09.003.
Hung, N. Q., M. S. Babel, S. Weesakul, and N. K. Tripathi. 2009. “An artificial neural network model for rainfall forecasting in Bangkok, Thailand.” Hydrol. Earth Syst. Sci. 13 (8): 1413–1425. https://doi.org/10.5194/hess-13-1413-2009.
Jain, A. K., and R. C. Dubes. 1998. Algorithms for clustering data. Upper Saddle River, NJ: Prentice-Hall.
Jiang, B., S. Liang, W. Jindi, and Z. Xiao. 2010. “Modeling MODIS LAI time series using three statistical methods.” Remote Sens. Environ. 114 (7): 1432–1444. https://doi.org/10.1016/j.rse.2010.01.026.
Karamouz, M., K. Rasouli, and S. Nazif. 2009. “Development of a hybrid index for drought prediction: Case study.” J. Hydrol. Eng. 14 (6): 617–627. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000022.
Kendall, M. G. 1975. Rank correlation methods. London: Griffin.
Kohonen, T. 1989. Self-organization and associative memory. New York: Springer.
Köppen, W. 1936. Das geographische System der Klimate. Berlin: Gebrüder Borntraeger.
Kumar Nanda, S., D. P. Tripathy, S. Kumar Nayak, and S. Mohapatra. 2013. “Prediction of rainfall in India using artificial neural network (ANN) models.” Int. J. Intell. Syst. Appl. 5 (12): 1–22. https://doi.org/10.5815/ijisa.2013.12.01.
Lee, N. U., J. S. Shim, Y. W. Ju, and S. C. Park. 2018. “Design and implementation of the SARIMA-SVM time series analysis algorithm for the improvement of atmospheric environment forecast accuracy.” Soft Comput. 22 (13): 4275–4281. https://doi.org/10.1007/s00500-017-2825-y.
Lloyd, S. 1982. “Least squares quantization in PCM.” IEEE Trans. Inf. Theory 28 (2): 129–137. https://doi.org/10.1109/TIT.1982.1056489.
Mann, H. B. 1945. “Nonparametric tests against trend.” Econometrica 13 (3): 245–259. https://doi.org/10.2307/1907187.
Marzano, F. S., E. Fionda, and P. Ciotti. 2006. “Neural-network approach to ground-based passive microwave estimation of precipitation intensity and extinction.” J. Hydrol. 328 (1–2): 121–131. https://doi.org/10.1016/j.jhydrol.2005.11.042.
McCulloch, W. S., and W. Pitts. 1943. “A logical calculus of the ideas immanent in nervous activity.” Bull. Math. Biophys. 5 (4): 115–133. https://doi.org/10.1007/BF02478259.
Michalakes, J. 1999. Design of a next-generation regional weather research and forecast model. Rep. No. ANL/MCS/CP-98150. Lemont, IL: Argonne National Laboratory.
Mislan, K., M. Haviluddin, S. Hardwinarto, M. Sumaryono, and M. Aipassa. 2015. “Rainfall monthly prediction based on artificial neural network: A case study in Tenggarong station, east Kalimantan–Indonesia.” Procedia Comput. Sci. 59: 142–151. https://doi.org/10.1016/j.procs.2015.07.528.
Moeeni, H., and H. Bonakdari. 2017. “Forecasting monthly inflow with extreme seasonal variation using the hybrid SARIMA-ANN model.” Stochastic Environ. Res. Risk Assess. 31 (8): 1997–2010. https://doi.org/10.1007/s00477-016-1273-z.
Moeeni, H., H. Bonakdari, and I. Ebtehaj. 2017. “Integrated SARIMA with neuro-fuzzy systems and neural networks for monthly inflow prediction.” Water Resour. Manage. 31 (7): 2141–2156. https://doi.org/10.1007/s11269-017-1632-7.
Moustris, K. P., I. K. Larissi, P. T. Nastos, and A. G. Paliatsos. 2011. “Precipitation forecast using artificial neural networks in specific regions of Greece.” Water Resour. Manage. 25 (8): 1979–1993. https://doi.org/10.1007/s11269-011-9790-5.
Narasimha Murthy, K. V., R. Saravana, and K. Vijaya Kumar. 2018. “Modeling and forecasting rainfall patterns of southwest monsoons in north-east India as a SARIMA process.” Meteorol. Atmos. Phys. 130 (1): 99–106. https://doi.org/10.1007/s00703-017-0504-2.
Nascimento Camelo, H., P. Sérgio Lucio, J. Verçosa Leal Junior, D. Von Glehn dos Santos, and P. Cesar Marques de Carvalho. 2018. “Innovative hybrid modeling of wind speed prediction involving time-series models and artificial neural networks.” Atmosphere 9 (2): 77. https://doi.org/10.3390/atmos9020077.
Osarumwense, O. I. 2013. “Applicability of Box Jenkins SARIMA model in rainfall forecasting: A case study of Port-Harcourt south south Nigeria.” Can. J. Comput. Math. Nat. Sci. Eng. Med. 4 (1): 1–4.
Ouyang, Q., W. Lu, X. Xin, Y. Zhang, W. Cheng, and T. Yu. 2016. “Monthly rainfall forecasting using EEMD-SVR based on phase-space reconstruction.” Water Resour. Manage. 30 (7): 2311–2325. https://doi.org/10.1007/s11269-016-1288-8.
Ozozen, A., G. Kayakutlu, M. Ketterer, and O. Kayalica. 2016. “A combined seasonal ARIMA and ANN model for improved results in electricity spot price forecasting: Case study in Turkey.” In Proc. of PICMET ‘16: Technology Management for Social Innovation. New York: IEEE.
Pannakkong, W., V. H. Pham, and V. N. Huynh. 2016. “A hybrid model of ARIMA, ANNs and k-means clustering for time series forecasting.” In Vol. 9978 of Integrated uncertainty in knowledge modelling and decision making: IUKM 2016: Lecture notes in computer science, edited by V. N. Huynh, M. Inuiguchi, B. Le, and T. Denoeux, 195–206. Cham, Switzerland: Springer.
Rasouli, K., W. W. Hsieh, and J. A. Cannon. 2012. “Daily streamflow forecasting by machine learning methods with weather and climate inputs.” J. Hydrol. 414–415 (Jan): 284–293. https://doi.org/10.1016/j.jhydrol.2011.10.039.
Sarhadi, A., D. H. Burn, M. Concepción Ausín, and M. P. Wiper. 2016. “Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula.” Water Resour. Res. 52 (3): 2327–2349. https://doi.org/10.1002/2015WR018525.
Sarkodie, S. A., and V. Strezov. 2018. “Assessment of contribution of Australia’s energy production to CO2 emissions and environmental degradation using statistical dynamic approach.” Sci. Total Environ. 639 (Oct): 888–899. https://doi.org/10.1016/j.scitotenv.2018.05.204.
Sfetsos, A., and C. Siriopoulos. 2004. “Combinatorial time series forecasting based on clustering algorithms and neural networks.” Neural Comput. Appl. 13 (1): 56–64. https://doi.org/10.1007/s00521-003-0391-y.
Slater, L. J., G. Villarini, A. A. Bradley, and G. A. Vecchi. 2017. “A dynamical statistical framework for seasonal streamflow forecasting in an agricultural watershed.” Clim. Dyn. 1–17. https://doi.org/10.1007/s00382-017-3794-7.
Soares dos Santos, T., D. Mendes, and R. Rodrigues Torres. 2016. “Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America.” Nonlinear Processes Geophys. 23 (1): 13–20. https://doi.org/10.5194/npg-23-13-2016.
Somvanshi, V. K., O. P. Pandey, P. K. Agrawal, N. V. Kalankerl, M. R. Prakash, and C. Ramesh. 2006. “Modelling and prediction of rainfall using artificial neural network and ARIMA techniques.” J. Ind. Geophys. Union. 10 (2): 141–151.
Sulaiman, J., and S. H. Wahab. 2018. Heavy rainfall forecasting model using artificial neural network for flood prone area. New York: Springer.
Sun, H., and M. Koch. 2001. “Case study: Analysis and forecasting of salinity in Apalachicola bay, Florida, using Box-Jenkins ARIMA models.” J. Hydraul. Eng. 127 (9): 718–727. https://doi.org/10.1061/(ASCE)0733-9429(2001)127:9(718).
Tamilselvi, R., B. Sivasakthi, and R. Kavitha. 2015. “A comparison of various clustering methods and algorithms in data mining.” Int. J. Multidiscip. Res. Dev. 2 (5): 32–98.
Tealab, A., H. Hefny, and A. Badr. 2017. “Forecasting of nonlinear time series using ANN.” Future Comput. Inf. J. 2 (1): 39–47. https://doi.org/10.1016/j.fcij.2017.05.001.
Thornthwaite, C. W. 1948. “An approach toward a rational classification of climate.” Geogr. Rev. 38 (1): 55–94. https://doi.org/10.2307/210739.
Tosun, E., K. Aydin, and M. Bilgili. 2016. “Comparison of linear regression and artificial neural network model of a diesel engine fueled with biodiesel-alcohol mixtures.” Alexandria Eng. J. 55 (4): 3081–3089. https://doi.org/10.1016/j.aej.2016.08.011.
Valipour, M., M. E. Banihabib, and S. M. R. Behbahani. 2013. “Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir.” J. Hydrol. 476 (Jan): 433–441. https://doi.org/10.1016/j.jhydrol.2012.11.017.
Wang, H. R., C. Wang, X. Lin, and J. Kang. 2014. “An improved ARIMA model for precipitation simulations.” Nonlinear Processes Geophys. 21 (6): 1159–1168. https://doi.org/10.5194/npg-21-1159-2014.
Wang, P., and Y. Yao. 2018. “CE3: A three-way clustering method based on mathematical morphology.” Knowledge-Based Syst. 155 (Sep): 54–65. https://doi.org/10.1016/j.knosys.2018.04.029.
Wang, W. C., K. W. Chau, D. M. Xu, and X. Y. Chen. 2015. “Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition.” Water Resour. Manage. 29 (8): 2655–2675. https://doi.org/10.1007/s11269-015-0962-6.
Ward, J. H. 1963. “Hierarchical grouping to optimize an objective function.” J. Am. Stat. Assoc. 58 (301): 236–244. https://doi.org/10.1080/01621459.1963.10500845.
Willmott, C. J. 1981. “On the validation of models.” Phys. Geogr. 2 (2): 184–194. https://doi.org/10.1080/02723646.1981.10642213.
Yolcu, U., E. Egrioglu, and C. H. Aladag. 2013. “A new linear and nonlinear artificial neural network model for time series forecasting.” Decision Supp. Syst. 54 (3): 1340–1347. https://doi.org/10.1016/j.dss.2012.12.006.
Zeynoddin, M., H. Bonakdari, A. Azari, I. Ebtehaj, B. Gharabaghi, and H. R. Madavar. 2018. “Novel hybrid linear stochastic with non-linear extreme learning machine methods for forecasting monthly rainfall a tropical climate.” J. Environ. Manage. 222 (Sep): 190–206. https://doi.org/10.1016/j.jenvman.2018.05.072.
Zhang, F., Z. Zhang, P. Zhang, and S. Wang. 2018. “UD-HMM: An unsupervised method for shilling attack detection based on hidden Markov model and hierarchical clustering.” Knowledge-Based Syst. 148 (May): 146–166. https://doi.org/10.1016/j.knosys.2018.02.032.
Zhang, G. P. 2003. “Time series forecasting using a hybrid ARIMA and neural network model.” Neurocomputing 50 (Jan): 159–175. https://doi.org/10.1016/S0925-2312(01)00702-0.
