Downloaded from ascelibrary.org by MCMASTER UNIVERSITY on 09/29/19. Copyright ASCE. For personal use only; all rights reserved.
Abstract: Forecasting precipitation remains challenging because of its large spatial and temporal variability, and the uncertainty in precipitation
forecast leads to an important source of uncertainty in the prediction of other components of a hydrological system. In this study, in
order to forecast subseasonal precipitation and better characterize the temporal variability of precipitation, a hybrid precipitation forecast
model was developed based on (1) temporal clustering of subseasonal precipitation; and (2) coupling an improved seasonal autoregressive
integrated moving average (ISARIMA) model to an artificial neural network (ANN) model to take advantage of both models and capture
precipitation persistence and statistics in each cluster. The performance of the proposed model was compared against different variations of
conventional statistical models in the Rasht station with a humid climate and Gorgan station with a Mediterranean climate, both located
south of the Caspian Sea in northern Iran. The model evaluation criteria indicated that the hybrid model can remarkably improve forecast
accuracy. The root-mean square error score of the forecasted precipitation by the hybrid model against observations decreased 48% and 24%
in the Rasht and Gorgan stations, respectively, when compared with the seasonal autoregressive integrated moving average (SARIMA)
model and the index of agreement increased 32% and 17%, respectively, when compared with the ANN models. The proposed hybrid model
can be a useful tool for forecasting subseasonal precipitation in humid and arid climates with persistent and nonpersistent precipitation
patterns. DOI: 10.1061/(ASCE)HE.1943-5584.0001862. © 2019 American Society of Civil Engineers.
Author keywords: Subseasonal precipitation; Forecast; Hybrid model; Artificial neural network (ANN); Clustering analysis.
ing the weakness of each model by preprocessing the training data (Wang et al. 2014). Wang et al. (2014) improved the accuracy of monthly precipitation forecast using an improved version of SARIMA model, called ISARIMA. However, the application of linear regression in this model for capturing nonlinear features of precipitation can lead to uncertainties in precipitation forecasts. Both linearity and nonlinearity of time series are required to be considered in precipitation forecasts (Chen and Wang 2007; Tosun et al. 2016). Satisfactory results of the linear models are possible when the linear component of the time series is dominant to the nonlinear component. Also, with a highly nonlinear component of the time series, nonlinear models can substantially improve the forecast skill (Yolcu et al. 2013). It is well established in the literature that ANN can significantly outperform linear regression models in environmental modeling (Zhang 2003; Chen and Wang 2007; Zeynoddin et al. 2018).

One of the advantages of the data-driven models is their ability to capture temporal variability, especially seasonal and subseasonal variations needed for better estimation of temporal variability in hydrological processes and better managing water resources (Dabral and Murry 2017; Jiang et al. 2010). Modeling temporal variability should be carried out with care to avoid losing intermonthly variations (Wang et al. 2014). In the conventional SARIMA model structure intermonthly variations are usually neglected, which can affect the simulation performance. Capturing both interannual and intermonthly variations in the hydrological

prove the precipitation estimations. To overcome the limitation of ISARIMA model, the ANN model, instead of a regression model, was used in this paper to capture the nonlinear relationships between main statistics of monthly precipitation clusters and associated precipitation intensities. Therefore, a hybrid model is developed in this study by integrating ISARIMA and ANN models to estimate monthly precipitation in the future. Therefore, the objectives of this study are to cluster the temporal patterns of precipitation, improve monthly estimations of precipitation in two weather stations south of the Caspian Sea, and evaluate the forecast performance of the proposed hybrid method against SARIMA, ANN, and their improved versions. This paper is organized as follows. First, the study area is introduced and the proposed hybrid model is developed. Next, precipitation forecasts by data-driven models are compared with the forecasts of the hybrid model and performance of the models is evaluated based on assessment measures. Finally, discussions of the findings and conclusions of this paper are described.

Methodology

Study Site Description

The precipitation time series in the Rasht weather station in Gilan Province, and the Gorgan weather station in Golestan Province in
Fig. 1. Location of weather stations south of the Caspian Sea in northern Iran. The Rasht station is located in the Gilan Province and the Gorgan
station is located in the Golestan Province.
[Fig. 2 flowchart, partially recovered: (4) Test LR for each month (inputs: cluster statistics); (4) Test ANN for each month (inputs: cluster statistics)]
Fig. 2. Comparison of the four statistical precipitation forecast models used in this study. The hybrid ISARIMA-ANN model is developed based on
the identified precipitation clusters and coupled ISARIMA and ANN models.
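As a rough illustration of how the hybrid branch in Fig. 2 chains its pieces, the sketch below computes cluster statistics, fits a mapping f from statistics to monthly precipitation, forecasts the statistics with g, and then applies f to the forecasted statistics. Everything named here is hypothetical: `fit_ratio_model` is only a stand-in for the ANN, `persist_last` is only a stand-in for ISARIMA, and computing the statistics year by year with a truncated mean that drops the single smallest and largest value is an assumption, since the text does not specify these details.

```python
def truncated_mean(values):
    """Mean after dropping the smallest and largest value (assumed trimming rule)."""
    ordered = sorted(values)
    core = ordered[1:-1] if len(ordered) > 2 else ordered
    return sum(core) / len(core)


def cluster_statistics(values):
    """Return (min, max, truncated mean) of one cluster in one year."""
    return (min(values), max(values), truncated_mean(values))


def fit_ratio_model(stats, targets):
    """Stand-in for the ANN f: scale the mean statistic by each month's average ratio."""
    n_months = len(targets[0])
    ratios = [
        sum(t[m] / s[2] for s, t in zip(stats, targets)) / len(stats)
        for m in range(n_months)
    ]
    return lambda s: [r * s[2] for r in ratios]


def persist_last(stats, horizon):
    """Stand-in for the ISARIMA g: repeat the last observed statistics."""
    return [stats[-1]] * horizon


def hybrid_forecast(train_years, fit_f, forecast_g, horizon):
    """train_years: one list of monthly precipitation per year, for a single cluster."""
    stats_train = [cluster_statistics(year) for year in train_years]
    f = fit_f(stats_train, train_years)              # Step 1: fit f on training statistics
    stats_future = forecast_g(stats_train, horizon)  # Step 2: forecast the statistics
    return [f(s) for s in stats_future]              # Step 3: map statistics to months


# Hypothetical two-month cluster observed over two training years (values in mm).
train = [[100.0, 200.0], [120.0, 240.0]]
forecast = hybrid_forecast(train, fit_ratio_model, persist_last, horizon=1)
```

The two model arguments are passed in as plain callables so that an actual neural network and an actual seasonal ARIMA fit could replace the stand-ins without changing the data flow.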
earlier and monthly precipitation values are estimated. In summary, the steps of the hybrid model development are as follows:
• Step 1 involves establishing a relationship between cluster statistics and precipitation values using ANN over the training period

$$P^{c}_{t,T} = f\left(S^{c}_{T}\right), \quad t = 1, 2, 3, \ldots, 12; \quad S = 1{:}\min,\ 2{:}\max,\ 3{:}\text{mean}; \quad c = 1, 2, \ldots, m \tag{3}$$

where $P^{c}_{t,T}$ is monthly precipitation at month t that is classified by the Ward method into cluster c; $S^{c}_{T}$ are statistics of the cluster c in the training period (T); m = total number of clusters; and f is the ANN model as in Eq. (2).
• Step 2 involves estimating statistics for each cluster over the assessment period

$$S^{c}_{A} = g\left(P^{c}_{t}\right) \tag{4}$$

where $S^{c}_{A}$ is the estimated statistics of cluster c over the assessment period (A); and g is the ISARIMA model.
• Step 3 consists of forecasting precipitation for each month over the assessment period

$$P^{c}_{t,A} = f\left(S^{c}_{A}\right) \tag{5}$$

where $P^{c}_{t,A}$ = forecasted precipitation for month t within cluster c over the assessment period (A); and f is the ANN model as in Step 1. The basic structure of the hybrid model can be summarized as follows: (1) clustering analysis for grouping monthly time series with similar variations, (2) calculation of main statistics for each cluster, and (3) coupling ANN to ISARIMA for capturing nonlinear and linear signals. Fig. 2 shows the structure of the hybrid model in comparison to the other models.

Measures for Modeling Performance

For evaluating the performance of models, six statistical measures were used to assess the forecast accuracies, namely the root-mean square error (RMSE), relative root-mean square error (RRMSE), mean absolute error (MAE), mean relative error (MRE), mean square error (MSE), and agreement index (AI), which are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right)^2} \tag{6}$$

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{O}} \tag{7}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - O_i\right| \tag{8}$$
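Eqs. (6)–(8) translate directly into a few lines of code. The following is a minimal sketch in plain Python, assuming `obs` and `pred` are equal-length sequences of observed and forecasted precipitation; the function names and the sample values are illustrative only and are not taken from the study.

```python
import math


def rmse(obs, pred):
    """Root-mean square error, Eq. (6)."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def rrmse(obs, pred):
    """Relative RMSE, Eq. (7): RMSE divided by the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    return rmse(obs, pred) / mean_obs


def mae(obs, pred):
    """Mean absolute error, Eq. (8)."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


# Illustrative values in mm (not data from the paper).
obs = [100.0, 50.0, 150.0]
pred = [110.0, 40.0, 150.0]
scores = {"RMSE": rmse(obs, pred), "RRMSE": rrmse(obs, pred), "MAE": mae(obs, pred)}
```

Because RRMSE divides by the mean observation, it becomes large in months with little precipitation even when the absolute error is small, which is worth keeping in mind when reading month-by-month comparisons.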
AI to overcome the insensitivity of the Nash-Sutcliffe efficiency and coefficient of determination to differences between observation and simulation statistics (e.g., mean and variance). The AI score is a dimensionless measure that ranges between 0 and 1. High values of AI indicate a good agreement of the observed and simulated values (Pannakkong et al. 2016; Moustris et al. 2011). In this paper, the performance of the models is compared during a 3-year validation period similar to many other studies. We found that increasing the validation or forecast period to one-third of the entire observation period does increase the uncertainty in the precipitation forecasts. The length of validation period in the literature varies from 30% to less than 10% of the entire simulation period for the ANN model (e.g., Huesca et al. 2014; Nascimento Camelo et al. 2018). The selection of the length of training and validation periods depends on the length of observations recorded and the purpose of the modeling. For the forecast purpose in this study, we found that the ANN forecasts are uncertain beyond 3 years. Note that this model and other models used in this study are trained only with the precipitation data or statistics of the precipitation data as inputs and not other atmospheric variables. Furthermore, if the entire simulation period is split into 70% for training and 30% for forecasting or if the k-fold validation is used, the ANN model outputs will not be comparable to the results of the SARIMA and ISARIMA models, which are based solely on autoregression and moving average parameters and their performance degrades as the forecast lead time increases.

Results

The monthly time series of precipitation in the Rasht and Gorgan stations for the period of 1976–2013 are used to train the following models: conventional SARIMA, ANN, ISARIMA based on clustering analysis and regression analysis, improved ANN based on clustering and regression analysis (IANN), and the proposed hybrid of ISARIMA and ANN based on clustering analysis (ISARIMA-ANN). The performance of these models in forecasting precipitation is assessed over the period of 2014–2016.

SARIMA Model

Testing randomness and seasonal autocorrelation of time series is the first step in developing a statistical model. The precipitation time series over the training period and autocorrelation functions (ACFs) are plotted in Fig. 3. The seasonal cycle in time series is repeated every 12 months as shown with ACF values [Figs. 3(b–f)]. The crossing of ACF spikes outside of the confidence limit [Figs. 3(b–f)] indicates that the precipitation time series in the two study stations are not random. The normality test shows that the time series are not normally distributed. Therefore, time series must be transformed to a normal distribution and normally distributed time series are used to develop the models.

than the significance level, α = 0.05, confirming the nonstationarity of the time series. However, after transformation the ADF value dropped to less than 0.001, indicating that transformed data are stationary. For selecting the SARIMA model order, a range of 0 to 3 (four cases) is applied to each parameter of the model including seasonal moving average (SMA), seasonal autoregressive (SAR), autoregressive (AR), and moving average (MA) (i.e., four parameters). From 4 × 4 × 4 × 4 = 256 possible cases, only those that passed the diagnostic check (normality and white noise of the model residuals) are selected and those that are not feasible are discarded. In order to identify the best model, the Schwarz-Bayesian criterion (SBC) is used. The lower the SBC score, the better the model performance. The best models are SARIMA(3, 0, 2) × (1, 1, 2)12, where AR(1) = 0.32, AR(2) = −0.16, AR(3) = −0.08, SAR(12) = −0.59, MA(1) = 0.3, MA(2) = −0.1, SMA(12) = 0.41, and SMA(24) = 0.5 with a SBC score of −2.4 for the Rasht station; and SARIMA(0, 1, 1) × (1, 1, 2)12 where SAR(12) = −0.66, MA(1) = 0.95, SMA(12) = 0.28, and SMA(24) = 0.59 with a SBC score of 3.4 for the Gorgan station. The Kolmogorov–Smirnov normality test of model residuals verifies the normality of the residuals. The significance level of the test is equal to 0.2 for both time series, which is larger than α = 0.05.

ANN Model

The inputs of the ANN model are determined by calculating the correlation coefficient between precipitation values at the current time step, t, and four previous time steps, t − 1, t − 2, t − 3, and t − 4. The maximum correlation coefficient was found between precipitation at the current time step and that at time step t − 1, and it was 0.33 for the Rasht station and 0.17 for the Gorgan station. In this study, a back-propagation neural network algorithm is used. Modeling steps such as selection of the optimal number of neurons in the hidden layer and types of the activation functions are necessary to appropriately train the ANN model. The activation functions that are used in this research include logistic sigmoid, tangent sigmoid, and pure linear. The performance criteria to obtain the optimal number of the neurons in the hidden layer are shown in Fig. 4. The optimal number of neurons in the hidden layer is found to be 7 and the activation functions of hidden and output layers are found to perform better with tangent sigmoid and logistic sigmoid functions.

ISARIMA and IANN

The improved versions of the ANN and SARIMA models are obtained when these models are applied to different clusters with similar precipitation patterns instead of a single model for the entire time series. In fact, the ANN and SARIMA models are not directly applied to time series, but they are applied to the statistics of each cluster. The Ward clustering method discussed in the “Methodology”
[Fig. 3 panels: (a, d) monthly precipitation [mm] versus month (1976–2013); (b, e) ACF versus lag (month); (c, f) partial ACF versus lag (month)]
Fig. 3. Monthly precipitation over the period of 1976–2013 and autocorrelation function (ACF) and partial autocorrelation function (PACF)
values for different lag times for (a–c) Rasht station; and (d–f) Gorgan station. The horizontal lines in ACF and PACF plots represent the 95%
confidence intervals. Not all of the autocorrelation values are within the confidence intervals. This shows that these time series are not random
and have a seasonal cycle.
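The randomness check described in the caption of Fig. 3 can be sketched as follows: compute the sample autocorrelation at each lag and flag the lags whose value falls outside an approximate 95% white-noise band. This is an illustrative sketch only; the band ±1.96/√N is a common large-sample approximation and is an assumption here, since the text does not state how the confidence limits were computed, and the cosine series is synthetic rather than station data.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n, z=1.96):
    """Approximate 95% band for the ACF of a random series of length n (assumed formula)."""
    return z / math.sqrt(n)


# Synthetic series with a 12-month cycle over 10 years (not observed data).
series = [math.cos(2.0 * math.pi * m / 12.0) for m in range(120)]
r = acf(series, 24)
bound = white_noise_bound(len(series))
outside = [k for k in range(1, 25) if abs(r[k]) > bound]
```

For a series with a 12-month cycle, the autocorrelation peaks again at lags 12 and 24 and is strongly negative near lag 6, so those lags appear in `outside`; a purely random series would leave almost all lags inside the band.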
section is applied to classify the monthly precipitation as shown in Fig. 5. A dendrogram is a tree diagram that presents the distance and sequence at which the observations are clustered (Belgaman et al. 2017). The number of clusters can be chosen from the dendrogram by the number of vertical lines in the dendrogram cut by a horizontal line that can traverse the maximum distance vertically without intersecting a cluster. The cutting point is based on an accrued distance. Based on the clustering analysis and the dendrogram in Fig. 5, four clusters of monthly precipitation for the Rasht station (Cluster 1 = January, October; Cluster 2 = June, November, May; Cluster 3 = July, December, September; and Cluster 4 = February, March, April, August) and three clusters for the Gorgan station (Cluster 1 = February, June, August, October; Cluster 2 = January, September, December; and Cluster 3 = April, March, May, July, November) are obtained. After clustering monthly precipitation, the statistics of each cluster, including maximum, minimum and truncated mean, are calculated. Based on clustering analysis, the number of maximum, minimum, and truncated mean time series is 4, 4, and 4 in the Rasht station and it is 3, 3, and 3 in the Gorgan station. In the next step, 12 LR models are developed for each station with the main statistics of each cluster as independent variables (model inputs) and monthly precipitation of each cluster as dependent variables (model outputs). More specifically, 4, 3, and 5 LR models for three clusters in the Gorgan station, and 2, 3, 3, and 4 LR models for four clusters in the Rasht station are obtained based on the cluster statistics. Then, ARIMA and ANN are used to estimate the main statistics of each cluster in ISARIMA and IANN models, respectively. For example, ARIMA(1, 0, 3) and ARIMA(3, 0, 3) for time series of minima in the first cluster and maxima in the third cluster are obtained for the Gorgan station. The established LR models are finally applied to the estimated statistics in each cluster to back-estimate forecasted precipitation values in the validation period.

[Fig. 4 panels: (a) and (b) error criteria versus number of neurons (2 to 9)]
Fig. 4. Identifying the optimal number of neurons in the hidden layer of the ANN model based on assessment measures of (a) RRMSE, MRE; and (b) RMSE in the Rasht station.

[Fig. 5 leaf order: (a) Jan, Oct, May, Jun, Nov, Feb, Mar, Apr, Aug, Jul, Dec, Sep; (b) Jun, Oct, Feb, Aug, Jan, Dec, Sep, Mar, Jul, May, Apr, Nov]
Fig. 5. Dendrogram of monthly precipitation in (a) Rasht; and (b) Gorgan stations. The y-axis shows the Euclidean distance.

Hybrid of ISARIMA-ANN Based on Clustering Analysis

A simple ANN is used to simulate monthly precipitation and statistics of each cluster. In fact, ANN inputs are the statistics of each cluster. To find the optimal activation functions of the hidden and output layers, combinations of multiple functions are tested. The error criteria are shown in Fig. 6 for different activation functions in February from Cluster 1 and January from Cluster 2. Minimum RMSE was found when activation functions of tangent sigmoid-tangent sigmoid for the January cluster and logistic sigmoid-tangent sigmoid for the February cluster are chosen. The RMSE score varies substantially with the selection of activation function and ranges from 19 to 53 mm (Fig. 6). This shows the importance of selecting the activation function optimally to reduce the modeling error. The autocorrelation of residuals, which can display the errors in the timing of peaks or systematic overestimation and underestimation, is plotted in Fig. 7 for the hybrid model. The ACF spikes are within the confidence limit, therefore the residuals are random in nature. This indicates that the hybrid model passes the diagnostic check.

The hybrid model had minimum values of RMSE, RRMSE, MSE, MRE, and MAE (Table 1). The model with the smallest MSE has the smallest magnitude of error and the smallest MAE has the smallest average of error magnitudes. The goodness-of-fit for high precipitation values is better reflected in the RMSE score (Pannakkong et al. 2016; Hamidi et al. 2015). Fig. 8 shows the observed and forecasted precipitation time series for all of the models used in this study for the validation period (2014–2016). The hybrid model was able to obtain better forecasts than the other models and the R² of the fitted line in the hybrid model has the maximum value; it increased 59% in the Rasht station and 40% in the Gorgan station relative to that for the ISARIMA model. The comparison of the SARIMA and hybrid models in forecasting precipitation in each month is shown in Fig. 9. The RRMSE values
[Fig. 6 axes: RMSE versus type of activation function (hidden-output pairs): logs-logs, logs-purl, logs-tangs, purl-logs, purl-purl, purl-tangs, tangs-logs, tangs-purl, tangs-tangs]
Fig. 6. Sensitivity of ANN performance in forecasting monthly precipitation to the type of activation function at the Gorgan station.
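The nine hidden-output combinations compared in Fig. 6 can be enumerated with a short sketch. The mapping of the axis abbreviations to functions (logs = logistic sigmoid, tangs = tangent sigmoid taken as tanh, purl = pure linear) is an assumption based on the three activation functions named in the text, and `forward` is a hypothetical single-hidden-layer forward pass, not the authors' trained network.

```python
import math
from itertools import product

# Assumed meaning of the abbreviations on the Fig. 6 axis.
ACTIVATIONS = {
    "logs": lambda x: 1.0 / (1.0 + math.exp(-x)),  # logistic sigmoid
    "tangs": math.tanh,                             # tangent sigmoid
    "purl": lambda x: x,                            # pure linear
}

# All hidden-output pairs, named "hidden-output".
pairs = [f"{h}-{o}" for h, o in product(sorted(ACTIVATIONS), repeat=2)]


def forward(x, w_hidden, b_hidden, w_out, b_out, hidden, output):
    """Forward pass of a network with one hidden layer and a single output neuron."""
    h = [
        hidden(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return output(sum(w * hi for w, hi in zip(w_out, h)) + b_out)


# Hypothetical weights: one input, one hidden neuron.
y = forward([1.0], [[0.0]], [0.0], [2.0], 0.0,
            ACTIVATIONS["logs"], ACTIVATIONS["purl"])
```

A sensitivity study of the kind shown in Fig. 6 would train one network per entry of `pairs` and record the validation RMSE for each.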
are large from May to August in the Rasht station, especially for the SARIMA model. The hybrid model decreases RRMSE in the Rasht station to 1.8 in May and below 0.7 in other months [Fig. 9(a)]. The minimum RRMSE of the forecasts by the hybrid model is calculated in February for the Rasht station [Fig. 9(a)] with a long-term average precipitation of 117 mm [Fig. 9(b)], and in October for the Gorgan station [Fig. 9(a)] with a long-term average precipitation of 55 mm [Fig. 9(b)]. To assess the model performance in forecasting seasonal precipitation, which is crucial for water resources management and agricultural practices in the study region, the monthly time series in the validation period were aggregated to obtain the seasonal time series (Fig. 10). The seasonal precipitation forecasted by the hybrid model has the smallest error and largest AI score and the seasonal precipitation forecasted by the SARIMA model has the largest error and smallest AI score (Fig. 10). The maximum values of observed and forecasted precipitation by the hybrid model were 597 and 584 mm in the Rasht station and 382 and 386 mm in the Gorgan station, which occurred in autumn.

Discussion

Coupling clustering analysis and linear regression to SARIMA and ANN models improved the simulation accuracy, and the RMSE score decreased 21% by ISARIMA relative to SARIMA and 11% by IANN relative to ANN for the Rasht station and 5% and 15% by ISARIMA and IANN, respectively, for the Gorgan station. The improvements of the ISARIMA and IANN models indicate that the clustering analysis on subseasonal precipitation has an important role in the accuracy of the forecasts in the study areas. The clustering analysis extracts more information from data, and therefore increases the forecast accuracy. The number of clusters is chosen by the dendrogram of the observations (Fig. 5). If the location of a cutting point, a horizontal line in the dendrogram, moves up (e.g., Euclidean distance of 7–7.5) (Fig. 5), the number of clusters becomes low, leading to challenges in classification; however, when the location of a cutting point moves down (e.g., Euclidean distance of 5) (Fig. 5), the number of clusters becomes large, leading to complexity in finding months with similar precipitation patterns. The performance of improved models is consistent with the study of Wang et al. (2014) for the precipitation forecast. Wang et al. (2014) used only the ARIMA model but in this study, in addition to the ARIMA model, the improvement of ANN performance with the clustering analysis was investigated and we found that the clustering analysis can improve the performance of the ANN and ARIMA forecast models by dividing data into groups with similar characteristics. The smallest error criteria and largest AI values were obtained for the hybrid of the ISARIMA and ANN models based on the clustering analysis. For example, the hybrid model improved the RRMSE and MRE scores from 0.67 to 0.50 and from 1.47 to 1 relative to the SARIMA model scores in the Rasht station (Table 1). The outperformance of ANN, when compared with the linear regression model, is well established in the literature (e.g., Soares dos Santos et al. 2016; Tosun et al. 2016; Rasouli et al. 2012; Hung et al. 2009). Soares dos Santos et al. (2016) suggested that ANN can represent nonlinear processes that might not be captured by the linear regression model. Also, the ANN model can be more descriptive than the linear regression model. The empirical relationships between predictors and predictands in the linear regression model are assumed to be known while in reality atmospheric processes are complex and we cannot detect the exact relationships between the variables. Therefore, linear regressions have insufficient capacity to forecast phenomena with a nonlinear nature, e.g., precipitation (Adamowski and Karapataki 2010). The results in this paper showed that the hybrid model outperformed the other models and that coupling the ARIMA model to the hybrid model successfully improved the forecast skills even better than those reported in previous studies (e.g., Zeynoddin et al. 2018; Pannakkong et al. 2016; Wang et al. 2014, 2015). One advantage of ANN is its capacity to incorporate temporal variability, concurrent values, and various predictive values as input without any additional effort (Soares dos Santos et al. 2016). The minimum values of RRMSE in the Rasht and Gorgan stations belong to February (0.17) and March (0.13), which are similar to the results reported by Wang et al. (2014). The scatter diagram of observed and simulated precipitation in Fig. 8 shows that monthly precipitation values are overestimated slightly by the hybrid model and substantially by the other models. The prediction interval for the hybrid model is provided by the shaded area around the precipitation forecast (Fig. 8). Wang et al. (2014) showed an overestimation of precipitation, especially in October, and similarly, in this study, precipitation is slightly overestimated in the Rasht and Gorgan stations in the validation period. The maximum and minimum values of the observed Rasht precipitation time series in the validation period occurred in October and June in 2014, which are consistent with maximum and minimum values
[Fig. 7 panels: (a, c) ACF and (b, d) partial ACF of the residuals versus lag (month)]
Fig. 7. Autocorrelation and partial autocorrelation functions of the residuals of the hybrid model in (a and b) Rasht; and (c and d) Gorgan stations.
Both ACF and PACF values are within the 95% confidence intervals, which show the randomness of the residuals.
of precipitation forecasted by the hybrid model. The maximum monthly precipitation was observed in October 2014 (370 mm) for the Rasht station and in March 2014 (99 mm) for the Gorgan station. The maximum observed and forecasted precipitation values are in good agreement. The maximum RRMSE is calculated in May for the Rasht station [Fig. 9(a)] with a long-term average precipitation of 48 mm [Fig. 9(b)], and in May for the Gorgan station [Fig. 9(a)] with a long-term average precipitation of 39 mm [Fig. 9(b)]. The RMSE scores in the Rasht station are relatively smaller in the wet season (October to January) than spring and summer months [Fig. 9(a)]. This shows that precipitation forecast in wet months is more skillful than the relatively dry months, especially in the Rasht station with a humid climate. This is further evidence that the proposed hybrid model performs better than the SARIMA model. The forecast accuracy measures also show that the ANN model, in comparison to the SARIMA
[Fig. 8 panels: (a, b) precipitation [mm] time series for 2014–2016 (Obs, SARIMA, ISARIMA, ANN, IANN, Hybrid); (c, d) forecasted precipitation [mm] versus observed precipitation [mm]]
Fig. 8. Forecasted monthly precipitation in (a and c) Rasht; and (b and d) Gorgan stations for the period of January 2014 through December 2016. The
shaded areas in (a and b) show the prediction interval for the proposed hybrid model. For computing the prediction interval, a Gaussian distribution
with conditional mean equal to the hybrid model predictions and constant variance equal to the variance of the prediction residuals is used.
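The prediction interval described in the caption of Fig. 8 amounts to a constant-width band around each forecast. A minimal sketch follows, assuming a 95% level (z = 1.96) and the sample variance of the residuals; neither the level nor the variance estimator is stated in the text, and the numbers are illustrative rather than results from the study.

```python
import math
from statistics import variance


def prediction_interval(pred, residuals, z=1.96):
    """Gaussian band: mean = prediction, constant variance = variance of residuals."""
    sd = math.sqrt(variance(residuals))
    return [(p - z * sd, p + z * sd) for p in pred]


# Illustrative forecasts and residuals in mm (not data from the paper).
pred = [10.0, 20.0]
residuals = [-2.0, 0.0, 2.0]
band = prediction_interval(pred, residuals)
```

Because the variance is held constant, the band has the same width in wet and dry months, and its lower limit can fall below zero for small forecasts.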
model, has smaller errors (Table 1). Better performance of the ANN model relative to the ARIMA model is reported in the literature (e.g., Sulaiman and Wahab 2018; Ouyang et al. 2016; Kumar Nanda et al. 2013; Somvanshi et al. 2006). Nonlinear characteristics of the ANN models allow them to capture nonlinear atmospheric processes better than the linear ARIMA model (Sulaiman and Wahab 2018).

Conclusion

Despite improvements in theory and computational capacities, precipitation forecast still remains one of the most challenging tasks in water resources management and planning. One of the widely used statistical models to forecast precipitation is SARIMA, which combines the values and noise so that the forecast values are very
11% by IANN relative to ANN in the Rasht station. The clustering analysis improves the performance of the forecast models by searching analogous precipitation patterns over the long observation period (1976–2013) and grouping them into clusters with similar characteristics. When ISARIMA is coupled to ANN in the hybrid model, the forecast skill is further improved and the RMSE score of the forecasted precipitation against the observations decreased by 48% for the Rasht station and 24% for the Gorgan station relative to the SARIMA forecasts. This is because of the added value of the ANN model by capturing temporal variability. Application of the ANN model, instead of a regression model, captures the nonlinear relationships between main statistics of monthly precipitation clusters and associated precipitation intensities. More skillful monthly precipitation forecasts were obtained from the hybrid model in comparison to the other statistical models, which are reflected in the forecast skill of the seasonal precipitation. This suggests that the hybrid model can be reliably used for forecasting precipitation and other complex phenomena in similar climates and regions and across temporal scales and for better management of water resources and agricultural activities.

[Fig. 9 legend: SARIMA - Gorgan, Hybrid - Gorgan, SARIMA - Rasht, Hybrid - Rasht, Mean - Gorgan, Mean - Rasht; panel (b) axes: RMSE and mean (1976–2016) versus month (Jan to Dec)]
Fig. 9. Performance of the SARIMA and hybrid models in monthly precipitation forecasting in the validation period with two statistical measures: (a) RRMSE; and (b) RMSE.
[Fig. 10 legend: (a) RRMSE, MRE, and AI for Gorgan and Rasht; (b) RMSE and MAE for Gorgan and Rasht; x-axis in both panels: SARIMA, ISARIMA, ANN, IANN, ISARIMA-ANN; y-axis: error criteria]
Fig. 10. Forecast accuracy of seasonal precipitation based on (a) RRMSE, MRE, and AI; and (b) RMSE and MAE for the Gorgan and Rasht stations.

Acknowledgments

The research was funded by Azarbaijan Shahid Madani University. Kabir Rasouli was supported by the Natural Sciences and Engineering Research Council, NSERC’s Postdoctoral fellowship. The manuscript benefitted from the comments and suggestions of the editor and the anonymous reviewers on a previous version.

References

Adamowski, J., and C. Karapataki. 2010. “Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: Evaluation of different ANN learning algorithms.” J. Hydrol. Eng. 15 (10): 729–743. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000245.
Belgaman, H. A., K. Ichiyanagi, R. Suwarman, M. Edvin Aldrian, A. I. D. Utami, and D. A. Kusumaningtyas. 2017. “Characteristics of seasonal precipitation isotope variability in Indonesia.” Hydrol. Res. Lett. 11 (2): 92–98. https://doi.org/10.3178/hrl.11.92.
Bezdek, J. C. 1981. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Blashfield, R. K. 1976. “Mixture model tests of cluster analysis: Accuracy of four agglomerative hierarchical methods.” Psychol. Bull. 83 (3): 377–388. https://doi.org/10.1037/0033-2909.83.3.377.
Box, G. E. P., and G. M. Jenkins. 1976. Time series analysis: Forecasting and control. Englewood Cliffs, NJ: Prentice-Hall.
Caesar, L. K., O. M. Kvalheim, and N. B. Cech. 2018. “Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.” Anal. Chim. Acta 1021 (Aug): 69–77. https://doi.org/10.1016/j.aca.2018.03.013.
Bayesian copula.” Water Resour. Res. 52 (3): 2327–2349. https://doi.org/10.1002/2015WR018525.
Sarkodie, S. A., and V. Strezov. 2018. “Assessment of contribution of Australia’s energy production to CO2 emissions and environmental degradation using statistical dynamic approach.” Sci. Total Environ. 639 (Oct): 888–899. https://doi.org/10.1016/j.scitotenv.2018.05.204.
Sfetsos, A., and C. Siriopoulos. 2004. “Combinatorial time series forecasting based on clustering algorithms and neural networks.” Neural Comput. Appl. 13 (1): 56–64. https://doi.org/10.1007/s00521-003-0391-y.
Slater, L. J., G. Villarini, A. A. Bradley, and G. A. Vecchi. 2017. “A dynamical statistical framework for seasonal streamflow forecasting in an agricultural watershed.” Clim. Dyn. 1–17. https://doi.org/10.1007/s00382-017-3794-7.
Soares dos Santos, T., D. Mendes, and R. Rodrigues Torres. 2016. “Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America.” Nonlinear Processes Geophys. 23 (1): 13–20. https://doi.org/10.5194/npg-23-13-2016.
Somvanshi, V. K., O. P. Pandey, P. K. Agrawal, N. V. Kalankerl, M. R. Prakash, and C. Ramesh. 2006. “Modelling and prediction of rainfall using artificial neural network and ARIMA techniques.” J. Ind. Geophys. Union. 10 (2): 141–151.
Sulaiman, J., and S. H. Wahab. 2018. Heavy rainfall forecasting model using artificial neural network for flood prone area. New York: Springer.
Sun, H., and M. Koch. 2001. “Case study: Analysis and forecasting of salinity in Apalachicola bay, Florida, using Box-Jenkins ARIMA models.” J. Hydraul. Eng. 127 (9): 718–727. https://doi.org/10.1061/(ASCE)0733-9429(2001)127:9(718).
Tamilselvi, R., B. Sivasakthi, and R. Kavitha. 2015. “A comparison of various clustering methods and algorithms in data mining.” Int. J. Multidiscip. Res. Dev. 2 (5): 32–98.
Wang, H. R., C. Wang, X. Lin, and J. Kang. 2014. “An improved ARIMA model for precipitation simulations.” Nonlinear Processes Geophys. 21 (6): 1159–1168. https://doi.org/10.5194/npg-21-1159-2014.
Wang, P., and Y. Yao. 2018. “CE3: A three-way clustering method based on mathematical morphology.” Knowledge-Based Syst. 155 (Sep): 54–65. https://doi.org/10.1016/j.knosys.2018.04.029.
Wang, W. C., K. W. Chau, D. M. Xu, and X. Y. Chen. 2015. “Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition.” Water Resour. Manage. 29 (8): 2655–2675. https://doi.org/10.1007/s11269-015-0962-6.
Ward, J. H. 1963. “Hierarchical grouping to optimize an objective function.” J. Am. Stat. Assoc. 58 (301): 236–244. https://doi.org/10.1080/01621459.1963.10500845.
Willmott, C. J. 1981. “On the validation of models.” Phys. Geogr. 2 (2): 184–194. https://doi.org/10.1080/02723646.1981.10642213.
Yolcu, U., E. Egrioglu, and C. H. Aladag. 2013. “A new linear and nonlinear artificial neural network model for time series forecasting.” Decision Supp. Syst. 54 (3): 1340–1347. https://doi.org/10.1016/j.dss.2012.12.006.
Zeynoddin, M., H. Bonakdari, A. Azari, I. Ebtehaj, B. Gharabaghi, and H. R. Madavar. 2018. “Novel hybrid linear stochastic with non-linear extreme learning machine methods for forecasting monthly rainfall a tropical climate.” J. Environ. Manage. 222 (Sep): 190–206. https://doi.org/10.1016/j.jenvman.2018.05.072.
Zhang, F., Z. Zhang, P. Zhang, and S. Wang. 2018. “UD-HMM: An unsupervised method for shilling attack detection based on hidden Markov model and hierarchical clustering.” Knowledge-Based Syst. 148 (May): 146–166. https://doi.org/10.1016/j.knosys.2018.02.032.
Zhang, G. P. 2003. “Time series forecasting using a hybrid ARIMA and neural network model.” Neurocomputing 50 (Jan): 159–175. https://doi.org/10.1016/S0925-2312(01)00702-0.