
European Journal of Scientific Research ISSN 1450-216X Vol.38 No.3 (2009), pp.386-395 EuroJournals Publishing, Inc. 2009 http://www.eurojournals.com/ejsr.htm

Development of Rainfall Forecasting Model in Indonesia by using ASTAR, Transfer Function, and ARIMA Methods
Bambang Widjanarko Otok
Department of Statistics, Institut Teknologi Sepuluh Nopember, 60111 Surabaya, Indonesia
E-mail: bw_otok@statistika.its.ac.id

Suhartono
Department of Statistics, Institut Teknologi Sepuluh Nopember, 60111 Surabaya, Indonesia
E-mail: bw_otok@statistika.its.ac.id

Abstract
The aim of this research is to find the best method for forecasting rainfall index data in Indonesia by comparing the forecast accuracy of ARIMA, ASTAR, Single-input Transfer Function, and Multi-input Transfer Function models. Rainfall data from three locations in East Java, i.e. Ngale, Karangjati, and Mantingan, are used as the case study. Seasonal ARIMA is used as the appropriate ARIMA type for rainfall index data, and three kinds of ASTAR models are used. The single-input transfer function model uses the Dipole Mode Index (DMI) and Sea Surface Temperature (SST) as inputs one at a time, and the multi-input transfer function model uses these inputs simultaneously. The results show that the multi-input transfer function model yields better forecasts for in-sample data in Ngale and Karangjati. The comparison of forecast accuracy for out-sample data shows that the single-input transfer function model yields better forecasts at these two locations. For rainfall data in Mantingan, the best model is the ASTAR model for both in-sample and out-sample data.

Keywords: ASTAR, ARIMA, Transfer function, rainfall data

1. Introduction
Climate is one of the factors contributing to harvest failure in Indonesia, especially extreme climate events (El-Nino Southern Oscillation, or ENSO). In El-Nino years, Indonesia usually experiences long droughts because rainfall falls below its normal value. Conversely, in La-Nina years rainfall is above normal, so that floods occur in some regions. Observations from the El-Nino years 1994 and 1997 show that the cumulative field area affected by drought from May to August was more than 400,000 ha, while in normal and La-Nina years it is less than 75,000 ha. In the La-Nina year 1995, the cumulative flooded area from October to December reached 250,000 ha, while in normal and El-Nino years it is usually less than 100,000 ha [2]. Boer [3] states that rice losses due to drought and flood, especially in extreme climate years, could reach 2 million tons.

One approach to this problem is the tactical approach: an anticipation effort through the development of reliable seasonal forecasting techniques and methods, and also through the application of various models and data [18]. Badan Meteorologi, Klimatologi, dan Geofisika (BMKG) produces ten-day rainfall predictions for farming purposes. These predictions are used to determine the beginning and the end of the dry and rainy seasons, which in turn determine when rice planting should start.

Season and climate forecasting technology has recently been developed in Indonesia. Models for weather forecasting usually use a deterministic approach, while models for season and climate forecasting, such as ENSO or monthly rainfall, often use a stochastic approach or statistical models [9]. Stochastic models developed in Indonesia include time series models (ARIMA, Winters-additive), Fourier regression, fractal analysis, trend surface analysis, neural networks, and the Kalman filter [1,4,7,8,24]. The use of deterministic (dynamic) models is still at the model significance level. BMKG currently uses 3 methods to forecast climate, i.e. ARIMA, wavelet transformation, and the Adaptive Neuro-Fuzzy Inference System (ANFIS) [10,11,17].

The statistical models recently developed for climate forecasting have not given satisfactory results. Several factors are presumed to cause the low forecast accuracy: (1) the available data are inadequate (the period is short); (2) a developed method works well only at certain places (the same method cannot be used generally in all places); (3) the period over which a model is reliable is usually short; and (4) most methods for climate forecasting exclude other climate indicator variables (they are univariate forecasting methods). Therefore, this paper develops new methods to address these problems. The developed methods are Adaptive Splines Threshold Autoregressive (ASTAR) and Single Input Transfer Function. Both methods are compared with the ARIMA model to assess the advantages of the developed models.

2. Forecasting Methods
There are many quantitative forecasting methods based on the time series approach. In this section, the forecasting methods used in this research, i.e. the ARIMA, transfer function, and ASTAR models, are explained concisely.

2.1. ARIMA Model
The ARIMA model is one of the most popular and widely used time series models. Following Wei [23], the autoregressive (AR) model expresses the present value ($Z_t$) in terms of past values ($Z_{t-k}$) plus a random error. The moving average (MA) model expresses the present value ($Z_t$) in terms of past residuals ($a_{t-k}$ with $k = 1, 2, \ldots$). The ARIMA(p,d,q) model is a mixture of AR(p) and MA(q) for a non-stationary data pattern with differencing order d. The form of ARIMA(p,d,q) is

$$\phi_p(B)(1 - B)^d Z_t = \theta_q(B) a_t \tag{1}$$

where B is the backshift operator ($B^k Z_t = Z_{t-k}$), p is the AR model order, q is the MA model order, d is the differencing order, and

$$\phi_p(B) = (1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p),$$

$$\theta_q(B) = (1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q).$$

The generalization of the ARIMA model to data with a seasonal pattern, written as ARIMA(p,d,q)(P,D,Q)$^s$, is [16]

$$\phi_p(B) \Phi_P(B^s) (1 - B)^d (1 - B^s)^D Z_t = \theta_q(B) \Theta_Q(B^s) a_t \tag{2}$$

where s is the seasonal period,

$$\Phi_P(B^s) = (1 - \Phi_1 B^s - \Phi_2 B^{2s} - \ldots - \Phi_P B^{Ps}), \text{ and}$$

$$\Theta_Q(B^s) = (1 - \Theta_1 B^s - \Theta_2 B^{2s} - \ldots - \Theta_Q B^{Qs}).$$
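As an illustration of the differencing operators $(1 - B)^d$ and $(1 - B^s)^D$ in Equations (1) and (2), the following Python sketch applies them to a list of observations. The function names `difference` and `seasonal_arima_difference` and the toy series are hypothetical choices made for this example and are not part of the original study.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: w_t = z_t - z_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def seasonal_arima_difference(series, d=0, D=0, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to a list of observations."""
    out = list(series)
    for _ in range(d):          # regular differencing, d times
        out = difference(out, lag=1)
    for _ in range(D):          # seasonal differencing, D times
        out = difference(out, lag=s)
    return out


# Hypothetical toy series: a linear trend plus a pattern repeating every 4 steps.
pattern = [5, 0, -3, 2]
z = [2 * t + pattern[t % 4] for t in range(16)]

# One seasonal difference with s = 4 removes the repeating pattern and
# leaves the constant trend increment over one season.
print(seasonal_arima_difference(z, d=0, D=1, s=4))
```

Each application of a difference shortens the series by the corresponding lag, which is why monthly data with D = 1 and s = 12 lose the first 12 observations.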

Several methods can be used to estimate the parameters of the ARIMA model, for example Maximum Likelihood (ML) estimation and Conditional Least Squares (CLS). A complete explanation of ML, CLS, and other parameter estimation methods can be found in Box et al. [5] or Cryer and Chan [6]. The residuals of an ARIMA model must satisfy certain assumptions, and testing these assumptions is often called diagnostic checking. The assumptions are that the residuals are white noise and normally distributed [23]. If the ARIMA model satisfies both assumptions, it can be classified as a good model.

2.2. Transfer Function Model
The transfer function model differs from the ARIMA model. ARIMA is a univariate time series model, whereas the transfer function model is a multivariate time series model. That is, the ARIMA model relates a series only to its own past, while the transfer function model also relates the series to other time series. Transfer function models can be used to model single-output and multiple-output systems [8]. In the single-output case, only one equation is required to describe the system, and the model is referred to as a single-equation transfer function model. A multiple-output transfer function model is referred to as a multi-equation transfer function model or a simultaneous transfer function (STF) model (see [6, 9, 10, 15]). A more complete description of modeling and forecasting using multi-equation models can be found in Liu [7]. A single-equation transfer function model may contain more than one input variable, as in multiple regression models. In this paper, both single-input and multi-input transfer function models are applied, so both are discussed here. Assuming that the input and output series are both stationary, the general form of a single-input transfer function model is as follows [14].

$$Z_t = C + \frac{\omega_s(B)}{\delta_r(B)} B^b X_t + N_t \tag{3}$$

where

$$N_t = \frac{\theta(B)}{\phi(B)} a_t,$$

$$\omega_s(B) = \omega_0 + \omega_1 B + \omega_2 B^2 + \ldots + \omega_s B^s,$$

$$\delta_r(B) = 1 + \delta_1 B + \delta_2 B^2 + \ldots + \delta_r B^r,$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q, \text{ and } \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p.$$
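To make the roles of the orders b, s, and r concrete, the following Python sketch computes the input part of Equation (3), i.e. the response of the rational operator applied to $B^b X_t$, by direct recursion. It assumes the sign convention $\omega_s(B) = \omega_0 + \omega_1 B + \ldots$ and $\delta_r(B) = 1 + \delta_1 B + \ldots$ and takes all values before the start of the series as zero; the function name `transfer_response` and the numeric values are hypothetical.

```python
def transfer_response(x, omega, delta=(), b=0):
    """Return y_t = [omega(B) / delta(B)] B^b x_t for t = 0..len(x)-1.

    omega = (w0, w1, ..., ws) with omega(B) = w0 + w1 B + ... + ws B^s
    delta = (d1, ..., dr)     with delta(B) = 1 + d1 B + ... + dr B^r
    Values before the start of the series are taken as zero (an assumption).
    """
    y = []
    for t in range(len(x)):
        value = 0.0
        # Numerator: weighted sum of the input, delayed by b periods.
        for i, w in enumerate(omega):
            idx = t - b - i
            if idx >= 0:
                value += w * x[idx]
        # Denominator: delta(B) y_t = (numerator), solved for y_t.
        for k, d in enumerate(delta, start=1):
            if t - k >= 0:
                value -= d * y[t - k]
        y.append(value)
    return y


x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse in the input series

# Orders (b, s, r) = (2, 0, 0): the impulse reappears two periods later, scaled.
print(transfer_response(x, omega=(2.0,), delta=(), b=2))

# Orders (b, s, r) = (1, 0, 1) with delta_1 = -0.5: a geometrically decaying response.
print(transfer_response(x, omega=(1.0,), delta=(-0.5,), b=1))
```

With r = 0 the input affects the output for only s + 1 periods after the delay b, whereas r > 0 produces a response that decays gradually.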

The multi-input transfer function model is a straightforward extension of the single-input transfer function model. Assuming that there are m input variables in the system, the multi-input transfer function model can be written as

$$Z_t = C + \frac{\omega_{s_1}(B)}{\delta_{r_1}(B)} B^{b_1} X_{1t} + \frac{\omega_{s_2}(B)}{\delta_{r_2}(B)} B^{b_2} X_{2t} + \ldots + \frac{\omega_{s_m}(B)}{\delta_{r_m}(B)} B^{b_m} X_{mt} + N_t \tag{4}$$

where the rational transfer function $\frac{\omega_{s_i}(B)}{\delta_{r_i}(B)} B^{b_i}$ for each input variable has the form defined in Equation (3). The procedure for building a transfer function model can be found in Wei [14].

2.3. ASTAR Model

The previous methods, the ARIMA and transfer function models, are commonly used as linear models. However, there are many nonlinear time-dependent systems that linear models do not handle adequately. Using linear models to analyze these nonlinear systems may require invalid assumptions that could lead to erroneous or misleading conclusions. For these systems we need to consider general classes of nonlinear models that readily adapt to the precise form of the nonlinear system of interest [19,21]. Threshold time series models (models with partition points) are a class of nonlinear models that emerge naturally as a result of changing physical behavior. Within the domain of the predictor variables, different model forms are necessary to capture changes in the relationship between the predictor and response variables. Tong [20] provides one threshold modeling methodology for this behavior (TAR, Threshold Autoregression) that identifies piecewise linear pieces of nonlinear functions over disjoint subregions of the domain D of the time series {Zt}, i.e., it identifies linear models within each disjoint subregion of the domain. One application of Tong's threshold modeling methodology is to nonlinear systems thought to possess periodic behavior in the form of stationary sustained oscillations (limit cycles). Tong's threshold methodology has tremendous power and flexibility for modeling many time series. However, unless it is constrained to be continuous, it creates disjoint subregion models that are discontinuous at the subregion boundaries. By letting the predictor variables be lagged values of a time series, one admits a more general class of continuous nonlinear threshold models than permitted by Tong's TAR approach. The methodology for developing this class of nonlinear threshold models is called ASTAR (Adaptive Spline Threshold Autoregression).
That this yields a more general class of continuous nonlinear threshold models can be shown with a simple example. Let $Z_t$ for $t = 1, \ldots, N$ be a time series we wish to model with ASTAR using, for example, p = 3 lagged predictor variables, namely $Z_{t-1}$, $Z_{t-2}$, and $Z_{t-3}$. Each forward step of the ASTAR algorithm selects one and only one set of new terms for the ASTAR model from the candidates specified by the previously selected terms of the model. Let the predictor variables in MARS (Multivariate Adaptive Regression Splines) for the t-th value in a time series $\{Z_t\}$ be $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-p}$, which we represent as $Z_{t-1}^{p}$. The functional form of the ASTAR model that estimates $Z_t$ is

$$\hat{Z}_t = \sum_{j=1}^{S} c_j K_j(Z_{t-1}^{p}) \tag{5}$$

where $\hat{Z}_t$ is an additive function of the product spline basis functions $K_j$. The functional form of the ASTAR model in Equation (5) may be expanded using the ordered sequences of truncated spline functions that define each product spline basis function. Let a and b be dummy variables that index the ordered sequence of truncated spline functions such that $0 \le a < b < j$. The functional form of the ASTAR model in Equation (5) for the t-th value in a time series $\{Z_t\}$ using this expansion is

$$\hat{Z}_t = \sum_{j=1}^{S} c_j \prod_{r_a, r_b \in K_j} [\mathrm{sgn}_v (Z_{t-v} - t)]_+ \tag{6}$$

where the argument $Z_{t-1}^{p}$ of $K_j(Z_{t-1}^{p})$ is suppressed for simplicity. Also $r_b = (v, t)$, where v is the lag of the predictor variable and t inside the brackets denotes the knot location rather than the time index, and $\mathrm{sgn}_v$ is the sign of v that determines a left (-v) or right (+v) truncated spline function.
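The structure of Equations (5) and (6) can be sketched in Python as a sum of coefficients times products of truncated splines of lagged values. This is only an illustration of how a fitted model of this form would be evaluated, not the forward and backward selection algorithm itself. The function names, the optional `intercept` argument, and all coefficients and knots below are hypothetical.

```python
def truncated_spline(x, knot, sign):
    """Return [sign * (x - knot)]_+ ; sign is +1 (right) or -1 (left)."""
    return max(0.0, sign * (x - knot))


def astar_predict(lags, terms, intercept=0.0):
    """Evaluate intercept + sum_j c_j * K_j(lags).

    lags  : dict mapping lag v to the observed value Z_{t-v}
    terms : list of (c_j, factors), where factors is a list of
            (v, knot, sign) triples multiplied together to form K_j
    The intercept is an assumption of this sketch (a constant basis function).
    """
    total = intercept
    for coef, factors in terms:
        basis = 1.0
        for v, knot, sign in factors:
            basis *= truncated_spline(lags[v], knot, sign)
        total += coef * basis
    return total


# Hypothetical coefficients and knots, chosen only to illustrate the form.
terms = [
    (0.5, [(1, 100.0, +1)]),                    # right-truncated spline on lag 1
    (-0.25, [(12, 200.0, -1)]),                 # left-truncated spline on lag 12
    (0.25, [(1, 130.0, +1), (12, 146.0, +1)]),  # interaction of lags 1 and 12
]
lags = {1: 140.0, 12: 150.0}
print(astar_predict(lags, terms, intercept=10.0))
```

Because each factor is continuous and piecewise linear in its lagged value, the fitted function is continuous across the knots, which is the property that distinguishes this class from an unconstrained TAR model.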

4. Empirical Results
Rainfall index data from 3 different locations are used as the case study. The data are taken from Badan Meteorologi, Klimatologi, dan Geofisika (BMKG). The locations are Ngale, Karangjati, and Mantingan. The data are monthly, from January 1989 to December 2008. The input variables used in the single-input transfer function model are the Dipole Mode Index (DMI) and Sea Surface Temperature (SST). The results from each method, i.e. the ARIMA, transfer function, and ASTAR models, are described one by one. The models are then compared to find the best model for each region. Model selection is based on the RMSE of both the training and testing data.
Time series plots of the rainfall index data for each region are shown in Figure 1. Following the Box-Jenkins procedure, the first step is identification using autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
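The identification step relies on the sample autocorrelation function. A minimal Python sketch of that computation is given below, assuming the usual estimator that divides the lag-k sum of cross-products of deviations by the lag-0 sum. The function name `sample_acf` and the toy series are hypothetical, and the partial autocorrelation function is not covered here.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0, r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [z - mean for z in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(ck / c0)
    return acf


# Hypothetical toy series with a pattern that repeats every 4 observations.
z = [1.0, 0.0, -1.0, 0.0] * 6
r = sample_acf(z, 8)
for lag, value in enumerate(r):
    print(lag, round(value, 3))
```

For a series with period 4 the autocorrelations peak at lags 4 and 8, in the same way that monthly rainfall data are expected to show large autocorrelations at lags 12, 24, and so on, which motivates seasonal differencing.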
Figure 1: Time series plot of rainfall index data in (a) Ngale, (b) Karangjati, and (c) Mantingan

[Figure 1 panels: (a) Time Series Plot of Rainfall Index in Ngale; (b) Time Series Plot of Rainfall Index in Karangjati; (c) Time Series Plot of Rainfall Index in Mantingan. Each panel plots Rainfall Index (vertical axis) against Month and Year (horizontal axis, from January 1989).]


Based on the ACF and PACF plots, the tentative ARIMA models are summarized in Table 1. Table 1 shows that different regions have different ARIMA models, which is understandable because each region has different conditions and weather. Transfer function models are also built for each region. Single-input transfer function models are used in order to find out which factor contributes most in each region. There are 5 factors used in this research, i.e. Sea Surface Temperature (SST of NINO 1.2, NINO 3, NINO 4, and NINO 3.4) and the Dipole Mode Index (DMI).
Table 1: ARIMA Model for Each Region

Region        ARIMA Model
Ngale         ARIMA(0,1,1)12
              ARIMA([32,47],0,0)(1,1,0)12
Karangjati    ARIMA(0,0,[1,11,14])(0,1,1)12
              ARIMA([1,11,14,15],0,0)(1,1,0)12
Mantingan     ARIMA(0,1,1)12
              ARIMA(3,1,0)12
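As a worked example of Equation (2), consider the model ARIMA(0,1,1)12 that appears in Table 1. Assuming this notation denotes a purely seasonal model with period 12 (that is, p = d = q = 0, P = 0, D = 1, Q = 1, s = 12), the general form reduces as follows.

```latex
\begin{aligned}
(1 - B^{12}) Z_t &= (1 - \Theta_1 B^{12}) a_t \\
Z_t - Z_{t-12} &= a_t - \Theta_1 a_{t-12} \\
Z_t &= Z_{t-12} + a_t - \Theta_1 a_{t-12}
\end{aligned}
```

Under this reading, the value for a given month equals the value of the same month one year earlier, corrected by a fraction of the residual from that earlier month.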

The transfer function models are built following the steps in [23]. The results show that the order of the transfer function model varies across factors and regions. The orders are chosen to obtain the best transfer function model that satisfies its assumptions, such as a white noise process for the residuals. The best single-input transfer function models and their orders are summarized in Table 2.
Table 2: Single-Input Transfer Function Model for Each Region

Region        Input       b    s   r   ARIMA Model
Ngale         NINO 1.2    9    0   0   ARIMA(1,0,0)(0,1,1)12
              NINO 3      9    0   0   ARIMA(1,0,0)(0,1,1)12
              NINO 4      0    0   0   ARIMA(0,1,1)12
              NINO 3.4    8    0   0   ARIMA([1,5],0,0)(0,1,1)12
              DMI         0    0   0   ARIMA(0,1,1)12
Karangjati    NINO 1.2    0    1   0   ARIMA(0,0,[1,11,14])(0,1,1)12
              NINO 3      8    1   0   ARIMA(0,0,[1,11,14])(0,1,1)12
              NINO 4      10   0   0   ARIMA(0,0,[1,11,14])(0,1,1)12
              NINO 3.4    8    0   0   ARIMA(0,0,[1,11,14])(0,1,1)12
              DMI         0    0   0   ARIMA(0,0,[1,11,14])(0,1,1)12
Mantingan     NINO 1.2    11   0   0   ARIMA(0,1,1)12
              NINO 3      11   0   0   ARIMA(0,1,1)12
              NINO 4      0    0   0   ARIMA(0,1,1)12
              NINO 3.4    0    0   0   ARIMA(0,1,1)12
              DMI         0    0   0   ARIMA(0,1,1)12
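To show how the orders in Table 2 translate into the input part of Equation (3), two rows can be written out explicitly. This is a sketch of the input term only, using the sign convention $\omega_s(B) = \omega_0 + \omega_1 B + \ldots + \omega_s B^s$ and $\delta_0(B) = 1$; the noise term $N_t$ and the estimated coefficient values are not reproduced here.

```latex
\begin{aligned}
\text{Ngale, NINO 1.2 } (b = 9,\ s = 0,\ r = 0): &\quad
  \frac{\omega_0(B)}{\delta_0(B)} B^{9} X_t = \omega_0 X_{t-9} \\
\text{Karangjati, NINO 3 } (b = 8,\ s = 1,\ r = 0): &\quad
  \frac{\omega_1(B)}{\delta_0(B)} B^{8} X_t = (\omega_0 + \omega_1 B) X_{t-8}
  = \omega_0 X_{t-8} + \omega_1 X_{t-9}
\end{aligned}
```

In the first case the rainfall index responds to the NINO 1.2 value observed 9 months earlier; in the second it responds to the NINO 3 values observed 8 and 9 months earlier.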

The multi-input transfer function model is built on the basis of the single-input transfer function models. The input variables are entered simultaneously, and insignificant input variables are excluded from the model. A multi-input transfer function model is not built for Mantingan because 4 input variables are excluded, which leaves a single-input transfer function model.

Table 3: Multi-Input Transfer Function Model for Each Region

Region        Model   Input       b   s   r   ARIMA Model
Ngale         I       NINO 1.2    9   0   0   ARIMA(1,0,0)(0,1,1)12
                      NINO 3      9   0   0
                      NINO 4      0   0   0
                      NINO 3.4    8   0   0
                      DMI         0   0   0
              II      NINO 1.2    1   0   0   ARIMA(1,0,0)(0,1,1)12
                      NINO 4      0   0   0
                      DMI         0   0   0
Karangjati    I       NINO 3      8   0   0   ARIMA(0,0,[11,14])(0,1,1)12
                      NINO 4      0   0   0
                      NINO 3.4    0   0   0
                      DMI         0   0   0
              II      NINO 3      8   1   0   ARIMA(1,0,[11,14])(0,1,1)12
                      NINO 4      0   0   0
                      NINO 3.4    0   0   0

The ASTAR model is similar to the ARIMA model in that it excludes other input variables: it represents only the relation between the present value and the past value(s). In this research, 3 different approaches are tried, i.e. an ASTAR model without interaction, with 2 interactions, and with 3 interactions among the past values. In addition, the variables at lag 1 to lag 12 are used as the initial predictors, from which predictors are selected in order to obtain the best ASTAR model. The ASTAR models and their predictors are summarized in Table 4.
Table 4: ASTAR Model for Each Region

Region        Interaction       Input
Ngale         No Interaction    Yt-1, Yt-6, Yt-11, Yt-12
              2 Interactions    Yt-1, Yt-6, Yt-11, Yt-12
              3 Interactions    Yt-1, Yt-6, Yt-11, Yt-12
Karangjati    No Interaction    Yt-1, Yt-5, Yt-6, Yt-12
              2 Interactions    Yt-1, Yt-5, Yt-6, Yt-7, Yt-12
              3 Interactions    Yt-1, Yt-2, Yt-5, Yt-6, Yt-7, Yt-12
Mantingan     No Interaction    Yt-1, Yt-6, Yt-7, Yt-10, Yt-12
              2 Interactions    Yt-1, Yt-2, Yt-6, Yt-10, Yt-12
              3 Interactions    Yt-1, Yt-2, Yt-6, Yt-10, Yt-12
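The initial predictors for ASTAR are the lagged values at lag 1 to lag 12, from which a subset such as those in Table 4 is selected. The following Python sketch shows one way such a lagged predictor matrix could be assembled from a series. The function name `lagged_design` and the toy series are hypothetical, and the selection of lags itself is not shown.

```python
def lagged_design(series, lags):
    """Return (X, y): each row of X holds [Z_{t-v} for v in lags], y holds Z_t."""
    lags = list(lags)
    max_lag = max(lags)
    X, y = [], []
    # The first usable target is the observation with max_lag predecessors.
    for t in range(max_lag, len(series)):
        X.append([series[t - v] for v in lags])
        y.append(series[t])
    return X, y


# Hypothetical toy series 1, 2, ..., 30 so that the lag structure is easy to read.
z = list(range(1, 31))

# All candidate predictors: lag 1 to lag 12.
X_all, y_all = lagged_design(z, range(1, 13))
print(len(X_all), X_all[0], y_all[0])

# A selected subset of lags, here the lags listed for Ngale in Table 4.
X_sub, y_sub = lagged_design(z, (1, 6, 11, 12))
print(X_sub[0], y_sub[0])
```

Using lags up to 12 means the first 12 observations serve only as predictors, so 240 monthly observations would yield 228 usable rows.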

Table 5: Comparison based on Model RMSE for Each Region

                                    Ngale RMSE               Karangjati RMSE          Mantingan RMSE
Model                               In-sample   Out-sample   In-sample   Out-sample   In-sample   Out-sample
ARIMA I                             103.55      83.66        140.70      282.23       122.75      91.21
ARIMA II                            111.46      114.92       146.09      298.86       124.71      105.91
Transfer Function with NINO 1.2     102.54      83.74        138.45      284.43       120.63      92.94
Transfer Function with NINO 3       102.97      84.98        131.23      275.59       122.04      90.63
Transfer Function with NINO 4       102.53      82.93        135.92      272.71       120.05      92.26
Transfer Function with NINO 3.4     101.50      85.70        135.12      273.07       121.42      91.61
Transfer Function with DMI          101.29      83.21        138.69      277.77       121.85      91.33
ASTAR without Interaction           98.90       107.44       131.99      292.75       111.54      92.93
ASTAR with 2 Interactions           99.00       97.07        131.01      289.24       112.64      85.06
ASTAR with 3 Interactions           99.00       97.07        128.60      292.04       112.64      85.06
Multi-Input Transfer Function I     95.79       83.37        128.35      279.70       -           -
Multi-Input Transfer Function II    98.33       84.16        128.97      289.33       -           -

Model selection in this paper uses the RMSE criterion: the model with the smallest RMSE is chosen as the best model. Table 5 shows that, based on the RMSE of in-sample data, the multi-input transfer function model is the best model in 2 regions, i.e. Ngale and Karangjati. The multi-input transfer function model with more input variables (Multi-Input Transfer Function I) is better than the one with fewer input variables (Multi-Input Transfer Function II). Based on the RMSE of out-sample data, the single-input transfer function model with the Sea Surface Temperature (SST) of NINO 4 is the best among the proposed models in the same 2 regions, i.e. Ngale and Karangjati. In Mantingan, based on the RMSE of both in-sample and out-sample data, the ASTAR model with 2 interactions of the input variables is the best model.
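The selection rule can be sketched in Python as follows. The RMSE function uses the standard definition (square root of the mean squared forecast error), and the dictionary holds the out-sample RMSE values for Karangjati as read from Table 5. The function names `rmse` and `best_model` are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(errors) / len(errors))


def best_model(rmse_by_model):
    """Return the name of the model with the smallest RMSE."""
    return min(rmse_by_model, key=rmse_by_model.get)


# Out-sample RMSE for Karangjati, as read from Table 5.
karangjati_out = {
    "ARIMA I": 282.23,
    "ARIMA II": 298.86,
    "Transfer Function with NINO 1.2": 284.43,
    "Transfer Function with NINO 3": 275.59,
    "Transfer Function with NINO 4": 272.71,
    "Transfer Function with NINO 3.4": 273.07,
    "Transfer Function with DMI": 277.77,
    "ASTAR without Interaction": 292.75,
    "ASTAR with 2 Interactions": 289.24,
    "ASTAR with 3 Interactions": 292.04,
    "Multi-Input Transfer Function I": 279.70,
    "Multi-Input Transfer Function II": 289.33,
}
print(best_model(karangjati_out))
```

Applying the same rule separately to the in-sample and out-sample columns is what allows different models to be selected for the two criteria in the same region.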

5. Conclusion
The rainfall forecasting model differs from region to region. The multi-input transfer function model is the better model for rainfall prediction, especially for in-sample data. Including more contributing input variables in the multi-input transfer function model might improve the forecast accuracy. Ngale and Karangjati share the same characteristics, i.e. they are influenced mostly by the Sea Surface Temperature (SST) of NINO 4. The in-sample data of Ngale and Karangjati are best modeled by the multi-input transfer function model, and the out-sample data by the single-input transfer function model. Mantingan has a different pattern from Ngale and Karangjati: both its in-sample and out-sample data are best modeled by the ASTAR model with 2 interactions.

References
[1] Andriansyah. 1998. Comparison of Geostatistics and Box-Jenkins Models on Monthly Rainfall Forecasting. Unpublished Bachelor Project, Department of Statistics, FMIPA, IPB, Bogor.
[2] Boer, R., Pawitan, H., and June, T. 2000. Approaches for Anticipating Drought and Flood. Paper presented at Lokakarya Antisipasi Kejadian Iklim Ekstrim. Department of Agriculture, Jakarta.
[3] Boer, R. 2001. Strategy to anticipate climate extreme events. Paper presented at the Training Institute on Climate and Society in the Asia-Pacific Region, 5-23 February 2001, East-West Center, Honolulu, USA.
[4] Boer, R., Notodiputro, K. A., and Las, I. 2000. Prediction of daily rainfall characteristics from monthly climate indices. Proceeding of the Second International Conference on Science and Technology for the Assessment of Global Climate Change and Its Impacts on Indonesian Maritime Continent, 29 November-01 December 1999.
[5] Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. 1994. Time Series Analysis: Forecasting and Control, 3rd Edition. Prentice Hall.
[6] Cryer, J.D. and Chan, K.S. 2008. Time Series Analysis: With Application in R, 2nd Edition. Springer.
[7] Dupe, Z. L. 1999. Prediction Nino3.4 SST anomaly using simple harmonic model. Paper presented at the Second International Conference on Science and Technology for the Assessment of Global Climate Change and Its Impacts on Indonesian Maritime Continent, 29 November-01 December 1999.
[8] Estiningtyas, W., Ramadhani, F., and Surmaini, E. 2005. Validation of Rainfall Prediction Model using Kalman Filter Method. Proceeding of Lokakarya Nasional Forum Prakiraan, Evaluasi, dan Validasi. BMG. Hotel Nam Center Kemayoran Jakarta, 15-16 Desember 2005.
[9] Goddard, L. 2000. Current approaches to seasonal to interannual climate prediction. Paper presented at the Training Institute on Climate and Society in the Asia-Pacific Region, 5-23 February 2001, East-West Center, Honolulu, USA.
[10] Indragustari. 2005a. Rainfall prediction using Wavelet transformation. Proceeding of Lokakarya Nasional Forum Prakiraan, Evaluasi, dan Validasi. BMG. Hotel Nam Center Kemayoran Jakarta, 15-16 Desember 2005.
[11] Indragustari. 2005b. Rainfall prediction using ANFIS. Proceeding of Lokakarya Nasional Forum Prakiraan, Evaluasi, dan Validasi. BMG. Hotel Nam Center Kemayoran Jakarta, 15-16 Desember 2005.
[12] Liu, L.-M. 1987. Sales forecasting using multi-equation transfer function models. Journal of Forecasting, 6, 223-238.
[13] Liu, L.-M. 1997. Forecasting and Time Series Analysis Using the SCA Statistical System: Volume 2. Chicago: Scientific Computing Associates Corp.
[14] Liu, L.-M. 2006. Time Series Analysis and Forecasting, 2nd Edition. Scientific Computing Associates Corp.
[15] Liu, L.-M. and Hudak, G.B. 1985. Unified econometric model building using simultaneous transfer function equations. Time Series Analysis: Theory and Practice, 7, 277-288.
[16] Liu, L.-M., Hudak, G.B., Box, G.E.P., Muller, M.E., and Tiao, G.C. 1983. The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis. Chicago: Scientific Computing Associates Corp.
[17] Nuryadi. 2005. Validation of Long-term Prediction Model using ARIMA Model. Proceeding of Lokakarya Nasional Forum Prakiraan, Evaluasi dan Validasi. BMG. Hotel Nam Center Kemayoran Jakarta, 15-16 Desember 2005.
[18] Peragi and Perhimpi. 1994. Discussion Panel Formulation of Drought and Long-term Anticipation. In I. Las, N. Sinulingga, R. Boer, Handoko, E. Syamsudin, and D. Sopandi (Editors), Proceeding of Discussion Panel of Drought and Long-term Anticipation. Perhimpunan Agronomi Indonesia dan Perhimpunan Meteorologi Pertanian Indonesia.
[19] Priestley, M. B. 1988. Non-Linear and Non-Stationary Time Series Analysis. Academic Press.
[20] Tong, H. 1983. Threshold Models in Non-linear Time Series Analysis. Springer-Verlag.
[21] Tong, H. 1990. Nonlinear Time Series. Oxford University Press.
[22] Wall, K.D. 1976. FIML estimation of rational distributed lag structural form models. Annals of Economic and Social Measurement, 5, 53-64.
[23] Wei, W.W.S. 2006. Time Series Analysis: Univariate and Multivariate Methods, 2nd Edition. Boston: Pearson Addison Wesley.
[24] Zifwen. 1999. ENSO Forecasting and ENSO Relationship to Monsoon Rainfall. Unpublished Bachelor Project, Department of Statistics, FMIPA, IPB, Bogor.