
SARIMA (Seasonal ARIMA) Implementation on Time Series to Forecast the Number of Malaria Incidence
Adhistya Erna Permanasari, Indriana Hidayah, Isna Alfi Bustoni
Department of Electrical Engineering and Information Technology
Gadjah Mada University, Jl. Grafika No. 2, Yogyakarta 55281, Indonesia
adhistya@ugm.ac.id, indriana.h@ugm.ac.id, alfJebie@gmail.com

Abstract - Forecasting methods are important for predicting the number of disease incidences. This motivates the development of a system that can predict the future number of disease occurrences. Fluctuation analysis of the forecasting result can be used to support policy making by stakeholders. This paper analyses and presents the use of the Seasonal Autoregressive Integrated Moving Average (SARIMA) method for developing a forecasting model that is able to predict the number of disease incidences in humans. The dataset for model development was collected from time series data of Malaria occurrences in the United States, obtained from a study published by the Centers for Disease Control and Prevention (CDC). The procedure resulted in SARIMA $(0,1,1)(1,1,1)_{12}$ as the selected model. The model achieved a Mean Absolute Percentage Error (MAPE) of 21.6%. This indicated the capability of the final model to closely represent, and make predictions based on, the Malaria historical dataset.

Keywords - disease forecasting; time series; SARIMA.

I. INTRODUCTION

Disease forecasting is an important area of medicine. It also supports policy making by stakeholders, for example in planning health services and healthcare needs [1]. There is no single approach to disease forecasting, and so various methods have often been adopted to forecast aggregate or specific health conditions. Meanwhile, there are no specific guidelines for choosing among the disease forecasting approaches that are often applied. The selection of an appropriate method is important to achieve a better prediction.

Time series analysis for forecasting is widely used in various fields such as energy demand prediction, economics, traffic prediction, and health support. Indeed, predicting the number of disease incidences needs attention because the obtained result is needed for further decisions.

Many researchers have developed different forecasting methods to predict disease incidence in humans. The number of dengue cases in a population was projected using a SARIMA model [2]. The model was developed based on the reported monthly incidence of dengue from 1998 to 2008, and then it was validated using data from 2009. The selected model showed that the monthly cases could be predicted using the values from one, two, and twelve months prior. The predicted numbers were close to the historical data.

The monthly number of onchocerciasis cases in Mexico was analysed using the software R [3]. Data was collected from the Mexican Secretariat of Health between 1988 and 2011. An ARIMA model was constructed to estimate onchocerciasis cases for two years ahead. The results reported a decreasing trend of the disease over time.

The number of human incidences of Schistosoma haematobium at Niono, Mali was projected online by using an exponential smoothing method [4]. The method was used as the core of a proposed state-space framework. Data was collected from 17 community health centers over the period 1996 to 2004. The final framework could assist in managing and assessing the transmission and intervention impact of S. haematobium.

Three different methods were used to forecast the SARS epidemic in China [5]. The existing time series was modelled by AR(1), ARIMA(0,1,0), and ARMA(1,1). The result of this study was used to monitor the dynamics of SARS in China based on the daily data. Hence, the result could be used to support the disease reports.

A Bayesian dynamic model could also be used to monitor influenza surveillance as one factor of the SARS epidemic [6]. This model was developed to link pediatric and adult syndromic data to the traditional measures of influenza morbidity and mortality. The findings showed the importance of modeling influenza surveillance data, and recommended a dynamic Bayesian network.

The monthly data of Cutaneous leishmaniasis (CL) incidence in Costa Rica from 1991 to 2001 was modelled by using seasonal autoregressive models. This work studied the relationship between the interannual cycles of the disease and climate variables using frequency and time-frequency techniques of time series analysis [7]. This model supported the dynamic link between the disease and climate.

An additive decomposition method was used to predict Salmonellosis incidence in the US [8]. This method was selected because of the relatively constant trend of the historical data. Fourteen years of historical data from 1993 to 2006 was collected to compute the forecast values up to 12 months ahead.

This paper analyses the empirical results for evaluating and predicting the number of zoonosis incidences by using Autoregressive Integrated Moving Average (ARIMA). This model is selected because of its capability to correct the local trend in data, where the pattern in the previous period can be used to forecast the future. Thus, this model also supports the modeling of one perspective as a function of

978-1-4799-0425-9/13/$31.00 ©2013 IEEE


time [9]. Due to the seasonal pattern of the time series used, Seasonal ARIMA (SARIMA) is selected for the model development.

The remainder of the paper is structured as follows. Section II introduces the basic theory of SARIMA. Section III describes the time series data collection. Section IV reports the model development. Finally, Section V presents the conclusion.

II. SEASONAL ARIMA (SARIMA)

This section describes the basic theory of Autoregressive Integrated Moving Average (ARIMA). The general class ARIMA (p,d,q) has three parts: d is the level of differencing, p is the autoregressive order, and q is the moving average order [10]. The ARIMA model is shown in (1) as

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \dots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q} \tag{1}$$

where $z_t$ is the value of the differenced series at time $t$, the constant is denoted by $\delta$, $\phi$ is an autoregressive coefficient, $a_t$ is a random shock corresponding to time period $t$, and $\theta$ is a moving average coefficient.

Seasonal ARIMA (SARIMA) is used when the time series exhibits a seasonal variation. A seasonal autoregressive order (P) and a seasonal moving average order (Q) form the multiplicative process of SARIMA, written as $(p,d,q)(P,D,Q)_s$. The subscript $s$ gives the length of the seasonal period. For example, $s = 7$ for daily data with a weekly cycle, $s = 4$ for quarterly data, and $s = 12$ for monthly data.

In order to formalize the model, the backshift operator (B) is used. Shifting a time series observation backward in time by $k$ periods is symbolized by $B^k$, such that $B^k y_t = y_{t-k}$.

Formally, the backshift operator is used to present a general stationarity transformation, where the time series is stationary if its statistical properties (mean and variance) are constant through time. The general stationarity transformation is presented below:

$$z_t = (1 - B^s)^D (1 - B)^d y_t \tag{2}$$

where $z_t$ is the differenced time series, $d$ is the degree of nonseasonal differencing used, and $D$ is the degree of seasonal differencing used.

Then, the general form of the SARIMA model of order (p,P,q,Q) is

$$\phi_p(B)\,\phi_P(B^s)\, z_t = \delta + \theta_q(B)\,\theta_Q(B^s)\, a_t \tag{3}$$

where $\phi_p(B)$ and $\theta_q(B)$ are the nonseasonal autoregressive and moving average operators of orders p and q, and $\phi_P(B^s)$ and $\theta_Q(B^s)$ are the corresponding seasonal operators of orders P and Q.

George Box and Gwilym Jenkins developed a simplified procedure for understanding and using the univariate ARIMA model [10], [11]. The Box-Jenkins (BJ) methodology consists of four iterative steps:

1) Step 1: Identification
This step focuses on the selection of the order of regular differencing (d), the order of seasonal differencing (D), the non-seasonal autoregressive order (p), the seasonal autoregressive order (P), the non-seasonal moving average order (q), and the seasonal moving average order (Q). The orders can be identified by observing the sample autocorrelations (SAC) and sample partial autocorrelations (SPAC).

2) Step 2: Estimation
The historical data is used to estimate the parameters of the tentative model from Step 1.

3) Step 3: Diagnostic checking
A diagnostic test is used to check the adequacy of the tentative model.

4) Step 4: Forecasting
The final model from Step 3 is used to compute the forecast values.

This approach is widely used to build SARIMA models because of its capability to capture the appropriate trend by examining the historical pattern. The BJ methodology has several advantages, including extracting a great deal of information from the time series using a minimum number of parameters, and the capability to handle stationary and non-stationary time series with non-seasonal and seasonal elements [12], [13].

III. DATA COLLECTION

The Malaria disease dataset was selected for model development because the disease occurs in many countries. Malaria is a mosquito-borne infectious disease caused by a parasite. It causes symptoms including fever, chills, and flu-like illness. Without further treatment, patients can progress to coma or death. WHO reported that an estimated 219 million cases of malaria occurred worldwide in 2010, with 660,000 deaths [14].

This study collected time series data of Malaria occurrences in the United States for the 204-month period from January 1993 to December 2009. The data was obtained from the summary of notifiable diseases in the United States in the Morbidity and Mortality Weekly Report (MMWR) published by the Centers for Disease Control and Prevention (CDC). The original data is plotted in Fig. 1.

[Line chart titled "The Original Data"; x-axis: month, Jan-93 to Jan-09; y-axis: number of incidences]
Figure 1. Malaria Original Dataset.

Fig. 1 shows the line chart of the historical data from 1993 until 2009. The x-axis represents the specific year while the y-axis represents the associated number of incidences. Since the time series plot of the historical data exhibited seasonal
variations that follow a similar pattern every year, SARIMA was chosen as the appropriate approach to develop a prediction model.

IV. MODEL DEVELOPMENT

This section discusses the results of the BJ iterative steps to forecast the available dataset.

A. Step 1: Identification
The first step in the BJ method is time series identification. It is done by plotting the sample autocorrelations (SAC) and sample partial autocorrelations (SPAC) of the original data. In this model development, data covering four seasonal periods were selected to illustrate the plot. Fig. 2 shows the SAC and SPAC correlograms of the original data. Based on these figures, it can be observed that the time series is likely to have seasonal cycles, especially in the SAC, where some lags have absolute values greater than 0.2. This implies that the level of the series is non-stationary. Furthermore, three variations of differencing are applied to the original time series: regular differencing, seasonal differencing, and combined regular and seasonal differencing (Figs. 3, 4, and 5, respectively). In these figures, the x-axis shows the lag number and the y-axis shows the value at each lag.
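The three differencing variations mentioned above can be written out with the backshift operator B. The following is a sketch in standard notation, assuming a monthly seasonal period of 12 and writing $y_t$ for the original observation (the symbol $y_t$ is an assumption made here for illustration):

```latex
\begin{align*}
\text{regular:} \quad & (1 - B)\,y_t = y_t - y_{t-1} \\
\text{seasonal:} \quad & (1 - B^{12})\,y_t = y_t - y_{t-12} \\
\text{regular and seasonal:} \quad & (1 - B)(1 - B^{12})\,y_t = y_t - y_{t-1} - y_{t-12} + y_{t-13}
\end{align*}
```

Under this reading, the combined form uses the observation 13 months back, so a 204-month series would leave 191 differenced values.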

[Correlogram panels; x-axis: lag; y-axis: autocorrelation from -1.0 to 1.0]
Figure 2. (a) SAC Correlogram of the Original Data; (b) SPAC Correlogram of the Original Data.

Figure 3. (a) SAC Correlogram of Regular Differencing; (b) SPAC Correlogram of Regular Differencing.

Figure 4. (a) SAC Correlogram of Seasonal Differencing; (b) SPAC Correlogram of Seasonal Differencing.

Figure 5. (a) SAC Correlogram of Regular and Seasonal Differencing; (b) SPAC Correlogram of Regular and Seasonal Differencing.
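The quantities behind correlograms such as those in Figs. 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration only: the series `y` is synthetic (a trend plus a 12-month cycle), not the CDC data, the helper names `difference` and `sample_autocorrelation` are hypothetical, and the choice of 48 lags is an assumption standing in for four seasonal periods. The paper itself used Minitab.

```python
import math


def difference(series, lag=1):
    """Return y[t] - y[t - lag] for t = lag .. len(series) - 1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_autocorrelation(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    sac = []
    for k in range(1, max_lag + 1):
        cross = sum((series[t] - mean) * (series[t + k] - mean)
                    for t in range(n - k))
        sac.append(cross / total)
    return sac


# Synthetic stand-in for 204 monthly counts (NOT the CDC data):
# a mild upward trend plus a 12-month cycle.
y = [100 + 0.1 * t + 30 * math.sin(2 * math.pi * t / 12) for t in range(204)]

regular = difference(y, lag=1)                    # (1 - B) y_t
seasonal = difference(y, lag=12)                  # (1 - B^12) y_t
both = difference(difference(y, lag=1), lag=12)   # (1 - B)(1 - B^12) y_t

# SAC of the original series over 48 lags (assumed: four seasonal periods).
sac_original = sample_autocorrelation(y, max_lag=48)

# Lags whose SAC exceeds 0.2 in absolute value, the threshold quoted in the text.
large_lags = [k for k, r in enumerate(sac_original, start=1) if abs(r) > 0.2]

print(len(regular), len(seasonal), len(both))
print(large_lags[:6])
```

On a seasonal series like this one, the SAC of the undifferenced data stays large at multiples of the seasonal period, which is the kind of pattern the identification step looks for before choosing a differencing scheme.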

Selection of whether to use regular or seasonal differencing was based on the correlograms. In order to develop an ARIMA model, the time series should be stationary. Observation of the correlograms in Figs. 3, 4, and 5 indicates that more spikes are found with regular differencing alone and with seasonal differencing alone. Therefore, the combined regular and seasonal differencing is chosen for the model development.

B. Step 2: Estimation
This step aims to determine a suitable model based on the observations in Step 1. To produce the model, the separate non-seasonal and seasonal models were computed first. These models were then combined to describe the final model.

Nonseasonal level: SAC cuts off at lag 1 and SPAC dies down. This gives the tentative nonseasonal moving average model of order q = 1:

$$z_t = \delta + a_t - \theta_1 a_{t-1} \tag{4}$$

Seasonal level: SAC dies down and SPAC dies down. This results in the combined seasonal autoregressive-moving average model of order (P,Q):

$$z_t = \delta + \phi_{1,12}\, z_{t-12} + a_t - \theta_{1,12}\, a_{t-12} \tag{5}$$

Since the final model consists of nonseasonal and seasonal levels, equations (4) and (5) are combined to get the model shown in equation (6):

$$z_t = \delta + \phi_{1,12}\, z_{t-12} + a_t - \theta_1 a_{t-1} - \theta_{1,12}\, a_{t-12} \tag{6}$$

Rewriting (6) yields (7):

$$z_t = \delta + a_t + \phi_{1,12}\, z_{t-12} - \theta_1 a_{t-1} - \theta_{1,12}\, a_{t-12} \tag{7}$$

The terms $\theta_1 a_{t-1}$ and $\theta_{1,12}\, a_{t-12}$ from (7) are unified to form the multiplicative term (8):

$$(-\theta_1)(-\theta_{1,12})\, a_{t-13} = \theta_1 \theta_{1,12}\, a_{t-13} \tag{8}$$

Here the coefficients $-\theta_1$ and $-\theta_{1,12}$ are multiplied, and the lags ($-1$ and $-12$) in the subscripts of the random shock $a$ are added. Then, the overall tentative model becomes (9):

$$z_t = \delta + a_t + \phi_{1,12}\, z_{t-12} - \theta_1 a_{t-1} - \theta_{1,12}\, a_{t-12} + \theta_1 \theta_{1,12}\, a_{t-13} \tag{9}$$

The final SARIMA model is therefore $(0,1,1)(1,1,1)_{12}$.

Next, the model coefficients were estimated using Minitab software. It produced the following results: MA(1) = 0.9561; MA(2) = -0.1010; SMA(12) = 0.9174; and Constant = -0.04831.

C. Step 3: Diagnostic checking
SAC and SPAC were calculated using the residual values of the original data. The resulting autocorrelation values were close to |0.2| and within the 95% confidence interval. This indicated that the selected model was adequate.

D. Step 4: Forecasting
The final SARIMA model $(0,1,1)(1,1,1)_{12}$ was used to forecast the values for 36 months ahead, from t = 205 through t = 240, as presented in Table I. The whole forecasting plot is shown in Fig. 6.

[Time series plot; legend: Original Data, Forecast; x-axis: Months]
Figure 6. Time Series Plot of the Original Data and Forecast Values.

TABLE I. THE FORECASTING RESULT.

Month        Prediction
             2010       2011       2012
January      70.482     62.913     56.920
February     71.205     65.744     59.703
March        64.899     59.390     53.301
April        71.001     65.443     59.306
May          91.320     85.714     79.529
June         104.908    99.254     93.020
July         137.838    132.135    125.853
August       164.604    158.853    152.522
September    141.918    136.119    129.740
October      115.832    109.985    103.558
November     87.110     81.214     74.738
December     200.933    194.989    188.466

E. Error Measures
The accuracy of the forecasting can be evaluated using error measures. This is achieved by comparing the original data and the forecast values. In this paper, Mean Absolute
Percentage Error (MAPE) was used as the error measure. The result showed that the MAPE value for the selected model was 21.6%. Thus, the empirical result indicated that the model was able to accurately represent the Malaria historical dataset.

V. CONCLUSION

The prediction of the future incidence of disease is important for making better policy. In this paper, a forecasting method was applied to predict the number of Malaria incidences in the US based on monthly data. The prediction model was developed by using SARIMA based on the historical data. A SARIMA model can be obtained by using the four iterative Box-Jenkins steps, and can provide predictions of the number of human incidences of other zoonoses to help stakeholders make further decisions. The result indicates that SARIMA $(0,1,1)(1,1,1)_{12}$ was the fitted model. The model was also able to represent the historical data with a MAPE value of 21.6%. Further work is still needed to evaluate and apply other forecasting methods to the time series in order to obtain better forecast accuracy.

REFERENCES
[1] I. Soyiri and D. Reidpath, "An overview of health forecasting," Environmental Health and Preventive Medicine, vol. 18, pp. 1-9, 2013.
[2] E. Z. Martinez, et al., "A SARIMA forecasting model to predict the number of cases of dengue in Campinas, State of Sao Paulo, Brazil," Revista da Sociedade Brasileira de Medicina Tropical, vol. 44, pp. 436-440, 2011.
[3] E. E. Lara-Ramirez, et al., "Time Series Analysis of Onchocerciasis Data from Mexico: A Trend towards Elimination," PLoS Negl Trop Dis, vol. 7, p. e2033, 2013.
[4] D. C. Medina, et al., "State-Space Forecasting of Schistosoma haematobium Time-Series in Niono, Mali," PLoS Neglected Tropical Diseases, vol. 2, pp. 1-12, 2008.
[5] D. Lai, "Monitoring the SARS Epidemic in China: A Time Series Analysis," Journal of Data Science, vol. 3, pp. 279-293, 2005.
[6] P. Sebastiani, et al., "A Bayesian Dynamic Model for Influenza Surveillance," Statistics in Medicine, vol. 25, pp. 1803-1825, 2006.
[7] L. F. Chaves and M. Pascual, "Climate Cycles and Forecasts of Cutaneous Leishmaniasis, a Nonstationary Vector-Borne Disease," PLoS Medicine, vol. 3, pp. 1320-1328, 2006.
[8] A. E. Permanasari, et al., "Forecasting of Zoonosis Incidence in Human Using Decomposition Method of Seasonal Time Series," in NPC 2009, Tronoh, Malaysia, 2009, pp. 1-7.
[9] E. S. Shtatland, et al., "Biosurveillance and outbreak detection using the ARIMA and logistic procedures," in SUGI 31, Cary, NC: SAS Institute, Inc, 2006, pp. 197-31.
[10] B. L. Bowerman and R. T. O'Connell, Forecasting and Time Series: An Applied Approach, 3rd ed.: Duxbury Thomson Learning, 1993.
[11] S. Makridakis and S. C. Wheelwright, Forecasting Methods and Applications: John Wiley & Sons, Inc, 1978.
[12] J. G. Caldwell. (2006). The Box-Jenkins Forecasting Technique. Available: http://www.foundationwebsite.org/
[13] C. Chia-Lin, et al., "Modelling and forecasting tourism from East Asia to Thailand under temporal and spatial aggregation," Math. Comput. Simul., vol. 79, pp. 1730-1744, 2009.
[14] WHO. (2013, June 16). Malaria Fact Sheet.
