
Gemille Isabel P. Gloria
2010-48494, PM Applied Mathematics (Actuarial Science)
Math 297 (Applied Actuarial Statistics) SDEF
Time Series Term Paper
December 17, 2016
Time Series Analysis and Forecasting on the National Government’s Expenditures from January 2012
to October 2016 using the ARIMA Modeling Process
I. Abstract
This paper presents a time series analysis of the monthly expenditures of the National
Government from January 2012 to October 2016, as recorded in the fiscal and treasury operations of
the Bureau of the Treasury (SDAD-RS, 2016), and fits a predictive model to forecast future
expenditures using the Autoregressive Integrated Moving Average (ARIMA) modeling process. The
study finds that the current month’s National Government expenditures are explained by the previous
month’s expenditures and by the same month of the previous year, since expenditures fall due on a
schedule that is reflected in the books/journals and in the financial reports. The final model is
ARIMA(1,0,0)(1,0,0).
II. Introduction
The time series of National Government expenditures comprises the allotments to Local
Government Units (LGUs), interest payments, tax expenditures, subsidies, equities, net lending, and
other items not separately specified. Among these components, the unspecified other expenditures
account for the largest share of the total, followed by the allotments to LGUs and interest payments.
Expenditures made by the national government must be monitored closely since the money
used for its transactions and operations comes from the hard-earned pay of Filipinos. With the model
produced by this study, monthly expenditures can be monitored by forecasting future expenditures and
comparing them with the actual figures; when there is a suspicious disparity between the predicted and
actual amounts, there is ground for investigation or auditing.
III. Data Characteristics
The statistical data used in this study were obtained from the website of the Bureau of the
Treasury, which generates time series on national government cash operations deemed useful to the
private and public sectors. The series is available at monthly and annual frequencies; for this study, the
monthly series of expenditures from January 2012 to October 2016 is used. Using the R software, the
time series of National Government expenditures can be plotted with the code below:
>plot.ts(PHts)

Figure 1 Plot of the Time Series of National Government Expenditures (Jan. 2012 – Oct. 2016)
From the plot, as shown in Figure 1, it can be seen that the expenditures exhibit an increasing
trend over the period. There also appears to be seasonal variation within each year, with a peak around
the middle (May to June) and at the end of the calendar year, and a trough at the beginning and just
after mid-year (June to July). Both the seasonal fluctuations and the random fluctuations are roughly
constant in size over time, so the time series can be described by an additive model.
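As an informal check of the additive description, the series can be split into trend, seasonal and random components with R's decompose function; this sketch assumes the monthly PHts object built in the Appendix:

```r
# Decompose the monthly series into trend + seasonal + random
# components under an additive model.
PHcomp <- decompose(PHts, type = "additive")
plot(PHcomp)  # seasonal and random panels should show roughly constant spread
```

Roughly constant amplitude in the seasonal and random panels supports the additive description.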
Upon checking the boxplot of the time series data of the national government expenditures, no
outliers were found, as shown in Figure 2.

Figure 2 Boxplot of the Time Series of the National Government Expenditures
To fit an ARIMA model, the time series must be stationary. The number of differences needed
to achieve stationarity can be determined with the R code below:
>ndiffs(PHts, alpha=0.05, test="adf", max.d=3)
The code returned zero differences, meaning no differencing is needed for the time series to be
stationary. To formally confirm stationarity, the Augmented Dickey-Fuller (ADF) test is used, with the
R code below:
>adf.test(PHts, alternative="stationary")

The resulting p-value of the test is less than 0.01. This implies that, at the 95% confidence level, there
is sufficient evidence to reject the null hypothesis that the time series of national government
expenditures is non-stationary. Another requirement for fitting an ARIMA model is that the
observations must be autocorrelated. This is checked with the Box-Ljung test in R, with the code
below:
>Box.test(PHts, type="Ljung-Box")

With the resulting p-value of 0.008819, at the 95% confidence level there is sufficient evidence to
reject the null hypothesis that the monthly expenditures in the time series are not autocorrelated. With
these conditions satisfied by the time series of national government expenditures, selection of an
ARIMA model can proceed.
IV. Methodology on selecting ARIMA Model
A. Selecting an ARIMA Model
Since the time series used in this study is already stationary, the next step is to find an
appropriate ARIMA model, that is, appropriate values of p and q for an ARIMA(p,d,q) model. This can
be done by examining the correlogram and then the partial correlogram of the stationary time series.
These are generated in R with the autocorrelation function (acf) and partial autocorrelation function
(pacf), respectively:
>acf(PHts, lag.max=20)

Figure 3 Correlogram of PHts

>pacf(PHts, lag.max=20)

Figure 4 Partial correlogram of PHts

Figures 3 and 4 show the correlogram and the partial correlogram of the time series,
respectively. In the correlogram, the autocorrelations at lags 1, 3, 5, 6 and 12 exceed Bartlett's
significance bounds, while all other autocorrelations stay within them. In the partial correlogram, the
partial autocorrelations at lags 1, 6 and 12 exceed the bounds, while all other partial autocorrelations
stay within them. The seasonal fluctuations suggested by the time series plot at the beginning of this
paper are thus manifested at lags 1, 6 and 12 of the partial correlogram.
From these observations, many candidate ARIMA models can be formed. To start with, and
following the principle of parsimony, ARIMA(1,0,0)(1,0,0), an AR(1) model with a seasonal AR(1)
component and non-zero mean, will be used as the preliminary model; it will be justified at the end of
this paper as the final model through the required diagnostic checks and, in turn, used for forecasting.
This is an autoregressive model of order p=1 with seasonal autoregressive order P=1. The equation of
the model is given below:
𝑦𝑡 = 𝑐 + 𝜙𝑦𝑡−1 + 𝛷𝑦𝑡−12 − 𝜙𝛷𝑦𝑡−13 + 𝑎𝑡
where 𝑦𝑡 is the amount of the national government expenditure at time t, 𝜙 is the non-seasonal
autoregressive coefficient (on 𝑦𝑡−1), 𝛷 is the seasonal autoregressive coefficient (on 𝑦𝑡−12), 𝑎𝑡 is the
error term, and 𝑐 is the intercept; the cross term −𝜙𝛷𝑦𝑡−13 arises from multiplying the non-seasonal
factor (1 − 𝜙B) with the seasonal factor (1 − 𝛷B¹²).
B. Coefficients of the model
Using the R software, the coefficients for AR(1)(1) model can be generated with the given code
below:

>PHmod <- arima(PHts, order = c(1,0,0), seasonal = c(1,0,0), include.mean = TRUE)
>coeftest(PHmod)

From the result above, at the 95% confidence level there is sufficient evidence to reject the null
hypothesis that the coefficients of the AR(1)(1) model are equal to zero; the coefficients are therefore
significant.
C. Diagnostic Checking
The following checks are applied to the AR(1)(1) model of this study, with their results.
a. Stationarity – It was shown in the Data Characteristics section that the time series
used for the model is stationary.
b. Test for autocorrelation using the Box-Ljung test – This test checks whether the residuals of
the model are autocorrelated; the null hypothesis is that there is no autocorrelation
among the residuals. Using the R code below:
>Box.test(PHmod$residuals, type = "Ljung-Box")

Given the result above, at the 5% significance level we fail to reject the null hypothesis,
so there is no evidence that the residuals of the model are autocorrelated.
c. Test for normality using the Jarque-Bera test – The residuals of the model must be normally
distributed; the null hypothesis is that the residuals follow a normal distribution. Using the R
code below, the result of the test for normality of the residuals is generated:
>jarque.bera.test(PHmod$residuals)

Given the result above, at the 95% confidence level we fail to reject the null hypothesis,
so the residuals are consistent with a normal distribution.

d. Test for homoscedasticity using the ARCH test – The residuals of the model must not be
heteroscedastic; the null hypothesis is that the residuals are homoscedastic. Using the R
code below (ArchTest requires the FinTS package), the result of the test is generated:
>ArchTest(PHmod$residuals)

Given the result above, at the 95% confidence level we fail to reject the null hypothesis
that the residuals of the model are homoscedastic.
e. ACF / correlogram – The autocorrelations of the residuals must stay within Bartlett's
significance bounds at all lags. Using the R code below, the correlogram is generated:
>acf(PHmod$residuals)

Figure 5 Correlogram of the residuals of the model AR(1)(1)

From the result reflected in Figure 5, the residual autocorrelation at each lag stays within
Bartlett's bounds.
f. PACF / partial correlogram – The partial autocorrelations of the residuals must stay within
Bartlett's significance bounds at all lags. Using the R code below, the partial correlogram is
generated:
>pacf(PHmod$residuals)

Figure 6 Partial Correlogram of the residuals of the model AR(1)(1)

From the result reflected in Figure 6, the residual partial autocorrelation at each lag stays
within Bartlett's bounds.
Given that the model satisfies the diagnostic checks above, it can be used for forecasting.
Substituting the significant coefficients into the model, the final fitted form becomes:
𝑦𝑡 − 173,190 = 0.34591 (𝑦𝑡−1 − 173,190) + 0.73985 (𝑦𝑡−12 − 173,190) − 0.25592 (𝑦𝑡−13 − 173,190) + 𝑎𝑡
where 173,190 is the estimated series mean (reported by R's arima as the "intercept") and
−0.25592 ≈ −(0.34591)(0.73985) is the seasonal cross-term coefficient. The model is interpreted
as follows: the current month's National Government expenditures, measured as a deviation
from the mean level of 173,190, are driven by the preceding month's deviation with a weight of
0.34591 and by the same month of the preceding year with a weight of 0.73985.
V. Forecasting using the In-sample data
To begin forecasting, the original time series is split into in-sample and out-of-sample data
points. For this study, the in-sample set consists of the observations from January 2012 to May 2016,
to retain at least 50 data points, while the out-of-sample set consists of the remaining observations
from June 2016 to October 2016. Using the R code below, forecasts of the amount of National
Government expenditures for the next 5 months and the next 12 months are generated:
>PHts_in <- window(PHts, frequency=12, start=c(2012,1), end=c(2016,5))
>PH <- arima(PHts_in, order=c(1,0,0), seasonal=c(1,0,0), include.mean=TRUE)

>plot(PH)

Figure 7 Time series plot of the AR(1)(1) model using the In-Sample data

>coeftest(PH)

>forecast.Arima(PH, h=5, level=c(0.95))

>plot.forecast(forecast.Arima(PH, h=5, level=c(0.95)), plot.conf=TRUE)

Figure 8 Forecasts from AR(1)(1) model using the In-Sample data points for succeeding 5 months
(June 2016 to October 2016)

>forecast.Arima(PH, h=12, level=c(0.95))

>plot.forecast(forecast.Arima(PH, h=12, level=c(0.95)), plot.conf=TRUE)

Figure 9 Forecasts from AR(1)(1) model using the In-Sample data points for succeeding 12 months
(June 2016 to May 2017)
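Since the actuals for June to October 2016 were held out, the 5-month forecasts can also be scored against them; a sketch using accuracy from the forecast package, with the PHts and PH objects defined above:

```r
# Compare the 5-month forecasts with the held-out actuals
# (June 2016 to October 2016).
PHts_out <- window(PHts, start = c(2016, 6), end = c(2016, 10))
fc <- forecast.Arima(PH, h = 5, level = c(0.95))
accuracy(fc, PHts_out)  # training- and test-set error measures (RMSE, MAE, MAPE)
```

Test-set errors far exceeding the training-set errors would flag the kind of disparity discussed in the Introduction.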

VI. Summary and conclusion


The time series of National Government expenditures was found to have seasonality every half
year. Moreover, the current month's expenditures are explained by the previous month's expenditures
and by the same month of the previous year. This is reasonable since the payment due dates of the
expenditure components, namely the allotments to LGUs, interest payments, tax expenditures,
subsidies, equities, net lending and others, are largely fixed and pre-determined, following a schedule
on which they are realized in the books.
VII. Appendix
A. R Code:
PHExp <- read.table(choose.files(), header=TRUE, sep=",")
PHts <- ts(PHExp$NatlGovtExp, frequency=12, start=c(2012,1))
plot.ts(PHts)
boxplot(PHts, main = "Boxplot of PHts")
#determines the number of differences to be stationary
ndiffs(PHts, alpha=0.05, test="adf", max.d=3)

#formal test to determine if the time series is stationary


adf.test(PHts, alternative="stationary")

#Correlogram and Partial Correlogram

Acf(PHts, lag.max = 20) #requires forecast package
Pacf(PHts, lag.max = 20) #requires forecast package

#Test if the time series data has autocorrelation


Box.test(PHts, type="Ljung-Box")

#ARIMA model
PHmod <- Arima(PHts, order = c(1,0,0), seasonal = c(1,0,0), include.mean = TRUE)
#requires forecast package
coeftest(PHmod)

#Diagnostic Checking
jarque.bera.test(PHmod$residuals)
Box.test(PHmod$residuals, type="Ljung-Box")
ArchTest(PHmod$residuals) #requires FinTS package

#Residual analysis
plot(PHmod$residuals)
Acf(PHmod$residuals)
Pacf(PHmod$residuals)
ArchTest(PHmod$residuals)

# Suppose in sample until May 2016, out sample = June 2016 to October 2016
PHts_in <- window(PHts, frequency=12, start=c(2012,1), end=c(2016,5))
PHts_in
ndiffs(PHts_in, alpha=0.05, test="adf", max.d=3)
adf.test(PHts_in, alternative="stationary")
Acf(PHts_in)
Pacf(PHts_in)

PH <- Arima(PHts_in, order=c(1,0,0), seasonal=c(1,0,0), include.mean=TRUE)


accuracy(PH)
coeftest(PH)
boxplot(PH)

summary(PH)
plot(PH)

#Diagnostic checking
plot(PH$residuals)
Acf(PH$residuals)
Pacf(PH$residuals)
jarque.bera.test(PH$residuals)
ArchTest(PH$residuals, lag=12) #requires FinTS package
Box.test(PH$residuals)

#Forecast
#5 months worth of forecast
forecast.Arima(PH, h=5, level=c(0.95))
plot.forecast(forecast.Arima(PH, h=5, level=c(0.95)), plot.conf=TRUE)
#1 year worth of forecast
forecast.Arima(PH, h=12, level=c(0.95))
plot.forecast(forecast.Arima(PH, h=12, level=c(0.95)), plot.conf=TRUE)
VIII. References
SDAD-RS. (2016, November 29). Statistical Data | Bureau of the Treasury. Retrieved December 2016,
from Bureau of the Treasury: http://www.treasury.gov.ph/?page_id=746
Coghlan, A. (2016, July). A Little Book of R for Time Series. Retrieved December 2016, from Read the
Docs: https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
