
Building ARIMA Models

The simple ARIMA model


AutoRegressive Integrated Moving Average (ARIMA) models describe the current
behavior of a variable in terms of linear relationships with its past values.
These models are also called Box-Jenkins (1984) models, after these authors'
pioneering work on time-series forecasting techniques. An ARIMA model can be
decomposed into two parts. First, it has an Integrated (I) component (d), which
represents the amount of differencing to be performed on the series to make it
stationary. The second component of an ARIMA model consists of an ARMA model for
the series rendered stationary through differencing. The ARMA component is further
decomposed into AR and MA components. The autoregressive (AR) component captures
the correlation between the current value of the time series and some of its past
values. For example, AR(1) means that the current observation is correlated with
its immediate past value at time t-1. The Moving Average (MA) component represents
the duration of the influence of a random (unexplained) shock. For example, MA(1)
means that the value of the series at time t depends on the current shock and on
the shock at time t-1. The Autocorrelation Function (ACF) and Partial
Autocorrelation Function (PACF) are used to estimate the values of p and q, using
the rules reported in Table 1. In the next section, we provide an example of a
simple ARIMA model.


ARIMA models have three parts, although not all parts are always necessary. The
three parts are the autoregression part (AR), the integration part (I) and the moving
average part (MA).
The main assumption of the AR part of a time series is that the observed value
depends on a linear combination of previous observed values up to a defined
maximum lag (denoted p), plus a random error term ε_t. The main assumption of the
MA part is that the observed value is a random error term plus a linear
combination of previous random error terms up to a defined maximum lag (denoted q).
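The two assumptions can be written out as equations using the symbols above: lag orders p and q, and error term ε_t. The coefficient names c, μ, φ_i and θ_j are labels introduced here for illustration. This is the standard textbook form, offered as a sketch:

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \\
\text{ARMA}(p,q):\quad & y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
\end{aligned}
```

An ARIMA(p, d, q) model applies the ARMA(p, q) equation to the series after it has been differenced d times.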
To analyse a time series we require that all of the observations are independently
identifiable. Hence there should be no autocorrelation in the series and the series
should have zero mean. In order for these requirements to be met all of the signal
(the trend and seasonal components of the series being modelled) must have been
removed from the series so that we are left with only noise. Therefore it is only the
irregular component of the series which is being modelled, not the trend or
seasonal components. If the mean and other moments such as the variance and
autocovariances do not depend on the passage of time, the series is said to be
stationary. To achieve stationarity the series must be differenced (unless it is
stationary to begin with). This means taking the differences between successive
observations and then analysing these differences instead of the actual
observations. Because the original series is recovered by summing (integrating)
the differences, this step gives the model its "integrated" part, and the order of
differencing is denoted d.
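Differencing and its inverse are simple enough to sketch directly. The minimal Python example below is illustrative only: the helper names `difference` and `undifference` are hypothetical, and the toy series is made up. It applies first differencing d times. It also shows that cumulative summation ("integration") recovers the original levels.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces y_t by y_t - y_{t-1}.

    Every pass shortens the series by one observation.
    """
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def undifference(diffs, first_value):
    """Invert one order of differencing by cumulative summation ("integration")."""
    levels = [first_value]
    for change in diffs:
        levels.append(levels[-1] + change)
    return levels


if __name__ == "__main__":
    y = [3, 5, 9, 15, 23]                       # toy series with a quadratic trend
    print(difference(y, d=1))                   # [2, 4, 6, 8]
    print(difference(y, d=2))                   # [2, 2, 2]
    print(undifference(difference(y), y[0]))    # [3, 5, 9, 15, 23]
```

In the toy series, one difference still leaves a trend, and a second difference makes it constant. This mirrors choosing d as the smallest order that gives a stationary series.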

Definition of 'Autocorrelation'

A mathematical representation of the degree of similarity between a given time
series and a lagged version of itself over successive time intervals. It is the
same as calculating the correlation between two different time series, except that
the same time series is used twice: once in its original form and once lagged one
or more time periods.

The term can also be referred to as "lagged correlation" or "serial correlation".

Investopedia explains 'Autocorrelation'

When computed, the resulting number can range
from +1 to -1. An autocorrelation of +1 represents
perfect positive correlation (i.e. an increase seen in
one time series will lead to a proportionate increase in
the other time series), while a value of -1 represents
perfect negative correlation (i.e. an increase seen in
one time series results in a proportionate decrease in
the other time series).
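The lagged correlation described above can be computed in a few lines. The sketch below uses the common sample estimator (lag-k autocovariance divided by the variance, both around the full-sample mean). EViews' correlogram reports autocorrelations of this kind, but the function name `acf` and the toy inputs here are illustrative assumptions.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a sequence of numbers.

    r_k = sum_t (y_t - ybar)(y_{t+k} - ybar) / sum_t (y_t - ybar)^2,
    i.e. the lag-k autocovariance divided by the variance, both taken
    around the full-sample mean. The values always lie between -1 and +1.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    print(acf([1, 2, 3, 4, 5], 1))    # [0.4]: a trend gives positive autocorrelation
    print(acf([1, -1, 1, -1], 2))     # [-0.75, 0.5]: alternating signs
```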

Definition of 'Multicollinearity'

In statistics, multicollinearity is the situation in which several independent
variables in a multiple regression model are closely correlated with one another.
Multicollinearity can cause strange results when attempting to study how well
individual independent variables contribute to an understanding of the dependent
variable. In general, multicollinearity can cause wide confidence intervals and
strange p-values for independent variables.

Definition of 'Heteroskedasticity'

In statistics, heteroskedasticity occurs when the standard deviations of a
variable, monitored over a specific amount of time, are non-constant.
Heteroskedasticity often arises in two forms, conditional and unconditional.
Conditional heteroskedasticity identifies non-constant volatility when future
periods of high and low volatility cannot be identified. Unconditional
heteroskedasticity is used when future periods of high and low volatility can be
identified.

With ARIMA, we do not assume a forecasting model a priori, before model
identification takes place. Instead, the ARIMA procedure helps us choose the model
that best fits the time series. The procedure can be summarized in a flow chart:

Demonstration of finding the "right" ARIMA(p, d, q) model to fit the time series
through trial and error:
First, download the Excel file called "exchange_rate" from the "Sample Data"
section of the Econ3600 homepage.
Second, open the EViews program as follows: click the "File", "New", "Workfile"
commands, then in the "Workfile Range" dialogue box choose "Monthly" and type
"1990.01" for "Start observation" and "2000.07" for "End observation". This
creates a workfile. Next, import the data from the Excel file to generate the
following result (remember to enter "B8" as the upper-left data cell):

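The next sketch is a quick sanity check on the workfile range. It uses a hypothetical helper, not an EViews feature, to count the monthly observations from 1990.01 to 2000.07. The count should be 127, so the imported Excel column is expected to hold 127 values. An AR(1) model then uses 126 of them, because the first observation is lost to the lag.

```python
def monthly_labels(start_year, start_month, end_year, end_month):
    """List monthly observation labels in the "YYYY.MM" style used for the workfile range."""
    labels = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year}.{month:02d}")
        month += 1
        if month > 12:                 # roll over into January of the next year
            year, month = year + 1, 1
    return labels


obs = monthly_labels(1990, 1, 2000, 7)
print(len(obs), obs[0], obs[-1])       # 127 1990.01 2000.07
```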
Double-click the variable "yen" to check whether its data are consistent with the
Excel file, and choose "View", "Line" to get a general idea of whether the time
series is stationary or not. Also, choose "View", "Correlogram" to tentatively
identify patterns and model components (i.e. the orders p, d, q of the ARIMA
model). The resulting graphs are:


From the above graphs, you can see that the time series is likely to follow a
random walk pattern, wandering up and down in the line graph. Also, in the
correlogram, the ACFs decline linearly and there is only one significant spike in
the PACFs. The correlogram suggests that ARIMA(1, 0, 0) may be an appropriate
model. Then, we take the first difference of "yen" to see whether the time series
becomes stationary before further identifying AR(p) and MA(q). (Remember that I(d)
is used to obtain a stationary series if necessary.)
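The correlogram reading above can be reproduced outside EViews with a short sketch. It computes sample ACFs and PACFs, the latter via the Durbin-Levinson recursion. It then flags PACFs outside the approximate 95% band ±2/√n, which is similar to the dashed bounds drawn in the correlogram. The function names are hypothetical, and the random walk is simulated stand-in data, not the yen series. For such a series one would typically expect slowly declining ACFs and a single large PACF spike at lag 1, though a particular sample need not match exactly.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag around the full-sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)            # r[i] holds the lag-(i+1) autocorrelation
    partials, phi_prev = [], []         # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        partials.append(phi_kk)
        phi_prev = phi
    return partials


def correlogram(series, max_lag):
    """Return the band 2/sqrt(n) and rows (lag, ACF, PACF, PACF outside band?)."""
    band = 2 / math.sqrt(len(series))
    rows = [(k, a, p, abs(p) > band)
            for k, (a, p) in enumerate(zip(acf(series, max_lag), pacf(series, max_lag)), start=1)]
    return band, rows


if __name__ == "__main__":
    random.seed(1)
    walk = [0.0]                        # hypothetical random walk, a stand-in for "yen"
    for _ in range(126):
        walk.append(walk[-1] + random.gauss(0, 1))
    band, rows = correlogram(walk, 6)
    print(f"approximate 95% band: +/-{band:.3f}")
    for k, a, p, outside in rows:
        print(f"lag {k}: ACF {a:6.3f}  PACF {p:6.3f}{'  *' if outside else ''}")
```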
To see whether first differencing yields a level-stationary time series, generate
it by choosing "GENR" and typing "dyen=d(yen)". You will then get a "dyen" item in
the workfile; use it to draw a line graph and a correlogram. The results are:



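The unit root test can also be sketched in a few lines. This is a minimal illustration, not EViews' implementation. It computes the simple (non-augmented) Dickey-Fuller t-statistic from the regression Δy_t = α + γ·y_{t-1} + e_t. It then compares that statistic with the approximate large-sample 5% critical value of -2.86 for a test with a constant. As far as we know, EViews' unit root test defaults to the augmented version with lagged differences, so its figures will differ. The function name is hypothetical, and the random walk is simulated stand-in data.

```python
import math
import random


def dickey_fuller_tau(series):
    """Simple Dickey-Fuller t-statistic with a constant (no lagged differences).

    Regresses dy_t = alpha + gamma * y_{t-1} + e_t by OLS and returns gamma / se(gamma).
    A unit root corresponds to gamma = 0; strongly negative values reject it.
    """
    x = series[:-1]                                   # lagged level y_{t-1}
    z = [b - a for a, b in zip(series, series[1:])]   # first difference dy_t
    m = len(z)
    x_bar, z_bar = sum(x) / m, sum(z) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = sxz / sxx
    alpha = z_bar - gamma * x_bar
    ssr = sum((zi - alpha - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se_gamma = math.sqrt(ssr / (m - 2) / sxx)
    return gamma / se_gamma


# Approximate large-sample 5% critical value for the test with a constant.
CRITICAL_5PCT = -2.86

if __name__ == "__main__":
    random.seed(42)
    level = [0.0]
    for _ in range(126):                    # hypothetical random walk, not the yen data
        level.append(level[-1] + random.gauss(0, 1))
    diff = [b - a for a, b in zip(level, level[1:])]
    for name, s in (("level", level), ("first difference", diff)):
        tau = dickey_fuller_tau(s)
        verdict = "reject unit root" if tau < CRITICAL_5PCT else "cannot reject unit root"
        print(f"{name}: tau = {tau:.2f} -> {verdict}")
```

Note that ordinary t-tables do not apply here. Under a unit root the statistic follows the Dickey-Fuller distribution, which is why the comparison uses -2.86 rather than -1.96.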
Now, the first-difference series "DYEN" is stationary, as the line graph shows,
and it is white noise, as the correlogram shows no significant patterns. The unit
root test also confirms that the first difference is stationary. This strong
evidence supports ARIMA(0,1,0) as suitable for the time series. We can then
construct the ARIMA model in the following steps:
Step 1. Choose "Quick", "Estimate Equation", then specify the model by typing
"yen c ar(1)" and click "OK". The result is:

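To make Step 1 concrete, the sketch below fits the same AR(1)-with-constant idea by ordinary least squares: y_t = α + φ·y_{t-1} + e_t. As far as we understand EViews' AR specification, "yen c ar(1)" is written as yen_t = c + u_t with u_t = φ·u_{t-1} + ε_t. In that form its C corresponds to the implied mean α / (1 - φ) below rather than to α. Depending on version and options, EViews may estimate by nonlinear least squares or maximum likelihood, so its figures can differ slightly from this OLS sketch. The function name and toy data are hypothetical.

```python
import math


def fit_ar1(series):
    """OLS fit of y_t = alpha + phi * y_{t-1} + e_t.

    Returns (alpha, phi, implied_mean, see), where implied_mean = alpha / (1 - phi)
    is the level the series reverts to and see is the standard error of the
    regression with k = 2 estimated parameters.
    """
    x, y = series[:-1], series[1:]
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx
    alpha = y_bar - phi * x_bar
    ssr = sum((yi - alpha - phi * xi) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(ssr / (n - 2))
    implied_mean = alpha / (1 - phi) if phi != 1 else float("nan")
    return alpha, phi, implied_mean, see


if __name__ == "__main__":
    # Toy series generated exactly by y_t = 2 + 0.5 * y_{t-1}, starting from 0.
    toy = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
    print(fit_ar1(toy))   # alpha close to 2, phi close to 0.5, mean close to 4, SEE close to 0
```

An estimated φ close to 1 means the implied mean is poorly determined. That is the situation in which the AR(1) model behaves like a random walk.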
Step 2. Choose "View", "Residual Tests", "Correlogram-Q-Statistic"; the result
is:

(Since there are no significant spikes in the ACFs and PACFs, the residuals of the
selected ARIMA model are white noise, so no significant pattern is left in the
time series. We can stop here and do not need to consider further AR(p) or MA(q)
terms.)
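The Q-statistic in Step 2 summarizes the residual autocorrelations in a single number. The sketch below computes the cumulative Ljung-Box statistic Q(m) = n(n+2) Σ r_k² / (n - k), which we take to be the statistic EViews reports in its Q-Stat column. A large Q relative to a chi-square critical value signals leftover autocorrelation. For residuals of an ARMA(p, q) model the degrees of freedom are usually reduced by p + q. The function name and the alternating toy residuals are illustrative assumptions.

```python
def ljung_box(residuals, max_lag):
    """Cumulative Ljung-Box Q-statistics Q(1) .. Q(max_lag).

    Q(m) = n (n + 2) * sum_{k=1}^{m} r_k^2 / (n - k), with r_k the sample
    autocorrelation of the residuals. A large Q means autocorrelation is left over.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(v * v for v in dev)
    running, q_stats = 0.0, []
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        running += r_k * r_k / (n - k)
        q_stats.append(n * (n + 2) * running)
    return q_stats


if __name__ == "__main__":
    alternating = [1.0, -1.0] * 50     # residuals with an obvious pattern left in them
    print(round(ljung_box(alternating, 1)[0], 2))   # 100.98
```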
The criteria for judging the best model are as follows:
- Relatively small BIC (Schwarz criterion, measured by n log(SEE) + k log(n),
  where n is the number of observations and k the number of estimated parameters)
- Relatively small SEE (standard error of the estimate)
- Relatively high adjusted R²
- Q-statistics and a correlogram showing no significant pattern left in the ACFs
  and PACFs of the residuals, which means the residuals of the selected model are
  white noise.
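The selection statistics can be computed from a model's residuals, as the next sketch shows. For the Schwarz criterion it uses the per-observation form −2·l/n + k·ln(n)/n, with the Gaussian log-likelihood l = −(n/2)(1 + ln 2π + ln(SSR/n)). As far as we can tell, EViews reports this form. The BIC column in the table below appears to be on this scale: an SEE of 4.075 with n = 126 and k = 2 gives about 5.708, matching the ARIMA(1, 0, 0) row. The function names are hypothetical.

```python
import math


def selection_criteria(residuals, y, k):
    """SEE, adjusted R-squared and per-observation Schwarz criterion for a fitted model.

    residuals: model residuals; y: the dependent variable over the same sample;
    k: number of estimated parameters (including the constant).
    """
    n = len(residuals)
    ssr = sum(e * e for e in residuals)
    y_bar = sum(y) / n
    sst = sum((v - y_bar) ** 2 for v in y)
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    see = math.sqrt(ssr / (n - k))
    loglik = -n / 2 * (1 + math.log(2 * math.pi) + math.log(ssr / n))
    schwarz = -2 * loglik / n + k * math.log(n) / n
    return {"SEE": see, "adj_R2": adj_r2, "Schwarz": schwarz}


def schwarz_from_see(see, n, k):
    """Same Schwarz criterion, recovered from a reported SEE (SSR = SEE^2 * (n - k))."""
    ssr = see ** 2 * (n - k)
    return 1 + math.log(2 * math.pi) + math.log(ssr / n) + k * math.log(n) / n


if __name__ == "__main__":
    print(round(schwarz_from_see(4.075, 126, 2), 3))   # 5.708
```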
You may try other ARIMA models and compare the statistical results, as in the
following table:
ARIMA model    BIC      Adjusted R²    SEE
(1, 0, 0)      5.708    0.93476        4.075
(1, 0, 1)      5.734    0.93503        4.067
(2, 0, 0)      5.725    0.93425        4.047
(0, 0, 1)      7.384    0.65598        9.422
(0, 0, 2)      6.888    0.79754        7.220
(1, 1, 0)      5.724    0.0019         4.108
(0, 1, 0)      5.708    0.9347         4.075
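As you can see, ARIMA(1,0,0) is relatively the best model: it has the smallest BIC,
shared with ARIMA(0,1,0).
Remark: ARIMA(1, 0, 0) becomes ARIMA(0, 1, 0) when its AR(1) coefficient equals
one, so the two give nearly the same results here (note the identical BIC and SEE
in the table). The result of ARIMA(0,1,0) is: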

In our trial-and-error procedure, ARIMA(1,0,0), or equivalently ARIMA(0,1,0), is
selected as the best model.
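The trial-and-error comparison can be made mechanical by ranking the candidate models. The sketch below copies the BIC, adjusted R² and SEE values from the comparison table. It ranks first by smallest BIC, then by smallest SEE, then by largest adjusted R². That tie-breaking order is our own assumption, since the text only lists the criteria.

```python
# (BIC, adjusted R-squared, SEE) for each ARIMA(p, d, q), copied from the table.
results = {
    (1, 0, 0): (5.708, 0.93476, 4.075),
    (1, 0, 1): (5.734, 0.93503, 4.067),
    (2, 0, 0): (5.725, 0.93425, 4.047),
    (0, 0, 1): (7.384, 0.65598, 9.422),
    (0, 0, 2): (6.888, 0.79754, 7.220),
    (1, 1, 0): (5.724, 0.0019, 4.108),
    (0, 1, 0): (5.708, 0.9347, 4.075),
}

# Rank by smallest BIC, then smallest SEE, then largest adjusted R-squared.
ranked = sorted(results.items(), key=lambda item: (item[1][0], item[1][2], -item[1][1]))

for order, (bic, adj_r2, see) in ranked:
    print(f"ARIMA{order}: BIC={bic:.3f}  adj.R2={adj_r2:.4f}  SEE={see:.3f}")
```

With these numbers, ARIMA(1, 0, 0) and ARIMA(0, 1, 0) come out on top. This agrees with the selection above.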
Now, we can express this selected best model as

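The estimated coefficients come from the EViews output and are not reproduced in this text. In general form, c and φ₁ stand for those estimates and ε_t for the white-noise residual. Setting φ₁ = 1 in the first line gives the second, a random walk with drift c (or without drift when c = 0):

```latex
\begin{aligned}
\text{ARIMA}(1,0,0):\quad & \text{yen}_t = c + \phi_1\,\text{yen}_{t-1} + \varepsilon_t \\
\text{ARIMA}(0,1,0):\quad & \Delta\text{yen}_t = \text{yen}_t - \text{yen}_{t-1} = c + \varepsilon_t
\end{aligned}
```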
Students are encouraged to try to find the best ARIMA model for the "pound"
series.
Now, let's try another, more complicated time series.
