Time Series Forecasting
A set of observations of a variable taken at regular intervals of time constitutes a Time Series.
Daily temperature, monthly rainfall and daily NASDAQ index values are some examples of time series.
The most widely used method for forecasting a time series is the class of Box-Jenkins ARIMA models.
Some definitions
Auto Correlation (AC)
The autocorrelation at lag k is the correlation coefficient between the original series and the same series lagged by k terms. A plot of these values at various lags is called the Auto Correlation Function (ACF).
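As an illustration, the sample autocorrelation can be computed directly with NumPy. This is a sketch using one common estimator (the lag-k covariance divided by the overall variance); textbook conventions differ slightly, and the function name is our own.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag.

    Uses the common estimator: the lag-k covariance divided by
    the variance of the whole series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()                      # remove the mean first
    denom = np.sum(xm ** 2)
    return np.array([np.sum(xm[k:] * xm[:n - k]) / denom
                     for k in range(max_lag + 1)])

# At lag 0 a series is perfectly correlated with itself, so the
# first value is always 1.0; here lag 1 gives 0.4 and lag 2 gives -0.1.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```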
Partial Auto Correlation (PAC)
The partial autocorrelation at lag k is the correlation coefficient between the original series and the series lagged by k terms, after the effects of the intervening lags (1 through k−1) have been removed. A plot of these values at various lags is called the Partial Auto Correlation Function (PACF).
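One standard way to obtain the PACF is from the sample ACF via the Durbin-Levinson recursion; a sketch follows (the recursion is standard, the helper name is our own).

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag, obtained
    from the sample ACF with the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()
    denom = np.sum(xm ** 2)
    rho = np.array([np.sum(xm[k:] * xm[:n - k]) / denom
                    for k in range(max_lag + 1)])
    # phi[k, j] is the j-th coefficient of the best linear predictor
    # of order k; phi[k, k] is the lag-k partial autocorrelation.
    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = rho[1]
    for k in range(2, max_lag + 1):
        num = rho[k] - np.dot(phi[k - 1, 1:k], rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], rho[1:k])
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
    return np.diag(phi)[1:]

# The lag-1 partial autocorrelation always equals the lag-1
# autocorrelation, since there are no intervening lags to remove.
pacf = sample_pacf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```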
Stationarity
In statistics, a stationary process (or strict(ly) stationary process or strong(ly) stationary process) is a stochastic process whose joint probability distribution does not change when shifted in time. Consequently, parameters such as the mean and variance also do not change over time and do not follow any trends.
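As a rough illustration (not a formal test such as the augmented Dickey-Fuller test), one can compare summary statistics across the two halves of a series; for a stationary series the halves should look alike. The helper name below is our own.

```python
import numpy as np

def halves_summary(x):
    """Compare mean and variance of the two halves of a series.
    For a stationary series the halves should look similar; large
    differences suggest a trend or a changing variance."""
    x = np.asarray(x, dtype=float)
    h = len(x) // 2
    first, second = x[:h], x[h:2 * h]
    return (first.mean(), second.mean()), (first.var(), second.var())

# A trending series is not stationary: the half-means differ markedly.
(m1, m2), _ = halves_summary(np.arange(100.0))
# m2 - m1 is 50.0 for this linear trend
```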
Seasonality
Patterns that repeat over known, fixed periods of time within a time series are known as seasonality.
ARIMA models
ARIMA models are a class of models used very often to forecast time series values. The approach was proposed by George Box and Gwilym Jenkins, and these models are therefore also referred to as the Box-Jenkins method.
The overall procedure for forecasting a time series consists of the following 4 steps.
Step 1: Identification
Step 2: Estimation (and selection)
Step 3: Diagnostic checking
Step 4: Model use
Auto Regressive model (AR model)
An autoregressive model specifies that the output variable depends linearly on its own previous values.
If the output variable depends on the past p values of itself, we can write the AR(p) model as

Xt = c + φ1 Xt−1 + φ2 Xt−2 + … + φp Xt−p + εt

Here, c is a constant, φ1, …, φp are parameters to be determined by linear regression and εt is white noise.
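The AR(p) recursion can be simulated directly; a minimal sketch with Gaussian white-noise shocks (the function name and the zero start-up convention are our own choices):

```python
import numpy as np

def simulate_ar(phi, c, n, sigma=1.0, seed=0):
    """Simulate X_t = c + phi_1*X_{t-1} + ... + phi_p*X_{t-p} + eps_t
    with Gaussian white noise eps_t and zero start-up values."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    x = np.zeros(n + p)                  # first p entries are start-up zeros
    eps = rng.normal(0.0, sigma, n + p)
    for t in range(p, n + p):
        # x[t-p:t][::-1] lists the lag-1..lag-p values in that order
        x[t] = c + np.dot(phi, x[t - p:t][::-1]) + eps[t]
    return x[p:]

# For a stationary AR(1) with phi_1 = 0.5 and c = 1, the long-run
# mean is c / (1 - phi_1) = 2; a long sample should average near it.
series = simulate_ar([0.5], c=1.0, n=5000)
```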
Moving Averages model (MA model)
Another way to model a time series is to consider the past error terms and the mean of the series. If the output depends on the past q error terms, we can write the MA(q) model as

Xt = μ + εt + θ1 εt−1 + … + θq εt−q

Here, μ is the mean of the series, θ1, …, θq are the parameters of the model and εt, εt−1, …, εt−q are white noise error terms.
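The MA(q) equation can likewise be simulated; a minimal sketch assuming Gaussian shocks (function name ours):

```python
import numpy as np

def simulate_ma(theta, mu, n, sigma=1.0, seed=0):
    """Simulate X_t = mu + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}
    with Gaussian white-noise shocks eps_t."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    eps = rng.normal(0.0, sigma, n + q)  # q extra shocks for start-up
    x = np.empty(n)
    for t in range(n):
        # eps[t + q] is the current shock; eps[t:t + q][::-1] are the
        # lag-1..lag-q shocks in that order.
        x[t] = mu + eps[t + q] + np.dot(theta, eps[t:t + q][::-1])
    return x

# The mean of an MA(q) process is mu itself, so a long sample
# should average close to mu = 5.
series = simulate_ma([0.5], mu=5.0, n=5000)
```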
Thus, a moving-average model is conceptually a linear regression of the current value of the series against the current and previous (unobserved) white noise error terms, or random shocks. The random shocks at each point are assumed to be mutually independent and to come from the same distribution, typically a normal distribution with mean zero and constant variance.
ARMA (p,q) model
A combination of the AR and MA models results in the right-hand side having both the AR terms and the MA terms:

Xt = c + φ1 Xt−1 + … + φp Xt−p + εt + θ1 εt−1 + … + θq εt−q

The underlying assumption in ARMA models is that the series is stationary in the mean and variance. If the original series is non-stationary, we use differencing to make it stationary, and then proceed to decide the values of p and q to fit an ARMA model to the stationary series. The order of differencing will generally not be more than 2.
The following figure illustrates the effect of differencing.
The first graph clearly indicates that the series is not stationary. After the first differencing, the series is modified but is still not stationary. After the second differencing, we see that the series has become stationary.
If d is the level of differencing used, then the model is described as ARIMA (p,d,q). In the above example, d=2.
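The effect of repeated differencing can be sketched with `np.diff`: one difference removes a linear trend, and two differences remove a quadratic one, mirroring the d=2 example above.

```python
import numpy as np

t = np.arange(10, dtype=float)
quadratic = 3.0 + 2.0 * t + t ** 2   # non-stationary: quadratic trend

d1 = np.diff(quadratic)              # still trending (linear in t)
d2 = np.diff(quadratic, n=2)         # second difference: trend removed

# d1 keeps increasing, so one difference is not enough; d2 is a
# constant sequence (every entry is 2.0), i.e. the trend is gone.
```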
Seasonality
A stochastic process is said to be a seasonal (or periodic) time series with periodicity s if Zt and Zt+ks have the same distribution for all integers k.
In other words, if a plot of the series shows a pattern repeating at regular intervals, we can conclude that the series has seasonality.
Seasonal differencing will generally remove seasonality.
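A sketch of seasonal differencing, zt = xt − xt−s, applied to a purely periodic signal with period s = 12 (the choice of signal is illustrative):

```python
import numpy as np

s = 12
t = np.arange(60)
seasonal = np.sin(2 * np.pi * t / s)   # repeats every s = 12 points

# Seasonal differencing subtracts the observation s steps back.
deseasonalised = seasonal[s:] - seasonal[:-s]

# For a purely periodic signal the result is (numerically) zero:
# the seasonality has been removed entirely.
```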
The following plot shows a series which displays seasonality.
Once the seasonality is removed, if non-stationarity (a trend) remains, we need to apply ordinary differencing to make the series stationary before proceeding with further analysis.
The following plot shows a series with seasonality along with a trend (indicating a non-stationary series).
The chart below shows the steps followed for defining a model and validating it.
Estimating p and q
Once stationarity and seasonality have been addressed, the next step is to identify the order (i.e. the p and q) of the autoregressive and moving average terms.
This can be done using the ACF and PACF plots.
The partial autocorrelation of an AR(p) process is zero at lag p + 1 and greater, so the appropriate maximum lag is the one beyond which the partial autocorrelations are all zero.
The autocorrelation function of an MA(q) process becomes zero at lag q + 1 and greater, so we determine the appropriate maximum lag for the estimation by examining the sample autocorrelation function to see where it becomes insignificantly different from zero for all lags beyond a certain lag, which is designated as the maximum lag q.
The rules for determining the values of p and q are summarized below.
            ACF                                        PACF
AR(p)       Damped sinusoidal / exponential decay      Zero at lag p+1 and greater
MA(q)       Zero at lag q+1 and greater                Damped sinusoidal / exponential decay

ACF behaviour                                 Indicated model
ACF decays exponentially to zero              Autoregressive model (use the PACF plot to identify the order p)
ACF has one or more spikes, rest are zero     Moving average model (order q identified by where the ACF becomes zero)
Exponential decay starting after a few lags   Mixed autoregressive and moving average (ARMA) model
No significant autocorrelations               White noise
No decay to zero, or very slow decay          Non-stationarity: make the series stationary
High values at fixed intervals                Seasonality: use seasonal differencing
Estimating parameters
While the pure AR model parameters can be estimated by the least squares method, the MA parameters require an iterative search. The most commonly used method in practice is the maximum likelihood method.
Selection of the model
Generally, this is a trial and error procedure and the skill is developed by experience. The usual approach is to try out several models suggested by the ACF and the PACF and choose the one which minimizes the residual variance.
If the model is a very good fit, the residuals will be pure white noise. That means the ACF of the residuals will not have any significant values: they will all be close to zero, which can be tested by comparing them with 1.96/sqrt(n), where n is the length of the data.
So, among the several candidate models, choose the one which gives the smallest residuals and whose residual autocorrelations are not significantly different from zero.
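A crude sketch of this residual check: compare every residual autocorrelation with the ±1.96/√n band. A poor fit, illustrated here by treating a random walk as if it were residuals, fails immediately. The function name is ours.

```python
import numpy as np

def residuals_look_white(resid, max_lag=20):
    """True if all residual autocorrelations up to max_lag lie inside
    the approximate 95% band +/- 1.96/sqrt(n)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    rm = resid - resid.mean()
    denom = np.sum(rm ** 2)
    acf = np.array([np.sum(rm[k:] * rm[:n - k]) / denom
                    for k in range(1, max_lag + 1)])
    return bool(np.all(np.abs(acf) <= 1.96 / np.sqrt(n)))

# "Residuals" with strong structure left in them (a random walk) are
# clearly not white noise, so the check fails.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=1000))
print(residuals_look_white(walk))  # False
```

Because roughly 5% of lags exceed the band by chance even for truly white residuals, a single marginal exceedance should not by itself condemn a model.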
There are other criteria which can be used to select the best model. The most popular one is the Akaike’s Information Criterion (AIC).
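For a model with k estimated parameters and Gaussian residuals, the AIC can be written, up to an additive constant, as n·ln(RSS/n) + 2k; lower values are better, and the 2k term penalises extra parameters. A minimal sketch (the function name is ours):

```python
import numpy as np

def aic_gaussian(resid, k):
    """AIC, up to an additive constant, for a model with k estimated
    parameters and Gaussian residuals:  n*ln(RSS/n) + 2k."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    rss = np.sum(resid ** 2)
    return n * np.log(rss / n) + 2 * k

# Two candidate fits with identical residuals: the one with fewer
# parameters gets the lower (better) AIC.
r = np.array([0.5, -0.3, 0.2, -0.1, 0.4])
print(aic_gaussian(r, k=2) < aic_gaussian(r, k=4))  # True
```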
References
1. Box, George; Jenkins, Gwilym (1970), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.
2. Makridakis, Wheelwright and Hyndman (2005), Forecasting: Methods and Applications, 3rd ed., John Wiley and Sons.
3. Abraham, Bovas and Ledolter, Johannes (2005), Statistical Methods for Forecasting, John Wiley and Sons.
Note: The graphs and figures are taken from various sites on the web.