repeatable experiments (Box and Jenkins, 1970; Robeson and Steyn, 1990; Schlink et al.,
1997). Air quality data constitute a good example of time series (Chock et al., 1975). The
Box-Jenkins approach has been thoroughly applied to the analysis of air quality data (Khare
and Sharma, 2002; Robeson and Steyn, 1990; Schlink et al., 1997; Chock et al., 1975). This
approach extracts all the trends and serial correlations in the data until only a sequence
of white noise (shocks) remains. The extraction is accomplished via the difference,
autoregressive and moving average operators; the difference operator is used to remove
trends, or non-stationarity, in the time series (Chock, 1975). The definitions of the various
statistical terms used in univariate time series analysis are given below.
Mean: It is one of the measures used to represent the central tendency of the air quality
data. In time series analysis, mean (μ) values are calculated for each segment of data to
check the constancy of the mean level:

\mu = \frac{1}{n} \sum_{t=1}^{n} z_t \qquad (F.1)

where z_t = air quality observation at time t, t = 1, ..., n.
Variance: It is another condition used to check the stationarity of the data in time series
analysis. The variance (σ²) of the series expresses the degree of variation around the
assumed constant mean level and as such gives a measure of uncertainty around this mean:

\sigma^2 = \frac{1}{n} \sum_{t=1}^{n} (z_t - \mu)^2 \qquad (F.2)
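As a quick illustration of Eqs. (F.1) and (F.2), the segment-wise mean and variance check described above can be sketched in Python. The series below is a synthetic placeholder, not real air quality data:

```python
import numpy as np

# Synthetic stand-in for an air quality series (illustrative only).
rng = np.random.default_rng(0)
z = rng.standard_normal(400)

# Split into four equal segments and apply Eq. (F.1) and Eq. (F.2) to each.
segments = z.reshape(4, 100)
seg_mean = segments.mean(axis=1)                                # Eq. (F.1)
seg_var = ((segments - seg_mean[:, None]) ** 2).mean(axis=1)    # Eq. (F.2), divisor n

# Roughly constant segment means and variances support the stationarity assumption.
```

Comparing the segment means and variances in this way is an informal check; the more formal diagnostics are described later in this appendix.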
Stationarity: A series which contains no systematic change in mean (no trend), no
systematic change in variance, and no periodic variations is called a stationary series.
The analysis of time series assumes stationarity, so a non-stationary series must first be
transformed, e.g.:
(b) Square root transformation: this can be employed when the variance of the series
changes with the mean level.
Trend: Any systematic change in the level of a time series. Box and Jenkins (1970)
advocated a method called 'differencing' for the removal of trends in the series. The method
of differencing a time series consists of subtracting the values of the observations from one
another in some defined time-dependent order. For example, a first (order) difference
transformation is defined as the difference between the values of two adjacent observations;
a second (order) difference is obtained by differencing the first differences.
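The differencing operations above can be sketched with NumPy. The quadratic series below is purely illustrative, chosen so that second differencing yields a constant:

```python
import numpy as np

t = np.arange(1, 7)
z = t.astype(float) ** 2    # series with a quadratic trend: 1, 4, 9, 16, 25, 36

d1 = np.diff(z)             # first difference z_t - z_{t-1}: 3, 5, 7, 9, 11
d2 = np.diff(z, n=2)        # second difference (differences of d1): 2, 2, 2, 2
```

A first difference removes a linear trend; differencing twice removes a quadratic trend, as the constant second differences show here.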
Autocorrelation function (ACF): This is an important tool for time series model
identification; it measures the correlation between observations k lags
apart. The autocorrelation function of a time series process at lag k is defined as:

\rho_k = \mathrm{ACF}(k) = \frac{\mathrm{cov}(z_t, z_{t+k})}{\mathrm{var}(z_t)} \qquad (F.3)
The set of values ρ_k, and the plot of ρ_k against k = 1, 2, ..., are known as the
autocorrelation function (correlogram).
Partial autocorrelation function (PACF): This is defined as the correlation between time
series terms k lags apart, after the correlation due to the intermediate terms has been
removed (Milionis and Davies, 1994a); i.e. the partial autocorrelations constitute a device
for summarizing all the information contained in the autocorrelation function (1983). The
lag k partial autocorrelation is the partial regression coefficient φ_kk in the kth-order
autoregressive representation. It also measures the additional correlation between z_t and
z_{t-k} after adjustments have been made for the intermediate variables z_{t-1}, z_{t-2},
..., z_{t-k+1}. The φ_kk is obtained from the Yule-Walker equations (Mills, 1991). In
general, it is difficult to know the population values of the ACF and PACF.
Consequently, the sample autocorrelation and partial autocorrelation functions are used for
model identification. Because these are only estimates, they are subject to sampling errors,
and as such will never match exactly the underlying true autocorrelations and partial
autocorrelations. Table F.1 shows the properties of the ACF and PACF for different B-J
seasonal and non-seasonal models that help in model identification.
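A minimal sketch of the sample ACF, and of the lag-k partial autocorrelation φ_kk obtained from the Yule-Walker equations as described above. The function names are illustrative, not from any particular package:

```python
import numpy as np

def sample_acf(z, nlags):
    """Sample autocorrelations r_1, ..., r_nlags (Eq. F.3 estimated from data)."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()
    n = len(z)
    denom = np.sum(z * z)
    return np.array([np.sum(z[: n - k] * z[k:]) / denom for k in range(1, nlags + 1)])

def pacf_yule_walker(z, nlags):
    """phi_kk for k = 1..nlags, from the Yule-Walker equations."""
    r = np.r_[1.0, sample_acf(z, nlags)]   # r[0] = 1, r[k] = r_k
    phi_kk = []
    for k in range(1, nlags + 1):
        # k-th order Yule-Walker system: R * phi = (r_1, ..., r_k)'
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1 : k + 1])
        phi_kk.append(phi[-1])             # last coefficient is phi_kk
    return np.array(phi_kk)
```

Plotting these values against k gives the correlogram and partial correlogram whose shapes are matched against Table F.1 during identification.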
Table F.1a Properties of the ACF and PACF for non-seasonal B-J models.

Model        ACF                                       PACF
AR(p)        Tails off. Exponential and/or sine        Cuts off after lag p (p spikes).
             wave decay; may contain damped
             oscillations.
MA(q)        Cuts off after lag q (q spikes).          Tails off. Dominated by a linear
                                                       combination of damped exponentials
                                                       and/or sine waves; may contain
                                                       damped oscillations.
ARMA(p,q)    Tails off after q-p lags.                 Tails off after p-q lags. Dominated
             Exponential and/or sine wave              by exponentials and/or sine waves
             decay after q-p lags.                     after p-q lags.

Table F.1b Properties of the ACF and PACF for seasonal B-J models.

Model                      ACF                           PACF
AR(p), Seasonal AR(P)      Tails off.                    Cuts off after lag p + sP.
MA(q), Seasonal MA(Q)      Cuts off after lag q + sQ.    Tails off.
The stationarity and invertibility conditions apply to the autoregressive and the moving
average parts of the model, respectively. These two conditions provide a diagnostic tool to
check the stationarity of the fitted model. For a stationary series, the autoregressive and
moving average parameters of the fitted model should be less than one in absolute value. If
the model fails to fulfil these two conditions, it should be re-specified.
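The parameter check described above can be sketched as follows. For first-order models the condition reduces to |φ| < 1 and |θ| < 1; the general condition, used here, is that the roots of the AR and MA characteristic polynomials lie outside the unit circle. The coefficient values are hypothetical, not from any real fit:

```python
import numpy as np

def roots_outside_unit_circle(params):
    """True if all roots of 1 - c1*B - c2*B^2 - ... lie outside the unit circle."""
    coeffs = np.r_[1.0, -np.asarray(params, dtype=float)]  # ascending powers of B
    roots = np.roots(coeffs[::-1])                         # np.roots wants descending
    return bool(np.all(np.abs(roots) > 1.0))

ar_params = [0.5, 0.3]   # hypothetical phi_1, phi_2 of a fitted ARMA(2,1)
ma_params = [0.4]        # hypothetical theta_1

stationary = roots_outside_unit_circle(ar_params)   # stationarity of the AR part
invertible = roots_outside_unit_circle(ma_params)   # invertibility of the MA part
```

If either check fails, the tentative model should be re-specified before proceeding to forecasting.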
White noise: White noise is a sequence of random shocks drawn from a fixed distribution,
usually assumed to have zero mean and constant variance.
Residual analysis: The residuals of the ARIMA model are defined as the differences
between the observed and the fitted values. If the model adequately depicts the ARIMA
process governing the sample data series, then the residuals are white noise. The whiteness
of the residuals is examined by two approaches: first, by inspecting the ACF plots of the
residuals; second, by the Ljung-Box Q statistic test (also called the Portmanteau lack-of-fit
test). If the residuals are truly white noise, then their ACF should have no spikes and the
autocorrelations should be small. Thus, the autocorrelations r_k which lie, say, outside the
range ±2/√n (i.e. outside the approximate 95% large-sample confidence limits) are
significantly different from zero. The Q statistic is given by:

Q = n(n+2) \sum_{k=1}^{K} \frac{r_k^2}{n-k}

where n = the length of the series after any differencing, K = the number of residual
autocorrelations used to calculate Q, and k = the lag period. If the fitted model is
appropriate (i.e. if the residuals are white noise), Q is approximately a chi-square
variable with (K - p - q - P - Q) degrees of freedom, where p, q, P and Q are the numbers of
autoregressive, moving average, seasonal autoregressive and seasonal moving average
parameters respectively. The Q statistic is sensitive to the value of K. However, Davies
et al. (1977) and Chatfield (1996) suggested that just "looking" at a few values of r_k,
particularly at lags 1, 2 and the first seasonal lag (if any), and examining whether any are
significantly different from zero using the crude limits ±2/√n, is sufficient to test the
whiteness of the residuals (Khare and Sharma, 2002).
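The Ljung-Box Q statistic and the crude ±2/√n limits described above can be sketched as a minimal implementation, assuming the usual form Q = n(n+2) Σ r_k²/(n−k); the computed Q would then be compared against chi-square tables with K−p−q−P−Q degrees of freedom:

```python
import numpy as np

def residual_acf(e, nlags):
    """Sample autocorrelations r_1..r_nlags of the model residuals e."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e * e)
    return np.array([np.sum(e[: n - k] * e[k:]) / denom for k in range(1, nlags + 1)])

def ljung_box_q(e, K):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n - k), k = 1..K."""
    n = len(e)
    r = residual_acf(e, K)
    return n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, K + 1)))

def crude_limits(n):
    """Approximate 95% limits +/- 2/sqrt(n) for the individual r_k."""
    return 2.0 / np.sqrt(n)
```

In use, any r_k beyond ±crude_limits(n), or a Q exceeding the chi-square critical value, would indicate that the residuals are not white noise and the model is inadequate.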
Metadiagnosis: This includes omitting redundant parameters or fitting extra parameters
where the model is over- or under-specified.
Overspecified model (omitting redundant parameters): This forms part of the
diagnostic checking of model adequacy advocated by Box and Jenkins (1970). It checks for
the presence of redundant parameters in the fitted model. Redundant parameters can be
spotted by calculating the t-ratio, which is the ratio of the parameter estimate to its
standard error. A parameter is significantly different from zero if the t-ratio is equal to
or greater than 2 in absolute value; otherwise, the model is
overspecified and simplification of the model is possible (Khare and Sharma, 2002).
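The t-ratio check can be illustrated with hypothetical numbers (these are not estimates from any real fit):

```python
# Hypothetical parameter estimate and standard error from a fitted model.
estimate = 0.08
std_error = 0.05

t_ratio = estimate / std_error    # 1.6, below the threshold of 2
redundant = abs(t_ratio) < 2.0    # True: candidate for removal (overspecification)
```

A redundant parameter would be dropped and the simplified model re-estimated and re-checked.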
Underspecified model (fitting extra parameters): This procedure verifies that the tentative
model contains the appropriate number of parameters to represent the data, and checks
whether an additional parameter yields an improvement over the original model. For example,
if the fitted model is ARIMA(p, d, q), the more elaborate models ARIMA(p+1, d, q) and
ARIMA(p, d, q+1) are fitted to the data. The model is then tested to see whether the
additional parameters improve the fit significantly. This is seen by examining the residual
variances; if the white noise variance is reduced by 10% by fitting an overfit model, then
the overfit model is preferred.
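The 10% residual-variance criterion above can be sketched with hypothetical variances (illustrative numbers only):

```python
# Hypothetical residual (white noise) variances of the two competing fits.
var_fitted = 0.80     # ARIMA(p, d, q)
var_overfit = 0.69    # ARIMA(p+1, d, q)

reduction = (var_fitted - var_overfit) / var_fitted   # about 0.1375
prefer_overfit = reduction >= 0.10                    # True: keep the larger model
```

If the reduction falls short of 10%, the extra parameter is judged not worth keeping and the original, more parsimonious model is retained.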