
UNIVARIATE TIME SERIES ANALYSIS

GENERAL

Time series analysis is a purely statistical method applicable to non-repeatable experiments (Box and Jenkins, 1970; Robeson and Steyn, 1990; Schlink et al., 1997). Air quality data constitute a good example of a time series (Chock et al., 1975), and the Box-Jenkins approach has been applied extensively to their analysis (Khare and Sharma, 2002; Robeson and Steyn, 1990; Schlink et al., 1997; Chock et al., 1975). This approach extracts all the trends and serial correlations in the data until only a sequence of white noise (shocks) remains. The extraction is accomplished via the difference, autoregressive and moving average operators; the difference operator, in particular, is used to remove trends or non-stationarity in the time series (Chock, 1975). The various statistical terms used in univariate time series analysis are defined in the following section.

DEFINITION OF TERMS IN UNIVARIATE TIME SERIES ANALYSIS

Stochastic process: A stochastic process is defined as a statistical phenomenon that evolves

over time according to probability laws.

Mean: The mean is one of the measures of the central tendency of air quality data. In time series analysis, mean (\mu) values are calculated for each segment of the data to check the stationarity of the series:

\mu = \frac{1}{n} \sum_{t=1}^{n} z_t                (F.1)

where z_t = air quality observation at time t, t = 1, ..., n

n = number of observations in a segment

Variance: The variance is another condition used to check the stationarity of the data in time series analysis. The variance (\sigma^2) of the series expresses the degree of variation around the assumed constant mean level and as such gives a measure of uncertainty around this mean. Mathematically, it is expressed as:

\sigma^2 = \frac{1}{n} \sum_{t=1}^{n} (z_t - \mu)^2                (F.2)
Stationarity: A series that contains no systematic change in mean (no trend), no systematic change in variance and no periodic variations is called a stationary series. The analysis of time series requires a stationary series. Several data transformation techniques are available to convert a non-stationary series into a stationary one, two of which are listed below (see the sketch after this list):

(a) Logarithmic transformation: this can be employed when the standard deviation of the series is proportional to its mean (i.e. the variance is proportional to the square of the mean).

(b) Square root transformation: this can be employed when the variance of the series is proportional to its mean.
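A minimal sketch of the two transformations, assuming a strictly positive concentration series; the synthetic data are illustrative only.

    import numpy as np

    # Illustrative, strictly positive synthetic concentration series.
    z = np.random.default_rng(1).lognormal(mean=3.0, sigma=0.4, size=500)

    z_log = np.log(z)    # (a) spread grows in proportion to the level
    z_sqrt = np.sqrt(z)  # (b) variance proportional to the mean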

Trend: Any systematic change in the level of a time series. Box and Jenkins (1970) advocated a method called 'differencing' for the removal of trends in a series. Differencing a time series consists of subtracting the values of the observations from one another in some defined, time-dependent order; e.g., a first (order) difference transformation is defined as the difference between the values of two adjacent observations; second (order) differencing consists of differencing the differenced series; and so on.
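A short sketch of first- and second-order differencing with NumPy (np.diff implements exactly the subtraction described above; the sample values are invented for illustration):

    import numpy as np

    z = np.array([12.0, 15.0, 14.0, 18.0, 21.0, 20.0])

    d1 = np.diff(z)        # first-order difference: z_t - z_{t-1}
    d2 = np.diff(z, n=2)   # second-order: difference of the differenced series
    print(d1)  # [ 3. -1.  4.  3. -1.]
    print(d2)  # [-4.  5. -1. -4.]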

Autocorrelation function (ACF): This is an important tool for time series model identification. It measures the correlation between observations at different distances (lags) apart. The autocorrelation function of a time series process at lag k is defined as:

\rho(k) = \frac{\mathrm{cov}(z_t, z_{t+k})}{\mathrm{var}(z_t)}                (F.3)

where cov = covariance and var = variance.

The set of values \rho(k), and the plot of \rho(k) against k = 1, 2, ..., are known as the autocorrelation function (ACF) or correlogram.
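A minimal sketch of the sample counterpart of equation F.3; the function name and synthetic series are illustrative assumptions.

    import numpy as np

    def sample_acf(z, max_lag):
        """Sample autocorrelation r_k = cov(z_t, z_{t+k}) / var(z_t) (eq. F.3)."""
        z = np.asarray(z, dtype=float)
        n = len(z)
        dev = z - z.mean()
        c0 = np.sum(dev * dev) / n    # lag-0 autocovariance (the variance)
        return np.array([np.sum(dev[:n - k] * dev[k:]) / n / c0
                         for k in range(1, max_lag + 1)])

    # Plotting r_k against k = 1, 2, ... gives the correlogram.
    r = sample_acf(np.random.default_rng(2).normal(size=300), max_lag=20)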

Partial autocorrelation function (PACF): The partial autocorrelation function at lag k is defined as the correlation between time series terms k lags apart, after the correlation due to the intermediate terms has been removed (Milionis and Davies, 1994a); i.e., the partial autocorrelations constitute a device for summarizing all the information contained in the ACF of an autoregressive process in a small number of non-zero statistics (Vandaele, 1983). The lag k partial autocorrelation is the partial regression coefficient \phi_{kk} in the kth order autoregression (equation F.4):

Z_t = \phi_{k1} Z_{t-1} + \phi_{k2} Z_{t-2} + \cdots + \phi_{kk} Z_{t-k} + a_t                (F.4)

It also measures the additional correlation between Z_t and Z_{t-k} after adjustments have been made for the intermediate variables Z_{t-1}, Z_{t-2}, ..., Z_{t-k+1}. The \phi_{kk} are obtained from the Yule-Walker equations (Mills, 1991). In general, it is difficult to know the population values of the autocorrelation and partial autocorrelation of the underlying stochastic processes. Consequently, sample autocorrelation and partial autocorrelation functions are used for the identification of a tentative model. Since the sample autocorrelations and partial autocorrelations are only estimates, they are subject to sampling errors and as such will never exactly match the underlying true autocorrelations and partial autocorrelations. Table F.1 shows the properties of the ACF and PACF for different B-J seasonal and non-seasonal models that help in the selection of a tentative model; an illustrative sketch of the Yule-Walker computation of \phi_{kk} is given after the tables.

Table F.1a Properties of the ACF and PACF for non-seasonal B-J models.

AR(p):       ACF tails off (exponential and/or sine-wave decay; may contain
             damped oscillations). PACF cuts off after lag p (p spikes).

MA(q):       ACF cuts off after lag q (q spikes). PACF tails off (dominated
             by a linear combination of damped exponentials and/or sine
             waves; may contain damped oscillations).

ARMA(p, q):  ACF tails off after q - p lags (exponential and/or sine-wave
             decay after q - p lags). PACF tails off after p - q lags
             (dominated by exponentials and/or sine waves after p - q lags).
Table F.1b Properties of the ACF and PACF for seasonal B-J models.

AR(p), seasonal AR(P):  ACF tails off. PACF cuts off after lag p + sP.

MA(q), seasonal MA(Q):  ACF cuts off after lag q + sQ. PACF tails off.

Mixed models:  ACF tails off after (q + sQ) - (p + sP) lags (exponential
               and/or sine-wave decay after (q + sQ) - (p + sP) lags).
               PACF tails off after (p + sP) - (q + sQ) lags (exponential
               and/or sine-wave decay after (p + sP) - (q + sQ) lags).
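The following sketch illustrates how the \phi_{kk} described above can be obtained from the Yule-Walker equations: for each lag k, the kth order autoregression (equation F.4) is solved and the last coefficient kept. The implementation details are an assumption for illustration, not the procedure of the cited authors.

    import numpy as np
    from scipy.linalg import toeplitz

    def sample_pacf(z, max_lag):
        """phi_kk via the Yule-Walker equations for k = 1, ..., max_lag."""
        z = np.asarray(z, dtype=float)
        n = len(z)
        dev = z - z.mean()
        r = np.array([np.sum(dev[:n - k] * dev[k:]) / np.sum(dev * dev)
                      for k in range(max_lag + 1)])   # r_0 = 1, r_1, ...
        pacf = []
        for k in range(1, max_lag + 1):
            R = toeplitz(r[:k])                     # k x k autocorrelation matrix
            phi = np.linalg.solve(R, r[1:k + 1])    # coefficients of eq. F.4
            pacf.append(phi[-1])                    # keep phi_kk
        return np.array(pacf)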

Stationarity and invertibility conditions: Stationarity and invertibility conditions impose restrictions on the parameters of the autoregressive and moving average processes, respectively. These two conditions provide a diagnostic tool to check the stationarity of the fitted model. For a stationary series, the autoregressive and moving average parameters of the fitted model should be less than one in absolute value. If the model fails to fulfil these two conditions, it implies that the series is non-stationary, and additional differencing is required in order to induce stationarity (Khare and Sharma, 2002).
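As a sketch of such a diagnostic check (assuming the fitted parameters are available as plain arrays), the root-based form of the two conditions can be verified as below; for an AR(1) or MA(1) it reduces to the |parameter| < 1 rule quoted above.

    import numpy as np

    def check_conditions(ar_params, ma_params):
        """Roots of the AR and MA polynomials in B must lie outside the
        unit circle for stationarity and invertibility, respectively."""
        ar_poly = np.r_[1.0, -np.asarray(ar_params)]  # 1 - phi_1 B - ... - phi_p B^p
        ma_poly = np.r_[1.0, np.asarray(ma_params)]   # 1 + theta_1 B + ... + theta_q B^q
        stationary = np.all(np.abs(np.roots(ar_poly[::-1])) > 1.0)
        invertible = np.all(np.abs(np.roots(ma_poly[::-1])) > 1.0)
        return stationary, invertible

    print(check_conditions([0.7], [0.4]))   # (True, True)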

White noise: White noise is a sequence of random shocks drawn from a fixed distribution,

with zero mean and constant variance.

Residual analysis: The residuals of the ARIMA model are defined as the difference between the observed and fitted values. If the model adequately depicts the ARIMA process governing the sample data series, then the residuals are white noise. The whiteness of the residuals is examined by two approaches: first, by inspecting the ACF plot of the residuals; second, by the Ljung-Box Q statistic test (also called the Portmanteau lack-of-fit test). If the residuals are truly white noise, then their ACF should have no spikes and the autocorrelations should be small. Thus, autocorrelations r_k which lie, say, outside the range \pm 2/\sqrt{n} (i.e. outside the approximate 95% large-sample confidence limits) are significantly different from zero. The second approach for analysing the residual autocorrelations is to rely on the Ljung-Box Q statistic, defined in equation F.5:

Q = Q_K = n(n+2) \sum_{k=1}^{K} \frac{r_k^2}{n-k}                (F.5)

where n = the length of the series after any differencing, K = the number of residual autocorrelations r_k used to calculate Q, and k = the lag period. If the fitted model is appropriate (i.e. if the residuals are white noise), Q is approximately distributed as a Chi-square variable with (K - p - q - P - Q) degrees of freedom, where p, q, P and Q are the numbers of parameters in the ARIMA model, representing the autoregressive, moving average, seasonal autoregressive and seasonal moving average parameters, respectively. The Q statistic is sensitive to the value of K. However, Davies et al. (1977) and Chatfield (1996) suggested that just "looking" at a few values of r_k, particularly at lags 1, 2 and the first seasonal lag (if any), and examining whether any are significantly different from zero using the crude limits \pm 2/\sqrt{n}, is sufficient to test the whiteness of the residuals (Khare and Sharma, 2002).
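A minimal sketch of the Q test of equation F.5, assuming the model residuals are available as an array; SciPy's chi-square survival function supplies the p-value, and the function name is illustrative.

    import numpy as np
    from scipy.stats import chi2

    def ljung_box_q(residuals, K, n_params):
        """Ljung-Box Q statistic (eq. F.5) on the model residuals."""
        a = np.asarray(residuals, dtype=float)
        n = len(a)
        dev = a - a.mean()
        denom = np.sum(dev * dev)
        r = np.array([np.sum(dev[:n - k] * dev[k:]) / denom
                      for k in range(1, K + 1)])      # residual autocorrelations
        q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, K + 1)))
        p_value = chi2.sf(q, df=K - n_params)         # df = K - p - q - P - Q
        return q, p_value

    # A large p-value is consistent with white-noise residuals; spikes in
    # the residual ACF outside +/- 2/sqrt(n) point to an inadequate model.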
Metadiagnosis: This includes omitting parameters where the model is overspecified, or fitting extra parameters where it is underspecified, as described below.

Overspecified model (omitting parameters): Overfitting is one of the procedures for diagnostic checking of model adequacy advocated by Box and Jenkins (1970). It checks for the presence of redundant parameters in the fitted model. Redundant parameters can be spotted by calculating the t-ratio, which is the ratio of the parameter estimate to its standard error. A parameter is significantly different from zero if the t-ratio is equal to or greater than 2 in absolute value. An insignificant parameter is an indication that the model is overspecified and that simplification of the model is possible (Khare and Sharma, 2002).
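The t-ratio screening can be sketched as follows; the estimates and standard errors are invented for illustration.

    import numpy as np

    def t_ratios(estimates, std_errors):
        """t-ratio = parameter estimate / standard error; a parameter is
        taken as significant when |t| >= 2."""
        t = np.asarray(estimates) / np.asarray(std_errors)
        return t, np.abs(t) < 2.0   # True marks a redundant parameter

    t, redundant = t_ratios([0.62, 0.05], [0.08, 0.07])
    # t ~ [7.75, 0.71]: the second parameter is a candidate for removal.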

Underspecified model (fitting extra parameters): This procedure verifies that the tentative model contains the appropriate number of parameters to represent the data and checks whether additional parameters yield an improvement over the original model. For example, if the fitted model is ARIMA(p, d, q), the more elaborate models ARIMA(p+1, d, q) and ARIMA(p, d, q+1) are fitted to the data. The model is then tested to see whether the additional parameters improve the fit significantly. This is seen by examining the residual variances; if the white noise variance is reduced by 10% by fitting an overfitted model, then the overfitted model is appropriate (Khare and Sharma, 2002).
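A hedged sketch of this check using statsmodels (the ARIMA class and its resid attribute are assumed available in the installed version; the series and orders are illustrative stand-ins, not the models of the cited study):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    y = np.random.default_rng(3).normal(size=400)   # stand-in for a real series

    base = ARIMA(y, order=(1, 0, 1)).fit()          # tentative ARIMA(p, d, q)
    extra_ar = ARIMA(y, order=(2, 0, 1)).fit()      # ARIMA(p+1, d, q)
    extra_ma = ARIMA(y, order=(1, 0, 2)).fit()      # ARIMA(p, d, q+1)

    for name, model in [("p+1", extra_ar), ("q+1", extra_ma)]:
        reduction = 1.0 - model.resid.var() / base.resid.var()
        print(name, f"residual variance reduced by {reduction:.1%}")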
