
Time Series Analysis

an introduction


A time series is defined as a collection of observations made sequentially in time. This
means that there must be equal intervals of time in between observations.

Types of Time Series Data
Continuous vs. Discrete
Continuous - observations made continuously in time
Examples:
1. Seawater level as measured by an automated sensor.
2. Carbon dioxide output from an engine.
Discrete - observations made only at certain times.
Examples:
1. Animal species composition measured every month.
2. Bacteria culture size measured every six hours.
Stationary vs. Non-stationary
Stationary - Data that fluctuate around a constant value
Non-stationary - A series whose cycle parameters (i.e., length, amplitude or phase)
change over time

Deterministic vs. Stochastic
Deterministic time series - Data that can be predicted exactly.
Stochastic time series - Data are only partly determined by past values, and future values
have to be described with a probability distribution. This is the case for most, if not all,
natural time series. So many factors are involved in a natural system that we cannot
possibly account for all of them.

Autocorrelation
A series of data may have observations that are not independent of one another.
Example:
The population density on day 8 depends on what the population density was on day 7.
Likewise, that in turn depends on day 6, and so forth.
The order of these data has to be taken into account so that we can assess the
autocorrelation involved.
To find out if autocorrelation exists:
Autocorrelation Coefficients measure correlations between observations a certain
distance apart.
Based on the ordinary correlation coefficient r, we can see if successive observations
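are correlated. An autocorrelation coefficient at lag k can be found by:

$$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}$$

where N is the number of observations and $\bar{x}$ is their mean. This is the covariance
of $x_t$ and $x_{t+k}$ divided by the variance of $x_t$.
An $r_k$ value outside $\pm 2/\sqrt{N}$ denotes a significant difference from zero and
signifies an autocorrelation.
Also note that as k gets large, $r_k$ generally becomes smaller.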
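
As a rough illustration of the coefficient described above, the following minimal sketch computes the lag-k autocorrelation as the lag-k covariance divided by the variance, and compares it with the 2/sqrt(N) rule of thumb. The function name `autocorr` and the population densities are hypothetical, chosen only for this example.

```python
import math

def autocorr(x, k):
    """Lag-k autocorrelation: lag-k covariance divided by the variance."""
    n = len(x)
    mean = sum(x) / n
    # Numerator: cross-products of deviations that are k steps apart.
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    # Denominator: squared deviations over the whole series.
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical population densities on ten consecutive days.
density = [12, 14, 15, 17, 18, 21, 22, 24, 27, 29]

r1 = autocorr(density, 1)
threshold = 2 / math.sqrt(len(density))
print(f"r_1 = {r1:.3f}, threshold = {threshold:.3f}")
print("significant" if abs(r1) > threshold else "not significant")
```

With only ten observations the threshold is wide, so this is a demonstration of the arithmetic rather than a realistic analysis.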

Correlograms
The autocorrelation coefficient $r_k$ can then be plotted against the lag (k) to develop a
correlogram. This gives us a visual look at a range of correlation coefficients at
relevant time lags so that significant values may be seen.
The correlogram in Fig. 2 shows a short-term correlation that is significant at low k, with
small correlations at longer lags. Remember that an $r_k$ value outside $\pm 2/\sqrt{N}$
(N being the number of observations) denotes a significant difference ($\alpha = 0.05$)
from zero and signifies an autocorrelation. Some procedures may call for a stricter
significance level, since at $\alpha = 0.05$ about one out of every twenty coefficients in
a truly random data series is expected to appear significant.
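
A correlogram can be approximated in plain text. The sketch below is only an illustration: `autocorr` and `correlogram` are hypothetical helper names, and the input is a made-up slow wave chosen so that neighbouring values are similar. Each lag is drawn as a bar and flagged with `*` when it falls outside 2/sqrt(N).

```python
import math

def autocorr(x, k):
    """Lag-k autocorrelation: lag-k covariance divided by the variance."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def correlogram(x, max_lag):
    """Return a list of (lag, r_k, significant) for lags 1..max_lag."""
    bound = 2 / math.sqrt(len(x))
    rows = []
    for k in range(1, max_lag + 1):
        r = autocorr(x, k)
        rows.append((k, r, abs(r) > bound))
    return rows

# Hypothetical smooth series: nearby observations are close in value.
series = [math.sin(t / 4) for t in range(40)]
rows = correlogram(series, 8)

for k, r, significant in rows:
    sign = "-" if r < 0 else "+"
    bar = "#" * round(abs(r) * 20)  # bar length proportional to |r_k|
    flag = " *" if significant else ""
    print(f"lag {k:2d} {sign}{abs(r):.2f} {bar}{flag}")
```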

Figure 2. A time series showing short-term autocorrelation together with its correlogram.
Fig. 3 shows an alternating (negative correlation) time series.
The coefficient $r_k$ alternates as does the raw data ($r_1$ is negative, $r_2$ is positive, and so on).
Successive values in this series are negatively correlated.

Figure 3. An alternating time series with its correlogram.
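
The alternating pattern can be reproduced with a small made-up series. In this sketch the data and the helper name `autocorr` are hypothetical. The values swing above and below the mean on alternate steps, so the coefficients at odd lags come out negative and those at even lags positive.

```python
def autocorr(x, k):
    """Lag-k autocorrelation: lag-k covariance divided by the variance."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical alternating series with mean zero.
alternating = [5, -4, 6, -5, 4, -6, 5, -4, 6, -5, 4, -6]

coefficients = [autocorr(alternating, k) for k in (1, 2, 3, 4)]
for k, r in zip((1, 2, 3, 4), coefficients):
    print(f"r_{k} = {r:+.3f}")
```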

Box-Jenkins Models (Forecasting)
Box and Jenkins developed the AutoRegressive Integrated Moving Average (ARIMA)
model, which combined the AutoRegressive (AR) and Moving Average (MA) models
developed earlier with a differencing factor that removes the trend in the data.
This time series data can be expressed as: $Y_1, Y_2, Y_3, \ldots, Y_{t-1}, Y_t$

With random shocks (a) at each corresponding time: $a_1, a_2, a_3, \ldots, a_{t-1}, a_t$

In order to model a time series, we must state some assumptions about these 'shocks'.
They have:
1. a mean of zero
2. a constant variance
3. no covariance between shocks
4. a normal distribution (although there are procedures for dealing with this)
An ARIMA (p,d,q) model is composed of three elements:
p: Autoregression
d: Integration or Differencing
q: Moving Average
A simple ARIMA (0,0,0) model without any of the three processes above is written as:
$Y_t = a_t$
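
To make the ARIMA (0,0,0) case concrete, the sketch below draws normally distributed shocks and treats them directly as the series. The seed, the sample size and the unit standard deviation are arbitrary assumptions for illustration. The sample mean should sit near zero, and the lag-1 autocorrelation should be small because the shocks do not covary.

```python
import random

random.seed(0)  # arbitrary fixed seed so the sketch is repeatable

n = 1000     # assumed sample size
sigma = 1.0  # assumed constant standard deviation of the shocks

# Shocks with mean zero, constant variance and a normal distribution.
shocks = [random.gauss(0.0, sigma) for _ in range(n)]

# ARIMA(0,0,0): each observation is just the current shock.
series = list(shocks)

mean = sum(series) / n
sum_sq = sum((y - mean) ** 2 for y in series)
variance = sum_sq / n
r1 = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)) / sum_sq

print(f"mean = {mean:.3f}, variance = {variance:.3f}, r_1 = {r1:.3f}")
```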

The autoregression process [ARIMA (p,0,0)] refers to how important previous values are
to the current one over time. A data value at $t_1$ may affect the data value of the series
at $t_2$ and $t_3$. But the influence of the data value at $t_1$ will decrease on an
exponential basis as time passes, so that the effect will decrease to near zero. It should
be pointed out that the autoregressive coefficient $\phi_1$ is constrained between -1 and 1
and, as it becomes larger, the effects at all subsequent lags increase.
$Y_t = \phi_1 Y_{t-1} + a_t$
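
The exponential decay described for the first-order autoregressive model can be seen by feeding it a single unit shock and no further shocks. This is a minimal sketch: the function name `ar1`, the starting value of zero and the coefficient values are assumptions made for illustration.

```python
def ar1(shocks, phi, y0=0.0):
    """Build a series where each value is phi times the previous value plus the shock."""
    if not -1 < phi < 1:
        raise ValueError("phi must lie between -1 and 1")
    series = []
    previous = y0
    for a in shocks:
        current = phi * previous + a
        series.append(current)
        previous = current
    return series

# One unit shock at the first time step, then nothing.
impulse = [1.0] + [0.0] * 9

for phi in (0.3, 0.8):
    response = ar1(impulse, phi)
    print(phi, [round(v, 3) for v in response])
```

After k steps the shock's remaining effect is the coefficient raised to the power k, so a larger coefficient leaves a larger effect at every subsequent lag.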

The integration process [ARIMA (0,d,0)] differences the series to remove the trend and
drift of the data (i.e. makes non-stationary data stationary). The first observation is
subtracted from the second, the second from the third, and so on. So the final form without
AR or MA processes is the ARIMA (0,1,0) model:
$Y_t = Y_{t-1} + a_t$

The order of the process rarely exceeds one (d < 2 in most situations).
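
Differencing is simple to sketch. The helper `difference` and the trending data below are hypothetical. Each pass subtracts every observation from the one that follows it, and `d` controls how many passes are made.

```python
def difference(series, d=1):
    """Difference a series d times; each pass shortens it by one value."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result

# Hypothetical series with a steady upward trend plus small wiggles.
trend = [10, 12, 15, 16, 19, 20, 23, 24, 27, 28]

first = difference(trend)
print(first)  # the trend is gone; values now fluctuate around a constant
```

A single pass is enough here, which fits the remark that the order of differencing rarely exceeds one.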
The moving average process [ARIMA (0,0,q)] is used for serially correlated data. The
process is composed of the current random shock and portions of the q previous shocks.
An ARIMA (0,0,1) model is described as:
$Y_t = a_t - \theta_1 a_{t-1}$

As with the integration process, the MA process rarely exceeds the first order.
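
A first-order moving average can be sketched in the same way as the other processes. The function name `ma1`, the coefficient value 0.6 and the convention that the shock before the first observation is zero are assumptions for this example. The current shock enters in full and a portion of the previous shock is subtracted.

```python
def ma1(shocks, theta):
    """Each value is the current shock minus theta times the previous shock."""
    series = []
    previous_shock = 0.0  # assumed: no shock before the first observation
    for a in shocks:
        series.append(a - theta * previous_shock)
        previous_shock = a
    return series

# One unit shock at the first time step, then nothing.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]

response = ma1(impulse, 0.6)
print(response)
```

Unlike the autoregressive case, where a shock fades gradually, here the shock affects only the current value and the next one.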















Reference: http://userwww.sfsu.edu/~efc/classes/biol710/meseries/TimeSeriesAnalysis.html