An Introduction
A time series is a collection of observations made sequentially in time. The methods
described here assume equal intervals of time between observations.
Types of Time Series Data
Continuous vs. Discrete
Continuous: observations made continuously in time.
Examples:
1. Seawater level as measured by an automated sensor.
2. Carbon dioxide output from an engine.
Discrete: observations made only at certain times.
Examples:
1. Animal species composition measured every month.
2. Bacteria culture size measured every six hours.
Stationary vs. Nonstationary
Stationary: data that fluctuate around a constant value.
Nonstationary: a series whose cycle parameters (i.e., length, amplitude or phase)
change over time.
Deterministic vs. Stochastic
Deterministic time series: the data can be predicted exactly.
Stochastic time series: the data are only partly determined by past values, so future values
have to be described with a probability distribution. This is the case for most, if not all,
natural time series. So many factors are involved in a natural system that we cannot
possibly account for all of them.
Autocorrelation
A series of data may have observations that are not independent of one another.
Example:
The population density on day 8 depends on what the population density was on day 7,
which in turn depends on day 6, and so forth.
The order of these data has to be taken into account so that we can assess the
autocorrelation involved.
To find out if autocorrelation exists:
Autocorrelation Coefficients measure correlations between observations a certain
distance apart.
Based on the ordinary correlation coefficient r, we can see if successive observations
are correlated. An autocorrelation coefficient at lag k can be found by:
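$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}$
where $N$ is the number of observations and $\bar{x}$ is their mean. This is the covariance
of $x_t$ and $x_{t+k}$ divided by the variance of $x_t$.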
For a series of N observations, an $r_k$ value outside $\pm 2/\sqrt{N}$ denotes a significant
difference from zero and signifies an autocorrelation.
Also note that as k gets large, $r_k$ becomes smaller.
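The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names `autocorr` and `significance_limit` and the population densities are made up for the example, and the estimator used is the common form in which deviations from the overall mean are multiplied k steps apart and divided by the total sum of squared deviations.

```python
from math import sqrt

def autocorr(x, k):
    """Lag-k autocorrelation coefficient r_k of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    # Numerator: sum of cross-products of deviations k steps apart.
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    # Denominator: sum of squared deviations over the whole series.
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def significance_limit(n):
    """Approximate limit 2/sqrt(N); |r_k| above it is treated as significant."""
    return 2 / sqrt(n)

# Hypothetical daily population densities (made-up numbers).
density = [12, 14, 15, 17, 18, 20, 19, 21, 23, 24, 26, 25]
r1 = autocorr(density, 1)
limit = significance_limit(len(density))
print(f"r_1 = {r1:.3f}, limit = {limit:.3f}, significant: {abs(r1) > limit}")
```

Because each day's density is close to the previous day's, the lag-1 coefficient for this made-up series comes out positive and above the limit.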
Correlograms
The autocorrelation coefficient $r_k$ can then be plotted against the lag (k) to develop a
correlogram. This will give us a visual look at a range of correlation coefficients at
relevant time lags so that significant values may be seen.
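As a rough sketch of how a correlogram is built, the following Python computes $r_k$ for a range of lags and prints a crude text plot, marking with `*` the coefficients whose magnitude exceeds 2/sqrt(N). The helper names `correlogram` and `print_correlogram` and the data values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def correlogram(x, max_lag):
    """Return a list of (k, r_k) pairs for k = 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    result = []
    for k in range(1, max_lag + 1):
        num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
        result.append((k, num / den))
    return result

def print_correlogram(x, max_lag, width=20):
    """Print one bar per lag; '*' marks |r_k| above 2/sqrt(N)."""
    limit = 2 / sqrt(len(x))
    for k, r in correlogram(x, max_lag):
        bar = ("+" if r >= 0 else "-") * round(abs(r) * width)
        flag = " *" if abs(r) > limit else ""
        print(f"lag {k:2d}  r = {r:+.2f}  {bar}{flag}")

# Made-up series that rises and falls smoothly.
series = [3, 5, 6, 8, 7, 9, 10, 9, 7, 6, 4, 5, 3, 2, 4, 5, 7, 8, 10, 9]
print_correlogram(series, max_lag=6)
```

A plotting library would normally be used instead of text bars; the text version is kept here so the sketch has no dependencies.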
The correlogram in Fig. 2 shows a short-term correlation that is significant at low k, with
small correlations at longer lags. Remember that, for N observations, an $r_k$ value outside
$\pm 2/\sqrt{N}$ denotes a significant difference ($\alpha = 0.05$) from zero and signifies an
autocorrelation. Some procedures may call for a more stringent $\alpha$ value, since
$\alpha = 0.05$ means that about one out of every twenty coefficients from a truly random
data series is expected to appear significant.
Figure 2. A time series showing short-term autocorrelation together with its correlogram.
Fig. 3 shows an alternating (negative correlation) time series.
The coefficient $r_k$ alternates in sign, as do the raw data ($r_1$ is negative, $r_2$ is
positive, and so on). Successive observations in this series are negatively correlated.
Figure 3. An alternating time series with its correlogram.
Box-Jenkins Models (Forecasting)
Box and Jenkins developed the AutoRegressive Integrated Moving Average (ARIMA)
model, which combined the AutoRegressive (AR) and Moving Average (MA) models
developed earlier with a differencing factor that removes any trend in the data.
The time series data can be expressed as: $Y_1, Y_2, Y_3, \ldots, Y_{t-1}, Y_t$
with random shocks (a) at each corresponding time: $a_1, a_2, a_3, \ldots, a_{t-1}, a_t$
In order to model a time series, we must state some assumptions about these 'shocks'.
They have:
1. a mean of zero
2. a constant variance
3. no covariance between shocks
4. a normal distribution (although there are procedures for dealing with departures from normality)
An ARIMA (p,d,q) model is composed of three elements:
p: Autoregression
d: Integration or Differencing
q: Moving Average
A simple ARIMA (0,0,0) model without any of the three processes above is written as:
$Y_t = a_t$
The autoregression process [ARIMA (p,0,0)] refers to how important previous values are
to the current one over time. A data value at $t_1$ may affect the data values of the series
at $t_2$ and $t_3$, but its influence decreases exponentially as time passes, so that
the effect eventually falls to near zero. An ARIMA (1,0,0) model is written as:
$Y_t = \phi_1 Y_{t-1} + a_t$
It should be pointed out that $\phi_1$ is constrained between -1 and 1 and, as it becomes
larger in magnitude, the effects at all subsequent lags increase.
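To make the exponential decay concrete, here is a minimal Python sketch of a first-order autoregressive process. It assumes standard normal shocks and a starting value of zero, neither of which is specified in the text, and the names `simulate_ar1` and `shock_effect` are hypothetical. Under the model, a unit shock contributes $\phi_1^k$ to the series k steps later.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate Y_t = phi * Y_{t-1} + a_t with standard normal shocks.

    Assumes the series starts from zero (an illustrative choice).
    """
    if not -1 < phi < 1:
        raise ValueError("phi must lie strictly between -1 and 1")
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

def shock_effect(phi, k):
    """Remaining effect of a unit shock after k time steps: phi ** k."""
    return phi ** k

# Compare how quickly a shock fades for a small and a large coefficient.
for k in range(6):
    print(k, round(shock_effect(0.5, k), 4), round(shock_effect(0.9, k), 4))

y = simulate_ar1(0.9, 200)  # one simulated realisation of the process
```

With a coefficient of 0.5 the effect of a shock halves at every step, while with 0.9 it is still noticeable several steps later, which is the sense in which larger coefficients increase the effects at all subsequent lags.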
The integration process [ARIMA (0,d,0)] differences the series to remove the trend and
drift of the data (i.e., it makes nonstationary data stationary). The first observation is
subtracted from the second, the second from the third, and so on. So the final form without
AR or MA processes is the ARIMA (0,1,0) model:
$Y_t = Y_{t-1} + a_t$
The order of the process rarely exceeds one (d < 2 in most situations).
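Differencing is simple enough to show directly. The sketch below uses made-up numbers and the hypothetical helper names `difference` and `random_walk`. It differences a series with a linear trend, then checks that differencing a series built as $Y_t = Y_{t-1} + a_t$ recovers the shocks.

```python
def difference(y, d=1):
    """Apply first differencing d times: each pass maps y to y[t] - y[t-1]."""
    out = list(y)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out

def random_walk(shocks, start=0):
    """Build Y_t = Y_{t-1} + a_t from a list of shocks (ARIMA (0,1,0))."""
    y, level = [], start
    for a in shocks:
        level += a
        y.append(level)
    return y

trend = [2, 5, 8, 11, 14, 17]          # made-up series with a linear trend
print(difference(trend))               # [3, 3, 3, 3, 3]
print(difference(trend, d=2))          # [0, 0, 0, 0]

shocks = [1, -2, 3, 0, -1]             # made-up integer shocks
walk = random_walk(shocks)             # [1, -1, 2, 2, 1]
print(difference(walk) == shocks[1:])  # True
```

Each pass of differencing shortens the series by one observation, which is one practical reason to keep d small.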
The moving average process [ARIMA (0,0,q)] is used for serially correlated data. The
process is composed of the current random shock and portions of the q previous shocks.
An ARIMA (0,0,1) model is described as:
$Y_t = a_t - \theta_1 a_{t-1}$
As with the integration process, the MA process rarely exceeds the first order.
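A first-order moving average process can be sketched in the same way. The example assumes the sign convention $Y_t = a_t - \theta_1 a_{t-1}$ and standard normal shocks; the function names are hypothetical. Under that convention the theoretical lag-1 autocorrelation is $-\theta_1 / (1 + \theta_1^2)$ and the autocorrelation at longer lags is zero, so the sample values from a long simulated series should be close to those figures.

```python
import random

def simulate_ma1(theta, n, seed=0):
    """Simulate Y_t = a_t - theta * a_{t-1} with standard normal shocks."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [shocks[t] - theta * shocks[t - 1] for t in range(1, n + 1)]

def ma1_lag1_autocorr(theta):
    """Theoretical lag-1 autocorrelation of the MA(1) process above."""
    return -theta / (1 + theta ** 2)

def autocorr(x, k):
    """Sample lag-k autocorrelation coefficient r_k."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

y = simulate_ma1(0.5, 2000)
print("theoretical r_1:", ma1_lag1_autocorr(0.5))
print("sample r_1:", round(autocorr(y, 1), 3))
print("sample r_2:", round(autocorr(y, 2), 3))
```

The sample values vary with the seed and the series length, so they will only approximate the theoretical ones.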
Reference: http://userwww.sfsu.edu/~efc/classes/biol710/meseries/TimeSeriesAnalysis.html