Introduction
Time series data have a natural temporal ordering. This makes time series
analysis distinct from other common data analysis problems, in which there is no
natural ordering of the observations (e.g. explaining people's wages by reference
to their education level, where the individuals' data could be entered in any order).
Time series analysis is also distinct from spatial data analysis where the
observations typically relate to geographical locations (e.g. accounting for house
prices by the location as well as the intrinsic characteristics of the houses). A time
series model will generally reflect the fact that observations close together in time
will be more closely related than observations further apart. In addition, time series
models will often make use of the natural one-way ordering of time so that values
for a given period will be expressed as deriving in some way from past values,
rather than from future values.
There are several types of data analysis available for time series, appropriate for different purposes:
General exploration
Description
Models for time series data can have many forms and represent
different stochastic processes. When modeling variations in the level of a process,
three broad classes of practical importance are the autoregressive (AR) models,
the integrated (I) models, and the moving average (MA) models. These three
classes depend linearly on previous data points. Combinations of these ideas
produce autoregressive moving average (ARMA) and autoregressive integrated
moving average (ARIMA) models. The autoregressive fractionally integrated
moving average (ARFIMA) model generalizes the former three. Extensions of
these classes to deal with vector-valued data are available under the heading of
multivariate time-series models and sometimes the preceding acronyms are
extended by including an initial "V" for "vector". An additional set of extensions of
these models is available for use where the observed time-series is driven by
some "forcing" time-series (which may not have a causal effect on the observed
series): the distinction from the multivariate case is that the forcing series may be
deterministic or under the experimenter's control. For these models, the acronyms
are extended with a final "X" for "exogenous".
Among other types of non-linear time series models, there are models to represent
the changes of variance over time (heteroskedasticity). These models are called
autoregressive conditional heteroskedasticity (ARCH) models, and the collection
comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here
changes in variability are related to, or predicted by, recent past values of the
observed series. This is in contrast to other possible representations of locally-
varying variability, where the variability might be modelled as being driven by a
separate time-varying process, as in a doubly stochastic model.
In recent work on model-free analyses, wavelet transform based methods (for
example locally stationary wavelets and wavelet decomposed neural networks)
have gained favor. Multiscale (often referred to as multiresolution) techniques
decompose a given time series, attempting to illustrate time dependence at
multiple scales.
Notation
Y = {Y_t : t ∈ T}, where T is the index set.
Conditions
There are two sets of conditions under which much of the theory is built:
Stationary process
Ergodicity
In most cases, where only a single time series exists to be analysed, the
variance of the e's is estimated by fitting a trend of the form x_t = at + b + e_t,
removing the fitted values at + b to leave the e's as residuals, and calculating
the variance of the e's from those residuals. This is often the only way of
estimating the variance of the e's.
One particular special case of great interest, the (global) temperature time
series, is known not to be homogeneous in time: apart from anything else, the
number of weather observations has (generally) increased with time, and thus
the error associated with estimating the global temperature from a limited set
of observations has decreased with time. This can be taken into account when
fitting a trend to the data, as described above.
Once we know the "noise" of the series, we can then assess the significance
of the trend by making the null hypothesis that the trend, a, is not significantly
different from 0. From the above discussion of trends in random data with
known variance, we know the distribution of trends to be expected from
random (trendless) data. If the calculated trend, a, is larger in magnitude than
the critical value, V, then the trend is deemed significantly different from zero
at significance level S.
It is harder to see a trend in a noisy time series. For example, if the true series is
0, 1, 2, 3, ... (rising by one each period), all plus some independent normally
distributed "noise" e of standard deviation E, and we have a sample series of
length 50, then if E = 0.1 the trend
will be obvious; if E = 100 the trend will probably be visible; but if E = 10000 the
trend will be buried in the noise.
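To make this concrete, here is a minimal sketch in Python (NumPy assumed; the slope, noise levels and series length are taken from the example above) that fits the trend x_t = at + b by least squares, estimates the variance of the e's from the residuals, and compares the fitted slope against its standard error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated series: true trend a = 1 per step, plus Gaussian noise of sd E.
n = 50
t = np.arange(n)
for E in (0.1, 100.0, 10000.0):
    x = 1.0 * t + rng.normal(scale=E, size=n)

    # Least-squares fit of x_t = a*t + b; the residuals estimate the e's.
    (a, b), *_ = np.linalg.lstsq(np.column_stack([t, np.ones(n)]), x, rcond=None)
    resid = x - (a * t + b)
    var_e = resid.var(ddof=2)  # two parameters (a, b) were fitted

    # Standard error of the slope; |a| much larger than se suggests a real trend.
    se_a = np.sqrt(var_e / ((t - t.mean()) ** 2).sum())
    print(f"E={E:>7}: a_hat={a: .3f}, se(a)={se_a:.3f}, |a|/se={abs(a)/se_a:.2f}")
```

The ratio |a|/se shrinks as E grows, so the fitted trend at E = 10000 is typically statistically indistinguishable from zero, matching the description above.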
Review Question
3. In what way does the trend neutralize the noise (seasonal, cyclical and
irregular oscillations)?
Time Series Analysis with Moving Average
Given a series of numbers and a fixed subset size, the moving average can be
obtained by first taking the average of the first subset. The fixed subset size is
then shifted forward, creating a new subset of numbers, which is averaged. This
process is repeated over the entire data series. The plot line connecting all the
(fixed) averages is the moving average. Thus, a moving average is not a single
number, but it is a set of numbers, each of which is the average of the
corresponding subset of a larger set of data points. A moving average may also
use unequal weights for each data value in the subset to emphasize particular
values in the subset.
When calculating successive values, a new value comes into the sum and an
old value drops out, meaning a full summation each time is unnecessary: for a
simple moving average of window size n, SMA_M = SMA_{M−1} + (p_M − p_{M−n})/n.
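As an illustration, here is a short Python sketch of a simple moving average computed with this incremental update (the sample data are invented):

```python
from collections import deque

def moving_average(data, n):
    """Simple moving average of window size n, updated incrementally:
    each step adds the newest value to the running sum and drops the
    oldest, so no full re-summation is needed."""
    window = deque()
    total = 0.0
    out = []
    for x in data:
        window.append(x)
        total += x
        if len(window) > n:
            total -= window.popleft()  # old value drops out of the sum
        if len(window) == n:
            out.append(total / n)
    return out

print(moving_average([1, 2, 3, 4, 5, 6], 3))  # [2.0, 3.0, 4.0, 5.0]
```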
In technical analysis there are various popular values for n, like 10 days,
40 days, or 200 days. The period selected depends on the kind of
movement one is concentrating on, such as short, intermediate, or long
term. In any case moving average levels are interpreted as support in a
rising market, or resistance in a falling market.
In all cases a moving average lags behind the latest data point, simply
from the nature of its smoothing. An SMA can lag to an undesirable
extent, and can be disproportionately influenced by old data points
dropping out of the average. This is addressed by giving extra weight to
more recent data points, as in the weighted and exponential moving
averages.
In some data acquisition systems, the data arrives in an ordered data stream and
the statistician would like to get the average of all of the data up until the current
data point. For example, an investor may want the average price of all of the stock
transactions for a particular stock up until the current time. As each new
transaction occurs, the average price at the time of the transaction can be
calculated for all of the transactions up to that point using the cumulative average.
This is the cumulative average, which is typically an unweighted average of the
sequence of i values x_1, ..., x_i up to the current time:

CA_i = (x_1 + x_2 + ⋯ + x_i) / i
The brute force method to calculate this would be to store all of the data and
calculate the sum and divide by the number of data points every time a new
data point arrived. However, it is possible to simply update the cumulative
average as a new value x_{i+1} becomes available, using the formula:

CA_{i+1} = CA_i + (x_{i+1} − CA_i) / (i + 1)
Thus the current cumulative average for a new data point is equal to the
previous cumulative average plus the difference between the latest data point
and the previous average divided by the number of points received so far.
When all of the data points arrive (i = N), the cumulative average will equal
the final average.
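A sketch of this update in Python (the price list is invented for illustration); each new point costs only one subtraction, one division and one addition:

```python
def cumulative_averages(stream):
    """Update the cumulative average in O(1) per point:
    CA_{i+1} = CA_i + (x_{i+1} - CA_i) / (i + 1)."""
    ca = 0.0
    for i, x in enumerate(stream, start=1):
        ca += (x - ca) / i
        yield ca

prices = [10.0, 11.0, 9.0, 12.0]
print(list(cumulative_averages(prices)))  # [10.0, 10.5, 10.0, 10.5]
```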
A weighted average is any average that has multiplying factors to give different
weights to different data points. Mathematically, the moving average is
the convolution of the data points with a moving average function; in technical
analysis, a weighted moving average (WMA) has the specific meaning of
weights that decrease arithmetically. In an n-day WMA the latest day has weight
n, the second latest n − 1, etc., down to one:

WMA_M = (n p_M + (n − 1) p_{M−1} + ⋯ + p_{M−n+1}) / (n + (n − 1) + ⋯ + 1)

The denominator is a triangle number equal to n(n + 1)/2.
[Figure: WMA weights for n = 15]
When calculating the WMA across successive values, it can be noted that the
difference between the numerators of WMA_{M+1} and WMA_M is
n p_{M+1} − p_M − ⋯ − p_{M−n+1}. If we denote the sum p_M + ⋯ + p_{M−n+1} by
Total_M, then

Total_{M+1} = Total_M + p_{M+1} − p_{M−n+1}
Numerator_{M+1} = Numerator_M + n p_{M+1} − Total_M
WMA_{M+1} = Numerator_{M+1} / (n(n + 1)/2)
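The following Python sketch implements the WMA with this incremental update (the sample prices are invented); Total_M and the numerator are carried between steps rather than recomputed:

```python
def weighted_moving_averages(prices, n):
    """n-day WMA with arithmetically decreasing weights n, n-1, ..., 1,
    updated incrementally:
    Numerator_{M+1} = Numerator_M + n*p_{M+1} - Total_M
    Total_{M+1}     = Total_M + p_{M+1} - p_{M-n+1}"""
    denom = n * (n + 1) / 2  # triangle number: n + (n-1) + ... + 1
    total = sum(prices[:n])  # plain sum of the current window
    numer = sum((i + 1) * p for i, p in enumerate(prices[:n]))
    out = [numer / denom]
    for m in range(n, len(prices)):
        numer += n * prices[m] - total
        total += prices[m] - prices[m - n]
        out.append(numer / denom)
    return out

print(weighted_moving_averages([1, 2, 3, 4, 5], 3))
# [2.333..., 3.333..., 4.333...]; e.g. (3*3 + 2*2 + 1*1)/6 = 14/6
```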
Review Question
1. Explain the term moving average. Why is this method more reliable than the
trend for analysing a time series?
Introduction
The simplest form of exponential smoothing is given by s_0 = x_0 and
s_n = αx_n + (1 − α)s_{n−1} for n > 0, where α is the smoothing factor,
0 < α < 1. Values of α close to one have less of a smoothing effect and give
greater weight to recent changes in the data, while values of α closer to zero
have a greater smoothing effect and are less responsive to recent changes. There is
no formally correct procedure for choosing α. Sometimes the statistician's
judgment is used to choose an appropriate factor. Alternatively, a statistical
technique may be used to optimize the value of α. For example, the method of
least squares might be used to determine the value of α for which the sum of
the squared one-step prediction errors (s_{n−1} − x_n)² is minimized.
Why is it "exponential"?
Simple exponential smoothing does not do well when there is a trend in the data.[1]
In such situations, double exponential smoothing can be used.
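A minimal Python sketch of simple exponential smoothing, together with a crude grid search for α minimizing the sum of squared one-step errors as described above (the data values are invented; a grid search is just one simple way to carry out the optimization):

```python
def exp_smooth(x, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_n = alpha*x_n + (1-alpha)*s_{n-1}."""
    s = [x[0]]
    for xn in x[1:]:
        s.append(alpha * xn + (1 - alpha) * s[-1])
    return s

def sse_one_step(x, alpha):
    """Sum of squared one-step prediction errors (s_{n-1} - x_n)^2."""
    s = exp_smooth(x, alpha)
    return sum((s[n - 1] - x[n]) ** 2 for n in range(1, len(x)))

data = [3.0, 5.0, 9.0, 20.0, 12.0, 17.0, 22.0, 23.0, 51.0, 41.0]
best = min((a / 100 for a in range(1, 100)), key=lambda a: sse_one_step(data, a))
print(best, exp_smooth(data, best)[-3:])
```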
Review Question
Given a time series of data X_t, the ARMA model is a tool for understanding and,
perhaps, predicting future values in this series. The model consists of two parts,
an autoregressive (AR) part and a moving average (MA) part. The model is usually
then referred to as the ARMA(p, q) model, where p is the order of the
autoregressive part and q is the order of the moving average part.
Autoregressive model
The notation AR(p) refers to the autoregressive model of order p. The AR(p)
model is written

X_t = c + φ_1 X_{t−1} + ⋯ + φ_p X_{t−p} + ε_t

where φ_1, ..., φ_p are the parameters of the model, c is a constant, and ε_t is
white noise.
Some constraints are necessary on the values of the parameters of this model
in order that the model remains stationary. For example, processes in the
AR(1) model with |φ1| ≥ 1 are not stationary.
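To illustrate the stationarity constraint, here is a small Python sketch (NumPy assumed; the parameter values are invented) that simulates an AR(1) process and shows how the variance behaves for |φ1| < 1 versus φ1 = 1:

```python
import numpy as np

def simulate_ar1(phi, c=0.0, n=500, seed=0):
    """Simulate X_t = c + phi * X_{t-1} + eps_t with unit-variance white noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = c + phi * x[t - 1] + eps[t]
    return x

# |phi| < 1: stationary, variance settles near 1/(1 - phi^2);
# phi = 1: a random walk, whose variance grows without bound.
for phi in (0.5, 0.9, 1.0):
    x = simulate_ar1(phi)
    print(f"phi={phi}: var(first half)={x[:250].var():.1f}, "
          f"var(second half)={x[250:].var():.1f}")
```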
Autoregressive moving average model
In some texts the models will be specified in terms of the lag operator L. In these
terms the AR(p) model is given by

ε_t = (1 − φ_1 L − φ_2 L² − ⋯ − φ_p L^p) X_t

or more concisely,

φ(L) X_t = ε_t
Fitting models
Applications
When one of the terms is zero, it is usual to drop AR, I or MA. For example,
an I(1) model is ARIMA(0,1,0), and an MA(1) model is ARIMA(0,0,1).
Definition
In these cases the ARIMA model can be viewed as a "cascade" of two models.
The first is non-stationary:

Y_t = (1 − L)^d X_t

while the second is wide-sense stationary:

φ(L) Y_t = θ(L) ε_t
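A small Python sketch of this cascade view (NumPy assumed; the AR coefficient is invented): an ARIMA(1,1,0)-style series is built by integrating a stationary AR(1), and applying (1 − L) once recovers the stationary part exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# The first difference Y_t follows a stationary AR(1);
# X_t is its cumulative sum (the "integration" step).
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.6 * y[t - 1] + rng.normal()
x = np.cumsum(y)

# Cascade view: differencing once undoes the integration, leaving the
# stationary series, which can then be fitted with an ordinary AR/ARMA model.
y_recovered = np.diff(x)
print(np.allclose(y_recovered, y[1:]))  # True: differencing inverts the cumsum
```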
Description
There are algorithms, not discussed here, for estimating the partial autocorrelation
based on the sample autocorrelations. These algorithms derive from the exact
theoretical relation between the partial autocorrelation function and the
autocorrelation function.
Partial autocorrelation plots (Box and Jenkins, pp. 64–65, 1970) are a commonly
used tool for model identification in Box-Jenkins models. Specifically, partial
autocorrelations are useful in identifying the order of an autoregressive model. The
partial autocorrelation of an AR(p) process is zero at lag p+1 and greater. If the
sample autocorrelation plot indicates that an AR model may be appropriate, then
the sample partial autocorrelation plot is examined to help identify the order. One
looks for the point on the plot where the partial autocorrelations for all higher lags
are essentially zero. Placing on the plot an indication of the sampling uncertainty
of the sample PACF is helpful for this purpose: this is usually constructed on the
basis that the true value of the PACF, at any given positive lag, is zero. This can
be formalised as described below.
An approximate test that a given partial correlation is zero (at a 5% significance
level) is given by comparing the sample partial autocorrelations against the critical
region with upper and lower limits given by ±1.96/√n, where n is the record length
(number of points) of the time-series being analysed. This approximation relies on
the assumption that the record length is moderately large (say n>30) and that the
underlying process has a multivariate normal distribution.
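As a sketch of the procedure described above, the following Python code (NumPy assumed; the AR(2) coefficients are invented) estimates the sample autocorrelations, derives the partial autocorrelations via the Durbin-Levinson recursion (one of the algorithms alluded to above), and flags lags outside the ±1.96/√n limits; for an AR(2) process the flagged lags should essentially stop after lag 2:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations rho_0..rho_nlags of a series x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([1.0] +
                    [np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)])

def pacf_durbin_levinson(acf):
    """Partial autocorrelations from the autocorrelations via the
    Durbin-Levinson recursion: phi_{k,k} is the PACF at lag k."""
    nlags = len(acf) - 1
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, nlags + 1):
        if k == 1:
            phi_k = acf[1]
            phi = np.array([phi_k])
        else:
            num = acf[k] - np.sum(phi_prev * acf[k - 1:0:-1])
            den = 1.0 - np.sum(phi_prev * acf[1:k])
            phi_k = num / den
            phi = np.append(phi_prev - phi_k * phi_prev[::-1], phi_k)
        pacf[k] = phi_k
        phi_prev = phi
    return pacf

# Simulate an AR(2) process; its PACF should cut off after lag 2.
rng = np.random.default_rng(2)
n = 1000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.normal()

pacf = pacf_durbin_levinson(sample_acf(x, 10))
bound = 1.96 / np.sqrt(n)  # approximate 5% limits, +/- 1.96/sqrt(n)
for k in range(1, 11):
    flag = "*" if abs(pacf[k]) > bound else " "
    print(f"lag {k:2d}: {pacf[k]: .3f} {flag}")
```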
Review Questions