
Unit III

Time Series Analysis


Lesson 6

Introduction

A time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily
closing value of the Dow Jones index or the annual flow volume of the Nile River
at Aswan. Time series analysis comprises methods for analyzing time series
data in order to extract meaningful statistics and other characteristics of the
data. Time series forecasting is the use of a model to forecast future events
based on known past events: to predict data points before they are measured. An
example of time series forecasting in econometrics is predicting the opening price
of a stock based on its past performance. Time series are very frequently plotted
via line charts.

Time series data have a natural temporal ordering. This makes time series
analysis distinct from other common data analysis problems, in which there is no
natural ordering of the observations (e.g. explaining people's wages by reference
to their education level, where the individuals' data could be entered in any order).
Time series analysis is also distinct from spatial data analysis where the
observations typically relate to geographical locations (e.g. accounting for house
prices by the location as well as the intrinsic characteristics of the houses). A time
series model will generally reflect the fact that observations close together in time
will be more closely related than observations further apart. In addition, time series
models will often make use of the natural one-way ordering of time so that values
for a given period will be expressed as deriving in some way from past values,
rather than from future values.
There are several types of data analysis available for time series which are
appropriate for different purposes.

General exploration

 Graphical examination of data series
 Autocorrelation analysis to examine serial dependence (see the sketch below this list)
 Spectral analysis to examine cyclic behaviour which need not be related to seasonality; for example, sun spot activity varies over 11-year cycles [1][2]
 Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity.
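
As a brief illustration of these exploratory steps, the following Python sketch (plain NumPy; the synthetic series, the lag range and the helper name sample_acf are illustrative assumptions, not part of the text above) computes a sample autocorrelation function and a crude periodogram for a series with an 11-point cycle:

import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic series: an 11-point cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 11) + 0.3 * rng.standard_normal(len(t))

acf_values = sample_acf(x, max_lag=30)          # serial dependence by lag
power = np.abs(np.fft.rfft(x - x.mean())) ** 2  # crude periodogram
freqs = np.fft.rfftfreq(len(x), d=1.0)
dominant_period = 1.0 / freqs[np.argmax(power[1:]) + 1]
print(acf_values[:12])
print("dominant period ~", round(dominant_period, 1))   # close to 11

The autocorrelation peaks at multiples of the cycle length, while the periodogram picks out the dominant period directly.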

Description

 Separation into components representing trend, seasonality, slow and fast variation, and cyclical/irregular variation: see decomposition of time series (see the sketch below this list)
 Simple properties of marginal distributions
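
A minimal sketch of such a decomposition, assuming an additive model and a known seasonal period (both assumptions, not stated above), is given below. Libraries such as statsmodels offer a ready-made seasonal_decompose; the plain-NumPy version simply makes the steps explicit and ignores edge effects.

import numpy as np

def classical_additive_decompose(x, period):
    """Rough split of x into trend + seasonal + residual (additive model)."""
    x = np.asarray(x, dtype=float)
    # Trend: centered moving average over one full period (edges are biased).
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")
    detrended = x - trend
    # Seasonal: average detrended value at each position in the cycle, centered.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = seasonal - seasonal.mean()
    seasonal = np.tile(seasonal, len(x) // period + 1)[:len(x)]
    residual = x - trend - seasonal
    return trend, seasonal, residual

# Illustrative monthly-style series: linear trend + 12-point seasonality + noise.
rng = np.random.default_rng(1)
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.2 * rng.standard_normal(len(t))
trend, seasonal, resid = classical_additive_decompose(x, period=12)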

Prediction and forecasting

 Fully-formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future
 Simple or fully-formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting).
Models

Models for time series data can have many forms and represent
different stochastic processes. When modeling variations in the level of a process,
three broad classes of practical importance are the autoregressive (AR) models,
the integrated (I) models, and the moving average (MA) models. These three
classes depend linearly on previous data points. Combinations of these ideas
produce autoregressive moving average (ARMA) and autoregressive integrated
moving average (ARIMA) models. The autoregressive fractionally integrated
moving average (ARFIMA) model generalizes the former three. Extensions of
these classes to deal with vector-valued data are available under the heading of
multivariate time-series models and sometimes the preceding acronyms are
extended by including an initial "V" for "vector". An additional set of extensions of
these models is available for use where the observed time-series is driven by
some "forcing" time-series (which may not have a causal effect on the observed
series): the distinction from the multivariate case is that the forcing series may be
deterministic or under the experimenter's control. For these models, the acronyms
are extended with a final "X" for "exogenous".

Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. However, more importantly, empirical investigations can indicate the advantage of using predictions derived from non-linear models over those from linear models.

Among other types of non-linear time series models, there are models to represent
the changes of variance over time. These models are called autoregressive conditional heteroskedasticity (ARCH) models, and the collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here
changes in variability are related to, or predicted by, recent past values of the
observed series. This is in contrast to other possible representations of locally-
varying variability, where the variability might be modelled as being driven by a
separate time-varying process, as in a doubly stochastic model.
In recent work on model-free analyses, wavelet transform based methods (for
example locally stationary wavelets and wavelet decomposed neural networks)
have gained favor. Multiscale (often referred to as multiresolution) techniques
decompose a given time series, attempting to illustrate time dependence at
multiple scales.

Notation

A number of different notations are in use for time-series analysis:

X = {X1, X2, ...}

is a common notation which specifies a time series X that is indexed by the natural numbers. Another common notation is

Y = {Yt : t ∈ T},

where T is the index set.

Conditions

There are two sets of conditions under which much of the theory is built:

 Stationary process
 Ergodicity

However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order stationarity. Both models and
applications can be developed under each of these conditions, although the
models in the latter case might be considered as only partly specified.

In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis, which makes use of a time–frequency representation of a time-series or signal [4].
Review Questions

Give a general explanation of Time Series. Elucidate the models in Time Series Analysis.
Lesson 7

Time Series Analysis

Trend in Time Series

To analyse a (time) series of data, we assume that it may be represented as trend plus noise, where the noise combines seasonal variations, cyclical variations and irregular oscillations:

yt = a t + b + et

where a and b are (usually unknown) constants and the e's are independent
randomly distributed "errors". Unless something special is known about
the e's, they will be assumed to have a normal distribution. It is simplest if
the e's all have the same distribution, but if not (if some have higher variance,
meaning that those data points are effectively less certain) then this can be
taken into account during the least squares fitting, by weighting each point by
the inverse of the variance of that point.

In most cases, where only a single time series exists to be analysed, the
variance of the e's is estimated by fitting a trend, thus allowing a t + b to be
removed and leaving the e's as residuals, and calculating the variance of
the e's from the residuals — this is often the only way of estimating the
variance of the e's.
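
As a rough sketch of this procedure (the function and variable names are illustrative, and ordinary unweighted least squares is assumed), the trend a t + b can be fitted and the variance of the e's estimated from the residuals like so:

import numpy as np

def fit_trend_and_noise(y):
    """Fit y ≈ a*t + b by least squares; return slope, intercept, residual variance."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    a, b = np.polyfit(t, y, deg=1)        # slope and intercept of the trend line
    residuals = y - (a * t + b)           # the e's, once the trend a t + b is removed
    noise_var = residuals.var(ddof=2)     # two parameters (a, b) were estimated
    return a, b, noise_var

Where points have known, differing variances, a weighted fit (for instance via np.polyfit's w argument, or any weighted regression) could replace the plain fit above.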

One particular special case of great interest, the (global) temperature time
series, is known not to be homogeneous in time: apart from anything else, the
number of weather observations has (generally) increased with time, and thus
the error associated with estimating the global temperature from a limited set
of observations has decreased with time. In fitting a trend to this data, this can
be taken into account, as described above.
Once we know the "noise" of the series, we can then assess the significance
of the trend by making the null hypothesis that the trend, a, is not significantly
different from 0. From the above discussion of trends in random data with
known variance, we know the distribution of trends to be expected from
random (trendless) data. If the calculated trend, a, is larger than the critical value, V, then the trend is deemed significantly different from zero at significance level S.

Variations in time series with example

It is harder to see a trend in a noisy time series. For example, if the true series rises steadily as 0, 1, 2, 3, …, all plus some independent normally distributed "noise" e of standard deviation E, and we have a sample series of length 50, then if E = 0.1 the trend will be obvious; if E = 100 the trend will probably be visible; but if E = 10000 the trend will be buried in the noise.
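
A small simulation of this idea (the random seed is arbitrary; the three noise levels match the values quoted above) could look like:

import numpy as np

rng = np.random.default_rng(42)
t = np.arange(50)                        # true series 0, 1, 2, ..., 49
for E in (0.1, 100.0, 10000.0):
    y = t + E * rng.standard_normal(len(t))
    slope, intercept = np.polyfit(t, y, deg=1)
    print(f"noise std {E:>8}: estimated slope = {slope:.3f} (true slope 1)")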

If we consider a concrete example, the global surface temperature record of the past 140 years as presented by the IPCC, then the interannual variation is about 0.2 °C and the trend about 0.6 °C over 140 years, with 95% confidence limits of 0.2 °C (by coincidence, about the same value as the interannual variation). Hence the trend is statistically significantly different from 0.

Review Question

1. Elucidate the trend and its modelling in time series.

2. What do you understand by noise (errors/variations) in a time series? How do you explain these oscillations?

3. In what way does fitting a trend neutralize the noise (seasonal, cyclical and irregular oscillations)?
Time Series Analysis with Moving Average

A moving average, also called a rolling average, rolling mean or running average, is a type of finite impulse response filter used to analyze a set of data
points by creating a series of averages of different subsets of the full data set.

Given a series of numbers and a fixed subset size, the moving average can be
obtained by first taking the average of the first subset. The fixed subset size is
then shifted forward, creating a new subset of numbers, which is averaged. This
process is repeated over the entire data series. The plot line connecting all the
(fixed) averages is the moving average. Thus, a moving average is not a single
number, but it is a set of numbers, each of which is the average of the
corresponding subset of a larger set of data points. A moving average may also
use unequal weights for each data value in the subset to emphasize particular
values in the subset.

A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. The threshold
between short-term and long-term depends on the application, and the parameters
of the moving average will be set accordingly. For example, it is often used
in technical analysis of financial data, like stock prices, returns or trading volumes.
It is also used in economics to examine gross domestic product, employment or
other macroeconomic time series. Mathematically, a moving average is a type
of convolution and so it is also similar to the low-pass filter used in signal
processing. When used with non-time series data, a moving average simply acts
as a generic smoothing operation without any specific connection to time, although
typically some kind of ordering is implied.

Simple moving average

A simple moving average (SMA) is the unweighted mean of the previous n data points. For example, a 10-day simple moving average of closing price is the mean of the previous 10 days' closing prices. If those prices are pM, pM−1, …, pM−n+1, then the formula is

SMAM = (pM + pM−1 + … + pM−n+1) / n

When calculating successive values, a new value comes into the sum and an old value drops out, meaning a full summation each time is unnecessary:

SMAM+1 = SMAM + (pM+1 − pM−n+1) / n
In technical analysis there are various popular values for n, like 10 days,
40 days, or 200 days. The period selected depends on the kind of
movement one is concentrating on, such as short, intermediate, or long
term. In any case moving average levels are interpreted as support in a
rising market, or resistance in a falling market.

In all cases a moving average lags behind the latest data point, simply
from the nature of its smoothing. An SMA can lag to an undesirable
extent, and can be disproportionately influenced by old data points
dropping out of the average. This is addressed by giving extra weight to
more recent data points, as in the weighted and exponential moving
averages.

One characteristic of the SMA is that if the data have a periodic fluctuation, then applying an SMA of that period will eliminate that
variation (the average always containing one complete cycle). But a
perfectly regular cycle is rarely encountered in economics or finance. [1]

For a number of applications it is advantageous to avoid the shifting induced by using only 'past' data. Hence a central moving average can
be computed, using both 'past' and 'future' data. The 'future' data in this
case are not predictions, but merely data obtained after the time at which
the average is to be computed.
Cumulative moving average

The cumulative moving average is also frequently called a running average or a long running average, although the term running average is also used as a synonym for a moving average. Here we use the term cumulative moving average, or simply cumulative average, since this term is more descriptive and unambiguous.

In some data acquisition systems, the data arrives in an ordered data stream and
the statistician would like to get the average of all of the data up until the current
data point. For example, an investor may want the average price of all of the stock
transactions for a particular stock up until the current time. As each new
transaction occurs, the average price at the time of the transaction can be
calculated for all of the transactions up to that point using the cumulative average. This is typically an unweighted average of the sequence of i values x1, …, xi up to the current time:

CAi = (x1 + x2 + … + xi) / i

The brute force method to calculate this would be to store all of the data and calculate the sum and divide by the number of data points every time a new data point arrived. However, it is possible simply to update the cumulative average as a new value xi+1 becomes available, using the formula:

CAi+1 = CAi + (xi+1 − CAi) / (i + 1)

where CA0 can be taken to be equal to 0.

Thus the current cumulative average for a new data point is equal to the
previous cumulative average plus the difference between the latest data point
and the previous average divided by the number of points received so far.
When all of the data points arrive (i = N), the cumulative average will equal
the final average.

The derivation of the cumulative average formula is straightforward. Using

x1 + x2 + … + xi = i · CAi

and similarly for i + 1, it is seen that

x1 + x2 + … + xi + xi+1 = (i + 1) · CAi+1

Solving this equation for CAi+1 results in:

CAi+1 = (xi+1 + i · CAi) / (i + 1) = CAi + (xi+1 − CAi) / (i + 1)
Weighted moving average

A weighted average is any average that has multiplying factors to give different
weights to different data points. Mathematically, the moving average is
the convolution of the data points with a moving average function; in technical
analysis, a weighted moving average (WMA) has the specific meaning of
weights that decrease arithmetically. In an n-day WMA the latest day has weight n, the second latest n − 1, etc., down to one:

WMAM = (n pM + (n − 1) pM−1 + … + 2 pM−n+2 + pM−n+1) / (n + (n − 1) + … + 2 + 1)

[Figure: WMA weights, n = 15]

The denominator is a triangle number and can be easily computed as n(n + 1)/2.

When calculating the WMA across successive values, it can be noted that the difference between the numerators of WMAM+1 and WMAM is n pM+1 − pM − … − pM−n+1. If we denote the sum pM + … + pM−n+1 by TotalM, then

TotalM+1 = TotalM + pM+1 − pM−n+1

NumeratorM+1 = NumeratorM + n pM+1 − TotalM

WMAM+1 = NumeratorM+1 / (n(n + 1)/2)
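
A brief Python sketch of an n-day WMA with arithmetically decreasing weights (names and sample values are illustrative):

def weighted_moving_average(prices, n):
    """n-day WMA: latest price gets weight n, next latest n-1, ..., oldest 1."""
    denominator = n * (n + 1) / 2             # triangle number
    out = []
    for i in range(n - 1, len(prices)):
        window = prices[i - n + 1:i + 1]       # oldest ... latest
        numerator = sum((k + 1) * p for k, p in enumerate(window))
        out.append(numerator / denominator)
    return out

print(weighted_moving_average([10, 11, 12, 13], n=3))  # [11.33..., 12.33...]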

Review Question

1. Explain the term moving average. Why is this method more reliable than a fitted trend for analyzing a time series?

2. Explain following terminologies:

a. Cumulative Moving Average,

b. Weighted Moving Average,

c. Centered Moving Average


Lesson 8

Time Series Smoothing

Introduction

Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time
series data themselves are a sequence of observations. The observed
phenomenon may be an essentially random process, or it may be an orderly,
but noisy, process. Whereas in the simple moving average the past observations
are weighted equally, exponential smoothing assigns exponentially decreasing
weights over time.

Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The raw data sequence is often represented by {xt}, and the output of the exponential smoothing algorithm is commonly written as {st}, which may be regarded as our best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulas

s1 = x0
st = α xt−1 + (1 − α) st−1,  t > 1

where α is the smoothing factor, and 0 < α < 1.

The exponential moving average

The simplest form of exponential smoothing is given by the formula

st = α xt−1 + (1 − α) st−1

where α is the smoothing factor, and 0 < α < 1. In other words, the smoothed
statistic st is a simple weighted average of the previous observation xt-1 and
the previous smoothed statistic st−1. The term smoothing factor applied to α
here is something of a misnomer, as larger values of α actually reduce the
level of smoothing. In the limiting case with α = 1 the output series is just the
same as the original series. Simple exponential smoothing is easily applied,
and it produces a smoothed statistic as soon as two observations are
available.

Values of α close to one have less of a smoothing effect and give greater
weight to recent changes in the data, while values of α closer to zero have a
greater smoothing effect and are less responsive to recent changes. There is
no formally correct procedure for choosing α. Sometimes the statistician's
judgment is used to choose an appropriate factor. Alternatively, a statistical
technique may be used to optimize the value of α. For example, the method of least squares might be used to determine the value of α for which the sum of the quantities (sn−1 − xn−1)^2 is minimized.
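
The following Python sketch (a rough illustration; the grid search over α and the variable names are assumptions) applies simple exponential smoothing and picks α by minimizing the sum of squared one-step errors:

import numpy as np

def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s[t] = alpha*x[t-1] + (1-alpha)*s[t-1]."""
    x = np.asarray(x, dtype=float)
    s = np.empty(len(x))
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t - 1] + (1 - alpha) * s[t - 1]
    return s

def best_alpha(x, candidates=np.linspace(0.01, 0.99, 99)):
    """Pick alpha minimizing the sum of squared one-step-ahead errors."""
    def sse(a):
        s = exponential_smoothing(x, a)
        return np.sum((s[1:] - np.asarray(x, dtype=float)[1:]) ** 2)  # s[t] forecasts x[t]
    return min(candidates, key=sse)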

This simple form of exponential smoothing is also known as an "exponentially weighted moving average". Technically it can also be classified as an autoregressive integrated moving average, ARIMA(0,1,1), model with no constant term [2].

Why is it "exponential"?

By direct substitution of the defining equation for simple exponential smoothing back into itself we find that

st = α xt−1 + (1 − α) st−1
   = α xt−1 + α(1 − α) xt−2 + (1 − α)^2 st−2
   = α xt−1 + α(1 − α) xt−2 + α(1 − α)^2 xt−3 + …

In other words, as time passes the smoothed statistic st becomes the weighted average of a greater and greater number of the past
observations xt−n, and the weights assigned to previous observations are in
general proportional to the terms of the geometric progression
{1, (1 − α), (1 − α)^2, (1 − α)^3, …}. A geometric progression is the discrete
version of an exponential function, so this is where the name for this
smoothing method originated.

Double exponential smoothing

Simple exponential smoothing does not do well when there is a trend in the data [1]. In such situations, double exponential smoothing can be used.

Again, the raw data sequence of observations is represented by {xt}, beginning at time t = 0. We use {st} to represent the smoothed value for time t, and {bt} is our best estimate of the trend at time t. The output of the algorithm is now written as Ft+m, an estimate of the value of x at time t+m, m > 0, based on the raw data up to time t. Double exponential smoothing is given by the formulas [1]

s0 = x0

and, for t > 0,

st = α xt + (1 − α)(st−1 + bt−1)
bt = β (st − st−1) + (1 − β) bt−1

with the m-step-ahead forecast

Ft+m = st + m bt

where α is the data smoothing factor, 0 < α < 1, β is the trend smoothing factor, 0 < β < 1, and b0 is taken as (xn−1 − x0)/(n − 1) for some n > 1.
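
A compact Python sketch of these recursions (the initialisation choices, sample data and names are illustrative assumptions):

def double_exponential_smoothing(x, alpha, beta, m=1):
    """Holt-style double exponential smoothing; returns the m-step forecast F[t+m]."""
    s, b = x[0], x[1] - x[0]           # simple initial level and trend estimates
    for t in range(1, len(x)):
        s_prev = s
        s = alpha * x[t] + (1 - alpha) * (s + b)
        b = beta * (s - s_prev) + (1 - beta) * b
    return s + m * b                   # forecast m steps beyond the last observation

print(double_exponential_smoothing([3, 5, 7, 9, 11], alpha=0.5, beta=0.5, m=1))  # 13.0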

Review Question

What do you understand by smoothing? Elucidate.


Autoregressive moving average (ARMA) models

ARMA models, also called Box-Jenkins models after the iterative Box-Jenkins methodology usually used to estimate them, are typically applied to autocorrelated time series data.

Given a time series of data Xt, the ARMA model is a tool for understanding and,
perhaps, predicting future values in this series. The model consists of two parts,
an autoregressive (AR) part and a moving average (MA) part. The model is usually
then referred to as the ARMA(p,q) model where p is the order of the
autoregressive part and q is the order of the moving average part.

Autoregressive model

The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written

Xt = c + φ1 Xt−1 + φ2 Xt−2 + … + φp Xt−p + εt

where φ1, …, φp are the parameters of the model, c is a constant and εt is white noise. The constant term is omitted by many authors for simplicity.

An autoregressive model is essentially an all-pole infinite impulse response filter with some additional interpretation placed on it.

Some constraints are necessary on the values of the parameters of this model
in order that the model remains stationary. For example, processes in the
AR(1) model with |φ1| ≥ 1 are not stationary.
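
For intuition, a small simulation sketch (the coefficient values, seed and length are arbitrary illustrative choices) contrasts a stationary AR(1) with |φ1| < 1 against an explosive one with |φ1| > 1:

import numpy as np

def simulate_ar1(phi, c=0.0, n=200, seed=0):
    """Simulate X[t] = c + phi*X[t-1] + white noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = c + phi * x[t - 1] + rng.standard_normal()
    return x

print(np.std(simulate_ar1(0.5)), np.std(simulate_ar1(1.05)))  # stationary vs explosive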
Autoregressive moving average model

The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving average terms. This model contains the AR(p) and MA(q) models:

Xt = c + εt + φ1 Xt−1 + … + φp Xt−p + θ1 εt−1 + … + θq εt−q

Specification in terms of lag operator

In some texts the models are specified in terms of the lag operator L. In these terms the AR(p) model is given by

εt = (1 − φ1 L − φ2 L^2 − … − φp L^p) Xt = φ(L) Xt

where φ represents the polynomial

φ(L) = 1 − φ1 L − φ2 L^2 − … − φp L^p

The MA(q) model is given by

Xt = (1 + θ1 L + θ2 L^2 + … + θq L^q) εt = θ(L) εt

where θ represents the polynomial

θ(L) = 1 + θ1 L + θ2 L^2 + … + θq L^q

Finally, the combined ARMA(p, q) model is given by

(1 − φ1 L − … − φp L^p) Xt = (1 + θ1 L + … + θq L^q) εt

or more concisely,

φ(L) Xt = θ(L) εt

Fitting models

ARMA models in general can, after choosing p and q, be fitted by least squares regression to find the values of the parameters which minimize the error
term. It is generally considered good practice to find the smallest values of p and q
which provide an acceptable fit to the data. For a pure AR model the Yule-Walker
equations may be used to provide a fit.

Finding appropriate values of p and q in the ARMA(p,q) model can be facilitated by plotting the partial autocorrelation functions for an estimate of p, and likewise using
the autocorrelation functions for an estimate of q. Further information can be
gleaned by considering the same functions for the residuals of a model fitted with
an initial selection of p and q.
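
As an illustration of this workflow (assuming the statsmodels library is available; the functions used below come from its public API, but treat the snippet as a sketch rather than a prescription):

import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA

# Illustrative data: an AR(2)-like series.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

print(pacf(x, nlags=10))           # PACF cuts off after lag p for an AR(p) process
print(acf(x, nlags=10))            # ACF pattern helps suggest q

model = ARIMA(x, order=(2, 0, 0))  # ARMA(2, 0) fitted as ARIMA with d = 0
result = model.fit()
print(result.params)               # estimated AR coefficients and noise variance

The residuals of this first fit can then be examined with the same ACF/PACF tools, as described above.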

Applications

ARMA is appropriate when a system is a function of a series of unobserved shocks (the MA part) as well as its own behavior. For example, stock prices may be shocked by fundamental information as well as exhibiting technical trending and mean-reversion effects due to market participants. When looking at long-term data, econometricians tend to opt for an AR(p) model for simplicity.

Autoregressive Integrated Moving Average (ARIMA)

In time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the
model. These models are fitted to time series data either to better understand the
data or to predict future points in the series (forecasting). They are applied in some
cases where data show evidence of non-stationarity, where an initial differencing
step (corresponding to the "integrated" part of the model) can be applied to
remove the non-stationarity.
The model is generally referred to as an ARIMA (p, d, q) model where p, d,
and q are integers greater than or equal to zero and refer to the order of the
autoregressive, integrated, and moving average parts of the model respectively.
ARIMA models form an important part of the Box-Jenkins approach to time-series
modeling.

When one of the terms is zero, it's usual to drop AR, I or MA. For example,
an I(1) model is ARIMA(0,1,0), and a MA(1) model is ARIMA(0,0,1).

Definition

Given a time series of data Xt, where t is an integer index and the Xt are real numbers, an ARMA(p, q) model is given by

(1 − α1 L − α2 L^2 − … − αp L^p) Xt = (1 + θ1 L + θ2 L^2 + … + θq L^q) εt

where L is the lag operator, the αi are the parameters of the autoregressive part of the model, the θi are the parameters of the moving average part and the εt are error terms. The error terms εt are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

Assume now that the autoregressive polynomial (1 − α1 L − … − αp L^p) has a unit root of multiplicity d. Then it can be rewritten as

(1 − α1 L − … − αp L^p) = (1 − φ1 L − … − φp−d L^(p−d)) (1 − L)^d

An ARIMA(p, d, q) process expresses this polynomial factorisation property and is given by

(1 − φ1 L − … − φp L^p) (1 − L)^d Xt = (1 + θ1 L + … + θq L^q) εt

and thus can be thought of as a particular case of an ARMA(p+d, q) process whose autoregressive polynomial has some roots equal to unity. For this reason, no ARIMA model with d > 0 is wide-sense stationary.

Forecasts using ARIMA models

ARIMA models are used for observable non-stationary processes Xt that have some clearly identifiable trends:

 constant trend (i.e. a non-zero average) leads to d = 1
 linear trend (i.e. a linear growth behavior) leads to d = 2
 quadratic trend (i.e. a quadratic growth behavior) leads to d = 3

In these cases the ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:

Yt = (1 − L)^d Xt

while the second is wide-sense stationary:

(1 − φ1 L − … − φp L^p) Yt = (1 + θ1 L + … + θq L^q) εt

Now standard forecasting techniques can be formulated for the process Yt, and then (having a sufficient number of initial conditions) Xt can be forecast via suitable integration steps.
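
A minimal sketch of this cascade idea, assuming d = 1 and a simple AR(1) model for the differenced series (all names, data and model choices are illustrative):

import numpy as np

def arima_1_1_0_forecast(x, steps=5):
    """Difference once, fit AR(1) to Y[t] = X[t] - X[t-1], forecast, then integrate back."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)                                   # Y_t = (1 - L) X_t
    # Least-squares estimate of phi in Y[t] = phi * Y[t-1] + e[t].
    phi = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])
    forecasts, last_x, last_y = [], x[-1], y[-1]
    for _ in range(steps):
        last_y = phi * last_y                        # forecast of the stationary part
        last_x = last_x + last_y                     # integrate back to the X scale
        forecasts.append(last_x)
    return forecasts

print(arima_1_1_0_forecast([1, 2, 4, 7, 11, 16], steps=3))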
Partial autocorrelation function

In time series analysis, the partial autocorrelation function (PACF), or partial autocorrelation (PARCOR), plays an important role in data analyses aimed at identifying the extent of the lag in an autoregressive model. The use of this function was introduced as part of the Box-Jenkins approach to time series modeling, whereby plotting the partial autocorrelation function one can determine the appropriate lag p in an AR(p) model or in an extended ARIMA(p,d,q) model.

Description

Given a time series zt, the partial autocorrelation of lag k is the autocorrelation between zt and zt+k with the linear dependence of zt+1 through zt+k−1 removed; equivalently, it is the autocorrelation between zt and zt+k that is not accounted for by lags 1 to k − 1, inclusive.

There are algorithms, not discussed here, for estimating the partial autocorrelation
based on the sample autocorrelations. These algorithms derive from the exact
theoretical relation between the partial autocorrelation function and the
autocorrelation function.

Partial autocorrelation plots (Box and Jenkins, pp. 64–65, 1970) are a commonly
used tool for model identification in Box-Jenkins models. Specifically, partial
autocorrelations are useful in identifying the order of an autoregressive model. The
partial autocorrelation of an AR(p) process is zero at lag p+1 and greater. If the
sample autocorrelation plot indicates that an AR model may be appropriate, then
the sample partial autocorrelation plot is examined to help identify the order. One
looks for the point on the plot where the partial autocorrelations for all higher lags
are essentially zero. Placing on the plot an indication of the sampling uncertainty
of the sample PACF is helpful for this purpose: this is usually constructed on the
basis that the true value of the PACF, at any given positive lag, is zero. This can
be formalised as described below.
An approximate test that a given partial correlation is zero (at a 5% significance level) is given by comparing the sample partial autocorrelations against the critical region with upper and lower limits given by ±1.96/√n, where n is the record length (number of points) of the time-series being analysed. This approximation relies on the assumption that the record length is moderately large (say n > 30) and that the underlying process has a multivariate normal distribution.
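
A short sketch of this check (using the sample PACF from statsmodels, which is assumed to be available; any other PACF estimate would serve equally well):

import numpy as np
from statsmodels.tsa.stattools import pacf

def significant_pacf_lags(x, nlags=20):
    """Return the lags whose sample partial autocorrelation exceeds +/-1.96/sqrt(n)."""
    n = len(x)
    bound = 1.96 / np.sqrt(n)                 # approximate 5% critical limits
    values = pacf(x, nlags=nlags)             # values[0] is lag 0, always 1
    return [lag for lag in range(1, nlags + 1) if abs(values[lag]) > bound]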

Review Questions

1. Explain in detail the Time Series Modeling.

2. Explain the process by which the different components of a time series are neutralized.

3. Explain the smoothing techniques. Elucidate ARMA and ARIMA.
