TIME-SERIES ANALYSIS AND FORECASTING

At the Institute for Advanced Studies, Vienna

from March 22nd to April 2nd, 1993

Lecturer: D.S.G. Pollock

Queen Mary and Westﬁeld College,

The University of London

This course is concerned with the methods of time-series modelling which are

applicable in econometrics and throughout a wide range of disciplines in the

physical and social sciences. The course is for nonspecialists who may be inter-

ested in pursuing this topic as an adjunct to their other studies and who might

envisage employing the techniques of time-series analysis in empirical enquiries

within the context of their own disciplines.

The course is mathematically self-contained in the sense that the requisite

results are presented either in the lectures themselves or in the accompanying

text. The techniques of the frequency domain and the time domain are given

an equal emphasis in this course.

Week 1

1 Trends in Time Series

2 Cycles in Time Series

3 Models and Methods of Time-Series Analysis

4 Time-Series Analysis in the Frequency Domain

5 Linear Stochastic Models

Week 2

6 State-Space Analysis and Structural Time-Series Models

7 Forecasting with ARIMA Models

8 Identiﬁcation and Estimation of ARIMA Models

9 Identiﬁcation and Estimation in the Frequency Domain

10 Seasonality and Linear Filtering

In addition, there will be a public Lecture on the topic of The Methods of

Time-Series Analysis which is to take place on ***** in ***** at *****. This

lecture will give a broad overview of the mathematical themes of time-series

analysis and of the historical development of the subject; and it is intended for

an audience with no signiﬁcant knowledge of the subject.

LECTURES IN TIME-SERIES ANALYSIS AND FORECASTING

by

D.S.G. Pollock

Queen Mary and Westﬁeld College,

The University of London

These two booklets contain some of the material of the courses

titled Methods of Time-Series Analysis and Economic Forecasting

which have been taught in the Department of Economics of Queen

Mary College in recent years. The material is presented in the form of

a series of ten lectures for a course given at the Institute for Advanced

Studies in Vienna titled A Short Course in Time-Series Analysis.

Book 1

1 Trends in Economic Time Series

2 Seasons and Cycles in Time Series

3 Models and Methods of Time-Series Analysis

4 Time-Series Analysis in the Frequency Domain

5 Linear Stochastic Models

Book 2

6 State-Space Analysis and Structural Time-Series Models

7 Forecasting with ARIMA Models

8 Identiﬁcation and Estimation of ARIMA Models

9 Identiﬁcation and Estimation in the Frequency Domain

10 Seasonality and Linear Filtering

THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westﬁeld College,

The University of London

This paper describes some of the principal themes of time-series analysis

and it gives an historical account of their development.

There are two distinct yet broadly equivalent modes of time-series anal-

ysis which may be pursued. On the one hand there are the time-domain

methods which have their origin in the classical theory of correlation; and

they lead inevitably towards the construction of structural or parametric

models of the autoregressive moving-average type. On the other hand are

the frequency-domain methods of spectral analysis which are based on an

extension of the methods of Fourier analysis.

The paper describes the developments which led to the synthesis of

the two branches of time-series analysis and it indicates how this synthesis

was achieved.

It remains true that the majority of time-series analysts operate prin-

cipally in one or other of the two domains. Such specialisation is often

inﬂuenced by the academic discipline to which the analyst adheres. How-

ever, it is clear that there are many advantages to be derived from pursuing

the two modes of analysis concurrently.

Address for correspondence:

D.S.G. Pollock

Department of Economics

Queen Mary College

University of London

Mile End Road

London E1 4NS

Tel: +44-71-975-5096

Fax: +44-71-975-5500

LECTURE 1

Trends in Economic Time Series

In many time series, broad movements can be discerned which evolve more

gradually than the other motions which are evident. These gradual changes are

described as trends and cycles. The changes which are of a transitory nature

are described as ﬂuctuations.

In some cases, the trend should be regarded as nothing more than the

accumulated eﬀect of the ﬂuctuations. In other cases, we feel that the trends

and the ﬂuctuations represent diﬀerent sorts of inﬂuences, and we are inclined

to decompose the time series into the corresponding components.

In economics, it is traditional to decompose time series into a variety of

components, some or all of which may be present in a particular instance. If

{Y_t} is the sequence of values of an economic index, then its generic element is liable to be expressed as

(1.1)    Y_t = T_t + C_t + S_t + ε_t,

where

T_t is the global trend,
C_t is a secular cycle,
S_t is the seasonal variation and
ε_t is an irregular component.

Many of the more prominent macroeconomic indicators are amenable to

a decomposition of the sort depicted above. One can imagine, for example, a

quarterly index of Gross National Product which appears to be following an

exponential growth trend {T_t}.

The growth trend might be obscured, to some extent, by a superimposed

cycle {C_t} with a period of roughly four and a half years, which happens to

correspond, more or less, to the average lifetime of the legislative assembly.

The reasons for this curious coincidence need not concern us here.

The ghost of an annual cycle {S_t} might also be apparent in the index;

and this could well be a reﬂection of the fact that some economic activities,


such as building construction, are signiﬁcantly aﬀected by the weather and by

the duration of sunlight.

When the foregoing components—the trend, the secular cycle and the sea-

sonal cycle—have been extracted from the index, the residue should correspond

to an irregular component {ε_t} for which no unique explanation can be offered.

This component ought to resemble a time series generated by a so-called sta-

tionary stochastic process. Such a series has the characteristic that any segment

of consecutive elements looks much like any other segment of the same duration,

regardless of the date at which it begins or ends.

If the residue follows a trend, or if it manifests a more or less regular

pattern, then it contains features which ought to have been attributed to the

other components; and we should set about the task of redeﬁning them.

There are two distinct purposes for which we might wish to eﬀect such

a decomposition. The ﬁrst purpose is to give a summary description of the

salient features of the time series. Thus, if we eliminate the irregular and

seasonal components from the series, we are left with an index which may give

a clearer picture of the more important features. This might help us to gain

an insight into the fundamental workings of the economic or social structure

which has generated the series.

The other purpose in decomposing the series is to predict its future values.

For each component of the time series, a particular method of prediction is ap-

propriate. By combining the separate predictions of the components, a forecast

can be derived which may be superior to one derived by a method which pays

no attention to the underlying structure of the time series.

Extracting the Trend

There are essentially two ways of extracting trends from a time series. The

ﬁrst way is to apply to the series a variety of so-called ﬁlters which annihilate

or nullify all of the components which are not regarded as trends.

A ﬁlter is a carefully crafted moving average which spans a number of data

points and which attributes a weight to each of them. The weights should sum

to unity to ensure that the ﬁlter does not systematically inﬂate or deﬂate the

values of the series. Thus, for example, the following moving average might

serve to eliminate the annual cycle from an economic series which is recorded

at quarterly intervals:

(1.2)    Ŷ_t = (1/16)(Y_{t+3} + 2Y_{t+2} + 3Y_{t+1} + 4Y_t + 3Y_{t−1} + 2Y_{t−2} + Y_{t−3}).
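The action of such a filter is easily demonstrated. The following Python fragment (an editorial sketch, not part of the original text, with a simulated series standing in for a real quarterly index) applies the weights of equation (1.2):

```python
import numpy as np

# A minimal sketch of the moving-average filter of equation (1.2).
# The weights sum to unity, so the filter neither inflates nor
# deflates the level of the series.
weights = np.array([1, 2, 3, 4, 3, 2, 1]) / 16.0

rng = np.random.default_rng(0)
t = np.arange(80)
y = 100 + 0.5 * t + 5 * np.sin(np.pi * t / 2) + rng.standard_normal(80)

# mode="valid" drops the three points at either end of the sample,
# where the filter would run off the data.
y_smooth = np.convolve(y, weights, mode="valid")
```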

Another ﬁlter with a wider span and a diﬀerent proﬁle of weights might serve

to eliminate the four-and-a-half-year cycle which is present in our imaginary

series of Gross National Product.


Finally a ﬁlter could be designed which smooths away the irregularities

of the index which defy systematic explanation. The order in which the three

ﬁlters are applied is immaterial; and what is left after they have been applied

should give a picture of the underlying trend {T_t} of the index.

Other collections of ﬁlters, applied in series, might serve to isolate the

other components {C_t} and {S_t} which are to be found in equation (1).

The process of ﬁltering is often a good way of deriving an index which rep-

resents the more important historical characteristics of the time series. How-

ever, it generates no model for the underlying trends; and it suggests no way

of predicting their future values.

The alternative way of extracting the trend from the index is to ﬁt some

function which is capable of adapting itself to whatever form the trend happens

to display. Diﬀerent functions are appropriate to diﬀerent forms of trend; and

some functions which analysts tend to favour seem almost always to be inappro-

priate. Once an analytic function has been ﬁtted to the series, it may be used

to provide extrapolative forecasts of the trend.

Polynomial Trends

Amongst the mathematical functions which suggest themselves as means

of modelling a trend is a pth-degree polynomial whose argument is the time

index t:

(1.3)    φ(t) = φ_0 + φ_1 t + ··· + φ_p t^p.

When there is no theory to specify a mathematical form for the trend, it

may be possible to approximate it by a polynomial of low degree. This notion

is suggested by the formal result that every analytic mathematical function can

be expanded as a power series, which is an indeﬁnite sum whose terms contain

rising powers of the argument. Thus the polynomial in t may be construed as

an approximation to an analytic function which is obtained by discarding all

but the leading terms of a power-series expansion.

There are also arguments from physics which suggest that ﬁrst-degree and

second-degree polynomials in t, which are linear and quadratic time trends in

other words, are common in the natural world. The thought occurs to us that

such trends might also arise in the social world.

According to a well-known dictum,

Every body continues in its state of rest or of uniform motion in a straight

line unless it is compelled to change that state by forces impressed upon it.

This is Newton's first law of motion. The kinematic equation for the distance

covered by a body moving with constant velocity in a straight line is

(1.4)    x = x_0 + ut,


where u is the uniform velocity, and x_0 represents the initial position of the body at time t = 0. This is nothing but a first-degree polynomial in t.

Newton’s second law of motion asserts that

The change of motion is proportional to the motive force impressed; and is

made in the direction of the straight line in which the force is impressed.

In modern language, this is expressed by saying that the acceleration of a

body along a straight line is proportional to the force which is applied in that

direction. The kinematic equation for the distance travelled under uniformly

accelerated rectilinear motion is

(1.5)    x = x_0 + u_0 t + (1/2)at²,

where u_0 is the velocity at time t = 0 and a is the constant acceleration due to

the motive force. This is just a quadratic in t.

A linear or a quadratic function may be appropriate if the trend in question

is monotonically increasing or decreasing. In other cases, polynomials of higher

degrees might be ﬁtted. Figure 1 is the result of ﬁtting a cubic function to an

economic time series by least-squares regression.

Figure 1. A cubic function fitted to data on meat consumption in the United States, 1919–1941.


It might be felt that there are salient features in the data which are not

captured by the cubic polynomial. In that case, the recourse might be to

increase the degree of the polynomial by one. The result will be a curve which

ﬁts the data more closely. Also, it will be found that one of the branches

of the polynomial—the left branch in this case—has changed direction. The

values found by extrapolating the quartic function backwards in time will diﬀer

radically from those found by extrapolating the cubic function.

In general, the eﬀect of altering the degree of the polynomial by one will

be to alter the direction of one or other of the branches of the ﬁtted function;

and, from the point of view of forecasting, this is a highly unsatisfactory cir-

cumstance. Another feature of a polynomial function is that its branches tend

to plus or minus inﬁnity with increasing rapidity as the argument increases or

decreases beyond a range of central values where the function has its stationary

points and its points of inflection. This might also be regarded as an undesirable

property for a function which is to be used in extrapolative forecasting.

Some care has to be taken in ﬁtting a polynomial time trend by the method

of least-squares regression. A straightforward procedure, which comes imme-

diately to mind, is to form a matrix X of regressors in which the generic row

[t^0, t, t^2, . . . , t^p] contains rising powers of the argument t. The annual data on meat consumption, for example, which are plotted in Figure 1, run from 1919 to 1941; and these dates might be taken as the initial and terminal values of t. In that case, there would be a vast difference in the values of the elements of the matrix X. For, whereas t^0 = 1 for all values of t = 1919, . . . , 1941, we should find that, when t = 1941, the value of t^3 is in excess of 7,300 million.

Clearly, such a disparity of numbers taxes the precision of the computer.

An obvious recourse is to recode the values of t. Thus, we might take

t = −11, . . . , 11 for the range of the argument. The change would aﬀect only the

value of the intercept term φ_0, which could be adjusted ex post. Unfortunately, such a recourse is not always adequate to ensure the numerical accuracy of the computation. The reason lies in the peculiarly ill-conditioned nature of the matrix (X′X)^{−1} of cross products.

In fact, a specialised procedure of polynomial regression is often called for

in which the functions t^0, t, . . . , t^p are replaced by a set of so-called orthogonal polynomials which give rise to vectors of regressors whose cross products are zero-valued. The estimated coefficients associated with these orthogonal polynomials can be converted into the coefficients φ_0, φ_1, . . . , φ_p of equation (3).
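The numerical issues can be illustrated with a short Python sketch, which is an editorial addition rather than part of the original lectures. The data below are simulated stand-ins for the meat-consumption series, and the Legendre basis is used as one convenient orthogonal basis, analogous to, though not identical with, the sample-specific orthogonal polynomials described in the text:

```python
import numpy as np

# Simulated stand-in for the 1919-1941 meat-consumption series.
rng = np.random.default_rng(1)
t = np.arange(1919, 1942)
y = 170 + 0.5 * (t - 1930) - 0.05 * (t - 1930) ** 2 + rng.standard_normal(t.size)

# Recoding the argument: centre the dates so that t runs over
# -11, ..., 11, which curbs the disparity in the powers of t.
tc = t - t.mean()
X = np.vander(tc, N=4, increasing=True)       # columns 1, t, t^2, t^3
phi, *_ = np.linalg.lstsq(X, y, rcond=None)   # cubic trend coefficients

# Better-conditioned alternative: fit in an orthogonal (Legendre)
# basis, then convert back to ordinary power-series coefficients.
leg = np.polynomial.legendre.Legendre.fit(t, y, deg=3)
phi_from_leg = leg.convert(kind=np.polynomial.Polynomial).coef
```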

Exponential and Logistic Trends

The notion of exponential or geometric growth is common in economics

where it is closely related to the idea of compound interest. Consider a ﬁnancial

asset with an annual rate of return of γ. The annual growth factor for an


investment of unit value is (1 +γ). If α units were invested at time t = 0, and

if the returns were compounded with the principal on an annual basis, then the

value of the investment at time t would be given by

(1.6)    y_t = α(1 + γ)^t.

An investment which is compounded twice a year has an annual growth

factor of (1 + γ/2)², and one which is compounded quarterly has a growth factor of (1 + γ/4)⁴. If an investment were compounded continuously, then its growth factor would be lim_{n→∞} (1 + γ/n)^n = e^γ. The value of the asset at time t

would be given by

(1.7)    y = αe^{γt};

and this is the equation for exponential growth.

The equation of exponential growth is a solution of the diﬀerential equation

(1.8)    dy/dt = γy.

The implication of the diﬀerential equation is that the absolute rate of growth

in y is proportional to the value already attained by y. It is equivalent to say

that the proportional rate of growth (1/y)(dy/dt) is constant.

An exponential growth trend can be fitted to observations y_1, . . . , y_n, sampled at regular intervals, by applying ordinary least-squares regression to the equation

(1.9)    ln y_t = ln α + γt + ε_t.

This is obtained by taking the logarithm of equation (7) and adding a distur-

bance term ε_t. An alternative parametrisation is obtained by setting λ = e^γ. Then the transformed growth equation becomes

(1.10)    ln y_t = ln α + (ln λ)t + ε_t,

and the geometric growth rate is λ − 1.
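A minimal Python sketch of the procedure of equation (1.9), with simulated data invented for the purpose, is as follows:

```python
import numpy as np

# Fit an exponential growth trend by regressing ln(y) on t,
# as in equation (1.9). The series y is simulated.
rng = np.random.default_rng(2)
t = np.arange(50, dtype=float)
y = 100.0 * np.exp(0.03 * t) * np.exp(0.05 * rng.standard_normal(t.size))

X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
ln_alpha, gamma = coef

alpha = np.exp(ln_alpha)       # initial level
lam = np.exp(gamma)            # growth factor, lambda = e^gamma
growth_rate = lam - 1.0        # geometric growth rate, lambda - 1
```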

Whereas unhindered exponential growth might well be a possibility for

certain monetary or ﬁnancial quantities, it is implausible to suggest that such

a process can be sustained for long when real resources are involved. Since real

resources are ﬁnite, we expect there to be upper limits to the levels which can

be attained by economic variables.

For an example of a trend with an upper bound, we might imagine a pro-

cess whereby the ownership of a consumer durable grows until the majority


of households or individuals are in possession of it. Good examples are pro-

vided by the sales of domestic electrical appliances such as fridges and colour

television sets.

Typically, when the new durable is introduced, the rate of sales is slow.

Then, as information about the durable, or experience of it, is spread amongst

consumers, the sales begin to accelerate. For a time, their cumulated total

might appear to follow an exponential growth path. Then come the ﬁrst signs

that the market is being saturated; and there is a point of inﬂection in the

cumulative curve where its second derivative—which is the rate of increase in

sales per period—passes from positive to negative. Eventually, as the level of

ownership approaches the saturation point, the rate of sales will decline to a

constant level, which may be at zero, if the good is wholly durable, or at a

small positive replacement rate if it is not.

It is very diﬃcult to specify the dynamics of a process such as the one we

have described whenever there are replacement sales to be taken into account.

The reason is that the replacement sales depend not only on the size of the

ownership of the durable goods but also upon the age of the stock of goods.

The latter is a function, at least in an early period, of the way in which sales

have grown at the outset. Often we have to be content with modelling only the

growth of ownership.

One of the simplest ways of modelling the growth of ownership is to employ

the so-called logistic curve. This classical device has its origins in the mathe-

matics of biology where it has been used to model the growth of a population

of animals in an environment with limited food resources.

Figure 2. The logistic function e^x/(1 + e^x) and its derivative. For large negative values of x, the function and its derivative are close. In the case of the exponential function e^x, they coincide for all values of x.


The simplest version of the function is given by

(1.11)    π(x) = 1/(1 + e^{−x}) = e^x/(1 + e^x).

The second expression comes from multiplying top and bottom of the ﬁrst

expression by e^x. The logistic curve varies between a value of zero, which is approached as x → −∞, and a value of unity, which is approached as x → +∞. At the mid point, where x = 0, the value of the function is π(0) = 1/2. These

characteristics can be understood easily in reference to the ﬁrst expression.

The alternative expression for the logistic curve also lends itself to an

interpretation. We may begin by noting that, for large negative values of x,

the term 1 + e^x, which is found in the denominator, is not significantly different from unity. Therefore, as x increases from such values towards zero, the logistic function closely resembles an exponential function. By the time x reaches zero, the denominator, with a value of 2, is already significantly affected by the term e^x. At that point, there is an inflection in the curve as the rate of increase in π

begins to decline. Thereafter, the rate of increase declines rapidly toward zero,

with the eﬀect that the value of π never exceeds unity.

The inverse mapping x = x(π) is easily derived. Consider

(1.12)    1 − π = (1 + e^x)/(1 + e^x) − e^x/(1 + e^x) = 1/(1 + e^x) = π/e^x.

This is rearranged to give

(1.13)    e^x = π/(1 − π),

whence the inverse function is found by taking natural logarithms:

(1.14)    x(π) = ln{π/(1 − π)}.

The logistic curve needs to be elaborated before it can be ﬁtted ﬂexibly

to a set of observations y_1, . . . , y_n tending to an upper asymptote. The general form of the function is

(1.15)    y(t) = γ/(1 + e^{−h(t)}) = γe^{h(t)}/(1 + e^{h(t)});    h(t) = α + βt.

Here γ is the upper asymptote of the function, which is the saturation level of

ownership in the example of the consumer durable. The parameters β and α


determine respectively the rate of ascent of the function and the mid point of

its ascent, measured on the time-axis.

It can be seen that

(1.16)    ln{y(t)/(γ − y(t))} = h(t).

Therefore, with the inclusion of a residual term, the equation for the generic

element of the sample is

(1.17)    ln{y_t/(γ − y_t)} = α + βt + e_t.

For a given value of γ, one may calculate the value of the dependent variable on

the LHS. Then the values of α and β may be found by least-squares regression.

The value of γ may also be determined according to the criterion of min-

imising the sum of squares of the residuals. A crude procedure would entail

running numerous regressions, each with a diﬀerent value for γ. The deﬁni-

tive value would be the one from the regression with the least residual sum of

squares. There are other procedures for ﬁnding the minimising value of γ of

a more systematic and eﬃcient nature which might be used instead. Amongst

these are the methods of Golden Section Search and Fibonacci Search which

are presented in many texts of numerical analysis.
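The crude grid-search procedure can be sketched in Python as follows; this is an editorial illustration, with invented data and an arbitrarily chosen range for the search over γ:

```python
import numpy as np

# Fit the logistic trend of equation (1.17): for a trial asymptote
# gamma, regress ln{y/(gamma - y)} on t; then choose gamma by a crude
# grid search over the residual sums of squares. The data are
# simulated, and clipped away from zero so the logarithm is defined.
rng = np.random.default_rng(3)
t = np.arange(1.0, 41.0)
y = 1.0 / (1.0 + np.exp(4.0 - 0.25 * t)) + 0.005 * rng.standard_normal(t.size)
y = np.clip(y, 1e-3, None)

X = np.column_stack([np.ones_like(t), t])

def rss_and_coef(gamma):
    z = np.log(y / (gamma - y))              # the LHS of (1.17)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.sum((z - X @ coef) ** 2), coef

grid = np.linspace(1.05 * y.max(), 2.0 * y.max(), 200)
gamma_hat = min(grid, key=lambda g: rss_and_coef(g)[0])
_, (alpha_hat, beta_hat) = rss_and_coef(gamma_hat)
```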

The objection may be raised that the domain of the logistic function is

the entire real line—which spans all of time from creation to eternity—whereas

the sales history of a consumer durable dates only from the time when it is

introduced to the market. The problem might be overcome by replacing the

time variable t in equation (15) by its logarithm and by allowing t to take only

nonnegative values. Then, whilst t ∈ [0, ∞), we still have ln(t) ∈ (−∞, ∞),

which is the entire domain of the logistic function.

Figure 3. The function y(t) = γ/(1 + exp{α − β ln(t)}) with γ = 1, α = 4 and β = 7. The positive values of t are the domain of the function.


There are many curves which will serve the purpose of modelling a sig-

moidal growth process. Their number is equal, at least, to the number of

theoretical probability density functions—for the corresponding (cumulative)

distribution functions rise monotonically from zero to unity in ways which are

suggestive of processes of bounded growth.

In fact, we do not need to have an analytic form for a cumulative function

before it can be ﬁtted to a growth process. It is enough to have a table of

values of a standardised form of the function. An example is provided by the

normal density function whose distribution function is regularly ﬁtted to data

points in the course of probit analysis. In this case, the ﬁtting involves ﬁnding

values for the location parameter µ and the dispersion parameter σ² by which

the standard normal function is converted into an arbitrary normal function.

Nowadays, there are eﬃcient procedures for numerical optimisation which can

accomplish such tasks with ease.

Flexible Trends

If the purpose of decomposing a time series is to form predictions of its

components, then it is important to obtain adequate representations of these

components at every point within the sample period. The device which is most

appropriate to the extrapolative forecasting of a trend is rarely the best means

of representing it within the sample. An extrapolation is usually based upon

a simple analytic function; and any attempt to make the function reﬂect the

local variations of the sample will endow it with global characteristics which

may aﬀect the forecasts adversely.

One way of modelling the local characteristics of a trend without prejudic-

ing its global characteristics is to use a segmented curve. In many applications,

it has been found that a curve with cubic polynomial segments is appropriate.

The segments must be joined in a way which avoids evident discontinuities. In

practice, the requirement is usually for continuous ﬁrst-order and second-order

derivatives. A curve whose segments are joined in this way is described as a

cubic spline.

A spline is a draughtsman’s tool which was once used in drawing smooth

curves. It is a thin ﬂexible piece of wood which was clamped to a series of

pins which were placed along the path of the curve which had to be described.

Some of the essential properties of a mathematical spline can be understood

by bearing the real spline in mind. The pins to which a draughtsman clamped

his spline correspond to the data points through which we might interpolate a

mathematical spline. The segments of the mathematical spline would be joined

at the data points.

The cubic spline becomes a device for modelling a trend when, instead of

passing through the data points, it is allowed, in the interests of smoothness,

to deviate from them. The Reinsch smoothing spline is ﬁtted by minimising


Figure 4. Cubic smoothing splines fitted to data on meat consumption in the United States, 1919–1941. The two panels show λ = 0.75 and λ = 0.125.


a criterion function which imposes both a penalty for deviating from the data

points and a penalty for excessive curvature in the segments. The measure

of curvature is based upon second derivatives, whilst the measure of deviation

is the sum of the squared distances of the points from the curve. A single

parameter λ governs the trade-oﬀ between the objectives of smoothness and

goodness of ﬁt.

As an analogy for the smoothing spline, one might think of attaching the

draughtsman’s spline to the pins by springs instead of by clamps. The precise

form of the curve would depend upon the stiﬀness of the spline and the forces

exerted by the springs. The degree of ﬂexibility of the spline corresponds to

the value of λ. The forces exerted by ordinary springs are proportional to their

extension; and, in this respect, the analogy, which requires the forces to be

proportional to the squares of their extensions, is imperfect.

Figure 4 shows the consequences of ﬁtting the smoothing spline to the data

on meat consumption which is also used in Figure 1 where a cubic polynomial

has been ﬁtted. It is a matter of judgment how the value of λ should be chosen

so as to reﬂect the trend.
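By way of illustration, the following sketch (an editorial addition, assuming SciPy is available) fits cubic smoothing splines of differing stiffness. SciPy's UnivariateSpline balances fit against smoothness through its parameter s, a bound on the residual sum of squares, which corresponds only loosely to the λ of the text:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Simulated stand-in for the meat-consumption series.
rng = np.random.default_rng(4)
t = np.arange(1919, 1942, dtype=float)
y = 160.0 + 10.0 * np.sin((t - 1919.0) / 4.0) + rng.standard_normal(t.size)

flexible = UnivariateSpline(t, y, k=3, s=5.0)   # follows the data closely
stiff = UnivariateSpline(t, y, k=3, s=50.0)     # closer to a simple trend

trend = stiff(t)    # fitted values of the smoother curve at the data points
```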

There are various ways in which the curve of a cubic spline may be ex-

trapolated to form forecasts of the trend. In normal circumstances, when the

ends of the spline are left free, the second derivatives are zero-valued and the

extrapolation is linear. However, it is possible to clamp the ends of the spline

in a way which imposes a value on their ﬁrst derivatives. In that case, the

extrapolation is quadratic.

Stochastic Trends

It is possible that what is perceived as a trend is the result of the accumu-

lation of small stochastic ﬂuctuations which have no systematic basis. In that

case, there are some clearly deﬁned ways of removing the trend from the data

as well as for extrapolating it into the future.

The simplest model embodying a stochastic trend is the so-called ﬁrst-

order random walk. Let {y_t} be the random-walk sequence. Then its value at time t is obtained from the previous value via the equation

(1.18)    y_t = y_{t−1} + ε_t.

Here ε_t is an element of a white-noise sequence of independently and identically distributed random variables with

(1.19)    E(ε_t) = 0 and V(ε_t) = σ² for all t.

By a process of back-substitution, the following expression can be derived:

(1.20)    y_t = y_0 + (ε_t + ε_{t−1} + ··· + ε_1).


Figure 5. A sequence generated by a white-noise process.

This depicts y_t as the sum of an initial value y_0 and of an accumulation of stochastic increments. If y_0 has a fixed finite value, then the mean and the variance of y_t are given by

(1.21)    E(y_t) = y_0 and V(y_t) = t × σ².

There is no central tendency in the random-walk process; and, if its starting

point is in the indeﬁnite past rather than at time t = 0, then the mean and

variance are undeﬁned.

To reduce the random walk to a stationary stochastic process, it is neces-

sary only to take its ﬁrst diﬀerences. Thus

(1.22)    y_t − y_{t−1} = ε_t.

The values of a random walk, as the name implies, have a tendency to

wander haphazardly. However, if the variance of the white-noise process is

small, then the values of the stochastic increments will also be small and the

random walk will wander slowly. It is debatable whether the outcome of such

a process deserves to be called a trend.
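Random walks of the kinds shown in Figures 6 and 7 are easily simulated; the following Python sketch, an editorial addition, also confirms that differencing recovers the white-noise increments, as in equation (1.22):

```python
import numpy as np

rng = np.random.default_rng(5)
eps = rng.standard_normal(100)                 # white-noise sequence

y = np.cumsum(eps)                             # first-order random walk
z = np.cumsum(y)                               # second-order random walk

assert np.allclose(np.diff(y), eps[1:])        # one difference suffices
assert np.allclose(np.diff(z, n=2), eps[2:])   # two differences for z
```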

A first-order random walk over a surface is what is known as Brownian motion. For a physical example of Brownian motion, one can imagine small particles, such as pollen grains, floating on the surface of a viscous liquid. The

viscosity might be expected to bring the particles to a halt quickly if they


Figure 6. A first-order random walk.

Figure 7. A second-order random walk.


were in motion. However, if the particles are very light, then they will dart

hither and thither on the surface of the liquid under the impact of its molecules

which are themselves in constant motion.

There is no better way of predicting the outcome of a random walk than

to take the most recently observed value and to extrapolate it indeﬁnitely into

the future. This is demonstrated by taking the expected values of the elements

of the equation

(1.23)    y_{t+h} = y_{t+h−1} + ε_{t+h},

which represents the value which lies h periods ahead at time t. The expectations, which are conditional upon the information of the set I_t = {y_t, y_{t−1}, . . .} containing observations on the series up to time t, may be denoted as follows:

(1.24)    E(y_{t+h}|I_t) = ŷ_{t+h|t}, if h > 0;    y_{t+h}, if h ≤ 0.

In these terms, the predictions of the values of the random walk for h > 1

periods ahead and for one period ahead are given, respectively, by

(1.25)    E(y_{t+h}|I_t) = ŷ_{t+h|t} = ŷ_{t+h−1|t},
          E(y_{t+1}|I_t) = ŷ_{t+1|t} = y_t.

The first of these, which comes from (23), depends upon the fact that E(ε_{t+h}|I_t) = 0. The second, which comes from taking expectations in the equation y_{t+1} = y_t + ε_{t+1}, uses the fact that the value of y_t is already known. The implication of the two equations is that y_t serves as the optimal predictor for all future values of the random walk.

A second-order random walk is formed by accumulating the values of a

first-order process. Thus, if {ε_t} and {y_t} are respectively a white-noise sequence and the sequence from a first-order random walk, then

(1.26)    z_t = z_{t−1} + y_t = z_{t−1} + y_{t−1} + ε_t = 2z_{t−1} − z_{t−2} + ε_t

deﬁnes the second-order random walk. Here the ﬁnal expression is obtained by

setting y_{t−1} = z_{t−1} − z_{t−2} in the second expression. It is clear that, to reduce the sequence {z_t} to the stationary white-noise sequence, we must take first

diﬀerences twice in succession.

The nature of a second-order process can be understood by recognising

that it represents a trend in which the slope—which is its ﬁrst diﬀerence—

follows a random walk. If the random walk wanders slowly, then the slope of


this trend is liable to change only gradually. Therefore, for extended periods,

the second-order random walk may appear to follow a linear time trend.

For a physical analogy of a second-order random walk, we can imagine a

body in motion which suﬀers a series of small impacts. If the kinetic energy of

the body is large relative to the energy of the impacts, then its linear motion will

be disturbed only slightly. In order to predict where the body might be in some

future period, we simply extrapolate its linear motion free from disturbances.

To demonstrate that the forecast function for a second-order random walk

is a straight line, we may take the expectations, which are conditional upon I_t, of the elements of the equation

(1.27)    z_{t+h} = 2z_{t+h−1} − z_{t+h−2} + ε_{t+h}.

For h periods ahead and for one period ahead, this gives

(1.28)    E(z_{t+h}|I_t) = ẑ_{t+h|t} = 2ẑ_{t+h−1|t} − ẑ_{t+h−2|t},
          E(z_{t+1}|I_t) = ẑ_{t+1|t} = 2z_t − z_{t−1},

which together serve to deﬁne a simple iterative scheme. It is straightforward

to conﬁrm that these diﬀerence equations have an analytic solution of the form

(1.29)    ẑ_{t+h|t} = α + βh with α = z_t and β = z_t − z_{t−1},

which generates a linear time trend.
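A short sketch of the resulting forecast function, again an editorial illustration on simulated data:

```python
import numpy as np

# The forecast function of equation (1.29): for a second-order random
# walk, the h-step-ahead forecast extrapolates a straight line through
# the last observed level and slope.
rng = np.random.default_rng(6)
z = np.cumsum(np.cumsum(rng.standard_normal(100)))

def forecast(z, h):
    alpha = z[-1]              # alpha = z_t, the last level
    beta = z[-1] - z[-2]       # beta = z_t - z_{t-1}, the last slope
    return alpha + beta * np.arange(1, h + 1)

print(forecast(z, 5))          # five points on a straight line
```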

It is possible to deﬁne random walks of higher orders. Thus a third-order

random walk is formed by accumulating the values of a second-order process.

A third-order process can be expected to give rise to local quadratic trends;

and the appropriate way of predicting its values is by quadratic extrapolation.

A stochastic trend of the random-walk variety may be elaborated by the

addition of an irregular component. A simple model consists of a ﬁrst-order

random walk with an added white-noise component. The model is speciﬁed by

the equations

(1.30)    y_t = ξ_t + η_t,
          ξ_t = ξ_{t−1} + ν_t,

wherein η_t and ν_t are generated by two mutually independent white-noise processes.

The equations combine to give

(1.31)    y_t − y_{t−1} = ξ_t − ξ_{t−1} + η_t − η_{t−1} = ν_t + η_t − η_{t−1}.


The expression on the RHS can be reformulated to give

(1.32)    ν_t + η_t − η_{t−1} = ε_t − µε_{t−1},

where ε_t and ε_{t−1} are elements of a white-noise sequence and µ is a parameter of an appropriate value. Thus, the combination of the random walk and white

noise gives rise to the single equation

(1.33)    y_t = y_{t−1} + ε_t − µε_{t−1}.

The forecast for h steps ahead, which is obtained by taking expectations in the equation y_{t+h} = y_{t+h−1} + ε_{t+h} − µε_{t+h−1}, is given by

(1.34)    E(y_{t+h}|I_t) = ŷ_{t+h|t} = ŷ_{t+h−1|t}.

The forecast for one step ahead, which is obtained from the equation y_{t+1} = y_t + ε_{t+1} − µε_t, is

(1.35)    E(y_{t+1}|I_t) = ŷ_{t+1|t} = y_t − µε_t
                         = y_t − µ(y_t − ŷ_{t|t−1})
                         = (1 − µ)y_t + µŷ_{t|t−1}.

The result ŷ_{t|t−1} = y_{t−1} − µε_{t−1}, which leads to the identity ε_t = y_t − ŷ_{t|t−1} upon which the second equality of (35) depends, reflects the fact that, if the information at time t−1 consists of the elements of the set I_{t−1} = {y_{t−1}, y_{t−2}, . . .} and the value of µ, then ε_{t−1} is a known quantity which is unaffected by the process of taking expectations.

By applying a straightforward process of back-substitution to the ﬁnal

equation of (35), it will be found that

(1.36)    ŷ_{t+1|t} = (1 − µ)(y_t + µy_{t−1} + ··· + µ^{t−1}y_1) + µ^t ŷ_0
                    = (1 − µ){y_t + µy_{t−1} + µ²y_{t−2} + ···},

where the ﬁnal expression stands for an inﬁnite series. This is a so-called

exponentially-weighted moving average; and it is the basis of the widely-used

forecasting procedure known as exponential smoothing.

To form the one-step-ahead forecast ŷ_{t+1|t} in the manner indicated by the first of the equations under (36), an initial value ŷ_0 is required. Equation (34) indicates that all the succeeding forecasts ŷ_{t+2|t}, ŷ_{t+3|t} etc. have the same value as the one-step-ahead forecast.
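The recursive form of the calculation can be sketched as follows; the sketch is an editorial addition, with simulated data:

```python
import numpy as np

# Exponential smoothing, as in equation (1.36): the one-step-ahead
# forecast is an exponentially weighted moving average of the past
# observations, computed here by the recursion
# y_hat_next = (1 - mu) * y_t + mu * y_hat_current.
def exp_smoothing_forecast(y, mu, y_hat0=None):
    y_hat = y[0] if y_hat0 is None else y_hat0   # initial value
    for obs in y:
        y_hat = (1.0 - mu) * obs + mu * y_hat
    return y_hat         # forecast of the next, unobserved value

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(200))          # random-walk-like data
print(exp_smoothing_forecast(y, mu=0.7))
```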

It will transpire, in subsequent lectures, that equation (33) is a simple

example of an Integrated Autoregressive Moving-Average or ARIMA model.

There exists a readily accessible general theory of the forecasting of ARIMA

processes which we shall expound at length.


References

Eubank, R.L., (1988), Spline Smoothing and Nonparametric Regression, Marcel Dekker Inc., New York.

Hamming, R.W., (1989), Digital Filters: Third Edition, Prentice-Hall Inc., Englewood Cliffs, N.J.

Ratkowsky, D.L., (1985), Nonlinear Regression Modelling: A Unified Approach, Marcel Dekker Inc., New York.

Reinsch, C.H., (1967), “Smoothing by Spline Functions”, Numerische Mathematik, 10, 177–183.

Schoenberg, I.J., (1964), “Spline Functions and the Problem of Graduation”, Proceedings of the National Academy of Sciences, 52, 947–950.

De Vos, A.F. and I.J. Steyn, (1990), “Stochastic Nonlinearity: A Firm Basis for the Flexible Functional Form”, Research Memorandum 1990-13, Vrije Universiteit Amsterdam.


LECTURE 2

Seasons and Cycles in Time Series

Cycles of a regular nature are often encountered in physics and engineering.

Consider a point moving with constant speed in a circle of radius ρ. The point

might be the axis of the ‘big end’ of a connecting rod which joins a piston to

a ﬂywheel. Let time t be reckoned from an instant when the radius joining

the point to the centre is at an angle of θ below the horizontal. If the point is

projected onto the horizontal axis, then the distance of the projection from the

centre is given by

(2.1) x = ρ cos(ωt −θ).

The movement of the projection back and forth along the horizontal axis is

described as simple harmonic motion.

The parameters of the function are as follows:

ρ is the amplitude,

ω is the angular velocity or frequency and

θ is the phase displacement.

The angular velocity is measured in radians per unit period. The quantity 2π/ω

measures the period of the cycle. The phase displacement, also measured in

radians, indicates the extent to which the cosine function has been displaced by

a shift along the time axis. Thus, instead of the peak of the function occurring

at time t = 0, as it would with an ordinary cosine function, it now occurs at time t = θ/ω.

Using the compound-angle formula cos(A − B) = cos A cos B + sin A sin B,

we can rewrite equation (1) as

(2.2)    x = ρ cos θ cos(ωt) + ρ sin θ sin(ωt) = α cos(ωt) + β sin(ωt),

with

(2.3)    α = ρ cos θ, β = ρ sin θ and α² + β² = ρ².


Extracting a Regular Cyclical Component

A cyclical component which is concealed beneath other motions may be

extracted from a data sequence by a straightforward application of the method

of linear regression. An equation may be written in the form of

(2.4)    y_t = αc_t(ω) + βs_t(ω) + e_t;    t = 0, . . . , T − 1,

where c_t(ω) = cos(ωt) and s_t(ω) = sin(ωt). To avoid the need for an intercept

term, the values of the dependent variable should be deviations about a mean

value. In matrix terms, equation (4) becomes

(2.5)    y = [c s][α β]′ + e,

where c = [c_0, . . . , c_{T−1}]′, s = [s_0, . . . , s_{T−1}]′ and e = [e_0, . . . , e_{T−1}]′ are

vectors of T elements. The parameters α, β can be found by running regressions

for a wide range of values of ω and by selecting the regression which delivers

the lowest value for the residual sum of squares.

Such a technique may be used for extracting a seasonal component from

an economic time series; and, in that case, we know in advance what value

to give to ω. For the seasonality of economic activities is related, ultimately,

to the near-perfect regularities of the solar system which are reﬂected in the

annual calendar.

It may be unreasonable to expect that an idealised seasonal cycle can be

represented by a simple sinusoidal function. However, wave forms of a more

complicated nature may be synthesised by employing a series of sine and cosine

functions whose frequencies are integer multiples of the fundamental seasonal

frequency. If there are s = 2n observations per annum, then a general model

for a seasonal ﬂuctuation would comprise the frequencies

(2.6) ω

j

=

2πj

s

, j = 0, . . . , n =

s

2

,

which are equally spaced in the interval [0, π]. Such a series of frequencies is

described as an harmonic scale.

A model of seasonal ﬂuctuation comprising the full set of harmonically-

related frequencies would take the form of

(2.7)    y_t = Σ_{j=0}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)} + e_t,

where e_t is a residual element which might represent an irregular white-noise

component in the process underlying the data.


Figure 1. Trigonometrical functions, of frequencies ω_1 = π/2 and ω_2 = π, associated with a quarterly model of a seasonal fluctuation.

At ﬁrst sight, it appears that there are s + 2 components in the sum.

However, when s is even, we have

(2.8)    sin(ω_0 t) = sin(0) = 0,
         cos(ω_0 t) = cos(0) = 1,
         sin(ω_n t) = sin(πt) = 0,
         cos(ω_n t) = cos(πt) = (−1)^t.

Therefore there are only s nonzero coeﬃcients to be determined.

This simple seasonal model is illustrated adequately by the case of quar-

terly data. Matters are no more complicated in the case of monthly data. When

there are four observations per annum, we have ω_0 = 0, ω_1 = π/2 and ω_2 = π;

and equation (7) assumes the form of

(2.9)    y_t = α_0 + α_1 cos(πt/2) + β_1 sin(πt/2) + α_2(−1)^t + e_t.

If the four seasons are indexed by j = 0, . . . , 3, then the values from the

year τ can be represented by the following matrix equation:

(2.10)    [ y_{τ0} ]   [ 1  1  0  1 ] [ α_0 ]   [ e_{τ0} ]
          [ y_{τ1} ] = [ 1  0  1 −1 ] [ α_1 ] + [ e_{τ1} ]
          [ y_{τ2} ]   [ 1 −1  0  1 ] [ β_1 ]   [ e_{τ2} ]
          [ y_{τ3} ]   [ 1  0 −1 −1 ] [ α_2 ]   [ e_{τ3} ].


It will be observed that the vectors of the matrix are mutually orthogonal.

When the data consist of T = 4p observations which span p years, the

coeﬃcients of the equation are given by

(2.11)    α_0 = (1/T) Σ_{t=0}^{T−1} y_t,
          α_1 = (2/T) Σ_{τ=1}^{p} (y_{τ0} − y_{τ2}),
          β_1 = (2/T) Σ_{τ=1}^{p} (y_{τ1} − y_{τ3}),
          α_2 = (1/T) Σ_{τ=1}^{p} (y_{τ0} − y_{τ1} + y_{τ2} − y_{τ3}).

It is the mutual orthogonality of the vectors of ‘explanatory’ variables which

accounts for the simplicity of these formulae.

An alternative model of seasonality, which is used more often by econome-

tricians, assigns an individual dummy variable to each season. Thus, in place

of equation (10), we may take

(2.12)    [ y_{τ0} ]   [ 1 0 0 0 ] [ δ_0 ]   [ e_{τ0} ]
          [ y_{τ1} ] = [ 0 1 0 0 ] [ δ_1 ] + [ e_{τ1} ]
          [ y_{τ2} ]   [ 0 0 1 0 ] [ δ_2 ]   [ e_{τ2} ]
          [ y_{τ3} ]   [ 0 0 0 1 ] [ δ_3 ]   [ e_{τ3} ],

where

(2.13)    δ_j = (4/T) Σ_{τ=1}^{p} y_{τj},    for j = 0, . . . , 3.

A comparison of equations (10) and (12) establishes the mapping from the

coeﬃcients of the trigonometrical functions to the coeﬃcients of the dummy

variables. The inverse mapping is

(2.14)    [ α_0 ]   [ 1/4   1/4   1/4   1/4 ] [ δ_0 ]
          [ α_1 ] = [ 1/2   0    −1/2   0   ] [ δ_1 ]
          [ β_1 ]   [ 0     1/2   0    −1/2 ] [ δ_2 ]
          [ α_2 ]   [ 1/4  −1/4   1/4  −1/4 ] [ δ_3 ].

Another way of parametrising the model of seasonality is to adopt the

following form:

(2.15)    [ y_{τ0} ]   [ 1 1 0 0 ] [ φ   ]   [ e_{τ0} ]
          [ y_{τ1} ] = [ 1 0 1 0 ] [ γ_0 ] + [ e_{τ1} ]
          [ y_{τ2} ]   [ 1 0 0 1 ] [ γ_1 ]   [ e_{τ2} ]
          [ y_{τ3} ]   [ 1 0 0 0 ] [ γ_2 ]   [ e_{τ3} ].


This scheme is unbalanced in that it does not treat each season in the same

manner. An attempt might be made to correct this feature by adding to the

matrix an extra column with a unit at the bottom and with zeros elsewhere and

by introducing an accompanying parameter γ_3. However, the columns of the resulting matrix will be linearly dependent; and this will make the parameters indeterminate unless an additional constraint is imposed which sets γ_0 + ··· + γ_3 = 0.

The problem highlights a diﬃculty which might arise if either of the

schemes under (10) or (12) were fitted to the data by multiple regression in the company of a polynomial φ(t) = φ_0 + φ_1 t + ··· + φ_p t^p designed to capture a trend. To make such a regression viable, one would have to eliminate the intercept parameter φ_0.

Irregular Cycles

Whereas it seems reasonable to model a seasonal ﬂuctuation in terms of

trigonometrical functions, it is diﬃcult to accept that other cycles in economic

activity should have such regularity.

A classic expression of skepticism was made by Slutsky [19] in a famous

article of 1927:

Suppose we are inclined to believe in the reality of the strict period-

icity of the business cycle, such, for example, as the eight-year period

postulated by Moore. Then we should encounter another diﬃculty.

Wherein lies the source of this regularity? What is the mechanism of

causality which, decade after decade, reproduces the same sinusoidal

wave which rises and falls on the surface of the social ocean with the

regularity of day and night?

It seems that something other than a perfectly regular sinusoidal component

is required to model the secular ﬂuctuations of economic activity which are

described as business cycles.

To obtain a model for a seasonal ﬂuctuation, it has been enough to modify

the equation of harmonic motion by superimposing a disturbance term which

aﬀects the amplitude. To generate a cycle which is more fundamentally aﬀected

by randomness, we must construct a model which has random eﬀects in both

the phase and the amplitude.

To begin, let us imagine, once more, a point on the circumference of a circle

of radius ρ which is travelling with an angular velocity of ω. At the instant

t = 0, when the point makes a positive angle of θ with the horizontal axis, the

coordinates are given by

(2.16) (α, β) = (ρ cos θ, ρ sin θ).


To ﬁnd the coordinates of the point after it has rotated through an angle of ω

in one period of time, we may rotate the component vectors (α, 0) and (0, β)

separately and add them. The rotation of the components is depicted as follows:

(2.17)    (α, 0) −→ (α cos ω, α sin ω),
          (0, β) −→ (−β sin ω, β cos ω).

Their addition gives

(2.18)    (α, β) −→ (y, z) = (α cos ω − β sin ω, α sin ω + β cos ω).

In matrix terms, the transformation becomes

(2.19)    [ y ]   [ cos ω  −sin ω ] [ α ]
          [ z ] = [ sin ω   cos ω ] [ β ].

To ﬁnd the values of the coordinates at a time which is an integral number of

periods ahead, we may transform the vector [y, z]′ by premultiplying it the

appropriate number of times by the matrix of the rotation. Alternatively, we

may replace ω in equation (19) by whatever angle will be reached at the time

in question. In eﬀect, equation (19) speciﬁes the horizontal and vertical com-

ponents of a circular motion which amount to a pair of synchronous harmonic

motions.

To introduce the appropriate irregularities into the motion, we may add a

random disturbance term to each of its components. The discrete-time equation

of the resulting motion may be expressed as follows:

(2.20)    [ y_t ]   [ cos ω  −sin ω ] [ y_{t−1} ]   [ υ_t ]
          [ z_t ] = [ sin ω   cos ω ] [ z_{t−1} ] + [ ζ_t ].

Now the character of the motion is radically altered. There is no longer any

bound on the amplitudes which the components might acquire in the long

run; and there is, likewise, a tendency for the phases of their cycles to drift

without limit. Nevertheless, in the absence of uncommonly large disturbances,

the trajectories of y and z are liable, in a limited period, to resemble those of

the simple harmonic motions.

It is easy to decouple the equations of y and z. The ﬁrst of the equations

within the matrix expression can be written as

(2.21)    y_t = cy_{t−1} − sz_{t−1} + υ_t,

where c = cos ω and s = sin ω.

The second equation may be lagged by one period and rearranged to give

(2.22)    z_{t−1} − cz_{t−2} = sy_{t−2} + ζ_{t−1}.


By taking the ﬁrst diﬀerence of equation (21) and by using equation (22) to

eliminate the values of z, we get

(2.23)    y_t − cy_{t−1} = cy_{t−1} − c²y_{t−2} − sz_{t−1} + csz_{t−2} + υ_t − cυ_{t−1}
                        = cy_{t−1} − c²y_{t−2} − s²y_{t−2} − sζ_{t−1} + υ_t − cυ_{t−1}.

If we use the result that y_{t−2} cos²ω + y_{t−2} sin²ω = y_{t−2} and if we collect the disturbances to form a new variable ε_t = υ_t − sζ_{t−1} − cυ_{t−1}, then we can rearrange the second equality to give

(2.24)    y_t = 2 cos ω y_{t−1} − y_{t−2} + ε_t.

Here it is not true in general that the sequence of disturbances {ε_t} will be white noise. However, if we specify that, within equation (20),

(2.25)    [ υ_t ]   [ −sin ω ]
          [ ζ_t ] = [  cos ω ] η_t,

where {η_t} is a white-noise sequence, then the lagged terms within ε_t will cancel leaving a sequence whose elements are mutually uncorrelated.

A sequence generated by equation (24) when {ε_t} is a white-noise sequence is depicted in Figure 2.

Figure 2. A quasi-cyclical sequence generated by the equation y_t = 2 cos ω y_{t−1} − y_{t−2} + ε_t when ω = 20°.
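The recursion is easily simulated; the following Python sketch, an editorial addition, reproduces the character of Figure 2:

```python
import numpy as np

# Generate the quasi-cyclical sequence of equation (2.24):
# y_t = 2*cos(omega)*y_{t-1} - y_{t-2} + eps_t, with omega = 20
# degrees as in Figure 2, starting from zero initial values.
rng = np.random.default_rng(9)
omega = np.deg2rad(20.0)
eps = rng.standard_normal(100)

y = np.zeros(100)
for t in range(2, 100):
    y[t] = 2.0 * np.cos(omega) * y[t - 1] - y[t - 2] + eps[t]
```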


It is interesting to recognise that equation (24) becomes the equation of a

second-order random walk in the case where ω = 0. The second-order random

walk gives rise to trends which can remain virtually linear over considerable

periods.

Whereas there is little diﬃculty in understanding that an accumulation of

purely random disturbances can give rise to a linear trend, there is often surprise

at the fact that such disturbances can also generate cycles which are more or

less regular. An understanding of this phenomenon can be reached by con-

sidering a physical analogy. One such analogy, which is a very apposite, was

provided by Yule whose article of 1927 introduced the concept of a second-order

autoregressive process of which equation (24) is a limiting case. Yules’s purpose

was to explain, in terms of random causes, a cycle of roughly 11 years which

characterises the Wolfer sunspot index.

Yule invited his readers to imagine a pendulum attached to a recording de-

vice. Any deviations from perfectly harmonic motion which might be recorded

must be the result of superimposed errors of observation which could be all

but eliminated if a long sequence of observations were subjected to a regression

analysis.

The recording apparatus is left to itself and unfortunately boys get

into the room and start pelting the pendulum with peas, sometimes

from one side and sometimes from the other. The motion is now

aﬀected not by superposed ﬂuctuations but by true disturbances, and

the eﬀect on the graph will be of an entirely diﬀerent kind. The graph

will remain surprisingly smooth, but amplitude and phase will vary

continuously.

The phenomenon described by Yule is due to the inertia of the pendulum.

In the short term, the impacts of the peas impart very little energy to the

system compared with the sum of its kinetic and potential energies at any point

in time. However, on taking a longer view, we can see that, in the absence of

clock weights, the system is driven by the impacts alone.

The Fourier Decomposition of a Time Series

In spite of the notion that a regular trigonometrical function is an inappro-

priate means for modelling an economic cycle other than a seasonal ﬂuctuation,

there are good reasons to persist with the business of explaining a data sequence

in terms of such functions.

The Fourier decomposition of a series is a matter of explaining the series

entirely as a composition of sinusoidal functions. Thus it is possible to represent

the generic element of the sample as

(2.26)    y_t = Σ_{j=0}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)}.


Assuming that T = 2n is even, this sum comprises T functions whose frequen-

cies

(2.27)    ω_j = 2πj/T,    j = 0, . . . , n = T/2

are at equally spaced points in the interval [0, π].

As we might infer from our analysis of a seasonal ﬂuctuation, there are

as many nonzero elements in the sum under (26) as there are data points, for the reason that two of the functions within the sum—namely sin(ω_0 t) = sin(0) and sin(ω_n t) = sin(πt)—are identically zero. It follows that the mapping

from the sample values to the coeﬃcients constitutes a one-to-one invertible

transformation. The same conclusion arises in the slightly more complicated

case where T is odd.

The angular velocity ω_j = 2πj/T relates to a pair of trigonometrical components which accomplish j cycles in the T periods spanned by the data. The highest velocity ω_n = π corresponds to the so-called Nyquist frequency. If a component with a frequency in excess of π were included in the sum in (26), then its effect would be indistinguishable from that of a component with a frequency in the range [0, π].

To demonstrate this, consider the case of a pure cosine wave of unit am-

plitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let

ω* = 2π − ω. Then

(2.28)    cos(ωt) = cos{(2π − ω*)t}
                  = cos(2πt) cos(ω*t) + sin(2πt) sin(ω*t)
                  = cos(ω*t)

for integer values of t; which indicates that ω and ω* are observationally indistinguishable. Here, ω* ∈ [0, π] is described as the alias of ω > π.

For an illustration of the problem of aliasing, let us imagine that a person

observes the sea level at 6am and 6pm each day. He should notice a very

gradual recession and advance of the water level; the frequency of the cycle

being f = 1/28 which amounts to one tide in 14 days. In fact, the true frequency

is f = 1 −1/28 which gives 27 tides in 14 days. Observing the sea level every

six hours should enable him to infer the correct frequency.

Calculation of the Fourier Coeﬃcients

For heuristic purposes, we can imagine calculating the Fourier coeﬃcients

using an ordinary regression procedure to ﬁt equation (26) to the data. In

this case, there would be no regression residuals, for the reason that we are

‘estimating’ a total of T coeﬃcients from T data points; so we are actually

solving a set of T linear equations in T unknowns.


A reason for not using a multiple regression procedure is that, in this case,

the vectors of ‘explanatory’ variables are mutually orthogonal. Therefore T

applications of a univariate regression procedure would be appropriate to our

purpose.

Let c_j = [c_{0j}, . . . , c_{T−1,j}]′ and s_j = [s_{0,j}, . . . , s_{T−1,j}]′ represent vectors of T values of the generic functions cos(ω_j t) and sin(ω_j t) respectively. Then there

are the following orthogonality conditions:

(2.29)    c_i′c_j = 0 if i ≠ j,
          s_i′s_j = 0 if i ≠ j,
          c_i′s_j = 0 for all i, j.

In addition, there are the following sums of squares:

(2.30)    c_0′c_0 = c_n′c_n = T,
          s_0′s_0 = s_n′s_n = 0,
          c_j′c_j = s_j′s_j = T/2.

The ‘regression’ formulae for the Fourier coefficients are therefore

(2.31) α_0 = (i′i)^{−1} i′y = (1/T) Σ_t y_t = ȳ,

(2.32) α_j = (c_j′c_j)^{−1} c_j′y = (2/T) Σ_t y_t cos ω_j t,

(2.33) β_j = (s_j′s_j)^{−1} s_j′y = (2/T) Σ_t y_t sin ω_j t.
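Because the trigonometrical regressors are mutually orthogonal, each Fourier coefficient can be obtained from a univariate regression, as in (31)–(33). The following sketch in Python is illustrative only; the function name and the random test series are our own devices, not part of the text.

import numpy as np

def fourier_coefficients(y):
    # 'Univariate regressions' of y on each trigonometrical vector, (2.31)-(2.33).
    y = np.asarray(y, dtype=float)
    T = len(y)
    n = T // 2
    t = np.arange(T)
    alpha = np.zeros(n + 1)
    beta = np.zeros(n + 1)
    alpha[0] = y.mean()                    # (2.31): alpha_0 is the sample mean
    for j in range(1, n + 1):
        c = np.cos(2 * np.pi * j * t / T)
        s = np.sin(2 * np.pi * j * t / T)
        alpha[j] = (c @ y) / (c @ c)       # (2.32), with c_j'c_j in the denominator
        ss = s @ s
        beta[j] = (s @ y) / ss if ss > 1e-9 else 0.0   # s_n vanishes when T is even
    return alpha, beta

y = np.random.default_rng(1).standard_normal(12)
alpha, beta = fourier_coefficients(y)
t = np.arange(12)
recon = sum(alpha[j] * np.cos(2 * np.pi * j * t / 12)
            + beta[j] * np.sin(2 * np.pi * j * t / 12) for j in range(len(alpha)))
assert np.allclose(y, recon)

The assertion at the end confirms that the T coefficients reproduce the sample exactly, in accordance with (26).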

By pursuing the analogy of multiple regression, we can understand that there is a complete decomposition of the sum of squares of the elements of y which is given by

(2.34) y′y = α_0² i′i + Σ_j α_j² c_j′c_j + Σ_j β_j² s_j′s_j.

Now consider writing α_0² i′i = ȳ² i′i = ȳ′ȳ, where ȳ = [ȳ, . . . , ȳ]′ is the vector whose repeated element is the sample mean ȳ. It follows that y′y − α_0² i′i = y′y − ȳ′ȳ = (y − ȳ)′(y − ȳ). Therefore we can rewrite the equation as

(2.35) (y − ȳ)′(y − ȳ) = (T/2) Σ_j (α_j² + β_j²) = (T/2) Σ_j ρ_j²,


and it follows that we can express the variance of the sample as

(2.36) (1/T) Σ_{t=0}^{T−1} (y_t − ȳ)² = (1/2) Σ_{j=1}^{n} (α_j² + β_j²)
       = (2/T²) Σ_j {(Σ_t y_t cos ω_j t)² + (Σ_t y_t sin ω_j t)²}.

The proportion of the variance which is attributable to the component at frequency ω_j is (α_j² + β_j²)/2 = ρ_j²/2, where ρ_j is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the

sample size T. Therefore, if the variance of the sample remains ﬁnite, and

if there are no regular harmonic components in the process generating the

data, then we can expect the proportion of the variance attributed to the

individual frequencies to decline as the sample size increases. If there is such

a regular component within the process, then we can expect the proportion of

the variance attributable to it to converge to a ﬁnite value as the sample size

increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (36) by a factor of T. The graph of the function I(ω_j) = (T/2)(α_j² + β_j²) is known as the periodogram.

Figure 3. The periodogram of Wolfer’s Sunspot Numbers 1749–1924.


There are many impressive examples where the estimation of the periodogram has revealed the presence of regular harmonic components in a data series which might otherwise have passed undetected. One of the best-known examples concerns the analysis of the brightness or magnitude of the star T. Ursa Major. It was shown by Whittaker and Robinson in 1924 that this series could be described almost completely in terms of two trigonometrical functions with periods of 24 and 29 days.

The attempts to discover underlying components in economic time series have been less successful. One application of periodogram analysis which was a notorious failure was its use by William Beveridge in 1921 and 1922 to analyse a long series of European wheat prices. The periodogram had so many peaks that at least twenty possible hidden periodicities could be picked out, and this seemed to be many more than could be accounted for by plausible explanations within the realms of economic history.

Such ﬁndings seem to diminish the importance of periodogram analysis

in econometrics. However, the fundamental importance of the periodogram is

established once it is recognised that it represents nothing less than the Fourier

transform of the sequence of empirical autocovariances.

The Empirical Autocovariances

A natural way of representing the serial dependence of the elements of a data sequence is to estimate their autocovariances. The empirical autocovariance of lag τ is defined by the formula

(2.37) c_τ = (1/T) Σ_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ).

The empirical autocorrelation of lag τ is defined by r_τ = c_τ/c_0, where c_0, which is formally the autocovariance of lag 0, is the variance of the sequence. The autocorrelation provides a measure of the relatedness of data points separated by τ periods which is independent of the units of measurement.

It is straightforward to establish the relationship between the periodogram

and the sequence of autocovariances.

The periodogram may be written as

(2.38) I(ω_j) = (2/T) {(Σ_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ))² + (Σ_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ))²}.

The identity Σ_t cos(ω_j t)(y_t − ȳ) = Σ_t cos(ω_j t)y_t follows from the fact that, by construction, Σ_t cos(ω_j t) = 0 for j = 1, . . . , n. Expanding the expression in (38) gives

(2.39) I(ω_j) = (2/T) {Σ_t Σ_s cos(ω_j t) cos(ω_j s)(y_t − ȳ)(y_s − ȳ)}
              + (2/T) {Σ_t Σ_s sin(ω_j t) sin(ω_j s)(y_t − ȳ)(y_s − ȳ)},

and, by using the identity cos(A) cos(B) + sin(A) sin(B) = cos(A − B), we can rewrite this as

(2.40) I(ω_j) = (2/T) {Σ_t Σ_s cos(ω_j [t − s])(y_t − ȳ)(y_s − ȳ)}.

Next, on defining τ = t − s and writing c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can reduce the latter expression to

(2.41) I(ω_j) = 2 Σ_{τ=1−T}^{T−1} cos(ω_j τ) c_τ,

which is a Fourier transform of the sequence of empirical autocovariances.
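The equivalence of (38) and (41) can be checked numerically. A minimal Python sketch, assuming an arbitrary test series; the function names are our own:

import numpy as np

def periodogram_direct(y, j):
    # Equation (2.38): computed from the mean-adjusted data.
    T = len(y)
    t = np.arange(T)
    w = 2 * np.pi * j / T
    d = y - y.mean()
    return (2 / T) * ((np.cos(w * t) @ d) ** 2 + (np.sin(w * t) @ d) ** 2)

def periodogram_from_autocovariances(y, j):
    # Equation (2.41): computed from the empirical autocovariances of (2.37).
    T = len(y)
    d = y - y.mean()
    w = 2 * np.pi * j / T
    c = np.array([d[tau:] @ d[:T - tau] / T for tau in range(T)])
    # The sum runs over tau = 1-T, ..., T-1, and c_{-tau} = c_tau.
    return 2 * (c[0] + 2 * np.sum(np.cos(w * np.arange(1, T)) * c[1:]))

y = np.random.default_rng(0).standard_normal(32)
assert abs(periodogram_direct(y, 3) - periodogram_from_autocovariances(y, 3)) < 1e-9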

An Appendix on Harmonic Cycles

Lemma 1. Let ω_j = 2πj/T where j ∈ {1, . . . , T/2} if T is even and j ∈ {1, . . . , (T − 1)/2} if T is odd. Then

Σ_{t=0}^{T−1} cos(ω_j t) = Σ_{t=0}^{T−1} sin(ω_j t) = 0.

Proof. By Euler’s equations, we have

Σ_{t=0}^{T−1} cos(ω_j t) = (1/2) Σ_{t=0}^{T−1} exp(i2πjt/T) + (1/2) Σ_{t=0}^{T−1} exp(−i2πjt/T).

By using the formula 1 + λ + · · · + λ^{T−1} = (1 − λ^T)/(1 − λ), we find that

Σ_{t=0}^{T−1} exp(i2πjt/T) = {1 − exp(i2πj)}/{1 − exp(i2πj/T)}.

But exp(i2πj) = cos(2πj) + i sin(2πj) = 1, so the numerator in the expression above is zero, and hence Σ_t exp(i2πjt/T) = 0. By similar means, we can show that Σ_t exp(−i2πjt/T) = 0; and, therefore, it follows that Σ_t cos(ω_j t) = 0. An analogous proof shows that Σ_t sin(ω_j t) = 0.

Lemma 2. Let ω_j = 2πj/T where j ∈ {0, 1, . . . , T/2} if T is even and j ∈ {0, 1, . . . , (T − 1)/2} if T is odd. Then

(a) Σ_{t=0}^{T−1} cos(ω_j t) cos(ω_k t) = 0 if j ≠ k, and T/2 if j = k with 0 < j < T/2;

(b) Σ_{t=0}^{T−1} sin(ω_j t) sin(ω_k t) = 0 if j ≠ k, and T/2 if j = k with 0 < j < T/2;

(c) Σ_{t=0}^{T−1} cos(ω_j t) sin(ω_k t) = 0 for all j, k.

Proof. From the formula cos A cos B = ½{cos(A + B) + cos(A − B)}, we have

Σ_{t=0}^{T−1} cos(ω_j t) cos(ω_k t) = ½ Σ_{t=0}^{T−1} {cos([ω_j + ω_k]t) + cos([ω_j − ω_k]t)}
                                   = ½ Σ_{t=0}^{T−1} {cos(2π[j + k]t/T) + cos(2π[j − k]t/T)}.

We find, in consequence of Lemma 1, that if j ≠ k, then both terms on the RHS vanish, and thus we have the first part of (a). If j = k, then cos(2π[j − k]t/T) = cos 0 = 1 and so, whilst the first term vanishes, the second term yields the value of T under summation. This gives the second part of (a).

The proofs of (b) and (c) follow along similar lines.

References

Beveridge, Sir W.H., (1921), “Weather and Harvest Cycles.” Economic Journal, 31, 429–452.

Beveridge, Sir W.H., (1922), “Wheat Prices and Rainfall in Western Europe.” Journal of the Royal Statistical Society, 85, 412–478.

Moore, H.L., (1914), Economic Cycles: Their Law and Cause, Macmillan: New York.

Slutsky, E., (1937), “The Summation of Random Causes as the Source of Cyclical Processes.” Econometrica, 5, 105–146.

Yule, G.U., (1927), “On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer’s Sunspot Numbers.” Philosophical Transactions of the Royal Society, Series A, 226, 267–298.


LECTURE 3

Models and Methods

of Time-Series Analysis

A time-series model is one which postulates a relationship amongst a number of temporal sequences or time series. An example is provided by the simple regression model

(3.1) y(t) = x(t)β + ε(t),

where y(t) = {y_t; t = 0, ±1, ±2, . . .} is a sequence, indexed by the time subscript t, which is a combination of an observable signal sequence x(t) = {x_t} and an unobservable white-noise sequence ε(t) = {ε_t} of independently and identically distributed random variables.

A more general model, which we shall call the general temporal regression model, is one which postulates a relationship comprising any number of consecutive elements of x(t), y(t) and ε(t). The model may be represented by the equation

(3.2) Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} µ_i ε(t − i),

where it is usually taken for granted that α_0 = 1. This normalisation of the leading coefficient on the LHS identifies y(t) as the output sequence. Any of the sums in the equation can be infinite but, if the model is to be viable, the sequences of coefficients {α_i}, {β_i} and {µ_i} can depend on only a limited number of parameters.

Although it is convenient to write the general model in the form of (2), it is also common to represent it by the equation

(3.3) y(t) = Σ_{i=1}^{p} φ_i y(t − i) + Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} µ_i ε(t − i),

where φ_i = −α_i for i = 1, . . . , p. This places the lagged versions of the sequence y(t) on the RHS in the company of the input sequence x(t) and its lags.


Whereas engineers are liable to describe this as a feedback model, economists

are more likely to describe it as a model with lagged dependent variables.

The foregoing models are termed regression models by virtue of the in-

clusion of the observable explanatory sequence x(t). When x(t) is deleted, we

obtain a simpler unconditional linear stochastic model:

(3.4) Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{q} µ_i ε(t − i).

This is the autoregressive moving-average (ARMA) model.

A time-series model can often assume a variety of forms. Consider a simple

dynamic regression model of the form

(3.5) y(t) = φy(t − 1) + x(t)β + ε(t),

where there is a single lagged dependent variable. By repeated substitution, we obtain

(3.6) y(t) = φy(t − 1) + βx(t) + ε(t)
           = φ²y(t − 2) + β{x(t) + φx(t − 1)} + ε(t) + φε(t − 1)
           . . .
           = φ^n y(t − n) + β{x(t) + φx(t − 1) + · · · + φ^{n−1}x(t − n + 1)}
             + ε(t) + φε(t − 1) + · · · + φ^{n−1}ε(t − n + 1).

If |φ| < 1, then lim(n → ∞)φ^n = 0; and it follows that, if x(t) and ε(t) are bounded sequences, then, as the number of repeated substitutions increases indefinitely, the equation will tend to the limiting form of

(3.7) y(t) = β Σ_{i=0}^{∞} φ^i x(t − i) + Σ_{i=0}^{∞} φ^i ε(t − i).

It is notable that, by this process of repeated substitution, the feedback

structure has been eliminated from the model. As a result, it becomes easier

to assess the impact upon the output sequence of changes in the values of the

input sequence. The direct mapping from the input sequence to the output

sequence is described by engineers as a transfer function or as a ﬁlter.

For models more complicated than the one above, the method of repeated

substitution, if pursued directly, becomes intractable. Thus we are motivated

to use more powerful algebraic methods to eﬀect the transformation of the

equation. This leads us to consider the use of the so-called lag operator. A

proper understanding of the lag operator depends upon a knowledge of the

algebra of polynomials and of rational functions.


The Algebra of the Lag Operator

A sequence x(t) = {x_t; t = 0, ±1, ±2, . . .} is any function mapping from the set of integers Z = {0, ±1, ±2, . . .} to the real line. If the set of integers represents a set of dates separated by unit intervals, then x(t) is described as a temporal sequence or a time series.

The set of all time series represents a vector space, and various linear transformations or operators can be defined over the space. The simplest of these is the lag operator L, which is defined by

(3.8) Lx(t) = x(t − 1).

Now, L{Lx(t)} = Lx(t − 1) = x(t − 2); so it makes sense to define L² by L²x(t) = x(t − 2). More generally, L^k x(t) = x(t − k) and, likewise, L^{−k}x(t) = x(t + k). Other operators are the difference operator ∇ = I − L, which has the effect that

(3.9) ∇x(t) = x(t) − x(t − 1),

the forward-difference operator ∆ = L^{−1} − I, and the summation operator S = (I − L)^{−1} = {I + L + L² + · · ·}, which has the effect that

(3.10) Sx(t) = Σ_{i=0}^{∞} x(t − i).

In general, we can define polynomials of the lag operator of the form p(L) = p_0 + p_1 L + · · · + p_n L^n = Σ p_i L^i, having the effect that

(3.11) p(L)x(t) = p_0 x(t) + p_1 x(t − 1) + · · · + p_n x(t − n)
               = Σ_{i=0}^{n} p_i x(t − i).

In these terms, the equation under (2) of the general temporal model becomes

(3.12) α(L)y(t) = β(L)x(t) + µ(L)ε(t).

The advantage which comes from defining polynomials in the lag operator stems from the fact that they are isomorphic to the set of ordinary algebraic polynomials. Thus we can rely upon what we know about ordinary polynomials to treat problems concerning lag-operator polynomials.
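In computational terms, applying a lag-operator polynomial to a finite stretch of a sequence, as in (3.11), amounts to a one-sided convolution of the coefficients with the data. A minimal Python sketch, assuming numpy is available; the function name is our own:

import numpy as np

def apply_lag_polynomial(p, x):
    # y_t = p_0 x_t + p_1 x_{t-1} + ... + p_n x_{t-n}, as under (3.11).
    # Values of x before the start of the sample are unavailable, so the
    # first n elements of the output are dropped.
    n = len(p) - 1
    return np.array([sum(p[i] * x[t - i] for i in range(n + 1))
                     for t in range(n, len(x))])

# The difference operator of (3.9) corresponds to p = [1, -1]:
x = np.array([2.0, 5.0, 9.0, 14.0])
print(apply_lag_polynomial([1.0, -1.0], x))   # [3. 4. 5.]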


Algebraic Polynomials

Consider the equation φ_0 + φ_1 z + φ_2 z² = 0. Once the equation has been divided by φ_2, it can be factorised as (z − λ_1)(z − λ_2) = 0, where λ_1, λ_2 are the roots or zeros of the equation, which are given by the formula

(3.13) λ = {−φ_1 ± √(φ_1² − 4φ_2φ_0)}/(2φ_2).

If φ_1² ≥ 4φ_2φ_0, then the roots λ_1, λ_2 are real. If φ_1² = 4φ_2φ_0, then λ_1 = λ_2. If φ_1² < 4φ_2φ_0, then the roots are the conjugate complex numbers λ = α + iβ, λ* = α − iβ, where i = √−1.

There are three alternative ways of representing the conjugate complex numbers λ and λ*:

(3.14) λ = α + iβ = ρ(cos θ + i sin θ) = ρe^{iθ},
       λ* = α − iβ = ρ(cos θ − i sin θ) = ρe^{−iθ},

where

(3.15) ρ = √(α² + β²) and θ = tan^{−1}(β/α).

These are called, respectively, the Cartesian form, the trigonometrical form and

the exponential form.

The Cartesian and trigonometrical representations are understood by considering the Argand diagram:

Figure 1. The Argand Diagram showing a complex number λ = α + iβ and its conjugate λ* = α − iβ.


The exponential form is understood by considering the following series expansions of cos θ and i sin θ about the point θ = 0:

(3.16) cos θ = 1 − θ²/2! + θ⁴/4! − θ⁶/6! + · · · ,
       i sin θ = iθ − iθ³/3! + iθ⁵/5! − iθ⁷/7! + · · · .

Adding these gives

(3.17) cos θ + i sin θ = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + · · · = e^{iθ}.

Likewise, by subtraction, we get

(3.18) cos θ − i sin θ = 1 − iθ − θ²/2! + iθ³/3! + θ⁴/4! − · · · = e^{−iθ}.

These are Euler’s equations. It follows from adding (17) and (18) that

(3.19) cos θ = (e^{iθ} + e^{−iθ})/2.

Subtracting (18) from (17) gives

(3.20) sin θ = −(i/2)(e^{iθ} − e^{−iθ}) = (1/2i)(e^{iθ} − e^{−iθ}).

Now consider the general equation of the nth order:

(3.21) φ_0 + φ_1 z + φ_2 z² + · · · + φ_n z^n = 0.

On dividing by φ_n, we can factorise this as

(3.22) (z − λ_1)(z − λ_2) · · · (z − λ_n) = 0,

where some of the roots may be real and others may be complex. The complex roots come in conjugate pairs, so that, if λ = α + iβ is a complex root, then there is a corresponding root λ* = α − iβ such that the product (z − λ)(z − λ*) = z² − 2αz + (α² + β²) is real and quadratic. When we multiply the n factors together, we obtain the expansion

(3.23) 0 = z^n − (Σ_i λ_i) z^{n−1} + (Σ_{i<j} λ_i λ_j) z^{n−2} − · · · + (−1)^n λ_1λ_2 · · · λ_n.


This can be compared with the expression (φ_0/φ_n) + (φ_1/φ_n)z + · · · + z^n = 0. By equating coefficients of the two expressions, we find that (φ_0/φ_n) = (−1)^n Π λ_i or, equivalently,

(3.24) φ_n = φ_0 Π_{i=1}^{n} (−λ_i)^{−1}.

Thus we can express the polynomial in any of the following forms:

(3.25) Σ φ_i z^i = φ_n Π(z − λ_i)
                = φ_0 Π(−λ_i)^{−1}(z − λ_i)
                = φ_0 Π(1 − z/λ_i).

We should also note that, if λ is a root of the primary equation Σ φ_i z^i = 0, where rising powers of z are associated with rising indices on the coefficients, then µ = 1/λ is a root of the equation Σ φ_i z^{n−i} = 0, which has declining powers of z instead. This follows since Σ φ_i λ^i = Σ φ_i µ^{−i} = 0 implies that µ^n Σ φ_i µ^{−i} = Σ φ_i µ^{n−i} = 0. Confusion can arise from not knowing which of the two equations one is dealing with.

Rational Functions of Polynomials

If δ(z) and γ(z) are polynomial functions of z of degrees d and g respectively with d < g, then the ratio δ(z)/γ(z) is described as a proper rational function. We shall often encounter expressions of the form

(3.26) y(t) = {δ(L)/γ(L)} x(t).

For this to have a meaningful interpretation in the context of a time-series

model, we normally require that y(t) should be a bounded sequence whenever

x(t) is bounded. The necessary and suﬃcient condition for the boundedness of

y(t), in that case, is that the series expansion of δ(z)/γ(z) should be convergent

whenever |z| ≤ 1. We can determine whether or not the sequence will converge

by expressing the ratio δ(z)/γ(z) as a sum of partial fractions. The basic result

is as follows:

(3.27) If δ(z)/γ(z) = δ(z)/{γ_1(z)γ_2(z)} is a proper rational function, and if γ_1(z) and γ_2(z) have no common factor, then the function can be uniquely expressed as

       δ(z)/γ(z) = δ_1(z)/γ_1(z) + δ_2(z)/γ_2(z),

where δ_1(z)/γ_1(z) and δ_2(z)/γ_2(z) are proper rational functions.


Imagine that γ(z) = Π(1 − z/λ_i). Then repeated applications of this basic result enable us to write

(3.28) δ(z)/γ(z) = κ_1/(1 − z/λ_1) + κ_2/(1 − z/λ_2) + · · · + κ_g/(1 − z/λ_g).

By adding the terms on the RHS, we find an expression with a numerator of degree g − 1. By equating the terms of the numerator with the terms of δ(z), we can find the values κ_1, κ_2, . . . , κ_g. The convergence of the expansion of δ(z)/γ(z) is a straightforward matter; for the series converges if and only if the expansion of each of the partial fractions converges. For the expansion

(3.29) κ/(1 − z/λ) = κ{1 + z/λ + (z/λ)² + · · ·}

to converge when |z| ≤ 1, it is necessary and sufficient that |λ| > 1.

Example. Consider the function

(3.30) 3z/(1 + z − 2z²) = 3z/{(1 − z)(1 + 2z)}
                       = κ_1/(1 − z) + κ_2/(1 + 2z)
                       = {κ_1(1 + 2z) + κ_2(1 − z)}/{(1 − z)(1 + 2z)}.

Equating the terms of the numerator gives

(3.31) 3z = (2κ_1 − κ_2)z + (κ_1 + κ_2),

so κ_2 = −κ_1, which gives 3 = (2κ_1 − κ_2) = 3κ_1; and thus we have κ_1 = 1, κ_2 = −1.
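As a check on the example, the decomposition can be confirmed by comparing power-series coefficients. The following Python sketch expands both sides of (30) numerically; it is illustrative only, and numpy is assumed:

import numpy as np

n = 8  # number of series coefficients to compare

# Left side: 3z/(1 + z - 2z^2), expanded by the recursion implicit in
# (1 + z - 2z^2)(a_0 + a_1 z + ...) = 3z.
a = np.zeros(n)
b = [0.0, 3.0]                      # numerator coefficients of 3z
for k in range(n):
    rhs = b[k] if k < len(b) else 0.0
    a[k] = rhs - (a[k-1] if k >= 1 else 0.0) + 2 * (a[k-2] if k >= 2 else 0.0)

# Right side: 1/(1-z) - 1/(1+2z) has coefficients 1 - (-2)^k.
c = np.array([1.0 - (-2.0) ** k for k in range(n)])

assert np.allclose(a, c)            # both give 0, 3, -3, 9, -15, 33, ...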

Linear Difference Equations

An nth-order linear difference equation is a relationship amongst n + 1 consecutive elements of a sequence x(t) of the form

(3.32) α_0 x(t) + α_1 x(t − 1) + · · · + α_n x(t − n) = u(t),

where u(t) is some specified sequence which is described as the forcing function. The equation can be written, in a summary notation, as

(3.33) α(L)x(t) = u(t),


where α(L) = α_0 + α_1 L + · · · + α_n L^n. If n consecutive values of x(t) are given, say x_1, x_2, . . . , x_n, then the relationship can be used to find the succeeding value x_{n+1}. In this way, so long as u(t) is fully specified, it is possible to generate any number of the succeeding elements of the sequence. The values of the sequence prior to t = 1 can be generated likewise; and thus, in effect, we can deduce the function x(t) from the difference equation. However, instead of a recursive solution, we often seek an analytic expression for x(t).
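The recursive solution is a few lines of code. A sketch in Python, with α_0 normalised to unity; the coefficients and starting values are those of the example in Figure 2 below, and the zero forcing function is an illustrative choice:

def generate(alpha, x_init, u, steps):
    # Solve alpha_0 x(t) + ... + alpha_n x(t-n) = u(t) recursively for x(t),
    # assuming alpha[0] = 1 and given n starting values in x_init.
    n = len(alpha) - 1
    x = list(x_init)
    for t in range(steps):
        x.append(u(t) - sum(alpha[i] * x[-i] for i in range(1, n + 1)))
    return x

print(generate([1.0, -1.69, 0.81], [1.0, 3.69], lambda t: 0.0, 10))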

The function x(t; c), expressing the analytic solution, will comprise a set of n constants in c = [c_1, c_2, . . . , c_n]′ which can be determined once we are given a set of n consecutive values of x(t) which are called initial conditions. The general analytic solution of the equation α(L)x(t) = u(t) is expressed as x(t; c) = y(t; c) + z(t), where y(t; c) is the general solution of the homogeneous equation α(L)y(t) = 0, and z(t) = α^{−1}(L)u(t) is called a particular solution of the inhomogeneous equation.

We may solve the difference equation in three steps. First, we find the general solution of the homogeneous equation. Next, we find the particular solution z(t), which embodies no unknown quantities. Finally, we use the n initial values of x to determine the constants c_1, c_2, . . . , c_n. We shall discuss in detail only the solution of the homogeneous equation.

Solution of the Homogeneous Difference Equation

If λ_j is a root of the equation α(z) = α_0 + α_1 z + · · · + α_n z^n = 0 such that α(λ_j) = 0, then y_j(t) = (1/λ_j)^t is a solution of the equation α(L)y(t) = 0. This can be seen by considering the expression

(3.34) α(L)(1/λ_j)^t = {α_0 + α_1 L + · · · + α_n L^n}(1/λ_j)^t
                    = α_0(1/λ_j)^t + α_1(1/λ_j)^{t−1} + · · · + α_n(1/λ_j)^{t−n}
                    = {α_0 + α_1 λ_j + · · · + α_n λ_j^n}(1/λ_j)^t
                    = α(λ_j)(1/λ_j)^t.

Alternatively, one may consider the factorisation α(L) = α_0 Π_i (1 − L/λ_i). Within this product is the term 1 − L/λ_j; and, since

(1 − L/λ_j)(1/λ_j)^t = (1/λ_j)^t − (1/λ_j)^t = 0,

it follows that α(L)(1/λ_j)^t = 0.


The general solution, in the case where α(z) = 0 has distinct real roots, is given by

(3.35) y(t; c) = c_1(1/λ_1)^t + c_2(1/λ_2)^t + · · · + c_n(1/λ_n)^t,

where c_1, c_2, . . . , c_n are the constants which are determined by the initial conditions.

In the case where two roots coincide at a value of λ_j, the equation α(L)y(t) = 0 has the solutions y_1(t) = (1/λ_j)^t and y_2(t) = t(1/λ_j)^t. To show this, let us extract the term (1 − L/λ_j)² from the factorisation α(L) = α_0 Π_i (1 − L/λ_i). Then, according to the previous argument, we have (1 − L/λ_j)²(1/λ_j)^t = 0; but, also, we have

(3.36) (1 − L/λ_j)² t(1/λ_j)^t = {1 − 2L/λ_j + L²/λ_j²} t(1/λ_j)^t
       = {t − 2(t − 1) + (t − 2)}(1/λ_j)^t = 0.

In general, if there are r repeated roots with a value of λ_j, then all of (1/λ_j)^t, t(1/λ_j)^t, t²(1/λ_j)^t, . . . , t^{r−1}(1/λ_j)^t are solutions to the equation α(L)y(t) = 0.

A particularly important special case arises when there are r repeated roots of unit value. Then the functions 1, t, t², . . . , t^{r−1} are all solutions to the homogeneous equation. With each solution is associated a coefficient which can be determined in view of the initial conditions. If these coefficients are d_0, d_1, d_2, . . . , d_{r−1}, then, within the general solution of the homogeneous equation, there will be found the term d_0 + d_1 t + d_2 t² + · · · + d_{r−1}t^{r−1}, which represents a polynomial in t of degree r − 1.

The 2nd-order Difference Equation with Complex Roots

Imagine that the 2nd-order equation α(L)y(t) = α_0 y(t) + α_1 y(t − 1) + α_2 y(t − 2) = 0 is such that α(z) = 0 has complex roots λ = 1/µ and λ* = 1/µ*. If λ, λ* are conjugate complex numbers, then so too are µ, µ*. Therefore, let us write

(3.37) µ = γ + iδ = κ(cos ω + i sin ω) = κe^{iω},
       µ* = γ − iδ = κ(cos ω − i sin ω) = κe^{−iω}.

These will appear in a general solution of the difference equation of the form

(3.38) y(t) = cµ^t + c*(µ*)^t.


Figure 2. The solution of the homogeneous difference equation (1 − 1.69L + 0.81L²)y(t) = 0 for the initial conditions y_0 = 1 and y_1 = 3.69. The time lag of the phase displacement p_1 and the duration of the cycle p_2 are also indicated.

This represents a real-valued sequence; and, since a real term must equal its own conjugate, it follows that c and c* must be conjugate numbers of the form

(3.39) c* = ρ(cos θ + i sin θ) = ρe^{iθ},
       c = ρ(cos θ − i sin θ) = ρe^{−iθ}.

Thus the general solution becomes

(3.40) cµ^t + c*(µ*)^t = ρe^{−iθ}(κe^{iω})^t + ρe^{iθ}(κe^{−iω})^t
                      = ρκ^t{e^{i(ωt−θ)} + e^{−i(ωt−θ)}}
                      = 2ρκ^t cos(ωt − θ).

To analyse the final expression, consider first the factor cos(ωt − θ). This is a displaced cosine wave. The value ω, which is a number of radians per unit period, is called the angular velocity or the angular frequency of the wave. The value f = ω/2π is its frequency in cycles per unit period. The duration of one cycle, also called the period, is r = 2π/ω.

The term θ is called the phase displacement of the cosine wave, and it serves to shift the cosine function along the axis of t so that, in the absence of damping, the peak would occur at the value of t = θ/ω instead of at t = 0.


Next consider the term κ^t, wherein κ = √(γ² + δ²) is the modulus of the complex roots. When κ has a value of less than unity, it becomes a damping factor which serves to attenuate the cosine wave as t increases. The damping also serves to shift the peaks of the cosine function slightly to the left.

Finally, the factor 2ρ affects the initial amplitude of the cosine wave, which is the value which it assumes when t = 0. Since ρ is just the modulus of the values c and c*, this amplitude reflects the initial conditions. The phase angle θ is also a product of the initial conditions.
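The solution depicted in Figure 2 can be reproduced by iterating the difference equation and comparing it with the analytic form 2ρκ^t cos(ωt − θ). In the sketch below, κ and ω are recovered from the coefficients via the relations α_1 = −2κ cos ω and α_2 = κ², which are established under (43) below; the formula for θ is our own derivation from the two initial conditions, and numpy is assumed.

import numpy as np

a1, a2 = -1.69, 0.81                    # (1 + a1*L + a2*L^2) y(t) = 0
kappa = np.sqrt(a2)                     # modulus of the roots: a2 = kappa^2
omega = np.arccos(-a1 / (2 * kappa))    # a1 = -2*kappa*cos(omega)

# Iterate y(t) = -a1*y(t-1) - a2*y(t-2) from y0 = 1, y1 = 3.69.
y = [1.0, 3.69]
for t in range(2, 25):
    y.append(-a1 * y[-1] - a2 * y[-2])

# Fit 2*rho*kappa^t*cos(omega*t - theta) to the two initial conditions:
# y0 = 2*rho*cos(theta) and y1 = 2*rho*kappa*cos(omega - theta).
theta = np.arctan((y[1] / (kappa * y[0]) - np.cos(omega)) / np.sin(omega))
rho = y[0] / (2 * np.cos(theta))
t = np.arange(25)
assert np.allclose(y, 2 * rho * kappa ** t * np.cos(omega * t - theta))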

It is instructive to derive an expression for the second-order difference equation which is in terms of the parameters of the trigonometrical or exponential representations of a pair of complex roots. Consider

(3.41) α(z) = α_0(1 − µz)(1 − µ*z)
            = α_0{1 − (µ + µ*)z + µµ*z²}.

From (37) it follows that

(3.42) µ + µ* = 2κ cos ω and µµ* = κ².

Therefore the polynomial operator which is entailed by the difference equation is

(3.43) α_0 + α_1 L + α_2 L² = α_0(1 − 2κ cos ω L + κ²L²);

and it is usual to set α_0 = 1. This representation indicates that a necessary condition for the roots to be complex, which is not a sufficient condition, is that α_2/α_0 > 0.

It is easy to ascertain by inspection whether or not the second-order difference equation is stable. The condition that the roots of α(z) = 0 must lie outside the unit circle, which is necessary and sufficient for stability, imposes certain restrictions on the coefficients of α(z) which can be checked easily.

We can reveal these conditions most readily by considering the auxiliary polynomial ρ(z) = z²α(z^{−1}), whose roots, which are the inverses of those of α(z), must lie inside the unit circle. Let the roots of ρ(z), which might be real or complex, be denoted by µ_1, µ_2. Then we can write

(3.44) ρ(z) = α_0 z² + α_1 z + α_2
            = α_0(z − µ_1)(z − µ_2)
            = α_0{z² − (µ_1 + µ_2)z + µ_1µ_2},

where it is assumed that α_0 > 0. This indicates that α_2/α_0 = µ_1µ_2. Therefore the conditions |µ_1|, |µ_2| < 1 imply that

(3.45) −α_0 < α_2 < α_0.


If the roots are complex conjugate numbers µ, µ* = γ ± iδ, then this condition will ensure that µ*µ = α_2/α_0 < 1, which is the condition that they are within the unit circle.

Now consider the fact that, if α_0 > 0, then the function ρ(z) will have a minimum value over the real line which is greater than zero if the roots are complex and no greater than zero if they are real. If the roots are real, then they will be found in the interval (−1, 1) if and only if

(3.46) ρ(−1) = α_0 − α_1 + α_2 > 0 and
       ρ(1) = α_0 + α_1 + α_2 > 0.

If the roots are complex then these conditions are bound to be satisﬁed.

From these arguments, it follows that the conditions under (45) and (46)

in combination are necessary and suﬃcient to ensure that the roots of ρ(z) = 0

are within the unit circle and that the roots of α(z) = 0 are outside.
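These inequalities give an immediate numerical test of stability. A minimal sketch, assuming the normalisation α_0 > 0:

def is_stable_second_order(a0, a1, a2):
    # Conditions (3.45) and (3.46): the roots of rho(z) = a0*z^2 + a1*z + a2
    # lie inside the unit circle, so the roots of alpha(z) lie outside it.
    return (-a0 < a2 < a0) and (a0 - a1 + a2 > 0) and (a0 + a1 + a2 > 0)

print(is_stable_second_order(1.0, -1.69, 0.81))   # True: the equation of Figure 2
print(is_stable_second_order(1.0, -2.10, 1.05))   # False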

State-Space Models

An nth-order diﬀerence equation in a single variable can be transformed

into a ﬁrst-order system in n variables which are the elements of a so-called

state vector.

There is a wide variety of alternative forms which can be assumed by

a ﬁrst-order vector diﬀerence equation corresponding to the nth-order scalar

equation. However, certain of these are described as canonical forms by virtue

of special structures in the matrix.

In demonstrating one of the more common canonical forms, let us consider

again the nth-order diﬀerence equation of (32), in reference to which we may

deﬁne the following variables:

(3.47) ξ_1(t) = x(t),
       ξ_2(t) = ξ_1(t − 1) = x(t − 1),
       . . .
       ξ_n(t) = ξ_{n−1}(t − 1) = x(t − n + 1).

On the basis of these definitions, a first-order vector equation may be constructed in the form of

(3.48) [ξ_1(t)]   [−α_1  . . .  −α_{n−1}  −α_n] [ξ_1(t−1)]   [1]
       [ξ_2(t)] = [  1   . . .     0       0 ] [ξ_2(t−1)] + [0] ε(t).
       [ ...  ]   [ ...           ...     ...] [  ...   ]   [...]
       [ξ_n(t)]   [  0   . . .     1       0 ] [ξ_n(t−1)]   [0]


The matrix in this structure is sometimes described as the companion form. Here it is manifest, in view of the definitions under (47), that the leading equation of the system, which is

(3.49) ξ_1(t) = −α_1ξ_1(t − 1) − · · · − α_nξ_n(t − 1) + ε(t),

is precisely the equation under (32).
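A sketch of the companion-form construction in Python, for arbitrary coefficients α_1, . . . , α_n with α_0 = 1; the example coefficients are again those of Figure 2:

import numpy as np

def companion(alpha):
    # Build the companion matrix of (3.48) from [alpha_1, ..., alpha_n].
    n = len(alpha)
    A = np.zeros((n, n))
    A[0, :] = -np.array(alpha)       # first row: -alpha_1, ..., -alpha_n
    A[1:, :-1] = np.eye(n - 1)       # subdiagonal of ones shifts the state
    return A

A = companion([-1.69, 0.81])
xi = np.array([3.69, 1.0])           # state [x(t-1), x(t-2)]
print(A @ xi)                        # the first element is x(t), by (3.49)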

Example. An example of a system which is not in a canonical form is provided by the following matrix equation:

(3.50) [y(t)]     [cos ω  −sin ω] [y(t−1)]   [υ(t)]
       [z(t)] = κ [sin ω   cos ω] [z(t−1)] + [ζ(t)].

With the use of the lag operator, the equation can also be written as

(3.51) [1 − κ cos ωL     κ sin ωL  ] [y(t)]   [υ(t)]
       [  −κ sin ωL    1 − κ cos ωL] [z(t)] = [ζ(t)].

On premultiplying the equation by the inverse of the matrix on the LHS, we get

(3.52) [y(t)]                                [1 − κ cos ωL    −κ sin ωL  ] [υ(t)]
       [z(t)] = {1 − 2κ cos ωL + κ²L²}^{−1} [   κ sin ωL    1 − κ cos ωL] [ζ(t)].

A special case arises when

(3.53) [υ(t)]   [−sin ω]
       [ζ(t)] = [ cos ω] η(t),

where η(t) is a white-noise sequence. Then the equation becomes

(3.54) [y(t)]                                [−sin ω]
       [z(t)] = {1 − 2κ cos ωL + κ²L²}^{−1} [ cos ω] η(t).

On defining ε(t) = −sin ω η(t), we may write the first of these equations as

(3.55) (1 − 2κ cos ωL + κ²L²)y(t) = ε(t).

This is just a second-order difference equation with a white-noise forcing function; and, by virtue of the inclusion of the damping factor κ ∈ [0, 1), it represents a generalisation of the equation to be found under (2.24).


Transfer Functions

Consider again the simple dynamic model of equation (5):

(3.56) y(t) = φy(t − 1) + x(t)β + ε(t).

With the use of the lag operator, this can be rewritten as

(3.57) (1 − φL)y(t) = βx(t) + ε(t)

or, equivalently, as

(3.58) y(t) = {β/(1 − φL)}x(t) + {1/(1 − φL)}ε(t).

The latter is the so-called rational transfer-function form of the equation. The operator L within the transfer functions or filters can be replaced by a complex number z. Then the transfer function which is associated with the signal x(t) becomes

(3.59) β/(1 − φz) = β{1 + φz + φ²z² + · · ·},

where the RHS comes from a familiar power-series expansion.

The sequence {β, βφ, βφ², . . .} of the coefficients of the expansion constitutes the impulse response of the transfer function. That is to say, if we imagine that, on the input side, the signal is a unit-impulse sequence of the form

(3.60) x(t) = {. . . , 0, 1, 0, 0, . . .},

which has zero values at all but one instant, then its mapping through the transfer function would result in an output sequence of

(3.61) r(t) = {. . . , 0, β, βφ, βφ², . . .}.

Another important concept is the step response of the filter. We may imagine that the input sequence is zero-valued up to a point in time when it assumes a constant unit value:

(3.62) x(t) = {. . . , 0, 1, 1, 1, . . .}.

The mapping of this sequence through the transfer function would result in an output sequence of

(3.63) s(t) = {. . . , 0, β, β + βφ, β + βφ + βφ², . . .}


whose elements, from the point when the step occurs in x(t), are simply the partial sums of the impulse-response sequence. This sequence of partial sums {β, β + βφ, β + βφ + βφ², . . .} is described as the step response. Given that |φ| < 1, the step response converges to a value

(3.64) γ = β/(1 − φ),

which is described as the steady-state gain or the long-term multiplier of the transfer function.
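The impulse response, the step response and the steady-state gain of the first-order model can be illustrated in a few lines of Python; the parameter values are arbitrary:

import numpy as np

beta, phi = 2.0, 0.5
impulse = beta * phi ** np.arange(12)   # (3.61): beta, beta*phi, beta*phi^2, ...
step = np.cumsum(impulse)               # (3.63): partial sums of the impulse response
gain = beta / (1 - phi)                 # (3.64): the steady-state gain

print(step[-1], gain)                   # the step response approaches the gain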

These various concepts apply to models of any order. Consider the equation

(3.65) α(L)y(t) = β(L)x(t) + ε(t),

where

(3.66) α(L) = 1 + α_1 L + · · · + α_p L^p = 1 − φ_1 L − · · · − φ_p L^p,
       β(L) = β_0 + β_1 L + · · · + β_k L^k

are polynomials of the lag operator. The transfer-function form of the model is simply

(3.67) y(t) = {β(L)/α(L)}x(t) + {1/α(L)}ε(t).

The rational function associated with x(t) has a series expansion

(3.68) β(z)/α(z) = ω(z) = ω_0 + ω_1 z + ω_2 z² + · · · ;

and the sequence of the coefficients of this expansion constitutes the impulse-response function. The partial sums of the coefficients constitute the step-response function. The gain of the transfer function is defined by

(3.69) γ = β(1)/α(1) = (β_0 + β_1 + · · · + β_k)/(1 + α_1 + · · · + α_p).

The method of finding the coefficients of the series expansion of the transfer function in the general case can be illustrated by the second-order case:

(3.70) (β_0 + β_1 z)/(1 − φ_1 z − φ_2 z²) = ω_0 + ω_1 z + ω_2 z² + · · · .


We rewrite this equation as

(3.71) β_0 + β_1 z = (1 − φ_1 z − φ_2 z²)(ω_0 + ω_1 z + ω_2 z² + · · ·).

Then, by performing the multiplication on the RHS, and by equating the coefficients of the same powers of z on the two sides of the equation, we find that

(3.72) β_0 = ω_0,                          ω_0 = β_0,
       β_1 = ω_1 − φ_1ω_0,                 ω_1 = β_1 + φ_1ω_0,
       0 = ω_2 − φ_1ω_1 − φ_2ω_0,          ω_2 = φ_1ω_1 + φ_2ω_0,
       . . .                               . . .
       0 = ω_n − φ_1ω_{n−1} − φ_2ω_{n−2},  ω_n = φ_1ω_{n−1} + φ_2ω_{n−2}.
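The recursion in the right-hand column of (72) is easily mechanised. A Python sketch, with the parameters of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²) that is discussed after (73) below:

def impulse_response(beta, phi, n):
    # Coefficients of omega(z) = beta(z)/alpha(z) by the recursion of (3.72),
    # for a second-order denominator 1 - phi[0]*z - phi[1]*z^2.
    omega = []
    for j in range(n):
        w = beta[j] if j < len(beta) else 0.0
        if j >= 1:
            w += phi[0] * omega[j - 1]
        if j >= 2:
            w += phi[1] * omega[j - 2]
        omega.append(w)
    return omega

print(impulse_response([1.0, 2.0], [1.69, -0.81], 8))

The first two values, 1 and 3.69, coincide with the initial conditions of the homogeneous solution of Figure 2.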

The necessary and sufficient condition for the convergence of the sequence {ω_i} is that the roots of the primary polynomial equation 1 − φ_1 z − φ_2 z² = 0 should lie outside the unit circle or, equivalently, that the roots of the auxiliary equation z² − φ_1 z − φ_2 = 0—which are the inverses of the former roots—should lie inside the unit circle. If the roots of these equations are real, then the sequence will converge monotonically to zero whereas, if the roots are complex-valued, then the sequence will converge in the manner of a damped sinusoid.

It is clear that the equation

(3.73) ω(n) = φ_1ω(n − 1) + φ_2ω(n − 2),

which serves to generate the elements of the impulse response, is nothing but a second-order homogeneous difference equation. In fact, Figure 2, which has been presented as the solution to a homogeneous difference equation, represents the impulse response of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²).

In the light of this result, it is apparent that the coefficients of the denominator polynomial 1 − φ_1 z − φ_2 z² serve to determine the period and the damping factor of a complex impulse response. The coefficients in the numerator polynomial β_0 + β_1 z serve to determine the initial amplitude of the response and its phase lag. It seems that all four coefficients must be present if a second-order transfer function is to have complete flexibility in modelling a dynamic response.

The Frequency Response

In many applications within forecasting and time-series analysis, it is of interest to consider the response of a transfer function to a signal which is a simple sinusoid. As we have indicated in a previous lecture, it is possible to represent a finite sequence as a sum of sine and cosine functions whose frequencies are integer multiples of a fundamental frequency. More generally, it is possible, as we shall see later, to represent an arbitrary stationary stochastic process as a combination of an infinite number of sine and cosine functions whose frequencies range continuously in the interval [0, π]. It follows that the effect of a transfer function upon stationary signals can be characterised in terms of its effect upon the sinusoidal functions.

Figure 3. The gain of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).

Figure 4. The phase diagram of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).

Consider therefore the consequences of mapping the signal x(t) = cos(ωt) through the transfer function γ(L) = γ_0 + γ_1 L + · · · + γ_g L^g. The output is

(3.74) y(t) = γ(L) cos(ωt) = Σ_{j=0}^{g} γ_j cos(ω[t − j]).

The trigonometrical identity cos(A − B) = cos A cos B + sin A sin B enables us to write this as

(3.75) y(t) = {Σ_j γ_j cos(ωj)} cos(ωt) + {Σ_j γ_j sin(ωj)} sin(ωt)
            = α cos(ωt) + β sin(ωt) = ρ cos(ωt − θ).

Here we have defined

(3.76) α = Σ_{j=0}^{g} γ_j cos(ωj),   β = Σ_{j=0}^{g} γ_j sin(ωj),
       ρ = √(α² + β²)   and   θ = tan^{−1}(β/α).

It can be seen from (75) that the effect of the filter upon the signal is twofold. First there is a gain effect, whereby the amplitude of the sinusoid has been increased or diminished by a factor of ρ. Also there is a phase effect, whereby the peak of the sinusoid is displaced by a time delay of θ/ω periods. Figures 3 and 4 represent the two effects of a simple rational transfer function on the set of sinusoids whose frequencies range from 0 to π.
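The gain ρ and the phase θ of (76) can be evaluated at any frequency. A sketch for a finite filter, using numpy; the filter coefficients below are an arbitrary illustration:

import numpy as np

def frequency_response(gamma, w):
    # Gain and phase of the filter gamma(L) at frequency w, as under (3.76).
    j = np.arange(len(gamma))
    alpha = np.sum(gamma * np.cos(w * j))
    beta = np.sum(gamma * np.sin(w * j))
    rho = np.hypot(alpha, beta)        # the gain
    theta = np.arctan2(beta, alpha)    # the phase displacement (quadrant-aware)
    return rho, theta

gamma = np.array([0.25, 0.5, 0.25])    # a simple moving-average filter
print(frequency_response(gamma, np.pi / 4))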


LECTURE 4

Time-Series Analysis in

the Frequency Domain

A sequence is a function mapping from a set of integers, described as the

index set, onto the real line or into a subset thereof. A time series is a sequence

whose index corresponds to consecutive dates separated by a unit time interval.

In the statistical analysis of time series, the elements of the sequence are

regarded as a set of random variables. Usually, no notational distinction is

made between these random variables and their realised values. It is important

nevertheless to bear the distinction in mind.

In order to analyse a statistical time series, it must be assumed that the

structure of the statistical or stochastic process which generates the observa-

tions is essentially invariant through time. The conventional assumptions are

summarised in the condition of stationarity. In its strong form, the condition

requires that any two segments of equal length which are extracted from the

time series must have identical multivariate probability density functions. The

condition of weak stationarity requires only that the elements of the time series

should have a common ﬁnite expected value and that the autocovariance of two

elements should depend only on their temporal separation.

A fundamental process, from which many other stationary processes may

be derived, is the so-called white-noise process which consists of a sequence of

uncorrelated random variables, each with a zero mean and the same ﬁnite vari-

ance. By passing white noise through a linear ﬁlter, a sequence whose elements

are serially correlated can be generated. In fact, virtually every stationary

stochastic process may be depicted as the product of a ﬁltering operation ap-

plied to white noise. This result follows from the Cramér–Wold Theorem which

will be presented after we have introduced the concepts underlying the spectral

representation of a time series.

The spectral representation is rooted in the basic notion of Fourier analysis

which is that well-behaved functions can be approximated over a ﬁnite inter-

val, to any degree of accuracy, by a weighted combination of sine and cosine

functions whose harmonically rising frequencies are integral multiples of a fun-

damental frequency. Such linear combinations are described as Fourier sums

or Fourier series. Of course, the notion applies to sequences as well; for any


number of well-behaved functions may be interpolated through the coordinates

of a ﬁnite sequence.

We shall approach the Fourier analysis of stochastic processes via the ex-

act Fourier representation of a ﬁnite sequence. This is extended to provide a

representation of an inﬁnite sequence in terms of an inﬁnity of trigonometri-

cal functions whose frequencies range continuously in the interval [0, π]. The

trigonometrical functions and their weighting functions are gathered under a

Fourier–Stieltjes integral. It is remarkable that, whereas a Fourier sum serves

only to deﬁne a strictly periodic function, a Fourier integral suﬃces to represent

an aperiodic time series generated by a stationary stochastic process.

The Fourier integral is also used to represent the underlying stochastic

process. This is achieved by describing the stochastic processes which generate

the weighting functions. There are two such weighting processes, associated

respectively with the sine and cosine functions; and their common variance,

which is a function f(ω), ω ∈ [0, π], is the so-called spectral density function.

The relationship between the spectral density function and the sequence

of autocovariances, which is summarised in the Wiener–Khintchine theorem,

provides a link between the time-domain and the frequency-domain analyses.

The sequence of autocovariances may be obtained from the Fourier transform

of the spectral density function and the spectral density function is, conversely,

a Fourier transform of the autocovariances.

Stationarity

Consider two vectors of n + 1 consecutive elements from the process y(t):

(4.1) [y_t, y_{t+1}, . . . , y_{t+n}] and [y_s, y_{s+1}, . . . , y_{s+n}].

Then y(t) = {y_t; t = 0, ±1, ±2, . . .} is strictly stationary if the joint probability density functions of the two vectors are the same for any values of t and s, regardless of the size of n. On the assumption that the first and second-order moments of the distribution are finite, the condition of stationarity implies that all the elements of y(t) have the same expected value and that the covariance between any pair of elements of the sequences is a function only of their temporal separation. Thus,

(4.2) E(y_t) = µ and C(y_t, y_s) = γ_{|t−s|}.

On their own, the conditions of (2) constitute the conditions of weak station-

arity.

A normal process is completely characterised by its mean and its autoco-

variances. Therefore, a normal process y(t) which satisﬁes the conditions for

weak stationarity is also stationary in the strict sense.


The Autocovariance Function

The covariance between two elements y_t and y_s of a process y(t) which are separated by τ = |t − s| intervals of time is known as the autocovariance at lag τ and is denoted by γ_τ. The autocorrelation at lag τ, denoted by ρ_τ, is defined by

(4.3) ρ_τ = γ_τ/γ_0,

where γ_0 is the variance of the process y(t).

The stationarity conditions imply that the autocovariances of y(t) satisfy the equality

(4.4) γ_τ = γ_{−τ}

for all values of τ.

The autocovariance matrix of a stationary process corresponding to the n elements y_0, y_1, . . . , y_{n−1} is given by

(4.5) Γ = [γ_0      γ_1      γ_2      . . .  γ_{n−1}]
          [γ_1      γ_0      γ_1      . . .  γ_{n−2}]
          [γ_2      γ_1      γ_0      . . .  γ_{n−3}]
          [ ...      ...      ...            ...    ]
          [γ_{n−1}  γ_{n−2}  γ_{n−3}  . . .  γ_0    ].
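The Toeplitz structure of (5), in which each diagonal carries a single value, is conveniently generated from the sequence of autocovariances. A sketch using scipy; the numerical values are arbitrary:

import numpy as np
from scipy.linalg import toeplitz

gamma = np.array([4.0, 2.0, 1.0, 0.5])   # illustrative gamma_0, ..., gamma_3
Gamma = toeplitz(gamma)                   # the symmetric matrix of (4.5)
print(Gamma)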

The sequences {γ_τ} and {ρ_τ} are described as the autocovariance and autocorrelation functions respectively.

The Filtering of White Noise

A white-noise process is a sequence ε(t) of uncorrelated random variables with mean zero and common variance σ_ε². Thus

(4.6) E(ε_t) = 0 for all t,
      E(ε_tε_s) = σ_ε² if t = s, and E(ε_tε_s) = 0 if t ≠ s.

By a process of linear filtering, a variety of time series may be constructed whose elements display complex interdependencies. A finite linear filter, also called a moving-average operator, is a polynomial in the lag operator of the form µ(L) = µ_0 + µ_1 L + · · · + µ_q L^q. The effect of this filter on ε(t) is described by the equation

(4.7) y(t) = µ(L)ε(t)
           = µ_0ε(t) + µ_1ε(t − 1) + µ_2ε(t − 2) + · · · + µ_qε(t − q)
           = Σ_{i=0}^{q} µ_i ε(t − i).


The operator µ(L) may also be described as the transfer function which maps the input sequence ε(t) into the output sequence y(t).

An operator µ(L) = {µ_0 + µ_1 L + µ_2 L² + · · ·} with an indefinite number of terms in rising powers of L may also be considered. However, for this to be practical, the coefficients {µ_0, µ_1, µ_2, . . .} must be functions of a limited number of fundamental parameters. In addition, it is required that

(4.8) Σ_i |µ_i| < ∞.

Given the value of σ_ε² = V{ε(t)}, the autocovariances of the filtered sequence y(t) = µ(L)ε(t) may be determined by evaluating the expression

(4.9) γ_τ = E(y_t y_{t−τ})
          = E(Σ_i µ_i ε_{t−i} Σ_j µ_j ε_{t−τ−j})
          = Σ_i Σ_j µ_i µ_j E(ε_{t−i} ε_{t−τ−j}).

From equation (6), it follows that

(4.10) γ_τ = σ_ε² Σ_j µ_j µ_{j+τ};

and so the variance of the filtered sequence is

(4.11) γ_0 = σ_ε² Σ_j µ_j².

The condition under equation (8) guarantees that these quantities are ﬁnite, as

is required by the condition of stationarity.
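Formula (10) is easily computed directly from the coefficients. A minimal Python sketch for an illustrative moving average:

import numpy as np

mu = np.array([1.0, 0.5, 0.25])          # moving-average coefficients
sigma2 = 1.0

def gamma(tau):
    # (4.10): gamma_tau = sigma2 * sum_j mu_j * mu_{j+tau}
    return sigma2 * np.sum(mu[:len(mu) - tau] * mu[tau:])

print([gamma(tau) for tau in range(3)])  # [1.3125, 0.625, 0.25]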

The z-transform

In the subsequent analysis, it will prove helpful to present the results in the

notation of the z-transform. The z-transform of the inﬁnite sequence y(t) =

{y

t

; t = 0, ±1, ±2, . . .} is deﬁned by

(4.12) y(z) =

∞

τ=−∞

y

t

z

t

.

Here z is a complex number which may be placed on the perimeter of the unit

circle provided that the series converges. Thus z = e

−iω

with ω ∈ [0, 2π]


If y(t) = µ_0ε(t) + µ_1ε(t − 1) + · · · + µ_qε(t − q) = µ(L)ε(t) is a moving-average process, then the z-transform of the sequence of moving-average coefficients is the polynomial µ(z) = µ_0 + µ_1 z + · · · + µ_q z^q, which has the same form as the operator µ(L).

The z-transform of a sequence of autocovariances is called the autocovariance generating function. For the moving-average process, this is given by

(4.13) γ(z) = σ_ε² µ(z)µ(z^{−1})
            = σ_ε² Σ_i µ_i z^i Σ_j µ_j z^{−j}
            = σ_ε² Σ_i Σ_j µ_i µ_j z^{i−j}
            = Σ_τ {σ_ε² Σ_j µ_j µ_{j+τ}} z^τ,   with τ = i − j,
            = Σ_{τ=−∞}^{∞} γ_τ z^τ.

The ﬁnal equality is by virtue of equation (10).
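In terms of polynomial arithmetic, the autocovariance generating function is the product of µ(z) and µ(z^{−1}). A sketch using numpy's polynomial multiplication, with the coefficients of the preceding example:

import numpy as np

mu = np.array([1.0, 0.5, 0.25])
sigma2 = 1.0

# Coefficients of gamma(z) = sigma2 * mu(z) * mu(1/z), stored for the
# powers z^{-q}, ..., z^0, ..., z^q.
gamma_coeffs = sigma2 * np.convolve(mu, mu[::-1])
print(gamma_coeffs)   # [0.25, 0.625, 1.3125, 0.625, 0.25]

The central value is γ_0 and the symmetric flanks are γ_1, γ_2, in agreement with (10) and (11).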

The Fourier Representation of a Sequence

According to the basic result of Fourier analysis, it is always possible to

approximate an arbitrary analytic function deﬁned over a ﬁnite interval of the

real line, to any desired degree of accuracy, by a weighted sum of sine and

cosine functions of harmonically increasing frequencies.

Similar results apply in the case of sequences, which may be regarded as functions mapping from the set of integers onto the real line. For a sample of T observations y_0, . . . , y_{T−1}, it is possible to devise an expression in the form

(4.14) y_t = Σ_{j=0}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)},

wherein ω_j = 2πj/T is a multiple of the fundamental frequency ω_1 = 2π/T. Thus, the elements of a finite sequence can be expressed exactly in terms of sines and cosines. This expression is called the Fourier decomposition of y_t, and the set of coefficients {α_j, β_j; j = 0, 1, . . . , n} are called the Fourier coefficients.

When T is even, we have n = T/2; and it follows that

(4.15) sin(ω_0 t) = sin(0) = 0,       cos(ω_0 t) = cos(0) = 1,
       sin(ω_n t) = sin(πt) = 0,      cos(ω_n t) = cos(πt) = (−1)^t.


Therefore, equation (14) becomes

(4.16) y_t = α_0 + Σ_{j=1}^{n−1} {α_j cos(ω_j t) + β_j sin(ω_j t)} + α_n(−1)^t.

When T is odd, we have n = (T − 1)/2; and then equation (14) becomes

(4.17) y_t = α_0 + Σ_{j=1}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)}.

In both cases, there are T nonzero coefficients amongst the set {α_j, β_j; j = 0, 1, . . . , n}; and the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation.

In equation (16), the frequencies of the trigonometric functions range from ω_1 = 2π/T to ω_n = π; whereas, in equation (17), they range from ω_1 = 2π/T to ω_n = π(T − 1)/T. The frequency π is the so-called Nyquist frequency.

Although the process generating the data may contain components of fre-

quencies higher than the Nyquist frequency, these will not be detected when

it is sampled regularly at unit intervals of time. In fact, the eﬀects on the

process of components with frequencies in excess of the Nyquist value will be

confounded with those whose frequencies fall below it.

To demonstrate this, consider the case where the process contains a component which is a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω* = 2π − ω. Then, for integer values of t,

(4.18) cos(ωt) = cos{(2π − ω*)t}
              = cos(2πt) cos(ω*t) + sin(2πt) sin(ω*t)
              = cos(ω*t);

which indicates that ω and ω* are observationally indistinguishable. Here, ω* < π is described as the alias of ω > π.
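Aliasing is readily demonstrated numerically: sampled at integer points, a cosine of frequency ω > π coincides with the cosine of its alias 2π − ω. A minimal sketch:

import numpy as np

t = np.arange(16)                 # integer sampling points
w = 1.25 * np.pi                  # a frequency in excess of pi
w_star = 2 * np.pi - w            # its alias, 0.75*pi
assert np.allclose(np.cos(w * t), np.cos(w_star * t))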

The Spectral Representation of a Stationary Process

By allowing the value of n in the expression (14) to tend to infinity, it is possible to express a sequence of indefinite length in terms of a sum of sine and cosine functions. However, in the limit as n → ∞, the coefficients α_j, β_j tend to vanish; and therefore an alternative representation in terms of differentials is called for.

By writing α_j = dA(ω_j), β_j = dB(ω_j), where A(ω), B(ω) are step functions with discontinuities at the points {ω_j; j = 0, . . . , n}, the expression (14) can be rendered as

(4.19) y_t = Σ_j {cos(ω_j t)dA(ω_j) + sin(ω_j t)dB(ω_j)}.


Figure 1. The graph of 134 observations on the monthly purchase of clothing after a logarithmic transformation and the removal of a linear trend, together with the corresponding periodogram.


In the limit, as n → ∞, the summation is replaced by an integral to give the expression

(4.20) y(t) = ∫_0^π {cos(ωt)dA(ω) + sin(ωt)dB(ω)}.

Here, cos(ωt) and sin(ωt), and therefore y(t), may be regarded as inﬁnite se-

quences deﬁned over the entire set of positive and negative integers.

Since A(ω) and B(ω) are discontinuous functions for which no derivatives

exist, one must avoid using α(ω)dω and β(ω)dω in place of dA(ω) and dB(ω).

Moreover, the integral in equation (20) is a Fourier–Stieltjes integral.

In order to derive a statistical theory for the process that generates y(t),

one must make some assumptions concerning the functions A(ω) and B(ω).

So far, the sequence y(t) has been interpreted as a realisation of a stochastic

process. If y(t) is regarded as the stochastic process itself, then the functions

A(ω), B(ω) must, likewise, be regarded as stochastic processes deﬁned over

the interval [0, π]. A single realisation of these processes now corresponds to a

single realisation of the process y(t).

The ﬁrst assumption to be made is that the functions A(ω) and B(ω)

represent a pair of stochastic processes of zero mean which are indexed on the

continuous parameter ω. Thus

(4.21) E{dA(ω)} = E{dB(ω)} = 0.

The second and third assumptions are that the two processes are mutually uncorrelated and that non-overlapping increments within each process are uncorrelated. Thus

(4.22) E{dA(ω)dB(λ)} = 0 for all ω, λ,
       E{dA(ω)dA(λ)} = 0 if ω ≠ λ,
       E{dB(ω)dB(λ)} = 0 if ω ≠ λ.

The final assumption is that the variance of the increments is given by

(4.23) V{dA(ω)} = V{dB(ω)} = 2dF(ω) = 2f(ω)dω.

We can see that, unlike A(ω) and B(ω), F(ω) is a continuous diﬀerentiable

function. The function F(ω) and its derivative f(ω) are the spectral distribu-

tion function and the spectral density function, respectively.

In order to express equation (20) in terms of complex exponentials, we may define a pair of conjugate complex stochastic processes:

(4.24) dZ(ω) = ½{dA(ω) − idB(ω)},
       dZ*(ω) = ½{dA(ω) + idB(ω)}.


Also, we may extend the domain of the functions A(ω), B(ω) from [0, π] to [−π, π] by regarding A(ω) as an even function such that A(−ω) = A(ω) and by regarding B(ω) as an odd function such that B(−ω) = −B(ω). Then we have

(4.25) dZ*(ω) = dZ(−ω).

From the conditions under (22), it follows that

(4.26) E{dZ(ω)dZ*(λ)} = 0 if ω ≠ λ,
       E{dZ(ω)dZ*(ω)} = f(ω)dω.

These results may be used to reexpress equation (20) as

(4.27) y(t) = ∫_0^π {(e^{iωt} + e^{−iωt})/2 dA(ω) − i(e^{iωt} − e^{−iωt})/2 dB(ω)}
            = ∫_0^π {e^{iωt}{dA(ω) − idB(ω)}/2 + e^{−iωt}{dA(ω) + idB(ω)}/2}
            = ∫_0^π {e^{iωt}dZ(ω) + e^{−iωt}dZ*(ω)}.

When the integral is extended over the range [−π, π], this becomes

(4.28) y(t) = ∫_{−π}^{π} e^{iωt}dZ(ω).

This is commonly described as the spectral representation of the process y(t).

The Autocovariances and the Spectral Density Function

The sequence of the autocovariances of the process y(t) may be expressed in terms of the spectrum of the process. From equation (28), it follows that the autocovariance of y(t) at lag τ = t − k is given by

(4.29) γ_τ = C(y_t, y_k) = E{∫_ω e^{iωt}dZ(ω) ∫_λ e^{−iλk}dZ(−λ)}
           = ∫_ω ∫_λ e^{iωt}e^{−iλk}E{dZ(ω)dZ*(λ)}
           = ∫_ω e^{iωτ}E{dZ(ω)dZ*(ω)}
           = ∫_ω e^{iωτ}f(ω)dω.


Figure 2. The theoretical autocorrelation function of the ARMA(2, 2) process (1 − 1.344L + 0.902L²)y(t) = (1 − 1.691L + 0.810L²)ε(t) and (below) the corresponding spectral density function.


Here the ﬁnal equalities are derived by using the results (25) and (26). This

equation indicates that the Fourier transform of the spectrum is the autoco-

variance function.

The inverse mapping from the autocovariances to the spectrum is given by

(4.30) f(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ e^{−iωτ}
            = (1/2π) {γ_0 + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ)}.

This function is directly comparable to the periodogram of a data sequence which is defined under (2.41). However, the periodogram has T empirical autocovariances c_0, . . . , c_{T−1} in place of an indefinite number of theoretical autocovariances. Also, it differs from the spectrum by a scalar factor of 4π. In many texts, equation (30) serves as the primary definition of the spectrum.
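For a process with a finitely supported autocovariance sequence, the spectral density can be evaluated directly from (30). A sketch for the moving-average autocovariances computed earlier:

import numpy as np

gamma = np.array([1.3125, 0.625, 0.25])   # gamma_0, gamma_1, gamma_2 of the MA example

def spectral_density(w):
    # (4.30): f(w) = (1/2pi)*(gamma_0 + 2*sum_{tau>=1} gamma_tau*cos(w*tau))
    tau = np.arange(1, len(gamma))
    return (gamma[0] + 2 * np.sum(gamma[1:] * np.cos(w * tau))) / (2 * np.pi)

for w in np.linspace(0, np.pi, 5):
    print(w, spectral_density(w))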

To demonstrate the relationship which exists between equations (29) and (30), we may substitute the latter into the former to give

(4.31) γ_τ = ∫_{−π}^{π} e^{iωτ} {(1/2π) Σ_{κ=−∞}^{∞} γ_κ e^{−iωκ}} dω
           = (1/2π) Σ_{κ=−∞}^{∞} γ_κ ∫_{−π}^{π} e^{iω(τ−κ)} dω.

From the fact that

(4.32) ∫_{−π}^{π} e^{iω(τ−κ)} dω = 2π if κ = τ, and 0 if κ ≠ τ,

it can be seen that the RHS of the equation reduces to γ_τ. This serves to show that equations (29) and (30) do indeed represent a Fourier transform and its inverse.

The essential interpretation of the spectral density function is indicated by the equation

(4.33) γ_0 = ∫_ω f(ω)dω,

which comes from setting τ = 0 in equation (29). This equation shows how the variance or ‘power’ of y(t), which is γ_0, is attributed to the cyclical components of which the process is composed.


It is easy to see that a flat spectrum corresponds to the autocovariance function which characterises a white-noise process ε(t). Let f_ε = f_ε(ω) be the flat spectrum. Then, from equation (30), it follows that

(4.34) γ_0 = ∫_{−π}^{π} f_ε(ω)dω = 2πf_ε,

and, from equation (29), it follows that

(4.35)

γ

τ

=

_

π

−π

f

ε

(ω)e

iωτ

dω

= f

ε

_

π

−π

e

iωτ

dω

= 0.

These are the same as the conditions under (6) which have served to deﬁne a

white-noise process. When the variance is denoted by σ

2

ε

, the expression for

the spectrum of the white-noise process becomes

(4.36) f

ε

(ω) =

σ

2

ε

2π

.

Canonical Factorisation of the Spectral Density Function

Let y(t) be a stationary stochastic process whose spectrum is f_y(ω). Since f_y(ω) ≥ 0, it is always possible to find a complex function µ(ω) such that

(4.37)   f_y(ω) = (1/2π) µ(ω) µ*(ω).

For a wide class of stochastic processes, the function µ(ω) may be constructed in such a way that it can be expanded as a one-sided Fourier series:

(4.38)   µ(ω) = Σ_{j=0}^{∞} µ_j e^{−iωj}.

On defining

(4.39)   dZ_ε(ω) = dZ_y(ω)/µ(ω),


the spectral representation of the process y(t) given in equation (28) may be rewritten as

(4.40)   y(t) = ∫_ω e^{iωt} µ(ω) dZ_ε(ω).

Expanding the expression for µ(ω) and interchanging the order of integration and summation gives

(4.41)   y(t) = ∫_ω e^{iωt} { Σ_j µ_j e^{−iωj} } dZ_ε(ω)

              = Σ_j µ_j { ∫_ω e^{iω(t−j)} dZ_ε(ω) }

              = Σ_j µ_j ε(t − j),

where we have defined

(4.42)   ε(t) = ∫_ω e^{iωt} dZ_ε(ω).

The spectrum of ε(t) is given by

(4.43)   E{dZ_ε(ω) dZ*_ε(ω)} = E{dZ_y(ω) dZ*_y(ω)} / {µ(ω) µ*(ω)}

                             = f_y(ω) dω / {µ(ω) µ*(ω)} = (1/2π) dω.

Hence ε(t) is identified as a white-noise process with unit variance. Therefore equation (41) represents y(t) as a moving-average process; and what our analysis implies is that virtually every stationary stochastic process can be represented in this way.

The Frequency-Domain Analysis of Filtering

It is a straightforward matter to derive the spectrum of a process y(t) =

µ(L)x(t) which is formed by mapping the process x(t) through a linear ﬁlter.

Taking the spectral representation of the process x(t) to be

(4.44)   x(t) = ∫_ω e^{iωt} dZ_x(ω),


we have

(4.45)   y(t) = Σ_j µ_j x(t − j)

              = Σ_j µ_j { ∫_ω e^{iω(t−j)} dZ_x(ω) }

              = ∫_ω e^{iωt} { Σ_j µ_j e^{−iωj} } dZ_x(ω).

On writing Σ_j µ_j e^{−iωj} = µ(ω), this becomes

(4.46)   y(t) = ∫_ω e^{iωt} µ(ω) dZ_x(ω) = ∫_ω e^{iωt} dZ_y(ω).

It follows that the spectral density function f_y(ω) of the filtered process y(t) is given by

(4.47)   f_y(ω) dω = E{dZ_y(ω) dZ*_y(ω)}

                   = µ(ω) µ*(ω) E{dZ_x(ω) dZ*_x(ω)}

                   = |µ(ω)|² f_x(ω) dω.

In the case of the process deﬁned in equation (7), where y(t) is obtained by

ﬁltering a white-noise sequence, the result is specialised to give

(4.48)   f_y(ω) = |µ(ω)|² f_ε(ω) = (σ²_ε/2π) |µ(ω)|².

Let µ(z) = Σ_j µ_j z^j denote the z-transform of the sequence {µ_j}. Then

(4.49)   |µ(z)|² = µ(z) µ(z^{−1}) = Σ_τ { Σ_j µ_j µ_{j+τ} } z^τ.

It follows that, when z = e^{−iω}, equation (48) can be written as

(4.50)   f_y(ω) = (σ²_ε/2π) µ(z) µ(z^{−1}) = (1/2π) Σ_τ { σ²_ε Σ_j µ_j µ_{j+τ} } z^τ.


But, according to equation (10), γ_τ = σ²_ε Σ_j µ_j µ_{j+τ} is the autocovariance of lag τ of the process y(t). Therefore, the function f_y(ω) can be written as

(4.51)   f_y(ω) = (1/2π) Σ_{τ=−∞}^{∞} e^{−iωτ} γ_τ = (1/2π) { γ_0 + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ) },

which indicates that the spectral density function is the Fourier transform of the

autocovariance function of the ﬁltered sequence. This is known as the Wiener–

Khintchine theorem. The importance of this theorem is that it provides a link

between the time domain and the frequency domain.
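The equivalence of equations (48) and (51) is easily verified numerically. The sketch below, with a hypothetical MA(2) filter of our own choosing, computes the spectrum once from the gain |µ(ω)|² and once from the autocovariances:

```python
import numpy as np

# A hypothetical MA(2) filter y(t) = e(t) + 0.4e(t-1) - 0.3e(t-2), unit noise variance.
mu = np.array([1.0, 0.4, -0.3])
sigma2 = 1.0
omega = np.linspace(0.0, np.pi, 7)

# Equation (48): f_y(w) = (sigma2/2pi)|mu(w)|^2 with mu(w) = sum_j mu_j e^{-iwj}.
mu_w = np.exp(-1j * np.outer(omega, np.arange(len(mu)))) @ mu
f_gain = sigma2 * np.abs(mu_w) ** 2 / (2.0 * np.pi)

# Equation (51): the same spectrum from gamma_tau = sigma2 * sum_j mu_j mu_{j+tau}.
gamma = np.array([sigma2 * np.sum(mu[: len(mu) - t] * mu[t:]) for t in range(len(mu))])
f_acov = (gamma[0] + 2.0 * sum(gamma[t] * np.cos(omega * t) for t in range(1, len(mu)))) / (2.0 * np.pi)

print(np.allclose(f_gain, f_acov))   # True: the two routes agree
```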

The Gain and Phase

The complex-valued function µ(ω), which is entailed in the process of linear

ﬁltering, can be written as

(4.52)   µ(ω) = |µ(ω)| e^{−iθ(ω)},

where

(4.53)   |µ(ω)|² = { Σ_{j=0}^{∞} µ_j cos(ωj) }² + { Σ_{j=0}^{∞} µ_j sin(ωj) }²,

         θ(ω) = arctan{ Σ_j µ_j sin(ωj) / Σ_j µ_j cos(ωj) }.

The function |µ(ω)|, which is described as the gain of the filter, indicates the extent to which the amplitudes of the cyclical components of which x(t) is composed are altered in the process of filtering.

The function θ(ω), which is described as the phase displacement and which

gives a measure in radians, indicates the extent to which the cyclical compo-

nents are displaced along the time axis.

The substitution of expression (52) in equation (46) gives

(4.54)   y(t) = ∫_{−π}^{π} e^{i{ωt − θ(ω)}} |µ(ω)| dZ_x(ω).

The importance of this equation is that it summarises the two eﬀects of the

ﬁlter.
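A small sketch may help to fix ideas. The function below evaluates the gain and the phase of equations (52) and (53) for an arbitrary finite filter; np.arctan2 is used in place of the plain arctangent so that the correct quadrant is obtained. The two-point moving average in the example is our own illustration:

```python
import numpy as np

def gain_and_phase(mu, omega):
    """Gain |mu(w)| and phase displacement theta(w) of the filter mu(L),
    following equations (52)-(53)."""
    j = np.arange(len(mu))
    c = np.cos(np.outer(omega, j)) @ mu   # sum_j mu_j cos(wj)
    s = np.sin(np.outer(omega, j)) @ mu   # sum_j mu_j sin(wj)
    gain = np.sqrt(c**2 + s**2)
    theta = np.arctan2(s, c)              # the arctangent, with the right quadrant
    return gain, theta

# A two-point moving average attenuates the high frequencies and delays every
# component by half a sampling interval (theta(w)/w = 1/2).
gain, theta = gain_and_phase(np.array([0.5, 0.5]), np.linspace(0.01, np.pi, 5))
print(gain, theta)
```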


LECTURE 5

Linear Stochastic Models

Autocovariances of a Stationary Process

A temporal stochastic process is simply a sequence of random variables indexed by a time subscript. Such a process can be denoted by x(t). The element of the sequence at the point t = τ is x_τ = x(τ).

Let {x_{τ+1}, x_{τ+2}, . . . , x_{τ+n}} denote n consecutive elements of the sequence.

Then the process is said to be strictly stationary if the joint probability distri-

bution of the elements does not depend on τ regardless of the size of n. This

means that any two segments of the sequence of equal length have identical

probability density functions. In consequence, the decision on where to place

the time origin is arbitrary; and the argument τ can be omitted. Some further

implications of stationarity are that

(5.1)   E(x_t) = µ < ∞ for all t,   and   C(x_{τ+t}, x_{τ+s}) = γ_{|t−s|}.

The latter condition means that the covariance of any two elements depends

only on their temporal separation |t − s|. Notice that, if the elements of the

sequence are normally distributed, then the two conditions are suﬃcient to

establish strict stationarity. On their own, they constitute the conditions of

weak or 2nd-order stationarity.

The condition on the covariances implies that the dispersion matrix of the vector [x_1, x_2, . . . , x_n] is a bisymmetric Laurent matrix of the form

(5.2)   Γ = [ γ_0      γ_1      γ_2      …  γ_{n−1}
              γ_1      γ_0      γ_1      …  γ_{n−2}
              γ_2      γ_1      γ_0      …  γ_{n−3}
              ⋮        ⋮        ⋮           ⋮
              γ_{n−1}  γ_{n−2}  γ_{n−3}  …  γ_0     ],

wherein the generic element in the (i, j)th position is γ_{|i−j|} = C(x_i, x_j). Given that a sequence of observations of a time series represents only a segment of a single realisation of a stochastic process, one might imagine that there is little chance of making valid inferences about the parameters of the process.


However, provided that the process x(t) is stationary and provided that the

statistical dependencies between widely separated elements of the sequence are

weak, it is possible to estimate consistently those parameters of the process

which express the dependence of proximate elements of the sequence. If one

is prepared to make suﬃciently strong assumptions about the nature of the

process, then a knowledge of such parameters may be all that is needed for a

complete characterisation of the process.

Moving-Average Processes

The qth-order moving-average process, or MA(q) process, is defined by the equation

(5.3)   y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q),

where ε(t) is a white-noise process: a sequence of independently and identically distributed random variables with zero expectations, E{ε(t)} = 0. The equation is normalised either by setting µ_0 = 1 or by setting V{ε(t)} = σ²_ε = 1. The equation can be written in summary notation as y(t) = µ(L)ε(t), where µ(L) = µ_0 + µ_1 L + · · · + µ_q L^q is a polynomial in the lag operator.

A moving-average process is clearly stationary, since any two elements y_t and y_s represent the same function of the vectors [ε_t, ε_{t−1}, . . . , ε_{t−q}] and [ε_s, ε_{s−1}, . . . , ε_{s−q}], which are identically distributed. In addition to the condition of stationarity, it is usually required that a moving-average process should be invertible, such that it can be expressed in the form µ^{−1}(L)y(t) = ε(t), where the LHS embodies a convergent sum of past values of y(t). This is an infinite-order autoregressive representation of the process. The representation is available only if all the roots of the equation µ(z) = µ_0 + µ_1 z + · · · + µ_q z^q = 0 lie outside the unit circle. This conclusion follows from our discussion of partial fractions.

As an example, let us consider the ﬁrst-order moving-average process which

is deﬁned by

(5.4) y(t) = ε(t) − θε(t − 1) = (1 − θL)ε(t).

Provided that |θ| < 1, this can be written in autoregressive form as

(5.5)   ε(t) = (1 − θL)^{−1} y(t) = { y(t) + θy(t − 1) + θ² y(t − 2) + · · · }.

Imagine that |θ| > 1 instead. Then, to obtain a convergent series, we have to write

(5.6)   y(t + 1) = ε(t + 1) − θε(t) = −θ(1 − L^{−1}/θ)ε(t),


where L^{−1}ε(t) = ε(t + 1). This gives

(5.7)   ε(t) = −θ^{−1}(1 − L^{−1}/θ)^{−1} y(t + 1)

             = −{ y(t + 1)/θ + y(t + 2)/θ² + y(t + 3)/θ³ + · · · }.

Normally, an expression such as this, which embodies future values of y(t),

would have no reasonable meaning.

It is straightforward to generate the sequence of autocovariances from a

knowledge of the parameters of the moving-average process and of the variance

of the white-noise process. Consider

(5.8)   γ_τ = E(y_t y_{t−τ})

            = E{ Σ_i µ_i ε_{t−i} Σ_j µ_j ε_{t−τ−j} }

            = Σ_i Σ_j µ_i µ_j E(ε_{t−i} ε_{t−τ−j}).

Since ε(t) is a sequence of independently and identically distributed random

variables with zero expectations, it follows that

(5.9)   E(ε_{t−i} ε_{t−τ−j}) = { 0, if i ≠ τ + j;   σ²_ε, if i = τ + j. }

Therefore

(5.10)   γ_τ = σ²_ε Σ_j µ_j µ_{j+τ}.

Now let τ = 0, 1, . . . , q. This gives

(5.11)   γ_0 = σ²_ε (µ_0² + µ_1² + · · · + µ_q²),
         γ_1 = σ²_ε (µ_0 µ_1 + µ_1 µ_2 + · · · + µ_{q−1} µ_q),
         ⋮
         γ_q = σ²_ε µ_0 µ_q.

Also, γ_τ = 0 for all τ > q.
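The formulae under (10) and (11) are readily mechanised. The following sketch, with a first-order example of our own choosing, returns γ_0, . . . , γ_q:

```python
import numpy as np

def ma_autocovariances(mu, sigma2=1.0):
    """Autocovariances of the MA(q) process y(t) = mu_0 e(t) + ... + mu_q e(t-q),
    from equation (10): gamma_tau = sigma2 * sum_j mu_j mu_{j+tau}, zero beyond lag q."""
    mu = np.asarray(mu, dtype=float)
    q = len(mu) - 1
    return np.array([sigma2 * np.sum(mu[: q + 1 - t] * mu[t:]) for t in range(q + 1)])

# Example: y(t) = e(t) - theta*e(t-1) with theta = 0.5 gives [1 + theta^2, -theta].
print(ma_autocovariances([1.0, -0.5]))
```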

The ﬁrst-order moving-average process y(t) = ε(t) − θε(t − 1) has the

following autocovariances:

(5.12)   γ_0 = σ²_ε (1 + θ²),
         γ_1 = −σ²_ε θ,
         γ_τ = 0   if τ > 1.


Thus, for a vector y = [y_1, y_2, . . . , y_T] of T consecutive elements from a first-order moving-average process, the dispersion matrix is

(5.13)   D(y) = σ²_ε [ 1 + θ²   −θ       0        …  0
                       −θ       1 + θ²   −θ       …  0
                       0        −θ       1 + θ²   …  0
                       ⋮        ⋮        ⋮           ⋮
                       0        0        0        …  1 + θ² ].

In general, the dispersion matrix of a qth-order moving-average process has q

subdiagonal and q supradiagonal bands of nonzero elements and zero elements

elsewhere.

It is also helpful to define an autocovariance generating function, which is a power series whose coefficients are the autocovariances γ_τ for successive values of τ. This is denoted by

(5.14)   γ(z) = Σ_τ γ_τ z^τ;   with τ = {0, ±1, ±2, . . .} and γ_τ = γ_{−τ}.

The generating function is also called the z-transform of the autocovariance

function.

The autocovariance generating function of the qth-order moving-average

process can be found quite readily. Consider the convolution

(5.15)   µ(z) µ(z^{−1}) = Σ_i µ_i z^i Σ_j µ_j z^{−j}

                        = Σ_i Σ_j µ_i µ_j z^{i−j}

                        = Σ_τ { Σ_j µ_j µ_{j+τ} } z^τ,   τ = i − j.

By referring to the expression for the autocovariance of lag τ of a moving-

average process given under (10), it can be seen that the autocovariance gen-

erating function is just

(5.16)   γ(z) = σ²_ε µ(z) µ(z^{−1}).
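Since the coefficients of γ(z) = σ²_ε µ(z)µ(z^{−1}) come from a polynomial product, they can be read off a discrete convolution of the sequence {µ_j} with its own reversal. A sketch with hypothetical coefficient values of our own:

```python
import numpy as np

mu = np.array([1.0, -0.5, 0.2])        # mu(z) = 1 - 0.5z + 0.2z^2 (illustrative)
sigma2 = 2.0
# Convolving {mu_j} with its reversal gives the coefficients of z^{-q}, ..., z^q.
acgf = sigma2 * np.convolve(mu, mu[::-1])
q = len(mu) - 1
# acgf[q + tau] is gamma_tau; the sequence is symmetric, gamma_tau = gamma_{-tau}.
print({tau: acgf[q + tau] for tau in range(q + 1)})
```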

Autoregressive Processes

The pth-order autoregressive process, or AR(p) process, is defined by the equation

(5.17)   α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = ε(t).


This equation is invariably normalised by setting α_0 = 1, although it would be possible to set σ²_ε = 1 instead. The equation can be written in summary notation as α(L)y(t) = ε(t), where α(L) = α_0 + α_1 L + · · · + α_p L^p. For the process to be stationary, the roots of the equation α(z) = α_0 + α_1 z + · · · + α_p z^p = 0 must lie outside the unit circle. This condition enables us to write the autoregressive process as an infinite-order moving-average process in the form of y(t) = α^{−1}(L)ε(t).

As an example, let us consider the first-order autoregressive process which is defined by

(5.18)   ε(t) = y(t) − φy(t − 1) = (1 − φL)y(t).

Provided that the process is stationary with |φ| < 1, it can be represented in

moving-average form as

(5.19)   y(t) = (1 − φL)^{−1} ε(t) = { ε(t) + φε(t − 1) + φ² ε(t − 2) + · · · }.

The autocovariances of the process can be found by using the formula of (10), which is applicable to moving-average processes of finite or infinite order. Thus

(5.20)   γ_τ = E(y_t y_{t−τ})

             = E{ Σ_i φ^i ε_{t−i} Σ_j φ^j ε_{t−τ−j} }

             = Σ_i Σ_j φ^i φ^j E(ε_{t−i} ε_{t−τ−j});

and the result under (9) indicates that

(5.21)   γ_τ = σ²_ε Σ_j φ^j φ^{j+τ} = σ²_ε φ^τ / (1 − φ²).

For a vector y = [y_1, y_2, . . . , y_T] of T consecutive elements from a first-order autoregressive process, the dispersion matrix has the form

(5.22)   D(y) = {σ²_ε/(1 − φ²)} [ 1         φ         φ²        …  φ^{T−1}
                                  φ         1         φ         …  φ^{T−2}
                                  φ²        φ         1         …  φ^{T−3}
                                  ⋮         ⋮         ⋮            ⋮
                                  φ^{T−1}   φ^{T−2}   φ^{T−3}   …  1       ].


To find the autocovariance generating function for the general pth-order autoregressive process, we may consider again the function α(z) = Σ_i α_i z^i. Since an autoregressive process may be treated as an infinite-order moving-average process, it follows that

(5.23)   γ(z) = σ²_ε / { α(z) α(z^{−1}) }.

For an alternative way of finding the autocovariances of the pth-order process, consider multiplying Σ_i α_i y_{t−i} = ε_t by y_{t−τ} and taking expectations to give

(5.24)   Σ_i α_i E(y_{t−i} y_{t−τ}) = E(ε_t y_{t−τ}).

Taking account of the normalisation α_0 = 1, we find that

(5.25)   E(ε_t y_{t−τ}) = { σ²_ε, if τ = 0;   0, if τ > 0. }

Therefore, on setting E(y_{t−i} y_{t−τ}) = γ_{τ−i}, equation (24) gives

(5.26)   Σ_i α_i γ_{τ−i} = { σ²_ε, if τ = 0;   0, if τ > 0. }

The second of these is a homogeneous difference equation which enables us to generate the sequence {γ_p, γ_{p+1}, . . .} once the p starting values γ_0, γ_1, . . . , γ_{p−1} are known. By letting τ = 0, 1, . . . , p in (26), we generate a set of p + 1 equations which can be arrayed in matrix form as follows:

(5.27)   [ γ_0      γ_1      γ_2      …  γ_p     ] [ 1   ]   [ σ²_ε ]
         [ γ_1      γ_0      γ_1      …  γ_{p−1} ] [ α_1 ]   [ 0    ]
         [ γ_2      γ_1      γ_0      …  γ_{p−2} ] [ α_2 ] = [ 0    ]
         [ ⋮        ⋮        ⋮           ⋮       ] [ ⋮   ]   [ ⋮    ]
         [ γ_p      γ_{p−1}  γ_{p−2}  …  γ_0     ] [ α_p ]   [ 0    ]

These are called the Yule–Walker equations, and they can be used either for generating the values γ_0, γ_1, . . . , γ_p from the values α_1, . . . , α_p, σ²_ε or vice versa.
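In one direction, the Yule–Walker equations amount to a Toeplitz linear system. The sketch below recovers α_1, . . . , α_p and σ²_ε from given autocovariances; the AR(1) example values are our own:

```python
import numpy as np

def yule_walker_params(gamma):
    """Recover alpha_1, ..., alpha_p and sigma2 from gamma_0, ..., gamma_p via (27),
    under the normalisation alpha_0 = 1."""
    gamma = np.asarray(gamma, dtype=float)
    p = len(gamma) - 1
    # Rows tau = 1, ..., p of (27): gamma_tau + sum_{i>=1} alpha_i gamma_{|tau-i|} = 0.
    G = gamma[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    alpha = np.linalg.solve(G, -gamma[1:])
    # Row tau = 0 of (27): gamma_0 + alpha_1 gamma_1 + ... + alpha_p gamma_p = sigma2.
    sigma2 = gamma[0] + alpha @ gamma[1:]
    return alpha, sigma2

# Example: AR(1) autocovariances gamma_tau = phi^tau/(1 - phi^2), as in (21), phi = 0.8.
gamma = 0.8 ** np.arange(3) / (1.0 - 0.64)
print(yule_walker_params(gamma))   # alpha ~ [-0.8, 0.0], sigma2 ~ 1.0
```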

For an example of the two uses of the Yule–Walker equations, let us con-

sider the second-order autoregressive process. In that case, we have

(5.28)   [ γ_0  γ_1  γ_2 ] [ α_0 ]   [ α_2  α_1  α_0  0    0   ] [ γ_2 ]
         [ γ_1  γ_0  γ_1 ] [ α_1 ] = [ 0    α_2  α_1  α_0  0   ] [ γ_1 ]
         [ γ_2  γ_1  γ_0 ] [ α_2 ]   [ 0    0    α_2  α_1  α_0 ] [ γ_0 ]
                                                                 [ γ_1 ]
                                                                 [ γ_2 ]

                                     [ α_0  α_1        α_2 ] [ γ_0 ]   [ σ²_ε ]
                                   = [ α_1  α_0 + α_2  0   ] [ γ_1 ] = [ 0    ]
                                     [ α_2  α_1        α_0 ] [ γ_2 ]   [ 0    ]


Given α_0 = 1 and the values for γ_0, γ_1, γ_2, we can find σ²_ε and α_1, α_2. Conversely, given α_0, α_1, α_2 and σ²_ε, we can find γ_0, γ_1, γ_2. It is worth recalling at this juncture that the normalisation σ²_ε = 1 might have been chosen instead of α_0 = 1. This would have rendered the equations more easily intelligible. Notice also how the matrix following the first equality is folded across the axis which divides it vertically to give the matrix which follows the second equality. Pleasing effects of this sort often arise in time-series analysis.

The Partial Autocorrelation Function

Let α_{r(r)} be the coefficient associated with y(t − r) in an autoregressive process of order r whose parameters correspond to the autocovariances γ_0, γ_1, . . . , γ_r. Then the sequence {α_{r(r)}; r = 1, 2, . . .} of such coefficients, whose index corresponds to models of increasing orders, constitutes the partial autocorrelation function. In effect, α_{r(r)} indicates the role in explaining the variance of y(t) which is due to y(t − r) when y(t − 1), . . . , y(t − r + 1) are also taken into account.

Much of the theoretical importance of the partial autocorrelation function is due to the fact that, when γ_0 is added, it represents an alternative way of conveying the information which is present in the sequence of autocorrelations. Its role in identifying the order of an autoregressive process is evident; for, if α_{r(r)} ≠ 0 and if α_{p(p)} = 0 for all p > r, then it is clearly implied that the process has an order of r.

The sequence of partial autocorrelations may be computed eﬃciently via

the recursive Durbin–Levinson Algorithm which uses the coeﬃcients of the AR

model of order r as the basis for calculating the coeﬃcients of the model of

order r + 1.

To derive the algorithm, let us imagine that we already have the values α_{0(r)} = 1, α_{1(r)}, . . . , α_{r(r)}. Then, by extending the set of rth-order Yule–Walker equations to which these values correspond, we can derive the system

(5.29)   [ γ_0      γ_1      …  γ_r      γ_{r+1} ] [ 1        ]   [ σ²_(r) ]
         [ γ_1      γ_0      …  γ_{r−1}  γ_r     ] [ α_{1(r)} ]   [ 0      ]
         [ ⋮        ⋮           ⋮        ⋮       ] [ ⋮        ] = [ ⋮      ]
         [ γ_r      γ_{r−1}  …  γ_0      γ_1     ] [ α_{r(r)} ]   [ 0      ]
         [ γ_{r+1}  γ_r      …  γ_1      γ_0     ] [ 0        ]   [ g      ]

wherein

(5.30)   g = Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j},   with α_{0(r)} = 1.


The system can also be written as

(5.31)   [ γ_0      γ_1      …  γ_r      γ_{r+1} ] [ 0        ]   [ g      ]
         [ γ_1      γ_0      …  γ_{r−1}  γ_r     ] [ α_{r(r)} ]   [ 0      ]
         [ ⋮        ⋮           ⋮        ⋮       ] [ ⋮        ] = [ ⋮      ]
         [ γ_r      γ_{r−1}  …  γ_0      γ_1     ] [ α_{1(r)} ]   [ 0      ]
         [ γ_{r+1}  γ_r      …  γ_1      γ_0     ] [ 1        ]   [ σ²_(r) ]

The two systems of equations (29) and (31) can be combined to give

(5.32)   [ γ_0      γ_1      …  γ_r      γ_{r+1} ] [ 1                     ]   [ σ²_(r) + cg ]
         [ γ_1      γ_0      …  γ_{r−1}  γ_r     ] [ α_{1(r)} + cα_{r(r)}  ]   [ 0           ]
         [ ⋮        ⋮           ⋮        ⋮       ] [ ⋮                     ] = [ ⋮           ]
         [ γ_r      γ_{r−1}  …  γ_0      γ_1     ] [ α_{r(r)} + cα_{1(r)}  ]   [ 0           ]
         [ γ_{r+1}  γ_r      …  γ_1      γ_0     ] [ c                     ]   [ g + cσ²_(r) ]

If we take the coeﬃcient of the combination to be

(5.33)   c = −g/σ²_(r),

then the final element in the vector on the RHS becomes zero and the system becomes the set of Yule–Walker equations of order r + 1. The solution of the equations, from the last element α_{r+1(r+1)} = c through to the variance term σ²_(r+1), is given by

(5.34)   α_{r+1(r+1)} = −(1/σ²_(r)) { Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j} },

         [ α_{1(r+1)} ]   [ α_{1(r)} ]                  [ α_{r(r)} ]
         [ ⋮          ] = [ ⋮        ] + α_{r+1(r+1)} · [ ⋮        ]
         [ α_{r(r+1)} ]   [ α_{r(r)} ]                  [ α_{1(r)} ]

         σ²_(r+1) = σ²_(r) { 1 − (α_{r+1(r+1)})² }.

Thus the solution of the Yule–Walker system of order r + 1 is easily derived

from the solution of the system of order r, and there is scope for devising a

recursive procedure. The starting values for the recursion are

(5.35)   α_{1(1)} = −γ_1/γ_0   and   σ²_(1) = γ_0 { 1 − (α_{1(1)})² }.
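The recursion defined by equations (30), (33), (34) and (35) can be set down in a few lines. The sketch below returns the sequence α_{1(1)}, α_{2(2)}, . . .; the AR(1) example is our own illustration:

```python
import numpy as np

def durbin_levinson(gamma):
    """The sequence of coefficients alpha_{r(r)}, r = 1, 2, ..., computed from the
    autocovariances gamma_0, ..., gamma_p by the recursion of (30) and (33)-(35)."""
    gamma = np.asarray(gamma, dtype=float)
    alpha = np.array([-gamma[1] / gamma[0]])          # alpha_{1(1)}, equation (35)
    sigma2 = gamma[0] * (1.0 - alpha[0] ** 2)         # sigma2_(1), equation (35)
    pacf = [alpha[0]]
    for r in range(1, len(gamma) - 1):
        g = gamma[r + 1] + alpha @ gamma[r:0:-1]      # g of (30), with alpha_{0(r)} = 1
        c = -g / sigma2                               # equation (33)
        alpha = np.concatenate([alpha + c * alpha[::-1], [c]])   # first part of (34)
        sigma2 *= 1.0 - c ** 2                        # last line of (34)
        pacf.append(c)
    return np.array(pacf)

# For an AR(1) process, only the first coefficient is nonzero:
gamma = 0.8 ** np.arange(5) / (1.0 - 0.64)
print(durbin_levinson(gamma))   # approximately [-0.8, 0, 0, 0]
```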


Autoregressive Moving Average Processes

The autoregressive moving-average process of orders p and q, which is referred to as the ARMA(p, q) process, is defined by the equation

(5.36)   α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q).

The equation is normalised by setting α_0 = 1 and by setting either µ_0 = 1 or σ²_ε = 1. A more summary expression for the equation is α(L)y(t) = µ(L)ε(t). Provided that the roots of the equation α(z) = 0 lie outside the unit circle, the process can be represented by the equation y(t) = α^{−1}(L)µ(L)ε(t), which corresponds to an infinite-order moving-average process. Conversely, provided that the roots of the equation µ(z) = 0 lie outside the unit circle, the process can be represented by the equation µ^{−1}(L)α(L)y(t) = ε(t), which corresponds to an infinite-order autoregressive process.

By considering the moving-average form of the process, and by noting the

form of the autocovariance generating function for such a process which is given

by equation (16), it can be seen that the autocovariance generating function

for the autoregressive moving-average process is

(5.37)   γ(z) = σ²_ε µ(z) µ(z^{−1}) / { α(z) α(z^{−1}) }.

This generating function, which is of some theoretical interest, does not provide a practical means of finding the autocovariances. To find these, let us consider multiplying the equation Σ_i α_i y_{t−i} = Σ_i µ_i ε_{t−i} by y_{t−τ} and taking expectations. This gives

(5.38)   Σ_i α_i γ_{τ−i} = Σ_i µ_i δ_{i−τ},

where γ_{τ−i} = E(y_{t−τ} y_{t−i}) and δ_{i−τ} = E(y_{t−τ} ε_{t−i}). Since ε_{t−i} is uncorrelated with y_{t−τ} whenever it is subsequent to the latter, it follows that δ_{i−τ} = 0 if τ > i. Since the index i on the RHS of equation (38) runs from 0 to q, it follows that

(5.39)   Σ_i α_i γ_{τ−i} = 0   if τ > q.

Given the q + 1 nonzero values δ_0, δ_1, . . . , δ_q, and p initial values γ_0, γ_1, . . . , γ_{p−1} for the autocovariances, the equations can be solved recursively to obtain the subsequent values {γ_p, γ_{p+1}, . . .}.


To find the requisite values δ_0, δ_1, . . . , δ_q, consider multiplying the equation Σ_i α_i y_{t−i} = Σ_i µ_i ε_{t−i} by ε_{t−τ} and taking expectations. This gives

(5.40)   Σ_i α_i δ_{τ−i} = µ_τ σ²_ε,

where δ_{τ−i} = E(y_{t−i} ε_{t−τ}). The equation may be rewritten as

(5.41)   δ_τ = (1/α_0) { µ_τ σ²_ε − Σ_{i≥1} α_i δ_{τ−i} },

and, by setting τ = 0, 1, . . . , q, we can generate recursively the required values δ_0, δ_1, . . . , δ_q.

Example. Consider the ARMA(2, 2) model which gives the equation

(5.42)   α_0 y_t + α_1 y_{t−1} + α_2 y_{t−2} = µ_0 ε_t + µ_1 ε_{t−1} + µ_2 ε_{t−2}.

Multiplying by y_t, y_{t−1} and y_{t−2} and taking expectations gives

(5.43)   [ γ_0  γ_1  γ_2 ] [ α_0 ]   [ δ_0  δ_1  δ_2 ] [ µ_0 ]
         [ γ_1  γ_0  γ_1 ] [ α_1 ] = [ 0    δ_0  δ_1 ] [ µ_1 ]
         [ γ_2  γ_1  γ_0 ] [ α_2 ]   [ 0    0    δ_0 ] [ µ_2 ].

Multiplying by ε_t, ε_{t−1} and ε_{t−2} and taking expectations gives

(5.44)   [ δ_0  0    0   ] [ α_0 ]   [ σ²_ε  0     0    ] [ µ_0 ]
         [ δ_1  δ_0  0   ] [ α_1 ] = [ 0     σ²_ε  0    ] [ µ_1 ]
         [ δ_2  δ_1  δ_0 ] [ α_2 ]   [ 0     0     σ²_ε ] [ µ_2 ].

When the latter equations are written as

(5.45)   [ α_0  0    0   ] [ δ_0 ]        [ µ_0 ]
         [ α_1  α_0  0   ] [ δ_1 ] = σ²_ε [ µ_1 ]
         [ α_2  α_1  α_0 ] [ δ_2 ]        [ µ_2 ],

they can be solved recursively for δ_0, δ_1 and δ_2 on the assumption that the values of α_0, α_1, α_2 and σ²_ε are known. Notice that, when we adopt the normalisation α_0 = µ_0 = 1, we get δ_0 = σ²_ε. When the equations (43) are rewritten as

(5.46)   [ α_0  α_1        α_2 ] [ γ_0 ]   [ µ_0  µ_1  µ_2 ] [ δ_0 ]
         [ α_1  α_0 + α_2  0   ] [ γ_1 ] = [ µ_1  µ_2  0   ] [ δ_1 ]
         [ α_2  α_1        α_0 ] [ γ_2 ]   [ µ_2  0    0   ] [ δ_2 ],

they can be solved for γ_0, γ_1 and γ_2. Thus the starting values are obtained which enable the equation

(5.47)   α_0 γ_τ + α_1 γ_{τ−1} + α_2 γ_{τ−2} = 0;   τ > 2,

to be solved recursively to generate the succeeding values {γ_3, γ_4, . . .} of the autocovariances.
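The whole procedure of equations (45)–(47) can be condensed into a short routine. The sketch below uses the parameters of the ARMA(2, 2) process of Figure 2, with an assumed unit noise variance:

```python
import numpy as np

def arma22_autocovariances(alpha, mu, sigma2, n):
    """Autocovariances gamma_0, ..., gamma_n of the ARMA(2, 2) process of equation (42),
    computed by the route of equations (45)-(47)."""
    a, m = np.asarray(alpha, float), np.asarray(mu, float)
    # Equation (45): a lower-triangular system gives delta_0, delta_1, delta_2.
    A = np.array([[a[0], 0, 0], [a[1], a[0], 0], [a[2], a[1], a[0]]])
    delta = np.linalg.solve(A, sigma2 * m)
    # Equation (46): solve for the starting values gamma_0, gamma_1, gamma_2.
    B = np.array([[a[0], a[1], a[2]], [a[1], a[0] + a[2], 0], [a[2], a[1], a[0]]])
    M = np.array([[m[0], m[1], m[2]], [m[1], m[2], 0], [m[2], 0, 0]])
    gamma = list(np.linalg.solve(B, M @ delta))
    # Equation (47): the homogeneous recursion extends the sequence beyond lag 2.
    for tau in range(3, n + 1):
        gamma.append(-(a[1] * gamma[tau - 1] + a[2] * gamma[tau - 2]) / a[0])
    return np.array(gamma)

# The ARMA(2, 2) process of Figure 2 (a hypothetical sigma2 = 1 is assumed):
print(arma22_autocovariances([1.0, -1.344, 0.902], [1.0, -1.691, 0.810], 1.0, 5))
```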


THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westﬁeld College,

The University of London

The methods to be presented in this lecture are designed for the purpose of

analysing series of statistical observations taken at regular intervals in time.

The methods have a wide range of applications. We can cite astronomy [18],

meteorology [9], seismology [21], oceanography [11], communications engineer-

ing and signal processing [16], the control of continuous process plants [20],

neurology and electroencephalography [1], [25], and economics [10]; and this

list is by no means complete.

1. The Frequency Domain and the Time Domain

The methods apply, in the main, to what are described as stationary or

non-evolutionary time series. Such series manifest statistical properties which

are invariant throughout time, so that the behaviour during one epoch is the

same as it would be during any other.

When we speak of a weakly stationary or covariance-stationary process, we have in mind a sequence of random variables y(t) = {y_t; t = 0, ±1, ±2, . . .}, representing the potential observations of the process, which have a common finite expected value E(y_t) = µ and a set of autocovariances C(y_t, y_s) = E{(y_t − µ)(y_s − µ)} = γ_{|t−s|} which depend only on the temporal separation τ = |t − s| of the dates t and s and not on their absolute values. We also commonly require of such a process that lim(τ → ∞)γ_τ = 0, which is to say that the correlation between increasingly remote elements of the sequence tends to zero. This is a way of expressing the notion that the events of the past have a diminishing effect upon the present as they recede in time. In an appendix to the paper, we review the definitions of mathematical expectations and covariances.

There are two distinct yet broadly equivalent modes of time-series anal-

ysis which may be pursued. On the one hand are the time-domain methods

which have their origin in the classical theory of correlation. Such methods

deal preponderantly with the autocovariance functions and the cross-covariance

functions of the series, and they lead inevitably towards the construction of

structural or parametric models of the autoregressive moving-average type for


single series and of the transfer-function type for two or more causally related

series. Many of the methods which are used to estimate the parameters of

these models can be viewed as sophisticated variants of the method of linear

regression.

On the other hand are the frequency-domain methods of spectral analysis.

These are based on an extension of the methods of Fourier analysis which

originate in the idea that, over a ﬁnite interval, any analytic function can be

approximated, to whatever degree of accuracy is desired, by taking a weighted

sum of sine and cosine functions of harmonically increasing frequencies.

2. Harmonic Analysis

The astronomers are usually given credit for being the ﬁrst to apply the

methods of Fourier analysis to time series. Their endeavours could be described

as the search for hidden periodicities within astronomical data. Typical exam-

ples were the attempts to uncover periodicities within the activities recorded

by the Wolfer sunspot index and in the indices of luminosity of variable stars.

The relevant methods were developed over a long period of time. Lagrange

[13] suggested methods for detecting hidden periodicities in 1772 and 1778.

The Dutchman Buys-Ballot [6] propounded eﬀective computational procedures

for the statistical analysis of astronomical data in 1847. However, we should

probably credit Sir Arthur Schuster [17], who in 1889 propounded the technique

of periodogram analysis, with being the progenitor of the modern methods for

analysing time series in the frequency domain.

In essence, these frequency-domain methods envisaged a model underlying

the observations which takes the form of

(1)   y(t) = Σ_j ρ_j cos(ω_j t − θ_j) + ε(t)

          = Σ_j { α_j cos(ω_j t) + β_j sin(ω_j t) } + ε(t),

where α_j = ρ_j cos θ_j and β_j = ρ_j sin θ_j, and where ε(t) is a sequence of independently and identically distributed random variables which we call a white-noise process. Thus the model depicts the series y(t) as a weighted sum of perfectly regular periodic components upon which is superimposed a random component.

The factor ρ_j = √(α_j² + β_j²) is called the amplitude of the jth periodic component, and it indicates the importance of that component within the sum. Since the variance of a cosine function, which is also called its mean-square deviation, is just one half, and since cosine functions at different frequencies are uncorrelated, it follows that the variance of y(t) is expressible as V{y(t)} = (1/2) Σ_j ρ_j² + σ²_ε, where σ²_ε = V{ε(t)} is the variance of the noise.

The periodogram is simply a device for determining how much of the vari-

ance of y(t) is attributable to any given harmonic component. Its value at


ω_j = 2πj/T, calculated from a sample y_0, . . . , y_{T−1} comprising T observations on y(t), is given by

(2)   I(ω_j) = (2/T) [ { Σ_t y_t cos(ω_j t) }² + { Σ_t y_t sin(ω_j t) }² ]

            = (T/2) { a²(ω_j) + b²(ω_j) }.

If y(t) does indeed comprise only a finite number of well-defined harmonic components, then it can be shown that 2I(ω_j)/T is a consistent estimator of ρ_j², in the sense that it converges to the latter in probability as the size T of the sample of the observations on y(t) increases.
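A direct transcription of equation (2) runs as follows; the test series, a noisy sinusoid, is our own illustration of the consistency property just described:

```python
import numpy as np

def periodogram(y):
    """Periodogram ordinates I(w_j) of equation (2) at the Fourier frequencies
    w_j = 2*pi*j/T, computed from the mean-adjusted sample y_0, ..., y_{T-1}."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                  # mean adjustment, as in the appendix, (vii)
    t = np.arange(T)
    ordinates = []
    for j in range(1, T // 2 + 1):
        w = 2.0 * np.pi * j / T
        a = np.sum(d * np.cos(w * t))
        b = np.sum(d * np.sin(w * t))
        ordinates.append((2.0 / T) * (a**2 + b**2))
    return np.array(ordinates)

# A noisy sinusoid of amplitude 2 at a Fourier frequency: 2*I/T at the peak
# should be close to rho^2 = 4.
rng = np.random.default_rng(0)
T = 256
y = 2.0 * np.cos(2.0 * np.pi * 10 * np.arange(T) / T) + rng.normal(size=T)
print(2.0 * periodogram(y).max() / T)
```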

[Figure 1. The graph of a sine function.]

[Figure 2. Graph of a sine function with small random fluctuations superimposed.]


The process by which the ordinates of the periodogram converge upon the

squared values of the harmonic amplitudes was well expressed by Yule [24] in

a seminal article of 1927:

If we take a curve representing a simple harmonic function of time,

and superpose on the ordinates small random errors, the only eﬀect is

to make the graph somewhat irregular, leaving the suggestion of peri-

odicity still clear to the eye. If the errors are increased in magnitude,

the graph becomes more irregular, the suggestion of periodicity more

obscure, and we have only suﬃciently to increase the “errors” to mask

completely any appearance of periodicity. But, however large the er-

rors, periodogram analysis is applicable to such a curve, and, given

a suﬃcient number of periods, should yield a close approximation to

the period and amplitude of the underlying harmonic function.

We should not quote this passage without mentioning that Yule proceeded

to question whether the hypothesis underlying periodogram analysis, which

postulates the equation under (1), was an appropriate hypothesis for all cases.

[Figure 3. Wolfer's Sunspot Numbers 1749–1924.]


A highly successful application of periodogram analysis was that of Whit-

taker and Robinson [22] who, in 1924, showed that the series recording the

brightness or magnitude of the star T. Ursa Major over 600 days could be ﬁt-

ted almost exactly by the sum of two harmonic functions with periods of 24 and

29 days. This led to the suggestion that what was being observed was actu-

ally a two-star system wherein the larger star periodically masked the smaller

brighter star. Somewhat less successful were the attempts of Arthur Schuster

himself [18] in 1906 to substantiate the claim that there is an eleven-year cycle

in the activity recorded by the Wolfer sunspot index.

Other applications of the method of periodogram analysis were even less

successful; and one application which was a signiﬁcant failure was its use by

William Beveridge [2, 3] in 1921 and 1922 to analyse a long series of European

wheat prices. The periodogram of this data had so many peaks that at least

twenty possible hidden periodicities could be picked out, and this seemed to be

many more than could be accounted for by plausible explanations within the

realm of economic history. Such experiences seemed to point to the inappro-

priateness to economic circumstances of a model containing perfectly regular

cycles. A classic expression of disbelief was made by Slutsky [19] in another

article of 1927:

Suppose we are inclined to believe in the reality of the strict periodicity

of the business cycle, such, for example, as the eight-year period pos-

tulated by Moore [14]. Then we should encounter another diﬃculty.

Wherein lies the source of this regularity? What is the mechanism of

causality which, decade after decade, reproduces the same sinusoidal

wave which rises and falls on the surface of the social ocean with the

regularity of day and night?

3. Autoregressive and Moving-Average Models

The next major episode in the history of the development of time-series

analysis took place in the time domain, and it began with the two articles of

1927 by Yule [24] and Slutsky [19] from which we have already quoted. In both

articles, we ﬁnd a rejection of the model with deterministic harmonic compo-

nents in favour of models more ﬁrmly rooted in the notion of random causes. In

a wonderfully ﬁgurative exposition, Yule invited his readers to imagine a pen-

dulum attached to a recording device and left to swing. Then any deviations

from perfectly harmonic motion which might be recorded must be the result

of errors of observation which could be all but eliminated if a long sequence

of observations were subjected to a periodogram analysis. Next, Yule enjoined

the reader to imagine that the regular swing of the pendulum is interrupted by

small boys who get into the room and start pelting the pendulum with peas

sometimes from one side and sometimes from the other. The motion is now

aﬀected not by superposed ﬂuctuations but by true disturbances.


In this example, Yule contrives a perfect analogy for the autoregressive

time-series model. To explain the analogy, let us begin by considering a homo-

geneous second-order diﬀerence equation of the form

(3)   y(t) = φ_1 y(t − 1) + φ_2 y(t − 2).

Given the initial values y_{−1} and y_{−2}, this equation can be used recursively to generate an ensuing sequence {y_0, y_1, . . .}. This sequence will show a regular pattern of behaviour whose nature depends on the parameters φ_1 and φ_2. If these parameters are such that the roots of the quadratic equation z² − φ_1 z − φ_2 = 0 are complex and less than unity in modulus, then the sequence of values will show a damped sinusoidal behaviour, just as a clock pendulum will which is left to swing without the assistance of the falling weights. In fact, in such a case, the general solution to the difference equation will take the form of

(4)   y(t) = αρ^t cos(ωt − θ),

where the modulus ρ, which has a value between 0 and 1, is now the damping

factor which is responsible for the attenuation of the swing as the time t elapses.

The autoregressive model which Yule was proposing takes the form of

(5)   y(t) = φ_1 y(t − 1) + φ_2 y(t − 2) + ε(t),

where ε(t) is, once more, a white-noise sequence. Now, instead of masking

the regular periodicity of the pendulum, the white noise has actually become

the engine which drives the pendulum by striking it randomly in one direction

and another. Its haphazard inﬂuence has replaced the steady force of the

falling weights. Nevertheless, the pendulum will still manifest a deceptively

regular motion which is liable, if the sequence of observations is short and

contains insuﬃcient contrary evidence, to be misinterpreted as the eﬀect of an

underlying mechanism.

In his article of 1927, Yule attempted to explain the Wolfer index in terms of the second-order autoregressive model of equation (5). From the empirical autocovariances of the sample represented in Figure 3, he estimated the values φ_1 = 1.343 and φ_2 = −0.655. The general solution of the corresponding homogeneous difference equation has a damping factor of ρ = 0.809 and an angular velocity of ω = 33.96°. The angular velocity indicates a period of 10.6 years, which is a little shorter than the 11-year period obtained by Schuster in his periodogram analysis of the same data. In Figure 4, we show a series which has been generated artificially from Yule's equation, together with a series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t). The homogeneous difference equation which corresponds to the latter has the same value of ω as before. Its damping factor has the value ρ = 0.95, and this increase accounts for the greater regularity of the second series.
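Series of the kind shown in Figures 4 and 5 are easily generated. The sketch below simulates the model of equation (5); the random seed and the zero initial conditions are arbitrary choices of ours:

```python
import numpy as np

def simulate_ar2(phi1, phi2, T, seed=0):
    """Generate a series from the second-order autoregressive model of equation (5),
    y(t) = phi1*y(t-1) + phi2*y(t-2) + e(t), with unit-variance white noise."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + e[t]
    return y

y_yule = simulate_ar2(1.343, -0.655, 90)   # Yule's fitted equation (Figure 4)
y_regular = simulate_ar2(1.576, -0.903, 90)   # the more regular variant (Figure 5)
```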


[Figure 4. A series generated by Yule's equation y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t).]

[Figure 5. A series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).]

Neither of our two series accurately mimics the sunspot index; although the second series seems closer to it than the series generated by Yule's equation. An obvious feature of the sunspot index which is not shared by the artificial series is the fact that the numbers are constrained to be nonnegative. To relieve this constraint, we might apply to Wolf's numbers y_t a transformation of the form log(y_t + λ) or of the more general form (y_t + λ)^{κ−1}, such as has been advocated by Box and Cox [4]. A transformed series could be more closely mimicked.

The contributions to time-series analysis made by Yule [24] and Slutsky

[19] in 1927 were complementary: in fact, the two authors grasped opposite

ends of the same pole. For ten years, Slutsky’s paper was available only in its


original Russian version; but its contents became widely known within a much

shorter period.

Slutsky posed the same question as did Yule, and in much the same man-

ner. Was it possible, he asked, that a deﬁnite structure of a connection between

chaotically random elements could form them into a system of more or less regu-

lar waves? Slutsky proceeded to demonstrate this possibility by methods which

were partly analytic and partly inductive. He discriminated between coherent

series whose elements were serially correlated and incoherent or purely random

series of the sort which we have described as white noise. As to the coherent

series, he declared that

their origin may be extremely varied, but it seems probable that an

especially prominent role is played in nature by the process of moving

summation with weights of one kind or another; by this process coher-

ent series are obtained from other coherent series or from incoherent

series.

By taking, as his basis, a purely random series obtained by the People’s

Commissariat of Finance in drawing the numbers of a government lottery loan,

and by repeatedly taking moving summations, Slutsky was able to generate a

series which closely mimicked an index, of a distinctly undulatory nature, of

the English business cycle from 1855 to 1877.

The general form of Slutsky’s moving summation can be expressed by

writing

(6)   y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q),

where ε(t) is a white-noise process. This is nowadays called a qth-order moving-

average process, and it is readily compared to an autoregressive process of the

sort depicted under (5). The more general pth-order autoregressive process can

be expressed by writing

(7)   α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = ε(t).

Thus, whereas the autoregressive process depends upon a linear combination

of the function y(t) with its own lagged values, the moving-average process

depends upon a similar combination of the function ε(t) with its lagged values.

The aﬃnity of the two sorts of process is further conﬁrmed when it is recognised

that an autoregressive process of ﬁnite order is equivalent to a moving-average

process of inﬁnite order and that, conversely, a ﬁnite-order moving-average

process is just an inﬁnite-order autoregressive process.


4. Generalised Harmonic Analysis

The next step to be taken in the development of the theory of time series

was to generalise the traditional method of periodogram analysis in such a way

as to overcome the problems which arise when the model depicted under (1) is

clearly inappropriate.

At ﬁrst sight, it would not seem possible to describe a covariance-station-

ary process, whose only regularities are statistical ones, as a linear combination

of perfectly regular periodic components. However any diﬃculties which we

might envisage can be overcome if we are prepared to accept a description

which is in terms of a non-denumerable inﬁnity of periodic components. Thus,

on replacing the so-called Fourier sum within equation (1) by a Fourier integral,

and by deleting the term ε(t), whose eﬀect is now absorbed by the integrand,

we obtain an expression in the form of

(8)   y(t) = ∫_0^π { cos(ωt) dA(ω) + sin(ωt) dB(ω) }.

Here we write dA(ω) and dB(ω) rather than α(ω)dω and β(ω)dω because there

can be no presumption that the functions A(ω) and B(ω) are continuous. As it

stands, this expression is devoid of any statistical interpretation. Moreover, if

we are talking of only a single realisation of the process y(t), then the generalised

functions A(ω) and B(ω) will reﬂect the unique peculiarities of that realisation

and will not be amenable to any systematic description.

However, a fruitful interpretation can be given to these functions if we consider the observable sequence y(t) = {y_t; t = 0, ±1, ±2, . . .} to be a particular realisation which has been drawn from an infinite population representing all possible realisations of the process. For, if this population is subject to statistical regularities, then it is reasonable to regard dA(ω) and dB(ω) as mutually uncorrelated random variables with well-defined distributions which depend upon the parameters of the population.

We may therefore assume that, for any value of ω,

(9)   E{dA(ω)} = E{dB(ω)} = 0   and   E{dA(ω) dB(ω)} = 0.

Moreover, to express the discontinuous nature of the generalised functions, we assume that, for any two distinct values ω and λ in their domain, we have

(10)   E{dA(ω) dA(λ)} = E{dB(ω) dB(λ)} = 0,

which means that A(ω) and B(ω) are stochastic processes—indexed on the

frequency parameter ω rather than on time—which are uncorrelated in non-

overlapping intervals. Finally, we assume that dA(ω) and dB(ω) have a com-

mon variance so that

(11) V {dA(ω)} = V {dB(ω)} = dG(ω).


[Figure 6. The spectrum of the process y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t) which generated the series in Figure 4. A series of a more regular nature would be generated if the spectrum were more narrowly concentrated around its modal value.]

Given the assumption of the mutual uncorrelatedness of dA(ω) and dB(ω),

it therefore follows from (8) that the variance of y(t) is expressible as

(12)   V{y(t)} = ∫_0^π [ cos²(ωt) V{dA(ω)} + sin²(ωt) V{dB(ω)} ] = ∫_0^π dG(ω).

The function G(ω), which is called the spectral distribution, tells us how much of the variance is attributable to the periodic components whose frequencies range continuously from 0 to ω. If none of these components contributes more than an infinitesimal amount to the total variance, then the function G(ω) is absolutely continuous, and we can write dG(ω) = g(ω)dω under the integral of equation (12). The new function g(ω), which is called the spectral density function or the spectrum, is directly analogous to the function expressing the squared amplitude which is associated with each component in the simple harmonic model discussed in our earlier sections.


5. Smoothing the Periodogram

It might be imagined that there is little hope of obtaining worthwhile es-

timates of the parameters of the population from which the single available

realisation y(t) has been drawn. However, provided that y(t) is a stationary

process, and provided that the statistical dependencies between widely sep-

arated elements are weak, the single realisation contains all the information

which is necessary for the estimation of the spectral density function. In fact,

a modiﬁed version of the traditional periodogram analysis is suﬃcient for the

purpose of estimating the spectral density.

In some respects, the problems posed by the estimation of the spectral

density are similar to those posed by the estimation of a continuous probability

density function of unknown functional form. It is fruitless to attempt directly

to estimate the ordinates of such a function. Instead, we might set about our

task by constructing a histogram or bar chart to show the relative frequencies

with which the observations that have been drawn from the distribution fall

within broad intervals. Then, by passing a curve through the mid points of the

tops of the bars, we could construct an envelope that might approximate to

the sought-after density function. A more sophisticated estimation procedure

would not group the observations into the ﬁxed intervals of a histogram; instead

it would record the number of observations falling within a moving interval.

Moreover, a consistent method of estimation, which aims at converging upon

the true function as the number of observations increases, would vary the width

of the moving interval with the size of the sample, diminishing it suﬃciently

slowly as the sample size increases for the number of sample points falling

within any interval to increase without bound.

A common method for estimating the spectral density is very similar to

the one which we have described for estimating a probability density function.

Instead of basing itself on raw sample observations as does the method of

density-function estimation, it bases itself upon the ordinates of a periodogram

which has been ﬁtted to the observations on y(t). This procedure for spectral

estimation is therefore called smoothing the periodogram.

A disadvantage of the procedure, which for many years inhibited its widespread use, lies in the fact that calculating the periodogram by what would seem to be the obvious methods can be vastly time-consuming. Indeed, it was not until the mid 1960s that wholly practical computational methods were developed.

6. The Equivalence of the Two Domains

It is remarkable that such a simple technique as smoothing the peri-

odogram should provide a theoretical resolution to the problems encountered

by Beveridge and others in their attempts to detect the hidden periodicities in

economic and astronomical data. Even more remarkable is the way in which


the generalised harmonic analysis that gave rise to the concept of the spec-

tral density of a time series should prove itself to be wholly conformable with

the alternative methods of time-series analysis in the time domain which arose

largely as a consequence of the failure of the traditional methods of periodogram

analysis.

The synthesis of the two branches of time-series analysis was achieved in-

dependently and almost simultaneously in the early 1930’s by Norbert Wiener

[23] in America and A. Khintchine [12] in Russia. The Wiener–Khintchine

theorem indicates that there is a one-to-one relationship between the autoco-

variance function of a stationary process and its spectral density function. The

relationship is expressed, in one direction, by writing,

(13)   g(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ cos(ωτ);   γ_τ = γ_{−τ},

where g(ω) is the spectral density function and {γ_τ; τ = 0, 1, 2, . . .} is the sequence of the autocovariances of the series y(t).

The relationship is invertible in the sense that it is equally possible to

express each of the autocovariances as a function of the spectral density:

(14)   γ_τ = ∫_0^π cos(ωτ) g(ω) dω.

If we set τ = 0, then cos(ωτ) = 1, and we obtain, once more, the equation (12) which neatly expresses the way in which the variance γ_0 = V{y(t)} of the series y(t) is attributable to the constituent harmonic components; for g(ω) is simply the expected value of the squared amplitude of the component at frequency ω.

We have stated the relationships of the Wiener–Khintchine theorem in terms of the theoretical spectral density function g(ω) and the true autocovariance function {γ_τ; τ = 0, 1, 2, . . .}. An analogous relationship holds between the periodogram I(ω_j) defined in (2) and the sample autocovariance function {c_τ; τ = 0, 1, . . . , T − 1}, where c_τ = Σ(y_t − ȳ)(y_{t−τ} − ȳ)/T. Thus, in the appendix, we demonstrate the identity

(15)   I(ω_j) = 2 Σ_{τ=1−T}^{T−1} c_τ cos(ω_j τ);   c_τ = c_{−τ}.

The upshot of the Wiener–Khintchine theorem is that many of the tech-

niques of time-series analysis can, in theory, be expressed in two mathematically

equivalent ways which may diﬀer markedly in their conceptual qualities.

Often, a problem which appears to be intractable from the point of view

of one of the domains of time-series analysis becomes quite manageable when


translated into the other domain. A good example is provided by the matter of

spectral estimation. Given that there are diﬃculties in computing all T of the

ordinates of the periodogram when the sample size is large, we are impelled to

look for a method of spectral estimation which depends not upon smoothing

the periodogram but upon performing some equivalent operation upon the se-

quence of autocovariances. The fact that there is a one-to-one correspondence

between the spectrum and the sequence of autocovariances assures us that this

equivalent operation must exist; though there is, of course, no guarantee that

it will be easy to perform.

[Figure 7. The periodogram of Wolfer's Sunspot Numbers 1749–1924.]

In fact, the operation which we perform upon the sample autocovariances is simple. For, if the sequence of autocovariances {c_τ; τ = 0, 1, . . . , T − 1} in (15) is replaced by a modified sequence {w_τ c_τ; τ = 0, 1, . . . , T − 1} incorporating a specially devised set of declining weights {w_τ; τ = 0, 1, . . . , T − 1}, then an effect which is much the same as that of smoothing the periodogram can be achieved. Moreover, it may be relatively straightforward to calculate the weighted autocovariance function.
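A sketch of the weighted-autocovariance estimator may clarify the procedure. A Bartlett (triangular) window is used below purely for illustration; Parzen's weights, which are cited in connection with Figure 8, would serve equally:

```python
import numpy as np

def lag_window_spectrum(y, M, omega):
    """Estimate the spectral density by applying declining weights w_tau to the
    sample autocovariances c_tau and transforming, as in the weighted form of (15)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    c = np.array([np.sum(d[t:] * d[: T - t]) / T for t in range(M + 1)])  # c_tau of (vi)
    wt = 1.0 - np.arange(M + 1) / M          # Bartlett weights, declining to zero at lag M
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    tau = np.arange(1, M + 1)
    return (c[0] + 2.0 * np.cos(np.outer(omega, tau)) @ (wt[1:] * c[1:])) / (2.0 * np.pi)

# White noise of unit variance has a flat spectrum of height 1/(2*pi) = 0.159.
y = np.random.default_rng(1).normal(size=400)
print(lag_window_spectrum(y, M=20, omega=[0.0, np.pi / 2, np.pi]))
```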

The task of devising appropriate sets of weights provided a major research

topic in time-series analysis in the 1950’s and early 1960’s. Together with the

task of devising equivalent procedures for smoothing the periodogram, it came

to be known as spectral carpentry.


[Figure 8. The spectrum of the sunspot numbers calculated from the autocovariances using Parzen's [15] system of weights.]

7. The Maturing of Time-Series Analysis

In retrospect, it seems that time-series analysis reached its maturity in the

1970’s when signiﬁcant developments occurred in both of its domains.

A major development in the frequency domain occurred when Cooley and

Tukey [7] described an algorithm which greatly reduces the eﬀort involved in

computing the periodogram. The Fast Fourier Transform, as this algorithm has

come to be known, allied with advances in computer technology, has enabled the

routine analysis of extensive sets of data; and it has transformed the procedure

of smoothing the periodogram into a practical method of spectral estimation.

The contemporaneous developments in the time domain were inﬂuenced by

an important book by Box and Jenkins [5]. These authors developed the time-

domain methodology by collating some of its major themes and by applying it

to such important functions as forecasting and control. They demonstrated how

wide had become the scope of time-series analysis by applying it to problems

as diverse as the forecasting of airline passenger numbers and the analysis of

combustion processes in a gas furnace. They also adapted the methodology to

the computer.

Many of the current practitioners of time-series analysis have learnt their

skills in recent years during a time when the subject has been expanding rapidly.

Lacking a longer perspective, it is diﬃcult for them to gauge the signiﬁcance

of the recent practical advances. One might be surprised to hear, for example,


that as late as 1971 Granger and Hughes [8] were capable of declaring that

Beveridge’s calculation of the Periodogram of the Wheat Price Index, com-

prising 300 ordinates, was the most extensive calculation of its type to date.

Nowadays, computations of this order are performed on a routine basis using

microcomputers containing specially designed chips which are dedicated to the

purpose.

The rapidity of the recent developments also belies the fact that time-series

analysis has had a long history. The frequency domain of time-series analy-

sis, to which the idea of the harmonic decomposition of a function is central,

is an inheritance from Euler (1707–1783), d’Alembert (1717–1783), Lagrange

(1736–1813) and Fourier (1768–1830). The search for hidden periodicities was

a dominant theme of 19th century science. It has been transmogriﬁed through

the reﬁnements of Wiener’s Generalised Harmonic Analysis which has enabled

us to understand how cyclical phenomena can arise out of the aggregation of

random causes. The parts of time-series analysis which bear a truly 20th-

century stamp are the time-domain models which originate with Slutsky and

Yule and the computational technology which renders the methods of both

domains practical.

The eﬀect of the revolution in digital electronic computing upon the practi-

cability of time-series analysis can be gauged by inspecting the purely mechan-

ical devices (such as the Henrici–Conradi and Michelson–Stratton harmonic

analysers invented in the 1890’s) which were once used, with very limited suc-

cess, to grapple with problems which are nowadays almost routine. These

devices, some of which are displayed in London’s Science Museum, also serve

to remind us that many of the developments of applied mathematics which

startle us with their modernity were foreshadowed many years ago.

Mathematical Appendix

Mathematical Expectations

The mathematical expectation or the expected value of a random variable x is defined by

(i)   E(x) = ∫_{−∞}^{∞} x dF(x),

where F(x) is the probability distribution function of x. The probability distribution function is defined by the expression F(x*) = P{x < x*}, which denotes the probability that x assumes a value less than x*. If F(x) is a continuous function, then we can write dF(x) = f(x)dx in equation (i). The function f(x) = dF(x)/dx is called the probability density function.

If y(t) = {y_t; t = 0, ±1, ±2, . . .} is a stationary stochastic process, then E(y_t) = µ is the same value for all t.


If y_0, . . . , y_{T−1} is a sample of T values generated by the process, then we may estimate µ from the sample mean

(ii)   ȳ = (1/T) Σ_{t=0}^{T−1} y_t.

Autocovariances

The autocovariance of lag τ of a stationary stochastic process y(t) is defined by

(iii)   γ_τ = E{(y_t − µ)(y_{t−τ} − µ)}.

The autocovariance of lag τ provides a measure of the relatedness of the ele-

ments of the sequence y(t) which are separated by τ time periods.

The variance, which is denoted by V{y(t)} = γ_0 and defined by

(iv)  γ_0 = E{(y_t − µ)²},

is a measure of the dispersion of the elements of y(t). It is formally the autocovariance of lag zero.

If y_t and y_{t−τ} are statistically independent, then their joint probability density function is the product of their individual probability density functions, so that f(y_t, y_{t−τ}) = f(y_t)f(y_{t−τ}). It follows that

(v)  γ_τ = E(y_t − µ)E(y_{t−τ} − µ) = 0  for all τ ≠ 0.

If y_0, . . . , y_{T−1} is a sample from the process, and if τ < T, then we may estimate γ_τ from the sample autocovariance or empirical autocovariance of lag τ:

(vi)  c_τ = (1/T)∑_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ).

The periodogram and the autocovariance function

The periodogram is defined by

(vii)  I(ω_j) = (2/T)[{∑_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ)}² + {∑_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ)}²].


The identity ∑_t cos(ω_j t)(y_t − ȳ) = ∑_t cos(ω_j t)y_t follows from the fact that, by construction, ∑_t cos(ω_j t) = 0 for all j. Hence the above expression has the same value as the expression in (2). Expanding the expression in (vii) gives

(viii)  I(ω_j) = (2/T){∑_t ∑_s cos(ω_j t)cos(ω_j s)(y_t − ȳ)(y_s − ȳ)}
              + (2/T){∑_t ∑_s sin(ω_j t)sin(ω_j s)(y_t − ȳ)(y_s − ȳ)},

and, by using the identity cos(A)cos(B) + sin(A)sin(B) = cos(A − B), we can rewrite this as

(ix)  I(ω_j) = (2/T){∑_t ∑_s cos(ω_j [t − s])(y_t − ȳ)(y_s − ȳ)}.

Next, on defining τ = t − s and writing c_τ = ∑_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can reduce the latter expression to

(x)  I(ω_j) = 2∑_{τ=1−T}^{T−1} cos(ω_j τ)c_τ,

which appears in the text as equation (15).

References

[1] Alberts, W. W., L. E. Wright and B. Feinstein (1965), "Physiological Mechanisms of Tremor and Rigidity in Parkinsonism." Confinia Neurologica, 26, 318–327.

[2] Beveridge, Sir W. H. (1921), "Weather and Harvest Cycles." Economic Journal, 31, 429–452.

[3] Beveridge, Sir W. H. (1922), "Wheat Prices and Rainfall in Western Europe." Journal of the Royal Statistical Society, 85, 412–478.

[4] Box, G. E. P. and D. R. Cox (1964), "An Analysis of Transformations." Journal of the Royal Statistical Society, Series B, 26, 211–243.

[5] Box, G. E. P. and G. M. Jenkins (1970), Time Series Analysis, Forecasting and Control. Holden–Day: San Francisco.

[6] Buys–Ballot, C. D. H. (1847), "Les Changements Periodiques de Temperature." Utrecht.

[7] Cooley, J. W. and J. W. Tukey (1965), "An Algorithm for the Machine Calculation of Complex Fourier Series." Mathematics of Computation, 19, 297–301.

[8] Granger, C. W. J. and A. O. Hughes (1971), "A New Look at Some Old Data: The Beveridge Wheat Price Series." Journal of the Royal Statistical Society, Series A, 134, 413–428.

[9] Groves, G. W. and E. J. Hannan (1968), "Time-Series Regression of Sea Level on Weather." Review of Geophysics, 6, 129–174.

[10] Gudmundson, G. (1971), "Time-Series Analysis of Imports, Exports and other Economic Variables." Journal of the Royal Statistical Society, Series A, 134, 383.

[11] Hasselmann, K., W. Munk and G. MacDonald (1963), "Bispectrum of Ocean Waves." In Time Series Analysis, M. Rosenblatt (ed.), 125–139. John Wiley and Sons: New York.

[12] Khintchine, A. (1934), "Korrelationstheorie der Stationären Stochastischen Prozesse." Mathematische Annalen, 109, 604–615.

[13] Lagrange, E. (1772, 1778), "Oeuvres."

[14] Moore, H. L. (1914), "Economic Cycles: Their Laws and Cause." Macmillan: New York.

[15] Parzen, E. (1957), "On Consistent Estimates of the Spectrum of a Stationary Time Series." Annals of Mathematical Statistics, 28, 329–348.

[16] Rice, S. O. (1963), "Noise in FM Receivers." In Time Series Analysis, M. Rosenblatt (ed.), 395–422. John Wiley and Sons: New York.

[17] Schuster, Sir A. (1898), "On the Investigation of Hidden Periodicities with Application to a Supposed Twenty-Six Day Period of Meteorological Phenomena." Terrestrial Magnetism, 3, 13–41.

[18] Schuster, Sir A. (1906), "On the Periodicities of Sunspots." Philosophical Transactions of the Royal Society, Series A, 206, 69–100.

[19] Slutsky, E. (1937), "The Summation of Random Causes as the Source of Cyclical Processes." Econometrica, 5, 105–146.

[20] Tee, L. H. and S. U. Wu (1972), "An Application of Stochastic and Dynamic Models for the Control of a Papermaking Process." Technometrics, 14, 481–496.

[21] Tukey, J. W. (1965), "Data Analysis and the Frontiers of Geophysics." Science, 148, 1283–1289.

[22] Whittaker, E. T. and G. Robinson (1924), "The Calculus of Observations, A Treatise on Numerical Mathematics." Blackie and Sons: London.

[23] Wiener, N. (1930), "Generalised Harmonic Analysis." Acta Mathematica, 35, 117–258.

[24] Yule, G. U. (1927), "On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers." Philosophical Transactions of the Royal Society, 89, 1–64.

[25] Yuzuriha, T. (1960), "The Autocorrelation Curves of Schizophrenic Brain Waves and the Power Spectrum." Psych. Neurol. Jap., 26, 911–924.


Address for correspondence:

D.S.G. Pollock
Department of Economics
Queen Mary College
University of London
Mile End Road
London E1 4NS

Tel: +44-71-975-5096
Fax: +44-71-975-5500

LECTURE 7

Forecasting

with ARMA Models

Minimum Mean-Square Error Prediction

Imagine that y(t) is a stationary stochastic process with E{y(t)} = 0.

We may be interested in predicting values of this process several periods into

the future on the basis of its observed history. This history is contained in

the so-called information set. In practice, the latter is always a ﬁnite set

{y_t, y_{t−1}, . . . , y_{t−p}} representing the recent past. Nevertheless, in developing the theory of prediction, it is also useful to consider an infinite information set I_t = {y_t, y_{t−1}, . . . , y_{t−p}, . . .} representing the entire past.

We shall denote the prediction of y_{t+h} which is made at the time t by ŷ_{t+h|t}, or by ŷ_{t+h} when it is clear that we are predicting h steps ahead.

The criterion which is commonly used in judging the performance of an estimator or predictor ŷ of a random variable y is its mean-square error, defined by E{(y − ŷ)²}. If all of the available information on y is summarised in its marginal distribution, then the minimum-mean-square-error prediction is simply the expected value E(y). However, if y is statistically related to another random variable x whose value can be observed, and if the form of the joint distribution of x and y is known, then the minimum-mean-square-error prediction of y is the conditional expectation E(y|x). This proposition may be stated formally:

(1)  Let ŷ = ŷ(x) be the conditional expectation of y given x, which is also expressed as ŷ = E(y|x). Then E{(y − ŷ)²} ≤ E{(y − π)²}, where π = π(x) is any other function of x.

Proof. Consider

(2)  E{(y − π)²} = E[{(y − ŷ) + (ŷ − π)}²]
               = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.


Within the second term, there is

(3)  E{(y − ŷ)(ŷ − π)} = ∫_x ∫_y (y − ŷ)(ŷ − π)f(x, y) ∂y∂x
                      = ∫_x {∫_y (y − ŷ)f(y|x) ∂y}(ŷ − π)f(x) ∂x
                      = 0.

Here the second equality depends upon the factorisation f(x, y) = f(y|x)f(x), which expresses the joint probability density function of x and y as the product of the conditional density function of y given x and the marginal density function of x. The final equality depends upon the fact that ∫(y − ŷ)f(y|x) ∂y = E(y|x) − E(y|x) = 0. Therefore E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥ E{(y − ŷ)²}, and the assertion is proved.

The definition of the conditional expectation implies that

(4)  E(xy) = ∫_x ∫_y xy f(x, y) ∂y∂x
          = ∫_x x {∫_y y f(y|x) ∂y} f(x) ∂x
          = E(xŷ).

When the equation E(xy) = E(xŷ) is rewritten as

(5)  E{x(y − ŷ)} = 0,

it may be described as an orthogonality condition. This condition indicates that the prediction error y − ŷ is uncorrelated with x. The result is intuitively appealing; for, if the error were correlated with x, we should not be using the information of x efficiently in forming ŷ.

The proposition of (1) is readily generalised to accommodate the case where, in place of the scalar x, there is a vector x = [x_1, . . . , x_p]. This generalisation indicates that the minimum-mean-square-error prediction of y_{t+h} given the information in {y_t, y_{t−1}, . . . , y_{t−p}} is the conditional expectation E(y_{t+h}|y_t, y_{t−1}, . . . , y_{t−p}).

In order to determine the conditional expectation of y_{t+h} given {y_t, y_{t−1}, . . . , y_{t−p}}, we need to know the functional form of the joint probability density function of all of these variables. In lieu of precise knowledge, we are often prepared to assume that the distribution is normal. In that case, it follows that the conditional expectation of y_{t+h} is a linear function of {y_t, y_{t−1}, . . . , y_{t−p}}; and so the problem of predicting y_{t+h} becomes a matter of forming a linear


regression. Even if we are not prepared to assume that the joint distribution of the variables is normal, we may be prepared, nevertheless, to base the prediction of y upon a linear function of {y_t, y_{t−1}, . . . , y_{t−p}}. In that case, the criterion of minimum-mean-square-error linear prediction is satisfied by forming ŷ_{t+h} = φ_1 y_t + φ_2 y_{t−1} + · · · + φ_{p+1} y_{t−p} from the values φ_1, . . . , φ_{p+1} which minimise

(6)  E{(y_{t+h} − ŷ_{t+h})²} = E[{y_{t+h} − ∑_{j=1}^{p+1} φ_j y_{t−j+1}}²]
                            = γ_0 − 2∑_j φ_j γ_{h+j−1} + ∑_i ∑_j φ_i φ_j γ_{i−j},

wherein γ_{i−j} = E(y_{t−i} y_{t−j}) is the autocovariance of y(t) at lag i − j. This is a linear least-squares regression problem which leads to a set of p + 1 orthogonality conditions described as the normal equations:

(7)  E{(y_{t+h} − ŷ_{t+h})y_{t−j+1}} = γ_{h+j−1} − ∑_{i=1}^{p+1} φ_i γ_{i−j} = 0;  j = 1, . . . , p + 1.

In matrix terms, these are

(8)  ⎡ γ_0      γ_1      . . .  γ_p     ⎤ ⎡ φ_1     ⎤   ⎡ γ_h     ⎤
     ⎢ γ_1      γ_0      . . .  γ_{p−1} ⎥ ⎢ φ_2     ⎥ = ⎢ γ_{h+1} ⎥
     ⎢  . . .    . . .          . . .   ⎥ ⎢  . . .  ⎥   ⎢  . . .  ⎥
     ⎣ γ_p      γ_{p−1}  . . .  γ_0     ⎦ ⎣ φ_{p+1} ⎦   ⎣ γ_{h+p} ⎦ .

Notice that, for the one-step-ahead prediction of y_{t+1}, they are nothing but the Yule–Walker equations.
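To make the matter concrete, the normal equations (8) can be set up and solved in a few lines of code. The sketch below is our own illustration in Python (assuming NumPy; the function name mmse_weights and the test values are invented for the purpose). It checks the result against the known answer for an AR(1) process, whose h-step-ahead forecast places the weight φ^h on y_t and zeros elsewhere.

    import numpy as np

    def mmse_weights(gamma, p, h):
        # Solve the normal equations (8) for the h-step-ahead forecast
        # based on y_t, ..., y_{t-p}. gamma[k] holds the autocovariance
        # at lag k, for k = 0, ..., h + p.
        G = np.array([[gamma[abs(i - j)] for j in range(p + 1)]
                      for i in range(p + 1)])
        g = np.array([gamma[h + j] for j in range(p + 1)])
        return np.linalg.solve(G, g)

    # Autocovariances of an AR(1) process with parameter 0.8 and unit
    # innovation variance: gamma_k = (0.8**k)/(1 - 0.64).
    gamma = [0.8 ** k / (1 - 0.64) for k in range(12)]
    print(mmse_weights(gamma, p=3, h=2))   # close to [0.64, 0, 0, 0]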

In the case of an optimal predictor which combines previous values of the

series, it follows from the orthogonality principle that the forecast errors are

uncorrelated with the previous predictions.

A result of this sort is familiar to economists in connection with the so-

called eﬃcient-markets hypothesis. A ﬁnancial market is eﬃcient if the prices of

the traded assets constitute optimal forecasts of their discounted future returns,

which consist of interest and dividend payments and of capital gains.

According to the hypothesis, the changes in asset prices will be uncorre-

lated with the past or present price levels; which is to say that asset prices will

follow random walks. Moreover, it should not be possible for someone who is

apprised only of the past history of asset prices to reap speculative profits on

a systematic and regular basis.


Forecasting with ARMA Models

So far, we have avoided making speciﬁc assumptions about the nature of

the process y(t). We are greatly assisted in the business of developing practical

forecasting procedures if we can assume that y(t) is generated by an ARMA

process such that

(9)  y(t) = {µ(L)/α(L)} ε(t) = ψ(L)ε(t).

We shall continue to assume, for the sake of simplicity, that the forecasts are based on the information contained in the infinite set {y_t, y_{t−1}, y_{t−2}, . . .} = I_t comprising all values that have been taken by the variable up to the present time t. Knowing the parameters in ψ(L) enables us to recover the sequence {ε_t, ε_{t−1}, ε_{t−2}, . . .} from the sequence {y_t, y_{t−1}, y_{t−2}, . . .} and vice versa; so either of these constitutes the information set. This equivalence implies that the forecasts may be expressed in terms of {y_t} or in terms of {ε_t} or as a combination of the elements of both sets.

Let us write the realisations of equation (9) as

(10)  y_{t+h} = {ψ_0 ε_{t+h} + ψ_1 ε_{t+h−1} + · · · + ψ_{h−1} ε_{t+1}}
             + {ψ_h ε_t + ψ_{h+1} ε_{t−1} + · · ·}.

Here the first term on the RHS embodies disturbances subsequent to the time t when the forecast is made, and the second term embodies disturbances which are within the information set {ε_t, ε_{t−1}, ε_{t−2}, . . .}. Let us now define a forecasting function, based on the information set, which takes the form of

(11)  ŷ_{t+h|t} = {ρ_h ε_t + ρ_{h+1} ε_{t−1} + · · ·}.

Then, given that ε(t) is a white-noise process, it follows that the mean square of the error in the forecast h periods ahead is given by

(12)  E{(y_{t+h} − ŷ_{t+h})²} = σ_ε² ∑_{i=0}^{h−1} ψ_i² + σ_ε² ∑_{i=h}^{∞} (ψ_i − ρ_i)².

Clearly, the mean-square error is minimised by setting ρ_i = ψ_i; and so the optimal forecast is given by

(13)  ŷ_{t+h|t} = {ψ_h ε_t + ψ_{h+1} ε_{t−1} + · · ·}.

This might have been derived from the equation y(t + h) = ψ(L)ε(t + h), which generates the true value of y_{t+h}, simply by putting zeros in place of the unobserved disturbances ε_{t+1}, ε_{t+2}, . . . , ε_{t+h} which lie in the future when the


forecast is made. Notice that, on the assumption that the process is stationary,

the mean-square error of the forecast tends to the value of

(14)  V{y(t)} = σ_ε² ∑_i ψ_i²

as the lead time h of the forecast increases. This is nothing but the variance of

the process y(t).

The optimal forecast of (13) may also be derived by specifying that the forecast error should be uncorrelated with the disturbances up to the time of making the forecast. For, if the forecast errors were correlated with some of the elements of the information set, then, as we have noted before, we would not be using the information efficiently, and we could not be generating optimal forecasts. To demonstrate this result anew, let us consider the covariance between the forecast error and the disturbance ε_{t−i}:

(15)  E{(y_{t+h} − ŷ_{t+h})ε_{t−i}} = ∑_{k=1}^{h} ψ_{h−k} E(ε_{t+k} ε_{t−i})
                                   + ∑_{j=0}^{∞} (ψ_{h+j} − ρ_{h+j})E(ε_{t−j} ε_{t−i})
                                   = σ_ε² (ψ_{h+i} − ρ_{h+i}).

Here the final equality follows from the fact that

(16)  E(ε_{t−j} ε_{t−i}) = { σ_ε², if i = j;  0, if i ≠ j }.

If the covariance in (15) is to be equal to zero for all values of i ≥ 0, then we must have ρ_i = ψ_i for all i, which means that the forecasting function must be the one that has been specified already under (13).

It is helpful, sometimes, to have a functional notation for describing the

process which generates the h-steps-ahead forecast. The notation provided by

Whittle (1963) is widely used. To derive this, let us begin by writing

(17)  y(t + h) = {L^{−h} ψ(L)} ε(t).

On the RHS, there are not only the lagged sequences {ε(t), ε(t − 1), . . .} but also the sequences ε(t + h) = L^{−h}ε(t), . . . , ε(t + 1) = L^{−1}ε(t), which are associated with negative powers of L, which serve to shift a sequence forwards in time. Let {L^{−h}ψ(L)}_+ be defined as the part of the operator containing only nonnegative powers of L. Then the forecasting function can be expressed as

(18)  ŷ(t + h|t) = {L^{−h}ψ(L)}_+ ε(t)
               = {ψ(L)/L^h}_+ {1/ψ(L)} y(t).


Example. Consider an ARMA (1, 1) process represented by the equation

(19) (1 −φL)y(t) = (1 −θL)ε(t).

The function which generates the sequence of forecasts h steps ahead is given

by

(20)  ŷ(t + h|t) = {L^{−h}[1 + (φ − θ)L/(1 − φL)]}_+ ε(t)
               = {φ^{h−1}(φ − θ)/(1 − φL)} ε(t)
               = {φ^{h−1}(φ − θ)/(1 − θL)} y(t).

When θ = 0, this gives the simple result that ŷ(t + h|t) = φ^h y(t).

Generating The Forecasts Recursively

We have already seen that the optimal (minimum-mean-square-error) forecast of y_{t+h} can be regarded as the conditional expectation of y_{t+h} given the information set I_t which comprises the values of {ε_t, ε_{t−1}, ε_{t−2}, . . .} or, equally, the values of {y_t, y_{t−1}, y_{t−2}, . . .}. On taking expectations of y(t) and ε(t) conditional on I_t, we find that

(21)  E(y_{t+k}|I_t) = ŷ_{t+k|t}  if k > 0,
      E(y_{t−j}|I_t) = y_{t−j}   if j ≥ 0,
      E(ε_{t+k}|I_t) = 0         if k > 0,
      E(ε_{t−j}|I_t) = ε_{t−j}   if j ≥ 0.

In this notation, the forecast h periods ahead is

(22)  E(y_{t+h}|I_t) = ∑_{k=1}^{h} ψ_{h−k} E(ε_{t+k}|I_t) + ∑_{j=0}^{∞} ψ_{h+j} E(ε_{t−j}|I_t)
                    = ∑_{j=0}^{∞} ψ_{h+j} ε_{t−j}.

In practice, the forecasts may be generated using a recursion based on the

equation

(23)  y(t) = −{α_1 y(t − 1) + α_2 y(t − 2) + · · · + α_p y(t − p)}
           + µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q).


By taking the conditional expectation of this function, we get

(24)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + · · · + α_p y_{t+h−p}} + µ_h ε_t + · · · + µ_q ε_{t+h−q}  when 0 < h ≤ p, q,

(25)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + · · · + α_p y_{t+h−p}}  if q < h ≤ p,

(26)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + · · · + α_p ŷ_{t+h−p}} + µ_h ε_t + · · · + µ_q ε_{t+h−q}  if p < h ≤ q,

and

(27)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + · · · + α_p ŷ_{t+h−p}}  when p, q < h.

It can be seen from (27) that, for h > p, q, the forecasting function becomes a pth-order homogeneous difference equation in y. The p values of y(t) from t = r = max(p, q) to t = r − p + 1 serve as the starting values for the equation.
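The recursions (24)–(27) are easily mechanised. The following sketch, which is our own and is written in plain Python, generates the forecasts ŷ_{t+1}, . . . , ŷ_{t+H} from the p most recent values of the series and the q most recent disturbances; the names of the function and of its arguments are invented for the illustration.

    def arma_forecast(alpha, mu, y_recent, e_recent, H):
        # Forecast the ARMA process alpha(L)y(t) = mu(L)e(t) of (23).
        # alpha    : [alpha_1, ..., alpha_p];
        # mu       : [mu_0, mu_1, ..., mu_q];
        # y_recent : [y_t, y_{t-1}, ..., y_{t-p+1}], most recent first;
        # e_recent : [e_t, e_{t-1}, ..., e_{t-q+1}], most recent first.
        p, q = len(alpha), len(mu) - 1
        y, e = list(y_recent), list(e_recent)
        forecasts = []
        for h in range(1, H + 1):
            # Autoregressive part: past values are used where they are
            # available and forecasts are used in their place otherwise.
            ar = -sum(a * y[i] for i, a in enumerate(alpha))
            # Moving-average part: only the disturbances dated t or earlier
            # survive the conditional expectation, i.e. terms with j >= h.
            ma = sum(mu[j] * e[j - h] for j in range(h, q + 1))
            yhat = ar + ma
            forecasts.append(yhat)
            y = [yhat] + y[:-1]      # shift the register of y values
        return forecasts

    # Example: (1 - 0.5L)y(t) = (1 - 0.2L)e(t), so that alpha_1 = -0.5.
    print(arma_forecast(alpha=[-0.5], mu=[1.0, -0.2],
                        y_recent=[2.0], e_recent=[0.3], H=5))

The single loop reproduces all four cases of (24)–(27), because the moving-average sum is empty when h > q and the register of y values fills up with forecasts as h passes p.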

The behaviour of the forecast function beyond the reach of the starting values can be characterised in terms of the roots of the autoregressive operator. It may be assumed that none of the roots of α(L) = 0 lie inside the unit circle; for, if there were roots inside the circle, then the process would be radically unstable. If all of the roots are less than unity, then ŷ_{t+h} will converge to zero as h increases. If one of the roots of α(L) = 0 is unity, then we have an ARIMA(p, 1, q) model; and the general solution of the homogeneous equation of (27) will include a constant term which represents the product of the unit root with a coefficient which is determined by the starting values. Hence the forecast will tend to a nonzero constant. If two of the roots are unity, then the general solution will embody a linear time trend which is the asymptote to which the forecasts will tend. In general, if d of the roots are unity, then the general solution will comprise a polynomial in t of order d − 1.

The forecasts can be updated easily once the coeﬃcients in the expansion

of ψ(L) = µ(L)/α(L) have been obtained. Consider

(28)  ŷ_{t+h|t+1} = {ψ_{h−1} ε_{t+1} + ψ_h ε_t + ψ_{h+1} ε_{t−1} + · · ·}  and
      ŷ_{t+h|t} = {ψ_h ε_t + ψ_{h+1} ε_{t−1} + ψ_{h+2} ε_{t−2} + · · ·}.

The ﬁrst of these is the forecast for h − 1 periods ahead made at time t + 1

whilst the second is the forecast for h periods ahead made at time t. It can be

seen that

(29)  ŷ_{t+h|t+1} = ŷ_{t+h|t} + ψ_{h−1} ε_{t+1},


where ε_{t+1} = y_{t+1} − ŷ_{t+1} is the current disturbance at time t + 1. The latter is also the prediction error of the one-step-ahead forecast made at time t.
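The updating formula (29) might be rendered as follows; this is a minimal sketch of our own, which presumes that the ψ-weights from the expansion of µ(L)/α(L) are already at hand, with psi[0] = ψ_0 = 1.

    def update_forecasts(forecasts, psi, e_new):
        # forecasts[k] holds yhat_{t+h|t} for h = k + 2, and e_new is the
        # one-step-ahead error e_{t+1} = y_{t+1} - yhat_{t+1|t}.
        # Equation (29): yhat_{t+h|t+1} = yhat_{t+h|t} + psi_{h-1}*e_new.
        return [yhat + psi[h - 1] * e_new
                for h, yhat in enumerate(forecasts, start=2)]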

Example. For an example of the analytic form of the forecast function, we

may consider the Integrated Autoregressive (IAR) Process deﬁned by

(30)  {1 − (1 + φ)L + φL²} y(t) = ε(t),

wherein φ ∈ (0, 1). The roots of the auxiliary equation z² − (1 + φ)z + φ = 0 are z = 1 and z = φ. The solution of the homogeneous difference equation

(31)  {1 − (1 + φ)L + φL²} ŷ(t + h|t) = 0,

which deﬁnes the forecast function, is

(32)  ŷ(t + h|t) = c_1 + c_2 φ^h,

where c_1 and c_2 are constants which reflect the initial conditions. These constants are found by solving the equations

(33)  y_{t−1} = c_1 + c_2 φ^{−1},
      y_t = c_1 + c_2.

The solutions are

(34)  c_1 = (y_t − φy_{t−1})/(1 − φ)  and  c_2 = {φ/(φ − 1)}(y_t − y_{t−1}).

The long-term forecast is ȳ = c_1, which is the asymptote to which the forecasts tend as the lead period h increases.
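The forecast function (32) with the coefficients (34) may be evaluated directly, as in this small sketch of our own (plain Python; the numerical values are chosen arbitrarily):

    def iar_forecast(y_now, y_prev, phi, h):
        # Equations (32) and (34): yhat(t+h|t) = c1 + c2*phi**h.
        c1 = (y_now - phi * y_prev) / (1.0 - phi)
        c2 = phi * (y_now - y_prev) / (phi - 1.0)
        return c1 + c2 * phi ** h

    # With phi = 0.5, y_t = 10 and y_{t-1} = 8, the forecasts climb from
    # 11.0 towards the asymptote c1 = 12.
    print([iar_forecast(10.0, 8.0, 0.5, h) for h in (1, 2, 3, 10)])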

Ad-hoc Methods of Forecasting

There are some time-honoured methods of forecasting which, when anal-

ysed carefully, reveal themselves to be the methods which are appropriate to

some simple ARIMA models which might be suggested by a priori reason-

ing. Two of the leading examples are provided by the method of exponential

smoothing and the Holt–Winters trend-extrapolation method.

Exponential Smoothing. A common forecasting procedure is exponential

smoothing. This depends upon taking a weighted average of past values of the

time series with the weights following a geometrically declining pattern. The

function generating the one-step-ahead forecasts can be written as

(35)  ŷ(t + 1|t) = {(1 − θ)/(1 − θL)} y(t)
               = (1 − θ){y(t) + θy(t − 1) + θ²y(t − 2) + · · ·}.


On multiplying both sides of this equation by 1 −θL and rearranging, we get

(36)  ŷ(t + 1|t) = θŷ(t|t − 1) + (1 − θ)y(t),

which shows that the current forecast for one step ahead is a convex combina-

tion of the previous forecast and the value which actually transpired.

The method of exponential smoothing corresponds to the optimal fore-

casting procedure for the ARIMA(0, 1, 1) model (1 − L)y(t) = (1 − θL)ε(t),

which is better described as an IMA(1, 1) model. To see this, let us consider

the ARMA(1, 1) model y(t) −φy(t −1) = ε(t) −θε(t −1). This gives

(37)  ŷ(t + 1|t) = φy(t) − θε(t)
               = φy(t) − θ{(1 − φL)/(1 − θL)} y(t)
               = [{(1 − θL)φ − (1 − φL)θ}/(1 − θL)] y(t)
               = {(φ − θ)/(1 − θL)} y(t).

On setting φ = 1, which converts the ARMA(1, 1) model to an IMA(1, 1) model,

we obtain precisely the forecasting function of (35).
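In computation, the recursion (36) is applied directly; the sketch below is our own minimal rendering in Python, with the start-up value chosen, arbitrarily, to be the first observation.

    def exp_smoothing(y, theta):
        # Equation (36): yhat(t+1|t) = theta*yhat(t|t-1) + (1-theta)*y_t.
        yhat, forecasts = y[0], []
        for obs in y:
            yhat = theta * yhat + (1.0 - theta) * obs
            forecasts.append(yhat)       # forecasts[t] is yhat(t+1|t)
        return forecasts

    print(exp_smoothing([10.0, 12.0, 11.0, 13.0], theta=0.6))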

The Holt–Winters Method. The Holt–Winters algorithm is useful in ex-

trapolating local linear trends. The prediction h periods ahead of a series

y(t) = {y_t; t = 0, ±1, ±2, . . .} which is made at time t is given by

(38)  ŷ_{t+h|t} = α̂_t + β̂_t h,

where

(39)  α̂_t = λy_t + (1 − λ)(α̂_{t−1} + β̂_{t−1})
         = λy_t + (1 − λ)ŷ_{t|t−1}

is the estimate of an intercept or levels parameter formed at time t, and

(40)  β̂_t = µ(α̂_t − α̂_{t−1}) + (1 − µ)β̂_{t−1}

is the estimate of the slope parameter, likewise formed at time t. The coefficients λ, µ ∈ (0, 1] are the smoothing parameters.
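The recursions (39) and (40) translate directly into code. The sketch below is ours (plain Python); the start-up convention, which initialises the level and the slope from the first two observations, is one arbitrary choice amongst several.

    def holt_winters(y, lam, mu):
        alpha, beta = y[1], y[1] - y[0]          # crude start-up values
        for obs in y[2:]:
            alpha_new = lam * obs + (1.0 - lam) * (alpha + beta)  # (39)
            beta = mu * (alpha_new - alpha) + (1.0 - mu) * beta   # (40)
            alpha = alpha_new
        return alpha, beta

    a, b = holt_winters([1.0, 2.2, 2.9, 4.1, 5.0], lam=0.5, mu=0.3)
    print([a + b * h for h in (1, 2, 3)])        # forecasts from (38)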

The algorithm may also be expressed in error-correction form. Let

(41)  e_t = y_t − ŷ_{t|t−1} = y_t − α̂_{t−1} − β̂_{t−1}


be the error at time t arising from the prediction of y_t on the basis of information available at time t − 1. Then the formula for the levels parameter can be given as

(42)  α̂_t = λe_t + ŷ_{t|t−1} = λe_t + α̂_{t−1} + β̂_{t−1},

which, on rearranging, becomes

(43)  α̂_t − α̂_{t−1} = λe_t + β̂_{t−1}.

When the latter is drafted into equation (40), we get an analogous expression for the slope parameter:

(44)  β̂_t = µ(λe_t + β̂_{t−1}) + (1 − µ)β̂_{t−1}
         = λµe_t + β̂_{t−1}.

In order to reveal the underlying nature of this method, it is helpful to combine the two equations (42) and (44) in a simple state-space model:

(45)  ⎡ α̂(t) ⎤   ⎡ 1  1 ⎤ ⎡ α̂(t−1) ⎤   ⎡ λ  ⎤
      ⎣ β̂(t) ⎦ = ⎣ 0  1 ⎦ ⎣ β̂(t−1) ⎦ + ⎣ λµ ⎦ e(t).

This can be rearranged to give

(46)  ⎡ 1−L   −L  ⎤ ⎡ α̂(t) ⎤   ⎡ λ  ⎤
      ⎣  0    1−L ⎦ ⎣ β̂(t) ⎦ = ⎣ λµ ⎦ e(t).

The solution of the latter is

(47)  ⎡ α̂(t) ⎤              ⎡ 1−L    L  ⎤ ⎡ λ  ⎤
      ⎣ β̂(t) ⎦ = (1 − L)^{−2} ⎣  0    1−L ⎦ ⎣ λµ ⎦ e(t).

Therefore, from (38), it follows that

(48)  ŷ(t + 1|t) = α̂(t) + β̂(t) = {(λ + λµ)e(t) − λe(t − 1)}/(1 − L)².

This can be recognised as the forecasting function of an IMA(2, 2) model of

the form

(49)  (I − L)² y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + µ_2 ε(t − 2),


for which

(50)  ŷ(t + 1|t) = {µ_1 ε(t) + µ_2 ε(t − 1)}/(1 − L)².

The Local Trend Model. There are various arguments which suggest that an IMA(2, 2) model might be a natural model to adopt. The simplest of these arguments arises from an elaboration of a second-order random walk which adds an ordinary white-noise disturbance to the trend. The resulting model may be expressed in the two equations

(51)  (I − L)² ξ(t) = ν(t),
      y(t) = ξ(t) + η(t),

where ν(t) and η(t) are mutually independent white-noise processes. Combining the equations, and using the notation ∇ = I − L, gives

(52)  y(t) = ν(t)/∇² + η(t) = {ν(t) + ∇²η(t)}/∇².

Here the numerator ν(t) + ∇²η(t) = {ν(t) + η(t)} − 2η(t − 1) + η(t − 2) constitutes a second-order MA process.
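A short simulation can lend support to this reduction. The sketch below (our own, assuming NumPy) generates the model (51) and examines the empirical autocorrelations of ∇²y(t), which should be negligible beyond lag 2 if the numerator is indeed a second-order MA process.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 20000
    nu, eta = rng.normal(0, 1.0, T), rng.normal(0, 0.5, T)
    xi = np.cumsum(np.cumsum(nu))     # (I - L)^2 xi(t) = nu(t)
    y = xi + eta                      # y(t) = xi(t) + eta(t)
    z = np.diff(y, n=2)               # the twice-differenced series

    zc = z - z.mean()
    acf = [np.dot(zc[k:], zc[:len(zc) - k]) / np.dot(zc, zc)
           for k in range(6)]
    print(np.round(acf, 3))           # lags 3, 4, 5 close to zero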

Slightly more elaborate models with the same outcome have also been proposed. Thus the so-called structural model consists of the equations

(53)  y(t) = µ(t) + ε(t),
      µ(t) = µ(t − 1) + β(t − 1) + η(t),
      β(t) = β(t − 1) + ζ(t).

Working backwards from the final equation gives

(54)  β(t) = ζ(t)/∇,
      µ(t) = β(t − 1)/∇ + η(t)/∇ = ζ(t − 1)/∇² + η(t)/∇,
      y(t) = ζ(t − 1)/∇² + η(t)/∇ + ε(t) = {ζ(t − 1) + ∇η(t) + ∇²ε(t)}/∇².


Once more, the numerator constitutes a second-order MA process.

Equivalent Forecasting Functions

Consider a model which combines a global linear trend with an autoregres-

sive disturbance process:

(55)  y(t) = γ_0 + γ_1 t + ε(t)/(I − φL).

The formation of an h-step-ahead prediction is straightforward; for we can

separate the forecast function into two additive parts.

The ﬁrst part of the function is the extrapolation of the global linear trend.

This takes the form of

(56)  z_{t+h|t} = γ_0 + γ_1(t + h) = z_t + γ_1 h,  where z_t = γ_0 + γ_1 t.

The second part is the prediction associated with the AR(1) disturbance term η(t) = (I − φL)^{−1} ε(t). The following iterative scheme provides a recursive solution to the problem of generating the forecasts:

(57)  η̂_{t+1|t} = φη_t,
      η̂_{t+2|t} = φη̂_{t+1|t},
      η̂_{t+3|t} = φη̂_{t+2|t},  etc.

Notice that the analytic solution of the associated difference equation is just

(58)  η̂_{t+h|t} = φ^h η_t.

This reminds us that, whenever we can express the forecast function in terms

of a linear recursion, we can also express it in an analytic form embodying the

roots of a polynomial lag operator. The operator in this case is the AR(1)

operator I −φL. Since, by assumption, |φ| < 1, it is clear that the contribution

of the disturbance part to the overall forecast function

(59)  ŷ_{t+h|t} = z_{t+h|t} + η̂_{t+h|t}

becomes negligible when h becomes large.

Consider the limiting case when φ → 1. Now, in place of an AR(1) disturbance process, we have to consider a random-walk process. We know that the forecast function of a random walk consists of nothing more than a constant function. On adding this constant to the linear function z_{t+h|t} = γ_0 + γ_1(t + h), we continue to have a simple linear forecast function.

Another way of looking at the problem depends upon writing equation

(55) as

(60)  (I − φL){y(t) − γ_0 − γ_1 t} = ε(t).

Setting φ = 1 turns the operator I − φL into the difference operator I − L = ∇. But ∇γ_0 = 0 and ∇γ_1 t = γ_1, so equation (60) with φ = 1 can also be written as

(61)  ∇y(t) = γ_1 + ε(t).

This is the equation of a process which is described as a random walk with drift. Yet another way of expressing the process is via the equation y(t) = y(t − 1) + γ_1 + ε(t).

It is intuitively clear that, if the random-walk process ∇z(t) = ε(t) is associated with a constant forecast function, and if z(t) = y(t) − γ_0 − γ_1 t, then y(t) will be associated with a linear forecast function.

The purpose of this example has been to offer a limiting case where models with local stochastic trends (i.e. random-walk and unit-root models) and models with global polynomial trends come together. Finally, we should notice that the model of a random walk with drift has the same linear forecast function as the model

(62)  ∇²y(t) = ε(t),

which has two unit roots in the AR operator.


LECTURE 8

The Identiﬁcation

of ARIMA Models

As we have established in a previous lecture, there is a one-to-one cor-

respondence between the parameters of an ARMA(p, q) model, including the

variance of the disturbance, and the leading p + q + 1 elements of the auto-

covariance function. Given the true autocovariances of a process, we might

be able to discern the orders p and q of its autoregressive and moving-average

operators and, given these orders, we should then be able to deduce the values

of the parameters.

There are two other functions, prominent in time-series analysis, from

which it is possible to recover the parameters of an ARMA process. These

are the partial autocorrelation function and the spectral density function. The

appearance of each of these functions gives an indication of the nature of the

underlying process to which they belong; and, in theory, the business of iden-

tifying the model and of recovering its parameters can be conducted on the

basis of any of them. In practice, the process is assisted by taking account of

all three functions.

The empirical versions of the three functions which are used in a model-

building exercise may diﬀer considerably from their theoretical counterparts.

Even when the data are truly generated by an ARMA process, the sampling

errors which aﬀect the empirical functions can lead one to identify the wrong

model. This hazard is revealed by sampling experiments. When the data come

from the real world, the notion that there is an underlying ARMA process

is a ﬁction, and the business of model identiﬁcation becomes more doubtful.

Then there may be no such thing as the correct model; and the choice amongst

alternative models must be made partly with a view to their intended uses.

The Autocorrelation Functions

The techniques of model identiﬁcation which are most commonly used were

propounded originally by Box and Jenkins (1970). Their basic tools were the

sample autocorrelation function and the partial autocorrelation function. We

shall describe these functions and their use separately from the spectral density

function which ought, perhaps, to be used more often in selecting models.

The fact that the spectral density function is often overlooked is probably due to


an unfamiliarity with frequency-domain analysis on the part of many model

builders.

Autocorrelation function (ACF). Given a sample y_0, y_1, . . . , y_{T−1} of T observations, we define the sample autocorrelation function to be the sequence of values

(1)  r_τ = c_τ/c_0,  τ = 0, 1, . . . , T − 1,

wherein

(2)  c_τ = (1/T)∑_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ)

is the empirical autocovariance at lag τ and c_0 is the sample variance. One should note that, as the value of the lag increases, the number of observations comprised in the empirical autocovariance diminishes, until the final element c_{T−1} = T^{−1}(y_0 − ȳ)(y_{T−1} − ȳ) is reached, which comprises only the first and last mean-adjusted observations.

In plotting the sequence {r_τ}, we shall omit the value of r_0, which is invariably unity. Moreover, in interpreting the plot, one should be wary of giving too much credence to the empirical autocorrelations at lag values which are significantly high in relation to the size of the sample.
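The computation of the sequence {r_τ} is elementary; the following is a minimal sketch of our own, assuming NumPy, with the divisor T at every lag in accordance with equation (2).

    import numpy as np

    def sample_acf(y, maxlag):
        # r_tau = c_tau / c_0 for tau = 1, ..., maxlag, as in (1) and (2).
        y = np.asarray(y, dtype=float)
        T, d = len(y), y - np.mean(y)
        c0 = np.dot(d, d) / T
        return np.array([np.dot(d[tau:], d[:T - tau]) / (T * c0)
                         for tau in range(1, maxlag + 1)])

    rng = np.random.default_rng(1)
    print(np.round(sample_acf(rng.normal(size=200), 5), 3))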

Partial autocorrelation function (PACF). The sample partial autocorrelation p_τ at lag τ is simply the correlation between the two sets of residuals obtained from regressing the elements y_t and y_{t−τ} on the set of intervening values y_{t−1}, y_{t−2}, . . . , y_{t−τ+1}. The partial autocorrelation measures the dependence between y_t and y_{t−τ} after the effect of the intervening values has been removed.

The sample partial autocorrelation p_τ is virtually the same quantity as the estimated coefficient of lag τ obtained by fitting an autoregressive model of order τ to the data. Indeed, the difference between the two quantities vanishes as the sample size increases. The Durbin–Levinson algorithm provides an efficient way of computing the sequence {p_τ} of partial autocorrelations from the sequence {c_τ} of autocovariances. It can be seen, in view of this algorithm, that the information in {c_τ} is equivalent to the information contained jointly in {p_τ} and c_0. Therefore the sample autocorrelation function {r_τ} and the sample partial autocorrelation function {p_τ} are equivalent in terms of their information content.
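For reference, a sketch of the Durbin–Levinson recursion is given below (our own rendering, assuming NumPy). At each order k, the final autoregressive coefficient is the partial autocorrelation p_k; the test values are those of an AR(1) process, for which the PACF should vanish after lag 1.

    import numpy as np

    def pacf_durbin_levinson(r, maxlag):
        # r[0] = 1 and r[1], ..., r[maxlag] are the autocorrelations.
        phi = np.zeros((maxlag + 1, maxlag + 1))
        phi[1, 1] = r[1]
        for k in range(2, maxlag + 1):
            num = r[k] - sum(phi[k - 1, j] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(phi[k - 1, j] * r[j] for j in range(1, k))
            phi[k, k] = num / den
            for j in range(1, k):
                phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        return np.array([phi[k, k] for k in range(1, maxlag + 1)])

    r = [0.7 ** k for k in range(6)]
    print(np.round(pacf_durbin_levinson(r, 5), 6))   # [0.7, 0, 0, 0, 0]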

The Methodology of Box and Jenkins

The model-building methodology of Box and Jenkins relies heavily upon the two functions {r_τ} and {p_τ} defined above. It involves a cycle comprising

the three stages of model selection, model estimation and model checking. In

view of the diﬃculties of selecting an appropriate model, it is envisaged that

the cycle might have to be repeated several times and that, at the end, there

might be more than one model of the same series.


[Figure 1. The concentration readings from a chemical process, with the autocorrelation function and the autocorrelation function of the differences.]


Reduction to stationarity. The ﬁrst step, which is taken before embarking

on the cycle, is to examine the time plot of the data and to judge whether or

not it could be the outcome of a stationary process. If a trend is evident in

the data, then it must be removed. A variety of techniques of trend removal,

which include the ﬁtting of parametric curves and of spline functions, have

been discussed in previous lectures. When such a function is ﬁtted, it is to the

sequence of residuals that the ARMA model is applied.

However, Box and Jenkins were inclined to believe that many empirical

series can be modelled adequately by supposing that some suitable diﬀerence

of the process is stationary. Thus the process generating the observed series

y(t) might be modelled by the ARIMA(p, d, q) equation

(3)  α(L)∇^d y(t) = µ(L)ε(t),

wherein ∇^d = (I − L)^d is the dth power of the difference operator. In that case, the differenced series z(t) = ∇^d y(t) will be described by a stationary ARMA(p, q) model. The inverse operator ∇^{−1} is the summing or integrating operator, which accounts for the fact that the model depicted by equation (3) is described as an autoregressive integrated moving-average model.

To determine whether stationarity has been achieved, either by trend re-

moval or by diﬀerencing, one may examine the autocorrelation sequence of the

residual or processed series. The sequence corresponding to a stationary process

should converge quite rapidly to zero as the value of the lag increases. An em-

pirical autocorrelation function which exhibits a smooth pattern of signiﬁcant

values at high lags indicates a nonstationary series.

An example is provided by Figure 1 where a comparison is made between

the autocorrelation function of the original series and that of its diﬀerences.

Although the original series does not appear to embody a systematic trend,

it does drift in a haphazard manner which suggests a random walk; and it is

appropriate to apply the diﬀerence operator.

Once the degree of diﬀerencing has been determined, the autoregressive

and moving-average orders are selected by examining the sample autocorrela-

tions and sample partial autocorrelations. The characteristics of pure autore-

gressive and pure moving-average process are easily spotted. Those of a mixed

autoregressive moving-average model are not so easily unravelled.

Moving-average processes. The theoretical autocorrelation function {ρ_τ} of a pure moving-average process of order q has ρ_τ = 0 for all τ > q. The corresponding partial autocorrelation function {π_τ} is liable to decay towards zero gradually. To judge whether the corresponding sample autocorrelation function {r_τ} shows evidence of a truncation, we need some scale by which to judge the significance of the values of its elements.


[Figure 2. The graph of 120 observations on a simulated series generated by the MA(2) process y(t) = (1 + 0.90L + 0.81L²)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]


As a guide to determining whether the parent autocorrelations are in fact zero after lag q, we may use a result of Bartlett [1946] which shows that, for a sample of size T, the standard deviation of r_τ is approximately

(4)  (1/√T){1 + 2(r_1² + r_2² + · · · + r_q²)}^{1/2}  for τ > q.

The result is also given by Fuller [1976, p. 237]. A simpler measure of the scale of the autocorrelations is provided by the limits of ±1.96/√T, which are the approximate 95% confidence bounds for the autocorrelations of a white-noise sequence. These bounds are represented by the dashed horizontal lines on the accompanying graphs.
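Bartlett's approximation (4) amounts to a one-line function; the following is our own sketch under the same assumptions as before.

    import numpy as np

    def bartlett_se(r, q, T):
        # Standard error of r_tau for tau > q, from formula (4), on the
        # hypothesis that the parent autocorrelations vanish after lag q.
        return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(r[1:q + 1]) ** 2)) / T)

    print(bartlett_se([1.0, 0.4, 0.2], q=2, T=120))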

Autoregressive processes. The theoretical autocorrelation function {ρ_τ} of a pure autoregressive process of order p obeys a homogeneous difference equation based upon the autoregressive operator α(L) = 1 + α_1 L + · · · + α_p L^p. That is to say,

(5)  ρ_τ = −(α_1 ρ_{τ−1} + · · · + α_p ρ_{τ−p})  for all τ ≥ p.

In general, the sequence generated by this equation will represent a mixture of

damped exponential and sinusoidal functions. If the sequence is of a sinusoidal

nature, then the presence of complex roots in the operator α(L) is indicated.

One can expect the empirical autocovariance function of a pure AR process to

be of the same nature as its theoretical parent.

It is the partial autocorrelation function which serves most clearly to identify a pure AR process. The theoretical partial autocorrelation function {π_τ} of an AR(p) process has π_τ = 0 for all τ > p. Likewise, all elements of the sample partial autocorrelation function are expected to be close to zero for lags greater than p, which corresponds to the fact that they are simply estimates of zero-valued parameters. The significance of the values of the partial autocorrelations is judged by the fact that, for a pth-order process, their standard deviations for all lags greater than p are approximated by 1/√T. Thus the bounds of ±1.96/√T are also plotted on the graph of the partial autocorrelation function.

Mixed processes. In the case of a mixed ARMA(p, q) process, neither the theoretical autocorrelation function nor the theoretical partial autocorrelation function has any abrupt cutoff. Indeed, there is little that can be inferred from either of these functions, or from their empirical counterparts, beyond the fact that neither a pure MA model nor a pure AR model would be appropriate. On its own, the autocovariance function of an ARMA(p, q) process is not easily distinguished from that of a pure AR process. In particular, its elements γ_τ satisfy the same difference equation as that of a pure AR model for all values of τ > max(p, q).


[Figure 3. The graph of 120 observations on a simulated series generated by the AR(2) process (1 − 1.69L + 0.81L²)y(t) = ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]


[Figure 4. The graph of 120 observations on a simulated series generated by the ARMA(2, 2) process (1 − 1.69L + 0.81L²)y(t) = (1 + 0.90L + 0.81L²)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]


There is good reason to regard mixed models as more appropriate in prac-

tice than pure models of either variety. For a start, there is the fact that a

rational transfer function is far more eﬀective in approximating an arbitrary

impulse response than is an autoregressive transfer function, whose parameters

are conﬁned to the denominator, or a moving-average transfer function, which

has its parameters in the numerator. Indeed, it might be appropriate, some-

times, to approximate a pure process of a high order by a more parsimonious

mixed model.

Mixed models are also favoured by the fact that the sum of any two mutually independent autoregressive processes gives rise to an ARMA process. Let y(t) and z(t) be autoregressive processes of orders p and r respectively which are described by the equations α(L)y(t) = ε(t) and ρ(L)z(t) = η(t), wherein ε(t) and η(t) are mutually independent white-noise processes. Then their sum will be

(6)  y(t) + z(t) = ε(t)/α(L) + η(t)/ρ(L)
               = {ρ(L)ε(t) + α(L)η(t)}/{α(L)ρ(L)}
               = µ(L)ζ(t)/{α(L)ρ(L)},

where µ(L)ζ(t) = ρ(L)ε(t) + α(L)η(t) constitutes a moving-average process of order max(p, r).

In economics, where the data series are highly aggregated, mixed models

would seem to be called for often. In the context of electrical and mechanical

engineering, there may be some justiﬁcation for pure AR models. Here there is

often abundant data, suﬃcient to sustain the estimation of pure autoregressive

models of high order. Therefore the principle of parametric parsimony is less

persuasive than it might be in an econometric context. However, pure AR

models perform poorly whenever the data is aﬀected by errors of observation;

and, in this respect, a mixed model is liable to be more robust. One can

understand this feature of mixed models by recognising that the sum of a pure

AR(p) process and a white-noise process is an ARMA(p, p) process.


LECTURE 9

Nonparametric Estimation of

the Spectral Density Function

The Spectrum and the Periodogram

The spectral density of a stochastic process is defined by

(1)  f(ω) = (1/2π){γ_0 + 2∑_{τ=1}^{∞} γ_τ cos(ωτ)},  ω ∈ [0, π].

The obvious way to estimate this function is to replace the unknown autocovariances {γ_τ} by the corresponding empirical moments {c_τ}, where

(2)  c_τ = (1/T)∑_{t=τ}^{T−1} (y_{t−τ} − ȳ)(y_t − ȳ)  if τ ≤ T − 1.

Notice that, beyond a lag of τ = T − 1, the autocovariances are not estimable, since

(3)  c_{T−1} = (1/T)(y_0 − ȳ)(y_{T−1} − ȳ)

comprises the first and the last elements of the sample; and therefore we must set c_τ = 0 when τ > T − 1. Thus we obtain a sample spectrum in the form of

(4)  f_r(ω) = (1/2π){c_0 + 2∑_{τ=1}^{T−1} c_τ cos(ωτ)}.

The sample spectrum deﬁned in this way is just 1/4π times the periodogram

of the sample which is given by

(5)

I(ω

j

) = 2

_

c

0

+ 2

T−1

τ=1

c

τ

cos(ω

j

τ)

_

=

__

t

y

t

cos(ω

j

t)

_

2

+

_

t

y

t

sin(ω

j

t)

_

2

_

=

T

2

_

α

2

j

+ β

2

j

_

,


where

(6)  α_j = (2/T)∑_t y_t cos(ω_j t)  and  β_j = (2/T)∑_t y_t sin(ω_j t).

As we have defined it above, the periodogram has just n ordinates, which correspond to the values

(7)  ω_j = 0, 2π/T, . . . , π(T − 1)/T  when T is odd, or
     ω_j = 0, 2π/T, . . . , π  when T is even.

Although this method of estimating the spectrum via the periodogram may result, in some cases, in unbiased estimates of the corresponding ordinates of the spectral density function, it does not result in consistent estimates. This is hardly surprising when we recall that, in the case where T is even, the Fourier decomposition of the sample y_0, . . . , y_{T−1}, upon which the method is directly based, requires us to determine the T coefficients

α_0, (α_1, β_1), . . . , (α_{n−1}, β_{n−1}), β_n,

where n = T/2, from a total of T observations. For a set of parameters to be estimated consistently, we require that the amount of the relevant information which is available should increase with the size of the sample; and this cannot happen in the present case.

These conclusions can be illustrated quite simply in the case where y(t) = ε(t) is a white-noise sequence with a uniform spectrum f(ω) = σ²/2π over the range {−π ≤ ω ≤ π}. The values of α_j and β_j which characterise the sample spectrum and the periodogram are precisely the ones which would result from fitting the regression model

(8)  y(t) = α_j cos(ω_j t) + β_j sin(ω_j t) + ε(t)

to the data y_0, . . . , y_{T−1}. From the ordinary theory of linear regression, it follows that, if the population values which are estimated by α_j and β_j are in fact zero, which they must be on the assumption that y(t) = ε(t), then

(9)  (1/σ²){α_j² ∑_t cos²(ω_j t) + β_j² ∑_t sin²(ω_j t)} = {T/(2σ²)}(α_j² + β_j²) = I_j/σ²

has a chi-square distribution of two degrees of freedom. The variance of a

chi-square distribution of k degrees of freedom is just 2k. Thus we ﬁnd that


V(I_j/σ²) = 4; whence it follows that the variance of the spectral estimate f_r(ω_j) = I_j/4π is

(10)  V{f_r(ω_j)} = σ⁴/(4π²) = f²(ω_j).

Clearly, this value does not diminish as T increases.

A further consequence of using the periodogram directly to estimate the spectrum is that the estimators of f(ω_j) and f(ω_k) will be uncorrelated for all j ≠ k. This follows from the orthogonality of the sine and cosine functions which serve as a basis for the Fourier decomposition of the sample. The fact that adjacent values of the estimated spectrum are uncorrelated means that it will have a particularly volatile appearance.
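The following sketch, which is our own (assuming NumPy; a modern implementation would use the fast Fourier transform for long series), evaluates the ordinates of (5) at the Fourier frequencies and illustrates the behaviour just described for a white-noise sample.

    import numpy as np

    def periodogram(y):
        # Ordinates I(omega_j) of equation (5) at omega_j = 2*pi*j/T.
        y = np.asarray(y, dtype=float)
        T, t = len(y), np.arange(len(y))
        d = y - y.mean()
        omegas = 2.0 * np.pi * np.arange(1, T // 2 + 1) / T
        I = np.array([(2.0 / T) * (np.dot(d, np.cos(w * t)) ** 2
                                   + np.dot(d, np.sin(w * t)) ** 2)
                      for w in omegas])
        return omegas, I

    rng = np.random.default_rng(2)
    _, I = periodogram(rng.normal(size=256))
    # The sample spectrum I/(4*pi) scatters widely about f = 1/(2*pi),
    # and the scatter does not diminish as T grows.
    print(round(I.mean() / (4 * np.pi), 3), round(1 / (2 * np.pi), 3))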

Spectrum Averaging

One way of improving the properties of the estimate of f(ω_j) is to comprise within the estimator several adjacent values from the periodogram. Thus we may define a new estimator in the form of

(11)  f_s(ω_j) = ∑_{k=−m}^{m} µ_k f_r(ω_{j−k}).

In addition to the value of the periodogram at the point ω_j, this comprises a further m adjacent values falling on either side. The set of weights

{µ_{−m}, µ_{1−m}, . . . , µ_{m−1}, µ_m}

should sum to unity as well as being symmetric in the sense that µ_{−k} = µ_k. They define what is known as a spectral window. Some obvious problems arise in defining values of the estimate towards the boundaries of the set of frequencies {ω_j; 0 ≤ ω_j ≤ π}. These problems can be overcome by treating the spectrum as symmetric about the points 0 and π so that, for example, we define

(12)  f_s(π) = µ_0 f_r(π) + 2∑_{k=1}^{m} µ_k f_r(π − ω_k).

The estimate f_s(ω_j) comprises a total of M = 2m + 1 ordinates of the periodogram which span an interval of Q = 4mπ/T radians. This number of radians Q is the so-called bandwidth of the estimator. If Q is kept constant, then M increases at the same rate as T. This means that, in spite of the increasing sample size, we are denied the advantage of increasing the acuity or resolution of our estimation; so that narrow peaks in the spectrum, which


have been smoothed over, may escape detection. Conversely, if we maintain the value of M, then the size of the bandwidth will decrease with T, and we may retain some of the disadvantages of the original periodogram. Ideally, we should allow M to increase at a slower rate than T so that, as M → ∞, we will have Q → 0.
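A sketch of the smoothing operation (11) is given below (our own, with uniform weights µ_k = 1/(2m + 1) chosen purely for illustration); the ends of the sequence are reflected so as to treat the spectrum as symmetric about 0 and π, in the manner of (12).

    import numpy as np

    def smooth_spectrum(f_r, m):
        # Equation (11) with mu_k = 1/(2m+1); f_r holds the sample-spectrum
        # ordinates at the n Fourier frequencies on [0, pi].
        padded = np.concatenate([f_r[m:0:-1], f_r, f_r[-2:-m - 2:-1]])
        weights = np.full(2 * m + 1, 1.0 / (2 * m + 1))
        return np.convolve(padded, weights, mode="valid")

Applied, say, to the white-noise sample spectrum of the previous sketch with m = 8, the smoothed estimate clings far more closely to the flat theoretical spectrum, at the cost of a bandwidth of 4mπ/T radians.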

Weighting in the Time Domain

An alternative approach to spectral estimation is to give differential weighting to the estimated autocovariances comprised in our formula for the sample spectrum, so that diminishing weights are given to the values of c_τ as τ increases. This seems reasonable, since the precision of these estimates decreases as τ increases. If the series of weights associated with the autocovariances c_0, c_1, . . . , c_{T−1} are denoted by m_0, m_1, . . . , m_{T−1}, then our revised estimator for the spectrum takes the form of

(13)  f_w(ω) = (1/2π){m_0 c_0 + 2∑_{τ=1}^{T−1} m_τ c_τ cos(ωτ)}.

The series of weights defines what is described as a lag window. If the weights are zero-valued beyond m_R, then we describe R as the truncation point.

A wide variety of lag windows have been defined. Amongst those which are used nowadays are the Tukey–Hanning window, defined by

(14)  m_τ = (1/2){1 + cos(πτ/R)};  τ = 0, 1, . . . , R,

and the Parzen window, defined by

(15)  m_τ = 1 − 6(τ/R)² + 6(τ/R)³;  0 ≤ τ ≤ R/2,
      m_τ = 2(1 − τ/R)³;  R/2 ≤ τ ≤ R.
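The estimator (13), with the Parzen window (15), might be coded as follows; this is a sketch of our own, assuming NumPy.

    import numpy as np

    def parzen(R):
        # The Parzen lag window (15) for tau = 0, 1, ..., R.
        u = np.arange(R + 1) / R
        return np.where(u <= 0.5, 1.0 - 6.0 * u**2 + 6.0 * u**3,
                        2.0 * (1.0 - u) ** 3)

    def f_w(y, R, omegas):
        # The weighted estimator of equation (13), truncated at lag R.
        y = np.asarray(y, dtype=float)
        T, d = len(y), y - np.mean(y)
        c = np.array([np.dot(d[k:], d[:T - k]) / T for k in range(R + 1)])
        m = parzen(R)
        taus = np.arange(1, R + 1)
        return np.array([(m[0] * c[0]
                          + 2.0 * np.sum(m[1:] * c[1:] * np.cos(w * taus)))
                         / (2.0 * np.pi) for w in omegas])

    rng = np.random.default_rng(3)
    print(np.round(f_w(rng.normal(size=512), R=32,
                       omegas=np.linspace(0.0, np.pi, 5)), 3))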

The Relationship between Smoothing and Weighting

It would be surprising if we were unable to interpret the method of smoothing the periodogram in terms of an equivalent method of weighting the autocovariance function, and vice versa.

Consider the smoothed periodogram defined by

(16)  f_s(ω_j) = ∑_{k=−m}^{m} µ_k f_r(ω_{j−k}).


Given that the ordinates of the original periodogram I(ω_j) correspond to the points ω_j defined in (7), it follows that f_r(ω_{j−k}) = f_r(ω_j − ω_k), where ω_k = 2πk/T. Therefore, on substituting

(17)  f_r(ω_{j−k}) = (1/2π)∑_{τ=1−T}^{T−1} c_τ exp(−iω_{j−k}τ)
                  = (1/2π)∑_τ c_τ exp(−i[ω_j − ω_k]τ)

into (16), we get

(18)  f_s(ω_j) = ∑_k µ_k {(1/2π)∑_τ c_τ exp(−i[ω_j − ω_k]τ)}
             = (1/2π)∑_τ {∑_k µ_k exp(iω_k τ)} c_τ exp(−iω_j τ)
             = (1/2π)∑_τ m_τ c_τ exp(−iω_j τ),

where

(19)  m_τ = ∑_{k=−m}^{m} µ_k e^{iω_k τ};  ω_k = 2πk/T

is the finite Fourier transform of the sequence of weights

{µ_{−m}, µ_{1−m}, . . . , µ_{m−1}, µ_m}

which define the spectral window.

The ﬁnal expression under (18) would be the same as our expression for

the spectral estimator given under (13) were it not for the fact that we have

defined the present function over the set of values {ω_j; j = 1, . . . , n} instead

of over the interval ω = [0, π], and for the fact that we have used a complex

exponential expression instead of a cosine.

It is also possible to demonstrate an inverse relationship whereby a spec-

tral estimator which depends upon weighting the autocovariance function is

equivalent to another estimator which smooths the periodogram. Consider a

spectral estimator in the form of

(20)  f_w(ω_0) = (1/2π)∑_{τ=1−T}^{T−1} m_τ c_τ exp(−iω_0 τ),


where

(21)  m_τ = ∫_ω u(ω)e^{iωτ} dω

has an inverse Fourier transform given by

(22)  u(ω) = (1/2π)∑_{τ=−∞}^{∞} m_τ e^{−iωτ}.

On substituting the expression for m_τ from (21) into (20), we get

(23)  f_w(ω_0) = (1/2π)∑_τ {∫_ω u(ω)e^{iωτ} dω} c_τ e^{−iω_0 τ}
             = ∫_ω u(ω){(1/2π)∑_τ c_τ e^{i(ω−ω_0)τ}} dω
             = ∫_ω u(ω) f_r(ω_0 − ω) dω.

This shows that the technique of weighting the autocovariance function corresponds, in general, to a technique of smoothing the periodogram. However, to sustain this interpretation, we must define the periodogram not just at the n frequency points {ω_j; j = 1, . . . , n}, as we have done in (5), but over the entire interval [−π, π]. Notice that, on setting τ = 0 in (21), we get

(24)  m_0 = ∫_ω u(ω) dω.

It is desirable that the weighting function should integrate to unity over the relevant range, and this requires us to set m_0 = 1. The latter is exactly the value by which we would expect to weight the estimated variance c_0 within the formula in (13) which defines the spectral estimator f_w(ω).


LECTURE 10

Seasonal Models and

Seasonal Adjustment

So far we have relied upon the method of trigonometrical regression for

building models which can be used for forecasting seasonal economic time series.

It has proved necessary, invariably, to perform the preliminary task of elimi-

nating a trend from the data before determining the seasonal pattern from the

residuals. In most of the cases which we have analysed, the trend has been

modelled quite successfully by a simple analytic function such as a quadratic.

However, it is not always possible to ﬁnd an analytic function which serves the

purpose. In some cases a stochastic trend seems to be more appropriate. Such

a trend is generated by an autoregressive operator with unit roots. Once a

stochastic unit-root model has been adopted for the trend, it seems natural

to model the pattern of seasonal ﬂuctuations in the same manner by using

autoregressive operators with complex-valued roots of unit modulus.

The General Multiplicative Seasonal Model

Let

(1)  z(t) = ∇^d y(t)

be a de-trended series which exhibits seasonal behaviour with a periodicity of s

periods. Imagine, for the sake of argument, that the period between successive

observations is one month, which means that the seasons have a cycle of s = 12

months. Once the trend has been extracted from the original series y(t) by

diﬀerencing, we would expect to ﬁnd a strong relationship between the values

of observations taken in the same month of successive years. In the simplest

circumstances, we might find that the difference between y_t and y_{t−12} is a small random quantity. If the sequence of the twelve-period differences were white

noise, then we should have a relationship of the form

(2)  z(t) = z(t − 12) + ε(t)  or, equivalently,  ∇_{12} y(t) = ε(t).

This is ostensibly an autoregressive model with an operator in the form of ∇_{12} = 1 − L^{12}. However, it is interesting to note in passing that, if y(t) were


generated by a regression model in the form of

(3)  y(t) = ∑_{j=0}^{6} ρ_j cos(ω_j t − θ_j) + η(t),

where ω_j = πj/6 = j × 30°, then we should have

(4) $(I - L^{12})y(t) = \eta(t) - \eta(t-12) = \zeta(t);$

and, if the disturbance sequence η(t) were white noise, then the residual term

ζ(t) = η(t) −η(t −12) would show the following pattern of correlation:

(5) $C(\zeta_t, \zeta_{t-j}) = \begin{cases} 2\sigma^2, & \text{if } j = 0; \\ -\sigma^2, & \text{if } j = \pm 12; \\ 0, & \text{otherwise}. \end{cases}$
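The pattern can be verified by simulation; the fragment below is a quick sketch (assuming NumPy, with $\sigma^2 = 1$ and a sample size chosen arbitrarily) which estimates the autocovariances of $\zeta(t) = \eta(t) - \eta(t-12)$ at a few lags.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = rng.standard_normal(100_000)         # white noise with sigma^2 = 1
zeta = eta[12:] - eta[:-12]                # zeta(t) = eta(t) - eta(t - 12)

def acov(x, j):
    """Sample autocovariance of x at lag j."""
    x = x - x.mean()
    return np.dot(x[: len(x) - j] if j else x, x[j:]) / len(x)

print([round(acov(zeta, j), 2) for j in (0, 1, 6, 12, 24)])
# roughly [2.0, 0.0, 0.0, -1.0, 0.0], matching the pattern in (5)
```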

It can be imagined that a more complicated relationship stretches over the

years which connects the months of the calendar. By a simple analogy with the

ordinary ARMA model, we can devise a model of the form

(6) $\Phi(L^{12})\, \nabla^{D}_{12}\, z(t) = \Theta(L^{12})\, \eta(t),$

where Φ(z) is a polynomial of degree P and Θ(z) is a polynomial of degree

Q. In eﬀect, this model is applied to twelve separate time series—one for each

month of the year—whose observations are separated by yearly intervals. If

η(t) were a white-noise sequence of independently and identically distributed

random variables, then there would be no connection between the twelve time

series.

If there is a connection between successive months within the year, then

there should be a pattern of serial correlation amongst the elements of the

disturbance process η(t). One might propose to model this pattern using a

second ARMA of the form

(7) α(L)η(t) = µ(L)ε(t),

where α(z) is a polynomial of degree p and µ(z) is a polynomial of degree q.

The various components of our analysis can now be assembled. By com-

bining equations (1), (6) and (7), we can derive the following general model for

the sequence y(t):

(8) $\Phi(L^{12})\, \alpha(L)\, \nabla^{D}_{12} \nabla^{d}\, y(t) = \Theta(L^{12})\, \mu(L)\, \varepsilon(t).$

A model of this sort has been described by Box and Jenkins as the general

multiplicative seasonal model. To denote such a model in a summary fashion,


they describe it as an ARIMA $(P, D, Q) \times (p, d, q)$ model. Although, in the general version of the model, the seasonal difference operator $\nabla_{12}$ is raised to the power $D$, it is unusual to find values other than $D = 0, 1$.

Factorisation of the Seasonal Difference Operator

The equation under (8) should be regarded as a portmanteau in which

a collection of simpliﬁed models can be placed. The profusion of symbols in

equation (8) tends to suggest a model which is too complicated to be of practical

use. Moreover, even with $\nabla_{12}$ in place of $\nabla^{D}_{12}$, there is a redundancy in the notation to which we should draw attention. This redundancy arises from the fact that the seasonal difference operator $\nabla^{D}_{12}$ already contains the operator $\nabla = I - L$ as

one of its factors. Therefore, unless this factor is eliminated, there is a danger

that the original sequence y(t) will be subjected, inadvertently, to one more

diﬀerencing operation than is intended.

The twelve factors of the operator $\nabla_{12} = I - L^{12}$ contain the so-called twelfth-order roots of unity, which are the solutions of the algebraic equation $1 = z^{12}$. The factorisation may be demonstrated in three stages. To begin, it is easy to see that

(9) $I - L^{12} = (I - L)(I + L + L^2 + \cdots + L^{11}) = (I - L)(I + L^2 + L^4 + \cdots + L^{10})(I + L).$

The next step is to recognise that

(10) $(I + L^2 + L^4 + \cdots + L^{10}) = (I - \sqrt{3}L + L^2)(I - L + L^2)(I + L^2)(I + L + L^2)(I + \sqrt{3}L + L^2).$

Finally, it can be seen that the generic quadratic factor has the form of

(11) $I - 2\cos(\omega_j)L + L^2 = (I - e^{i\omega_j}L)(I - e^{-i\omega_j}L),$

where $\omega_j = \pi j/6 = j \times 30^\circ$.
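The factorisation can be confirmed mechanically; the following sketch (assuming only NumPy) multiplies the two real first-order factors and the five quadratic factors of (10) and (11) together and recovers the coefficients of $I - L^{12}$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

factors = [np.array([1.0, -1.0])]                    # I - L
for j in range(1, 6):                                # the five quadratic factors (11)
    w = np.pi * j / 6
    factors.append(np.array([1.0, -2 * np.cos(w), 1.0]))
factors.append(np.array([1.0, 1.0]))                 # I + L

prod = np.array([1.0])
for f in factors:
    prod = P.polymul(prod, f)                        # coefficients in rising powers of L

print(np.round(prod, 12))                            # [1, 0, ..., 0, -1] = I - L^12
```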

Figure 1 shows the disposition of the twelfth roots of unity around the unit

circle in the complex plane.

A cursory inspection of equation (9) indicates that the first-order difference operator $\nabla = I - L$ is indeed one of the factors of $\nabla_{12} = I - L^{12}$. Therefore, if the sequence $y(t)$ has been reduced to stationarity already by the application of $d$ first-order differencing operations, then its subsequent differencing via the operator $\nabla_{12}$ is unnecessary and is liable to destroy some of the characteristics

of the sequence which ought to be captured by the ARIMA model.

The factorisation of the seasonal diﬀerence operator also helps to explain

how the seasonal ARMA model can give rise to seemingly regular cycles of the

appropriate duration.


Figure 1. The 12th roots of unity inscribed in the unit circle.

Consider a simple second-order autoregressive model with complex-valued

roots of unit modulus:

(12) $\bigl\{ I - 2\cos(\omega_j)L + L^2 \bigr\}\, y_j(t) = \varepsilon_j(t).$

Such a model can give rise to quite regular cycles whose average duration is $2\pi/\omega_j$ periods. The graph of the sequence generated by a model with $\omega_j = \omega_1 = \pi/6 = 30^\circ$ is given in Figure 2. Now consider generating the full set of

stochastic sequences $y_j(t)$ for $j = 1, \ldots, 5$. Also included in this set should be the sequences $y_0(t)$ and $y_6(t)$ generated by the first-order equations

(13) $(I - L)y_0(t) = \varepsilon_0(t)$ and $(I + L)y_6(t) = \varepsilon_6(t).$

These sequences, which resemble trigonometrical functions, will be harmoni-

cally related in the manner of the trigonometrical functions comprised by equa-

tion (3) which also provides a model for a seasonal time series. It follows that

a good representation of a seasonal economic time series can be obtained by

taking a weighted combination of the stochastic sequences.

For simplicity, imagine that the white-noise sequences $\varepsilon_j(t);\; j = 0, \ldots, 6$

are mutually independent and that their variances can take a variety of values.

Then the sum of the stochastic sequences will be given by

(14) $y(t) = \sum_{j=0}^{6} y_j(t) = \frac{\varepsilon_0(t)}{I - L} + \sum_{j=1}^{5} \frac{\varepsilon_j(t)}{I - 2\cos(\omega_j)L + L^2} + \frac{\varepsilon_6(t)}{I + L}.$


Figure 2. The graph of 84 observations on a simulated series generated by the AR(2) process $(1 - 1.732L + L^2)y(t) = \varepsilon(t)$.
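A series such as that of Figure 2 can be reproduced along the following lines (a sketch assuming NumPy; the seed and the unit innovation variance are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.pi / 6                       # omega_1 = 30 degrees: cycles of 2*pi/w = 12 periods
n = 84

y = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    # (I - 2cos(w)L + L^2) y(t) = e(t), i.e. y_t = 1.732 y_{t-1} - y_{t-2} + e_t
    y[t] = 2 * np.cos(w) * y[t - 1] - y[t - 2] + e[t]

# y exhibits quasi-regular cycles of roughly twelve periods whose amplitude
# and phase drift gradually, as in the figure
```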

The terms on the RHS of this expression can be combined. Their common denominator is simply the operator $\nabla_{12} = I - L^{12}$. The numerator is a sum of 7 mutually independent moving-average processes, each with an order of 10 or 11; together they amount to an MA(11) process which can be denoted by $\eta(t) = \theta(L)\varepsilon(t)$. Thus the combination of the harmonically related unit-root AR(2) processes gives rise to a seasonal process in the form of

(15) $y(t) = \frac{\theta(L)}{I - L^{12}}\, \varepsilon(t)$ or, equivalently, $\nabla_{12}\, y(t) = \theta(L)\varepsilon(t).$
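The construction can be illustrated by simulation; the sketch below (assuming NumPy; the sample size, seed and unit variances are arbitrary) generates the seven harmonically related unit-root processes of (12) and (13), sums them as in (14), and confirms that the seasonal difference of the sum is well-behaved, in accordance with (15).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 240
y = np.zeros(n)

for j in range(7):
    w = np.pi * j / 6
    e = rng.standard_normal(n)                # mutually independent white noise
    x = np.zeros(n)
    if j == 0:                                # (I - L) y_0(t) = e_0(t)
        for t in range(1, n):
            x[t] = x[t - 1] + e[t]
    elif j == 6:                              # (I + L) y_6(t) = e_6(t)
        for t in range(1, n):
            x[t] = -x[t - 1] + e[t]
    else:                                     # (I - 2cos(w_j)L + L^2) y_j(t) = e_j(t)
        for t in range(2, n):
            x[t] = 2 * np.cos(w) * x[t - 1] - x[t - 2] + e[t]
    y += x

z = y[12:] - y[:-12]                          # the seasonal difference of the sum
print(y.var(), z.var())                       # y wanders without bound; z is stable
```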

The equation of this model is contained within the portmanteau equation of the

general multiplicative model given under (8). However, although it represents

a simpliﬁcation of the general model, it still contains a number of parameters

which is liable to prove excessive. A typical model, which contains only a few parameters, is the ARIMA $(0, 1, 1) \times (0, 1, 1)$ model which Box and Jenkins fitted to the logarithms of the AIRPASS data. The AIRPASS model takes the form of

(16) $(I - L^{12})(I - L)y(t) = (I - \theta L^{12})(I - \mu L)\varepsilon(t).$

Notice how the unit-root autoregressive operators $I - L^{12}$ and $I - L$ are coupled with the moving-average operators $I - \theta L^{12}$ and $I - \mu L$ respectively. These

serve to enhance the regularity of the stochastic cycles and to smooth the trend.
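For practical estimation, a model of the form (16) can be fitted with standard software. The following sketch assumes that the statsmodels package is available and uses a simulated stand-in series, since the AIRPASS data are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# a stand-in logged monthly series with trend and seasonality (hypothetical data)
rng = np.random.default_rng(4)
t = np.arange(144)
log_y = 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(144)

# the airline model (16): ARIMA(0,1,1) x (0,1,1) with s = 12
model = SARIMAX(log_y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)

print(result.params)                  # the estimated moving-average coefficients
print(result.forecast(steps=24))      # two years of forecasts of the logged series
```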


Forecasting with Unit-Root Seasonal Models

Although their appearances are superﬁcially similar, the seasonal economic

series and the series generated by equations such as (16) are, fundamentally,

of very diﬀerent natures. In the case of the series generated by a unit-root

stochastic diﬀerence equation, there is no bound, in the long run, on the ampli-

tude of the cycles. Also there is a tendency for the phases of the cycles to drift

without limit. If the latter were a feature of the monthly time series of con-

sumer expenditures, for example, then we could not expect the annual boom

in sales to occur at a deﬁnite time of the year. In fact, it occurs invariably at

Christmas time.

The advantage of unit-root seasonal models does not lie in the realism with

which they describe the processes which generate the economic data series.

For that purpose the trigonometrical model seems more appropriate. Their

advantage lies, instead, in their ability to forecast the seasonal series.

The simplest of the seasonal unit-root models is the one which is speciﬁed

by equation (2). This is a twelfth-order diﬀerence equation with a white-noise

forcing function. In generating forecasts from the model, we need only replace

the elements of ε(t) which lie in the future by their zero-valued expectations.

Then the forecasts may be obtained iteratively from a homogeneous diﬀerence

equation in which the initial conditions are simply the values of y(t) observed

over the preceding twelve months. In eﬀect, we observe the most recent annual

cycle and we extrapolate its form exactly year-in year-out into the indeﬁnite

future.
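In code, this forecast function amounts to nothing more than tiling the final observed year; the fragment below is a minimal sketch (assuming NumPy; the function name is our own).

```python
import numpy as np

def seasonal_rw_forecast(y, h, s=12):
    """h-step forecasts for y(t) = y(t - s) + e(t): repeat the last s values."""
    y = np.asarray(y)
    last_cycle = y[-s:]                       # the most recent annual cycle
    return np.array([last_cycle[k % s] for k in range(h)])

# e.g. seasonal_rw_forecast(y, 36) extrapolates the final year over three years
```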

A somewhat diﬀerent forecasting rule is associated with the model deﬁned

by the equation

(17) $(I - L^{12})y(t) = (I - \theta L^{12})\varepsilon(t).$

This equation is analogous to the simple IMA(1, 1) equation in the form of

(18) $(I - L)y(t) = (I - \theta L)\varepsilon(t),$

which was considered at the beginning of the course. The latter equation was

obtained by combining a ﬁrst-order random walk with a white-noise error of

observation. The two equations, whose combination gives rise to (18), are

(19)

ξ(t) = ξ(t −1) + ν(t),

y(t) = ξ(t) + η(t),

wherein ν(t) and η(t) are generated by two mutually independent white-noise

processes.


Figure 3. The sample trajectory and the forecast function of the nonstationary 12th-order process $y(t) = y(t-12) + \varepsilon(t)$.

Equation (17), which represents the seasonal model which was used by Box

and Jenkins, is generated by combining the following equations, which are analogous to those under (19):

(20)

ξ(t) = ξ(t −12) + ν(t),

y(t) = ξ(t) + η(t).

Here ν(t) and η(t) continue to represent a pair of independent white-noise

processes.

The procedure for forecasting the IMA model consisted of extrapolating

into the indefinite future a constant value $\hat{y}_{t+1|t}$ which represents the one-

step-ahead forecast made at time t. The forecast itself was obtained from

a geometrically-weighted combination of all past values of the sequence y(t)

which represent erroneous observations on the random-walk process ξ(t). The

forecasts for the seasonal model of (17) are obtained by extrapolating a so-called

annual reference cycle into the future so that it applies in every successive

year. The reference cycle is constructed by taking a geometrically weighted

combination of all past annual cycles. The analogy with the IMA model is

perfect!
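The following fragment sketches this reference cycle (assuming NumPy; renormalising the truncated geometric weights so that they sum to unity is our own finite-sample expedient, and the function name is hypothetical).

```python
import numpy as np

def reference_cycle(y, theta, s=12):
    """Next year's forecasts under (I - L^s)y(t) = (I - theta L^s)e(t):
    each month is a geometrically weighted average of its own past values."""
    y = np.asarray(y, dtype=float)
    ref = np.empty(s)
    for m in range(s):
        vals = y[m::s]                                     # past values of month m
        k = np.arange(len(vals))[::-1]                     # age in years, 0 = latest
        weights = (1 - theta) * theta ** k
        ref[m] = np.sum(weights * vals) / np.sum(weights)  # heaviest weight on recent years
    return ref
```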

It is interesting to compare the forecast function of a stochastic unit-root

seasonal model of (17) with the forecast function of the corresponding trigono-

metrical model represented by (3). In the latter case, the forecast function


depends upon a reference cycle which is the average of all of the annual cycles

which are represented by the data set from which the regression parameters

have been computed. The stochastic model seems to have the advantage that,

in forming its average of previous annual cycles, it gives more weight to recent

years. However, it is not diﬃcult to contrive a regression model which has the

same feature.


**LECTURES IN TIME-SERIES ANALYSIS AND FORECASTING
**

by D.S.G. Pollock Queen Mary and Westﬁeld College, The University of London

These two booklets contain some of the material of the courses titled Methods of Time-Series Analysis and Economic Forecasting which have been taught in the Department of Economics of Queen Mary College in recent years. The material is presented in the form of a series of ten lectures for a course given at the Institute for Advanced Studies in Vienna titled A Short Course in Time-Series Analysis.

Book 1 1 Trends in Economic Time Series 2 Seasons and Cycles in Time Series 3 Models and Methods of Time-Series Analysis 4 Time-Series Analysis in the Frequency Domain 5 Linear Stochastic Models Book 2 6 State-Space Analysis and Structural Time-Series Models 7 Forecasting with ARIMA Models 8 Identiﬁcation and Estimation of ARIMA Models 9 Identiﬁcation and Estimation in the Frequency Domain 10 Seasonality and Linear Filtering

**THE METHODS OF TIME-SERIES ANALYSIS
**

by D.S.G. Pollock Queen Mary and Westﬁeld College, The University of London

This paper describes some of the principal themes of time-series analysis and it gives an historical account of their development. There are two distinct yet broadly equivalent modes of time-series analysis which may be pursued. On the one hand there are the time-domain methods which have their origin in the classical theory of correlation; and they lead inevitably towards the construction of structural or parametric models of the autoregressive moving-average type. On the other hand are the frequency-domain methods of spectral analysis which are based on an extension of the methods of Fourier analysis. The paper describes the developments which led to the synthesis of the two branches of time-series analysis and it indicates how this synthesis was achieved. It remains true that the majority of time-series analysts operate principally in one or other of the two domains. Such specialisation is often inﬂuenced by the academic discipline to which the analyst adheres. However, it is clear that there are many advantages to be derived from pursuing the two modes of analysis concurrently.

Address for correspondence: D.S.G. Pollock Department of Economics Queen Mary College University of London Mile End Road London E1 4 NS Tel : +44-71-975-5096 Fax : +44-71-975-5500

Many of the more prominent macroeconomic indicators are amenable to a decomposition of the sort depicted above. broad movements can be discerned which evolve more gradually than the other motions which are evident. and we are inclined to decompose the time series into the corresponding components. the trend should be regarded as nothing more than the accumulated eﬀect of the ﬂuctuations. and this could well be a reﬂection of the fact that some economic activities. The ghost of an annual cycle {St } might also be apparent in the index. In some cases. to some extent. In other cases. One can imagine. it is traditional to decompose time series into a variety of components. then its generic element is liable to be expressed as (1. we feel that the trends and the ﬂuctuations represent diﬀerent sorts of inﬂuences. In economics. to the average lifetime of the legislative assembly. The growth trend might be obscured. for example. The changes which are of a transitory nature are described as ﬂuctuations. more or less. Yt = Tt + Ct + St + εt . some or all of which may be present in a particular instance. a quarterly index of Gross National Product which appears to be following an exponential growth trend {Tt }. which happens to correspond. If {Yt } is the sequence of values of an economic index. The reasons for this curious coincidence need not concern us here. is the seasonal variation and is an irregular component.LECTURE 1 Trends in Economic Time Series In many time series. by a superimposed cycle {Ct } with a period of roughly four and a half years. These gradual changes are described as trends and cycles. is a secular cycle.1) where Tt Ct St εt is the global trend. 1 .

Thus. The ﬁrst purpose is to give a summary description of the salient features of the time series. Extracting the Trend There are essentially two ways of extracting trends from a time series. By combining the separate predictions of the components. and we should set about the task of redeﬁning them. or if it manifests a more or less regular pattern. 2 . If the residue follows a trend. for example.2) 1 ˆ Yt+3 + 2Yt+2 + 3Yt+1 + 4Yt + 3Yt−1 + 2Yt−2 + Yt−3 . the secular cycle and the seasonal cycle—have been extracted from the index.D.S. This might help us to gain an insight into the fundamental workings of the economic or social structure which has generated the series. regardless of the date at which it begins or ends. the residue should correspond to an irregular component {εt } for which no unique explanation can be oﬀered. if we eliminate the irregular and seasonal components from the series. a forecast can be derived which may be superior to one derived by a method which pays no attention to the underlying structure of the time series. A ﬁlter is a carefully crafted moving average which spans a number of data points and which attributes a weight to each of them. we are left with an index which may give a clearer picture of the more important features. are signiﬁcantly aﬀected by the weather and by the duration of sunlight. The other purpose in decomposing the series is to predict its future values. There are two distinct purposes for which we might wish to eﬀect such a decomposition. Thus. Such a series has the characteristic that any segment of consecutive elements looks much like any other segment of the same duration. The weights should sum to unity to ensure that the ﬁlter does not systematically inﬂate or deﬂate the values of the series. then it contains features which ought to have been attributed to the other components. the following moving average might serve to eliminate the annual cycle from an economic series which is recorded at quarterly intervals: (1. a particular method of prediction is appropriate. The ﬁrst way is to apply to the series a variety of so-called ﬁlters which annihilate or nullify all of the components which are not regarded as trends.G. For each component of the time series. POLLOCK : TIME SERIES AND FORECASTING such as building construction. This component ought to resemble a time series generated by a so-called stationary stochastic process. When the foregoing components—the trend. Yt = 16 Another ﬁlter with a wider span and a diﬀerent proﬁle of weights might serve to eliminate the four-and-a-half-year cycle which is present in our imaginary series of Gross National Product.

3 .G.D. Every body continues in its state of rest or of uniform motion in a straight line unless it is compelled to change that state by forces impressed upon it.S. The thought occurs to us that such trends might also arise in the social world. POLLOCK : TRENDS IN TIME SERIES Finally a ﬁlter could be designed which smooths away the irregularities of the index which defy systematic explanation. Diﬀerent functions are appropriate to diﬀerent forms of trend. which is an indeﬁnite sum whose terms contain rising powers of the argument. and what is left after they have been applied should give a picture of the underlying trend {Tt } of the index. According to a well-known dictum. Once an analytic function has been ﬁtted to the series. This is Newtons’s ﬁrst law of motion. and it suggests no way of predicting their future values. applied in series. However. The order in which the three ﬁlters are applied is immaterial. Other collections of ﬁlters. When there is no theory to specify a mathematical form for the trend. The kinematic equation for the distance covered by a body moving with constant velocity in a straight line is (1. The process of ﬁltering is often a good way of deriving an index which represents the more important historical characteristics of the time series. might serve to isolate the other components {Ct } and {St } which are to be found in equation (1).4) x = x0 + ut. There are also arguments from physics which suggest that ﬁrst-degree and second-degree polynomials in t. it may be possible to approximate it by a polynomial of low degree. Polynomial Trends Amongst the mathematical functions which suggest themselves as means of modelling a trend is a pth-degree polynomial whose argument is the time index t: (1. and some functions which analysts tend to favour see almost always to be inappropriate. it generates no model for the underlying trends. The alternative way of extracting the trend from the index is to ﬁt some function which is capable of adapting itself to whatever form the trend happens to display. Thus the polynomial in t may be construed as an approximation to an analytic function which is obtained by discarding all but the leading terms of a power-series expansion. are common in the natural world. it may be used to provide extrapolative forecasts of the trend. which are linear and quadratic time trends in other words.3) φ(t) = φ0 + φ1 t + · · · + φp tp . This notion is suggested by the formal result that every analytic mathematical function can be expanded as a power series.

A linear or a quadratic function may be appropriate if the trend in question is monotonically increasing or decreasing.S. Figure 1 is the result of ﬁtting a cubic function to an economic time series by least-squares regression. A cubic function ﬁtted to data on meat consumption in the United States. and x0 represents the initial position of the body at time t = 0. 180 170 160 150 140 1920 1925 1930 1935 1940 Figure 1. POLLOCK : TIME SERIES AND FORECASTING where u is the uniform velocity.G.5) 1 x = x0 + u0 t + at2 . 2 where u0 is the velocity at time t = 0 and a is the constant acceleration due to the motive force. This is just a quadratic in t. Newton’s second law of motion asserts that The change of motion is proportional to the motive force impressed. 4 . In other cases. and is made in the direction of the straight line in which the force is impressed. The kinematic equation for the distance travelled under uniformly accelerated rectilinear motion is (1.D. In modern language. 1919–1941. polynomials of higher degrees might be ﬁtted. This is nothing but a ﬁrst-degree polynomial in t. this is expressed by saying that the acceleration of a body along a straight line is proportional to the force which is applied in that direction.

The result will be a curve which ﬁts the data more closely. such a disparity of numbers taxes the precision of the computer. .G. . . the value of t3 is in excess of 7. Some care has to be taken in ﬁtting a polynomial time trend by the method of least-squares regression. . . . The estimated coeﬃcients associated with these orthogonal polynomials can be converted into the coeﬃcients φ0 . . . 300 million. . The change would aﬀect only the value of the intercept term φ0 which could be adjusted ex post. is to form a matrix X of regressors in which the generic row [t0 . . The annual growth factor for an 5 . we should ﬁnd that. t2 . and these dates might be taken as the initial and terminal values of t. For. which comes immediately to mind. it will be found that one of the branches of the polynomial—the left branch in this case—has changed direction. . tp are replaced by a set of so-called orthogonal polynomials which give rise to vectors of regressors whose cross products are zero-valued. the recourse might be to increase the degree of the polynomial by one. . In that case. φ1 . The annual data on meat consumption. tp ] contains rising powers of the argument t. 11 for the range of the argument. Unfortunately. . which are plotted in Figure 1. 1941. This might also be regarded as a undesirable property for a function which is to be used in extrapolative forecasting. Clearly. . . this is a highly unsatisfactory circumstance. Consider a ﬁnancial asset with an annual rate of return of γ. Another feature of a polynomial function is that its branches tend to plus or minus inﬁnity with increasing rapidity as the argument increases or decreases beyond a range of central values where the function has its stationary points and its points of inﬂection.D. . t. run from 1919 to 1941. The reason lies in the peculiarly ill-conditioned nature of the matrix (X X)−1 of cross products. when t = 1941. . and.S. t. In general. . A straightforward procedure. such a recourse in not always adequate to ensure the numerical accuracy of the computation. a specialised procedure of polynomial regression is often called for in which the functions t0 . Exponential and Logistic Trends The notion of exponential or geometric growth is common in economics where it is closely related to the idea of compound interest. Thus. the eﬀect of altering the degree of the polynomial by one will be to alter the direction of one or other of the branches of the ﬁtted function. In that case. we might take t = −11. . In fact. Also. for example. An obvious recourse is to recode the values of t. from the point of view of forecasting. POLLOCK : TRENDS IN TIME SERIES It might be felt that there are salient features in the data which are not captured by the cubic polynomial. . The values found by extrapolating the quartic function backwards in time will diﬀer radically from those found by extrapolating the cubic function. whereas t0 = 1 for all values of t = 1919. there would be a vast diﬀerences in the values of the elements of the matrix X. φp of equation (3).

then the value of the investment at time t would be given by (1.7) y = αeγt .S. then its growth 4 1 factor would be lim(n → ∞)(1 + n γ)n = eγ . we expect there to be upper limits to the levels which can be attained by economic variables. .8) dy = γy. The value of the asset at time t would be given by (1. Since real resources are ﬁnite. The equation of exponential growth is a solution of the diﬀerential equation (1.10) ln yt = ln α + (ln λ)t + εt . and if the returns were compounded with the principal on an annual basis. This is obtained by taking the logarithm of equation (7) and adding a disturbance term εt .D. it is implausible to suggest that such a process can be sustained for long when real resources are involved. Then the transformed growth equation becomes (1. POLLOCK : TIME SERIES AND FORECASTING investment of unit value is (1 + γ). It is equivalent to say that the proportional rate of growth (1/y)(dy/dt) is constant. we might imagine a process whereby the ownership of a consumer durable grows until the majority 6 . by applying ordinary least-squares regression to the equation (1.6) yt = α(1 + γ)t . . An alternative parametrisation is obtained by setting λ = eγ . yn . and the geometric growth rate is λ − 1.9) ln yt = ln α + γt + εt . If α units were invested at time t = 0. and one which is compounded quarterly has a growth factor 2 of (1 + 1 γ)4 . For an example of a trend with an upper bound. . Whereas unhindered exponential growth might well be a possibility for certain monetary or ﬁnancial quantities. and this is the equation for exponential growth. .G. dt The implication of the diﬀerential equation is that the absolute rate of growth in y is proportional to the value already attained by y. If an investment were compounded continuously. sampled at regular intervals. An exponential growth trend can be ﬁtted to observations y1 . An investment which is compounded twice a year has an annual growth factor of (1 + 1 γ)2 .

Then come the ﬁrst signs that the market is being saturated. at least in an early period. or at a small positive replacement rate if it is not. The reason is that the replacement sales depend not only on the size of the ownership of the durable goods but also upon the age of the stock of goods. the sales begin to accelerate.0 0.D.S. or experience of it. if the good is wholly durable. 1. One of the simplest ways of modelling the growth of ownership is to employ the so-called logistic curve. is spread amongst consumers. For large negative values of x. POLLOCK : TRENDS IN TIME SERIES of households or individuals are in possession of it. The logistic function ex /(1 + ex ) and its derivative. It is very diﬃcult to specify the dynamics of a process such as the one we have described whenever there are replacement sales to be taken into account. Typically. as the level of ownership approaches the saturation point. Then.5 0. 7 . This classical device has its origins in the mathematics of biology where it has been used to model the growth of a population of animals in an environment with limited food resources. as information about the durable. Good examples are provided by the sales of domestic electrical appliances such are fridges and colour television sets. the rate of sales will decline to a constant level. The latter is a function. their cumulated total might appear to follow an exponential growth path. Eventually. and there is a point of inﬂection in the cumulative curve where its second derivative—which is the rate of increase in sales per period—passes from positive to negative. Often we have to be content with modelling only the growth of ownership. which may be at zero. In the case of the exponential function ex . they coincide for all values of x. the rate of sales is slow. For a time. of the way in which sales have grown at the outset.25 −4 −2 2 4 Figure 2. when the new durable is introduced.G. the function and its derivative are close.

The parameters β and α 8 . The general from of the function is (1.14) x(π) = ln π . The inverse mapping x = x(π) is easily derived. with a value of 2. . . is already signiﬁcantly aﬀected by the term ex . POLLOCK : TIME SERIES AND FORECASTING The simplest version of the function is given by (1. and a value of unity. which is approached as x → −∞.11) π(x) = 1 ex = . the denominator. . where x = 0. We may begin by noting that.D. which is the saturation level of ownership in the example of the consumer durable. Consider 1−π = (1. The logistic curve varies between a value of zero. Therefore.15) y(t) = γ γeh(t) = .G. with the eﬀect that the value of π never exceeds unity. the value of the function is π(0) = 1 . the rate of increase declines rapidly toward zero. . x 1+e e This is rearranged to give (1. which is approached as x → +∞.S. The alternative expression for the logistic curve also lends itself to an interpretation.13) ex = π . the term 1 + ex . as x increases from such values towards zero. These 2 characteristics can be understood easily in reference to the ﬁrst expression. Here γ is the upper asymptote of the function. 1 + e−x 1 + ex The second expression comes from multiplying top and bottom of the ﬁrst expression by ex .12) 1 + ex ex − 1 + ex 1 + ex 1 π = = x. 1−π The logistic curve needs to be elaborated before it can be ﬁtted ﬂexibly to a set of observations y1 . is not signiﬁcantly diﬀerent from unity. 1−π whence the inverse function is found by taking natural logarithms: (1. At the mid point. yn tending to an upper asymptote. which is found in the denominator. By the time x reaches zero. the logistic function closely resembles an exponential function. for large negative values of x. Thereafter. At that point. there is an inﬂection in the curve as the rate of increase in π begins to decline. 1 + e−h(t) 1 + eh(t) h(t) = α + βt.

D.S.G. POLLOCK : TRENDS IN TIME SERIES determine respectively the rate of ascent of the function and the mid point of its ascent, measured on the time-axis. It can be seen that (1.16) ln y(t) γ − y(t) = h(t).

Therefore, with the inclusion of a residual term, the equation for the generic element of the sample is (1.17) ln yt γ − yt = α + βt + et .

For a given value of γ, one may calculate the value of the dependent variable on the LHS. Then the values of α and β may be found by least-squares regression. The value of γ may also be determined according to the criterion of minimising the sum of squares of the residuals. A crude procedure would entail running numerous regressions, each with a diﬀerent value for γ. The deﬁnitive value would be the one from the regression with the least residual sum of squares. There are other procedures for ﬁnding the minimising value of γ of a more systematic and eﬃcient nature which might be used instead. Amongst these are the methods of Golden Section Search and Fibonnaci Search which are presented in many texts of numerical analysis. The objection may be raised that the domain of the logistic function is the entire real line—which spans all of time from creation to eternity—whereas the sales history of a consumer durable dates only from the time when it is introduced to the market. The problem might be overcome by replacing the time variable t in equation (15) by its logarithm and by allowing t to take only nonnegative values. Then, whilst t ∈ [0, ∞), we still have ln(t) ∈ (−∞, ∞), which is the entire domain of the logistic function.

1.0 0.8 0.6 0.4 0.2

1

2

3

4

Figure 3. The function y(t) = γ/(1 + exp{α − β ln(t)}) with γ = 1, α = 4 and β = 7. The positive values of t are the domain of the function.

9

D.S.G. POLLOCK : TIME SERIES AND FORECASTING There are many curves which will serve the purpose of modelling a sigmoidal growth process. Their number is equal, at least, to the number of theoretical probability density functions—for the corresponding (cumulative) distribution functions rise monotonically from zero to unity in ways with are suggestive of processes of bounded growth. In fact, we do not need to have an analytic form for a cumulative function before it can be ﬁtted to a growth process. It is enough to have a table of values of a standardised form of the function. An example is provided by the normal density function whose distribution function is regularly ﬁtted to data points in the course of probit analysis. In this case, the ﬁtting involves ﬁnding values for the location parameter µ and the dispersion parameter σ 2 by which the standard normal function is converted into an arbitrary normal function. Nowadays, there are eﬃcient procedures for numerical optimisation which can accomplish such tasks with ease. Flexible Trends If the purpose of decomposing a time series is to form predictions of its components, then it is important to obtain adequate representations of these components at every point within the sample period. The device which is most appropriate to the extrapolative forecasting of a trend is rarely the best means of representing it within the sample. An extrapolation is usually based upon a simple analytic function; and any attempt to make the function reﬂect the local variations of the sample will endow it with global characteristics which may aﬀect the forecasts adversely. One way of modelling the local characteristics of a trend without prejudicing its global characteristics is to use a segmented curve. In many applications, it has been found that a curve with cubic polynomial segments is appropriate. The segments must be joined in a way which avoids evident discontinuities. In practice, the requirement is usually for continuous ﬁrst-order and second-order derivatives. A curve whose segments are joined in this way is described as a cubic spline. A spline is a draughtsman’s tool which was once used in drawing smooth curves. It is a thin ﬂexible piece of wood which was clamped to a series of pins which were placed along the path of the curve which had to be described. Some of the essential properties of a mathematical spline can be understood by bearing the real spline in mind. The pins to which a draughtsman clamped his spline correspond to the data points through which we might interpolate a mathematical spline. The segments of the mathematical spline would be joined at the data points. The cubic spline becomes a device for modelling a trend when, instead of passing through the data points, it is allowed, in the interests of smoothness, to deviate from them. The Reinsch smoothing spline is ﬁtted by minimising 10

D.S.G. POLLOCK : TRENDS IN TIME SERIES

180

170

160

150

λ = 0.75

140 1920 1925 1930 1935 1940

180

170

160

150

λ = 0.125

140 1920 1925 1930 1935 1940

Figure 4. Cubic smoothing splines ﬁtted to data on meat consumption in the United States, 1919–1941.

11

Stochastic Trends It is possible that what is perceived as a trend is the result of the accumulation of small stochastic ﬂuctuations which have no systematic basis. There are various ways in which the curve of a cubic spline may be extrapolated to form forecasts of the trend.20) yt = y0 + εt + εt−1 + · · · + ε1 . Let {yt } be the random-walk sequence. the following expression can be derived: (1. in this respect. the extrapolation is quadratic.18) yt = yt−1 + εt . Then its value at time t is obtained from the previous value via the equation (1. The forces exerted by ordinary springs are proportional to their extension. is imperfect. In normal circumstances. one might think of attaching the draughtsman’s spline to the pins by springs instead of by clamps. Figure 4 shows the consequences of ﬁtting the smoothing spline to the data on meat consumption which is also used in Figure 1 where a cubic polynomial has been ﬁtted. Here εt is an element of a white-noise sequence of independently and identically distributed random variables with (1. and. In that case. In that case. it is possible to clamp the ends of the spline in a way which imposes a value on their ﬁrst derivatives. when the ends of the spline are left free. The measure of curvature is based upon second derivatives. POLLOCK : TIME SERIES AND FORECASTING a criterion function which imposes both a penalty for deviating from the data points and a penalty for excessive curvature in the segments. the analogy. The degree of ﬂexibility of the spline corresponds to the value of λ. which requires the forces to be proportional to the squares of their extensions.G.D. By a process of back-substitution. whilst the measure of deviation is the sum of the squared distances of the points from the curve.S. there are some clearly deﬁned ways of removing the trend from the data as well as for extrapolating it into the future. the second derivatives are zero-valued and the extrapolation is linear. The precise form of the curve would depend upon the stiﬀness of the spline and the forces exerted by the springs. 12 . It is a matter of judgment how the value of λ should be chosen so as to reﬂect the trend. The simplest model embodying a stochastic trend is the so-called ﬁrstorder random walk.19) E(εt ) = 0 and V (εt ) = σ 2 for all t. A single parameter λ governs the trade-oﬀ between the objectives of smoothness and goodness of ﬁt. However. As an analogy for the smoothing spline.

G. if its starting point is in the indeﬁnite past rather than at time t = 0. For a physical example of Brownian motion. This depicts yt as the sum of an initial value y0 and of an accumulation of stochastic increments. POLLOCK : TRENDS IN TIME SERIES 2 1 0 −1 −2 −3 0 25 50 75 100 Figure 5. have a tendency to wander haphazardly. then the values of the stochastic increments will also be small and the random walk will wander slowly. There is no central tendency in the random-walk process. The values of a random walk. and.22) yt − yt−1 = εt . it is necessary only to take its ﬁrst diﬀerences. However. ﬂoating on the surface of a viscous liquid. then the mean and the variance of yt are be given by (1. A sequence generated by a white-noise process. It is debatable whether the outcome of such a process deserves to be called a trend. The viscosity might be expected to bring the particles to a halt quickly if they 13 . then the mean and variance are undeﬁned. Thus (1.21) E(yt ) = y0 and V (yt ) = t × σ 2 . as the name implies.D. If y0 has a ﬁxed ﬁnite value. if the variance of the white-noise process is small.S. To reduce the random walk to a stationary stochastic process. one can imagine small particles. A ﬁrst-order random walk over a surface is what is know as Brownian motion. such a pollen grains.

A ﬁrst-order random walk 50 0 −50 −100 −150 −200 0 25 50 75 100 Figure 7.D.S. A second-order random walk 14 .G. POLLOCK : TIME SERIES AND FORECASTING 4 2 0 −2 −4 −6 −8 0 25 50 75 100 Figure 6.

by (1.S. ˆ ˆ ˆ E(yt+1 |It ) = yt+1|t = yt .25) E(yt+h |It ) = yt+h|t = yt+h−1|t .D. . which are conditional upon the information of the set It = {yt . to reduce the sequence {zt } to the stationary white-noise sequence. The nature of a second-order process can be understood by recognising that it represents a trend in which the slope—which is its ﬁrst diﬀerence— follows a random walk. This is demonstrated by taking the expected values of the elements of the equation (1.23) yt+h = yt+h−1 + εt+h which represents the value which lies h periods ahead at time t. A second-order random walk is formed by accumulating the values of a ﬁrst-order process. which comes from (23). then the slope of 15 (1.24) E(yt+h |It ) = yt+h|t . . we must take ﬁrst diﬀerences twice in succession. depends upon the fact that E(εt+h |It ) = 0.G. However. respectively. ˆ yt+h . In these terms. if {εt } and {yt } are respectively a white-noise sequence and the sequence from a ﬁrst-order random walk. . If the random walk wanders slowly. may be denoted as follows: (1.26) . if h ≤ 0. There is no better way of predicting the outcome of a random walk than to take the most recently observed value and to extrapolate it indeﬁnitely into the future. The second. then zt = zt−1 + yt = zt−1 + yt−1 + εt = 2zt−1 − zt−2 + εt deﬁnes the second-order random walk. if h > 0. Thus. It is clear that.} containing observations on the series up to time t. the predictions of the values of the random walk for h > 1 periods ahead and for one period ahead are given. The ﬁrst of these. then they will dart hither and thither on the surface of the liquid under the impact of its molecules which are themselves in constant motion. which comes from taking expectations in the equation yt+1 = yt + εt+1 . POLLOCK : TRENDS IN TIME SERIES were in motion. Here the ﬁnal expression is obtained by setting yt−1 = zt−1 − zt−2 in the second expression. yt−1 . uses the fact that the value of yt is already known. The implication of the two equations is that yt serves as the optimal predictor for all future values of the random walk. The expectations. if the particles are very light.

To demonstrate that the forecast function for a second-order random walk is a straight line. the second-order random walk may appear to follow a linear time trend.29) zt+h|t = α + βh with α = zt ˆ and β = zt − zt−1 . we simply extrapolate its linear motion free from disturbances. If the kinetic energy of the body is large relative to the energy of the impacts. which generates a linear time trend. Thus a third-order random walk is formed by accumulating the values of a second-order process. POLLOCK : TIME SERIES AND FORECASTING this trend is liable to change only gradually. for extended periods. wherein ηt and νt are generated by two mutually independent white-noise processes. For a physical analogy of a second-order random walk. A third-order process can be expected to give rise to local quadratic trends.D. of the elements of the the equation (1. we may take the expectations. Therefore.G. ξt = ξt−1 + νt . In order to predict where the body might be in some future period.27) zt+h = 2zt+h−1 − zt+h−2 + εt+h . 16 . which are conditional upon It .S. The equations combine to give (1. It is possible to deﬁne random walks of higher orders. The model is speciﬁed by the equations (1.28) E(zt+h |It ) = zt+h|t = 2ˆt+h−1|t − zt+h−2|t . then its linear motion will be disturbed only slightly. we can imagine a body in motion which suﬀers a series of small impacts. For h periods ahead and for one period ahead. A stochastic trend of the random-walk variety may be elaborated by the addition of an irregular component. this gives (1. It is straightforward to conﬁrm that these diﬀerence equations have an analytic solution of the form (1. A simple model consists of a ﬁrst-order random walk with an added white-noise component. ˆ z ˆ ˆ E(zt+1 |It ) = zt+1|t = 2zt − zt−1 .30) yt = ξt + ηt . which together serve to deﬁne a simple iterative scheme.31) yt − yt−1 = ξt − ξ−1 + ηt − ηt−1 = νt + ηt − ηt−1 . and the appropriate way of predicting its values is by quadratic extrapolation.

and it is the basis of the widely-used forecasting procedure known as exponential smoothing. if the information at time t−1 consists of the elements of the set It−1 = {yt−1 .D. By applying a straightforward process of back-substitution to the ﬁnal equation of (35). To form the one-step-ahead forecast yt+1|t in the manner indicated by the ˆ ﬁrst of the equations under (36). it will be found that (1. This is a so-called exponentially-weighted moving average. which leads to the identity εt = yt − yt|t−1 ˆ ˆ upon which the second equality of (35) depends. yt+3|t etc.33) yt = yt−1 + εt − µεt−1 . reﬂects the fact that. It will transpire.S. which is obtained by taking expectations in the equation yt+h = yt+h−1 + εt+h − µεt+h−1 . 17 . is ˆ E(yt+1 |It ) = yt+1|t = yt − µεt (1. which is obtained from the equation yt+1 = yt + εt+1 − µεt . the combination of the random walk and white noise gives rise to the single equation (1. then εt−1 is a know quantity which is unaﬀected by the process of taking expectations.36) yt+1|t = (1 − µ)(yt + µyt−1 + · · · + µt−1 y1 ) + µt y0 ˆ ˆ = (1 − µ){yt + µyt−1 + µ2 yt−2 + · · ·}. . where the ﬁnal expression stands for an inﬁnite series. The result yt|t−1 = yt−1 − µεt−1 . in subsequent lectures. POLLOCK : TRENDS IN TIME SERIES The expression on the RHS can be reformulated to give (1. is given by (1. an initial value y0 is required. where εt and εt−1 are elements of a white-noise sequence and µ is a parameter of an appropriate value. The forecast for h steps ahead. Thus.35) = yt − µ(yt − yt|t−1 ) ˆ y = (1 − µ)yt + µˆt|t−1 . . Equation (34) ˆ indicates that all the succeeding forecasts yt+2|t .G. .} and the value of µ.32) νt + ηt − ηt−1 = εt − µεt−1 . yt−2 .34) E(yt+h |It ) = yt+h|t = yt+h−1|t . ˆ ˆ The forecast for one step ahead. have the same value ˆ ˆ as the one-step-ahead forecast. that equation (33) is a simple example of an Integrated Autoregressive Moving-Average or ARIMA model. There exists a readily accessible general theory of the forecasting of ARIMA processes which we shall expound at length.

52..F. Steyn.W. 18 . Ratkowsky.J. and I. “Spline Functions and the Problem of Graduation”.S. Digital Filters: Third Edition.. Reinsch.. Englewood Cliﬀs. De Vos.L. 10.L. POLLOCK : TIME SERIES AND FORECASTING References Eubank. (1990). (1985). Vrije Universiteit Amsterdam. Nonlinear Regression Modelling: A Uniﬁed Approach. (1964).. New York. (1967). New York. (1988). 177–183. (1989).J. C. R. 947–950.J. Numerische Mathematik. Proceedings of the National Academy of Science. “Smoothing by Spline Functions”. “Stochastic Nonlinearity: A Firm Basis for the Flexible Functional Form”: Research Memorandum 1990-13.. Schoenberg. D. Marcel Dekker Inc. I. R.D. Marcel Dekker Inc. Hamming.H. Spline Smoothing and Nonparametric Regression. Prentice–Hall Inc. A. N..G.

Let time t be reckoned from an instant when the radius joining the point to the centre is at an angle of θ below the horizontal. instead of the peak of the function occurring at time t = 0. The phase displacement. β = ρ sin θ 19 and α2 + β 2 = ρ2 . also measured in radians.LECTURE 2 Seasons and Cycles in Time Series Cycles of a regular nature are often encountered in physics and engineering. Thus. The point might be the axis of the ‘big end’ of a connecting rod which joins a piston to a ﬂywheel.1) x = ρ cos(ωt − θ). x = ρ cos θ cos(ωt) + ρ sin θ sin(ωt) = α cos(ωt) + β sin(ωt). as it would with an ordinary cosine function. indicates the extent to which the cosine function has been displaced by a shift along the time axis. it now occurs a time t = θ/ω. The angular velocity is a measure in radians per unit period. Using the compound-angle formula cos(A − B) = cos A cos B + sin A sin B. The parameters of the function are as follows: ρ is the amplitude. If the point is projected onto the horizontal axis. The quantity 2π/ω measures the period of the cycle. then the distance of the projection from the centre is given by (2. . Consider a point moving with constant speed in a circle of radius ρ. ω is the angular velocity or frequency and θ is the phase displacement.3) α = ρ cos θ.2) with (2. we can rewrite equation (1) as (2. The movement of the projection back and forth along the horizontal axis is described as simple harmonic motion.

. . then a general model for a seasonal ﬂuctuation would comprise the frequencies (2. π]. where ct (ω) = cos(ωt) and st (ω) = sin(ωt). . eT −1 ] are vectors of T elements. . where et is a residual element which might represent an irregular white-noise component in the process underlying the data. It may be unreasonable to expect that an idealised seasonal cycle can be represented by a simple sinusoidal function. For the seasonality of economic activities is related. . .D. . 20 .7) yt = j=0 αj cos(ωj t) + βj sin(ωj t) + et . In matrix terms. . the values of the dependent variable should be deviations about a mean value. .S. t = 0. . s j = 0. . An equation may be written in the form of (2. β where c = [c0 .6) ωj = 2πj . . wave forms of a more complicated nature may be synthesised by employing a series of sine and cosine functions whose frequencies are integer multiples of the fundamental seasonal frequency. 2 which are equally spaced in the interval [0. n = s . However. The parameters α. cT −1 ] and s = [s0 . . Such a series of frequencies is described as an harmonic scale.4) yt = αct (ω) + βst (ω) + et . in that case. Such a technique may be used for extracting a seasonal component from an economic time series. sT −1 ] and e = [e0 . β can be found by running regressions for a wide range of values of ω and by selecting the regression which delivers the lowest value for the residual sum of squares.G. . to the near-perfect regularities of the solar system which are reﬂected in the annual calender. A model of seasonal ﬂuctuation comprising the full set of harmonicallyrelated frequencies would take the form of n (2. ultimately. equation (4) becomes (2.5) y = [c s] α + e. POLLOCK : TIME SERIES AND FORECASTING Extracting a Regular Cyclical Component A cyclical component which is concealed beneath other motions may be extracted from a data sequence by a straightforward application of the method of linear regression. . . To avoid the need for an intercept term. and. T − 1. we know in advance what value to give to ω. . If there are s = 2n observations per annum. . . .

Matters are no more complicated in the case of monthly data.5 −0. However.5 −1 Figure 1. sin(ωn t) = sin(πt) = 0.10) = . (2.5 −1 1 0. 2 2 If the four seasons are indexed by j = 0. we have sin(ω0 t) = sin(0) = 0. When there are four observations per annum.5 −0. cos(ωn t) = cos(πt) = (−1)t . when s is even. it appears that there are s + 2 components in the sum. . we have ω0 = 0. 3.5 −1 1 0. Trigonometrical functions. . This simple seasonal model is illustrated adequately by the case of quarterly data. ω1 = π/2 and ω2 = π. cos(ω0 t) = cos(0) = 1.S.5 −0.D. POLLOCK : SEASONS AND CYCLES 1 0. of frequencies ω1 = π/2 and ω2 = π . . then the values from the year τ can be represented by the following matrix equation: α0 eτ 0 yτ 0 1 1 0 1 1 −1 α1 eτ 1 yτ 1 1 0 + (2. .9) yt = α0 + α1 cos πt πt + β1 sin + α2 (−1)t + et .8) Therefore there are only s nonzero coeﬃcients to be determined.5 −0.G. and equation (7) assumes the form of (2. 1 −1 0 yτ 2 eτ 2 1 β1 yτ 3 α2 eτ 3 1 0 −1 −1 21 .5 −1 1 2 3 4 1 2 3 4 1 0. associated with a quarterly model of a seasonal ﬂuctuation. 1 2 3 4 1 2 3 4 At ﬁrst sight.

G. POLLOCK : TIME SERIES AND FORECASTING It will be observed that the vectors of the matrix are mutually orthogonal. . The inverse mapping is 1 1 1 1 δ0 α0 4 4 4 4 α 1 0 −1 0 δ1 1 2 2 (2.14) = .11) β1 = α2 = 2 T 1 T 2 T T −1 yt . the coeﬃcients of the equation are given by 1 α0 = T α1 = (2. in place of equation (10). τ =1 for j = 0.D. t=0 p (yτ 0 − yτ 2 ). . we may take yτ 0 δ0 eτ 0 1 0 0 0 yτ 1 0 1 0 0 δ1 eτ 1 (2.13) 4 δj = T p yτ j . τ =1 p (yτ 1 − yτ 3 ).12) = 0 0 1 0 δ2 + eτ 2 . An alternative model of seasonality. 1 1 β1 0 0 −2 δ2 2 1 1 −1 −1 α2 δ3 4 4 4 4 Another way of parametrising the model of seasonality is to adopt the following form: yτ 0 φ eτ 0 1 1 0 0 yτ 1 1 0 1 0 γ0 eτ 1 (2. τ =1 It is the mutual orthogonality of the vectors of ‘explanatory’ variables which accounts for the simplicity of these formulae. assigns an individual dummy variable to each season. . 3. A comparison of equations (10) and (12) establishes the mapping from the coeﬃcients of the trigonometrical functions to the coeﬃcients of the dummy variables.15) = 1 0 0 1 γ1 + eτ 2 . yτ 2 yτ 3 δ3 eτ 3 0 0 0 1 where (2. yτ 2 yτ 3 γ2 eτ 3 1 0 0 0 22 . τ =1 p (yτ 0 − yτ 1 + yτ 2 − yτ 3 ). When the data consist of T = 4p observations which span p years. which is used more often by econometricians. .S. Thus.

An attempt might be made to correct this feature by adding to the matrix an extra column with a unit at the bottom and with zeros elsewhere and by introducing an accompanying parameter γ3 . ρ sin θ). decade after decade. The problem highlights a diﬃculty which might arise if either of the schemes under (10) or (12) were ﬁtted to the data by multiple regression in the company of a polynomial φ(t) = φ0 + φ1 t + · · · + φp tp designed to capture a trend. as the eight-year period postulated by Moore. let us imagine. β) = (ρ cos θ. Then we should encounter another diﬃculty. At the instant t = 0. A classic expression of skepticism was made by Slutsky [19] in a famous article of 1927: Suppose we are inclined to believe in the reality of the strict periodicity of the business cycle. and this will make the parameters indeterminate unless an additional constraint is imposed which sets γ0 + · · · + γ3 = 0. Irregular Cycles Whereas it seems reasonable to model a seasonal ﬂuctuation in terms of trigonometrical functions. reproduces the same sinusoidal wave which rises and falls on the surface of the social ocean with the regularity of day and night? It seems that something other than a perfectly regular sinusoidal component is required to model the secular ﬂuctuations of economic activity which are described as business cycles.G.D. To generate a cycle which is more fundamentally aﬀected by randomness. However. To begin. the coordinates are given by (2. the columns of the resulting matrix will be linearly dependent. POLLOCK : SEASONS AND CYCLES This scheme is unbalanced in that it does not treat each season in the same manner. To make such a regression viable.S. Wherein lies the source of this regularity? What is the mechanism of causality which. when the point makes a positive angle of θ with the horizontal axis. one would have to eliminate the intercept parameter φ0 . we must construct a model which has random eﬀects in both the phase and the amplitude. To obtain a model for a seasonal ﬂuctuation. it is diﬃcult to accept that other cycles in economic activity should have such regularity.16) (α. it has been enough to modify the equation of harmonic motion by superimposing a disturbance term which aﬀects the amplitude. a point on the circumference of a circle of radius ρ which is travelling with an angular velocity of ω. for example. once more. 23 . such.

17) Their addition gives (2. 24 . a tendency for the phases of their cycles to drift without limit. α sin ω + β cos ω). ω (α.S. The ﬁrst of the equations within the matrix expression can be written as (2. There is no longer any bound on the amplitudes which the components might acquire in the long run. we may replace ω in equation (19) by whatever angle will be reached at the time in question. the transformation becomes (2. (0. 0) and (0.19) y z = cos ω sin ω − sin ω cos ω α . β To ﬁnd the values of the coordinates at a time which is an integral number of periods ahead. likewise. in a limited period. we may transform the vector [y . 0) −→ (α cos ω. we may add a random disturbance term to each of its components. the trajectories of y and z are liable. β) separately and add them. equation (19) speciﬁes the horizontal and vertical components of a circular motion which amount to a pair of synchronous harmonic motions.G. α sin ω). to resemble those of the simple harmonic motions. Nevertheless. in the absence of uncommonly large disturbances. The second equation may be lagged by one period and rearranged to give (2. POLLOCK : TIME SERIES AND FORECASTING To ﬁnd the coordinates of the point after it has rotated through an angle of ω in one period of time. To introduce the appropriate irregularities into the motion. β) −→ (−β sin ω. zt−1 ζt Now the character of the motion is radically altered. The discrete-time equation of the resulting motion may be expressed as follows: (2. ω ω In matrix terms. In eﬀect.18) (α.22) zt−1 − czt−2 = syt−2 + ζt−1 . and there is. β) −→ (y. we may rotate the component vectors (α. Alternatively.D. z) = (α cos ω − β sin ω.21) yt = cyt−1 − szt−1 + υt . β cos ω). The rotation of the components is depicted as follows: (2. It is easy to decouple the equations of y and z.20) yt zt = cos ω sin ω − sin ω cos ω yt−1 υ + t . z ] by premultiplying it the appropriate number of times by the matrix of the rotation.

By taking the first difference of equation (21) and by using equation (22) to eliminate the values of z, we get

(2.23)    y_t − c y_{t−1} = c y_{t−1} − c² y_{t−2} − s z_{t−1} + c s z_{t−2} + υ_t − c υ_{t−1}
                          = c y_{t−1} − c² y_{t−2} − s² y_{t−2} − s ζ_{t−1} + υ_t − c υ_{t−1}.

If we use the result that y_{t−2} cos²ω + y_{t−2} sin²ω = y_{t−2}, and if we collect the disturbances to form a new variable ε_t = υ_t − s ζ_{t−1} − c υ_{t−1}, then we can rearrange the second equality to give

(2.24)    y_t = 2 cos ω y_{t−1} − y_{t−2} + ε_t.

Here it is not true in general that the sequence of disturbances {ε_t} will be white noise. However, if we specify that, within equation (20),

(2.25)    [υ_t; ζ_t] = [−sin ω; cos ω] η_t,

where {η_t} is a white-noise sequence, then the lagged terms within ε_t will cancel, leaving a sequence whose elements are mutually uncorrelated. A sequence generated by equation (24) when {ε_t} is a white-noise sequence is depicted in Figure 2.

Figure 2. A quasi-cyclical sequence generated by the equation y_t = 2 cos ω y_{t−1} − y_{t−2} + ε_t when ω = 20°.
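A sequence of the kind shown in Figure 2 is easily generated by iterating equation (24) directly. The following is a minimal sketch in Python (numpy assumed; the names and the seed are illustrative and are not part of the original text):

    import numpy as np

    # Simulate y(t) = 2*cos(omega)*y(t-1) - y(t-2) + e(t), as in equation (24).
    rng = np.random.default_rng(0)
    omega = np.radians(20.0)          # angular velocity of 20 degrees per period
    T = 100
    y = np.zeros(T)
    e = rng.standard_normal(T)        # {e_t} is a white-noise sequence
    for t in range(2, T):
        y[t] = 2.0 * np.cos(omega) * y[t - 1] - y[t - 2] + e[t]
    # y traces a quasi-cyclical path of roughly 360/20 = 18 periods per cycle,
    # but with an amplitude and phase that drift under the influence of the shocks.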

It is interesting to recognise that equation (24) becomes the equation of a second-order random walk in the case where ω = 0. The second-order random walk gives rise to trends which can remain virtually linear over considerable periods.

Whereas there is little difficulty in understanding that an accumulation of purely random disturbances can give rise to a linear trend, there is often surprise at the fact that such disturbances can also generate cycles which are more or less regular. An understanding of this phenomenon can be reached by considering a physical analogy. One analogy, which is very apposite, was provided by Yule, whose article of 1927 introduced the concept of a second-order autoregressive process of which equation (24) is a limiting case. Yule's purpose was to explain, in terms of random causes, a cycle of roughly 11 years which characterises the Wolfer sunspot index.

Yule invited his readers to imagine a pendulum attached to a recording device. Any deviations from perfectly harmonic motion which might be recorded must be the result of superimposed errors of observation, which could be all but eliminated if a long sequence of observations were subjected to a regression analysis. The recording apparatus is left to itself, and unfortunately boys get into the room and start pelting the pendulum with peas, sometimes from one side and sometimes from the other. The motion is now affected not by superposed fluctuations but by true disturbances, and the effect on the graph will be of an entirely different kind. The graph will remain surprisingly smooth, but amplitude and phase will vary continuously. The phenomenon described by Yule is due to the inertia of the pendulum. In the short term, the impacts of the peas impart very little energy to the system compared with the sum of its kinetic and potential energies at any point in time; but, on taking a longer view, we can see that the system is driven by the impacts alone.

The Fourier Decomposition of a Time Series

The Fourier decomposition of a series is a matter of explaining the series entirely as a composition of sinusoidal functions. In spite of the notion that a regular trigonometrical function is an inappropriate means for modelling an economic cycle other than a seasonal fluctuation, there are good reasons to persist with the business of explaining a data sequence in terms of such functions. Thus it is possible to represent the generic element of the sample as

(2.26)    y_t = Σ_{j=0}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)}.

Assuming that T = 2n is even, this sum comprises T functions whose frequencies

(2.27)    ω_j = 2πj/T,  j = 0, 1, . . . , n = T/2,

are at equally spaced points in the interval [0, π]. As we might infer from our analysis of a seasonal fluctuation, there are as many nonzero elements in the sum under (26) as there are data points, for the reason that two of the functions within the sum—namely sin(ω_0 t) = sin(0) and sin(ω_n t) = sin(πt)—are identically zero. The same conclusion arises in the slightly more complicated case where T is odd.

The angular velocity ω_j = 2πj/T relates to a pair of trigonometrical components which accomplish j cycles in the T periods spanned by the data. The highest velocity, ω_n = π, corresponds to the so-called Nyquist frequency. If a component with a frequency in excess of π were included in the sum in (26), then its effect would be indistinguishable from that of a component with a frequency in the range [0, π]. To demonstrate this, consider the case of a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω∗ = 2π − ω. Then, for integer values of t,

(2.28)    cos(ωt) = cos((2π − ω∗)t) = cos(2πt)cos(ω∗t) + sin(2πt)sin(ω∗t) = cos(ω∗t),

which indicates that ω and ω∗ are observationally indistinguishable. Here, ω∗ ∈ [0, π] is described as the alias of ω > π.

For an illustration of the problem of aliasing, let us imagine that a person observes the sea level at 6am and 6pm each day. He should notice a very gradual recession and advance of the water level, the frequency of the cycle being f = 1/28, which amounts to one tide in 14 days. In fact, the true frequency is f = 1 − 1/28, which gives 27 tides in 14 days. Observing the sea level every six hours should enable him to infer the correct frequency.

Calculation of the Fourier Coefficients

For heuristic purposes, we can imagine calculating the Fourier coefficients using an ordinary regression procedure to fit equation (26) to the data. In this case, there would be no regression residuals, for the reason that we are 'estimating' a total of T coefficients from T data points; so we are actually solving a set of T linear equations in T unknowns. It follows that the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation.
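The aliasing identity of (28) admits a quick numerical check. The following sketch (Python with numpy assumed; not part of the original text) samples a cosine of a frequency above the Nyquist value at integer times and compares it with its alias:

    import numpy as np

    t = np.arange(16)                    # integer sampling times
    omega = 1.75 * np.pi                 # a frequency in excess of pi
    alias = 2.0 * np.pi - omega          # its alias, 0.25*pi, inside [0, pi]
    print(np.allclose(np.cos(omega * t), np.cos(alias * t)))   # True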

A reason for not using a multiple regression procedure is that, in this case, the vectors of 'explanatory' variables are mutually orthogonal; therefore T applications of a univariate regression procedure would be appropriate to our purpose.

Let c_j = [c_{0,j}, . . . , c_{T−1,j}]′ and s_j = [s_{0,j}, . . . , s_{T−1,j}]′ represent vectors of T values of the generic functions cos(ω_j t) and sin(ω_j t) respectively. Then there are the following orthogonality conditions:

(2.29)    c_i′c_j = 0 if i ≠ j,
          s_i′s_j = 0 if i ≠ j,
          c_i′s_j = 0 for all i, j.

In addition, there are the following sums of squares:

(2.30)    c_0′c_0 = c_n′c_n = T,  s_0′s_0 = s_n′s_n = 0,
          c_j′c_j = s_j′s_j = T/2.

The 'regression' formulae for the Fourier coefficients are therefore

(2.31)    α_0 = (i′i)^{−1} i′y = (1/T) Σ_t y_t = ȳ,

(2.32)    α_j = (c_j′c_j)^{−1} c_j′y = (2/T) Σ_t y_t cos(ω_j t),

(2.33)    β_j = (s_j′s_j)^{−1} s_j′y = (2/T) Σ_t y_t sin(ω_j t).

By pursuing the analogy of multiple regression, we can understand that there is a complete decomposition of the sum of squares of the elements of y which is given by

(2.34)    y′y = α_0² i′i + Σ_j α_j² c_j′c_j + Σ_j β_j² s_j′s_j.

Now consider writing α_0² i′i = ȳ² i′i = ȳ′ȳ, where ȳ = [ȳ, . . . , ȳ]′ is the vector whose repeated element is the sample mean ȳ. It follows that y′y − α_0² i′i = y′y − ȳ′ȳ = (y − ȳ)′(y − ȳ). Therefore we can rewrite the equation as

(2.35)    (y − ȳ)′(y − ȳ) = (T/2) Σ_j (α_j² + β_j²) = (T/2) Σ_j ρ_j²;
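The 'regression' formulae (31)-(33) translate directly into code, since the orthogonality conditions reduce each coefficient to a simple sum. A minimal sketch in Python (numpy assumed; the function name is illustrative):

    import numpy as np

    def fourier_coefficients(y):
        """Compute alpha_j, beta_j of equations (31)-(33) for a numpy sample y."""
        T = len(y)
        t = np.arange(T)
        n = T // 2
        alpha = np.zeros(n + 1)
        beta = np.zeros(n + 1)
        alpha[0] = y.mean()                          # equation (31)
        for j in range(1, n + 1):
            w = 2.0 * np.pi * j / T
            c, s = np.cos(w * t), np.sin(w * t)
            alpha[j] = (c @ y) / (c @ c)             # equation (32)
            if s @ s > 1e-12:                        # s_0 and s_n are zero vectors
                beta[j] = (s @ y) / (s @ s)          # equation (33)
        return alpha, beta

Using the general regression form (c_j′c_j)^{−1}c_j′y, rather than the factor 2/T, means that the boundary cases j = 0 and j = n are handled correctly without special treatment.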

and it follows that we can express the variance of the sample as

(2.36)    (1/T) Σ_{t=0}^{T−1} (y_t − ȳ)² = (1/2) Σ_{j=1}^{n} (α_j² + β_j²)
                                        = (2/T²) Σ_j {(Σ_t y_t cos ω_j t)² + (Σ_t y_t sin ω_j t)²}.

The proportion of the variance which is attributable to the component at frequency ω_j is (α_j² + β_j²)/2 = ρ_j²/2, where ρ_j is the amplitude of the component. The number of the Fourier frequencies increases at the same rate as the sample size T. Therefore, if the variance of the sample remains finite, and if there are no regular harmonic components in the process generating the data, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases. If there is such a regular component within the process, then we can expect the proportion of the variance attributable to it to converge to a finite value as the sample size increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (36) by a factor of T. The graph of the function I(ω_j) = (T/2)(α_j² + β_j²) is known as the periodogram.

Figure 3. The periodogram of Wolfer's Sunspot Numbers 1749–1924.
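Building on the fourier_coefficients sketch given above (again a hedged illustration in Python, not part of the original text), the periodogram ordinates follow immediately:

    import numpy as np

    def periodogram(y):
        """I(w_j) = (T/2)*(alpha_j^2 + beta_j^2) at the Fourier frequencies."""
        T = len(y)
        alpha, beta = fourier_coefficients(y - y.mean())
        freqs = 2.0 * np.pi * np.arange(len(alpha)) / T
        return freqs, (T / 2.0) * (alpha ** 2 + beta ** 2)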

There are many impressive examples where the estimation of the periodogram has revealed the presence of regular harmonic components in a data series which might otherwise have passed undetected. One of the best-known examples concerns the analysis of the brightness or magnitude of the star T. Ursa Major. It was shown by Whittaker and Robinson in 1924 that this series could be described almost completely in terms of two trigonometrical functions with periods of 24 and 29 days.

The attempts to discover underlying components in economic time series have been less successful. One application of periodogram analysis which was a notorious failure was its use by William Beveridge in 1921 and 1923 to analyse a long series of European wheat prices. The periodogram had so many peaks that at least twenty possible hidden periodicities could be picked out, and this seemed to be many more than could be accounted for by plausible explanations within the realms of economic history. Such findings seem to diminish the importance of periodogram analysis in econometrics. However, the fundamental importance of the periodogram is established once it is recognised that it represents nothing less than the Fourier transform of the sequence of empirical autocovariances.

The Empirical Autocovariances

A natural way of representing the serial dependence of the elements of a data sequence is to estimate their autocovariances. The empirical autocovariance of lag τ is defined by the formula

(2.37)    c_τ = (1/T) Σ_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ).

The empirical autocorrelation of lag τ is defined by r_τ = c_τ/c_0, where c_0, which is formally the autocovariance of lag 0, is the variance of the sequence. The autocorrelation provides a measure of the relatedness of data points separated by τ periods which is independent of the units of measurement.

It is straightforward to establish the relationship between the periodogram and the sequence of autocovariances. The periodogram may be written as

(2.38)    I(ω_j) = (2/T) [{Σ_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ)}² + {Σ_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ)}²].

The identity Σ_t cos(ω_j t)(y_t − ȳ) = Σ_t cos(ω_j t)y_t follows from the fact that, by construction, Σ_t cos(ω_j t) = 0 for all j ≠ 0.
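Formula (37) is likewise a one-line computation. A minimal sketch in Python (numpy assumed; names illustrative):

    import numpy as np

    def autocovariances(y, maxlag):
        """c_tau = (1/T) * sum_{t=tau}^{T-1} (y_t - ybar)(y_{t-tau} - ybar)."""
        T = len(y)
        d = y - y.mean()
        return np.array([(d[tau:] * d[:T - tau]).sum() / T
                         for tau in range(maxlag + 1)])

    # The empirical autocorrelations then follow immediately:
    # c = autocovariances(y, 20); r = c / c[0]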

Expanding the expression in (38) gives

(2.39)    I(ω_j) = (2/T) Σ_t Σ_s cos(ω_j t)cos(ω_j s)(y_t − ȳ)(y_s − ȳ)
                 + (2/T) Σ_t Σ_s sin(ω_j t)sin(ω_j s)(y_t − ȳ)(y_s − ȳ);

and, by using the identity cos(A)cos(B) + sin(A)sin(B) = cos(A − B), we can rewrite this as

(2.40)    I(ω_j) = (2/T) Σ_t Σ_s cos(ω_j [t − s])(y_t − ȳ)(y_s − ȳ).

Next, on defining τ = t − s and writing c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can reduce the latter expression to

(2.41)    I(ω_j) = 2 Σ_{τ=1−T}^{T−1} cos(ω_j τ) c_τ,

which is a Fourier transform of the sequence of empirical autocovariances.

An Appendix on Harmonic Cycles

Lemma 1. Let ω_j = 2πj/T, where j ∈ {1, . . . , T/2} if T is even and j ∈ {1, . . . , (T − 1)/2} if T is odd. Then

Σ_{t=0}^{T−1} cos(ω_j t) = Σ_{t=0}^{T−1} sin(ω_j t) = 0.

Proof. By Euler's equations, we have

Σ_{t=0}^{T−1} cos(ω_j t) = (1/2) Σ_{t=0}^{T−1} exp(i2πjt/T) + (1/2) Σ_{t=0}^{T−1} exp(−i2πjt/T).

By using the formula 1 + λ + ··· + λ^{T−1} = (1 − λ^T)/(1 − λ), we find that

Σ_{t=0}^{T−1} exp(i2πjt/T) = {1 − exp(i2πj)}/{1 − exp(i2πj/T)}.

But exp(i2πj) = cos(2πj) + i sin(2πj) = 1, so the numerator in the expression above is zero; and hence Σ_t exp(i2πjt/T) = 0. By similar means, we can show

that Σ_t exp(−i2πjt/T) = 0; and, therefore, Σ_t cos(ω_j t) = 0. An analogous proof shows that Σ_t sin(ω_j t) = 0.

Lemma 2. Let ω_j = 2πj/T, where j ∈ {0, 1, . . . , T/2} if T is even and j ∈ {0, 1, . . . , (T − 1)/2} if T is odd. Then

(a)    Σ_{t=0}^{T−1} cos(ω_j t)cos(ω_k t) = 0 if j ≠ k, and = T/2 if j = k;
(b)    Σ_{t=0}^{T−1} sin(ω_j t)sin(ω_k t) = 0 if j ≠ k, and = T/2 if j = k;
(c)    Σ_{t=0}^{T−1} cos(ω_j t)sin(ω_k t) = 0 for all j, k.

Proof. From the formula cos A cos B = ½{cos(A + B) + cos(A − B)}, we have

Σ_{t=0}^{T−1} cos(ω_j t)cos(ω_k t) = ½ Σ_{t=0}^{T−1} {cos([ω_j + ω_k]t) + cos([ω_j − ω_k]t)}
                                   = ½ Σ_{t=0}^{T−1} {cos(2π[j + k]t/T) + cos(2π[j − k]t/T)}.

We find, in consequence of Lemma 1, that, if j ≠ k, then both terms on the RHS vanish, and thus we have the first part of (a). If j = k, then cos(2π[j − k]t/T) = cos 0 = 1 and so, whilst the first term vanishes, the second term yields the value of T under summation. This gives the second part of (a). The proofs of (b) and (c) follow along similar lines.

References

Beveridge, Sir W. H., (1921), "Weather and Harvest Cycles," Economic Journal, 31, 429–452.

Beveridge, Sir W. H., (1922), "Wheat Prices and Rainfall in Western Europe," Journal of the Royal Statistical Society, 85, 412–478.

Moore, H. L., (1914), Economic Cycles: Their Laws and Cause, Macmillan: New York.

Slutsky, E., (1937), "The Summation of Random Causes as the Source of Cyclical Processes," Econometrica, 5, 105–146.

Yule, G. U., (1927), "On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers," Philosophical Transactions of the Royal Society, 89, 1–64.

LECTURE 3

Models and Methods of Time-Series Analysis

A time-series model is one which postulates a relationship amongst a number of temporal sequences or time series. An example is provided by the simple regression model

(3.1)    y(t) = x(t)β + ε(t),

where y(t) = {y_t; t = 0, ±1, ±2, . . .} is a sequence, indexed by the time subscript t, which is a combination of an observable signal sequence x(t) = {x_t} and an unobservable white-noise sequence ε(t) = {ε_t} of independently and identically distributed random variables.

A more general model, which we shall call the general temporal regression model, is one which postulates a relationship comprising any number of consecutive elements of x(t), y(t) and ε(t). The model may be represented by the equation

(3.2)    Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} µ_i ε(t − i),

where it is usually taken for granted that α_0 = 1. This normalisation of the leading coefficient on the LHS identifies y(t) as the output sequence. Any of the sums in the equation can be infinite; but, if the model is to be viable, the sequences of coefficients {α_i}, {β_i} and {µ_i} can depend on only a limited number of parameters.

Although it is convenient to write the general model in the form of (2), it is also common to represent it by the equation

(3.3)    y(t) = Σ_{i=1}^{p} φ_i y(t − i) + Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} µ_i ε(t − i),

where φ_i = −α_i for i = 1, . . . , p. This places the lagged versions of the sequence y(t) on the RHS in the company of the input sequence x(t) and its lags.

Whereas engineers are liable to describe this as a feedback model, economists are more likely to describe it as a model with lagged dependent variables. The foregoing models are termed regression models by virtue of the inclusion of the observable explanatory sequence x(t). When x(t) is deleted, we obtain a simpler unconditional linear stochastic model:

(3.4)    Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{q} µ_i ε(t − i).

This is the autoregressive moving-average (ARMA) model. A time-series model can often assume a variety of forms. Consider a simple dynamic regression model of the form (3.5) y(t) = φy(t − 1) + x(t)β + ε(t),

where there is a single lagged dependent variable. By repeated substitution, we obtain

(3.6)    y(t) = φy(t − 1) + βx(t) + ε(t)
              = φ²y(t − 2) + β{x(t) + φx(t − 1)} + ε(t) + φε(t − 1)
              ⋮
              = φⁿy(t − n) + β{x(t) + φx(t − 1) + ··· + φ^{n−1}x(t − n + 1)}
                + ε(t) + φε(t − 1) + ··· + φ^{n−1}ε(t − n + 1).

If |φ| < 1, then lim(n → ∞)φⁿ = 0; and it follows that, if x(t) and ε(t) are bounded sequences, then, as the number of repeated substitutions increases indefinitely, the equation will tend to the limiting form of

(3.7)    y(t) = β Σ_{i=0}^{∞} φ^i x(t − i) + Σ_{i=0}^{∞} φ^i ε(t − i).

It is notable that, by this process of repeated substitution, the feedback structure has been eliminated from the model. As a result, it becomes easier to assess the impact upon the output sequence of changes in the values of the input sequence. The direct mapping from the input sequence to the output sequence is described by engineers as a transfer function or as a filter. For models more complicated than the one above, the method of repeated substitution, if pursued directly, becomes intractable. Thus we are motivated to use more powerful algebraic methods to effect the transformation of the equation. This leads us to consider the use of the so-called lag operator. A proper understanding of the lag operator depends upon a knowledge of the algebra of polynomials and of rational functions.

The Algebra of the Lag Operator

A sequence x(t) = {x_t; t = 0, ±1, ±2, . . .} is any function mapping from the set of integers Z = {0, ±1, ±2, . . .} to the real line. If the set of integers represents a set of dates separated by unit intervals, then x(t) is described as a temporal sequence or a time series. The set of all time series represents a vector space, and various linear transformations or operators can be defined over the space. The simplest of these is the lag operator L, which is defined by

(3.8)    Lx(t) = x(t − 1).

Now, L{Lx(t)} = Lx(t − 1) = x(t − 2); so it makes sense to define L² by L²x(t) = x(t − 2). More generally, L^k x(t) = x(t − k) and, likewise, L^{−k} x(t) = x(t + k). Other operators are the difference operator ∇ = I − L, which has the effect that

(3.9)    ∇x(t) = x(t) − x(t − 1),

the forward-difference operator ∆ = L^{−1} − I, and the summation operator S = (I − L)^{−1} = {I + L + L² + ···}, which has the effect that

(3.10)    Sx(t) = Σ_{i=0}^{∞} x(t − i).

In general, we can define polynomials of the lag operator of the form p(L) = p_0 + p_1 L + ··· + p_n L^n = Σ p_i L^i, having the effect that

(3.11)    p(L)x(t) = p_0 x(t) + p_1 x(t − 1) + ··· + p_n x(t − n) = Σ_{i=0}^{n} p_i x(t − i).

In these terms, the equation under (2) of the general temporal model becomes (3.12) α(L)y(t) = β(L)x(t) + µ(L)ε(t).

The advantage which comes from deﬁning polynomials in the lag operator stems from the fact that they are isomorphic to the set of ordinary algebraic polynomials. Thus we can rely upon what we know about ordinary polynomials to treat problems concerning lag-operator polynomials.
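The action of a lag-operator polynomial on a data sequence is simply a one-sided weighted sum of current and lagged values. A minimal sketch in Python (numpy assumed; the truncation at the start of the sample, where lagged values are unavailable, is a matter of convenience rather than of theory):

    import numpy as np

    def apply_lag_polynomial(p, x):
        """Return p(L)x(t) = sum_i p[i]*x(t-i), for the dates where all lags exist."""
        n = len(p) - 1
        return sum(p[i] * x[n - i : len(x) - i] for i in range(len(p)))

    x = np.arange(10.0)
    print(apply_lag_polynomial([1.0, -1.0], x))   # (I - L)x(t): first differences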


Algebraic Polynomials

Consider the equation φ_0 + φ_1 z + φ_2 z² = 0. Once the equation has been divided by φ_2, it can be factorised as (z − λ_1)(z − λ_2) = 0, where λ_1, λ_2 are the roots, or zeros, of the equation, which are given by the formula

(3.13)    λ = {−φ_1 ± √(φ_1² − 4φ_2 φ_0)}/(2φ_2).

If φ_1² ≥ 4φ_2 φ_0, then the roots λ_1, λ_2 are real. If φ_1² = 4φ_2 φ_0, then λ_1 = λ_2. If φ_1² < 4φ_2 φ_0, then the roots are the conjugate complex numbers λ = α + iβ, λ∗ = α − iβ, where i = √(−1). There are three alternative ways of representing the conjugate complex numbers λ and λ∗:

(3.14)    λ = α + iβ = ρ(cos θ + i sin θ) = ρe^{iθ},
          λ∗ = α − iβ = ρ(cos θ − i sin θ) = ρe^{−iθ},

where

(3.15)    ρ = √(α² + β²) and θ = tan^{−1}(β/α).

These are called, respectively, the Cartesian form, the trigonometrical form and the exponential form. The Cartesian and trigonometrical representations are understood by considering the Argand diagram:


Figure 1. The Argand Diagram showing a complex number λ = α + iβ and its conjugate λ∗ = α − iβ .


The exponential form is understood by considering the following series expansions of cos θ and i sin θ about the point θ = 0:

(3.16)    cos θ = 1 − θ²/2! + θ⁴/4! − θ⁶/6! + ···,
          i sin θ = iθ − iθ³/3! + iθ⁵/5! − iθ⁷/7! + ···.

Adding these gives

(3.17)    cos θ + i sin θ = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + ··· = e^{iθ}.

Likewise, by subtraction, we get

(3.18)    cos θ − i sin θ = 1 − iθ − θ²/2! + iθ³/3! + θ⁴/4! − ··· = e^{−iθ}.

These are Euler's equations. It follows from adding (17) and (18) that

(3.19)    cos θ = (e^{iθ} + e^{−iθ})/2.

Subtracting (18) from (17) gives

(3.20)    sin θ = −(i/2)(e^{iθ} − e^{−iθ}) = (e^{iθ} − e^{−iθ})/(2i).

Now consider the general equation of the nth order:

(3.21)    φ_0 + φ_1 z + φ_2 z² + ··· + φ_n z^n = 0.

On dividing by φ_n, we can factorise this as

(3.22)    (z − λ_1)(z − λ_2) ··· (z − λ_n) = 0,

where some of the roots may be real and others may be complex. The complex roots come in conjugate pairs, so that, if λ = α + iβ is a complex root, then there is a corresponding root λ∗ = α − iβ such that the product (z − λ)(z − λ∗) = z² − 2αz + (α² + β²) is real and quadratic. When we multiply the n factors together, we obtain the expansion

(3.23)    0 = z^n − Σ_i λ_i z^{n−1} + Σ_i Σ_{j>i} λ_i λ_j z^{n−2} − ··· + (−1)^n λ_1 λ_2 ··· λ_n.

This can be compared with the expression (φ_0/φ_n) + (φ_1/φ_n)z + ··· + z^n = 0, in which rising powers of z are associated with rising indices on the coefficients. By equating coefficients of the two expressions, we find that (φ_0/φ_n) = (−1)^n Π λ_i or, equivalently,

(3.24)    φ_n = φ_0 Π_{i=1}^{n} (−λ_i)^{−1}.

Thus we can express the polynomial in any of the following forms:

(3.25)    Σ φ_i z^i = φ_n Π (z − λ_i) = φ_0 Π (−λ_i)^{−1}(z − λ_i) = φ_0 Π (1 − z/λ_i).

We should also note that, if λ is a root of the primary equation Σ φ_i z^i = 0, then µ = 1/λ is a root of the equation Σ φ_i z^{n−i} = 0, which has declining powers of z instead. This follows since Σ φ_i λ^i = Σ φ_i µ^{−i} = 0 implies that µ^n Σ φ_i µ^{−i} = Σ φ_i µ^{n−i} = 0. Confusion can arise from not knowing which of the two equations one is dealing with.

Rational Functions of Polynomials

If δ(z) and γ(z) are polynomial functions of z of degrees d and g respectively with d < g, then the ratio δ(z)/γ(z) is described as a proper rational function. We shall often encounter expressions of the form

(3.26)    y(t) = {δ(L)/γ(L)} x(t).

For this to have a meaningful interpretation in the context of a time-series model, we normally require that y(t) should be a bounded sequence whenever x(t) is bounded. The necessary and sufficient condition for the boundedness of y(t), in that case, is that the series expansion of δ(z)/γ(z) should be convergent whenever |z| ≤ 1. We can determine whether or not the sequence will converge by expressing the ratio δ(z)/γ(z) as a sum of partial fractions. The basic result is as follows:

(3.27)    If δ(z)/γ(z) = δ(z)/{γ_1(z)γ_2(z)} is a proper rational function, and if γ_1(z) and γ_2(z) have no common factor, then the function can be uniquely expressed as

          δ(z)/γ(z) = δ_1(z)/γ_1(z) + δ_2(z)/γ_2(z),

          where δ_1(z)/γ_1(z) and δ_2(z)/γ_2(z) are proper rational functions.

Imagine that γ(z) = Π (1 − z/λ_i). Then repeated application of this basic result enables us to write

(3.28)    δ(z)/γ(z) = κ_1/(1 − z/λ_1) + κ_2/(1 − z/λ_2) + ··· + κ_g/(1 − z/λ_g).

By adding the terms on the RHS, we find an expression with a numerator of degree g − 1. By equating the terms of this numerator with the terms of δ(z), we can find the values κ_1, κ_2, . . . , κ_g.

The convergence of the expansion of δ(z)/γ(z) is a straightforward matter; for the series converges if and only if the expansion of each of the partial fractions converges. For the expansion

(3.29)    κ/(1 − z/λ) = κ{1 + z/λ + (z/λ)² + ···}

to converge when |z| ≤ 1, it is necessary and sufficient that |λ| > 1.

Example. Consider the function

(3.30)    3z/(1 + z − 2z²) = 3z/{(1 − z)(1 + 2z)}
                          = κ_1/(1 − z) + κ_2/(1 + 2z)
                          = {κ_1(1 + 2z) + κ_2(1 − z)}/{(1 − z)(1 + 2z)}.

Equating the terms of the numerator gives

(3.31)    3z = (2κ_1 − κ_2)z + (κ_1 + κ_2),

so κ_2 = −κ_1, which gives 3 = 2κ_1 − κ_2 = 3κ_1; and thus we have κ_1 = 1, κ_2 = −1.

Linear Difference Equations

An nth-order linear difference equation is a relationship amongst n + 1 consecutive elements of a sequence x(t) of the form

(3.32)    α_0 x(t) + α_1 x(t − 1) + ··· + α_n x(t − n) = u(t),

where u(t) is some specified sequence which is described as the forcing function. The equation can be written, in a summary notation, as

(3.33)    α(L)x(t) = u(t),
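The example admits a quick numerical check: the power-series coefficients of 3z/{(1 − z)(1 + 2z)} should match those of 1/(1 − z) − 1/(1 + 2z). A sketch in Python (numpy assumed):

    import numpy as np

    # Series coefficients of kappa1/(1-z) + kappa2/(1+2z) with kappa1 = 1, kappa2 = -1:
    k = np.arange(8)
    series = 1.0 ** k - (-2.0) ** k

    # Coefficients of 3z/(1 + z - 2z^2) from the recursion implied by
    # (1 + z - 2z^2)*w(z) = 3z, i.e. w_k = -w_{k-1} + 2*w_{k-2} for k >= 2.
    w = np.zeros(8)
    w[1] = 3.0
    for i in range(2, 8):
        w[i] = -w[i - 1] + 2.0 * w[i - 2]
    print(np.allclose(series, w))   # True

Note that the expansion diverges, since the root z = 1/2 of 1 + 2z... rather, of (1 + 2z) = 0 at z = −1/2 lies inside the unit circle; the check concerns only the identity of the coefficients, not their convergence.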

where α(L) = α_0 + α_1 L + ··· + α_n L^n. If n consecutive values of x(t) are given, say x_1, x_2, . . . , x_n, then the relationship can be used to find the succeeding value x_{n+1}; and, so long as u(t) is fully specified, it is possible, in this way, to generate any number of the succeeding elements of the sequence. The values of the sequence prior to t = 1 can be generated likewise. However, instead of a recursive solution, we often seek an analytic expression for x(t).

The general analytic solution of the equation α(L)x(t) = u(t) is expressed as x(t; c) = y(t; c) + z(t), where y(t; c) is the general solution of the homogeneous equation α(L)y(t) = 0, and z(t) = α^{−1}(L)u(t) is called a particular solution of the inhomogeneous equation. The function x(t; c), expressing the analytic solution, will comprise a set of n constants in c = [c_1, c_2, . . . , c_n]′ which can be determined once we are given a set of n consecutive values of x(t) which are called initial conditions.

We may solve the difference equation in three steps. First, we find the general solution of the homogeneous equation. Next, we find the particular solution z(t), which embodies no unknown quantities. Finally, we use the n initial values of x to determine the constants c_1, c_2, . . . , c_n. We shall discuss in detail only the solution of the homogeneous equation.

Solution of the Homogeneous Difference Equation

If λ_j is a root of the equation α(z) = α_0 + α_1 z + ··· + α_n z^n = 0 such that α(λ_j) = 0, then y_j(t) = (1/λ_j)^t is a solution of the equation α(L)y(t) = 0. This can be seen by considering the expression

(3.34)    α(L)(1/λ_j)^t = {α_0 + α_1 L + ··· + α_n L^n}(1/λ_j)^t
                        = α_0 (1/λ_j)^t + α_1 (1/λ_j)^{t−1} + ··· + α_n (1/λ_j)^{t−n}
                        = {α_0 + α_1 λ_j + ··· + α_n λ_j^n}(1/λ_j)^t
                        = α(λ_j)(1/λ_j)^t = 0.

Alternatively, one may consider the factorisation α(L) = α_0 Π_i (1 − L/λ_i). Within this product is the term 1 − L/λ_j; and, since

(1 − L/λ_j)(1/λ_j)^t = (1/λ_j)^t − (1/λ_j)^t = 0,

it follows that α(L)(1/λ_j)^t = 0.

The general solution, in the case where α(z) = 0 has distinct real roots, is given by

(3.35)    y(t; c) = c_1 (1/λ_1)^t + c_2 (1/λ_2)^t + ··· + c_n (1/λ_n)^t,

where c_1, c_2, . . . , c_n are the constants which are determined by the initial conditions.

In the case where two roots coincide at a value of λ_j, the equation α(L)y(t) = 0 has the solutions y_1(t) = (1/λ_j)^t and y_2(t) = t(1/λ_j)^t. To show this, let us extract the term (1 − L/λ_j)² from the factorisation α(L) = α_0 Π_i (1 − L/λ_i). Then, according to the previous argument, we have (1 − L/λ_j)²(1/λ_j)^t = 0; but, also, we have

(3.36)    (1 − L/λ_j)² t(1/λ_j)^t = {1 − 2L/λ_j + L²/λ_j²} t(1/λ_j)^t
                                  = {t − 2(t − 1) + (t − 2)}(1/λ_j)^t = 0.

In general, if there are r repeated roots with a value of λ_j, then all of (1/λ_j)^t, t(1/λ_j)^t, t²(1/λ_j)^t, . . . , t^{r−1}(1/λ_j)^t are solutions to the equation α(L)y(t) = 0. With each solution is associated a coefficient which can be determined in view of the initial conditions.

A particularly important special case arises when there are r repeated roots of unit value. Then the functions 1, t, t², . . . , t^{r−1} are all solutions to the homogeneous equation; and, within the general solution of the difference equation, there will be found the term d_0 + d_1 t + d_2 t² + ··· + d_{r−1} t^{r−1}, which represents a polynomial in t of degree r − 1.

The 2nd-order Difference Equation with Complex Roots

Imagine that the 2nd-order equation α(L)y(t) = α_0 y(t) + α_1 y(t − 1) + α_2 y(t − 2) = 0 is such that α(z) = 0 has complex roots λ = 1/µ and λ∗ = 1/µ∗. If λ, λ∗ are conjugate complex numbers, then so too are µ, µ∗; and let us write

(3.37)    µ = γ + iδ = κ(cos ω + i sin ω) = κe^{iω},
          µ∗ = γ − iδ = κ(cos ω − i sin ω) = κe^{−iω}.

These will appear in a general solution of the difference equation of the form

(3.38)    y(t) = cµ^t + c∗(µ∗)^t.

This represents a real-valued sequence; and, since a real term must equal its own conjugate, it follows that c and c∗ must be conjugate numbers of the form

(3.39)    c = ρ(cos θ − i sin θ) = ρe^{−iθ},
          c∗ = ρ(cos θ + i sin θ) = ρe^{iθ}.

Thus the general solution becomes

(3.40)    cµ^t + c∗(µ∗)^t = ρe^{−iθ}(κe^{iω})^t + ρe^{iθ}(κe^{−iω})^t
                          = ρκ^t {e^{i(ωt−θ)} + e^{−i(ωt−θ)}}
                          = 2ρκ^t cos(ωt − θ).

To analyse the final expression, consider first the factor cos(ωt − θ). This is a displaced cosine wave. The value ω, which is a number of radians per unit period, is called the angular velocity or the angular frequency of the wave. The value f = ω/2π is its frequency in cycles per unit period. The duration of one cycle, also called the period, is r = 2π/ω. The term θ is called the phase displacement of the cosine wave, and it serves to shift the cosine function along the axis of t so that, in the absence of damping, the peak would occur at the value of t = θ/ω instead of at t = 0.

Figure 2. The solution of the homogeneous difference equation (1 − 1.69L + 0.81L²)y(t) = 0 for the initial conditions y_0 = 1 and y_1 = 3. The time lag of the phase displacement p_1 and the duration of the cycle p_2 are also indicated.
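The sequence of Figure 2 can be reproduced simply by iterating the homogeneous equation. A minimal sketch in Python (numpy assumed):

    import numpy as np

    # y(t) = 1.69*y(t-1) - 0.81*y(t-2), with y0 = 1 and y1 = 3.
    y = np.zeros(26)
    y[0], y[1] = 1.0, 3.0
    for t in range(2, 26):
        y[t] = 1.69 * y[t - 1] - 0.81 * y[t - 2]
    # The roots of z^2 - 1.69z + 0.81 = 0 are complex with modulus sqrt(0.81) = 0.9,
    # so y is a cosine wave damped by a factor of kappa = 0.9 per period.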

Next consider the term κ^t, wherein κ = √(γ² + δ²) is the modulus of the complex roots. When κ has a value of less than unity, it becomes a damping factor which serves to attenuate the cosine wave as t increases. The damping also serves to shift the peaks of the cosine function slightly to the left. Finally, the factor 2ρ affects the initial amplitude of the cosine wave, which is the value which it assumes when t = 0. Since ρ is just the modulus of the values c and c∗, this amplitude reflects the initial conditions. The phase angle θ is also a product of the initial conditions.

It is instructive to derive an expression for the second-order difference equation which is in terms of the parameters of the trigonometrical or exponential representations of a pair of complex roots. Consider

(3.41)    α(z) = α_0(1 − µz)(1 − µ∗z) = α_0{1 − (µ + µ∗)z + µµ∗z²}.

From (37) it follows that

(3.42)    µ + µ∗ = 2κ cos ω and µµ∗ = κ².

Therefore the polynomial operator which is entailed by the difference equation is

(3.43)    α_0 + α_1 L + α_2 L² = α_0(1 − 2κ cos ω L + κ²L²).

This representation indicates that a necessary condition for the roots to be complex, which is not a sufficient condition, is that α_2/α_0 > 0.

It is easy to ascertain by inspection whether or not the second-order difference equation is stable. The condition that the roots of α(z) = 0 must lie outside the unit circle, which is necessary and sufficient for stability, imposes certain restrictions on the coefficients of α(z) which can be checked easily. We can reveal these conditions most readily by considering the auxiliary polynomial ρ(z) = z²α(z^{−1}), whose roots, which are the inverses of those of α(z), must lie inside the unit circle. Let the roots of ρ(z), which might be real or complex, be denoted by µ_1, µ_2. Then we can write

(3.44)    ρ(z) = α_0 z² + α_1 z + α_2
               = α_0(z − µ_1)(z − µ_2)
               = α_0{z² − (µ_1 + µ_2)z + µ_1µ_2}.

This indicates that α_2/α_0 = µ_1µ_2. Therefore the conditions |µ_1|, |µ_2| < 1 imply that

(3.45)    −α_0 < α_2 < α_0,

where it is assumed that α_0 > 0; and it is usual to set α_0 = 1.

If the roots are complex conjugate numbers µ, µ∗ = γ ± iδ, and if α_0 > 0, then the condition under (45) will ensure that µµ∗ = α_2/α_0 < 1, which is the condition that they are within the unit circle. If the roots are real, then they will be found in the interval (−1, 1) if and only if

(3.46)    ρ(−1) = α_0 − α_1 + α_2 > 0 and ρ(1) = α_0 + α_1 + α_2 > 0.

If the roots are complex, then these conditions are bound to be satisfied; for, if α_0 > 0, then the function ρ(z) will have a minimum value over the real line which is greater than zero if the roots are complex and no greater than zero if they are real. From these arguments, it follows that the conditions under (45) and (46) in combination are necessary and sufficient to ensure that the roots of ρ(z) = 0 are within the unit circle and that the roots of α(z) = 0 are outside.

State-Space Models

An nth-order difference equation in a single variable can be transformed into a first-order system in n variables which are the elements of a so-called state vector. There is a wide variety of alternative forms which can be assumed by a first-order vector difference equation corresponding to the nth-order scalar equation. However, certain of these are described as canonical forms by virtue of special structures in the matrix. In demonstrating one of the more common canonical forms, let us consider again the nth-order difference equation of (32), in reference to which we may define the following variables:

(3.47)    ξ_1(t) = x(t),
          ξ_2(t) = ξ_1(t − 1) = x(t − 1),
          ⋮
          ξ_n(t) = ξ_{n−1}(t − 1) = x(t − n + 1).

On the basis of these definitions, a first-order vector equation may be constructed in the form of

(3.48)    [ξ_1(t); ξ_2(t); …; ξ_n(t)] = [−α_1 … −α_{n−1} −α_n; 1 … 0 0; ⋱ ; 0 … 1 0][ξ_1(t − 1); ξ_2(t − 1); …; ξ_n(t − 1)] + [ε(t); 0; …; 0].
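A small sketch in Python (numpy assumed; the helper name is illustrative) which builds the matrix of (48) and inspects its eigenvalues:

    import numpy as np

    def companion(alpha):
        """Companion matrix for x(t) + alpha1*x(t-1) + ... + alphan*x(t-n) = e(t)."""
        n = len(alpha)
        A = np.zeros((n, n))
        A[0, :] = -np.asarray(alpha)      # first row carries -alpha1, ..., -alphan
        A[1:, :-1] = np.eye(n - 1)        # subdiagonal of ones shifts the state down
        return A

    A = companion([-1.69, 0.81])          # the second-order example used above
    print(np.abs(np.linalg.eigvals(A)))   # both moduli are 0.9: a stable system

The eigenvalues of the companion matrix coincide with the roots of the auxiliary polynomial, so the stability check of the previous section can be performed directly on the matrix.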

The matrix in this structure is sometimes described as the companion form. Here it is manifest, in view of the definitions under (47), that the leading equation of the system, which is

(3.49)    ξ_1(t) = −α_1 ξ_1(t − 1) − ··· − α_n ξ_n(t − 1) + ε(t),

is precisely the equation under (32) with a white-noise forcing function.

Example. An example of a system which is not in a canonical form is provided by the following matrix equation:

(3.50)    [y(t); z(t)] = κ[cos ω, −sin ω; sin ω, cos ω][y(t − 1); z(t − 1)] + [υ(t); ζ(t)].

With the use of the lag operator, the equation can also be written as

(3.51)    [1 − κ cos ωL, κ sin ωL; −κ sin ωL, 1 − κ cos ωL][y(t); z(t)] = [υ(t); ζ(t)].

On premultiplying the equation by the inverse of the matrix on the LHS, we get

(3.52)    [y(t); z(t)] = {1/(1 − 2κ cos ωL + κ²L²)}[1 − κ cos ωL, −κ sin ωL; κ sin ωL, 1 − κ cos ωL][υ(t); ζ(t)].

A special case arises when

(3.53)    [υ(t); ζ(t)] = [−sin ω; cos ω]η(t),

where η(t) is a white-noise sequence. Then the equation becomes

(3.54)    [y(t); z(t)] = {1/(1 − 2κ cos ωL + κ²L²)}[−sin ω; cos ω − κL]η(t).

On defining ε(t) = −sin ω η(t), we may write the first of these equations as

(3.55)    (1 − 2κ cos ωL + κ²L²)y(t) = ε(t).

This is just a second-order difference equation with a white-noise forcing function; and, by virtue of the inclusion of the damping factor κ ∈ [0, 1), it represents a generalisation of the equation to be found under (2.24).

Transfer Functions

Consider again the simple dynamic model of equation (5):

(3.56)    y(t) = φy(t − 1) + x(t)β + ε(t).

With the use of the lag operator, this can be rewritten as

(3.57)    (1 − φL)y(t) = βx(t) + ε(t)

or, equivalently, as

(3.58)    y(t) = {β/(1 − φL)}x(t) + {1/(1 − φL)}ε(t).

The latter is the so-called rational transfer-function form of the equation. The operator L within the transfer functions or filters can be replaced by a complex number z. Then the transfer function which is associated with the signal x(t) becomes

(3.59)    β/(1 − φz) = β{1 + φz + φ²z² + ···},

where the RHS comes from a familiar power-series expansion. The sequence {β, βφ, βφ², . . .} of the coefficients of the expansion constitutes the impulse response of the transfer function. That is to say, if we imagine that, on the input side, the signal is a unit-impulse sequence of the form

(3.60)    x(t) = {. . . , 0, 1, 0, . . .},

which has zero values at all but one instant, then its mapping through the transfer function would result in an output sequence of

(3.61)    r(t) = {. . . , 0, β, βφ, βφ², . . .}.

Another important concept is the step response of the filter. We may imagine that the input sequence is zero-valued up to a point in time when it assumes a constant unit value:

(3.62)    x(t) = {. . . , 0, 1, 1, 1, . . .}.

The mapping of this sequence through the transfer function would result in an output sequence of

(3.63)    s(t) = {. . . , 0, β, β + βφ, β + βφ + βφ², . . .},

whose elements, from the point when the step occurs in x(t), are simply the partial sums of the impulse-response sequence. This sequence of partial sums is described as the step response. Given that |φ| < 1, the step response converges to a value

(3.64)    γ = β/(1 − φ),

which is described as the steady-state gain or the long-term multiplier of the transfer function.

These various concepts apply to models of any order. Consider the equation

(3.65)    α(L)y(t) = β(L)x(t) + ε(t),

where

(3.66)    α(L) = 1 + α_1 L + ··· + α_p L^p = 1 − φ_1 L − ··· − φ_p L^p,
          β(L) = β_0 + β_1 L + ··· + β_k L^k

are polynomials of the lag operator. The transfer-function form of the model is simply

(3.67)    y(t) = {β(L)/α(L)}x(t) + {1/α(L)}ε(t).

The rational function associated with x(t) has a series expansion

(3.68)    β(z)/α(z) = ω(z) = ω_0 + ω_1 z + ω_2 z² + ···;

and the sequence of the coefficients of this expansion constitutes the impulse-response function. The partial sums of the coefficients constitute the step-response function. The gain of the transfer function is defined by

(3.69)    γ = β(1)/α(1) = (β_0 + β_1 + ··· + β_k)/(1 + α_1 + ··· + α_p).

The method of finding the coefficients of the series expansion of the transfer function in the general case can be illustrated by the second-order case:

(3.70)    (β_0 + β_1 z)/(1 − φ_1 z − φ_2 z²) = ω_0 + ω_1 z + ω_2 z² + ···.

We rewrite this equation as

(3.71)    β_0 + β_1 z = (1 − φ_1 z − φ_2 z²)(ω_0 + ω_1 z + ω_2 z² + ···).

Then, by performing the multiplication on the RHS, and by equating the coefficients of the same powers of z on the two sides of the equation, we find that

(3.72)    β_0 = ω_0,                          ω_0 = β_0,
          β_1 = ω_1 − φ_1 ω_0,                 ω_1 = β_1 + φ_1 ω_0,
          0 = ω_2 − φ_1 ω_1 − φ_2 ω_0,          ω_2 = φ_1 ω_1 + φ_2 ω_0,
          ⋮                                   ⋮
          0 = ω_n − φ_1 ω_{n−1} − φ_2 ω_{n−2},   ω_n = φ_1 ω_{n−1} + φ_2 ω_{n−2}.

The necessary and sufficient condition for the convergence of the sequence {ω_i} is that the roots of the primary polynomial equation 1 − φ_1 z − φ_2 z² = 0 should lie outside the unit circle or, equivalently, that the roots of the auxiliary equation z² − φ_1 z − φ_2 = 0—which are the inverses of the former roots—should lie inside the unit circle. If the roots of these equations are real, then the sequence will converge monotonically to zero, whereas, if the roots are complex-valued, then the sequence will converge in the manner of a damped sinusoid.

It is clear that the equation

(3.73)    ω(n) = φ_1 ω(n − 1) + φ_2 ω(n − 2),

which serves to generate the elements of the impulse response, is nothing but a second-order homogeneous difference equation. In fact, Figure 2, which has been presented as the solution of a homogeneous difference equation, also represents the impulse response of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²). In the light of this result, it is apparent that the coefficients of the denominator polynomial 1 − φ_1 z − φ_2 z² serve to determine the period and the damping factor of a complex impulse response. The coefficients in the numerator polynomial β_0 + β_1 z serve to determine the initial amplitude of the response and its phase lag. It seems that all four coefficients must be present if a second-order transfer function is to have complete flexibility in modelling a dynamic response.
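The recursion under (72) is easily coded for a rational transfer function of any order. A minimal sketch in Python (numpy assumed; names illustrative):

    import numpy as np

    def impulse_response(beta, phi, n):
        """Coefficients w_0, ..., w_{n-1} of beta(z)/(1 - phi1*z - phi2*z^2 - ...)."""
        w = np.zeros(n)
        for k in range(n):
            w[k] = beta[k] if k < len(beta) else 0.0
            for i, p in enumerate(phi, start=1):
                if k - i >= 0:
                    w[k] += p * w[k - i]
        return w

    w = impulse_response([1.0, 2.0], [1.69, -0.81], 25)
    # w[0] = 1, w[1] = 2 + 1.69, and thereafter w_k = 1.69*w_{k-1} - 0.81*w_{k-2}:
    # a damped sinusoid, since the roots of z^2 - 1.69z + 0.81 = 0 are complex.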

Figure 3. The gain of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).

Figure 4. The phase diagram of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).

The Frequency Response

In many applications within forecasting and time-series analysis, it is of interest to consider the response of a transfer function to a signal which is a simple sinusoid. As we have indicated in a previous lecture, it is possible to represent a finite sequence as a sum of sine and cosine functions whose frequencies are integer multiples of a fundamental frequency. More generally, as we shall see later, it is possible to represent an arbitrary stationary stochastic process as a combination of an infinite number of sine and cosine functions whose frequencies range continuously in the interval [0, π]. It follows that the effect of a transfer function upon stationary signals can be characterised in terms of its effect upon the sinusoidal functions.

Consider, therefore, the consequences of mapping the signal x(t) = cos(ωt) through the transfer function γ(L) = γ_0 + γ_1 L + ··· + γ_g L^g. The output is

(3.74)    y(t) = γ(L) cos(ωt) = Σ_{j=0}^{g} γ_j cos(ω[t − j]).

The trigonometrical identity cos(A − B) = cos A cos B + sin A sin B enables us to write this as

(3.75)    y(t) = {Σ_j γ_j cos(ωj)} cos(ωt) + {Σ_j γ_j sin(ωj)} sin(ωt)
              = α cos(ωt) + β sin(ωt)
              = ρ cos(ωt − θ).

Here we have defined

(3.76)    α = Σ_{j=0}^{g} γ_j cos(ωj),  β = Σ_{j=0}^{g} γ_j sin(ωj),
          ρ = √(α² + β²)  and  θ = tan^{−1}(β/α).

It can be seen from (75) that the effect of the filter upon the signal is twofold. First there is a gain effect, whereby the amplitude of the sinusoid has been increased or diminished by a factor of ρ. Also there is a phase effect, whereby the peak of the sinusoid is displaced by a time delay of θ/ω periods. Figures 3 and 4 represent the two effects of a simple rational transfer function on the set of sinusoids whose frequencies range from 0 to π.
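Since γ(e^{−iω}) = Σ_j γ_j e^{−iωj} = α − iβ, the quantities ρ and θ of (76) can be obtained directly from the complex exponential. A minimal sketch in Python (numpy assumed; names illustrative):

    import numpy as np

    def frequency_response(gamma, omegas):
        """Gain rho and phase theta of the filter gamma(L) at each frequency."""
        g = np.asarray(gamma)
        j = np.arange(len(g))
        # gamma(e^{-i*omega}) = alpha - i*beta, so rho = |.| and theta = -arg(.)
        resp = np.array([np.sum(g * np.exp(-1j * w * j)) for w in omegas])
        return np.abs(resp), -np.angle(resp)

    omegas = np.linspace(0.0, np.pi, 5)
    rho, theta = frequency_response([0.25, 0.5, 0.25], omegas)   # a smoothing filter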

LECTURE 4

Time-Series Analysis in the Frequency Domain

A sequence is a function mapping from a set of integers, described as the index set, onto the real line or into a subset thereof. A time series is a sequence whose index corresponds to consecutive dates separated by a unit time interval. In the statistical analysis of time series, the elements of the sequence are regarded as a set of random variables. Usually, no notational distinction is made between these random variables and their realised values. It is important nevertheless to bear the distinction in mind.

In order to analyse a statistical time series, it must be assumed that the structure of the statistical or stochastic process which generates the observations is essentially invariant through time. The conventional assumptions are summarised in the condition of stationarity. In its strong form, the condition requires that any two segments of equal length which are extracted from the time series must have identical multivariate probability density functions. The condition of weak stationarity requires only that the elements of the time series should have a common finite expected value and that the autocovariance of two elements should depend only on their temporal separation.

A fundamental process, from which many other stationary processes may be derived, is the so-called white-noise process, which consists of a sequence of uncorrelated random variables, each with a zero mean and the same finite variance. By passing white noise through a linear filter, a sequence whose elements are serially correlated can be generated. In fact, virtually every stationary stochastic process may be depicted as the product of a filtering operation applied to white noise. This result follows from the Cramér–Wold Theorem, which will be presented after we have introduced the concepts underlying the spectral representation of a time series.

The spectral representation is rooted in the basic notion of Fourier analysis, which is that well-behaved functions can be approximated over a finite interval, to any degree of accuracy, by a weighted combination of sine and cosine functions whose harmonically rising frequencies are integral multiples of a fundamental frequency. Such linear combinations are described as Fourier sums or Fourier series. Of course, the notion applies to sequences as well, for any number of well-behaved functions may be interpolated through the coordinates of a finite sequence.

We shall approach the Fourier analysis of stochastic processes via the exact Fourier representation of a finite sequence. This is extended to provide a representation of an infinite sequence in terms of an infinity of trigonometrical functions whose frequencies range continuously in the interval [0, π]. The trigonometrical functions and their weighting functions are gathered under a Fourier–Stieltjes integral. It is remarkable that, whereas a Fourier sum serves only to define a strictly periodic function, a Fourier integral suffices to represent an aperiodic time series generated by a stationary stochastic process.

The Fourier integral is also used to represent the underlying stochastic process. This is achieved by describing the stochastic processes which generate the weighting functions. There are two such weighting processes, associated respectively with the sine and cosine functions; and their common variance is the so-called spectral density function, which is a function f(ω), ω ∈ [0, π].

The relationship between the spectral density function and the sequence of autocovariances, which is summarised in the Wiener–Khintchine theorem, provides a link between the time-domain and the frequency-domain analyses. The sequence of autocovariances may be obtained from the Fourier transform of the spectral density function; and the spectral density function is, conversely, a Fourier transform of the autocovariances.

Stationarity

Consider two vectors of n + 1 consecutive elements from the process y(t):

(4.1)    [y_t, y_{t+1}, . . . , y_{t+n}] and [y_s, y_{s+1}, . . . , y_{s+n}].

Then y(t) = {y_t; t = 0, ±1, ±2, . . .} is strictly stationary if the joint probability density functions of the two vectors are the same for any values of t and s, regardless of the size of n. On the assumption that the first and second-order moments of the distribution are finite, the condition of stationarity implies that all the elements of y(t) have the same expected value and that the covariance between any pair of elements of the sequences is a function only of their temporal separation. Thus,

(4.2)    E(y_t) = µ and C(y_t, y_s) = γ_{|t−s|}.

On their own, the conditions of (2) constitute the conditions of weak stationarity. A normal process is completely characterised by its mean and its autocovariances. Therefore, a normal process y(t) which satisfies the conditions for weak stationarity is also stationary in the strict sense.

The Autocovariance Function

The covariance between two elements y_t and y_s of a process y(t) which are separated by τ = |t − s| intervals of time is known as the autocovariance at lag τ and is denoted by γ_τ. The autocorrelation at lag τ, denoted by ρ_τ, is defined by

(4.3)    ρ_τ = γ_τ/γ_0,

where γ_0 is the variance of the process y(t). The stationarity conditions imply that the autocovariances of y(t) satisfy the equality

(4.4)    γ_τ = γ_{−τ}

for all values of τ. The autocovariance matrix of a stationary process corresponding to the n elements y_0, y_1, . . . , y_{n−1} is given by

(4.5)    Γ = [γ_0, γ_1, γ_2, …, γ_{n−1};
              γ_1, γ_0, γ_1, …, γ_{n−2};
              γ_2, γ_1, γ_0, …, γ_{n−3};
              ⋮
              γ_{n−1}, γ_{n−2}, γ_{n−3}, …, γ_0].

The sequences {γ_τ} and {ρ_τ} are described as the autocovariance and autocorrelation functions, respectively.

The Filtering of White Noise

A white-noise process is a sequence ε(t) of uncorrelated random variables with mean zero and common variance σ_ε². Thus

(4.6)    E(ε_t) = 0 for all t;  E(ε_t ε_s) = σ_ε² if t = s, and = 0 if t ≠ s.

By a process of linear filtering, a variety of time series may be constructed whose elements display complex interdependencies. A finite linear filter, also called a moving-average operator, is a polynomial in the lag operator of the form µ(L) = µ_0 + µ_1 L + ··· + µ_q L^q. The effect of this filter on ε(t) is described by the equation

(4.7)    y(t) = µ(L)ε(t) = µ_0 ε(t) + µ_1 ε(t − 1) + ··· + µ_q ε(t − q) = Σ_{i=0}^{q} µ_i ε(t − i).

The operator µ(L) may also be described as the transfer function which maps the input sequence ε(t) into the output sequence y(t). An operator µ(L) = {µ_0 + µ_1 L + µ_2 L² + ···} with an indefinite number of terms in rising powers of L may also be considered. However, for this to be practical, the coefficients {µ_0, µ_1, µ_2, . . .} must be functions of a limited number of fundamental parameters. In addition, it is required that

(4.8)    Σ_i |µ_i| < ∞.

Given the value of σ_ε² = V{ε(t)}, the autocovariances of the filtered sequence y(t) = µ(L)ε(t) may be determined by evaluating the expression

(4.9)    γ_τ = E(y_t y_{t−τ}) = E(Σ_i µ_i ε_{t−i} Σ_j µ_j ε_{t−τ−j}) = Σ_i Σ_j µ_i µ_j E(ε_{t−i} ε_{t−τ−j}).

From equation (6), it follows that

(4.10)    γ_τ = σ_ε² Σ_j µ_j µ_{j+τ};

and so the variance of the filtered sequence is

(4.11)    γ_0 = σ_ε² Σ_j µ_j².

The condition under equation (8) guarantees that these quantities are finite, as is required by the condition of stationarity.

The z-transform

In the subsequent analysis, it will prove helpful to present the results in the notation of the z-transform. The z-transform of the infinite sequence y(t) = {y_t; t = 0, ±1, ±2, . . .} is defined by

(4.12)    y(z) = Σ_{t=−∞}^{∞} y_t z^t.

Here z is a complex number which may be placed on the perimeter of the unit circle, provided that the series converges. Thus z = e^{−iω} with ω ∈ [0, 2π].
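Equation (10) gives the autocovariances of a filtered white-noise sequence directly from the filter weights. A minimal sketch in Python (numpy assumed; names illustrative):

    import numpy as np

    def ma_autocovariances(mu, sigma2, maxlag):
        """gamma_tau = sigma2 * sum_j mu_j * mu_{j+tau}, per equation (10)."""
        mu = np.asarray(mu)
        q = len(mu) - 1
        return np.array([
            sigma2 * np.sum(mu[: q + 1 - tau] * mu[tau:]) if tau <= q else 0.0
            for tau in range(maxlag + 1)
        ])

    print(ma_autocovariances([1.0, 0.5], sigma2=1.0, maxlag=3))   # [1.25, 0.5, 0, 0]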

If y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + ··· + µ_q ε(t − q) = µ(L)ε(t) is a moving-average process, then the z-transform of the sequence of moving-average coefficients is the polynomial µ(z) = µ_0 + µ_1 z + ··· + µ_q z^q, which has the same form as the operator µ(L). The z-transform of a sequence of autocovariances is called the autocovariance generating function. For the moving-average process, this is given by

(4.13)    γ(z) = σ_ε² µ(z)µ(z^{−1})
               = σ_ε² Σ_i µ_i z^i Σ_j µ_j z^{−j}
               = σ_ε² Σ_i Σ_j µ_i µ_j z^{i−j}
               = σ_ε² Σ_τ Σ_j µ_j µ_{j+τ} z^τ,  where τ = i − j,
               = Σ_{τ=−∞}^{∞} γ_τ z^τ.

The final equality is by virtue of equation (10).

The Fourier Representation of a Sequence

According to the basic result of Fourier analysis, it is always possible to approximate an arbitrary analytic function defined over a finite interval of the real line, to any desired degree of accuracy, by a weighted sum of sine and cosine functions of harmonically increasing frequencies. Similar results apply in the case of sequences, which may be regarded as functions mapping from the set of integers onto the real line. For a sample of T observations y_0, . . . , y_{T−1}, it is possible to devise an expression in the form

(4.14)    y_t = Σ_{j=0}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)},

wherein ω_j = 2πj/T is a multiple of the fundamental frequency ω_1 = 2π/T. Thus, the elements of a finite sequence can be expressed exactly in terms of sines and cosines. This expression is called the Fourier decomposition of y_t, and the set of coefficients {α_j, β_j; j = 0, 1, . . . , n} are called the Fourier coefficients. When T is even, we have n = T/2; and it follows that

(4.15)    sin(ω_0 t) = sin(0) = 0,    cos(ω_0 t) = cos(0) = 1,
          sin(ω_n t) = sin(πt) = 0,   cos(ω_n t) = cos(πt) = (−1)^t.

Therefore, equation (14) becomes

(4.16)    y_t = α_0 + Σ_{j=1}^{n−1} {α_j cos(ω_j t) + β_j sin(ω_j t)} + α_n(−1)^t.

When T is odd, we have n = (T − 1)/2, and then equation (14) becomes

(4.17)    y_t = α_0 + Σ_{j=1}^{n} {α_j cos(ω_j t) + β_j sin(ω_j t)}.

In both cases, there are T nonzero coefficients amongst the set {α_j, β_j; j = 0, 1, . . . , n}; and the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation. In equation (16), the frequencies of the trigonometric functions range from ω_1 = 2π/T to ω_n = π, whereas, in equation (17), they range from ω_1 = 2π/T to ω_n = π(T − 1)/T. The frequency π is the so-called Nyquist frequency. Although the process generating the data may contain components of frequencies higher than the Nyquist frequency, these will not be detected when it is sampled regularly at unit intervals of time. In fact, the effects on the process of components with frequencies in excess of the Nyquist value will be confounded with those whose frequencies fall below it. To demonstrate this, consider the case where the process contains a component which is a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω∗ = 2π − ω. Then, for integer values of t,

(4.18)    cos(ωt) = cos((2π − ω∗)t) = cos(2πt)cos(ω∗t) + sin(2πt)sin(ω∗t) = cos(ω∗t),

which indicates that ω and ω∗ are observationally indistinguishable. Here, ω∗ < π is described as the alias of ω > π.

The Spectral Representation of a Stationary Process

By allowing the value of n in the expression (14) to tend to infinity, it is possible to express a sequence of indefinite length in terms of a sum of sine and cosine functions. However, in the limit as n → ∞, the coefficients α_j, β_j tend to vanish; and therefore an alternative representation in terms of differentials is called for. By writing α_j = dA(ω_j), β_j = dB(ω_j), where A(ω), B(ω) are step functions with discontinuities at the points {ω_j; j = 0, 1, . . . , n}, the expression (14) can be rendered as

(4.19)    y_t = Σ_j {cos(ω_j t)dA(ω_j) + sin(ω_j t)dB(ω_j)}.


Figure 1. The graph of 134 observations on the monthly purchase of clothing after a logarithmic transformation and the removal of a linear trend together with the corresponding periodogram.


In the limit, as n → ∞, the summation is replaced by an integral to give the expression

(4.20)    y(t) = ∫_0^π {cos(ωt)dA(ω) + sin(ωt)dB(ω)}.

Here, cos(ωt) and sin(ωt), and therefore y(t), may be regarded as infinite sequences defined over the entire set of positive and negative integers. Since A(ω) and B(ω) are discontinuous functions for which no derivatives exist, one must avoid using α(ω)dω and β(ω)dω in place of dA(ω) and dB(ω). Moreover, the integral in equation (20) is a Fourier–Stieltjes integral.

In order to derive a statistical theory for the process that generates y(t), one must make some assumptions concerning the functions A(ω) and B(ω). So far, the sequence y(t) has been interpreted as a realisation of a stochastic process. If y(t) is regarded as the stochastic process itself, then the functions A(ω), B(ω) must, likewise, be regarded as stochastic processes defined over the interval [0, π]. A single realisation of these processes now corresponds to a single realisation of the process y(t).

The first assumption to be made is that the functions A(ω) and B(ω) represent a pair of stochastic processes of zero mean which are indexed on the continuous parameter ω. Thus

(4.21)    E{dA(ω)} = E{dB(ω)} = 0.

The second and third assumptions are that the two processes are mutually uncorrelated and that non-overlapping increments within each process are uncorrelated. Thus

(4.22)    E{dA(ω)dB(λ)} = 0 for all ω, λ,
          E{dA(ω)dA(λ)} = 0 if ω ≠ λ,
          E{dB(ω)dB(λ)} = 0 if ω ≠ λ.

The final assumption is that the variance of the increments is given by

(4.23)    V{dA(ω)} = V{dB(ω)} = 2dF(ω) = 2f(ω)dω.

We can see that, unlike A(ω) and B(ω), F(ω) is a continuous differentiable function. The function F(ω) and its derivative f(ω) are the spectral distribution function and the spectral density function, respectively. In order to express equation (20) in terms of complex exponentials, we may define a pair of conjugate complex stochastic processes:

(4.24)    dZ(ω) = ½{dA(ω) − idB(ω)},
          dZ∗(ω) = ½{dA(ω) + idB(ω)}.

Also, we may extend the domain of the functions A(ω), B(ω) from [0, π] to [−π, π] by regarding A(ω) as an even function, such that A(−ω) = A(ω), and by regarding B(ω) as an odd function, such that B(−ω) = −B(ω). Then we have

(4.25)    dZ∗(ω) = dZ(−ω).

From the conditions under (22), it follows that

(4.26)    E{dZ(ω)dZ∗(λ)} = 0 if ω ≠ λ,
          E{dZ(ω)dZ∗(ω)} = f(ω)dω.

These results may be used to re-express equation (20) as

(4.27)    y(t) = ∫_0^π {(e^{iωt} + e^{−iωt})/2 dA(ω) − i(e^{iωt} − e^{−iωt})/2 dB(ω)}
              = ∫_0^π {e^{iωt} {dA(ω) − idB(ω)}/2 + e^{−iωt} {dA(ω) + idB(ω)}/2}
              = ∫_0^π {e^{iωt} dZ(ω) + e^{−iωt} dZ∗(ω)}.

When the integral is extended over the range [−π, π], this becomes

(4.28)    y(t) = ∫_{−π}^{π} e^{iωt} dZ(ω).

This is commonly described as the spectral representation of the process y(t).

The Autocovariances and the Spectral Density Function

The sequence of the autocovariances of the process y(t) may be expressed in terms of the spectrum of the process. From equation (28), it follows that the autocovariance of y(t) at lag τ = t − k is given by

(4.29)    γ_τ = C(y_t, y_k) = E{∫_ω e^{iωt} dZ(ω) ∫_λ e^{−iλk} dZ(−λ)}
              = ∫_ω ∫_λ e^{iωt} e^{−iλk} E{dZ(ω)dZ∗(λ)}
              = ∫_ω e^{iωτ} E{dZ(ω)dZ∗(ω)}
              = ∫_ω e^{iωτ} f(ω)dω.

59

Figure 2. The theoretical autocorrelation function of the ARMA(2, 2) process (1 − 1.344L + 0.902L²)y(t) = (1 − 1.691L + 0.810L²)ε(t) and (below) the corresponding spectral density function.

Here the final equalities are derived by using the results (25) and (26). This equation indicates that the Fourier transform of the spectrum is the autocovariance function.

The inverse mapping from the autocovariances to the spectrum is given by

(4.30)    f(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ e^{−iωτ} = (1/2π) { γ_0 + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ) }.

In many texts, equation (30) serves as the primary definition of the spectrum. This function is directly comparable to the periodogram of a data sequence which is defined under (2.41). However, the periodogram has T empirical autocovariances c_0, ..., c_{T−1} in place of an indefinite number of theoretical autocovariances. Also, it differs from the spectrum by a scalar factor of 4π.

To demonstrate the relationship which exists between equations (29) and (30), we may substitute the latter into the former to give

(4.31)    γ_τ = ∫_{−π}^{π} { (1/2π) Σ_{κ=−∞}^{∞} γ_κ e^{−iωκ} } e^{iωτ} dω

              = (1/2π) Σ_κ γ_κ ∫_{−π}^{π} e^{iω(τ−κ)} dω.

From the fact that

(4.32)    ∫_{−π}^{π} e^{iω(τ−κ)} dω = 2π, if κ = τ;
                                   = 0,  if κ ≠ τ;

it can be seen that the RHS of the equation reduces to γ_τ. This serves to show that equations (29) and (30) do indeed represent a Fourier transform and its inverse.

The essential interpretation of the spectral density function is indicated by the equation

(4.33)    γ_0 = ∫_ω f(ω)dω,

which comes from setting τ = 0 in equation (29). This equation shows how the variance or 'power' of y(t), which is γ_0, is attributed to the cyclical components of which the process is composed.
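By way of illustration, equation (30) can be evaluated for any process whose autocovariances are known. The following is a minimal sketch, assuming Python with numpy; the function name is my own, and the first-order moving-average parameters are merely illustrative:

```python
import numpy as np

def spectrum_from_autocovariances(gamma, omegas):
    """Evaluate (4.30), f(w) = (1/2pi){gamma_0 + 2 sum gamma_tau cos(w tau)},
    with the sum truncated to the autocovariances supplied in gamma."""
    gamma = np.asarray(gamma, dtype=float)
    taus = np.arange(1, len(gamma))
    return (gamma[0] + 2.0 * np.sum(gamma[None, 1:] * np.cos(np.outer(omegas, taus)),
                                    axis=1)) / (2.0 * np.pi)

# The MA(1) process y(t) = e(t) - theta*e(t-1), with unit noise variance,
# has gamma_0 = 1 + theta^2, gamma_1 = -theta and gamma_tau = 0 for tau > 1.
theta = 0.5
f = spectrum_from_autocovariances([1 + theta**2, -theta], np.linspace(0, np.pi, 5))
print(np.round(f, 4))    # equals |1 - theta*exp(-iw)|^2 / (2*pi), never negative
```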

It is easy to see that a flat spectrum corresponds to the autocovariance function which characterises a white-noise process ε(t). Let f_ε = f_ε(ω) be the flat spectrum. Then, from equation (30), it follows that

(4.34)    γ_0 = ∫_{−π}^{π} f_ε(ω)dω = 2πf_ε;

and, from equation (29), it follows that

(4.35)    γ_τ = ∫_{−π}^{π} f_ε(ω)e^{iωτ} dω = f_ε ∫_{−π}^{π} e^{iωτ} dω = 0.

These are the same as the conditions under (6) which have served to define a white-noise process. When the variance is denoted by σ_ε², the expression for the spectrum of the white-noise process becomes

(4.36)    f_ε(ω) = σ_ε²/2π.

Canonical Factorisation of the Spectral Density Function

Let y(t) be a stationary stochastic process whose spectrum is f_y(ω). Since f_y(ω) ≥ 0, it is always possible to find a complex function µ(ω) such that

(4.37)    f_y(ω) = (1/2π) µ(ω)µ*(ω).

For a wide class of stochastic processes, the function µ(ω) may be constructed in such a way that it can be expanded as a one-sided Fourier series:

(4.38)    µ(ω) = Σ_{j=0}^{∞} µ_j e^{−iωj}.

On defining

(4.39)    dZ_ε(ω) = dZ_y(ω)/µ(ω),

the spectral representation of the process y(t) given in equation (28) may be rewritten as

(4.40)    y(t) = ∫_ω e^{iωt} µ(ω) dZ_ε(ω).

Expanding the expression of µ(ω) and interchanging the order of integration and summation gives

(4.41)    y(t) = ∫_ω e^{iωt} { Σ_j µ_j e^{−iωj} } dZ_ε(ω)

               = Σ_j µ_j ∫_ω e^{iω(t−j)} dZ_ε(ω)

               = Σ_j µ_j ε(t − j),

where we have defined

(4.42)    ε(t) = ∫_ω e^{iωt} dZ_ε(ω).

The spectrum of ε(t) is given by

(4.43)    E{dZ_ε(ω)dZ_ε*(ω)} = E{ dZ_y(ω)dZ_y*(ω)/[µ(ω)µ*(ω)] } = f_y(ω)/[µ(ω)µ*(ω)] = 1/2π.

Hence ε(t) is identified as a white-noise process with unit variance. Therefore equation (41) represents a moving-average process; and what our analysis implies is that virtually every stationary stochastic process can be represented in this way.

The Frequency-Domain Analysis of Filtering

It is a straightforward matter to derive the spectrum of a process y(t) = µ(L)x(t) which is formed by mapping the process x(t) through a linear filter. Taking the spectral representation of the process x(t) to be

(4.44)    x(t) = ∫_ω e^{iωt} dZ_x(ω),

we have

(4.45)    y(t) = Σ_j µ_j x(t − j)

               = Σ_j µ_j ∫_ω e^{iω(t−j)} dZ_x(ω)

               = ∫_ω e^{iωt} { Σ_j µ_j e^{−iωj} } dZ_x(ω).

On writing Σ_j µ_j e^{−iωj} = µ(ω), this becomes

(4.46)    y(t) = ∫_ω e^{iωt} µ(ω) dZ_x(ω)

               = ∫_ω e^{iωt} dZ_y(ω).

It follows that the spectral density function f_y(ω) of the filtered process y(t) is given by

(4.47)    f_y(ω)dω = E{dZ_y(ω)dZ_y*(ω)}

                   = µ(ω)µ*(ω) E{dZ_x(ω)dZ_x*(ω)}

                   = |µ(ω)|² f_x(ω)dω.

In the case of the process defined in equation (7), where y(t) is obtained by filtering a white-noise sequence, the result is specialised to give

(4.48)    f_y(ω) = |µ(ω)|² f_ε(ω) = (σ_ε²/2π) |µ(ω)|².

Let µ(z) = Σ_j µ_j z^j denote the z-transform of the sequence {µ_j}. Then, when z = e^{−iω},

(4.49)    |µ(z)|² = µ(z)µ(z^{−1}) = Σ_τ Σ_j µ_j µ_{j+τ} z^τ.

It follows that equation (48) can be written as

(4.50)    f_y(ω) = (σ_ε²/2π) µ(z)µ(z^{−1}) = (1/2π) σ_ε² Σ_τ Σ_j µ_j µ_{j+τ} z^τ.

But, according to equation (10), γ_τ = σ_ε² Σ_j µ_j µ_{j+τ} is the autocovariance of lag τ of the process y(t). Therefore, the function f_y(ω) can be written as

(4.51)    f_y(ω) = (1/2π) Σ_{τ=−∞}^{∞} e^{−iωτ} γ_τ = (1/2π) { γ_0 + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ) },

which indicates that the spectral density function is the Fourier transform of the autocovariance function of the filtered sequence. This is known as the Wiener–Khintchine theorem. The importance of this theorem is that it provides a link between the time domain and the frequency domain.

The Gain and Phase

The complex-valued function µ(ω) can be written as

(4.52)    µ(ω) = |µ(ω)| e^{−iθ(ω)},

where

(4.53)    |µ(ω)|² = { Σ_{j=0}^{∞} µ_j cos(ωj) }² + { Σ_{j=0}^{∞} µ_j sin(ωj) }²

and

          θ(ω) = arctan{ Σ_j µ_j sin(ωj) / Σ_j µ_j cos(ωj) }.

The substitution of expression (52) in equation (46) gives

(4.54)    y(t) = ∫_{−π}^{π} e^{i{ωt − θ(ω)}} |µ(ω)| dZ_x(ω).

The importance of this equation is that it summarises the two effects of the filter. The function |µ(ω)|, which is described as the gain of the filter, indicates the extent to which the amplitude of each of the cyclical components of which x(t) is composed is altered in the process of filtering. The function θ(ω), which is described as the phase displacement, gives a measure in radians of the extent to which each of the cyclical components is displaced along the time axis.
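The sums in (4.53) are easily computed for a filter of finite order. The following is a brief sketch, assuming Python with numpy; the five-point moving-average filter is an illustrative choice of my own:

```python
import numpy as np

def gain_and_phase(mu, omegas):
    """Gain |mu(w)| and phase theta(w) of the filter with coefficients mu_j,
    computed from the sums of equation (4.53)."""
    mu = np.asarray(mu, dtype=float)
    j = np.arange(len(mu))
    c = np.sum(mu[None, :] * np.cos(np.outer(omegas, j)), axis=1)  # sum mu_j cos(wj)
    s = np.sum(mu[None, :] * np.sin(np.outer(omegas, j)), axis=1)  # sum mu_j sin(wj)
    return np.sqrt(c**2 + s**2), np.arctan2(s, c)

# A five-point moving average: its phase is theta(w) = 2w, a pure delay of two
# periods, apart from jumps of pi at the frequencies where the gain vanishes.
gain, phase = gain_and_phase(np.full(5, 0.2), np.linspace(0.1, 1.0, 4))
print(np.round(gain, 3), np.round(phase, 3))
```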

LECTURE 5

Linear Stochastic Models

Autocovariances of a Stationary Process

A temporal stochastic process is simply a sequence of random variables indexed by a time subscript. Such a process can be denoted by x(t). The element of the sequence at the point t = τ is x_τ = x(τ).

Let {x_{τ+1}, x_{τ+2}, ..., x_{τ+n}} denote n consecutive elements of the sequence. Then the process is said to be strictly stationary if the joint probability distribution of the elements does not depend on τ, regardless of the size of n. This means that any two segments of the sequence of equal length have identical probability density functions. In consequence, the decision on where to place the time origin is arbitrary, and the argument τ can be omitted.

Some further implications of stationarity are that

(5.1)    E(x_t) = µ < ∞ for all t  and  C(x_{τ+t}, x_{τ+s}) = γ_{|t−s|}.

The latter condition means that the covariance of any two elements depends only on their temporal separation |t − s|. On their own, these conditions constitute the conditions of weak or second-order stationarity. Notice that, if the elements of the sequence are normally distributed, then the two conditions are sufficient to establish strict stationarity.

The condition on the covariances implies that the dispersion matrix of the vector [x_1, x_2, ..., x_n] is a bisymmetric Laurent matrix of the form

(5.2)              | γ_0      γ_1      γ_2      ...  γ_{n−1} |
                   | γ_1      γ_0      γ_1      ...  γ_{n−2} |
           Γ   =   | γ_2      γ_1      γ_0      ...  γ_{n−3} |
                   |  :        :        :             :      |
                   | γ_{n−1}  γ_{n−2}  γ_{n−3}  ...  γ_0     |,

wherein the generic element in the (i, j)th position is γ_{|i−j|} = C(x_i, x_j).

Given that a sequence of observations of a time series represents only a segment of a single realisation of a stochastic process, one might imagine that there is little chance of making valid inferences about the parameters of the process.

However, provided that the process x(t) is stationary and provided that the statistical dependencies between widely separated elements of the sequence are weak, it is possible to estimate consistently those parameters of the process which express the dependence of proximate elements of the sequence. If one is prepared to make sufficiently strong assumptions about the nature of the process, then a knowledge of such parameters may be all that is needed for a complete characterisation of the process.

Moving-Average Processes

The qth-order moving-average process, or MA(q) process, is defined by the equation

(5.3)    y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q),

where ε(t), which has E{ε(t)} = 0, is a white-noise process consisting of a sequence of independently and identically distributed random variables with zero expectations. The equation is normalised either by setting µ_0 = 1 or by setting V{ε(t)} = σ_ε² = 1. The equation can be written in summary notation as y(t) = µ(L)ε(t), where µ(L) = µ_0 + µ_1 L + · · · + µ_q L^q is a polynomial in the lag operator.

A moving-average process is clearly stationary, since any two elements y_t and y_s represent the same function of the vectors [ε_t, ε_{t−1}, ..., ε_{t−q}] and [ε_s, ε_{s−1}, ..., ε_{s−q}], which are identically distributed. In addition to the condition of stationarity, it is usually required that a moving-average process should be invertible, such that it can be expressed in the form µ^{−1}(L)y(t) = ε(t), where the LHS embodies a convergent sum of past values of y(t). The representation is available only if all the roots of the equation µ(z) = µ_0 + µ_1 z + · · · + µ_q z^q = 0 lie outside the unit circle. This conclusion follows from our discussion of partial fractions.

As an example, let us consider the first-order moving-average process which is defined by

(5.4)    y(t) = ε(t) − θε(t − 1) = (1 − θL)ε(t).

Provided that |θ| < 1, this can be written in autoregressive form as

(5.5)    ε(t) = (1 − θL)^{−1} y(t) = y(t) + θy(t − 1) + θ²y(t − 2) + · · · .

This is an infinite-order autoregressive representation of the process. Imagine that |θ| > 1 instead. Then, to obtain a convergent series, we have to write

(5.6)    y(t + 1) = ε(t + 1) − θε(t) = −θ(1 − L^{−1}/θ)ε(t),

where L^{−1}ε(t) = ε(t + 1). This gives

(5.7)    ε(t) = −θ^{−1}(1 − L^{−1}/θ)^{−1} y(t + 1) = −{ y(t + 1)/θ + y(t + 2)/θ² + y(t + 3)/θ³ + · · · }.

Normally, an expression such as this, which embodies future values of y(t), would have no reasonable meaning.

It is straightforward to generate the sequence of autocovariances from a knowledge of the parameters of the moving-average process and of the variance of the white-noise process. Consider

(5.8)    γ_τ = E(y_t y_{t−τ}) = E( Σ_i µ_i ε_{t−i} Σ_j µ_j ε_{t−τ−j} ) = Σ_i Σ_j µ_i µ_j E(ε_{t−i} ε_{t−τ−j}).

Since ε(t) is a sequence of independently and identically distributed random variables with zero expectations, it follows that

(5.9)    E(ε_{t−i} ε_{t−τ−j}) = σ_ε², if i = τ + j;
                              = 0,   if i ≠ τ + j.

Therefore

(5.10)    γ_τ = σ_ε² Σ_j µ_j µ_{j+τ}.

Now let τ = 0, 1, ..., q. This gives

(5.11)    γ_0 = σ_ε²(µ_0² + µ_1² + · · · + µ_q²),
          γ_1 = σ_ε²(µ_0µ_1 + µ_1µ_2 + · · · + µ_{q−1}µ_q),
          ...
          γ_q = σ_ε² µ_0 µ_q.

Also, γ_τ = 0 for all τ > q.

The first-order moving-average process y(t) = ε(t) − θε(t − 1) has the following autocovariances:

(5.12)    γ_0 = σ_ε²(1 + θ²),
          γ_1 = −σ_ε² θ,
          γ_τ = 0 if τ > 1.
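The formula of (5.10) lends itself to a short computation. The following is a minimal sketch, assuming Python with numpy; the function name and the parameter θ = 0.8 are my own illustrative choices, and a long simulated series is used as a rough check:

```python
import numpy as np

def ma_autocovariances(mu, sigma2=1.0):
    """Autocovariances gamma_0, ..., gamma_q of the MA(q) process
    y(t) = mu_0 e(t) + ... + mu_q e(t-q), from equation (5.10)."""
    mu = np.asarray(mu, dtype=float)
    q = len(mu) - 1
    return np.array([sigma2 * np.sum(mu[:q + 1 - tau] * mu[tau:])
                     for tau in range(q + 1)])

theta = 0.8
print(ma_autocovariances([1.0, -theta]))   # (5.12): [1 + theta^2, -theta]

# A long simulated series gives sample autocovariances close to these values.
rng = np.random.default_rng(0)
e = rng.standard_normal(100_000)
y = e[1:] - theta * e[:-1]
y = y - y.mean()
print(np.mean(y * y), np.mean(y[1:] * y[:-1]))
```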

Thus, for a vector y = [y_1, y_2, ..., y_T] of T consecutive elements from a first-order moving-average process, the dispersion matrix is

(5.13)                   | 1+θ²   −θ     0    ...   0    |
                         | −θ    1+θ²   −θ    ...   0    |
           D(y) = σ_ε²   | 0      −θ   1+θ²   ...   0    |
                         | :      :      :          :    |
                         | 0      0      0    ...  1+θ²  |.

In general, the dispersion matrix of a qth-order moving-average process has q subdiagonal and q supradiagonal bands of nonzero elements and zero elements elsewhere.

It is also helpful to define an autocovariance generating function, which is a power series whose coefficients are the autocovariances γ_τ for successive values of τ. This is denoted by

(5.14)    γ(z) = Σ_τ γ_τ z^τ,  with τ = {0, ±1, ±2, ...} and γ_τ = γ_{−τ}.

The generating function is also called the z-transform of the autocovariance function.

The autocovariance generating function of the qth-order moving-average process can be found quite readily. Consider the convolution

(5.15)    µ(z)µ(z^{−1}) = Σ_i µ_i z^i Σ_j µ_j z^{−j}

                        = Σ_i Σ_j µ_i µ_j z^{i−j}

                        = Σ_τ Σ_j µ_j µ_{j+τ} z^τ,  where τ = i − j.

By referring to the expression for the autocovariance of lag τ of a moving-average process given under (10), it can be seen that the autocovariance generating function is just

(5.16)    γ(z) = σ_ε² µ(z)µ(z^{−1}).

Autoregressive Processes

The pth-order autoregressive process, or AR(p) process, is defined by the equation

(5.17)    α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = ε(t).

This equation is invariably normalised by setting α_0 = 1, although it would be possible to set σ_ε² = 1 instead. The equation can be written in summary notation as α(L)y(t) = ε(t), where α(L) = α_0 + α_1 L + · · · + α_p L^p. For the process to be stationary, the roots of the equation α(z) = α_0 + α_1 z + · · · + α_p z^p = 0 must lie outside the unit circle. This condition enables us to write the autoregressive process as an infinite-order moving-average process in the form of y(t) = α^{−1}(L)ε(t).

As an example, let us consider the first-order autoregressive process which is defined by

(5.18)    ε(t) = y(t) − φy(t − 1) = (1 − φL)y(t).

Provided that the process is stationary with |φ| < 1, it can be represented in moving-average form as

(5.19)    y(t) = (1 − φL)^{−1}ε(t) = ε(t) + φε(t − 1) + φ²ε(t − 2) + · · · .

The autocovariances of the process can be found by using the formula of (10), which is applicable to moving-average processes of finite or infinite order. Thus

(5.20)    γ_τ = E(y_t y_{t−τ}) = E( Σ_i φ^i ε_{t−i} Σ_j φ^j ε_{t−τ−j} ) = Σ_i Σ_j φ^i φ^j E(ε_{t−i} ε_{t−τ−j}),

and the result under (9) indicates that

(5.21)    γ_τ = σ_ε² Σ_j φ^j φ^{j+τ} = σ_ε² φ^τ / (1 − φ²).

For a vector y = [y_1, y_2, ..., y_T] of T consecutive elements from a first-order autoregressive process, the dispersion matrix has the form

(5.22)             σ_ε²    | 1        φ        φ²       ...  φ^{T−1} |
                           | φ        1        φ        ...  φ^{T−2} |
           D(y) = ──────   | φ²       φ        1        ...  φ^{T−3} |
                  1 − φ²   | :        :        :             :       |
                           | φ^{T−1}  φ^{T−2}  φ^{T−3}  ...  1       |.
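The result of (5.21) can be checked against a simulated series. The following is a minimal sketch, assuming Python with numpy; the parameter values and the sample size are illustrative only:

```python
import numpy as np

# Compare the theoretical AR(1) autocovariances of (5.21),
# gamma_tau = sigma2 * phi^tau / (1 - phi^2), with sample estimates.
phi, sigma2, T = 0.9, 1.0, 200_000
rng = np.random.default_rng(1)
e = rng.standard_normal(T) * np.sqrt(sigma2)

y = np.zeros(T)
for t in range(1, T):                 # y(t) = phi*y(t-1) + e(t)
    y[t] = phi * y[t - 1] + e[t]

y = y - y.mean()
for tau in range(4):
    c_tau = np.mean(y[tau:] * y[:T - tau])        # sample autocovariance
    g_tau = sigma2 * phi**tau / (1 - phi**2)      # theoretical value
    print(tau, round(c_tau, 3), round(g_tau, 3))
```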

To find the autocovariance generating function for the general pth-order autoregressive process, we may consider again the function α(z) = Σ_i α_i z^i. Since an autoregressive process may be treated as an infinite-order moving-average process, it follows that

(5.23)    γ(z) = σ_ε² / { α(z)α(z^{−1}) }.

For an alternative way of finding the autocovariances of the pth-order process, consider multiplying Σ_i α_i y_{t−i} = ε_t by y_{t−τ} and taking expectations to give

(5.24)    Σ_i α_i E(y_{t−i} y_{t−τ}) = E(ε_t y_{t−τ}).

Taking account of the normalisation α_0 = 1, we find that

(5.25)    E(ε_t y_{t−τ}) = σ_ε², if τ = 0;
                         = 0,   if τ > 0.

Therefore, on setting E(y_{t−i} y_{t−τ}) = γ_{τ−i}, equation (24) gives

(5.26)    Σ_i α_i γ_{τ−i} = σ_ε², if τ = 0;
                          = 0,   if τ > 0.

By letting τ = 0, 1, ..., p in (26), we generate a set of p + 1 equations which can be arrayed in matrix form as follows:

(5.27)    | γ_0  γ_1      γ_2      ...  γ_p     | | 1   |   | σ_ε² |
          | γ_1  γ_0      γ_1      ...  γ_{p−1} | | α_1 |   | 0    |
          | γ_2  γ_1      γ_0      ...  γ_{p−2} | | α_2 | = | 0    |
          |  :    :        :             :      | |  :  |   | :    |
          | γ_p  γ_{p−1}  γ_{p−2}  ...  γ_0     | | α_p |   | 0    |.

These are called the Yule–Walker equations, and they can be used either for generating the values γ_0, γ_1, ..., γ_p from the values α_1, ..., α_p, σ_ε² or vice versa. The second of the conditions under (26) is a homogeneous difference equation which enables us to generate the sequence {γ_p, γ_{p+1}, ...} once p starting values γ_0, γ_1, ..., γ_{p−1} are known.

For an example of the two uses of the Yule–Walker equations, let us consider the second-order autoregressive process. In that case, we have

(5.28)    | γ_0 γ_1 γ_2 | | α_0 |   | α_2 α_1 α_0  0   0   | | γ_2 |
          | γ_1 γ_0 γ_1 | | α_1 | = | 0   α_2 α_1  α_0 0   | | γ_1 |
          | γ_2 γ_1 γ_0 | | α_2 |   | 0   0   α_2  α_1 α_0 | | γ_0 |
                                                            | γ_1 |
                                                            | γ_2 |

                                    | α_0  α_1      α_2 | | γ_0 |   | σ_ε² |
                                  = | α_1  α_0+α_2  0   | | γ_1 | = | 0    |
                                    | α_2  α_1      α_0 | | γ_2 |   | 0    |.

Given α_0 = 1 and the values for γ_0, γ_1, γ_2, we can find σ_ε² and α_1, α_2. Conversely, given α_0, α_1, α_2 and σ_ε², we can find γ_0, γ_1, γ_2. Notice how the matrix following the first equality is folded across the axis which divides it vertically to give the matrix which follows the second equality. Pleasing effects of this sort often arise in time-series analysis. It is worth recalling at this juncture that the normalisation σ_ε² = 1 might have been chosen instead of α_0 = 1. This would have rendered the equations more easily intelligible.

The Partial Autocorrelation Function

Let α_{r(r)} be the coefficient associated with y(t − r) in an autoregressive process of order r whose parameters correspond to the autocovariances γ_0, γ_1, ..., γ_r. Then the sequence {α_{r(r)}; r = 1, 2, ...} of such coefficients, whose index corresponds to models of increasing orders, constitutes the partial autocorrelation function. In effect, α_{r(r)} indicates the role in explaining the variance of y(t) which is due to y(t − r) when y(t − 1), ..., y(t − r + 1) are also taken into account.

Its role in identifying the order of an autoregressive process is evident; for, if α_{r(r)} ≠ 0 and if α_{p(p)} = 0 for all p > r, then it is clearly implied that the process has an order of r. Much of the theoretical importance of the partial autocorrelation function is due to the fact that it represents an alternative way of conveying the information which is present in the sequence of autocorrelations.

The sequence of partial autocorrelations may be computed efficiently via the recursive Durbin–Levinson algorithm, which uses the coefficients of the AR model of order r as the basis for calculating the coefficients of the model of order r + 1. To derive the algorithm, let us imagine that we already have the values α_{0(r)} = 1, α_{1(r)}, ..., α_{r(r)}. Then, by extending the set of rth-order Yule–Walker equations to which these values correspond, we can derive the system

(5.29)    | γ_0      γ_1      ...  γ_r      γ_{r+1} | | 1        |   | σ²_{(r)} |
          | γ_1      γ_0      ...  γ_{r−1}  γ_r     | | α_{1(r)} |   | 0        |
          |  :        :             :        :      | |  :       | = | :        |
          | γ_r      γ_{r−1}  ...  γ_0      γ_1     | | α_{r(r)} |   | 0        |
          | γ_{r+1}  γ_r      ...  γ_1      γ_0     | | 0        |   | g        |,

wherein

(5.30)    g = Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j},  with α_{0(r)} = 1.

The system can also be written as

(5.31)    | γ_0      γ_1      ...  γ_r      γ_{r+1} | | 0        |   | g        |
          | γ_1      γ_0      ...  γ_{r−1}  γ_r     | | α_{r(r)} |   | 0        |
          |  :        :             :        :      | |  :       | = | :        |
          | γ_r      γ_{r−1}  ...  γ_0      γ_1     | | α_{1(r)} |   | 0        |
          | γ_{r+1}  γ_r      ...  γ_1      γ_0     | | 1        |   | σ²_{(r)} |.

The two systems of equations (29) and (31) can be combined to give

(5.32)    | γ_0      γ_1      ...  γ_r      γ_{r+1} | | 1                     |   | σ²_{(r)} + cg |
          | γ_1      γ_0      ...  γ_{r−1}  γ_r     | | α_{1(r)} + cα_{r(r)}  |   | 0             |
          |  :        :             :        :      | |  :                    | = | :             |
          | γ_r      γ_{r−1}  ...  γ_0      γ_1     | | α_{r(r)} + cα_{1(r)}  |   | 0             |
          | γ_{r+1}  γ_r      ...  γ_1      γ_0     | | c                     |   | g + cσ²_{(r)} |.

If we take the coefficient of the combination to be

(5.33)    c = −g/σ²_{(r)},

then the final element in the vector on the RHS becomes zero, and the system becomes the set of Yule–Walker equations of order r + 1. Thus the solution of the Yule–Walker system of order r + 1 is easily derived from the solution of the system of order r, and there is scope for devising a recursive procedure. The solution of the equations, from the last element α_{r+1(r+1)} = c through to the variance term σ²_{(r+1)}, is given by

(5.34)    α_{r+1(r+1)} = −(1/σ²_{(r)}) Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j},

          | α_{1(r+1)} |   | α_{1(r)} |                 | α_{r(r)} |
          |  :         | = |  :       | + α_{r+1(r+1)}  |  :       |
          | α_{r(r+1)} |   | α_{r(r)} |                 | α_{1(r)} |,

          σ²_{(r+1)} = σ²_{(r)} { 1 − (α_{r+1(r+1)})² }.

The starting values for the recursion are

(5.35)    α_{1(1)} = −γ_1/γ_0  and  σ²_{(1)} = γ_0 { 1 − (α_{1(1)})² }.
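The recursion of (5.33)–(5.35) translates directly into a short program. The following is a sketch, assuming Python with numpy; the function name and the illustrative AR(2) parameters are my own. For an AR(2) process, the partial autocorrelations beyond the second should vanish:

```python
import numpy as np

def durbin_levinson(gamma):
    """Compute the sequence alpha_{r(r)} of the text, together with the final
    AR coefficients and prediction variance, by the recursion of equations
    (5.33)-(5.35). gamma[0], gamma[1], ... are gamma_0, gamma_1, ..."""
    gamma = np.asarray(gamma, dtype=float)
    alpha = np.array([1.0, -gamma[1] / gamma[0]])   # [alpha_0(1), alpha_1(1)]
    sigma2 = gamma[0] * (1.0 - alpha[1] ** 2)       # sigma^2_(1), eq. (5.35)
    pacf = [alpha[1]]
    for r in range(1, len(gamma) - 1):
        g = np.sum(alpha * gamma[r + 1:0:-1])       # g = sum_j alpha_j(r) gamma_{r+1-j}
        c = -g / sigma2                             # alpha_{r+1(r+1)}, eq. (5.33)
        alpha = np.concatenate([alpha, [0.0]]) + c * np.concatenate([[0.0], alpha[::-1]])
        sigma2 *= 1.0 - c ** 2                      # sigma^2_(r+1), eq. (5.34)
        pacf.append(c)
    return np.array(pacf), alpha, sigma2

# Autocovariances of the AR(2) process y(t) = 0.5y(t-1) - 0.3y(t-2) + e(t),
# generated from the Yule-Walker relations with gamma_0 normalised to unity.
phi1, phi2 = 0.5, -0.3
gamma = [1.0, phi1 / (1.0 - phi2)]
for _ in range(4):
    gamma.append(phi1 * gamma[-1] + phi2 * gamma[-2])

pacf, alpha, sigma2 = durbin_levinson(gamma)
print(np.round(pacf, 6))    # entries beyond the second are zero
```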

Autoregressive Moving-Average Processes

The autoregressive moving-average process of orders p and q, which is referred to as the ARMA(p, q) process, is defined by the equation

(5.36)    α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q).

The equation is normalised by setting α_0 = 1 and by setting either µ_0 = 1 or σ_ε² = 1. A more summary expression for the equation is α(L)y(t) = µ(L)ε(t).

Provided that the roots of the equation α(z) = 0 lie outside the unit circle, the process can be represented by the equation y(t) = α^{−1}(L)µ(L)ε(t), which corresponds to an infinite-order moving-average process. Conversely, provided the roots of the equation µ(z) = 0 lie outside the unit circle, the process can be represented by the equation µ^{−1}(L)α(L)y(t) = ε(t), which corresponds to an infinite-order autoregressive process.

By considering the moving-average form of the process, and by noting the form of the autocovariance generating function for such a process which is given by equation (16), it can be seen that the autocovariance generating function for the autoregressive moving-average process is

(5.37)    γ(z) = σ_ε² µ(z)µ(z^{−1}) / { α(z)α(z^{−1}) }.

This generating function, which is of some theoretical interest, does not provide a practical means of finding the autocovariances. To find these, let us consider multiplying the equation Σ_i α_i y_{t−i} = Σ_i µ_i ε_{t−i} by y_{t−τ} and taking expectations. This gives

(5.38)    Σ_i α_i γ_{τ−i} = Σ_i µ_i δ_{i−τ},

where γ_{τ−i} = E(y_{t−τ} y_{t−i}) and δ_{i−τ} = E(y_{t−τ} ε_{t−i}). Since ε_{t−i} is uncorrelated with y_{t−τ} whenever it is subsequent to the latter, it follows that δ_{i−τ} = 0 if τ > i. Since the index i in the RHS of equation (38) runs from 0 to q, it follows that

(5.39)    Σ_i α_i γ_{τ−i} = 0 if τ > q.

Given the q + 1 nonzero values δ_0, δ_1, ..., δ_q, and p initial values γ_0, γ_1, ..., γ_{p−1} for the autocovariances, the equations can be solved recursively to obtain the subsequent values {γ_p, γ_{p+1}, ...}.

To find the requisite values δ_0, δ_1, ..., δ_q, consider multiplying the equation Σ_i α_i y_{t−i} = Σ_i µ_i ε_{t−i} by ε_{t−τ} and taking expectations. This gives

(5.40)    Σ_i α_i δ_{τ−i} = µ_τ σ_ε²,

where δ_{τ−i} = E(y_{t−i} ε_{t−τ}). The equation may be rewritten as

(5.41)    δ_τ = (1/α_0) { µ_τ σ_ε² − Σ_{i=1}^{p} α_i δ_{τ−i} };

and, by setting τ = 0, 1, ..., q, we can generate recursively the required values δ_0, δ_1, ..., δ_q. Notice that, when we adopt the normalisation α_0 = µ_0 = 1, we get δ_0 = σ_ε².

Example. Consider the ARMA(2, 2) model which gives the equation

(5.42)    α_0 y_t + α_1 y_{t−1} + α_2 y_{t−2} = µ_0 ε_t + µ_1 ε_{t−1} + µ_2 ε_{t−2}.

Multiplying by y_t, y_{t−1} and y_{t−2} and taking expectations gives

(5.43)    | γ_0 γ_1 γ_2 | | α_0 |   | δ_0 δ_1 δ_2 | | µ_0 |
          | γ_1 γ_0 γ_1 | | α_1 | = | 0   δ_0 δ_1 | | µ_1 |
          | γ_2 γ_1 γ_0 | | α_2 |   | 0   0   δ_0 | | µ_2 |.

Multiplying by ε_t, ε_{t−1} and ε_{t−2} and taking expectations gives

(5.44)    | δ_0 0   0   | | α_0 |   | σ_ε² 0    0    | | µ_0 |
          | δ_1 δ_0 0   | | α_1 | = | 0    σ_ε² 0    | | µ_1 |
          | δ_2 δ_1 δ_0 | | α_2 |   | 0    0    σ_ε² | | µ_2 |.

When the latter equations are written as

(5.45)    | α_0 0   0   | | δ_0 |        | µ_0 |
          | α_1 α_0 0   | | δ_1 | = σ_ε² | µ_1 |
          | α_2 α_1 α_0 | | δ_2 |        | µ_2 |,

they can be solved recursively for δ_0, δ_1 and δ_2 on the assumption that the values of α_0, α_1, α_2 and σ_ε² are known. When the equations (43) are rewritten as

(5.46)    | α_0 α_1      α_2 | | γ_0 |   | µ_0 µ_1 µ_2 | | δ_0 |
          | α_1 α_0+α_2  0   | | γ_1 | = | µ_1 µ_2 0   | | δ_1 |
          | α_2 α_1      α_0 | | γ_2 |   | µ_2 0   0   | | δ_2 |,

they can be solved for γ_0, γ_1 and γ_2. Thus the starting values are obtained which enable the equation

(5.47)    α_0 γ_τ + α_1 γ_{τ−1} + α_2 γ_{τ−2} = 0,  τ > 2,

to be solved recursively to generate the succeeding values {γ_3, γ_4, ...} of the autocovariances.
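The three-step scheme of the example is easily programmed. The following is a sketch, assuming Python with numpy; the coefficients are borrowed from the ARMA(2, 2) process of Figure 2, and the white-noise variance is set to unity for illustration:

```python
import numpy as np

# Autocovariances of an ARMA(2,2) process via equations (5.45)-(5.47).
a0, a1, a2 = 1.0, -1.344, 0.902      # alpha, from the process of Figure 2
m0, m1, m2 = 1.0, -1.691, 0.810      # mu, from the same process
s2 = 1.0                             # sigma_e^2, an illustrative value

# Step 1: solve the lower-triangular system (5.45) recursively for delta_0..2.
d0 = s2 * m0 / a0
d1 = (s2 * m1 - a1 * d0) / a0
d2 = (s2 * m2 - a1 * d1 - a2 * d0) / a0

# Step 2: solve the 3x3 system (5.46) for gamma_0, gamma_1, gamma_2.
A = np.array([[a0, a1, a2],
              [a1, a0 + a2, 0.0],
              [a2, a1, a0]])
b = np.array([[m0, m1, m2],
              [m1, m2, 0.0],
              [m2, 0.0, 0.0]]) @ np.array([d0, d1, d2])
gamma = list(np.linalg.solve(A, b))

# Step 3: extend by the homogeneous recursion (5.47) for tau > 2.
for _ in range(5):
    gamma.append(-(a1 * gamma[-1] + a2 * gamma[-2]) / a0)
print(np.round(gamma, 3))
```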

THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westfield College,

The University of London

The methods to be presented in this lecture are designed for the purpose of analysing series of statistical observations taken at regular intervals in time. The methods have a wide range of applications. We can cite astronomy [18], meteorology [9], seismology [21], oceanography [11], communications engineering and signal processing [16], the control of continuous process plants [20], neurology and electroencephalography [1], [25], and economics [10]; and this list is by no means complete. In an appendix to the paper, we review the definitions of mathematical expectations and covariances.

1. The Frequency Domain and the Time Domain

The methods apply, in the main, to what are described as stationary or non-evolutionary time series. Such series manifest statistical properties which are invariant throughout time, so that the behaviour during one epoch is the same as it would be during any other. When we speak of a weakly stationary or covariance-stationary process, we have in mind a sequence of random variables y(t) = {y_t; t = 0, ±1, ±2, ...}, representing the potential observations of the process, which have a common finite expected value E(y_t) = µ and a set of autocovariances C(y_t, y_s) = E{(y_t − µ)(y_s − µ)} = γ_{|t−s|} which depend only on the temporal separation τ = |t − s| of the dates t and s and not on their absolute values. We also commonly require of such a process that lim(τ → ∞)γ_τ = 0, which is to say that the correlation between increasingly remote elements of the sequence tends to zero. This is a way of expressing the notion that the events of the past have a diminishing effect upon the present as they recede in time.

There are two distinct yet broadly equivalent modes of time-series analysis which may be pursued. On the one hand are the time-domain methods which have their origin in the classical theory of correlation. Such methods deal preponderantly with the autocovariance functions and the cross-covariance functions of the series; and they lead inevitably towards the construction of structural or parametric models of the autoregressive moving-average type

for single series, and of the transfer-function type for two or more causally related series. Many of the methods which are used to estimate the parameters of these models can be viewed as sophisticated variants of the method of linear regression.

On the other hand are the frequency-domain methods of spectral analysis. These are based on an extension of the methods of Fourier analysis which originate in the idea that, over a finite interval, any analytic function can be approximated, to whatever degree of accuracy is desired, by taking a weighted sum of sine and cosine functions of harmonically increasing frequencies.

2. Harmonic Analysis

The astronomers are usually given credit for being the first to apply the methods of Fourier analysis to time series. Their endeavours could be described as the search for hidden periodicities within astronomical data. Typical examples were the attempts to uncover periodicities within the activities recorded by the Wolfer sunspot index and in the indices of luminosity of variable stars.

The relevant methods were developed over a long period of time. Lagrange [13] suggested methods for detecting hidden periodicities in 1772 and 1778. The Dutchman Buys-Ballot [6] propounded effective computational procedures for the statistical analysis of astronomical data in 1847. However, we should probably credit Sir Arthur Schuster [17], who in 1889 propounded the technique of periodogram analysis, with being the progenitor of the modern methods for analysing time series in the frequency domain.

In essence, these frequency-domain methods envisaged a model underlying the observations which takes the form of

(1)    y(t) = Σ_j ρ_j cos(ω_j t − θ_j) + ε(t)
            = Σ_j { α_j cos(ω_j t) + β_j sin(ω_j t) } + ε(t),

where α_j = ρ_j cos θ_j and β_j = ρ_j sin θ_j, and where ε(t) is a sequence of independently and identically distributed random variables which we call a white-noise process. Thus the model depicts the series y(t) as a weighted sum of perfectly regular periodic components upon which is superimposed a random component.

The factor ρ_j = √(α_j² + β_j²) is called the amplitude of the jth periodic component, and it indicates the importance of that component within the sum. Since the variance of a cosine function, which is also called its mean-square deviation, is just one half, and since cosine functions at different frequencies are uncorrelated, it follows that the variance of y(t) is expressible as V{y(t)} = ½ Σ_j ρ_j² + σ_ε², where σ_ε² = V{ε(t)} is the variance of the noise.

The periodogram is simply a device for determining how much of the variance of y(t) is attributable to any given harmonic component. Its value at

ω_j = 2πj/T, calculated from a sample y_0, ..., y_{T−1} comprising T observations on y(t), is given by

(2)    I(ω_j) = (2/T) { [Σ_t y_t cos(ω_j t)]² + [Σ_t y_t sin(ω_j t)]² }
             = (T/2) { a²(ω_j) + b²(ω_j) }.

If y(t) does indeed comprise only a finite number of well-defined harmonic components, then it can be shown that 2I(ω_j)/T is a consistent estimator of ρ_j², in the sense that it converges to the latter in probability as the size T of the sample of the observations on y(t) increases.

Figure 1. The graph of a sine function.

Figure 2. Graph of a sine function with small random fluctuations superimposed.
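The computation in equation (2) can be sketched directly. The following minimal version assumes Python with numpy; the noisy cosine used to exercise it is an illustrative construction of my own:

```python
import numpy as np

def periodogram(y):
    """The periodogram of equation (2), evaluated at the Fourier
    frequencies w_j = 2*pi*j/T for j = 1, ..., [T/2]."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(T)
    js = np.arange(1, T // 2 + 1)
    I = np.empty(len(js))
    for k, j in enumerate(js):
        w = 2.0 * np.pi * j / T
        I[k] = (2.0 / T) * (np.dot(y, np.cos(w * t)) ** 2 + np.dot(y, np.sin(w * t)) ** 2)
    return 2.0 * np.pi * js / T, I

# A noisy cosine of frequency 2*pi*10/T: the ordinate at j = 10 dominates,
# and 2*I(w_10)/T estimates the squared amplitude rho^2 = 4.
rng = np.random.default_rng(2)
T = 128
t = np.arange(T)
y = 2.0 * np.cos(2 * np.pi * 10 * t / T) + 0.5 * rng.standard_normal(T)
w, I = periodogram(y)
print(2 * I[9] / T)      # I[9] corresponds to j = 10
```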

The process by which the ordinates of the periodogram converge upon the squared values of the harmonic amplitudes was well expressed by Yule [24] in a seminal article of 1927:

    If we take a curve representing a simple harmonic function of time, and superpose on the ordinates small random errors, the only effect is to make the graph somewhat irregular, leaving the suggestion of periodicity still clear to the eye. If the errors are increased in magnitude, the graph becomes more irregular, the suggestion of periodicity more obscure, and we have only sufficiently to increase the "errors" to mask completely any appearance of periodicity. But, however large the errors, periodogram analysis is applicable to such a curve, and, given a sufficient number of periods, should yield a close approximation to the period and amplitude of the underlying harmonic function.

We should not quote this passage without mentioning that Yule proceeded to question whether the hypothesis underlying periodogram analysis, which postulates the equation under (1), was an appropriate hypothesis for all cases.

Figure 3. Wolfer's Sunspot Numbers 1749–1924.

A highly successful application of periodogram analysis was that of Whittaker and Robinson [22] who, in 1924, showed that the series recording the brightness or magnitude of the star T. Ursa Major over 600 days could be fitted almost exactly by the sum of two harmonic functions with periods of 24 and 29 days. This led to the suggestion that what was being observed was actually a two-star system wherein the larger star periodically masked the smaller, brighter star.

Somewhat less successful were the attempts of Arthur Schuster himself [18] in 1906 to substantiate the claim that there is an eleven-year cycle in the activity recorded by the Wolfer sunspot index.

Other applications of the method of periodogram analysis were even less successful; and one application which was a significant failure was its use by William Beveridge [2, 3] in 1921 and 1922 to analyse a long series of European wheat prices. The periodogram of this data had so many peaks that at least twenty possible hidden periodicities could be picked out, and this seemed to be many more than could be accounted for by plausible explanations within the realm of economic history. Such experiences seemed to point to the inappropriateness to economic circumstances of a model containing perfectly regular cycles.

A classic expression of disbelief was made by Slutsky [19] in another article of 1927:

    Suppose we are inclined to believe in the reality of the strict periodicity of the business cycle, such, for example, as the eight-year period postulated by Moore [14]. Then we should encounter another difficulty. Wherein lies the source of this regularity? What is the mechanism of causality which, decade after decade, reproduces the same sinusoidal wave which rises and falls on the surface of the social ocean with the regularity of day and night?

3. Autoregressive and Moving-Average Models

The next major episode in the history of the development of time-series analysis took place in the time domain, and it began with the two articles of 1927 by Yule [24] and Slutsky [19] from which we have already quoted. In both articles, we find a rejection of the model with deterministic harmonic components in favour of models more firmly rooted in the notion of random causes.

In a wonderfully figurative exposition, Yule invited his readers to imagine a pendulum attached to a recording device and left to swing. Then any deviations from perfectly harmonic motion which might be recorded must be the result of errors of observation, which could be all but eliminated if a long sequence of observations were subjected to a periodogram analysis. Next, Yule enjoined the reader to imagine that the regular swing of the pendulum is interrupted by small boys who get into the room and start pelting the pendulum with peas, sometimes from one side and sometimes from the other. The motion is now affected not by superposed fluctuations but by true disturbances.

In this example, Yule contrives a perfect analogy for the autoregressive time-series model. To explain the analogy, let us begin by considering a homogeneous second-order difference equation of the form

(3)    y(t) = φ_1 y(t − 1) + φ_2 y(t − 2).

Given the initial values y_{−1} and y_{−2}, this equation can be used recursively to generate an ensuing sequence {y_0, y_1, ...}. This sequence will show a regular pattern of behaviour whose nature depends on the parameters φ_1 and φ_2. If these parameters are such that the roots of the quadratic equation z² − φ_1 z − φ_2 = 0 are complex and less than unity in modulus, then the sequence of values will show a damped sinusoidal behaviour, just as a clock pendulum will which is left to swing without the assistance of the falling weights. In fact, in such a case, the general solution to the difference equation will take the form of

(4)    y(t) = αρ^t cos(ωt − θ),

where the modulus ρ, which has a value between 0 and 1, is now the damping factor which is responsible for the attenuation of the swing as the time t elapses.

The autoregressive model which Yule was proposing takes the form of

(5)    y(t) = φ_1 y(t − 1) + φ_2 y(t − 2) + ε(t),

where ε(t) is, once more, a white-noise sequence. Now, instead of masking the regular periodicity of the pendulum, the white noise has actually become the engine which drives the pendulum by striking it randomly in one direction and another. Its haphazard influence has replaced the steady force of the falling weights. Nevertheless, the pendulum will still manifest a deceptively regular motion which is liable, if the sequence of observations is short and contains insufficient contrary evidence, to be misinterpreted as the effect of an underlying mechanism.

In his article of 1927, Yule attempted to explain the Wolfer index in terms of the second-order autoregressive model of equation (5). From the empirical autocovariances of the sample represented in Figure 3, he estimated the values φ_1 = 1.343 and φ_2 = −0.655. The general solution of the corresponding homogeneous difference equation has a damping factor of ρ = 0.809 and an angular velocity of ω = 33.96°. The angular velocity indicates a period of 10.6 years, which is a little shorter than the 11-year period obtained by Schuster in his periodogram analysis of the same data. In Figures 4 and 5, we show a series which has been generated artificially from Yule's equation, together with a series generated by the equation

       y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).

The homogeneous difference equation which corresponds to the latter has the same value of ω as before. Its damping factor has the value ρ = 0.95, and this increase accounts for the greater regularity of the second series.
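The damping factor and the period quoted above follow from the roots of the quadratic equation. The following is a brief sketch, assuming Python with numpy; only Yule's estimated coefficients are taken from the text, and the simulated series is merely analogous to that of Figure 4:

```python
import numpy as np

# Recover the damping factor and period implied by Yule's estimates from the
# roots of z^2 - phi1*z - phi2 = 0, and generate a synthetic series like
# that of Figure 4 by driving the recursion with white noise.
phi1, phi2 = 1.343, -0.655
root = np.roots([1.0, -phi1, -phi2])[0]    # one of a complex conjugate pair
print(np.abs(root))                        # damping factor, about 0.809
print(2 * np.pi / np.abs(np.angle(root)))  # period, about 10.6 years

rng = np.random.default_rng(3)
y = np.zeros(100)
for t in range(2, 100):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.standard_normal()
```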

Figure 4. A series generated by Yule's equation y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t).

Figure 5. A series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).

Neither of our two series accurately mimics the sunspot index, although the second series seems closer to it than the series generated by Yule's equation. An obvious feature of the sunspot index which is not shared by the artificial series is the fact that the numbers are constrained to be nonnegative. To relieve this constraint, we might apply to Wolf's numbers y_t a transformation of the form log(y_t + λ) or of the more general form (y_t + λ)^{κ−1}, such as has been advocated by Box and Cox [4]. A transformed series could be more closely mimicked.

The contributions to time-series analysis made by Yule [24] and Slutsky [19] in 1927 were complementary: in fact, the two authors grasped opposite ends of the same pole. For ten years, Slutsky's paper was available only in its

original Russian version; but its contents became widely known within a much shorter period.

Slutsky posed the same question as did Yule, and in much the same manner. Was it possible, he asked, that a definite structure of a connection between chaotically random elements could form them into a system of more or less regular waves? Slutsky proceeded to demonstrate this possibility by methods which were partly analytic and partly inductive. He discriminated between coherent series whose elements were serially correlated and incoherent or purely random series of the sort which we have described as white noise. By taking, as his basis, a purely random series obtained by the People's Commissariat of Finance in drawing the numbers of a government lottery loan, and by repeatedly taking moving summations, Slutsky was able to generate a series which closely mimicked an index, of a distinctly undulatory nature, of the English business cycle from 1855 to 1877.

As to the coherent series, he declared that their origin may be extremely varied, but it seems probable that an especially prominent role is played in nature by the process of moving summation with weights of one kind or another; by this process coherent series are obtained from other coherent series or from incoherent series.

The general form of Slutsky's moving summation can be expressed by writing

(6)    y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + · · · + µ_q ε(t − q),

where ε(t) is a white-noise process. This is nowadays called a qth-order moving-average process; and it is readily compared to an autoregressive process of the sort depicted under (5). The more general pth-order autoregressive process can be expressed by writing

(7)    α_0 y(t) + α_1 y(t − 1) + · · · + α_p y(t − p) = ε(t).

Thus, whereas the autoregressive process depends upon a linear combination of the function y(t) with its own lagged values, the moving-average process depends upon a similar combination of the function ε(t) with its lagged values. The affinity of the two sorts of process is further confirmed when it is recognised that an autoregressive process of finite order is equivalent to a moving-average process of infinite order and that, conversely, a finite-order moving-average process is just an infinite-order autoregressive process.
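Slutsky's experiment is easily repeated in miniature. The following sketch, assuming Python with numpy, applies a simple two-term moving summation several times over to a white-noise series; the number of passes and the series length are arbitrary choices of my own:

```python
import numpy as np

# Repeated moving summation of a purely random ("incoherent") series
# produces a smooth, distinctly undulatory ("coherent") series.
rng = np.random.default_rng(4)
x = rng.standard_normal(600)

for _ in range(8):                 # eight passes of a two-term moving summation
    x = x[1:] + x[:-1]

x = x / x.std()                    # rescale for comparison
print(np.round(x[:10], 2))         # slowly undulating values
```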

4. Generalised Harmonic Analysis

The next step to be taken in the development of the theory of time series was to generalise the traditional method of periodogram analysis in such a way as to overcome the problems which arise when the model depicted under (1) is clearly inappropriate.

At first sight, it would not seem possible to describe a covariance-stationary process, whose only regularities are statistical ones, as a linear combination of perfectly regular periodic components. However, any difficulties which we might envisage can be overcome if we are prepared to accept a description which is in terms of a non-denumerable infinity of periodic components. Thus, on replacing the so-called Fourier sum within equation (1) by a Fourier integral, and by deleting the term ε(t), whose effect is now absorbed by the integrand, we obtain an expression in the form of

(8)    y(t) = ∫_0^π { cos(ωt)dA(ω) + sin(ωt)dB(ω) }.

Here we write dA(ω) and dB(ω) rather than α(ω)dω and β(ω)dω because there can be no presumption that the functions A(ω) and B(ω) are continuous. As it stands, this expression is devoid of any statistical interpretation. Moreover, if we are talking of only a single realisation of the process y(t), then the generalised functions A(ω) and B(ω) will reflect the unique peculiarities of that realisation and will not be amenable to any systematic description.

However, a fruitful interpretation can be given to these functions if we consider the observable sequence y(t) = {y_t; t = 0, ±1, ±2, ...} to be a particular realisation which has been drawn from an infinite population representing all possible realisations of the process. For, if this population is subject to statistical regularities, then it is reasonable to regard dA(ω) and dB(ω) as mutually uncorrelated random variables with well-defined distributions which depend upon the parameters of the population.

We may therefore assume that, for any value of ω,

(9)    E{dA(ω)} = E{dB(ω)} = 0  and  E{dA(ω)dB(ω)} = 0.

Moreover, to express the discontinuous nature of the generalised functions, we assume that, for any two distinct values ω and λ in their domain, we have

(10)    E{dA(ω)dA(λ)} = E{dB(ω)dB(λ)} = 0,

which means that A(ω) and B(ω) are stochastic processes—indexed on the frequency parameter ω rather than on time—which are uncorrelated in non-overlapping intervals. Finally, we assume that dA(ω) and dB(ω) have a common variance, so that

(11)    V{dA(ω)} = V{dB(ω)} = dG(ω).

Given the assumption of the mutual uncorrelatedness of dA(ω) and dB(ω), it therefore follows from (8) that the variance of y(t) is expressible as

(12)    V{y(t)} = ∫_0^π { cos²(ωt)V{dA(ω)} + sin²(ωt)V{dB(ω)} }

                = ∫_0^π dG(ω).

The function G(ω), which is called the spectral distribution, tells us how much of the variance is attributable to the periodic components whose frequencies range continuously from 0 to ω. If none of these components contributes more than an infinitesimal amount to the total variance, then the function G(ω) is absolutely continuous, and we can write dG(ω) = g(ω)dω under the integral of equation (12). The new function g(ω), which is called the spectral density function or the spectrum, is directly analogous to the function expressing the squared amplitude which is associated with each component in the simple harmonic model discussed in our earlier sections. A series of a more regular nature would be generated if the spectrum were more narrowly concentrated around its modal value.

Figure 6. The spectrum of the process y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t) which generated the series in Figure 4.
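The shape of the spectrum in Figure 6 is readily reproduced, up to a scale factor which depends on the variance normalisation adopted. The following is a minimal sketch, assuming Python with numpy; only Yule's coefficients are taken from the text:

```python
import numpy as np

# The spectrum of the AR(2) process y(t) = phi1*y(t-1) + phi2*y(t-2) + e(t)
# is proportional to 1/|1 - phi1*exp(-iw) - phi2*exp(-2iw)|^2.
phi1, phi2 = 1.343, -0.655
w = np.linspace(0.0, np.pi, 9)
alpha_w = 1.0 - phi1 * np.exp(-1j * w) - phi2 * np.exp(-2j * w)
g = 1.0 / np.abs(alpha_w) ** 2
print(np.round(g / g.max(), 3))   # peaks near w = 0.56, a little below the
                                  # angular velocity found for the roots
```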

5. Smoothing the Periodogram

It might be imagined that there is little hope of obtaining worthwhile estimates of the parameters of the population from which the single available realisation y(t) has been drawn. However, provided that y(t) is a stationary process, and provided that the statistical dependencies between widely separated elements are weak, the single realisation contains all the information which is necessary for the estimation of the spectral density function. In fact, a modified version of the traditional periodogram analysis is sufficient for the purpose of estimating the spectral density.

In some respects, the problems posed by the estimation of the spectral density are similar to those posed by the estimation of a continuous probability density function of unknown functional form. It is fruitless to attempt directly to estimate the ordinates of such a function. Instead, we might set about our task by constructing a histogram or bar chart to show the relative frequencies with which the observations that have been drawn from the distribution fall within broad intervals. Then, by passing a curve through the mid points of the tops of the bars, we could construct an envelope that might approximate to the sought-after density function. A more sophisticated estimation procedure would not group the observations into the fixed intervals of a histogram; instead it would record the number of observations falling within a moving interval. Moreover, a consistent method of estimation, which aims at converging upon the true function as the number of observations increases, would vary the width of the moving interval with the size of the sample, diminishing it sufficiently slowly as the sample size increases for the number of sample points falling within any interval to increase without bound.

A common method for estimating the spectral density is very similar to the one which we have described for estimating a probability density function. Instead of basing itself on raw sample observations, as does the method of density-function estimation, it bases itself upon the ordinates of a periodogram which has been fitted to the observations on y(t). This procedure for spectral estimation is therefore called smoothing the periodogram.

A disadvantage of the procedure, which for many years inhibited its widespread use, lies in the fact that calculating the periodogram by what would seem to be the obvious methods can be vastly time-consuming. Indeed, it was not until the mid 1960's that wholly practical computational methods were developed.

6. The Equivalence of the Two Domains

It is remarkable that such a simple technique as smoothing the periodogram should provide a theoretical resolution to the problems encountered by Beveridge and others in their attempts to detect the hidden periodicities in economic and astronomical data. Even more remarkable is the way in which

the generalised harmonic analysis that gave rise to the concept of the spectral density of a time series should prove itself to be wholly conformable with the alternative methods of time-series analysis in the time domain, which arose largely as a consequence of the failure of the traditional methods of periodogram analysis.

The synthesis of the two branches of time-series analysis was achieved independently and almost simultaneously in the early 1930's by Norbert Wiener [23] in America and A. Khintchine [12] in Russia. The Wiener–Khintchine theorem indicates that there is a one-to-one relationship between the autocovariance function of a stationary process and its spectral density function. The relationship is expressed, in one direction, by writing

(13)    g(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ cos(ωτ);  γ_τ = γ_{−τ};

where g(ω) is the spectral density function and {γ_τ; τ = 0, 1, 2, ...} is the sequence of the autocovariances of the series y(t).

The relationship is invertible in the sense that it is equally possible to express each of the autocovariances as a function of the spectral density:

(14)    γ_τ = ∫_{ω=0}^{π} cos(ωτ)g(ω)dω;  τ = 0, 1, 2, ... .

If we set τ = 0, then cos(ωτ) = 1, and we obtain, once more, the equation (12) which neatly expresses the way in which the variance γ_0 = V{y(t)} of the series y(t) is attributable to the constituent harmonic components; for g(ω) is simply the expected value of the squared amplitude of the component at frequency ω.

We have stated the relationships of the Wiener–Khintchine theorem in terms of the theoretical spectral density function g(ω) and the true autocovariance function {γ_τ; τ = 0, 1, 2, ...}. An analogous relationship holds between the periodogram I(ω_j) defined in (2) and the sample autocovariance function {c_τ; τ = 0, 1, ..., T − 1}, where c_τ = Σ(y_t − ȳ)(y_{t−τ} − ȳ)/T. Thus, in the appendix, we demonstrate the identity

(15)    I(ω_j) = 2 Σ_{τ=1−T}^{T−1} c_τ cos(ω_j τ);  c_τ = c_{−τ}.

The upshot of the Wiener–Khintchine theorem is that many of the techniques of time-series analysis can, in theory, be expressed in two mathematically equivalent ways, which may differ markedly in their conceptual qualities. Often, a problem which appears to be intractable from the point of view of one of the domains of time-series analysis becomes quite manageable when

translated into the other domain. A good example is provided by the matter of spectral estimation. Given that there are difficulties in computing all T of the ordinates of the periodogram when the sample size is large, we are impelled to look for a method of spectral estimation which depends not upon smoothing the periodogram but upon performing some equivalent operation upon the sequence of autocovariances. The fact that there is a one-to-one correspondence between the spectrum and the sequence of autocovariances assures us that this equivalent operation must exist, though there is, of course, no guarantee that it will be easy to perform.

In fact, the operation which we perform upon the sample autocovariances is simple. For, if the sequence of autocovariances {c_τ; τ = 0, 1, ..., T − 1} in (15) is replaced by a modified sequence {w_τ c_τ; τ = 0, 1, ..., T − 1} incorporating a specially devised set of declining weights {w_τ; τ = 0, 1, ..., T − 1}, then an effect which is much the same as that of smoothing the periodogram can be achieved. Moreover, it may be relatively straightforward to calculate the weighted autocovariance function.

The task of devising appropriate sets of weights provided a major research topic in time-series analysis in the 1950's and early 1960's. Together with the task of devising equivalent procedures for smoothing the periodogram, it came to be known as spectral carpentry.

Figure 7. The periodogram of Wolfer's Sunspot Numbers 1749–1924.
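The weighting operation just described is easily sketched. The following assumes Python with numpy and, for simplicity, uses the triangular (Bartlett) weights w_τ = 1 − τ/M as a stand-in for the more refined schemes, such as Parzen's, mentioned in the text; the MA(1) test series is an illustration of my own:

```python
import numpy as np

def smoothed_spectrum(y, M, omegas):
    """A smoothed counterpart of the periodogram of (15): the sample
    autocovariances c_tau are damped by declining weights w_tau = 1 - tau/M
    before the cosine transformation is applied."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    T = len(y)
    c = np.array([np.sum(y[tau:] * y[:T - tau]) / T for tau in range(M)])
    w = 1.0 - np.arange(M) / M
    taus = np.arange(1, M)
    return 2.0 * (c[0] + 2.0 * np.sum((w[1:] * c[1:])[None, :]
                                      * np.cos(np.outer(omegas, taus)), axis=1))

rng = np.random.default_rng(5)
e = rng.standard_normal(1000)
y = e[1:] + 0.9 * e[:-1]          # an MA(1) series with a smooth spectrum
print(np.round(smoothed_spectrum(y, M=30, omegas=np.linspace(0, np.pi, 5)), 2))
```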

4 0. 7.2 0 π/4 π/2 3π/4 π Figure 8. 14 . allied with advances in computer technology. A major development in the frequency domain occurred when Cooley and Tukey [7] described an algorithm which greatly reduces the eﬀort involved in computing the periodogram. it seems that time-series analysis reached its maturity in the 1970’s when signiﬁcant developments occurred in both of its domains. These authors developed the timedomain methodology by collating some of its major themes and by applying it to such important functions as forecasting and control. Many of the current practitioners of time-series analysis have learnt their skills in recent years during a time when the subject has been expanding rapidly. The Fast Fourier Transform. The contemporaneous developments in the time domain were inﬂuenced by an important book by Box and Jenkins [5].8 0.THE METHODS OF TIME-SERIES ANALYSIS 0. They demonstrated how wide had become the scope of time-series analysis by applying it to problems as diverse as the forecasting of airline passenger numbers and the analysis of combustion processes in a gas furnace. The spectrum of the sunspot numbers calculated from the autocovariances using Parzen’s [15] system of weights. They also adapted the methodology to the computer. One might be surprised to hear. has enabled the routine analysis of extensive sets of data. and it has transformed the procedure of smoothing the periodogram into a practical method of spectral estimation. The Maturing of Time-Series Analysis In retrospect. it is diﬃcult for them to gauge the signiﬁcance of the recent practical advances. as this algorithm has come to be known. Lacking a longer perspective. for example.6 0.

that as late as 1971 Granger and Hughes [8] were capable of declaring that Beveridge's calculation of the periodogram of the Wheat Price Index, comprising 300 ordinates, was the most extensive calculation of its type to date. Nowadays, computations of this order are performed on a routine basis using microcomputers containing specially designed chips which are dedicated to the purpose.

The effect of the revolution in digital electronic computing upon the practicability of time-series analysis can be gauged by inspecting the purely mechanical devices (such as the Henrici–Conradi and Michelson–Stratton harmonic analysers invented in the 1890's) which were once used, with very limited success, to grapple with problems which are nowadays almost routine. These devices, some of which are displayed in London's Science Museum, also serve to remind us that many of the developments of applied mathematics which startle us with their modernity were foreshadowed many years ago.

The rapidity of the recent developments also belies the fact that time-series analysis has had a long history. The search for hidden periodicities was a dominant theme of 19th-century science. It has been transmogrified through the refinements of Wiener's generalised harmonic analysis, which has enabled us to understand how cyclical phenomena can arise out of the aggregation of random causes. The frequency domain of time-series analysis, to which the idea of the harmonic decomposition of a function is central, is an inheritance from Euler (1707–1783), d'Alembert (1717–1783), Lagrange (1736–1813) and Fourier (1768–1830). The parts of time-series analysis which bear a truly 20th-century stamp are the time-domain models which originate with Slutsky and Yule and the computational technology which renders the methods of both domains practical.

Mathematical Appendix

Mathematical Expectations

The mathematical expectation or the expected value of a random variable x is defined by

(i)    E(x) = ∫_{x=−∞}^{∞} x dF(x),

where F(x) is the probability distribution function of x. The probability distribution function is defined by the expression F(x*) = P{x < x*}, which denotes the probability that x assumes a value less than x*. If F(x) is a continuous function, then we can write dF(x) = f(x)dx in equation (i). The function f(x) = dF(x)/dx is called the probability density function.

If y(t) = {y_t; t = 0, ±1, ±2, ...} is a stationary stochastic process, then E(y_t) = µ is the same value for all t.

If y_0, ..., y_{T−1} is a sample of T values generated by the process, then we may estimate µ from the sample mean

(ii)    ȳ = (1/T) Σ_{t=0}^{T−1} y_t.

Autocovariances

The autocovariance of lag τ of the stationary stochastic process y(t) is defined by

(iii)    γ_τ = E{(y_t − µ)(y_{t−τ} − µ)}.

The autocovariance of lag τ provides a measure of the relatedness of the elements of the sequence y(t) which are separated by τ time periods.

The variance, which is denoted by V{y(t)} = γ_0 and defined by

(iv)    γ_0 = E{(y_t − µ)²},

is a measure of the dispersion of the elements of y(t). It is formally the autocovariance of lag zero.

If y_t and y_{t−τ} are statistically independent, then their joint probability density function is the product of their individual probability density functions, so that f(y_t, y_{t−τ}) = f(y_t)f(y_{t−τ}). It follows that

(v)    γ_τ = E(y_t − µ)E(y_{t−τ} − µ) = 0 for all τ ≠ 0.

If y_0, ..., y_{T−1} is a sample from the process, and if τ < T, then we may estimate γ_τ from the sample autocovariance or empirical autocovariance of lag τ:

(vi)    c_τ = (1/T) Σ_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ).

The Periodogram and the Autocovariance Function

The periodogram is defined by

(vii)    I(ω_j) = (2/T) { [Σ_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ)]² + [Σ_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ)]² }.

The identity Σ_t cos(ω_j t)(y_t − ȳ) = Σ_t cos(ω_j t)y_t follows from the fact that, by construction, Σ_t cos(ω_j t) = 0 for all j. Hence the above expression has the same value as the expression in (2).

Expanding the expression in (vii) gives

(viii)    I(ω_j) = (2/T) Σ_t Σ_s cos(ω_j t)cos(ω_j s)(y_t − ȳ)(y_s − ȳ)
                 + (2/T) Σ_t Σ_s sin(ω_j t)sin(ω_j s)(y_t − ȳ)(y_s − ȳ).

Next, by using the identity cos(A)cos(B) + sin(A)sin(B) = cos(A − B), we can rewrite this as

(ix)    I(ω_j) = (2/T) Σ_t Σ_s cos(ω_j [t − s])(y_t − ȳ)(y_s − ȳ).

Then, on defining τ = t − s and writing c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can reduce the latter expression to

(x)    I(ω_j) = 2 Σ_{τ=1−T}^{T−1} cos(ω_j τ)c_τ,

which appears in the text as equation (15).

References

[1] Alberts, W. W., L. E. Wright and B. Feinstein (1965), "Physiological Mechanisms of Tremor and Rigidity in Parkinsonism," Confinia Neurologica, 26, 318–327.

[2] Beveridge, Sir W. H. (1921), "Weather and Harvest Cycles," Economic Journal, 31, 429–452.

[3] Beveridge, Sir W. H. (1922), "Wheat Prices and Rainfall in Western Europe," Journal of the Royal Statistical Society, 85, 412–478.

[4] Box, G. E. P., and D. R. Cox (1964), "An Analysis of Transformations," Journal of the Royal Statistical Society, Series B, 26, 211–243.

[5] Box, G. E. P., and G. M. Jenkins (1970), Time Series Analysis, Forecasting and Control, Holden–Day: San Francisco.

[6] Buys-Ballot, C. H. D. (1847), Les Changements Périodiques de Température, Utrecht.

[7] Cooley, J. W., and J. W. Tukey (1965), "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, 19, 297–301.

[8] Granger, C. W. J., and A. O. Hughes (1971), "A New Look at Some Old Data: The Beveridge Wheat Price Series," Journal of the Royal Statistical Society, Series A, 134, 413–428.
[9] Groves, G. W., and E. J. Hannan (1968), "Time-Series Regression of Sea Level on Weather," Review of Geophysics, 6, 129–174.
[10] Gudmundsson, G. (1971), "Time-Series Analysis of Imports, Exports and other Economic Variables," Journal of the Royal Statistical Society, Series A, 134, 383.
[11] Hasselmann, K., W. Munk and G. MacDonald (1963), "Bispectrum of Ocean Waves," in Time Series Analysis (M. Rosenblatt, ed.), 125–139, John Wiley and Sons: New York.
[12] Khintchine, A. (1934), "Korrelationstheorie der stationären stochastischen Prozesse," Mathematische Annalen, 109, 604–615.
[13] Lagrange, J. L. (1772, 1778), Oeuvres.
[14] Moore, H. L. (1914), Economic Cycles: Their Law and Cause, Macmillan: New York.
[15] Parzen, E. (1957), "On Consistent Estimates of the Spectrum of a Stationary Time Series," Annals of Mathematical Statistics, 28, 329–348.
[16] Rice, S. O. (1963), "Noise in FM Receivers," in Time Series Analysis (M. Rosenblatt, ed.), 395–422, John Wiley and Sons: New York.
[17] Schuster, Sir A. (1898), "On the Investigation of Hidden Periodicities with Application to a Supposed Twenty-Six Day Period of Meteorological Phenomena," Terrestrial Magnetism, 3, 13–41.
[18] Schuster, Sir A. (1906), "On the Periodicities of Sunspots," Philosophical Transactions of the Royal Society, Series A, 206, 69–100.
[19] Slutsky, E. (1937), "The Summation of Random Causes as the Source of Cyclical Processes," Econometrica, 5, 105–146.
[20] Tee, L. H., and S. M. Wu (1972), "An Application of Stochastic and Dynamic Models for the Control of a Papermaking Process," Technometrics, 14, 481–496.
[21] Tukey, J. W. (1965), "Data Analysis and the Frontiers of Geophysics," Science, 148, 1283–1289.
[22] Whittaker, E. T., and G. Robinson (1924), The Calculus of Observations, A Treatise on Numerical Mathematics, Blackie and Sons: London.
[23] Wiener, N. (1930), "Generalised Harmonic Analysis," Acta Mathematica, 35, 117–258.
[24] Yule, G. U. (1927), "On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers," Philosophical Transactions of the Royal Society, Series A, 226, 267–298.
[25] Yuzuriha, T. (1960), "The Autocorrelation Curves of Schizophrenic Brain Waves and the Power Spectrum," Psych. Neurol. Jap., 26, 911–924.

The paper describes the developments which led to the synthesis of the two branches of time-series analysis and it indicates how this synthesis was achieved. It remains true that the majority of time-series analysts operate principally in one or other of the two domains; and such specialisation is often influenced by the academic discipline to which the analyst adheres. However, it is clear that there are many advantages to be derived from pursuing the two modes of analysis concurrently.

Address for correspondence:

D.S.G. Pollock
Department of Economics
Queen Mary College
University of London
Mile End Road
London E1 4NS

Tel: +44-71-975-5096
Fax: +44-71-975-5500

LECTURE 7

Forecasting with ARMA Models

Minimum Mean-Square Error Prediction

Imagine that y(t) is a stationary stochastic process with E{y(t)} = 0. We may be interested in predicting values of this process several periods into the future on the basis of its observed history. This history is contained in the so-called information set. In practice, the latter is always a finite set {y_t, y_{t−1}, ..., y_{t−p}} representing the recent past; nevertheless, in developing the theory of prediction, it is also useful to consider an infinite information set I_t = {y_t, y_{t−1}, ..., y_{t−p}, ...} representing the entire past. We shall denote the prediction of y_{t+h} which is made at time t by ŷ_{t+h|t}, or by ŷ_{t+h} when it is clear that we are predicting h steps ahead.

The criterion which is commonly used in judging the performance of an estimator or predictor ŷ of a random variable y is its mean-square error, defined by E{(y − ŷ)²}. If all of the available information on y is summarised in its marginal distribution, then the minimum-mean-square-error prediction is simply the expected value E(y). However, if y is statistically related to another random variable x whose value can be observed, and if the form of the joint distribution of x and y is known, then the minimum-mean-square-error prediction of y is the conditional expectation E(y|x). This proposition may be stated formally:

(1)  Let ŷ = ŷ(x) be the conditional expectation of y given x, which is also expressed as ŷ = E(y|x). Then E{(y − ŷ)²} ≤ E{(y − π)²}, where π = π(x) is any other function of x.

Proof. Consider

(2)  E{(y − π)²} = E{[(y − ŷ) + (ŷ − π)]²}
          = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.

Within the second term, there is

(3)  E{(y − ŷ)(ŷ − π)} = ∫_x ∫_y (y − ŷ)(ŷ − π) f(x, y) ∂y ∂x
          = ∫_x { ∫_y (y − ŷ) f(y|x) ∂y } (ŷ − π) f(x) ∂x
          = 0.

Here the second equality depends upon the factorisation f(x, y) = f(y|x)f(x), which expresses the joint probability density function of x and y as the product of the conditional density function of y given x and the marginal density function of x. The final equality depends upon the fact that ∫(y − ŷ)f(y|x)∂y = E(y|x) − E(y|x) = 0. Therefore E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥ E{(y − ŷ)²}, and the assertion is proved.

The definition of the conditional expectation implies that

(4)  E(xy) = ∫_x ∫_y xy f(x, y) ∂y ∂x
      = ∫_x x { ∫_y y f(y|x) ∂y } f(x) ∂x = E(xŷ).

When the equation E(xy) = E(xŷ) is rewritten as

(5)  E{x(y − ŷ)} = 0,

it may be described as an orthogonality condition. This condition indicates that the prediction error y − ŷ is uncorrelated with x. The result is intuitively appealing; for, if the error were correlated with x, we should not be using the information of x efficiently in forming ŷ.

The proposition of (1) is readily generalised to accommodate the case where, in place of the scalar x, there is a vector x = [x_1, ..., x_p]'. This generalisation indicates that the minimum-mean-square-error prediction of y_{t+h} given the information in {y_t, y_{t−1}, ..., y_{t−p}} is the conditional expectation E(y_{t+h}|y_t, y_{t−1}, ..., y_{t−p}).

In order to determine this conditional expectation, we need to know the functional form of the joint probability density function of all of these variables. In lieu of precise knowledge, we are often prepared to assume that the distribution is normal. In that case, it follows that the conditional expectation of y_{t+h} is a linear function of {y_t, y_{t−1}, ..., y_{t−p}}; and so the problem of predicting y_{t+h} becomes a matter of forming a linear regression.

Even if we are not prepared to assume that the joint distribution of the variables is normal, we may be prepared, nevertheless, to base the prediction of y upon a linear function of {y_t, y_{t−1}, ..., y_{t−p}}. In that case, the criterion of minimum-mean-square-error linear prediction is satisfied by forming ŷ_{t+h} = φ_1 y_t + φ_2 y_{t−1} + ··· + φ_{p+1} y_{t−p} from the values φ_1, ..., φ_{p+1} which minimise

(6)  E{(y_{t+h} − ŷ_{t+h})²} = E{(y_{t+h} − Σ_{j=1}^{p+1} φ_j y_{t−j+1})²}
          = γ_0 − 2 Σ_j φ_j γ_{h+j−1} + Σ_i Σ_j φ_i φ_j γ_{i−j},

wherein γ_{i−j} = E(y_{t−i} y_{t−j}) denotes an autocovariance of the zero-mean process. This is a linear least-squares regression problem which leads to a set of p + 1 orthogonality conditions described as the normal equations:

(7)  E{(y_{t+h} − ŷ_{t+h}) y_{t−j+1}} = γ_{h+j−1} − Σ_{i=1}^{p+1} φ_i γ_{i−j} = 0,  j = 1, ..., p + 1.

In matrix terms, these are

(8)  [ γ_0    γ_1    ...  γ_p     ] [ φ_1     ]   [ γ_h     ]
     [ γ_1    γ_0    ...  γ_{p−1} ] [ φ_2     ] = [ γ_{h+1} ]
     [  :      :            :     ] [  :      ]   [  :      ]
     [ γ_p    γ_{p−1} ... γ_0     ] [ φ_{p+1} ]   [ γ_{h+p} ].

Notice that, for the one-step-ahead prediction of y_{t+1}, these are nothing but the Yule–Walker equations.

In the case of an optimal predictor which combines previous values of the series, it follows from the orthogonality principle that the forecast errors are uncorrelated with the previous predictions. A result of this sort is familiar to economists in connection with the so-called efficient-markets hypothesis. A financial market is efficient if the prices of the traded assets constitute optimal forecasts of their discounted future returns, which consist of interest and dividend payments and of capital gains. According to the hypothesis, the changes in asset prices will be uncorrelated with the past or present price levels, which is to say that asset prices will follow random walks. Moreover, it should not be possible for someone who is apprised only of the past history of asset prices to reap speculative profits on a systematic and regular basis.
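To make the normal equations concrete, here is a small numpy sketch which solves the Toeplitz system (8) for the coefficients φ_1, ..., φ_{p+1}; the function name is invented, and the autocovariances of a hypothetical AR(1) process are used as a check, since the one-step-ahead predictor should then recover the autoregressive parameter.

    import numpy as np

    def minimum_mse_predictor(gamma, h, p):
        # Solve the normal equations (8): a Toeplitz system in the
        # autocovariances gamma[0], ..., gamma[h + p].
        G = np.array([[gamma[abs(i - j)] for j in range(p + 1)]
                      for i in range(p + 1)])
        g = np.array([gamma[h + j] for j in range(p + 1)])
        return np.linalg.solve(G, g)    # phi_1, ..., phi_{p+1}

    # Check: an AR(1) process with parameter 0.8 has gamma_k
    # proportional to 0.8**k, so the predictor should be (0.8, 0, 0, 0).
    gamma = [0.8 ** k / (1.0 - 0.8 ** 2) for k in range(12)]
    print(np.round(minimum_mse_predictor(gamma, h=1, p=3), 6))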

Forecasting with ARMA Models

So far, we have avoided making specific assumptions about the nature of the process y(t). We are greatly assisted in the business of developing practical forecasting procedures if we can assume that y(t) is generated by an ARMA process such that

(9)  y(t) = {µ(L)/α(L)} ε(t) = ψ(L)ε(t).

We shall continue to assume, for the sake of simplicity, that the forecasts are based on the information contained in the infinite set {y_t, y_{t−1}, y_{t−2}, ...} = I_t comprising all values that have been taken by the variable up to the present time t. Knowing the parameters in ψ(L) enables us to recover the sequence {ε_t, ε_{t−1}, ε_{t−2}, ...} from the sequence {y_t, y_{t−1}, y_{t−2}, ...} and vice versa; so either of these constitute the information set. This equivalence implies that the forecasts may be expressed in terms of {y_t}, or in terms of {ε_t}, or as a combination of the elements of both sets.

Let us write the realisations of equation (9) as

(10)  y_{t+h} = {ψ_0 ε_{t+h} + ψ_1 ε_{t+h−1} + ··· + ψ_{h−1} ε_{t+1}} + {ψ_h ε_t + ψ_{h+1} ε_{t−1} + ···}.

Here the first term on the RHS embodies disturbances subsequent to the time t when the forecast is made, and the second term embodies disturbances which are within the information set {ε_t, ε_{t−1}, ε_{t−2}, ...}. Let us now define a forecasting function, based on the information set, which takes the form of

(11)  ŷ_{t+h|t} = {ρ_h ε_t + ρ_{h+1} ε_{t−1} + ···}.

Then, given that ε(t) is a white-noise process, it follows that the mean square of the error in the forecast h periods ahead is given by

(12)  E{(y_{t+h} − ŷ_{t+h})²} = σ_ε² Σ_{i=0}^{h−1} ψ_i² + σ_ε² Σ_{i=h}^{∞} (ψ_i − ρ_i)².

Clearly, the mean-square error is minimised by setting ρ_i = ψ_i; and so the optimal forecast is given by

(13)  ŷ_{t+h|t} = {ψ_h ε_t + ψ_{h+1} ε_{t−1} + ···}.

This might have been derived from the equation y(t + h) = ψ(L)ε(t + h), which generates the true value of y_{t+h}, simply by putting zeros in place of the unobserved disturbances ε_{t+1}, ε_{t+2}, ..., ε_{t+h} which lie in the future when the forecast is made.
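The ψ-weights of (9) can be generated by the long division of µ(L) by α(L), after which the forecast mean-square error of (12), with ρ_i = ψ_i, is a simple cumulative sum. The following sketch, with invented function names and an arbitrary ARMA(1, 1) example, illustrates this:

    import numpy as np

    def psi_weights(alpha, mu, n):
        # Long division mu(L)/alpha(L): since alpha(L)psi(L) = mu(L),
        # psi_i = mu_i - sum_j alpha_j * psi_{i-j}, alpha = [1, a1, ..., ap].
        psi = np.zeros(n)
        for i in range(n):
            acc = mu[i] if i < len(mu) else 0.0
            for j in range(1, min(i, len(alpha) - 1) + 1):
                acc -= alpha[j] * psi[i - j]
            psi[i] = acc / alpha[0]
        return psi

    def forecast_mse(psi, h, sigma2=1.0):
        # Equation (12) with rho_i = psi_i: only the first h terms remain.
        return sigma2 * np.sum(psi[:h] ** 2)

    psi = psi_weights([1.0, -0.5], [1.0, 0.3], 50)  # (1 - 0.5L)y = (1 + 0.3L)e
    print([round(forecast_mse(psi, h), 4) for h in (1, 2, 10)])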

Notice that, on the assumption that the process is stationary, the mean-square error of the forecast tends to the value

(14)  V{y(t)} = σ_ε² Σ ψ_i²

as the lead time h of the forecast increases. This is nothing but the variance of the process y(t).

The optimal forecast of (13) may also be derived by specifying that the forecast error should be uncorrelated with the disturbances up to the time of making the forecast. For, if the forecast errors were correlated with some of the elements of the information set, then, as we have noted before, we would not be using the information efficiently, and we could not be generating optimal forecasts. To demonstrate this result anew, let us consider the covariance between the forecast error and the disturbance ε_{t−i}:

(15)  E{(y_{t+h} − ŷ_{t+h}) ε_{t−i}} = Σ_{k=1}^{h} ψ_{h−k} E(ε_{t+k} ε_{t−i})
            + Σ_{j=0}^{∞} (ψ_{h+j} − ρ_{h+j}) E(ε_{t−j} ε_{t−i})
          = σ_ε² (ψ_{h+i} − ρ_{h+i}).

Here the final equality follows from the fact that

(16)  E(ε_{t−j} ε_{t−i}) = σ_ε² if i = j, and 0 if i ≠ j.

If the covariance in (15) is to be equal to zero for all values of i ≥ 0, then we must have ρ_i = ψ_i for all i, which means that the forecasting function must be the one that has been specified already under (13).

It is helpful, sometimes, to have a functional notation for describing the process which generates the h-steps-ahead forecast. The notation provided by Whittle (1963) is widely used. To derive this, let us begin by writing

(17)  y(t + h) = L^{−h} ψ(L) ε(t).

On the RHS, there are not only the lagged sequences {ε(t), ε(t − 1), ...} but also the sequences ε(t + 1) = L^{−1}ε(t), ..., ε(t + h) = L^{−h}ε(t), which are associated with negative powers of L, which serve to shift a sequence forwards in time. Let {L^{−h}ψ(L)}_+ be defined as the part of the operator containing only nonnegative powers of L. Then the forecasting function can be expressed as

(18)  ŷ(t + h|t) = {L^{−h}ψ(L)}_+ ε(t) = {ψ(L)/L^h}_+ {1/ψ(L)} y(t).

Example. Consider the ARMA(1, 1) process represented by the equation

(19)  (1 − φL)y(t) = (1 − θL)ε(t).

The function which generates the sequence of forecasts h steps ahead is given by

(20)  ŷ(t + h|t) = { L^{−h} [1 + (φ − θ)L/(1 − φL)] }_+ ε(t)
          = φ^{h−1} {(φ − θ)/(1 − φL)} ε(t)
          = φ^{h−1} {(φ − θ)/(1 − θL)} y(t).

When θ = 0, this gives the simple result that ŷ(t + h|t) = φ^h y(t).

Generating the Forecasts Recursively

We have already seen that the optimal (minimum-mean-square-error) forecast of y_{t+h} can be regarded as the conditional expectation of y_{t+h} given the information set I_t, which comprises the values of {ε_t, ε_{t−1}, ε_{t−2}, ...} or, equally, the values of {y_t, y_{t−1}, y_{t−2}, ...}. On taking expectations of y(t) and ε(t) conditional on I_t, we find that

(21)  E(y_{t+k}|I_t) = ŷ_{t+k|t} if k > 0;   E(y_{t−j}|I_t) = y_{t−j} if j ≥ 0;
      E(ε_{t+k}|I_t) = 0 if k > 0;           E(ε_{t−j}|I_t) = ε_{t−j} if j ≥ 0.

In this notation, the forecast h periods ahead is

(22)  E(y_{t+h}|I_t) = Σ_{k=1}^{h} ψ_{h−k} E(ε_{t+k}|I_t) + Σ_{j=0}^{∞} ψ_{h+j} E(ε_{t−j}|I_t)
           = Σ_{j=0}^{∞} ψ_{h+j} ε_{t−j}.

In practice, the forecasts may be generated using a recursion based on the equation

(23)  y(t) = −{α_1 y(t − 1) + α_2 y(t − 2) + ··· + α_p y(t − p)}
          + µ_0 ε(t) + µ_1 ε(t − 1) + ··· + µ_q ε(t − q).

By taking the conditional expectation of this function, we get

(24)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + ··· + α_p ŷ_{t+h−p}} + µ_h ε_t + ··· + µ_q ε_{t+h−q}  when 0 < h ≤ p, q,

(25)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + ··· + α_p ŷ_{t+h−p}} + µ_h ε_t + ··· + µ_q ε_{t+h−q}  if p < h ≤ q,

(26)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + ··· + α_p ŷ_{t+h−p}}  if q < h ≤ p,

and

(27)  ŷ_{t+h} = −{α_1 ŷ_{t+h−1} + ··· + α_p ŷ_{t+h−p}}  when p, q < h,

with the understanding that ŷ_{t+h−j} = y_{t+h−j} whenever h − j ≤ 0. It may be assumed that none of the roots of α(L) = 0 lie inside the unit circle; for, if there were roots inside the circle, then the process would be radically unstable. The p values of ŷ(t) from t = r = max(p, q) to t = r − p + 1 serve as the starting values for the equation. It can be seen from (27) that, for h > p, q, the forecasting function becomes a pth-order homogeneous difference equation in ŷ; so the behaviour of the forecast function beyond the reach of the starting values can be characterised in terms of the roots of the autoregressive operator.

If all of the roots are less than unity, then ŷ_{t+h} will converge to zero as h increases. If one of the roots of α(L) = 0 is unity, then we have an ARIMA(p, 1, q) model, and the general solution of the homogeneous equation of (27) will include a constant term which represents the product of the unit root with a coefficient which is determined by the starting values; hence the forecast will tend to a nonzero constant. If two of the roots are unity, then the general solution will embody a linear time trend which is the asymptote to which the forecasts will tend. In general, if d of the roots are unity, then the general solution will comprise a polynomial in t of order d − 1.

The forecasts can be updated easily once the coefficients in the expansion of ψ(L) = µ(L)/α(L) have been obtained. Consider

(28)  ŷ_{t+h|t+1} = {ψ_{h−1} ε_{t+1} + ψ_h ε_t + ψ_{h+1} ε_{t−1} + ···} and
      ŷ_{t+h|t} = {ψ_h ε_t + ψ_{h+1} ε_{t−1} + ψ_{h+2} ε_{t−2} + ···}.

The first of these is the forecast for h − 1 periods ahead made at time t + 1, whilst the second is the forecast for h periods ahead made at time t. It can be seen that

(29)  ŷ_{t+h|t+1} = ŷ_{t+h|t} + ψ_{h−1} ε_{t+1},

where ε_{t+1} = y_{t+1} − ŷ_{t+1|t} is the current disturbance at time t + 1. The latter is also the prediction error of the one-step-ahead forecast made at time t.
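The recursions (24)–(27) translate directly into code: beyond the sample, forecasts feed back into the autoregression, while the moving-average terms survive only so long as the index t + h − k points at an observed disturbance. The sketch below assumes that the residuals have already been obtained; its function name is invented.

    def arma_forecasts(y, e, alpha, mu, H):
        # y: observed series; e: its residuals; alpha = [1, a1, ..., ap];
        # mu = [m0, m1, ..., mq]; returns forecasts for h = 1, ..., H.
        p, q = len(alpha) - 1, len(mu) - 1
        yy = list(y)                   # to be extended by the forecasts
        n = len(y)
        for h in range(1, H + 1):
            ar = -sum(alpha[j] * yy[n + h - 1 - j] for j in range(1, p + 1))
            ma = sum(mu[k] * e[n + h - 1 - k]      # observed disturbances only
                     for k in range(h, q + 1) if 0 <= n + h - 1 - k < n)
            yy.append(ar + ma)
        return yy[n:]

The updating formula (29) then allows the whole set of forecasts to be revised at negligible cost when the next observation arrives.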

Example. For an example of the analytic form of the forecast function, we may consider the integrated autoregressive (IAR) process defined by

(30)  {1 − (1 + φ)L + φL²} y(t) = ε(t), wherein φ ∈ (0, 1).

The solution of the homogeneous difference equation

(31)  {1 − (1 + φ)L + φL²} ŷ(t + h|t) = 0,

which defines the forecast function, is

(32)  ŷ(t + h|t) = c_1 + c_2 φ^h,

where c_1 and c_2 are constants which reflect the initial conditions. The roots of the auxiliary equation z² − (1 + φ)z + φ = 0 are z = 1 and z = φ. The constants are found by solving the equations

(33)  y_{t−1} = c_1 + c_2 φ^{−1},   y_t = c_1 + c_2.

The solutions are

(34)  c_1 = (y_t − φ y_{t−1})/(1 − φ)  and  c_2 = {φ/(φ − 1)}(y_t − y_{t−1}).

The long-term forecast is ȳ = c_1, which is the asymptote to which the forecasts tend as the lead period h increases.

Ad-hoc Methods of Forecasting

There are some time-honoured methods of forecasting which, when analysed carefully, reveal themselves to be the methods which are appropriate to some simple ARIMA models which might be suggested by a priori reasoning. Two of the leading examples are provided by the method of exponential smoothing and the Holt–Winters method of trend extrapolation.

Exponential Smoothing. A common forecasting procedure is exponential smoothing. This depends upon taking a weighted average of past values of the time series with the weights following a geometrically declining pattern. The function generating the one-step-ahead forecasts can be written as

(35)  ŷ(t + 1|t) = {(1 − θ)/(1 − θL)} y(t)
          = (1 − θ){y(t) + θ y(t − 1) + θ² y(t − 2) + ···}.

On multiplying both sides of this equation by 1 − θL and rearranging, we get

(36)  ŷ(t + 1|t) = θ ŷ(t|t − 1) + (1 − θ)y(t),

which shows that the current forecast for one step ahead is a convex combination of the previous forecast and the value which actually transpired.
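Equations (35) and (36) suggest two implementations which are equivalent up to start-up effects: the geometrically weighted average and the recursive convex combination. A minimal sketch, with invented names and an arbitrary start-up value:

    import numpy as np

    def smooth_weighted(y, theta):
        # Equation (35): (1 - theta) * sum_k theta**k * y[t - k].
        w = (1 - theta) * theta ** np.arange(len(y))
        return float(np.sum(w * y[::-1]))

    def smooth_recursive(y, theta):
        # Equation (36): the forecast is updated as a convex combination.
        f = y[0]                   # start-up value; its effect decays as theta**t
        for value in y:
            f = theta * f + (1 - theta) * value
        return f

    y = np.array([10.0, 11.0, 10.5, 11.5, 12.0])
    print(smooth_weighted(y, 0.4), smooth_recursive(y, 0.4))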

The method of exponential smoothing corresponds to the optimal forecasting procedure for the ARIMA(0, 1, 1) model (1 − L)y(t) = (1 − θL)ε(t), which is better described as an IMA(1, 1) model. To see this, let us consider the ARMA(1, 1) model y(t) − φy(t − 1) = ε(t) − θε(t − 1). This gives

(37)  ŷ(t + 1|t) = φy(t) − θε(t)
          = φy(t) − θ {(1 − φL)/(1 − θL)} y(t)
          = [{(1 − θL)φ − (1 − φL)θ}/(1 − θL)] y(t)
          = {(φ − θ)/(1 − θL)} y(t).

On setting φ = 1, which converts the ARMA(1, 1) model to an IMA(1, 1) model, we obtain precisely the forecasting function of (35).

The Holt–Winters Method. The Holt–Winters algorithm is useful in extrapolating local linear trends. The prediction h periods ahead of a series y(t) = {y_t; t = 0, ±1, ±2, ...} which is made at time t is given by

(38)  ŷ_{t+h|t} = α̂_t + β̂_t h,

where

(39)  α̂_t = λy_t + (1 − λ)(α̂_{t−1} + β̂_{t−1}) = λy_t + (1 − λ)ŷ_{t|t−1}

is the estimate of an intercept or levels parameter formed at time t, and

(40)  β̂_t = µ(α̂_t − α̂_{t−1}) + (1 − µ)β̂_{t−1}

is the estimate of the slope parameter, likewise formed at time t. The coefficients λ, µ ∈ (0, 1] are the smoothing parameters.

The algorithm may also be expressed in error-correction form. Let

(41)  e_t = y_t − ŷ_{t|t−1} = y_t − α̂_{t−1} − β̂_{t−1}

be the error at time t arising from the prediction of y_t on the basis of information available at time t − 1.
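A compact implementation of the recursions (38)–(40) follows; the start-up estimates of the level and slope are a common ad-hoc choice rather than part of the algorithm as stated, and the function name is invented.

    def holt_winters_forecast(y, lam, mu):
        # Level and slope recursions (39)-(40); returns the h-step-ahead
        # forecast function (38) from the end of the sample.
        a, b = y[0], y[1] - y[0]            # crude start-up values
        for value in y[1:]:
            a_new = lam * value + (1 - lam) * (a + b)   # equation (39)
            b = mu * (a_new - a) + (1 - mu) * b         # equation (40)
            a = a_new
        return lambda h: a + b * h                      # equation (38)

    f = holt_winters_forecast([10.0, 12.0, 13.0, 15.0, 16.0, 18.0],
                              lam=0.5, mu=0.3)
    print(f(1), f(4))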

Then the formula for the levels parameter can be given as

(42)  α̂_t = λe_t + ŷ_{t|t−1} = λe_t + α̂_{t−1} + β̂_{t−1},

which, on rearranging, becomes

(43)  α̂_t − α̂_{t−1} = λe_t + β̂_{t−1}.

When the latter is drafted into equation (40), we get an analogous expression for the slope parameter:

(44)  β̂_t = µ(λe_t + β̂_{t−1}) + (1 − µ)β̂_{t−1} = λµ e_t + β̂_{t−1}.

In order to reveal the underlying nature of this method, it is helpful to combine the two equations (42) and (44) in a simple state-space model:

(45)  [ α̂(t) ]   [ 1  1 ] [ α̂(t − 1) ]   [ λ  ]
      [ β̂(t) ] = [ 0  1 ] [ β̂(t − 1) ] + [ λµ ] e(t).

This can be rearranged to give

(46)  [ 1 − L    −L   ] [ α̂(t) ]   [ λ  ]
      [ 0      1 − L  ] [ β̂(t) ] = [ λµ ] e(t).

The solution of the latter is

(47)  [ α̂(t) ]       1      [ 1 − L    L    ] [ λ  ]
      [ β̂(t) ] = --------- [ 0      1 − L  ] [ λµ ] e(t).
                 (1 − L)²

Therefore, from (38), it follows that

(48)  ŷ(t + 1|t) = α̂(t) + β̂(t) = {(λ + λµ)e(t) − λe(t − 1)}/(1 − L)².

This can be recognised as the forecasting function of an IMA(2, 2) model of the form

(49)  (I − L)² y(t) = µ_0 ε(t) + µ_1 ε(t − 1) + µ_2 ε(t − 2)

for which

(50)  ŷ(t + 1|t) = {µ_1 ε(t) + µ_2 ε(t − 1)}/(1 − L)².

The Local Trend Model. There are various arguments which suggest that an IMA(2, 2) model might be a natural model to adopt. The simplest of these arguments arises from an elaboration of a second-order random walk which adds an ordinary white-noise disturbance to the trend. The resulting model may be expressed in two equations:

(51)  (I − L)² ξ(t) = ν(t),   y(t) = ξ(t) + η(t),

where ν(t) and η(t) are mutually independent white-noise processes. Combining the equations, and using the notation ∇ = I − L, gives

(52)  y(t) = ν(t)/∇² + η(t) = {ν(t) + ∇²η(t)}/∇².

Here the numerator ν(t) + ∇²η(t) = {ν(t) + η(t)} − 2η(t − 1) + η(t − 2) constitutes a second-order MA process.

Slightly more elaborate models with the same outcome have also been proposed. Thus the so-called structural model consists of the equations

(53)  y(t) = µ(t) + ε(t),
      µ(t) = µ(t − 1) + β(t − 1) + η(t),
      β(t) = β(t − 1) + ζ(t),

wherein ε(t), η(t) and ζ(t) are mutually independent white-noise processes. Working backwards from the final equation gives

(54)  β(t) = ζ(t)/∇,
      µ(t) = ζ(t − 1)/∇² + η(t)/∇ = {ζ(t − 1) + ∇η(t)}/∇²,
      y(t) = {ζ(t − 1) + ∇η(t) + ∇²ε(t)}/∇².

Once more, the numerator constitutes a second-order MA process.
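The claim that the twice-differenced observations of the local trend model follow an MA(2) process can be checked by simulation. In the sketch below (the variances and the random seed are arbitrary choices), the sample autocorrelations of ∇²y(t) should be appreciable at lags 1 and 2 and negligible thereafter:

    import numpy as np

    rng = np.random.default_rng(1)
    T = 20000
    nu = rng.normal(size=T)                # trend disturbance nu(t)
    eta = 0.5 * rng.normal(size=T)         # observation noise eta(t)
    xi = np.cumsum(np.cumsum(nu))          # (I - L)^2 xi(t) = nu(t)
    y = xi + eta                           # equation (51)

    d2y = np.diff(y, n=2)                  # the twice-differenced series
    c = np.array([np.mean(d2y[k:] * d2y[:len(d2y) - k]) for k in range(6)])
    print(np.round(c / c[0], 3))           # autocorrelations die out after lag 2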

Equivalent Forecasting Functions

Consider a model which combines a global linear trend with an autoregressive disturbance process:

(55)  y(t) = γ_0 + γ_1 t + {1/(I − φL)} ε(t), with |φ| < 1.

The formation of an h-step-ahead prediction is straightforward, for we can separate the forecast function into two additive parts.

The first part of the function is the extrapolation of the global linear trend. This takes the form of

(56)  ẑ_{t+h|t} = γ_0 + γ_1 (t + h) = z_t + γ_1 h, where z_t = γ_0 + γ_1 t.

The second part is the prediction associated with the AR(1) disturbance term η(t) = (I − φL)^{−1} ε(t). The following iterative scheme provides a recursive solution to the problem of generating the forecasts:

(57)  η̂_{t+1|t} = φη_t,   η̂_{t+2|t} = φη̂_{t+1|t},   η̂_{t+3|t} = φη̂_{t+2|t},   etc.

Notice that the analytic solution of the associated difference equation is just

(58)  η̂_{t+h|t} = φ^h η_t.

This reminds us that, whenever we can express the forecast function in terms of a linear recursion, we can also express it in an analytic form embodying the roots of a polynomial lag operator. Since, by assumption, |φ| < 1, it is clear that the contribution of the disturbance part to the overall forecast function

(59)  ŷ_{t+h|t} = ẑ_{t+h|t} + η̂_{t+h|t}

becomes negligible when h becomes large.

Now consider the limiting case when φ → 1. The operator in this case is the AR(1) operator I − φL; and, in place of an AR(1) disturbance process, we have to consider a random-walk process. We know that the forecast function of a random walk consists of nothing more than a constant.

It is intuitively clear that, if the random-walk process ∇z(t) = ε(t) is associated with a constant forecast function, and if z(t) = y(t) − γ_0 − γ_1 t, then y(t) will be associated with a linear forecast function. On adding the constant to the linear function ẑ_{t+h|t} = γ_0 + γ_1(t + h), we continue to have a simple linear forecast function.

Another way of looking at the problem depends upon writing equation (55) as

(60)  (I − φL){y(t) − γ_0 − γ_1 t} = ε(t).

Setting φ = 1 turns the operator I − φL into the difference operator I − L = ∇. But ∇γ_0 = 0 and ∇γ_1 t = γ_1, so equation (60) with φ = 1 can also be written as

(61)  ∇y(t) = γ_1 + ε(t).

This is the equation of a process which is described as a random walk with drift. Yet another way of expressing the process is via the equation y(t) = y(t − 1) + γ_1 + ε(t).

Finally, we should notice that the model of random walk with drift has the same linear forecast function as the model

(62)  ∇² y(t) = ε(t),

which has two unit roots in the AR operator. The purpose of this example has been to offer a limiting case where models with local stochastic trends (i.e. random-walk and unit-root models) and models with global polynomial trends come together.

LECTURE 8

The Identification of ARIMA Models

As we have established in a previous lecture, there is a one-to-one correspondence between the parameters of an ARMA(p, q) model, including the variance of the disturbance, and the leading p + q + 1 elements of the autocovariance function. Given the true autocovariances of a process, we might be able to discern the orders p and q of its autoregressive and moving-average operators; and, given these orders, we should then be able to deduce the values of the parameters.

There are two other functions, prominent in time-series analysis, from which it is possible to recover the parameters of an ARMA process. These are the partial autocorrelation function and the spectral density function. In theory, the business of identifying the model and of recovering its parameters can be conducted on the basis of any of them; and the appearance of each of these functions gives an indication of the nature of the underlying process to which it belongs. In practice, the process of identification is assisted by taking account of all three functions.

The empirical versions of the three functions which are used in a model-building exercise may differ considerably from their theoretical counterparts. Even when the data are truly generated by an ARMA process, the sampling errors which affect the empirical functions can lead one to identify the wrong model. This hazard is revealed by sampling experiments. When the data come from the real world, the notion that there is an underlying ARMA process is a fiction; and the business of model identification becomes more doubtful. Then there may be no such thing as the correct model, and the choice amongst alternative models must be made partly with a view to their intended uses.

The Autocorrelation Functions

The techniques of model identification which are most commonly used were propounded originally by Box and Jenkins (1972). Their basic tools were the sample autocorrelation function and the partial autocorrelation function. We shall describe these functions and their use separately from the spectral density function, which ought, perhaps, to be used more often in selecting models. The fact that the spectral density function is often overlooked is probably due to

an unfamiliarity with frequency-domain analysis on the part of many model builders.

Autocorrelation function (ACF). Given a sample y_0, y_1, ..., y_{T−1} of T observations, we define the sample autocorrelation function to be the sequence of values

(1)  r_τ = c_τ/c_0,  τ = 0, 1, ..., T − 1,

wherein

(2)  c_τ = (1/T) Σ_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ)

is the empirical autocovariance at lag τ and c_0 is the sample variance. One should note that, as the value of the lag increases, the number of observations comprised in the empirical autocovariance diminishes, until the final element c_{T−1} = T^{−1}(y_0 − ȳ)(y_{T−1} − ȳ) is reached, which comprises only the first and last mean-adjusted observations. In plotting the sequence {r_τ}, we shall omit the value of r_0, which is invariably unity. Moreover, in interpreting the plot, one should be wary of giving too much credence to the empirical autocorrelations at lag values which are significantly high in relation to the size of the sample.

Partial autocorrelation function (PACF). The sample partial autocorrelation p_τ at lag τ is simply the correlation between the two sets of residuals obtained from regressing the elements y_t and y_{t−τ} on the set of intervening values y_{t−1}, y_{t−2}, ..., y_{t−τ+1}. The partial autocorrelation measures the dependence between y_t and y_{t−τ} after the effect of the intervening values has been removed.

The sample partial autocorrelation p_τ is virtually the same quantity as the estimated coefficient of lag τ obtained by fitting an autoregressive model of order τ to the data; indeed, the difference between the two quantities vanishes as the sample size increases. The Durbin–Levinson algorithm provides an efficient way of computing the sequence {p_τ} of partial autocorrelations from the sequence {c_τ} of autocovariances. It can be seen, in view of this algorithm, that the information in {c_τ} is equivalent to the information contained jointly in {p_τ} and c_0. Therefore the sample autocorrelation function {r_τ} and the sample partial autocorrelation function {p_τ} are equivalent in terms of their information content. A short sketch of both computations is given below.

The Methodology of Box and Jenkins

The model-building methodology of Box and Jenkins, which relies heavily upon the two functions {r_τ} and {p_τ} defined above, involves a cycle comprising the three stages of model selection, model estimation and model checking. In view of the difficulties of selecting an appropriate model, it is envisaged that the cycle might have to be repeated several times and that, at the end, there might be more than one model of the same series.
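The following sketch computes the sample ACF of equations (1)–(2) and then obtains the partial autocorrelations by the Durbin–Levinson recursion; it is a minimal illustration with invented names, not a library routine.

    import numpy as np

    def sample_acf(y, maxlag):
        # Equations (1)-(2): r_tau = c_tau / c_0.
        T = len(y)
        d = y - y.mean()
        c = np.array([np.sum(d[k:] * d[:T - k]) / T for k in range(maxlag + 1)])
        return c / c[0]

    def sample_pacf(r):
        # Durbin-Levinson: p_k is the last coefficient of the order-k
        # autoregression implied by the autocorrelations r.
        p, phi = [1.0], []
        for k in range(1, len(r)):
            if k == 1:
                phi = [r[1]]
            else:
                a = (r[k] - np.dot(phi, r[k - 1:0:-1])) / (1.0 - np.dot(phi, r[1:k]))
                phi = [f - a * g for f, g in zip(phi, phi[::-1])] + [a]
            p.append(phi[-1])
        return np.array(p)

    rng = np.random.default_rng(3)
    y = rng.normal(size=240)
    for t in range(1, len(y)):             # an AR(1) test series
        y[t] += 0.7 * y[t - 1]
    print(np.round(sample_pacf(sample_acf(y, 10)), 2))   # cuts off after lag 1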

[Figure 1. The concentration readings from a chemical process, with the autocorrelation function and the autocorrelation function of the differences.]

Reduction to stationarity. The first step, which is taken before embarking on the cycle, is to examine the time plot of the data and to judge whether or not it could be the outcome of a stationary process. If a trend is evident in the data, then it must be removed, either by trend removal or by differencing. A variety of techniques of trend removal, which include the fitting of parametric curves and of spline functions, have been discussed in previous lectures. When such a function is fitted, it is to the sequence of residuals that the ARMA model is applied.

Box and Jenkins were inclined to believe that many empirical series can be modelled adequately by supposing that some suitable difference of the process is stationary. Thus the process generating the observed series y(t) might be modelled by the ARIMA(p, d, q) equation

(3)  α(L)∇^d y(t) = µ(L)ε(t),

wherein ∇^d = (I − L)^d is the dth power of the difference operator. In that case, the differenced series z(t) = ∇^d y(t) will be described by a stationary ARMA(p, q) model. The inverse operator ∇^{−1} is the summing or integrating operator, which accounts for the fact that the model depicted by equation (3) is described as an autoregressive integrated moving-average model.

To determine whether stationarity has been achieved, either by trend removal or by differencing, one may examine the autocorrelation sequence of the residual or processed series. The sequence corresponding to a stationary process should converge quite rapidly to zero as the value of the lag increases; an empirical autocorrelation function which exhibits a smooth pattern of significant values at high lags indicates a nonstationary series. An example is provided by Figure 1, where a comparison is made between the autocorrelation function of the original series and that of its differences. Although the original series does not appear to embody a systematic trend, it does drift in a haphazard manner which suggests a random walk; and it is appropriate to apply the difference operator.

Once the degree of differencing has been determined, the autoregressive and moving-average orders are selected by examining the sample autocorrelations and sample partial autocorrelations. The characteristics of pure autoregressive and pure moving-average processes are easily spotted; those of a mixed autoregressive moving-average model are not so easily unravelled.

Moving-average processes. The theoretical autocorrelation function {ρ_τ} of a pure moving-average process of order q has ρ_τ = 0 for all τ > q. The corresponding partial autocorrelation function {π_τ} is liable to decay towards zero gradually. To judge whether the corresponding sample autocorrelation function {r_τ} shows evidence of a truncation, we need some scale by which to judge the significance of the values of its elements.

[Figure 2. The graph of 120 observations on a simulated series generated by the MA(2) process y(t) = (1 + 0.90L + 0.81L²)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]

As a guide to determining whether the parent autocorrelations are in fact zero after lag q, we may use a result of Bartlett [1946] which shows that, for a sample of size T, the standard deviation of r_τ is approximately

(4)  (1/√T) {1 + 2(r_1² + r_2² + ··· + r_q²)}^{1/2}  for τ > q.

The result is also given by Fuller [1976, p. 237]. A simpler measure of the scale of the autocorrelations is provided by the limits of ±1.96/√T, which are the approximate 95% confidence bounds for the autocorrelations of a white-noise sequence. These bounds are represented by the dashed horizontal lines on the accompanying graphs.

Autoregressive processes. The theoretical autocorrelation function {ρ_τ} of a pure autoregressive process of order p obeys a homogeneous difference equation based upon the autoregressive operator α(L) = 1 + α_1 L + ··· + α_p L^p. That is to say,

(5)  ρ_τ = −(α_1 ρ_{τ−1} + ··· + α_p ρ_{τ−p})  for all τ ≥ p.

In general, the sequence generated by this equation will represent a mixture of damped exponential and sinusoidal functions. If the sequence is of a sinusoidal nature, then the presence of complex roots in the operator α(L) is indicated. One can expect the empirical autocovariance function of a pure AR process to be of the same nature as its theoretical parent.

It is the partial autocorrelation function which serves most clearly to identify a pure AR process. The theoretical partial autocorrelation function {π_τ} of an AR(p) process has π_τ = 0 for all τ > p. Likewise, all elements of the sample partial autocorrelation function are expected to be close to zero for lags greater than p, which corresponds to the fact that they are simply estimates of zero-valued parameters. The significance of the values of the partial autocorrelations is judged by the fact that, for a pth-order process, their standard deviations for all lags greater than p are approximated by 1/√T. Thus the bounds of ±1.96/√T are also plotted on the graph of the partial autocorrelation function.

Mixed processes. In the case of a mixed ARMA(p, q) process, neither the theoretical autocorrelation function nor the theoretical partial autocorrelation function has any abrupt cutoff. On their own, there is little that can be inferred from either of these functions or from their empirical counterparts beyond the fact that neither a pure MA model nor a pure AR model would be appropriate. Indeed, the autocovariance function of an ARMA(p, q) process is not easily distinguished from that of a pure AR process. In particular, its elements γ_τ satisfy the same difference equation as that of a pure AR model for all values of τ > max(p, q).
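Equation (4) is easily turned into a numerical check. The sketch below computes the Bartlett standard deviation and the corresponding ±1.96 bounds; with q = 0 it reproduces the white-noise rule. The function name is ours.

    import numpy as np

    def bartlett_sd(r, q, T):
        # Equation (4): approximate standard deviation of r_tau for tau > q.
        return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(r[1:q + 1]) ** 2)) / T)

    # With q = 0 the bound collapses to the white-noise value 1/sqrt(T).
    T = 120
    print(1.96 * bartlett_sd([1.0], q=0, T=T), 1.96 / np.sqrt(T))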

[Figure 3. The graph of 120 observations on a simulated series generated by the AR(2) process (1 − 1.69L + 0.81L²)y(t) = ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]

[Figure 4. The graph of 120 observations on a simulated series generated by the ARMA(2, 2) process (1 − 1.69L + 0.81L²)y(t) = (1 + 0.90L + 0.81L²)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]

There is good reason to regard mixed models as more appropriate in practice than pure models of either variety. For a start, there is the fact that a rational transfer function is far more effective in approximating an arbitrary impulse response than is an autoregressive transfer function, whose parameters are confined to the denominator, or a moving-average transfer function, which has its parameters in the numerator. Therefore, it might be appropriate, sometimes, to approximate a pure process of a high order by a more parsimonious mixed model.

Mixed models are also favoured by the fact that the sum of any two mutually independent autoregressive processes gives rise to an ARMA process. Let y(t) and z(t) be autoregressive processes of orders p and r respectively which are described by the equations α(L)y(t) = ε(t) and ρ(L)z(t) = η(t), wherein ε(t) and η(t) are mutually independent white-noise processes. Then their sum will be

(6)  y(t) + z(t) = ε(t)/α(L) + η(t)/ρ(L)
        = {ρ(L)ε(t) + α(L)η(t)}/{α(L)ρ(L)}
        = µ(L)ζ(t)/{α(L)ρ(L)},

where µ(L)ζ(t) = ρ(L)ε(t) + α(L)η(t) constitutes a moving-average process of order max(p, r). One can understand this feature of mixed models by recognising that the sum of a pure AR(p) process and a white-noise process is an ARMA(p, p) process.

In economics, where the data series are highly aggregated, mixed models would seem to be called for often. However, pure AR models perform poorly whenever the data are affected by errors of observation; a mixed model is liable to be more robust in this respect.

In the context of electrical and mechanical engineering, there may be some justification for pure AR models. Here there is often abundant data, sufficient to sustain the estimation of pure autoregressive models of high order; therefore the principle of parametric parsimony is less persuasive than it might be in an econometric context.

LECTURE 9

Nonparametric Estimation of the Spectral Density Function

The Spectrum and the Periodogram

The spectral density of a stochastic process is defined by

(1)  f(ω) = (1/2π){γ_0 + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ)},  ω ∈ [0, π].

The obvious way to estimate this function is to replace the unknown autocovariances {γ_τ} by the corresponding empirical moments {c_τ}, where

(2)  c_τ = (1/T) Σ_{t=τ}^{T−1} (y_{t−τ} − ȳ)(y_t − ȳ)  if τ ≤ T − 1.

Beyond a lag of τ = T − 1, the autocovariances are not estimable, since

(3)  c_{T−1} = (1/T)(y_0 − ȳ)(y_{T−1} − ȳ)

comprises only the first and the last elements of the sample; and therefore we must set c_τ = 0 when τ > T − 1. Thus we obtain a sample spectrum in the form of

(4)  f^r(ω) = (1/2π){c_0 + 2 Σ_{τ=1}^{T−1} c_τ cos(ωτ)}.

The sample spectrum defined in this way is just 1/4π times the periodogram of the sample, which is given by

(5)  I(ω_j) = 2{c_0 + 2 Σ_{τ=1}^{T−1} c_τ cos(ω_j τ)}
       = (2/T){[Σ_t y_t cos(ω_j t)]² + [Σ_t y_t sin(ω_j t)]²}
       = (T/2)(α_j² + β_j²),

where

(6)  α_j = (2/T) Σ_t y_t cos(ω_j t)  and  β_j = (2/T) Σ_t y_t sin(ω_j t).

As we have defined it above, the periodogram has just n ordinates, where n = T/2 when T is even and n = (T − 1)/2 when T is odd. These correspond to the frequency values

(7)  ω_j = 0, 2π/T, ..., (T − 1)π/T when T is odd, or ω_j = 0, 2π/T, ..., π when T is even.

The values of α_j and β_j which characterise the sample spectrum and the periodogram are precisely the ones which would result from fitting the regression model

(8)  y(t) = α_j cos(ω_j t) + β_j sin(ω_j t) + ε(t)

to the data y_0, ..., y_{T−1}. This is hardly surprising when we recall that the Fourier decomposition of the sample y_0, ..., y_{T−1} requires us to determine the T coefficients α_0, (α_1, β_1), ..., (α_{n−1}, β_{n−1}), α_n, in the case where T is even, from a total of T observations.

Although this method of estimating the spectrum via the periodogram may result, in some cases, in unbiased estimates of the corresponding ordinates of the spectral density function, it does not result in consistent estimates. For a set of parameters to be estimated consistently, we require that the amount of the relevant information which is available should increase with the size of the sample; and this cannot happen in the present case.

These conclusions can be illustrated quite simply in the case where y(t) = ε(t) is a white-noise sequence with a uniform spectrum f(ω) = σ²/2π over the range {−π ≤ ω ≤ π}. From the ordinary theory of linear regression, it follows that, if the population values which are estimated by α_j and β_j are in fact zero, which they must be on the assumption that y(t) = ε(t), then

(9)  (1/σ²){α_j² Σ_t cos²(ω_j t) + β_j² Σ_t sin²(ω_j t)} = T(α_j² + β_j²)/(2σ²) = I_j/σ²

has a chi-square distribution of two degrees of freedom. The variance of a chi-square distribution of k degrees of freedom is just 2k. Thus we find that V(I_j/σ²) = 4,

whence it follows that the variance of the spectral estimate f^r(ω_j) = I_j/4π is

(10)  V{f^r(ω_j)} = σ⁴/4π² = f²(ω_j).

Clearly, this value does not diminish as T increases.

A further consequence of using the periodogram directly to estimate the spectrum is that the estimators of f(ω_j) and f(ω_k) will be uncorrelated for all j ≠ k. This follows from the orthogonality of the sine and cosine functions which serve as a basis for the Fourier decomposition of the sample. The fact that adjacent values of the estimated spectrum are uncorrelated means that it will have a particularly volatile appearance.

Spectrum Averaging

One way of improving the properties of the estimate of f(ω_j) is to comprise within the estimator several adjacent values from the periodogram. In addition to the value of the periodogram at the point ω_j, the estimator comprises a further m adjacent values falling on either side. Thus we may define a new estimator in the form of

(11)  f^s(ω_j) = Σ_{k=−m}^{m} µ_k f^r(ω_{j−k}).

The set of weights {µ_{−m}, ..., µ_m} should sum to unity as well as being symmetric in the sense that µ_{−k} = µ_k. They define what is known as a spectral window.

Some obvious problems arise in defining values of the estimate towards the boundaries of the set of frequencies {ω_j; 0 ≤ ω_j ≤ π}. These problems can be overcome by treating the spectrum as symmetric about the points 0 and π, so that, for example, we define

(12)  f^s(π) = µ_0 f^r(π) + 2 Σ_{k=1}^{m} µ_k f^r(π − ω_k).

The estimate f^s(ω_j) comprises a total of M = 2m + 1 ordinates of the periodogram which span an interval of Q = 4mπ/T radians. This number of radians Q is the so-called bandwidth of the estimator. If Q is kept constant, then M increases at the same rate as T; but, in spite of the increasing sample size, we are denied the advantage of increasing the acuity or resolution of our estimation, so that narrow peaks in the spectrum, which have been smoothed over, may escape detection. Conversely, if we maintain the value of M, then the size of the bandwidth will decrease with T, and we may retain some of the disadvantages of the original periodogram. Ideally, we should allow M to increase at a slower rate than T so that, as M → ∞, we will have Q → 0.
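A sketch of the estimator (11)–(12) follows, using uniform weights and the reflection of the periodogram about 0 and π; the use of the FFT to obtain the ordinates is a convenience, not part of the definition, and the function name is invented.

    import numpy as np

    def smoothed_spectrum(y, m):
        # Equation (11) with uniform weights mu_k = 1/(2m + 1); boundary
        # ordinates are handled by reflection, as in equation (12).
        T = len(y)
        d = y - y.mean()
        I = 2.0 * np.abs(np.fft.rfft(d)) ** 2 / T   # periodogram ordinates
        f = I / (4.0 * np.pi)                       # sample spectrum f^r
        n = len(f)
        out = np.empty(n)
        for j in range(n):
            acc = 0.0
            for k in range(-m, m + 1):
                i = abs(j + k)                      # reflect about 0
                if i >= n:
                    i = 2 * (n - 1) - i             # reflect about pi
                acc += f[i]
            out[j] = acc / (2 * m + 1)
        return out

    rng = np.random.default_rng(4)
    print(smoothed_spectrum(rng.normal(size=256), m=4)[:3])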

Weighting in the Time Domain

An alternative approach to spectral estimation is to give differential weighting to the estimated autocovariances comprised in our formula for the sample spectrum, so that diminishing weights are given to the values of c_τ as τ increases. This seems reasonable, since the precision of these estimates decreases as τ increases. If the weights associated with the autocovariances c_0, c_1, ..., c_{T−1} are denoted by m_0, m_1, ..., m_{T−1}, then our revised estimator for the spectrum takes the form of

(13)  f^w(ω) = (1/2π){m_0 c_0 + 2 Σ_{τ=1}^{T−1} m_τ c_τ cos(ωτ)}.

The series of weights defines what is described as a lag window. If the weights are zero-valued beyond m_R, then we describe R as the truncation point.

A wide variety of lag windows have been defined. Amongst those which are used nowadays are the Tukey–Hanning window, defined by

(14)  m_τ = (1/2){1 + cos(πτ/R)},  0 ≤ τ ≤ R,

and the Parzen window, defined by

(15)  m_τ = 1 − 6(τ/R)² + 6(τ/R)³  for 0 ≤ τ ≤ R/2,
      m_τ = 2(1 − τ/R)³  for R/2 ≤ τ ≤ R.
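The estimator (13), with the Parzen weights of (15), can be sketched as follows; the truncation point R and the frequency grid are the user's choices, and the function names are invented.

    import numpy as np

    def parzen_weights(R):
        # Equation (15), evaluated on tau = 0, ..., R.
        x = np.arange(R + 1) / R
        return np.where(x <= 0.5, 1 - 6 * x**2 + 6 * x**3, 2 * (1 - x)**3)

    def lag_window_spectrum(y, R, omegas):
        # Equation (13) with weighted autocovariances truncated at R.
        T = len(y)
        d = y - y.mean()
        c = np.array([np.sum(d[k:] * d[:T - k]) / T for k in range(R + 1)])
        m = parzen_weights(R)
        tau = np.arange(1, R + 1)
        return np.array([(m[0] * c[0]
                          + 2 * np.sum(m[1:] * c[1:] * np.cos(w * tau)))
                         / (2 * np.pi) for w in omegas])

    rng = np.random.default_rng(5)
    y = rng.normal(size=400)
    # For unit-variance white noise the estimates should be near 1/(2*pi).
    print(lag_window_spectrum(y, R=20, omegas=[0.0, np.pi / 2, np.pi]))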

The Relationship between Smoothing and Weighting

It would be surprising if we were unable to interpret the method of smoothing the periodogram in terms of an equivalent method of weighting the autocovariance function, and vice versa. Consider the smoothed periodogram defined by

(16)  f^s(ω_j) = Σ_{k=−m}^{m} µ_k f^r(ω_{j−k}).

Given that the ordinates of the original periodogram I(ω_j) correspond to the points ω_j defined in (7), it follows that f^r(ω_{j−k}) = f^r(ω_j − ω_k), where ω_k = 2πk/T. Therefore, on substituting

(17)  f^r(ω_j − ω_k) = (1/2π) Σ_{τ=1−T}^{T−1} c_τ exp(−i[ω_j − ω_k]τ)

into (16), we get

(18)  f^s(ω_j) = Σ_k µ_k (1/2π) Σ_τ c_τ exp(−i[ω_j − ω_k]τ)
        = (1/2π) Σ_τ {Σ_k µ_k exp(iω_k τ)} c_τ exp(−iω_j τ)
        = (1/2π) Σ_τ m_τ c_τ exp(−iω_j τ),

where

(19)  m_τ = Σ_{k=−m}^{m} µ_k e^{iω_k τ},  ω_k = 2πk/T,

is the finite Fourier transform of the sequence of weights {µ_{−m}, ..., µ_m} which define the spectral window. The final expression under (18) would be the same as our expression for the spectral estimator given under (13), were it not for the fact that we have defined the present function over the set of values {ω_j; j = 1, ..., n} instead of over the interval ω ∈ [0, π], and for the fact that we have used a complex exponential expression instead of a cosine.

It is also possible to demonstrate an inverse relationship, whereby a spectral estimator which depends upon weighting the autocovariance function is equivalent to another estimator which smooths the periodogram. Consider a spectral estimator in the form of

(20)  f^w(ω_0) = (1/2π) Σ_{τ=1−T}^{T−1} m_τ c_τ exp(−iω_0 τ),

where

(21)  m_τ = ∫_ω u(ω) e^{iωτ} dω

has an inverse Fourier transform given by

(22)  u(ω) = (1/2π) Σ_{τ=−∞}^{∞} m_τ e^{−iωτ}.

On substituting the expression for m_τ from (21) into (20), we get

(23)  f^w(ω_0) = (1/2π) Σ_τ { ∫_ω u(ω) e^{iωτ} dω } c_τ e^{−iω_0 τ}
        = ∫_ω u(ω) { (1/2π) Σ_τ c_τ e^{i(ω − ω_0)τ} } dω
        = ∫_ω u(ω) f^r(ω_0 − ω) dω.

This shows that the technique of weighting the autocovariance function corresponds, in general, to a technique of smoothing the periodogram. However, to sustain this interpretation, we must define the periodogram not just at the n frequency points {ω_j; j = 1, ..., n}, as we have done in (5), but over the entire interval [−π, π].

Notice that, on setting τ = 0 in (21), we get

(24)  m_0 = ∫_ω u(ω) dω.

It is desirable that the weighting function should integrate to unity over the relevant range, and this requires us to set m_0 = 1. The latter is exactly the value by which we would expect to weight the estimated variance c_0 within the formula in (13) which defines the spectral estimator f^w(ω).

LECTURE 10

Seasonal Models and Seasonal Adjustment

So far we have relied upon the method of trigonometrical regression for building models which can be used for forecasting seasonal economic time series. It has proved necessary, invariably, to perform the preliminary task of eliminating a trend from the data before determining the seasonal pattern from the residuals. In most of the cases which we have analysed, the trend has been modelled quite successfully by a simple analytic function such as a quadratic. However, it is not always possible to find an analytic function which serves the purpose. In some cases a stochastic trend seems to be more appropriate; such a trend is generated by an autoregressive operator with unit roots. Once a stochastic unit-root model has been adopted for the trend, it seems natural to model the pattern of seasonal fluctuations in the same manner, by using autoregressive operators with complex-valued roots of unit modulus.

The General Multiplicative Seasonal Model

Let

(1)  z(t) = ∇^d y(t)

be a de-trended series which exhibits seasonal behaviour with a periodicity of s periods. Imagine, for the sake of argument, that the period between successive observations is one month, which means that the seasons have a cycle of s = 12 months. Since the trend has been extracted from the original series y(t) by differencing, we would expect to find a strong relationship between the values of observations taken in the same month of successive years. In the simplest circumstances, we might find that the difference between y_t and y_{t−12} is a small random quantity. If the sequence of the twelve-period differences were white noise, then we should have a relationship of the form

(2)  z(t) = z(t − 12) + ε(t) or, equivalently, ∇_{12} z(t) = ε(t).

This is ostensibly an autoregressive model with an operator in the form of ∇_{12} = 1 − L^{12}. However, it is interesting to note in passing that, if y(t) were

generated by a regression model in the form of

(3)  y(t) = Σ_{j=0}^{6} ρ_j cos(ω_j t − θ_j) + η(t),

where ω_j = πj/6 = j × 30°, then we should have

(4)  (1 − L^{12})y(t) = η(t) − η(t − 12) = ζ(t).

If η(t) were a white-noise sequence of independently and identically distributed random variables, then the residual term ζ(t) = η(t) − η(t − 12) would show the following pattern of correlation:

(5)  C(ζ_t, ζ_{t−j}) = 2σ² if j = 0, −σ² if j = 12, and 0 otherwise.

In effect, the model of equation (2) is applied to twelve separate time series, one for each month of the year, whose observations are separated by yearly intervals. If the disturbance sequence η(t) were white noise, then there would be no connection between the twelve time series.

It can be imagined that a more complicated relationship stretches over the years which connects the months of the calendar. By a simple analogy with the ordinary ARMA model, we can devise a model of the form

(6)  Φ(L^{12}) ∇^D_{12} z(t) = Θ(L^{12}) η(t),

where Φ(z) is a polynomial of degree P and Θ(z) is a polynomial of degree Q. If there is also a connection between successive months within the year, then there should be a pattern of serial correlation amongst the elements of the disturbance process η(t). One might propose to model this pattern using a second ARMA of the form

(7)  α(L)η(t) = µ(L)ε(t),

where α(z) is a polynomial of degree p and µ(z) is a polynomial of degree q.

The various components of our analysis can now be assembled. By combining equations (1), (6) and (7), we can derive the following general model for the sequence y(t):

(8)  Φ(L^{12}) α(L) ∇^D_{12} ∇^d y(t) = Θ(L^{12}) µ(L) ε(t).

A model of this sort has been described by Box and Jenkins as the general multiplicative seasonal model.

To denote such a model in a summary fashion, they describe it as an ARIMA (P, D, Q) × (p, d, q) model. The profusion of symbols in equation (8) tends to suggest a model which is too complicated to be of practical use. In fact, the equation should be regarded as a portmanteau in which a collection of simplified models can be placed. Although, in the general version of the model, the seasonal difference operator ∇_{12} is raised to the power D, it is unusual to find values other than D = 0, 1.

There is a redundancy in the notation to which we should draw attention. This redundancy arises from the fact that the seasonal difference operator ∇_{12} = I − L^{12} already contains the operator ∇ = I − L as one of its factors; a cursory inspection of equation (9) below indicates that the first-order difference operator is indeed one of the factors of I − L^{12}. Therefore, if the sequence y(t) has been reduced to stationarity already by the application of d first-order differencing operations, then its subsequent differencing via the operator ∇_{12} is unnecessary and is liable to destroy some of the characteristics of the sequence which ought to be captured by the ARIMA model. Unless this factor is eliminated, there is a danger that the original sequence y(t) will be subjected, inadvertently, to one more differencing operation than is intended.

Factorisation of the Seasonal Difference Operator

The twelve factors of the operator ∇_{12} = I − L^{12} contain the so-called twelfth-order roots of unity, which are the solutions of the algebraic equation 1 = z^{12}. Figure 1 shows the disposition of the twelfth roots of unity around the unit circle in the complex plane. The factorisation may be demonstrated in three stages. To begin, it is easy to see that

(9)  I − L^{12} = (I − L)(I + L + L² + ··· + L^{11})
       = (I − L)(I + L² + L⁴ + ··· + L^{10})(I + L).

The next step is to recognise that

(10)  I + L² + L⁴ + ··· + L^{10}
        = (1 − √3L + L²)(I − L + L²)(I + L²)(I + L + L²)(1 + √3L + L²).

Finally, it can be seen that the generic quadratic factor has the form of

(11)  1 − 2cos(ω_j)L + L² = (1 − e^{iω_j}L)(1 − e^{−iω_j}L).

The factorisation of the seasonal difference operator also helps to explain how the seasonal ARMA model can give rise to seemingly regular cycles of the appropriate duration.
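The factorisation (9)–(11) can be verified numerically by multiplying the twelve factors together and comparing the product with I − L^{12}; polynomial multiplication is a convolution of coefficient sequences.

    import numpy as np

    # Coefficients are listed in ascending powers of L.
    factors = [np.array([1.0, -1.0]),            # I - L
               np.array([1.0, 1.0])]             # I + L
    for j in range(1, 6):                        # the five quadratics of (10)
        w = np.pi * j / 6
        factors.append(np.array([1.0, -2 * np.cos(w), 1.0]))

    prod = np.array([1.0])
    for f in factors:
        prod = np.convolve(prod, f)              # polynomial product

    target = np.zeros(13)
    target[0], target[12] = 1.0, -1.0            # I - L^12
    print(np.allclose(prod, target))             # True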

. j = 0. The graph of the sequence generated by a model with ωj = ω1 = π/6 = 30◦ is given in Figure 2. . will be harmonically related in the manner of the trigonometrical functions comprised by equation (3) which also provides a model for a seasonal time series. 5. imagine that the white-noise sequences εj (t). It follows that a good representation of a seasonal economic time series can be obtained by taking a weighted combination of the stochastic sequences.G. . . Also included in this set should be the sequences y0 (t) and y6 (t) generated by the ﬁrst-order equations (13) (I − L)y0 (t) = ε0 (t) and (I + L)y6 (t) = ε6 (t). For simplicity. 6 are mutually independent and that their variances can take a variety of values. + 2 I − L j=1 I − 2 cos(ωj )L + L I −L 4 5 . POLLOCK: TIME SERIES AND FORECASTING Im i −1 1 Re −i Figure 1. . Consider a simple second-order autoregressive model with complex-valued roots of unit modulus: (12) I − 2 cos(ωj )L + L2 yj (t) = εj (t). . . Now consider generating the full set of stochastic sequences yj (t) for j = 1. Such a model can gives rise to quite regular cycles whose average duration is 2π/ωj periods. These sequences. . which resemble trigonometrical functions. The 12th roots of unity inscribed in the unit circle.S. Then the sum of the stochastic sequences will be given by 6 y(t) = (14) = j=0 yj (t) εj (t) ε0 (t) ε6 (t) + .D.

[Figure 2. The graph of 84 observations on a simulated series generated by the AR(2) process (1 − 1.732L + L²)y(t) = ε(t).]

The terms on the RHS of this expression can be combined. Their common denominator is simply the operator ∇_{12} = I − L^{12}. The numerator is a sum of 7 mutually independent moving-average processes, each with an order of 10 or 11; this amounts to an MA(11) process, which can be denoted by η(t) = θ(L)ε(t). Thus the combination of the harmonically related unit-root AR(2) processes gives rise to a seasonal process in the form of

(15)  y(t) = θ(L)ε(t)/(I − L^{12}), or, equivalently, ∇_{12} y(t) = θ(L)ε(t).

The equation of this model is contained within the portmanteau equation of the general multiplicative model given under (8). However, although it represents a simplification of the general model, it still contains a number of parameters which is liable to prove excessive.

A typical model, which contains only a few parameters, is the ARIMA (0, 1, 1) × (0, 1, 1) model which Box and Jenkins fitted to the logarithms of the AIRPASS data. The AIRPASS model takes the form of

(16)  (I − L^{12})(I − L)y(t) = (1 − θL^{12})(1 − µL)ε(t).

Notice how the unit-root autoregressive operators I − L^{12} and I − L are coupled with the moving-average operators I − θL^{12} and I − µL respectively. These serve to enhance the regularity of the stochastic cycles and to smooth the trend.

Forecasting with Unit-Root Seasonal Models

Although their appearances are superficially similar, the seasonal economic series and the series generated by equations such as (16) are, fundamentally, of very different natures. In the case of the series generated by a unit-root stochastic difference equation, there is no bound, in the long run, on the amplitude of the cycles. Also, there is a tendency for the phases of the cycles to drift without limit. If the latter were a feature of the monthly time series of consumer expenditures, for example, then we could not expect the annual boom in sales to occur at a definite time of the year. In fact, it occurs invariably at Christmas time.

The advantage of unit-root seasonal models does not lie in the realism with which they describe the processes which generate the economic data series. For that purpose the trigonometrical model seems more appropriate. Their advantage lies, instead, in their ability to forecast the seasonal series.

The simplest of the seasonal unit-root models is the one which is specified by equation (2). This is a twelfth-order difference equation with a white-noise forcing function. In generating forecasts from the model, we need only replace the elements of ε(t) which lie in the future by their zero-valued expectations. Then the forecasts may be obtained iteratively from a homogeneous difference equation in which the initial conditions are simply the values of y(t) observed over the preceding twelve months. In effect, we observe the most recent annual cycle and we extrapolate its form exactly, year-in year-out, into the indefinite future.

A somewhat different forecasting rule is associated with the model defined by the equation

(17)  (I − L^12)y(t) = (1 − θL^12)ε(t).

This equation is analogous to the simple IMA(1, 1) equation in the form of

(18)  (I − L)y(t) = (1 − θL)ε(t),

which was considered at the beginning of the course. The latter equation was obtained by combining a first-order random walk with a white-noise error of observation. The two equations, whose combination gives rise to (18), are

(19)  ξ(t) = ξ(t − 1) + ν(t),
      y(t) = ξ(t) + η(t),

wherein ν(t) and η(t) are generated by two mutually independent white-noise processes.
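For the simplest model, which is specified by equation (2), the forecasting rule described above reduces to a few lines of code. The sketch below (an illustrative aside; the function name is my own) extrapolates the last observed annual cycle, which is exactly the behaviour of the forecast function shown in Figure 3 below.

```python
# Forecasts of y(t) = y(t-12) + e(t): future disturbances are set to
# their zero expectations, so the homogeneous equation y(t) = y(t-12)
# replicates the preceding twelve months indefinitely.
import numpy as np

def forecast_seasonal_random_walk(y, steps, s=12):
    """Repeat the final s observations of y over the forecast horizon."""
    y = np.asarray(y, dtype=float)
    last_cycle = y[-s:]              # the most recent annual cycle
    reps = -(-steps // s)            # ceiling division
    return np.tile(last_cycle, reps)[:steps]
```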

[Figure 3. The sample trajectory and the forecast function of the nonstationary 12th-order process y(t) = y(t − 12) + ε(t).]

Equation (17), which represents the seasonal model which was used by Box and Jenkins, is generated by combining the following equations, which are analogous to those under (19):

(20)  ξ(t) = ξ(t − 12) + ν(t),
      y(t) = ξ(t) + η(t).

Here ν(t) and η(t) continue to represent a pair of independent white-noise processes. The analogy with the IMA model is perfect!

The procedure for forecasting the IMA model consisted of extrapolating into the indefinite future a constant value ŷ_{t+1|t} which represents the one-step-ahead forecast made at time t. The forecast itself was obtained from a geometrically weighted combination of all past values of the sequence y(t), which represent erroneous observations on the random-walk process ξ(t). Likewise, the forecasts for the seasonal model of (17) are obtained by extrapolating a so-called annual reference cycle into the future, so that it applies in every successive year. The reference cycle is constructed by taking a geometrically weighted combination of all past annual cycles.
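The construction of the reference cycle can be sketched as follows. The function below is a stylised reconstruction under assumed names: each calendar month of the cycle is a geometrically weighted average of the same month in past years, with the weight θ^k attached to the year which lies k years in the past. The factor (1 − θ) and the normalisation by the sum of the weights are editorial choices, made in the spirit of the IMA analogy, since the text does not spell out the weights.

```python
# A stylised reference cycle: for each of the s = 12 calendar months,
# form an exponentially weighted average over past years of that month.
import numpy as np

def reference_cycle(y, theta, s=12):
    """y: monthly data covering a whole number of years; 0 < theta < 1."""
    y = np.asarray(y, dtype=float).reshape(-1, s)  # one row per year
    k = np.arange(len(y))[::-1]                    # age of each year, newest = 0
    weights = (1.0 - theta) * theta ** k
    return weights @ y / weights.sum()             # length-12 cycle

# The forecast function of (17) repeats this cycle in every future year.
```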

It is interesting to compare the forecast function of the stochastic unit-root seasonal model of (17) with the forecast function of the corresponding trigonometrical model represented by (3). In the latter case, the forecast function depends upon a reference cycle which is the average of all of the annual cycles which are represented by the data set from which the regression parameters have been computed. The stochastic model seems to have the advantage that, in forming its average of previous annual cycles, it gives more weight to recent years. However, it is not difficult to contrive a regression model which has the same feature.
