A First Course on

Time Series Analysis
Examples with SAS
Chair of Statistics, University of Würzburg
February 1, 2006
A First Course on
Time Series Analysis — Examples with SAS
by Chair of Statistics, University of Würzburg.
Version 2006.Feb.01
Copyright © 2006 Michael Falk.
Editors Michael Falk, Frank Marohn, René Michel, Daniel Hofmann, Maria Macke
Programs Bernward Tewes, René Michel, Daniel Hofmann
Layout and Design Peter Dinges
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2 or any later version
published by the Free Software Foundation; with no Invariant Sections, no Front-Cover
Texts, and no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries. Windows
is a trademark, Microsoft is a registered trademark of the Microsoft Corporation.
The authors accept no responsibility for errors in the programs mentioned or their
consequences.
Preface
The analysis of real data by means of statistical methods with the aid
of a software package common in industry and administration is usually
not an integral part of mathematics studies, but it will certainly be
part of future professional work.
The practical need for an investigation of time series data is exempli-
fied by the following plot, which displays the yearly sunspot numbers
between 1749 and 1924. These data are also known as the Wolf or
Wölfer (a student of Wolf) Data. For a discussion of these data and
further literature we refer to Wei (1990), Example 5.2.5.
Plot 1: Sunspot data
The present book links up elements from time series analysis with a se-
lection of statistical procedures used in general practice including the
statistical software package SAS (Statistical Analysis System). Conse-
quently this book addresses students of statistics as well as students of
other branches such as economics, demography and engineering, where
lectures on statistics belong to their academic training. But it is also
intended for the practitioner who, beyond the use of statistical tools, is
interested in their mathematical background. Numerous problems il-
lustrate the applicability of the presented statistical procedures, where
SAS gives the solutions. The programs used are explicitly listed and
explained. No previous experience in SAS or in any special computer system
is expected, so that a short training period is guaranteed.
This book is meant for a two semester course (lecture, seminar or
practical training) where the first two chapters can be dealt with in
the first semester. They provide the principal components of the
analysis of a time series in the time domain. Chapters 3, 4 and 5
deal with its analysis in the frequency domain and can be worked
through in the second term. In order to understand the mathematical
background some terms are useful such as convergence in distribution,
stochastic convergence, maximum likelihood estimator as well as a
basic knowledge of the test theory, so that work on the book can start
after an introductory lecture on stochastics. Each chapter includes
exercises. An exhaustive treatment is recommended.
Due to the vastness of the field, a selection of subjects was necessary. Chap-
ter 1 contains elements of an exploratory time series analysis, in-
cluding the fit of models (logistic, Mitscherlich, Gompertz curve) to
a series of data, linear filters for seasonal and trend adjustments
(difference filters, Census X-11 Program) and exponential filters
for monitoring a system. Autocovariances and autocorrelations as
well as variance stabilizing techniques (Box–Cox transformations) are
introduced. Chapter 2 provides an account of mathematical mod-
els of stationary sequences of random variables (white noise, mov-
ing averages, autoregressive processes, ARIMA models, cointegrated
sequences, ARCH- and GARCH-processes, state-space models) to-
gether with their mathematical background (existence of stationary
processes, covariance generating function, inverse and causal filters,
stationarity condition, Yule–Walker equations, partial autocorrela-
tion). The Box–Jenkins program for the specification of ARMA-
models is discussed in detail (AIC, BIC and HQC information criterion).
Gaussian processes and maximum likelihood estimation in
Gaussian models are introduced as well as least squares estimators as
a nonparametric alternative. The diagnostic check includes the Box–
Ljung test. Many models of time series can be embedded in state-
space models, which are introduced at the end of Chapter 2. The
Kalman filter as a unified prediction technique closes the analysis of a
time series in the time domain. The analysis of a series of data in the
frequency domain starts in Chapter 3 (harmonic waves, Fourier fre-
quencies, periodogram, Fourier transform and its inverse). The proof
of the fact that the periodogram is the Fourier transform of the empiri-
cal autocovariance function is given. This links the analysis in the time
domain with the analysis in the frequency domain. Chapter 4 gives
an account of the analysis of the spectrum of the stationary process
(spectral distribution function, spectral density, Herglotz’s theorem).
The effects of a linear filter are studied (transfer and power transfer
function, low pass and high pass filters, filter design) and the spectral
densities of ARMA-processes are computed. Some basic elements of
a statistical analysis of a series of data in the frequency domain are
provided in Chapter 5. The problem of testing for a white noise is
dealt with (Fisher’s κ-statistic, Bartlett–Kolmogorov–Smirnov test)
together with the estimation of the spectral density (periodogram,
discrete spectral average estimator, kernel estimator, confidence in-
tervals).
This book is consecutively subdivided into a statistical part and a SAS-specific
part. For better clarity the SAS-specific part, including
the diagrams generated with SAS, is between two horizontal bars,
separating it from the rest of the text.
1 /* This is a sample comment. */
2 /* The first comment in each program will be its name. */
3
4 Program code will be set in typewriter-font.
5
6 Extra-long lines will be broken into smaller lines with
→continuation marked by an arrow and indentation.
7 (Also, the line-number is missing in this case.)
Program 2: Sample program
In this area, you will find a step-by-step expla-
nation of the above program. The keywords will
be set in typewriter-font. Please note that
SAS cannot be explained as a whole this way.
Only the actually used commands will be men-
tioned.
Contents
1 Elements of Exploratory Time Series Analysis 1
  1.1 The Additive Model for a Time Series 2
  1.2 Linear Filtering of Time Series 16
  1.3 Autocovariances and Autocorrelations 34
  Exercises 40
2 Models of Time Series 45
  2.1 Linear Filters and Stochastic Processes 45
  2.2 Moving Averages and Autoregressive Processes 58
  2.3 Specification of ARMA-Models: The Box–Jenkins Program 94
  2.4 State-Space Models 105
  Exercises 116
3 The Frequency Domain Approach of a Time Series 127
  3.1 Least Squares Approach with Known Frequencies 128
  3.2 The Periodogram 135
  Exercises 148
4 The Spectrum of a Stationary Process 151
  4.1 Characterizations of Autocovariance Functions 152
  4.2 Linear Filters and Frequencies 158
  4.3 Spectral Densities of ARMA-Processes 167
  Exercises 173
5 Statistical Analysis in the Frequency Domain 179
  5.1 Testing for a White Noise 179
  5.2 Estimating Spectral Densities 188
  Exercises 208
Bibliography 215
Index 217
SAS-Index 222
GNU Free Documentation Licence 225
Chapter 1
Elements of Exploratory Time Series Analysis
A time series is a sequence of observations that are arranged according
to the time of their outcome. The annual crop yield of sugar-beets and
their price per ton, for example, are recorded in agriculture. The newspa-
pers’ business sections report daily stock prices, weekly interest rates,
monthly rates of unemployment and annual turnovers. Meteorology
records hourly wind speeds, daily maximum and minimum tempera-
tures and annual rainfall. Geophysics is continuously observing the
shaking or trembling of the earth in order to predict possibly impend-
ing earthquakes. An electroencephalogram traces brain waves made
by an electroencephalograph in order to detect a cerebral disease, an
electrocardiogram traces heart waves. The social sciences survey an-
nual death and birth rates, the number of accidents in the home and
various forms of criminal activities. Parameters in a manufacturing
process are permanently monitored in order to carry out an on-line
inspection in quality assurance.
There are, obviously, numerous reasons to record and to analyze the
data of a time series. Among these is the wish to gain a better under-
standing of the data generating mechanism, the prediction of future
values or the optimal control of a system. The characteristic property
of a time series is the fact that the data are not generated indepen-
dently, their dispersion varies in time, they are often governed by a
trend and they have cyclic components. Statistical procedures that
suppose independent and identically distributed data are, therefore,
excluded from the analysis of time series. This requires proper meth-
ods that are summarized under time series analysis.
1.1 The Additive Model for a Time Series
The additive model for a given time series $y_1,\dots,y_n$ is the assumption that these data are realizations of random variables $Y_t$ that are themselves sums of four components
$$Y_t = T_t + Z_t + S_t + R_t, \qquad t = 1,\dots,n, \tag{1.1}$$
where $T_t$ is a (monotone) function of $t$, called trend, and $Z_t$ reflects some nonrandom long term cyclic influence. Think of the famous business cycle usually consisting of recession, recovery, growth, and decline. $S_t$ describes some nonrandom short term cyclic influence like a seasonal component whereas $R_t$ is a random variable grasping all the deviations from the ideal non-stochastic model $y_t = T_t + Z_t + S_t$. The variables $T_t$ and $Z_t$ are often summarized as
$$G_t = T_t + Z_t, \tag{1.2}$$
describing the long term behavior of the time series. We suppose in the following that the expectation $E(R_t)$ of the error variable exists and equals zero, reflecting the assumption that the random deviations above or below the nonrandom model balance each other on the average. Note that $E(R_t) = 0$ can always be achieved by appropriately modifying one or more of the nonrandom components.
Example 1.1.1. (Unemployed1 Data). The following data $y_t$, $t = 1,\dots,51$, are the monthly numbers of unemployed workers in the building trade in Germany from July 1975 to September 1979.
MONTH T UNEMPLYD
July 1 60572
August 2 52461
September 3 47357
October 4 48320
November 5 60219
December 6 84418
January 7 119916
February 8 124350
March 9 87309
April 10 57035
May 11 39903
June 12 34053
July 13 29905
August 14 28068
September 15 26634
October 16 29259
November 17 38942
December 18 65036
January 19 110728
February 20 108931
March 21 71517
April 22 54428
May 23 42911
June 24 37123
July 25 33044
August 26 30755
September 27 28742
October 28 31968
November 29 41427
December 30 63685
January 31 99189
February 32 104240
March 33 75304
April 34 43622
May 35 33990
June 36 26819
July 37 25291
August 38 24538
September 39 22685
October 40 23945
November 41 28245
December 42 47017
January 43 90920
February 44 89340
March 45 47792
April 46 28448
May 47 19139
June 48 16728
July 49 16523
August 50 16622
September 51 15499
Listing 1.1.1a: Unemployed1 Data.
1 /* unemployed1_listing.sas */
2 TITLE1 'Listing';
3 TITLE2 'Unemployed1 Data';
4
5 /* Read in the data (Data-step) */
6 DATA data1;
7 INFILE 'c:\data\unemployed1.txt';
8 INPUT month $ t unemplyd;
9
10 /* Print the data (Proc-step) */
11 PROC PRINT DATA=data1 NOOBS;
12 RUN; QUIT;
Program 1.1.1: Listing of Unemployed1 Data.
This program consists of two main parts, a DATA
and a PROC step.
The DATA step started with the DATA statement
creates a temporary dataset named data1. The
purpose of INFILE is to link the DATA step to
a raw dataset outside the program. The path-
name of this dataset depends on the operat-
ing system; we will use the syntax of MS-DOS,
which is most commonly known. INPUT tells
SAS how to read the data. Three variables are
defined here, where the first one contains char-
acter values. This is determined by the $ sign
behind the variable name. For each variable
one value per line is read from the source into
the computer’s memory.
The statement PROC procedurename
DATA=filename; invokes a procedure that is
linked to the data from filename. Without the
option DATA=filename the most recently cre-
ated file is used.
The PRINT procedure lists the data; it comes
with numerous options that allow control of the
variables to be printed out, ’dress up’ of the dis-
play etc. The SAS internal observation number
(OBS) is printed by default, NOOBS suppresses
the column of observation numbers on each
line of output. An optional VAR statement deter-
mines the order (from left to right) in which vari-
ables are displayed. If not specified (like here),
all variables in the data set will be printed in the
order they were defined to SAS. Entering RUN;
at any point of the program tells SAS that a unit
of work (DATA step or PROC) ended. SAS then
stops reading the program and begins to exe-
cute the unit. The QUIT; statement at the end
terminates the processing of SAS.
A line starting with an asterisk * and ending
with a semicolon ; is ignored. These comment
statements may occur at any point of the pro-
gram except within raw data or another state-
ment.
The TITLE statement generates a title. Its print-
ing is actually suppressed here and in the fol-
lowing.
The following plot of the Unemployed1 Data shows a seasonal compo-
nent and a downward trend. The period from July 1975 to September
1979 might be too short to indicate a possibly underlying long term
business cycle.
Plot 1.1.2a: Unemployed1 Data.
1 /* unemployed1_plot.sas */
2 TITLE1 'Plot';
3 TITLE2 'Unemployed1 Data';
4
5 /* Read in the data */
6 DATA data1;
7 INFILE 'c:\data\unemployed1.txt';
8 INPUT month $ t unemplyd;
9
10 /* Graphical Options */
11 AXIS1 LABEL=(ANGLE=90 'unemployed');
12 AXIS2 LABEL=('t');
13 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;
14
15 /* Plot the data */
16 PROC GPLOT DATA=data1;
17 PLOT unemplyd*t / VAXIS=AXIS1 HAXIS=AXIS2;
18 RUN; QUIT;
Program 1.1.2: Plot of Unemployed1 Data.
Variables can be plotted by using the GPLOT pro-
cedure, where the graphical output is controlled
by numerous options.
The AXIS statements with the LABEL options
control labelling of the vertical and horizontal
axes. ANGLE=90 causes a rotation of the label
of 90

so that it parallels the (vertical) axis in
this example.
The SYMBOL statement defines the manner in which the data are displayed. V=DOT C=GREEN I=JOIN H=0.4 W=1 tells SAS to plot green dots of height 0.4 and to join them with a line of width 1. The PLOT statement in the GPLOT procedure is of the form PLOT y-variable*x-variable / options;, where the options here define the horizontal and the vertical axes.
Models with a Nonlinear Trend
In the additive model $Y_t = T_t + R_t$, where the nonstochastic component is only the trend $T_t$ reflecting the growth of a system, and assuming $E(R_t) = 0$, we have
$$E(Y_t) = T_t =: f(t).$$
A common assumption is that the function $f$ depends on several (unknown) parameters $\beta_1,\dots,\beta_p$, i.e.,
$$f(t) = f(t;\beta_1,\dots,\beta_p). \tag{1.3}$$
However, the type of the function $f$ is known. The parameters $\beta_1,\dots,\beta_p$ are then to be estimated from the set of realizations $y_t$ of the random variables $Y_t$. A common approach is a least squares estimate $\hat\beta_1,\dots,\hat\beta_p$ satisfying
$$\sum_t \bigl(y_t - f(t;\hat\beta_1,\dots,\hat\beta_p)\bigr)^2 = \min_{\beta_1,\dots,\beta_p}\sum_t \bigl(y_t - f(t;\beta_1,\dots,\beta_p)\bigr)^2, \tag{1.4}$$
whose computation, if it exists at all, is a numerical problem. The value $\hat y_t := f(t;\hat\beta_1,\dots,\hat\beta_p)$ can serve as a prediction of a future $y_t$. The observed differences $y_t - \hat y_t$ are called residuals. They contain information about the goodness of the fit of our model to the data.
In the following we list several popular examples of trend functions.
The Logistic Function
The function
$$f_{\log}(t) := f_{\log}(t;\beta_1,\beta_2,\beta_3) := \frac{\beta_3}{1 + \beta_2\exp(-\beta_1 t)}, \qquad t \in \mathbb{R}, \tag{1.5}$$
with $\beta_1,\beta_2,\beta_3 \in \mathbb{R}\setminus\{0\}$ is the widely used logistic function.
Plot 1.1.3a: The logistic function $f_{\log}$ with different values of $\beta_1, \beta_2, \beta_3$.
1 /* logistic.sas */
2 TITLE1 'Plots of the Logistic Function';
3
4 /* Generate the data for different logistic functions */
5 DATA data1;
6 beta3=1;
7 DO beta1=0.5, 1;
8 DO beta2=0.1, 1;
9 DO t=-10 TO 10 BY 0.5;
10 s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
11 f_log=beta3/(1+beta2*EXP(-beta1*t));
12 OUTPUT;
13 END;
14 END;
15 END;
16
17 /* Graphical Options */
18 SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
19 SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
20 SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
21 SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
22 AXIS1 LABEL=(H=2 'f' H=1 'log' H=2 '(t)');
23 AXIS2 LABEL=('t');
24 LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ', b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
25
26 /* Plot the functions */
27 PROC GPLOT DATA=data1;
28 PLOT f_log*t=s / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
29 RUN; QUIT;
Program 1.1.3: Generating plots of the logistic function $f_{\log}$.
A function is plotted by computing its values at numerous grid points and then joining them. The computation is done in the DATA step, where the data file data1 is generated. It contains the values of f_log, computed at the grid t = −10, −9.5, . . . , 10 and indexed by the vector s of the different choices of parameters. This is done by nested DO loops. The operator || merges two strings and COMPRESS removes the empty space in the string. OUTPUT then stores the values of interest of f_log, t and s (and the other variables) in the data set data1.
The four functions are plotted by the GPLOT procedure by adding =s in the PLOT statement. This also automatically generates a legend, which is customized by the LEGEND1 statement. Here the label is modified by using a Greek font (F=CGREEK) and generating smaller letters of height 1 for the indices, while assuming a normal height of 2 (H=1 and H=2). The last feature is also used in the AXIS statement. For each value of s SAS takes a new SYMBOL statement. They generate lines of different line types (L=1, 2, 3, 33).
We obviously have $\lim_{t\to\infty} f_{\log}(t) = \beta_3$, if $\beta_1 > 0$. The value $\beta_3$ often resembles the maximum impregnation or growth of a system. Note that
$$\frac{1}{f_{\log}(t)} = \frac{1 + \beta_2\exp(-\beta_1 t)}{\beta_3} = \frac{1 - \exp(-\beta_1)}{\beta_3} + \exp(-\beta_1)\,\frac{1 + \beta_2\exp(-\beta_1(t-1))}{\beta_3} = \frac{1 - \exp(-\beta_1)}{\beta_3} + \exp(-\beta_1)\,\frac{1}{f_{\log}(t-1)} = a + \frac{b}{f_{\log}(t-1)}. \tag{1.6}$$
This means that there is a linear relationship among $1/f_{\log}(t)$. This can serve as a basis for estimating the parameters $\beta_1,\beta_2,\beta_3$ by an appropriate linear least squares approach, see Exercises 1.2 and 1.3.
In the following example we fit the logistic trend model (1.5) to the
population growth of the area of North Rhine-Westphalia (NRW),
which is a federal state of Germany.
Example 1.1.2. (Population1 Data). Table 1.1.1 shows the population sizes $y_t$ in millions of the area of North Rhine-Westphalia in 5 years steps from 1935 to 1980 as well as their predicted values $\hat y_t$, obtained from a least squares estimation as described in (1.4) for a logistic model.

Year   t   Population sizes y_t   Predicted values ŷ_t
           (in millions)          (in millions)
1935   1   11.772                 10.930
1940   2   12.059                 11.827
1945   3   11.200                 12.709
1950   4   12.926                 13.565
1955   5   14.442                 14.384
1960   6   15.694                 15.158
1965   7   16.661                 15.881
1970   8   16.914                 16.548
1975   9   17.176                 17.158
1980  10   17.044                 17.710

Table 1.1.1: Population1 Data
As a prediction of the population size at time $t$ we obtain in the logistic model
$$\hat y_t := \frac{\hat\beta_3}{1 + \hat\beta_2\exp(-\hat\beta_1 t)} = \frac{21.5016}{1 + 1.1436\exp(-0.1675\,t)}$$
with the estimated saturation size $\hat\beta_3 = 21.5016$. The following plot shows the data and the fitted logistic curve.
Plot 1.1.4a: NRW population sizes and fitted logistic function.
1 /* population1.sas */
2 TITLE1 'Population sizes and logistic fit';
3 TITLE2 'Population1 Data';
4
5 /* Read in the data */
6 DATA data1;
7 INFILE 'c:\data\population1.txt';
8 INPUT year t pop;
9
10 /* Compute parameters for fitted logistic function */
11 PROC NLIN DATA=data1 OUTEST=estimate;
12 MODEL pop=beta3/(1+beta2*EXP(-beta1*t));
13 PARAMETERS beta1=1 beta2=1 beta3=20;
14 RUN;
15
16 /* Generate fitted logistic function */
17 DATA data2;
18 SET estimate(WHERE=(_TYPE_='FINAL'));
19 DO t1=0 TO 11 BY 0.2;
20 f_log=beta3/(1+beta2*EXP(-beta1*t1));
21 OUTPUT;
22 END;
23
24 /* Merge data sets */
25 DATA data3;
26 MERGE data1 data2;
27
28 /* Graphical options */
29 AXIS1 LABEL=(ANGLE=90 'population in millions');
30 AXIS2 LABEL=('t');
31 SYMBOL1 V=DOT C=GREEN I=NONE;
32 SYMBOL2 V=NONE C=GREEN I=JOIN W=1;
33
34 /* Plot data with fitted function */
35 PROC GPLOT DATA=data3;
36 PLOT pop*t=1 f_log*t1=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
37 RUN; QUIT;
Program 1.1.4: NRW population sizes and fitted logistic function.
The procedure NLIN fits nonlinear regression
models by least squares. The OUTEST option
names the data set to contain the parame-
ter estimates produced by NLIN. The MODEL
statement defines the prediction equation by
declaring the dependent variable and defining
an expression that evaluates predicted values.
A PARAMETERS statement must follow the PROC
NLIN statement. Each parameter=value ex-
pression specifies the starting values of the
parameter. Using the final estimates of PROC
NLIN by the SET statement in combination with
the WHERE data set option, the second data
step generates the fitted logistic function val-
ues. The options in the GPLOT statement cause
the data points and the predicted function to be
shown in one plot, after they were stored to-
gether in a new data set data3 merging data1
and data2 with the MERGE statement.
The Mitscherlich Function
The Mitscherlich function is typically used for modelling the long term growth of a system:
$$f_M(t) := f_M(t;\beta_1,\beta_2,\beta_3) := \beta_1 + \beta_2\exp(\beta_3 t), \qquad t \ge 0, \tag{1.7}$$
where $\beta_1,\beta_2 \in \mathbb{R}$ and $\beta_3 < 0$. Since $\beta_3$ is negative we have $\lim_{t\to\infty} f_M(t) = \beta_1$ and thus the parameter $\beta_1$ is the saturation value of the system. The (initial) value of the system at the time $t = 0$ is $f_M(0) = \beta_1 + \beta_2$.
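As a brief aside that is not one of the book's programs: the Mitscherlich trend (1.7) can be fitted by nonlinear least squares in the same way as the logistic model in Program 1.1.4 (population1.sas). The data set name, variable names and starting values in the following sketch are assumptions.

PROC NLIN DATA=data1 OUTEST=estimate;
   MODEL y=beta1+beta2*EXP(beta3*t);          /* Mitscherlich function (1.7) */
   PARAMETERS beta1=20 beta2=-10 beta3=-0.1;  /* assumed starting values; beta3 < 0 */
RUN;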
The Gompertz Curve
A further quite common function for modelling the increase or decrease of a system is the Gompertz curve
$$f_G(t) := f_G(t;\beta_1,\beta_2,\beta_3) := \exp(\beta_1 + \beta_2\beta_3^t), \qquad t \ge 0, \tag{1.8}$$
where $\beta_1,\beta_2 \in \mathbb{R}$ and $\beta_3 \in (0,1)$.
Plot 1.1.5a: Gompertz curves with different parameters.
1 /* gompertz.sas */
2 TITLE1 'Gompertz curves';
3
4 /* Generate the data for different Gompertz functions */
5 DATA data1;
6 DO beta1=1;
7 DO beta2=-1, 1;
8 DO beta3=0.05, 0.5;
9 DO t=0 TO 4 BY 0.05;
10 s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
11 f_g=EXP(beta1+beta2*beta3**t);
12 OUTPUT;
13 END;
14 END;
15 END;
16 END;
17
18 /* Graphical Options */
19 SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
20 SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
21 SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
22 SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
23 AXIS1 LABEL=(H=2 'f' H=1 'G' H=2 '(t)');
24 AXIS2 LABEL=('t');
25 LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ',b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
26
27 /* Plot the functions */
28 PROC GPLOT DATA=data1;
29 PLOT f_g*t=s / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
30 RUN; QUIT;
Program 1.1.5: Plotting the Gompertz curves.
We obviously have
$$\log(f_G(t)) = \beta_1 + \beta_2\beta_3^t = \beta_1 + \beta_2\exp(\log(\beta_3)t),$$
and thus $\log(f_G)$ is a Mitscherlich function with parameters $\beta_1, \beta_2, \log(\beta_3)$. The saturation size obviously is $\exp(\beta_1)$.
The Allometric Function
The allometric function
$$f_a(t) := f_a(t;\beta_1,\beta_2) = \beta_2 t^{\beta_1}, \qquad t \ge 0, \tag{1.9}$$
with $\beta_1 \in \mathbb{R}$, $\beta_2 > 0$, is a common trend function in biometry and economics. It can be viewed as a particular Cobb–Douglas function, which is a popular econometric model to describe the output produced by a system depending on an input. Since
$$\log(f_a(t)) = \log(\beta_2) + \beta_1\log(t), \qquad t > 0,$$
is a linear function of $\log(t)$, with slope $\beta_1$ and intercept $\log(\beta_2)$, we can assume a linear regression model for the logarithmic data $\log(y_t)$
$$\log(y_t) = \log(\beta_2) + \beta_1\log(t) + \varepsilon_t, \qquad t \ge 1,$$
where $\varepsilon_t$ are the error variables.
Example 1.1.3. (Income Data). Table 1.1.2 shows the (accumulated)
annual average increases of gross and net incomes in thousands DM
(deutsche mark) in Germany, starting in 1960.
Year   t   Gross income x_t   Net income y_t
1960 0 0 0
1961 1 0.627 0.486
1962 2 1.247 0.973
1963 3 1.702 1.323
1964 4 2.408 1.867
1965 5 3.188 2.568
1966 6 3.866 3.022
1967 7 4.201 3.259
1968 8 4.840 3.663
1969 9 5.855 4.321
1970 10 7.625 5.482
Table 1.1.2: Income Data.
We assume that the increase of the net income $y_t$ is an allometric function of the time $t$ and obtain
$$\log(y_t) = \log(\beta_2) + \beta_1\log(t) + \varepsilon_t. \tag{1.10}$$
The least squares estimates of $\beta_1$ and $\log(\beta_2)$ in the above linear regression model are (see, for example, Theorem 3.2.2 in Falk et al. (2002))
$$\hat\beta_1 = \frac{\sum_{t=1}^{10}\bigl(\log(t) - \overline{\log(t)}\bigr)\bigl(\log(y_t) - \overline{\log(y)}\bigr)}{\sum_{t=1}^{10}\bigl(\log(t) - \overline{\log(t)}\bigr)^2} = 1.019,$$
where $\overline{\log(t)} := 10^{-1}\sum_{t=1}^{10}\log(t) = 1.5104$, $\overline{\log(y)} := 10^{-1}\sum_{t=1}^{10}\log(y_t) = 0.7849$, and hence
$$\widehat{\log(\beta_2)} = \overline{\log(y)} - \hat\beta_1\,\overline{\log(t)} = -0.7549.$$
We estimate $\beta_2$ therefore by $\hat\beta_2 = \exp(-0.7549) = 0.4700$. The predicted value $\hat y_t$ corresponds to the time $t$
$$\hat y_t = 0.47\,t^{1.019}. \tag{1.11}$$
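The estimates above can be reproduced with a few lines of SAS; the following sketch is not one of the book's programs, and the file name income.txt as well as the variable names are assumptions.

/* Least squares fit of the linear model (1.10) on the log scale */
DATA income;
   INFILE 'c:\data\income.txt';
   INPUT year t gross net;
   IF t > 0;                 /* the observation with t=0 is dropped, since log(0) is undefined */
   logt=LOG(t);
   logy=LOG(net);

PROC REG DATA=income;
   MODEL logy=logt;          /* intercept estimates log(beta2), slope estimates beta1 */
RUN; QUIT;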
t     y_t − ŷ_t
1 0.0159
2 0.0201
3 -0.1176
4 -0.0646
5 0.1430
6 0.1017
7 -0.1583
8 -0.2526
9 -0.0942
10 0.5662
Table 1.1.3: Residuals of Income Data.
Table 1.1.3 lists the residuals $y_t - \hat y_t$ by which one can judge the goodness of fit of the model (1.11).
A popular measure for assessing the fit is the squared multiple correlation coefficient or $R^2$-value
$$R^2 := 1 - \frac{\sum_{t=1}^{n}(y_t - \hat y_t)^2}{\sum_{t=1}^{n}(y_t - \bar y)^2}, \tag{1.12}$$
where $\bar y := n^{-1}\sum_{t=1}^{n} y_t$ is the average of the observations $y_t$ (cf. Section 3.3 in Falk et al. (2002)). In the linear regression model with $\hat y_t$ based on the least squares estimates of the parameters, $R^2$ is necessarily between zero and one with the implication $R^2 = 1$ if and only if $\sum_{t=1}^{n}(y_t - \hat y_t)^2 = 0$ (see Exercise 1.4). A value of $R^2$ close to 1 is in favor of the fitted model. The model (1.10) has $R^2$ equal to 0.9934, whereas (1.11) has $R^2 = 0.9789$. Note, however, that the initial model (1.9) is not linear and $\hat\beta_2$ is not the least squares estimate, in which case $R^2$ is no longer necessarily between zero and one and has therefore to be viewed with care as a crude measure of fit.
The annual average gross income in 1960 was 6148 DM and the corresponding net income was 5178 DM. The actual average gross and net incomes were therefore $\tilde x_t := x_t + 6.148$ and $\tilde y_t := y_t + 5.178$ with the estimated model based on the above predicted values $\hat y_t$
$$\hat{\tilde y}_t = \hat y_t + 5.178 = 0.47\,t^{1.019} + 5.178.$$
Note that the residuals $\tilde y_t - \hat{\tilde y}_t = y_t - \hat y_t$ are not influenced by adding the constant 5.178 to $y_t$. The above models might help judging the average tax payer's situation between 1960 and 1970 and to predict his future one. It is apparent from the residuals in Table 1.1.3 that the net income $y_t$ is an almost perfect multiple of $t$ for $t$ between 1 and 9, whereas the large increase $y_{10}$ in 1970 seems to be an outlier. Actually, in 1969 the German government had changed and in 1970 a long strike in Germany caused an enormous increase in the income of civil servants.
1.2 Linear Filtering of Time Series
In the following we consider the additive model (1.1) and assume that there is no long term cyclic component. Nevertheless, we allow a trend, in which case the smooth nonrandom component $G_t$ equals the trend function $T_t$. Our model is, therefore, the decomposition
$$Y_t = T_t + S_t + R_t, \qquad t = 1, 2, \dots \tag{1.13}$$
with $E(R_t) = 0$. Given realizations $y_t$, $t = 1, 2, \dots, n$, of this time series, the aim of this section is the derivation of estimators $\hat T_t$, $\hat S_t$ of the nonrandom functions $T_t$ and $S_t$ and to remove them from the time series by considering $y_t - \hat T_t$ or $y_t - \hat S_t$ instead. These series are referred to as the trend or seasonally adjusted time series. The data $y_t$ are decomposed in smooth parts and irregular parts that fluctuate around zero.
Linear Filters
Let $a_{-r}, a_{-r+1}, \dots, a_s$ be arbitrary real numbers, where $r, s \ge 0$, $r + s + 1 \le n$. The linear transformation
$$Y_t^* := \sum_{u=-r}^{s} a_u Y_{t-u}, \qquad t = s+1, \dots, n-r,$$
is referred to as a linear filter with weights $a_{-r}, \dots, a_s$. The $Y_t$ are called input and the $Y_t^*$ are called output.
Obviously, there are less output data than input data, if $(r, s) \ne (0, 0)$. A positive value $s > 0$ or $r > 0$ causes a truncation at the beginning or at the end of the time series; see Example 1.2.1 below. For convenience, we call the vector of weights $(a_u) = (a_{-r}, \dots, a_s)^T$ a (linear) filter.
A filter $(a_u)$, whose weights sum up to one, $\sum_{u=-r}^{s} a_u = 1$, is called moving average. The particular cases $a_u = 1/(2s+1)$, $u = -s, \dots, s$, with an odd number of equal weights, or $a_u = 1/(2s)$, $u = -s+1, \dots, s-1$, $a_{-s} = a_s = 1/(4s)$, aiming at an even number of weights, are simple moving averages of order $2s+1$ and $2s$, respectively.
Filtering a time series aims at smoothing the irregular part of a time series, thus detecting trends or seasonal components, which might otherwise be covered by fluctuations. While for example a digital speedometer in a car can provide its instantaneous velocity, thereby showing considerably large fluctuations, an analog instrument that comes with a hand and a built-in smoothing filter reduces these fluctuations but takes a while to adjust. The latter instrument is much more comfortable to read and its information, reflecting a trend, is sufficient in most cases.
To compute the output of a simple moving average of order $2s+1$, the following obvious equation is useful:
$$Y_{t+1}^* = Y_t^* + \frac{1}{2s+1}\bigl(Y_{t+s+1} - Y_{t-s}\bigr).$$
This filter is a particular example of a low-pass filter, which preserves the slowly varying trend component of a series but removes from it the rapidly fluctuating or high frequency component. There is a trade-off between the two requirements that the irregular fluctuation should be reduced by a filter, thus leading, for example, to a large choice of $s$ in a simple moving average, and that the long term variation in the data should not be distorted by oversmoothing, i.e., by a too large choice of $s$. If we assume, for example, a time series $Y_t = T_t + R_t$ without seasonal component, a simple moving average of order $2s+1$ leads to
$$Y_t^* = \frac{1}{2s+1}\sum_{u=-s}^{s} Y_{t-u} = \frac{1}{2s+1}\sum_{u=-s}^{s} T_{t-u} + \frac{1}{2s+1}\sum_{u=-s}^{s} R_{t-u} =: T_t^* + R_t^*,$$
where by some law of large numbers argument
$$R_t^* \sim E(R_t) = 0,$$
if $s$ is large. But $T_t^*$ might then no longer reflect $T_t$. A small choice of $s$, however, has the effect that $R_t^*$ is not yet close to its expectation.
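As an aside that is not part of the book's programs: a simple moving average can, for instance, be computed in SAS with PROC EXPAND from SAS/ETS; the data set data1, the variable y and the order 5 below are assumptions.

PROC EXPAND DATA=data1 OUT=smoothed METHOD=NONE;
   CONVERT y=y_ma / TRANSFORMOUT=(CMOVAVE 5);   /* centered moving average of order 5 */
RUN;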
Seasonal Adjustment
A simple moving average of a time series $Y_t = T_t + S_t + R_t$ now decomposes as
$$Y_t^* = T_t^* + S_t^* + R_t^*,$$
where $S_t^*$ is the pertaining moving average of the seasonal components. Suppose, moreover, that $S_t$ is a $p$-periodic function, i.e.,
$$S_t = S_{t+p}, \qquad t = 1, \dots, n-p.$$
Take for instance monthly average temperatures $Y_t$ measured at fixed points, in which case it is reasonable to assume a periodic seasonal component $S_t$ with period $p = 12$ months. A simple moving average of order $p$ then yields a constant value $S_t^* = S$, $t = p, p+1, \dots, n-p$. By adding this constant $S$ to the trend function $T_t$ and putting $T_t' := T_t + S$, we can assume in the following that $S = 0$. Thus we obtain for the differences
$$D_t := Y_t - Y_t^* \sim S_t + R_t$$
and, hence, averaging these differences yields
$$\bar D_t := \frac{1}{n_t}\sum_{j=0}^{n_t - 1} D_{t+jp} \sim S_t, \qquad t = 1, \dots, p,$$
$$\bar D_t := \bar D_{t-p} \qquad \text{for } t > p,$$
where $n_t$ is the number of periods available for the computation of $\bar D_t$. Thus,
$$\hat S_t := \bar D_t - \frac{1}{p}\sum_{j=1}^{p}\bar D_j \sim S_t - \frac{1}{p}\sum_{j=1}^{p} S_j = S_t \tag{1.14}$$
is an estimator of $S_t = S_{t+p} = S_{t+2p} = \dots$ satisfying
$$\frac{1}{p}\sum_{j=0}^{p-1}\hat S_{t+j} = 0 = \frac{1}{p}\sum_{j=0}^{p-1} S_{t+j}.$$
The differences $Y_t - \hat S_t$ with a seasonal component close to zero are then the seasonally adjusted time series.
Example 1.2.1. For the 51 Unemployed1 Data in Example 1.1.1 it is obviously reasonable to assume a periodic seasonal component with $p = 12$ months. A simple moving average of order 12,
$$Y_t^* = \frac{1}{12}\Bigl(\frac{1}{2}Y_{t-6} + \sum_{u=-5}^{5} Y_{t-u} + \frac{1}{2}Y_{t+6}\Bigr), \qquad t = 7, \dots, 45,$$
then has a constant seasonal component, which we assume to be zero by adding this constant to the trend function. Table 1.2.1 contains the values of $D_t$, $\bar D_t$ and the estimates $\hat S_t$ of $S_t$.

           d_t (rounded values)
Month      1976    1977    1978    1979    d̄_t (rounded)   ŝ_t (rounded)
January    53201   56974   48469   52611   52814           53136
February   59929   54934   54102   51727   55173           55495
March      24768   17320   25678   10808   19643           19966
April      -3848   42      -5429   –       -3079           -2756
May        -19300  -11680  -14189  –       -15056          -14734
June       -23455  -17516  -20116  –       -20362          -20040
July       -26413  -21058  -20605  –       -22692          -22370
August     -27225  -22670  -20393  –       -23429          -23107
September  -27358  -24646  -20478  –       -24161          -23839
October    -23967  -21397  -17440  –       -20935          -20612
November   -14300  -10846  -11889  –       -12345          -12023
December   11540   12213   7923    –       10559           10881

Table 1.2.1: Table of $d_t$, $\bar d_t$ and of estimates $\hat s_t$ of the seasonal component $S_t$ in the Unemployed1 Data.

We obtain for these data
$$\hat s_t = \bar d_t - \frac{1}{12}\sum_{j=1}^{12}\bar d_j = \bar d_t + \frac{3867}{12} = \bar d_t + 322.25.$$
Example 1.2.2. (Temperatures Data). The monthly average tem-
peratures near Würzburg, Germany were recorded from the 1st of
January 1995 to the 31st of December 2004. The data together with
their seasonally adjusted counterparts can be seen in Figure 1.2.1a.
Plot 1.2.1a: Monthly average temperatures near Würzburg and seasonally adjusted values.
1 /* temperatures.sas */
2 TITLE1 'Original and seasonally adjusted data';
3 TITLE2 'Temperature data';
4
5 /* Read in the data and generate SAS-formatted date */
6 DATA temperatures;
7 INFILE 'c:\data\temperatures.txt';
8 INPUT temperature;
9 date=INTNX('month','01jan95'd,_N_-1);
10 FORMAT date yymon.;
11
12 /* Make seasonal adjustment */
13 PROC TIMESERIES DATA=temperatures OUT=series SEASONALITY=12
→OUTDECOMP=deseason;
14 VAR temperature;
15 DECOMP / MODE=ADD;
16
17 /* Merge necessary data for plot */
18 DATA plotseries;
19 MERGE temperatures deseason(KEEP=SA);
20
21 /* Graphical options */
22 AXIS1 LABEL=(ANGLE=90 'temperatures');
23 AXIS2 LABEL=('Date');
24 SYMBOL1 V=DOT C=GREEN I=JOIN H=1 W=1;
25 SYMBOL2 V=STAR C=GREEN I=JOIN H=1 W=1;
26
27 /* Plot data and seasonally adjusted series */
28 PROC GPLOT DATA=plotseries;
29 PLOT temperature*date=1 SA*date=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
30
31 RUN; QUIT;
Program 1.2.1: Seasonal adjustment of Temperatures Data.
In the data step the values for the variable
temperature are read from an external file. By
means of the function INTNX, a new variable in
a date format is generated containing monthly
data starting from the 1st of January 1995. The
temporarily created variable N , which counts
the number of cases, is used to determine the
distance from the starting value. The FORMAT
statement attributes the format yymon to this
variable, consisting of four digits for the year
and three for the month.
The SAS procedure TIMESERIES together with
the statement DECOMP computes a seasonally
adjusted series, which is stored in the file after
the OUTDECOMP option. With MODE=ADD an addi-
tive model of the time series is assumed. The
default is a multiplicative model. The original
series together with an automated time variable
(just a counter) is stored in the file specified in
the OUT option. In the option SEASONALITY the
underlying period is specified. Depending on
the data it can be any natural number.
The seasonally adjusted values can be refer-
enced by SA and are plotted together with the
original series against the date in the final step.
The Census X-11 Program
In the fifties of the 20th century the U.S. Bureau of the Census
developed a program for seasonal adjustment of economic time series,
called the Census X-11 Program. It is based on monthly observations
and assumes an additive model
Y
t
= T
t
+S
t
+R
t
as in (1.13) with a seasonal component S
t
of period p = 12. We give a
brief summary of this program following Wallis (1974), which results
in a moving average with symmetric weights. The census procedure
is discussed in Shiskin and Eisenpress (1957); a complete description
is given by Shiskin, Young and Musgrave (1967). A theoretical justifi-
cation based on stochastic models is provided by Cleveland and Tiao
(1976).
The X-11 Program essentially works as the seasonal adjustment de-
scribed above, but it adds iterations and various moving averages.
The different steps of this program are
(1) Compute a simple moving average $Y_t^*$ of order 12 to leave essentially a trend $Y_t^* \sim T_t$.

(2) The difference
$$D_t^{(1)} := Y_t - Y_t^* \sim S_t + R_t$$
then leaves approximately the seasonal plus irregular component.

(3) Apply a moving average of order 5 to each month separately by computing
$$\bar D_t^{(1)} := \frac{1}{9}\bigl(D_{t-24}^{(1)} + 2D_{t-12}^{(1)} + 3D_t^{(1)} + 2D_{t+12}^{(1)} + D_{t+24}^{(1)}\bigr) \sim S_t,$$
which gives an estimate of the seasonal component $S_t$. Note that the moving average with weights $(1, 2, 3, 2, 1)/9$ is a simple moving average of length 3 of simple moving averages of length 3.

(4) The $\bar D_t^{(1)}$ are adjusted to approximately sum up to 0 over any 12-months period by putting
$$\hat S_t^{(1)} := \bar D_t^{(1)} - \frac{1}{12}\Bigl(\frac{1}{2}\bar D_{t-6}^{(1)} + \bar D_{t-5}^{(1)} + \dots + \bar D_{t+5}^{(1)} + \frac{1}{2}\bar D_{t+6}^{(1)}\Bigr).$$

(5) The differences
$$Y_t^{(1)} := Y_t - \hat S_t^{(1)} \sim T_t + R_t$$
then are the preliminary seasonally adjusted series, quite in the manner as before.

(6) The adjusted data $Y_t^{(1)}$ are further smoothed by a Henderson moving average $Y_t^{**}$ of order 9, 13, or 23.

(7) The differences
$$D_t^{(2)} := Y_t - Y_t^{**} \sim S_t + R_t$$
then leave a second estimate of the sum of the seasonal and irregular components.

(8) A moving average of order 7 is applied to each month separately,
$$\bar D_t^{(2)} := \sum_{u=-3}^{3} a_u D_{t-12u}^{(2)},$$
where the weights $a_u$ come from a simple moving average of order 3 applied to a simple moving average of order 5 of the original data, i.e., the vector of weights is $(1, 2, 3, 3, 3, 2, 1)/15$. This gives a second estimate of the seasonal component $S_t$.

(9) Step (4) is repeated yielding approximately centered estimates $\hat S_t^{(2)}$ of the seasonal components.

(10) The differences
$$Y_t^{(2)} := Y_t - \hat S_t^{(2)}$$
then finally give the seasonally adjusted series.
Depending on the length of the Henderson moving average used in step (6), $Y_t^{(2)}$ is a moving average of length 165, 169 or 179 of the original data. Observe that this leads to averages at time $t$ of the past and future seven years, roughly, where seven years is a typical length of business cycles observed in economics (Juglar cycle)².
The U.S. Bureau of Census has recently released an extended version of the X-11 Program called Census X-12-ARIMA. It is implemented in SAS version 8.1 and higher as PROC X12; we refer to the SAS online documentation for details.

² http://www.drfurfero.com/books/231book/ch05j.html
We will see in Example 4.2.4 that linear filters may cause unexpected
effects and so, it is not clear a priori how the seasonal adjustment filter
described above behaves. Moreover, end-corrections are necessary,
which cover the important problem of adjusting current observations.
This can be done by some extrapolation.
Plot 1.2.2a: Plot of the Unemployed1 Data $y_t$ and of $y_t^{(2)}$, seasonally adjusted by the X-11 procedure.
1 /* unemployed1_x11.sas */
2 TITLE1 'Original and X11 seasonal adjusted data';
3 TITLE2 'Unemployed1 Data';
4
5 /* Read in the data and generate SAS-formatted date */
6 DATA data1;
7 INFILE 'c:\data\unemployed1.txt';
8 INPUT month $ t upd;
9 date=INTNX('month','01jul75'd,_N_-1);
10 FORMAT date yymon.;
11
12 /* Apply X-11-Program */
13 PROC X11 DATA=data1;
14 MONTHLY DATE=date ADDITIVE;
15 VAR upd;
16 OUTPUT OUT=data2 B1=upd D11=updx11;
17
18 /* Graphical options */
19 AXIS1 LABEL=(ANGLE=90 'unemployed');
20 AXIS2 LABEL=('Date');
21 SYMBOL1 V=DOT C=GREEN I=JOIN H=1 W=1;
22 SYMBOL2 V=STAR C=GREEN I=JOIN H=1 W=1;
23 LEGEND1 LABEL=NONE VALUE=('original' 'adjusted');
24
25 /* Plot data and adjusted data */
26 PROC GPLOT DATA=data2;
27 PLOT upd*date=1 updx11*date=2
28 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
29 RUN; QUIT;
Program 1.2.2: Application of the X-11 procedure to the Unemployed1
Data.
In the data step values for the variables month,
t and upd are read from an external file, where
month is defined as a character variable by
the succeeding $ in the INPUT statement. By
means of the function INTNX, a date variable
is generated, see Program 1.2.1 (tempera-
tures.sas).
The SAS procedure X11 applies the Census X-
11 Program to the data. The MONTHLY state-
ment selects an algorithm for monthly data,
DATE defines the date variable and ADDITIVE
selects an additive model (default: multiplica-
tive model). The results for this analysis for the
variable upd (unemployed) are stored in a data
set named data2, containing the original data
in the variable upd and the final results of the
X-11 Program in updx11.
The last part of this SAS program consists of
statements for generating the plot. Two AXIS
and two SYMBOL statements are used to cus-
tomize the graphic containing two plots: the
original data and the data seasonally adjusted
by X-11. A LEGEND statement defines the
text that explains the symbols.
Best Local Polynomial Fit
A simple moving average works well for a locally almost linear time series, but it may have problems to reflect a more twisted shape. This suggests fitting higher order local polynomials. Consider $2k+1$ consecutive data $y_{t-k}, \dots, y_t, \dots, y_{t+k}$ from a time series. A local polynomial estimator of order $p < 2k+1$ is the minimizer $\beta_0, \dots, \beta_p$ satisfying
$$\sum_{u=-k}^{k}\bigl(y_{t+u} - \beta_0 - \beta_1 u - \dots - \beta_p u^p\bigr)^2 = \min. \tag{1.15}$$
If we differentiate the left hand side with respect to each $\beta_j$ and set the derivatives equal to zero, we see that the minimizers satisfy the $p+1$ linear equations
$$\beta_0\sum_{u=-k}^{k} u^j + \beta_1\sum_{u=-k}^{k} u^{j+1} + \dots + \beta_p\sum_{u=-k}^{k} u^{j+p} = \sum_{u=-k}^{k} u^j y_{t+u}$$
for $j = 0, \dots, p$. These $p+1$ equations, which are called normal equations, can be written in matrix form as
$$X^T X\beta = X^T y \tag{1.16}$$
where
$$X = \begin{pmatrix} 1 & -k & (-k)^2 & \dots & (-k)^p \\ 1 & -k+1 & (-k+1)^2 & \dots & (-k+1)^p \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & k & k^2 & \dots & k^p \end{pmatrix} \tag{1.17}$$
is the design matrix, $\beta = (\beta_0, \dots, \beta_p)^T$ and $y = (y_{t-k}, \dots, y_{t+k})^T$. The rank of $X^T X$ equals that of $X$, since their null spaces coincide (Exercise 1.11). Thus, the matrix $X^T X$ is invertible iff the columns of $X$ are linearly independent. But this is an immediate consequence of the fact that a polynomial of degree $p$ has at most $p$ different roots (Exercise 1.12). The normal equations (1.16) have, therefore, the unique solution
$$\beta = (X^T X)^{-1} X^T y. \tag{1.18}$$
The linear prediction of $y_{t+u}$, based on $u, u^2, \dots, u^p$, is
$$\hat y_{t+u} = (1, u, \dots, u^p)\beta = \sum_{j=0}^{p}\beta_j u^j.$$
Choosing $u = 0$ we obtain in particular that $\beta_0 = \hat y_t$ is a predictor of the central observation $y_t$ among $y_{t-k}, \dots, y_{t+k}$. The local polynomial approach consists now in replacing $y_t$ by the intercept $\beta_0$.
Though it seems as if this local polynomial fit requires a great deal of computational effort by calculating $\beta_0$ for each $y_t$, it turns out that it is actually a moving average. First observe that we can write by (1.18)
$$\beta_0 = \sum_{u=-k}^{k} c_u y_{t+u}$$
with some $c_u \in \mathbb{R}$ which do not depend on the values $y_u$ of the time series and hence, $(c_u)$ is a linear filter. Next we show that the $c_u$ sum up to 1. Choose to this end $y_{t+u} = 1$ for $u = -k, \dots, k$. Then $\beta_0 = 1$, $\beta_1 = \dots = \beta_p = 0$ is an obvious solution of the minimization problem (1.15). Since this solution is unique, we obtain
$$1 = \beta_0 = \sum_{u=-k}^{k} c_u$$
and thus, $(c_u)$ is a moving average. As can be seen in Exercise 1.13 it actually has symmetric weights. We summarize our considerations in the following result.
Theorem 1.2.3. Fitting locally by least squares a polynomial of degree $p$ to $2k+1 > p$ consecutive data points $y_{t-k}, \dots, y_{t+k}$ and predicting $y_t$ by the resulting intercept $\beta_0$, leads to a moving average $(c_u)$ of order $2k+1$, given by the first row of the matrix $(X^T X)^{-1} X^T$.

Example 1.2.4. Fitting locally a polynomial of degree 2 to five consecutive data points leads to the moving average (Exercise 1.13)
$$(c_u) = \frac{1}{35}(-3, 12, 17, 12, -3)^T.$$
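The weights in Example 1.2.4 can be checked numerically via (1.18); the following sketch is not part of the book and assumes that SAS/IML is available.

PROC IML;
   k=2; p=2;                    /* five data points, locally quadratic fit */
   u=T((-k):k);                 /* column vector (-2,-1,0,1,2) */
   X=J(2*k+1,p+1,1);            /* design matrix (1.17); first column of ones */
   DO j=1 TO p;
      X[,j+1]=u##j;
   END;
   W=INV(T(X)*X)*T(X);          /* the matrix (X^T X)^{-1} X^T */
   c=W[1,];                     /* its first row gives the weights (c_u) */
   scaled=35*c;                 /* equals -3 12 17 12 -3 */
   PRINT c scaled;
QUIT;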
An extensive discussion of local polynomial fit is in Kendall and Ord
(1993), Sections 3.2-3.13. For a book-length treatment of local poly-
nomial estimation we refer to Fan and Gijbels (1996). An outline of
various aspects such as the choice of the degree of the polynomial and
further background material is given in Section 5.2 of Simonoff (1996).
Difference Filter
We have already seen that we can remove a periodic seasonal component from a time series by utilizing an appropriate linear filter. We will next show that also a polynomial trend function can be removed by a suitable linear filter.

Lemma 1.2.5. For a polynomial $f(t) := c_0 + c_1 t + \dots + c_p t^p$ of degree $p$, the difference
$$\Delta f(t) := f(t) - f(t-1)$$
is a polynomial of degree at most $p-1$.

Proof. The assertion is an immediate consequence of the binomial expansion
$$(t-1)^p = \sum_{k=0}^{p}\binom{p}{k} t^k (-1)^{p-k} = t^p - pt^{p-1} + \dots + (-1)^p.$$

The preceding lemma shows that differencing reduces the degree of a polynomial. Hence,
$$\Delta^2 f(t) := \Delta f(t) - \Delta f(t-1) = \Delta(\Delta f(t))$$
is a polynomial of degree not greater than $p-2$, and
$$\Delta^q f(t) := \Delta(\Delta^{q-1} f(t)), \qquad 1 \le q \le p,$$
is a polynomial of degree at most $p-q$. The function $\Delta^p f(t)$ is therefore a constant. The linear filter
$$\Delta Y_t = Y_t - Y_{t-1}$$
with weights $a_0 = 1$, $a_1 = -1$ is the first order difference filter. The recursively defined filter
$$\Delta^p Y_t = \Delta(\Delta^{p-1} Y_t), \qquad t = p, \dots, n,$$
is the difference filter of order $p$.
The difference filter of second order has, for example, weights $a_0 = 1$, $a_1 = -2$, $a_2 = 1$:
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1} = Y_t - Y_{t-1} - Y_{t-1} + Y_{t-2} = Y_t - 2Y_{t-1} + Y_{t-2}.$$
If a time series $Y_t$ has a polynomial trend $T_t = \sum_{k=0}^{p} c_k t^k$ for some constants $c_k$, then the difference filter $\Delta^p Y_t$ of order $p$ removes this trend up to a constant. Time series in economics often have a trend function that can be removed by a first or second order difference filter.
Example 1.2.6. (Electricity Data). The following plots show the
total annual output of electricity production in Germany between 1955
and 1979 in millions of kilowatt-hours as well as their first and second
order differences. While the original data show an increasing trend,
the second order differences fluctuate around zero having no more
trend, but there is now an increasing variability visible in the data.
Plot 1.2.3a: Annual electricity output, first and second order differences.
1 /* electricity_differences.sas */
2 TITLE1 'First and second order differences';
3 TITLE2 'Electricity Data';
4
5 /* Read in the data, compute the annual sums
6 as well as first and second order differences */
7 DATA data1(KEEP=year sum delta1 delta2);
8 INFILE 'c:\data\electric.txt';
9 INPUT year t jan feb mar apr may jun jul aug sep oct nov dec;
10 sum=jan+feb+mar+apr+may+jun+jul+aug+sep+oct+nov+dec;
11 delta1=DIF(sum);
12 delta2=DIF(delta1);
13
14 /* Graphical options */
15 AXIS1 LABEL=NONE;
16 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;
17
18 /* Generate three plots */
19 GOPTIONS NODISPLAY;
20 PROC GPLOT DATA=data1 GOUT=fig;
21 PLOT sum*year / VAXIS=AXIS1 HAXIS=AXIS2;
22 PLOT delta1*year / VAXIS=AXIS1 VREF=0;
23 PLOT delta2*year / VAXIS=AXIS1 VREF=0;
24 RUN;
25
26 /* Display them in one output */
27 GOPTIONS DISPLAY;
28 PROC GREPLAY NOFS IGOUT=fig TC=SASHELP.TEMPLT;
29 TEMPLATE=V3;
30 TREPLAY 1:GPLOT 2:GPLOT1 3:GPLOT2;
31 RUN; DELETE _ALL_; QUIT;
Program 1.2.3: Computation of first and second order differences for
the Electricity Data.
In the first data step, the raw data are read from
a file. Because the electric production is stored
in different variables for each month of a year,
the sum must be evaluated to get the annual
output. Using the DIF function, the resulting
variables delta1 and delta2 contain the first
and second order differences of the original an-
nual sums.
To display the three plots of sum, delta1 and
delta2 against the variable year within one
graphic, they are first plotted using the proce-
dure GPLOT. Here the option GOUT=fig stores
the plots in a graphics catalog named fig, while
GOPTIONS NODISPLAY causes no output of this
procedure. After changing the GOPTIONS back
to DISPLAY, the procedure GREPLAY is invoked.
The option NOFS (no full-screen) suppresses the
opening of a GREPLAY window. The subsequent
two line mode statements are read instead.
The option IGOUT determines the input graph-
ics catalog, while TC=SASHELP.TEMPLT causes
SAS to take the standard template catalog. The
TEMPLATE statement selects a template from
this catalog, which puts three graphics one be-
low the other. The TREPLAY statement connects
the defined areas and the plots of the graph-
ics catalog. GPLOT, GPLOT1 and GPLOT2 are the
graphical outputs in the chronological order of
the GPLOT procedure. The DELETE statement af-
ter RUN deletes all entries in the input graphics
catalog.
Note that SAS by default prints borders, in or-
der to separate the different plots. Here these
border lines are suppressed by defining WHITE
as the border color.
For a time series $Y_t = T_t + S_t + R_t$ with a periodic seasonal component $S_t = S_{t+p} = S_{t+2p} = \dots$ the difference
$$Y_t^* := Y_t - Y_{t-p}$$
obviously removes the seasonal component. An additional differencing of proper length can moreover remove a polynomial trend, too. Note that the order of seasonal and trend adjusting makes no difference.
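A seasonal difference of lag $p$ and an additional first order difference are obtained in a SAS DATA step with the DIFn function; the following lines are only a sketch (the data set data1 and the variable y are assumptions), not one of the book's programs.

DATA adjusted;
   SET data1;
   sdiff=DIF12(y);        /* Y_t - Y_{t-12}: removes a 12-periodic seasonal component */
   tsdiff=DIF(sdiff);     /* an additional first order difference for the trend */
RUN;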
Exponential Smoother
Let $Y_0, \dots, Y_n$ be a time series and let $\alpha \in [0,1]$ be a constant. The linear filter
$$Y_t^* = \alpha Y_t + (1-\alpha)Y_{t-1}^*, \qquad t \ge 1,$$
with $Y_0^* = Y_0$ is called exponential smoother.

Lemma 1.2.7. For an exponential smoother with constant $\alpha \in [0,1]$ we have
$$Y_t^* = \alpha\sum_{j=0}^{t-1}(1-\alpha)^j Y_{t-j} + (1-\alpha)^t Y_0, \qquad t = 1, 2, \dots, n.$$

Proof. The assertion follows from induction. We have for $t = 1$ by definition $Y_1^* = \alpha Y_1 + (1-\alpha)Y_0$. If the assertion holds for $t$, we obtain for $t+1$
$$Y_{t+1}^* = \alpha Y_{t+1} + (1-\alpha)Y_t^* = \alpha Y_{t+1} + (1-\alpha)\Bigl(\alpha\sum_{j=0}^{t-1}(1-\alpha)^j Y_{t-j} + (1-\alpha)^t Y_0\Bigr) = \alpha\sum_{j=0}^{t}(1-\alpha)^j Y_{t+1-j} + (1-\alpha)^{t+1} Y_0.$$

The parameter $\alpha$ determines the smoothness of the filtered time series. A value of $\alpha$ close to 1 puts most of the weight on the actual observation $Y_t$, resulting in a highly fluctuating series $Y_t^*$. On the other hand, an $\alpha$ close to 0 reduces the influence of $Y_t$ and puts most of the weight to the past observations, yielding a smooth series $Y_t^*$. An exponential smoother is typically used for monitoring a system. Take, for example, a car having an analog speedometer with a hand. It is more convenient for the driver if the movements of this hand are smoothed, which can be achieved by $\alpha$ close to zero. But this, on the other hand, has the effect that an essential alteration of the speed can be read from the speedometer only with a certain delay.
Corollary 1.2.8. (i) Suppose that the random variables $Y_0, \dots, Y_n$ have common expectation $\mu$ and common variance $\sigma^2 > 0$. Then we have for the exponentially smoothed variables with smoothing parameter $\alpha \in (0,1)$
$$E(Y_t^*) = \alpha\sum_{j=0}^{t-1}(1-\alpha)^j\mu + \mu(1-\alpha)^t = \mu\bigl(1-(1-\alpha)^t\bigr) + \mu(1-\alpha)^t = \mu. \tag{1.19}$$
If the $Y_t$ are in addition uncorrelated, then
$$E\bigl((Y_t^* - \mu)^2\bigr) = \alpha^2\sum_{j=0}^{t-1}(1-\alpha)^{2j}\sigma^2 + (1-\alpha)^{2t}\sigma^2 = \sigma^2\alpha^2\,\frac{1-(1-\alpha)^{2t}}{1-(1-\alpha)^2} + (1-\alpha)^{2t}\sigma^2 \;\xrightarrow{t\to\infty}\; \frac{\sigma^2\alpha}{2-\alpha} < \sigma^2. \tag{1.20}$$
(ii) Suppose that the random variables $Y_0, Y_1, \dots$ satisfy $E(Y_t) = \mu$ for $0 \le t \le N-1$, and $E(Y_t) = \lambda$ for $t \ge N$. Then we have for $t \ge N$
$$E(Y_t^*) = \alpha\sum_{j=0}^{t-N}(1-\alpha)^j\lambda + \alpha\sum_{j=t-N+1}^{t-1}(1-\alpha)^j\mu + (1-\alpha)^t\mu = \lambda\bigl(1-(1-\alpha)^{t-N+1}\bigr) + \mu\Bigl((1-\alpha)^{t-N+1}\bigl(1-(1-\alpha)^{N-1}\bigr) + (1-\alpha)^t\Bigr) \;\xrightarrow{t\to\infty}\; \lambda. \tag{1.21}$$
The preceding result quantifies the influence of the parameter $\alpha$ on the expectation and on the variance, i.e., the smoothness of the filtered series $Y_t^*$, where we assume for the sake of a simple computation of the variance that the $Y_t$ are uncorrelated. If the variables $Y_t$ have common expectation $\mu$, then this expectation carries over to $Y_t^*$. After a change point $N$, where the expectation of $Y_t$ changes for $t \ge N$ from $\mu$ to $\lambda \ne \mu$, the filtered variables $Y_t^*$ are, however, biased. This bias, which will vanish as $t$ increases, is due to the still inherent influence of past observations $Y_t$, $t < N$. The influence of these variables on the current expectation can be reduced by switching to a larger value of $\alpha$. The price for the gain in correctness of the expectation is, however, a higher variability of $Y_t^*$ (see Exercise 1.16).
An exponential smoother is often also used to make forecasts, explicitly by predicting $Y_{t+1}$ through $Y_t^*$. The forecast error $Y_{t+1} - Y_t^* =: e_{t+1}$ then satisfies the equation $Y_{t+1}^* = \alpha e_{t+1} + Y_t^*$.
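The recursion defining the exponential smoother translates directly into a DATA step with a retained variable; this sketch is not part of the book, and the data set data1, the variable y and the value alpha=0.3 are assumptions.

DATA smooth;
   SET data1;
   RETAIN ystar;                      /* keeps the previous value Y*_{t-1} across observations */
   alpha=0.3;
   IF _N_=1 THEN ystar=y;             /* Y*_0 = Y_0 */
   ELSE ystar=alpha*y+(1-alpha)*ystar;
RUN;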
1.3 Autocovariances and Autocorrelations
Autocovariances and autocorrelations are measures of dependence between variables in a time series. Suppose that $Y_1, \dots, Y_n$ are square integrable random variables with the property that the covariance $\operatorname{Cov}(Y_{t+k}, Y_t) = E\bigl((Y_{t+k} - E(Y_{t+k}))(Y_t - E(Y_t))\bigr)$ of observations with lag $k$ does not depend on $t$. Then
$$\gamma(k) := \operatorname{Cov}(Y_{k+1}, Y_1) = \operatorname{Cov}(Y_{k+2}, Y_2) = \dots$$
is called autocovariance function and
$$\rho(k) := \frac{\gamma(k)}{\gamma(0)}, \qquad k = 0, 1, \dots$$
is called autocorrelation function.
Let $y_1, \dots, y_n$ be realizations of a time series $Y_1, \dots, Y_n$. The empirical counterpart of the autocovariance function is
$$c(k) := \frac{1}{n}\sum_{t=1}^{n-k}(y_{t+k} - \bar y)(y_t - \bar y) \quad \text{with } \bar y = \frac{1}{n}\sum_{t=1}^{n} y_t$$
and the empirical autocorrelation is defined by
$$r(k) := \frac{c(k)}{c(0)} = \frac{\sum_{t=1}^{n-k}(y_{t+k} - \bar y)(y_t - \bar y)}{\sum_{t=1}^{n}(y_t - \bar y)^2}.$$
See Exercise 2.8 (ii) for the particular role of the factor $1/n$ in place of $1/(n-k)$ in the definition of $c(k)$. The graph of the function $r(k)$, $k = 0, 1, \dots, n-1$, is called correlogram. It is based on the assumption of equal expectations and should, therefore, be used for a trend adjusted series. The following plot is the correlogram of the first order differences of the Sunspot Data. The description can be found on page 199. It shows high and decreasing correlations at regular intervals.
Plot 1.3.1a: Correlogram of the first order differences of the Sunspot
Data.
1 /* sunspot_correlogram */
2 TITLE1 ’Correlogram of first order differences ’;
3 TITLE2 ’Sunspot Data ’;
4
5 /* Read in the data , generate year of observation and
6 compute first order differences */
7 DATA data1;
8 INFILE ’c:\data\sunspot.txt ’;
9 INPUT spot @@;
10 date =1748+ _N_;
11 diff1=DIF(spot);
12
13 /* Compute autocorrelation function */
14 PROC ARIMA DATA=data1;
15 IDENTIFY VAR=diff1 NLAG =49 OUTCOV=corr NOPRINT;
16
17 /* Graphical options */
18 AXIS1 LABEL=(’r(k)’);
19 AXIS2 LABEL=(’k’) ORDER =(0 12 24 36 48) MINOR=(N=11);
20 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;
21
22 /* Plot autocorrelation function */
23 PROC GPLOT DATA=corr;
24 PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF =0;
25 RUN; QUIT;
Program 1.3.1: Generating the correlogram of first order differences
for the Sunspot Data.
In the data step, the raw data are read into the variable spot. The specification @@ suppresses the automatic line feed of the INPUT statement after every entry in each row. The variable date and the first order differences of the variable of interest spot are calculated.

The following procedure ARIMA is a crucial one in time series analysis. Here we just need the autocorrelation of diff1, which will be calculated up to a lag of 49 (NLAG=49) by the IDENTIFY statement. The option OUTCOV=corr causes SAS to create a data set corr containing among others the variables LAG and CORR. These two are used in the following GPLOT procedure to obtain a plot of the autocorrelation function. The ORDER option in the AXIS2 statement specifies the values to appear on the horizontal axis as well as their order, and the MINOR option determines the number of minor tick marks between two major ticks. VREF=0 generates a horizontal reference line through the value 0 on the vertical axis.
The autocovariance function $\gamma$ obviously satisfies $\gamma(0)\ge 0$ and, by the Cauchy–Schwarz inequality,
$$|\gamma(k)|=\bigl|E\bigl((Y_{t+k}-E(Y_{t+k}))(Y_t-E(Y_t))\bigr)\bigr|\le E\bigl(|Y_{t+k}-E(Y_{t+k})|\,|Y_t-E(Y_t)|\bigr)\le\mathrm{Var}(Y_{t+k})^{1/2}\,\mathrm{Var}(Y_t)^{1/2}=\gamma(0)\quad\text{for }k\ge 0.$$
Thus we obtain for the autocorrelation function the inequality
$$|\rho(k)|\le 1=\rho(0).$$
Variance Stabilizing Transformation
The scatterplot of the points $(t,y_t)$ sometimes shows a variation of the data $y_t$ depending on their height.
Example 1.3.1. (Airline Data). Plot 1.3.2a, which displays monthly totals in thousands of international airline passengers from January 1949 to December 1960, exemplifies the above mentioned dependence. These Airline Data are taken from Box and Jenkins (1976); a discussion can be found in Section 9.2 of Brockwell and Davis (1991).
Plot 1.3.2a: Monthly totals in thousands of international airline pas-
sengers from January 1949 to December 1960.
1 /* airline_plot.sas */
2 TITLE1 ’Monthly totals from January 49 to December 60’;
3 TITLE2 ’Airline Data ’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\airline.txt ’;
8 INPUT y;
9 t=_N_;
10
11 /* Graphical options */
12 AXIS1 LABEL=NONE ORDER =(0 12 24 36 48 60 72 84 96 108 120 132 144)
→ MINOR=(N=5);
13 AXIS2 LABEL=(ANGLE =90 ’total in thousands ’);
14 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;
15
16 /* Plot the data */
17 PROC GPLOT DATA=data1;
18 PLOT y*t / HAXIS=AXIS1 VAXIS=AXIS2;
19 RUN; QUIT;
Program 1.3.2: Plotting the airline passengers
In the first data step, the monthly passenger totals are read into the variable y. To get a time variable t, the temporarily created SAS variable _N_ is used; it counts the observations. The passenger totals are plotted against t with a line joining the data points, which are symbolized by small dots. On the horizontal axis a label is suppressed.
The variation of the data $y_t$ obviously increases with their height. The log-transformed data $x_t=\log(y_t)$, displayed in the following figure, however, show no dependence of variability from height.
Plot 1.3.3a: Logarithm of Airline Data $x_t=\log(y_t)$.
1 /* airline_log.sas */
2 TITLE1 ’Logarithmic transformation ’;
3 TITLE2 ’Airline Data ’;
4
5 /* Read in the data and compute log -transformed data */
6 DATA data1;
7 INFILE ’c:\data\airline.txt ’;
8 INPUT y;
9 t=_N_;
10 x=LOG(y);
11
12 /* Graphical options */
13 AXIS1 LABEL=NONE ORDER =(0 12 24 36 48 60 72 84 96 108 120 132 144)
→MINOR=(N=5);
14 AXIS2 LABEL=NONE;
15 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;
16
17 /* Plot log -transformed data */
18 PROC GPLOT DATA=data1;
19 PLOT x*t / HAXIS=AXIS1 VAXIS=AXIS2;
20 RUN; QUIT;
Program 1.3.3: Computing and plotting the logarithm of Airline Data.
The plot of the log-transformed data is done in the same manner as for the original data in Program 1.3.2 (airline_plot.sas). The only differences are the log-transformation by means of the LOG function and the suppressed label on the vertical axis.
The fact that taking the logarithm of data often reduces their variability can be illustrated as follows. Suppose, for example, that the data were generated by random variables, which are of the form $Y_t=\sigma_t Z_t$, where $\sigma_t>0$ is a scale factor depending on $t$, and $Z_t$, $t\in\mathbb{Z}$, are independent copies of a positive random variable $Z$ with variance 1. The variance of $Y_t$ is in this case $\sigma_t^2$, whereas the variance of $\log(Y_t)=\log(\sigma_t)+\log(Z_t)$ is a constant, namely the variance of $\log(Z)$, if it exists.
A transformation of the data, which reduces the dependence of the variability on their height, is called variance stabilizing. The logarithm is a particular case of the general Box–Cox (1964) transformation $T_\lambda$ of a time series $(Y_t)$, where the parameter $\lambda\ge 0$ is chosen by the statistician:
$$T_\lambda(Y_t):=\begin{cases}(Y_t^\lambda-1)/\lambda, & Y_t\ge 0,\ \lambda>0\\ \log(Y_t), & Y_t>0,\ \lambda=0.\end{cases}$$
Note that $\lim_{\lambda\downarrow 0}T_\lambda(Y_t)=T_0(Y_t)=\log(Y_t)$ if $Y_t>0$ (Exercise 1.19).
Popular choices of the parameter λ are 0 and 1/2. A variance stabi-
lizing transformation of the data, if necessary, usually precedes any
further data manipulation such as trend or seasonal adjustment.
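As a small sketch of how such a transformation can be produced in SAS (the file path and the choice $\lambda=1/2$ are illustrative assumptions), the Airline Data could be transformed in a data step and then plotted as in Program 1.3.3:

/* Box-Cox transformation with lambda = 0.5 and lambda = 0 (sketch) */
DATA boxcox;
   INFILE 'c:\data\airline.txt';   /* path as in the airline programs          */
   INPUT y;
   t=_N_;
   lambda=0.5;
   bc_half=(y**lambda-1)/lambda;   /* T_lambda(y) for lambda = 1/2             */
   bc_log=LOG(y);                  /* limiting case lambda = 0                 */
RUN;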
Exercises
1.1. Plot the Mitscherlich function for different values of $\beta_1$, $\beta_2$, $\beta_3$ using PROC GPLOT.
1.2. Put in the logistic trend model (1.5) $z_t:=1/y_t\sim 1/E(Y_t)=1/f_{\log}(t)$, $t=1,\dots,n$. Then we have the linear regression model $z_t=a+bz_{t-1}+\varepsilon_t$, where $\varepsilon_t$ is the error variable. Compute the least squares estimates $\hat a$, $\hat b$ of $a$, $b$ and motivate the estimates $\hat\beta_1:=-\log(\hat b)$, $\hat\beta_3:=(1-\exp(-\hat\beta_1))/\hat a$ as well as
$$\hat\beta_2:=\exp\Bigl(\frac{n+1}{2}\hat\beta_1+\frac1n\sum_{t=1}^n\log\Bigl(\frac{\hat\beta_3}{y_t}-1\Bigr)\Bigr),$$
proposed by Tintner (1958); see also Exercise 1.3.
1.3. The estimate $\hat\beta_2$ defined above suffers from the drawback that all observations $y_t$ have to be strictly less than the estimate $\hat\beta_3$. Motivate the following substitute of $\hat\beta_2$
$$\tilde\beta_2=\Bigl(\sum_{t=1}^n\frac{\hat\beta_3-y_t}{y_t}\exp\bigl(-\hat\beta_1 t\bigr)\Bigr)\Big/\sum_{t=1}^n\exp\bigl(-2\hat\beta_1 t\bigr)$$
as an estimate of the parameter $\beta_2$ in the logistic trend model (1.5).
1.4. Show that in a linear regression model $y_t=\beta_1 x_t+\beta_2$, $t=1,\dots,n$, the squared multiple correlation coefficient $R^2$ based on the least squares estimates $\hat\beta_1$, $\hat\beta_2$ and $\hat y_t:=\hat\beta_1 x_t+\hat\beta_2$ is necessarily between zero and one with $R^2=1$ if and only if $\hat y_t=y_t$, $t=1,\dots,n$ (see (1.12)).
1.5. (Population2 Data) Table 1.3.1 lists total population numbers of North Rhine-Westphalia between 1961 and 1979. Suppose a logistic trend for these data and compute the estimators $\hat\beta_1$, $\hat\beta_3$ using PROC REG. Since some observations exceed $\hat\beta_3$, use $\tilde\beta_2$ from Exercise 1.3 and do an ex post-analysis. Use PROC NLIN and do an ex post-analysis. Compare these two procedures by their residual sums of squares.
Year t Total Population
in millions
1961 1 15.920
1963 2 16.280
1965 3 16.661
1967 4 16.835
1969 5 17.044
1971 6 17.091
1973 7 17.223
1975 8 17.176
1977 9 17.052
1979 10 17.002
Table 1.3.1: Population2 Data.
1.6. (Income Data) Suppose an allometric trend function for the income data in Example 1.1.3 and do a regression analysis. Plot the data $y_t$ versus $\hat\beta_2 t^{\hat\beta_1}$. To this end compute the $R^2$-coefficient. Estimate the parameters also with PROC NLIN and compare the results.
1.7. (Unemployed2 Data) Table 1.3.2 lists total numbers of unem-
ployed (in thousands) in West Germany between 1950 and 1993. Com-
pare a logistic trend function with an allometric one. Which one gives
the better fit?
1.8. Give an update equation for a simple moving average of (even)
order 2s.
1.9. (Public Expenditures Data) Table 1.3.3 lists West Germany’s
public expenditures (in billion D-Marks) between 1961 and 1990.
Compute simple moving averages of order 3 and 5 to estimate a pos-
sible trend. Plot the original data as well as the filtered ones and
compare the curves.
1.10. (Unemployed Females Data) Use PROC X11 to analyze the
monthly unemployed females between ages 16 and 19 in the United
States from January 1961 to December 1985 (in thousands).
1.11. Show that the rank of a matrix $A$ equals the rank of $A^TA$.
Year Unemployed
1950 1869
1960 271
1970 149
1975 1074
1980 889
1985 2304
1988 2242
1989 2038
1990 1883
1991 1689
1992 1808
1993 2270
Table 1.3.2: Unemployed2 Data.
Year Public Expenditures Year Public Expenditures
1961 113,4 1976 546,2
1962 129,6 1977 582,7
1963 140,4 1978 620,8
1964 153,2 1979 669,8
1965 170,2 1980 722,4
1966 181,6 1981 766,2
1967 193,6 1982 796,0
1968 211,1 1983 816,4
1969 233,3 1984 849,0
1970 264,1 1985 875,5
1971 304,3 1986 912,3
1972 341,0 1987 949,6
1973 386,5 1988 991,1
1974 444,8 1989 1018,9
1975 509,1 1990 1118,1
Table 1.3.3: Public Expenditures Data.
1.12. The $p+1$ columns of the design matrix $X$ in (1.17) are linearly independent.
1.13. Let $(c_u)$ be the moving average derived by the best local polynomial fit. Show that

(i) fitting locally a polynomial of degree 2 to five consecutive data points leads to
$$(c_u)=\frac1{35}(-3,12,17,12,-3)^T,$$

(ii) the inverse matrix $A^{-1}$ of an invertible $m\times m$-matrix $A=(a_{ij})_{1\le i,j\le m}$ with the property that $a_{ij}=0$, if $i+j$ is odd, shares this property,

(iii) $(c_u)$ is symmetric, i.e., $c_{-u}=c_u$.
1.14. (Unemployed1 Data) Compute a seasonal and trend adjusted
time series for the Unemployed1 Data in the building trade. To this
end compute seasonal differences and first order differences. Compare
the results with those of PROC X11.
1.15. Use the SAS function RANNOR to generate a time series $Y_t=b_0+b_1t+\varepsilon_t$, $t=1,\dots,100$, where $b_0,b_1\ne 0$ and the $\varepsilon_t$ are independent normal random variables with mean $\mu$ and variance $\sigma_1^2$ if $t\le 69$ but variance $\sigma_2^2\ne\sigma_1^2$ if $t\ge 70$. Plot the exponentially filtered variables $Y_t^*$ for different values of the smoothing parameter $\alpha\in(0,1)$ and compare the results.
1.16. Compute under the assumptions of Corollary 1.2.8 the variance of an exponentially filtered variable $Y_t^*$ after a change point $t=N$ with $\sigma^2:=E(Y_t-\mu)^2$ for $t<N$ and $\tau^2:=E(Y_t-\lambda)^2$ for $t\ge N$. What is the limit for $t\to\infty$?
1.17. (Bankruptcy Data) Table 1.3.4 lists the percentages of annual bankruptcies among all US companies between 1867 and 1932. Compute and plot the empirical autocovariance function and the empirical autocorrelation function using the SAS procedures PROC ARIMA and PROC GPLOT.
1.33 0.94 0.79 0.83 0.61 0.77 0.93 0.97 1.20 1.33
1.36 1.55 0.95 0.59 0.61 0.83 1.06 1.21 1.16 1.01
0.97 1.02 1.04 0.98 1.07 0.88 1.28 1.25 1.09 1.31
1.26 1.10 0.81 0.92 0.90 0.93 0.94 0.92 0.85 0.77
0.83 1.08 0.87 0.84 0.88 0.99 0.99 1.10 1.32 1.00
0.80 0.58 0.38 0.49 1.02 1.19 0.94 1.01 1.00 1.01
1.07 1.08 1.04 1.21 1.33 1.53
Table 1.3.4: Bankruptcy Data.
1.18. Verify that the empirical correlation $r(k)$ at lag $k$ for the trend $y_t=t$, $t=1,\dots,n$ is given by
$$r(k)=1-3\frac kn+2\frac{k(k^2-1)}{n(n^2-1)},\quad k=0,\dots,n.$$
Plot the correlogram for different values of $n$. This example shows that the correlogram has no interpretation for non-stationary processes (see Exercise 1.17).
1.19. Show that
$$\lim_{\lambda\downarrow 0}T_\lambda(Y_t)=T_0(Y_t)=\log(Y_t),\quad Y_t>0,$$
for the Box–Cox transformation $T_\lambda$.
Chapter 2

Models of Time Series

Each time series $Y_1,\dots,Y_n$ can be viewed as a clipping from a sequence of random variables $\dots,Y_{-2},Y_{-1},Y_0,Y_1,Y_2,\dots$ In the following we will introduce several models for such a stochastic process $Y_t$ with index set $\mathbb{Z}$.
2.1 Linear Filters and Stochastic Processes
For mathematical convenience we will consider complex valued random variables $Y$, whose range is the set of complex numbers $\mathbb{C}=\{u+iv: u,v\in\mathbb{R}\}$, where $i=\sqrt{-1}$. Therefore, we can decompose $Y$ as $Y=Y_{(1)}+iY_{(2)}$, where $Y_{(1)}=\mathrm{Re}(Y)$ is the real part of $Y$ and $Y_{(2)}=\mathrm{Im}(Y)$ is its imaginary part. The random variable $Y$ is called integrable if the real valued random variables $Y_{(1)}$, $Y_{(2)}$ both have finite expectations, and in this case we define the expectation of $Y$ by
$$E(Y):=E(Y_{(1)})+iE(Y_{(2)})\in\mathbb{C}.$$
This expectation has, up to monotonicity, the usual properties such as $E(aY+bZ)=aE(Y)+bE(Z)$ of its real counterpart. Here $a$ and $b$ are complex numbers and $Z$ is a further integrable complex valued random variable. In addition we have $\overline{E(Y)}=E(\bar Y)$, where $\bar a=u-iv$ denotes the conjugate complex number of $a=u+iv$. Since $|a|^2:=u^2+v^2=a\bar a=\bar a a$, we define the variance of $Y$ by
$$\mathrm{Var}(Y):=E\bigl((Y-E(Y))\overline{(Y-E(Y))}\bigr)\ge 0.$$
The complex random variable $Y$ is called square integrable if this number is finite. To carry the equation $\mathrm{Var}(X)=\mathrm{Cov}(X,X)$ for a real random variable $X$ over to complex ones, we define the covariance of complex square integrable random variables $Y,Z$ by
$$\mathrm{Cov}(Y,Z):=E\bigl((Y-E(Y))\overline{(Z-E(Z))}\bigr).$$
Note that the covariance $\mathrm{Cov}(Y,Z)$ is no longer symmetric with respect to $Y$ and $Z$, as it is for real valued random variables, but it satisfies $\mathrm{Cov}(Y,Z)=\overline{\mathrm{Cov}(Z,Y)}$.
The following lemma implies that the Cauchy–Schwarz inequality car-
ries over to complex valued random variables.
Lemma 2.1.1. For any integrable complex valued random variable $Y=Y_{(1)}+iY_{(2)}$ we have
$$|E(Y)|\le E(|Y|)\le E(|Y_{(1)}|)+E(|Y_{(2)}|).$$
Proof. We write $E(Y)$ in polar coordinates $E(Y)=re^{i\vartheta}$, where $r=|E(Y)|$ and $\vartheta\in[0,2\pi)$. Observe that
$$\mathrm{Re}(e^{-i\vartheta}Y)=\mathrm{Re}\bigl((\cos(\vartheta)-i\sin(\vartheta))(Y_{(1)}+iY_{(2)})\bigr)=\cos(\vartheta)Y_{(1)}+\sin(\vartheta)Y_{(2)}\le(\cos^2(\vartheta)+\sin^2(\vartheta))^{1/2}(Y_{(1)}^2+Y_{(2)}^2)^{1/2}=|Y|$$
by the Cauchy–Schwarz inequality for real numbers. Thus we obtain
$$|E(Y)|=r=E(e^{-i\vartheta}Y)=E\bigl(\mathrm{Re}(e^{-i\vartheta}Y)\bigr)\le E(|Y|).$$
The second inequality of the lemma follows from $|Y|=(Y_{(1)}^2+Y_{(2)}^2)^{1/2}\le|Y_{(1)}|+|Y_{(2)}|$.
The next result is a consequence of the preceding lemma and the Cauchy–Schwarz inequality for real valued random variables.

Corollary 2.1.2. For any square integrable complex valued random variables $Y$ and $Z$ we have
$$|E(Y\bar Z)|\le E(|Y||Z|)\le E(|Y|^2)^{1/2}E(|Z|^2)^{1/2}$$
and thus,
$$|\mathrm{Cov}(Y,Z)|\le\mathrm{Var}(Y)^{1/2}\mathrm{Var}(Z)^{1/2}.$$
Stationary Processes
A stochastic process $(Y_t)_{t\in\mathbb{Z}}$ of square integrable complex valued random variables is said to be (weakly) stationary if for any $t_1,t_2,k\in\mathbb{Z}$
$$E(Y_{t_1})=E(Y_{t_1+k})\quad\text{and}\quad E(Y_{t_1}\bar Y_{t_2})=E(Y_{t_1+k}\bar Y_{t_2+k}).$$
The random variables of a stationary process $(Y_t)_{t\in\mathbb{Z}}$ have identical means and variances. The autocovariance function satisfies moreover for $s,t\in\mathbb{Z}$
$$\gamma(t,s):=\mathrm{Cov}(Y_t,Y_s)=\mathrm{Cov}(Y_{t-s},Y_0)=:\gamma(t-s)=\overline{\mathrm{Cov}(Y_0,Y_{t-s})}=\overline{\mathrm{Cov}(Y_{s-t},Y_0)}=\overline{\gamma(s-t)},$$
and thus, the autocovariance function of a stationary process can be viewed as a function of a single argument satisfying $\gamma(t)=\overline{\gamma(-t)}$, $t\in\mathbb{Z}$.
A stationary process $(\varepsilon_t)_{t\in\mathbb{Z}}$ of square integrable and uncorrelated real valued random variables is called white noise, i.e., $\mathrm{Cov}(\varepsilon_t,\varepsilon_s)=0$ for $t\ne s$ and there exist $\mu\in\mathbb{R}$, $\sigma\ge 0$ such that
$$E(\varepsilon_t)=\mu,\quad E((\varepsilon_t-\mu)^2)=\sigma^2,\quad t\in\mathbb{Z}.$$
In Section 1.2 we defined linear filters of a time series, which were
based on a finite number of real valued weights. In the following
we consider linear filters with an infinite number of complex valued
weights.
Suppose that $(\varepsilon_t)_{t\in\mathbb{Z}}$ is a white noise and let $(a_t)_{t\in\mathbb{Z}}$ be a sequence of complex numbers satisfying
$$\sum_{t=-\infty}^{\infty}|a_t|:=\sum_{t\ge 0}|a_t|+\sum_{t\ge 1}|a_{-t}|<\infty.$$
Then $(a_t)_{t\in\mathbb{Z}}$ is said to be an absolutely summable (linear) filter and
$$Y_t:=\sum_{u=-\infty}^{\infty}a_u\varepsilon_{t-u}:=\sum_{u\ge 0}a_u\varepsilon_{t-u}+\sum_{u\ge 1}a_{-u}\varepsilon_{t+u},\quad t\in\mathbb{Z},$$
is called a general linear process.
Existence of General Linear Processes
We will show that $\sum_{u=-\infty}^{\infty}|a_u\varepsilon_{t-u}|<\infty$ with probability one for $t\in\mathbb{Z}$ and, thus, $Y_t=\sum_{u=-\infty}^{\infty}a_u\varepsilon_{t-u}$ is well defined. Denote by $L^2:=L^2(\Omega,\mathcal{A},\mathcal{P})$ the set of all complex valued square integrable random variables, defined on some probability space $(\Omega,\mathcal{A},\mathcal{P})$, and put $\|Y\|_2:=E(|Y|^2)^{1/2}$, which is the $L^2$-pseudonorm on $L^2$.
Lemma 2.1.3. Let $X_n$, $n\in\mathbb{N}$, be a sequence in $L^2$ such that $\|X_{n+1}-X_n\|_2\le 2^{-n}$ for each $n\in\mathbb{N}$. Then there exists $X\in L^2$ such that $\lim_{n\to\infty}X_n=X$ with probability one.

Proof. Write $X_n=\sum_{k\le n}(X_k-X_{k-1})$, where $X_0:=0$. By the monotone convergence theorem, the Cauchy–Schwarz inequality and Corollary 2.1.2 we have
$$E\Bigl(\sum_{k\ge 1}|X_k-X_{k-1}|\Bigr)=\sum_{k\ge 1}E(|X_k-X_{k-1}|)\le\sum_{k\ge 1}\|X_k-X_{k-1}\|_2\le\|X_1\|_2+\sum_{k\ge 1}2^{-k}<\infty.$$
This implies that $\sum_{k\ge 1}|X_k-X_{k-1}|<\infty$ with probability one and hence, the limit $\lim_{n\to\infty}\sum_{k\le n}(X_k-X_{k-1})=\lim_{n\to\infty}X_n=X$ exists in $\mathbb{C}$ almost surely. Finally, we check that $X\in L^2$:
$$E(|X|^2)=E\bigl(\lim_{n\to\infty}|X_n|^2\bigr)\le E\Bigl(\lim_{n\to\infty}\bigl(\sum_{k\le n}|X_k-X_{k-1}|\bigr)^2\Bigr)=\lim_{n\to\infty}E\Bigl(\bigl(\sum_{k\le n}|X_k-X_{k-1}|\bigr)^2\Bigr)$$
$$=\lim_{n\to\infty}\sum_{k,j\le n}E\bigl(|X_k-X_{k-1}|\,|X_j-X_{j-1}|\bigr)\le\limsup_{n\to\infty}\sum_{k,j\le n}\|X_k-X_{k-1}\|_2\,\|X_j-X_{j-1}\|_2$$
$$=\limsup_{n\to\infty}\Bigl(\sum_{k\le n}\|X_k-X_{k-1}\|_2\Bigr)^2=\Bigl(\sum_{k\ge 1}\|X_k-X_{k-1}\|_2\Bigr)^2<\infty.$$
Theorem 2.1.4. The space $(L^2,\|\cdot\|_2)$ is complete, i.e., suppose that $X_n\in L^2$, $n\in\mathbb{N}$, has the property that for arbitrary $\varepsilon>0$ one can find an integer $N(\varepsilon)\in\mathbb{N}$ such that $\|X_n-X_m\|_2<\varepsilon$ if $n,m\ge N(\varepsilon)$. Then there exists a random variable $X\in L^2$ such that $\lim_{n\to\infty}\|X-X_n\|_2=0$.

Proof. We can obviously find integers $n_1<n_2<\dots$ such that
$$\|X_n-X_m\|_2\le 2^{-k}\quad\text{if }n,m\ge n_k.$$
By Lemma 2.1.3 there exists a random variable $X\in L^2$ such that $\lim_{k\to\infty}X_{n_k}=X$ with probability one. Fatou's lemma implies
$$\|X_n-X\|_2^2=E(|X_n-X|^2)=E\bigl(\liminf_{k\to\infty}|X_n-X_{n_k}|^2\bigr)\le\liminf_{k\to\infty}\|X_n-X_{n_k}\|_2^2.$$
The right-hand side of this inequality becomes arbitrarily small if we choose $n$ large enough, and thus we have $\lim_{n\to\infty}\|X_n-X\|_2^2=0$.
The following result implies in particular that a general linear process is well defined.

Theorem 2.1.5. Suppose that $(Z_t)_{t\in\mathbb{Z}}$ is a complex valued stochastic process such that $\sup_t E(|Z_t|)<\infty$ and let $(a_t)_{t\in\mathbb{Z}}$ be an absolutely summable filter. Then we have $\sum_{u\in\mathbb{Z}}|a_uZ_{t-u}|<\infty$ with probability one for $t\in\mathbb{Z}$ and, thus, $Y_t:=\sum_{u\in\mathbb{Z}}a_uZ_{t-u}$ exists almost surely in $\mathbb{C}$. We have moreover $E(|Y_t|)<\infty$, $t\in\mathbb{Z}$, and

(i) $E(Y_t)=\lim_{n\to\infty}\sum_{u=-n}^{n}a_uE(Z_{t-u})$, $t\in\mathbb{Z}$,

(ii) $E\bigl(|Y_t-\sum_{u=-n}^{n}a_uZ_{t-u}|\bigr)\xrightarrow{n\to\infty}0$.

If, in addition, $\sup_t E(|Z_t|^2)<\infty$, then we have $E(|Y_t|^2)<\infty$, $t\in\mathbb{Z}$, and

(iii) $\|Y_t-\sum_{u=-n}^{n}a_uZ_{t-u}\|_2\xrightarrow{n\to\infty}0$.
Proof. The monotone convergence theorem implies
$$E\Bigl(\sum_{u\in\mathbb{Z}}|a_u|\,|Z_{t-u}|\Bigr)\le\lim_{n\to\infty}\Bigl(\sum_{u=-n}^{n}|a_u|\Bigr)\sup_{t\in\mathbb{Z}}E(|Z_t|)<\infty$$
and, thus, we have $\sum_{u\in\mathbb{Z}}|a_u||Z_{t-u}|<\infty$ with probability one as well as $E(|Y_t|)\le E(\sum_{u\in\mathbb{Z}}|a_u||Z_{t-u}|)<\infty$, $t\in\mathbb{Z}$. Put $X_n(t):=\sum_{u=-n}^{n}a_uZ_{t-u}$. Then we have $|Y_t-X_n(t)|\xrightarrow{n\to\infty}0$ almost surely. By the inequality $|Y_t-X_n(t)|\le 2\sum_{u\in\mathbb{Z}}|a_u||Z_{t-u}|$, $n\in\mathbb{N}$, the dominated convergence theorem implies (ii) and therefore (i):
$$\Bigl|E(Y_t)-\sum_{u=-n}^{n}a_uE(Z_{t-u})\Bigr|=|E(Y_t)-E(X_n(t))|\le E(|Y_t-X_n(t)|)\xrightarrow{n\to\infty}0.$$
Put $K:=\sup_t E(|Z_t|^2)<\infty$. The Cauchy–Schwarz inequality implies for $m,n\in\mathbb{N}$ and $\varepsilon>0$
$$E\bigl(|X_{n+m}(t)-X_n(t)|^2\bigr)=E\Bigl(\Bigl|\sum_{|u|=n+1}^{n+m}a_uZ_{t-u}\Bigr|^2\Bigr)=\sum_{|u|=n+1}^{n+m}\sum_{|w|=n+1}^{n+m}a_u\bar a_wE(Z_{t-u}\bar Z_{t-w})\le K\Bigl(\sum_{|u|=n+1}^{n+m}|a_u|\Bigr)^2\le K\Bigl(\sum_{|u|\ge n}|a_u|\Bigr)^2<\varepsilon$$
if $n$ is chosen sufficiently large. Theorem 2.1.4 now implies the existence of a random variable $X(t)\in L^2$ with $\lim_{n\to\infty}\|X_n(t)-X(t)\|_2=0$. For the proof of (iii) it remains to show that $X(t)=Y_t$ almost surely. Markov's inequality implies
$$P\{|Y_t-X_n(t)|\ge\varepsilon\}\le\varepsilon^{-1}E(|Y_t-X_n(t)|)\xrightarrow{n\to\infty}0$$
by (ii), and Chebyshev's inequality yields
$$P\{|X(t)-X_n(t)|\ge\varepsilon\}\le\varepsilon^{-2}\|X(t)-X_n(t)\|_2^2\xrightarrow{n\to\infty}0$$
for arbitrary $\varepsilon>0$. This implies
$$P\{|Y_t-X(t)|\ge\varepsilon\}\le P\{|Y_t-X_n(t)|+|X_n(t)-X(t)|\ge\varepsilon\}\le P\{|Y_t-X_n(t)|\ge\varepsilon/2\}+P\{|X(t)-X_n(t)|\ge\varepsilon/2\}\xrightarrow{n\to\infty}0$$
and thus $Y_t=X(t)$ almost surely, which completes the proof of Theorem 2.1.5.
Theorem 2.1.6. Suppose that $(Z_t)_{t\in\mathbb{Z}}$ is a stationary process with mean $\mu_Z:=E(Z_0)$ and autocovariance function $\gamma_Z$ and let $(a_t)$ be an absolutely summable filter. Then $Y_t=\sum_u a_uZ_{t-u}$, $t\in\mathbb{Z}$, is also stationary with
$$\mu_Y=E(Y_0)=\Bigl(\sum_u a_u\Bigr)\mu_Z$$
and autocovariance function
$$\gamma_Y(t)=\sum_u\sum_w a_u\bar a_w\gamma_Z(t+w-u).$$
Proof. Note that
$$E(|Z_t|^2)=E\bigl(|Z_t-\mu_Z+\mu_Z|^2\bigr)=E\bigl((Z_t-\mu_Z+\mu_Z)\overline{(Z_t-\mu_Z+\mu_Z)}\bigr)=E\bigl(|Z_t-\mu_Z|^2\bigr)+|\mu_Z|^2=\gamma_Z(0)+|\mu_Z|^2$$
and, thus,
$$\sup_{t\in\mathbb{Z}}E(|Z_t|^2)<\infty.$$
We can, therefore, now apply Theorem 2.1.5. Part (i) of Theorem 2.1.5 immediately implies $E(Y_t)=(\sum_u a_u)\mu_Z$ and part (iii) implies for $t,s\in\mathbb{Z}$
$$E\bigl((Y_t-\mu_Y)\overline{(Y_s-\mu_Y)}\bigr)=\lim_{n\to\infty}\mathrm{Cov}\Bigl(\sum_{u=-n}^{n}a_uZ_{t-u},\sum_{w=-n}^{n}a_wZ_{s-w}\Bigr)=\lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\mathrm{Cov}(Z_{t-u},Z_{s-w})$$
$$=\lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\gamma_Z(t-s+w-u)=\sum_u\sum_w a_u\bar a_w\gamma_Z(t-s+w-u).$$
The covariance of $Y_t$ and $Y_s$ depends, therefore, only on the difference $t-s$. Note that $|\gamma_Z(t)|\le\gamma_Z(0)<\infty$ and thus, $\sum_u\sum_w|a_u\bar a_w\gamma_Z(t-s+w-u)|\le\gamma_Z(0)(\sum_u|a_u|)^2<\infty$, i.e., $(Y_t)$ is a stationary process.
The Covariance Generating Function
The covariance generating function of a stationary process with autocovariance function $\gamma$ is defined as the double series
$$G(z):=\sum_{t\in\mathbb{Z}}\gamma(t)z^t=\sum_{t\ge 0}\gamma(t)z^t+\sum_{t\ge 1}\gamma(-t)z^{-t},$$
known as a Laurent series in complex analysis. We assume that there exists a real number $r>1$ such that $G(z)$ is defined for all $z\in\mathbb{C}$ in the annulus $1/r<|z|<r$. The covariance generating function will help us to compute the autocovariances of filtered processes.
Since the coefficients of a Laurent series are uniquely determined (see e.g. Chapter V, 1.11 in Conway (1975)), the covariance generating function of a stationary process is a constant function if and only if this process is a white noise, i.e., $\gamma(t)=0$ for $t\ne 0$.
Theorem 2.1.7. Suppose that $Y_t=\sum_u a_u\varepsilon_{t-u}$, $t\in\mathbb{Z}$, is a general linear process with $\sum_u|a_u||z^u|<\infty$, if $r^{-1}<|z|<r$ for some $r>1$. Put $\sigma^2:=\mathrm{Var}(\varepsilon_0)$. The process $(Y_t)$ then has the covariance generating function
$$G(z)=\sigma^2\Bigl(\sum_u a_uz^u\Bigr)\Bigl(\sum_u\bar a_uz^{-u}\Bigr),\quad r^{-1}<|z|<r.$$
Proof. Theorem 2.1.6 implies for $t\in\mathbb{Z}$
$$\mathrm{Cov}(Y_t,Y_0)=\sum_u\sum_w a_u\bar a_w\gamma_\varepsilon(t+w-u)=\sigma^2\sum_u a_u\bar a_{u-t}.$$
This implies
$$G(z)=\sigma^2\sum_t\sum_u a_u\bar a_{u-t}z^t=\sigma^2\Bigl(\sum_u|a_u|^2+\sum_{t\ge 1}\sum_u a_u\bar a_{u-t}z^t+\sum_{t\le -1}\sum_u a_u\bar a_{u-t}z^t\Bigr)$$
$$=\sigma^2\Bigl(\sum_u|a_u|^2+\sum_u\sum_{t\le u-1}a_u\bar a_tz^{u-t}+\sum_u\sum_{t\ge u+1}a_u\bar a_tz^{u-t}\Bigr)=\sigma^2\sum_u\sum_t a_u\bar a_tz^{u-t}=\sigma^2\Bigl(\sum_u a_uz^u\Bigr)\Bigl(\sum_t\bar a_tz^{-t}\Bigr).$$
Example 2.1.8. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise with $\mathrm{Var}(\varepsilon_0)=:\sigma^2>0$. The covariance generating function of the simple moving average $Y_t=\sum_u a_u\varepsilon_{t-u}$ with $a_{-1}=a_0=a_1=1/3$ and $a_u=0$ elsewhere is then given by
$$G(z)=\frac{\sigma^2}{9}(z^{-1}+z^0+z^1)(z^1+z^0+z^{-1})=\frac{\sigma^2}{9}(z^{-2}+2z^{-1}+3z^0+2z^1+z^2),\quad z\in\mathbb{R}.$$
Then the autocovariances are just the coefficients in the above series
$$\gamma(0)=\frac{\sigma^2}{3},\quad\gamma(1)=\gamma(-1)=\frac{2\sigma^2}{9},\quad\gamma(2)=\gamma(-2)=\frac{\sigma^2}{9},\quad\gamma(k)=0\ \text{elsewhere}.$$
This explains the name covariance generating function.
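A quick empirical check of these values is possible with a short simulation; the following sketch (sample size and seed are arbitrary choices) generates the moving average for a standard normal white noise, so that $\sigma^2=1$, and the estimated autocovariances printed from the OUTCOV data set should be close to $1/3$, $2/9$, $1/9$ and near zero afterwards.

/* simple moving average of white noise: empirical autocovariances (sketch) */
DATA ma;
   e2=RANNOR(1); e1=RANNOR(1);    /* two starting values of the noise            */
   DO t=1 TO 1000;
      e=RANNOR(1);
      y=(e+e1+e2)/3;              /* same autocovariances as Y_t in the example  */
      OUTPUT;
      e2=e1; e1=e;
   END;

PROC ARIMA DATA=ma;
   IDENTIFY VAR=y NLAG=5 OUTCOV=cov NOPRINT;

PROC PRINT DATA=cov;
   VAR LAG COV CORR;
RUN; QUIT;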
The Characteristic Polynomial
Let $(a_u)$ be an absolutely summable filter. The Laurent series
$$A(z):=\sum_{u\in\mathbb{Z}}a_uz^u$$
is called characteristic polynomial of $(a_u)$. We know from complex analysis that $A(z)$ exists either for all $z$ in some annulus $r<|z|<R$ or almost nowhere. In the first case the coefficients $a_u$ are uniquely determined by the function $A(z)$ (see e.g. Chapter V, 1.11 in Conway (1975)).
If, for example, $(a_u)$ is absolutely summable with $a_u=0$ for $u\ge 1$, then $A(z)$ exists for all complex $z$ such that $|z|\ge 1$. If $a_u=0$ for all large $|u|$, then $A(z)$ exists for all $z\ne 0$.
Inverse Filters
Let now $(a_u)$ and $(b_u)$ be absolutely summable filters and denote by $Y_t:=\sum_u a_uZ_{t-u}$ the filtered stationary sequence, where $(Z_u)_{u\in\mathbb{Z}}$ is a stationary process. Filtering $(Y_t)_{t\in\mathbb{Z}}$ by means of $(b_u)$ leads to
$$\sum_w b_wY_{t-w}=\sum_w\sum_u b_wa_uZ_{t-w-u}=\sum_v\Bigl(\sum_{u+w=v}b_wa_u\Bigr)Z_{t-v},$$
where $c_v:=\sum_{u+w=v}b_wa_u$, $v\in\mathbb{Z}$, is an absolutely summable filter:
$$\sum_v|c_v|\le\sum_v\sum_{u+w=v}|b_wa_u|=\Bigl(\sum_u|a_u|\Bigr)\Bigl(\sum_w|b_w|\Bigr)<\infty.$$
We call $(c_v)$ the product filter of $(a_u)$ and $(b_u)$.
Lemma 2.1.9. Let $(a_u)$ and $(b_u)$ be absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r<|z|<R$. The product filter $(c_v)=(\sum_{u+w=v}b_wa_u)$ then has the characteristic polynomial
$$A(z)=A_1(z)A_2(z).$$
Proof. By repeating the above arguments we obtain
$$A(z)=\sum_v\Bigl(\sum_{u+w=v}b_wa_u\Bigr)z^v=A_1(z)A_2(z).$$
Suppose now that $(a_u)$ and $(b_u)$ are absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r<|z|<R$, where they satisfy $A_1(z)A_2(z)=1$. Since $1=\sum_v c_vz^v$ if $c_0=1$ and $c_v=0$ elsewhere, the uniquely determined coefficients of the characteristic polynomial of the product filter of $(a_u)$ and $(b_u)$ are given by
$$\sum_{u+w=v}b_wa_u=\begin{cases}1 & \text{if }v=0\\ 0 & \text{if }v\ne 0.\end{cases}$$
In this case we obtain for a stationary process $(Z_t)$ that almost surely
$$Y_t=\sum_u a_uZ_{t-u}\quad\text{and}\quad\sum_w b_wY_{t-w}=Z_t,\quad t\in\mathbb{Z}. \qquad(2.1)$$
The filter $(b_u)$ is, therefore, called the inverse filter of $(a_u)$.
Causal Filters
An absolutely summable filter $(a_u)_{u\in\mathbb{Z}}$ is called causal if $a_u=0$ for $u<0$.

Lemma 2.1.10. Let $a\in\mathbb{C}$. The filter $(a_u)$ with $a_0=1$, $a_1=-a$ and $a_u=0$ elsewhere has an absolutely summable and causal inverse filter $(b_u)_{u\ge 0}$ if and only if $|a|<1$. In this case we have $b_u=a^u$, $u\ge 0$.

Proof. The characteristic polynomial of $(a_u)$ is $A_1(z)=1-az$, $z\in\mathbb{C}$. Since the characteristic polynomial $A_2(z)$ of an inverse filter satisfies $A_1(z)A_2(z)=1$ on some annulus, we have $A_2(z)=1/(1-az)$. Observe now that
$$\frac{1}{1-az}=\sum_{u\ge 0}a^uz^u,\quad\text{if }|z|<1/|a|.$$
As a consequence, if $|a|<1$, then $A_2(z)=\sum_{u\ge 0}a^uz^u$ exists for all $|z|<1$ and the inverse causal filter $(a^u)_{u\ge 0}$ is absolutely summable, i.e., $\sum_{u\ge 0}|a^u|<\infty$. If $|a|\ge 1$, then $A_2(z)=\sum_{u\ge 0}a^uz^u$ exists for all $|z|<1/|a|$, but $\sum_{u\ge 0}|a|^u=\infty$, which completes the proof.
Theorem 2.1.11. Let $a_1,a_2,\dots,a_p\in\mathbb{C}$, $a_p\ne 0$. The filter $(a_u)$ with coefficients $a_0=1,a_1,\dots,a_p$ and $a_u=0$ elsewhere has an absolutely summable and causal inverse filter if the $p$ roots $z_1,\dots,z_p\in\mathbb{C}$ of $A(z)=1+a_1z+a_2z^2+\dots+a_pz^p=0$ are outside of the unit circle, i.e., $|z_i|>1$ for $1\le i\le p$.
Proof. We know from the Fundamental Theorem of Algebra that the equation $A(z)=0$ has exactly $p$ roots $z_1,\dots,z_p\in\mathbb{C}$ (see e.g. Chapter IV, 3.5 in Conway (1975)), which are all different from zero, since $A(0)=1$. Hence we can write (see e.g. Chapter IV, 3.6 in Conway (1975))
$$A(z)=a_p(z-z_1)\dots(z-z_p)=c\Bigl(1-\frac{z}{z_1}\Bigr)\Bigl(1-\frac{z}{z_2}\Bigr)\dots\Bigl(1-\frac{z}{z_p}\Bigr),$$
where $c:=a_p(-1)^pz_1\dots z_p$. In case of $|z_i|>1$ we can write for $|z|<|z_i|$
$$\frac{1}{1-\frac{z}{z_i}}=\sum_{u\ge 0}\Bigl(\frac{1}{z_i}\Bigr)^uz^u,$$
where the coefficients $(1/z_i)^u$, $u\ge 0$, are absolutely summable. In case of $|z_i|<1$, we have for $|z|>|z_i|$
$$\frac{1}{1-\frac{z}{z_i}}=-\frac{z_i}{z}\,\frac{1}{1-\frac{z_i}{z}}=-\frac{z_i}{z}\sum_{u\ge 0}z_i^uz^{-u}=-\sum_{u\le -1}\Bigl(\frac{1}{z_i}\Bigr)^uz^u,$$
where the filter with coefficients $-(1/z_i)^u$, $u\le -1$, is not a causal one. In case of $|z_i|=1$, we have for $|z|<1$
$$\frac{1}{1-\frac{z}{z_i}}=\sum_{u\ge 0}\Bigl(\frac{1}{z_i}\Bigr)^uz^u,$$
where the coefficients $(1/z_i)^u$, $u\ge 0$, are not absolutely summable. Since the coefficients of a Laurent series are uniquely determined, the factor $1-z/z_i$ has an inverse $1/(1-z/z_i)=\sum_{u\ge 0}b_uz^u$ on some annulus with $\sum_{u\ge 0}|b_u|<\infty$ if $|z_i|>1$. A small analysis implies that this argument carries over to the product
$$\frac{1}{A(z)}=\frac{1}{c\bigl(1-\frac{z}{z_1}\bigr)\dots\bigl(1-\frac{z}{z_p}\bigr)},$$
which has an expansion $1/A(z)=\sum_{u\ge 0}b_uz^u$ on some annulus with $\sum_{u\ge 0}|b_u|<\infty$ if each factor has such an expansion, and thus, the proof is complete.
Remark 2.1.12. Note that the roots $z_1,\dots,z_p$ of $A(z)=1+a_1z+\dots+a_pz^p$ are complex valued and thus, the coefficients $b_u$ of the inverse causal filter will, in general, be complex valued as well. The preceding proof shows, however, that if $a_p$ and each $z_i$ are real numbers, then the coefficients $b_u$, $u\ge 0$, are real as well.
The preceding proof shows, moreover, that a filter $(a_u)$ with complex coefficients $a_0,a_1,\dots,a_p\in\mathbb{C}$ and $a_u=0$ elsewhere has an absolutely summable inverse filter if no root $z\in\mathbb{C}$ of the equation $A(z)=a_0+a_1z+\dots+a_pz^p=0$ has length 1, i.e., $|z|\ne 1$ for each root. The additional condition $|z|>1$ for each root then implies that the inverse filter is a causal one.
Example 2.1.13. The filter with coefficients $a_0=1$, $a_1=-0.7$ and $a_2=0.1$ has the characteristic polynomial $A(z)=1-0.7z+0.1z^2=0.1(z-2)(z-5)$, with $z_1=2$, $z_2=5$ being the roots of $A(z)=0$. Theorem 2.1.11 implies the existence of an absolutely summable inverse causal filter, whose coefficients can be obtained by expanding $1/A(z)$ as a power series of $z$:
$$\frac{1}{A(z)}=\frac{1}{\bigl(1-\frac z2\bigr)\bigl(1-\frac z5\bigr)}=\sum_{u\ge 0}\Bigl(\frac12\Bigr)^uz^u\sum_{w\ge 0}\Bigl(\frac15\Bigr)^wz^w=\sum_{v\ge 0}\sum_{u+w=v}\Bigl(\frac12\Bigr)^u\Bigl(\frac15\Bigr)^wz^v=\sum_{v\ge 0}\sum_{w=0}^{v}\Bigl(\frac12\Bigr)^{v-w}\Bigl(\frac15\Bigr)^wz^v$$
$$=\sum_{v\ge 0}\Bigl(\frac12\Bigr)^v\frac{1-\bigl(\frac25\bigr)^{v+1}}{1-\frac25}z^v=\sum_{v\ge 0}\frac{10}{3}\Bigl(\Bigl(\frac12\Bigr)^{v+1}-\Bigl(\frac15\Bigr)^{v+1}\Bigr)z^v.$$
The preceding expansion implies that $b_v:=(10/3)\bigl(2^{-(v+1)}-5^{-(v+1)}\bigr)$, $v\ge 0$, are the coefficients of the inverse causal filter.
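These coefficients are easy to check numerically: convolving them with the original filter $(1,-0.7,0.1)$ must reproduce the identity filter. The following data step is a small sketch of this check (the range $v\le 10$ is an arbitrary choice).

/* numerical check of the inverse filter in Example 2.1.13 */
DATA check;
   b1=0; b2=0;                              /* b_{v-1} and b_{v-2}, zero for v<0 */
   DO v=0 TO 10;
      b=(10/3)*(2**(-(v+1))-5**(-(v+1)));   /* coefficient b_v of the inverse    */
      c=b-0.7*b1+0.1*b2;                    /* product filter: 1 for v=0, else 0 */
      OUTPUT;
      b2=b1; b1=b;
   END;

PROC PRINT DATA=check;
   VAR v b c;
RUN;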
2.2 Moving Averages and Autoregressive Processes
Let $a_1,\dots,a_q\in\mathbb{R}$ with $a_q\ne 0$ and let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise. The process
$$Y_t:=\varepsilon_t+a_1\varepsilon_{t-1}+\dots+a_q\varepsilon_{t-q}$$
is said to be a moving average of order $q$, denoted by MA($q$). Put $a_0=1$. Theorems 2.1.6 and 2.1.7 imply that a moving average $Y_t=\sum_{u=0}^{q}a_u\varepsilon_{t-u}$ is a stationary process with covariance generating function
$$G(z)=\sigma^2\Bigl(\sum_{u=0}^{q}a_uz^u\Bigr)\Bigl(\sum_{w=0}^{q}a_wz^{-w}\Bigr)=\sigma^2\sum_{u=0}^{q}\sum_{w=0}^{q}a_ua_wz^{u-w}=\sigma^2\sum_{v=-q}^{q}\sum_{u-w=v}a_ua_wz^{u-w}=\sigma^2\sum_{v=-q}^{q}\Bigl(\sum_{w=0}^{q-v}a_{v+w}a_w\Bigr)z^v,\quad z\in\mathbb{C},$$
where $\sigma^2=\mathrm{Var}(\varepsilon_0)$. The coefficients of this expansion provide the autocovariance function $\gamma(v)=\mathrm{Cov}(Y_0,Y_v)$, $v\in\mathbb{Z}$, which cuts off after lag $q$.
Lemma 2.2.1. Suppose that $Y_t=\sum_{u=0}^{q}a_u\varepsilon_{t-u}$, $t\in\mathbb{Z}$, is a MA($q$)-process. Put $\mu:=E(\varepsilon_0)$ and $\sigma^2:=\mathrm{Var}(\varepsilon_0)$. Then we have

(i) $E(Y_t)=\mu\sum_{u=0}^{q}a_u$,

(ii) $\gamma(v)=\mathrm{Cov}(Y_v,Y_0)=\begin{cases}0, & v>q,\\ \sigma^2\sum_{w=0}^{q-v}a_{v+w}a_w, & 0\le v\le q,\end{cases}\qquad\gamma(-v)=\gamma(v)$,

(iii) $\mathrm{Var}(Y_0)=\gamma(0)=\sigma^2\sum_{w=0}^{q}a_w^2$,

(iv) $\rho(v)=\dfrac{\gamma(v)}{\gamma(0)}=\begin{cases}0, & v>q,\\ \bigl(\sum_{w=0}^{q-v}a_{v+w}a_w\bigr)\big/\bigl(\sum_{w=0}^{q}a_w^2\bigr), & 0<v\le q,\\ 1, & v=0,\end{cases}\qquad\rho(-v)=\rho(v)$.
Example 2.2.2. The MA(1)-process $Y_t=\varepsilon_t+a\varepsilon_{t-1}$ with $a\ne 0$ has the autocorrelation function
$$\rho(v)=\begin{cases}1, & v=0\\ a/(1+a^2), & v=\pm 1\\ 0 & \text{elsewhere}.\end{cases}$$
Since $a/(1+a^2)=(1/a)/(1+(1/a)^2)$, the autocorrelation functions of the two MA(1)-processes with parameters $a$ and $1/a$ coincide. We have, moreover, $|\rho(1)|\le 1/2$ for an arbitrary MA(1)-process and thus, a large value of the empirical autocorrelation function $r(1)$, which exceeds $1/2$ essentially, might indicate that an MA(1)-model for a given data set is not a correct assumption.
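A quick way to see this relation in practice is to simulate an MA(1)-series and to compare $r(1)$ with $a/(1+a^2)$; the parameter $a=0.6$, the sample size and the seed in the following sketch are arbitrary choices.

/* MA(1) sketch: empirical r(1) versus a/(1+a**2) = 0.441 for a=0.6 */
DATA ma1;
   a=0.6;
   e1=RANNOR(1);
   DO t=1 TO 500;
      e=RANNOR(1);
      y=e+a*e1;                /* Y_t = e_t + a*e_{t-1}                             */
      OUTPUT;
      e1=e;
   END;

PROC ARIMA DATA=ma1;
   IDENTIFY VAR=y NLAG=5;      /* r(1) should be close to 0.441, r(k) near 0 for k>1 */
RUN; QUIT;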
Invertible Processes
The MA($q$)-process $Y_t=\sum_{u=0}^{q}a_u\varepsilon_{t-u}$, with $a_0=1$ and $a_q\ne 0$, is said to be invertible if all $q$ roots $z_1,\dots,z_q\in\mathbb{C}$ of $A(z)=\sum_{u=0}^{q}a_uz^u=0$ are outside of the unit circle, i.e., if $|z_i|>1$ for $1\le i\le q$.
Theorem 2.1.11 and representation (2.1) imply that the white noise process $(\varepsilon_t)$, pertaining to an invertible MA($q$)-process $Y_t=\sum_{u=0}^{q}a_u\varepsilon_{t-u}$, can be obtained by means of an absolutely summable and causal filter $(b_u)_{u\ge 0}$ via
$$\varepsilon_t=\sum_{u\ge 0}b_uY_{t-u},\quad t\in\mathbb{Z},$$
with probability one. In particular the MA(1)-process $Y_t=\varepsilon_t-a\varepsilon_{t-1}$ is invertible iff $|a|<1$, and in this case we have by Lemma 2.1.10 with probability one
$$\varepsilon_t=\sum_{u\ge 0}a^uY_{t-u},\quad t\in\mathbb{Z}.$$
Autoregressive Processes
A real valued stochastic process $(Y_t)$ is said to be an autoregressive process of order $p$, denoted by AR($p$), if there exist $a_1,\dots,a_p\in\mathbb{R}$ with $a_p\ne 0$, and a white noise $(\varepsilon_t)$ such that
$$Y_t=a_1Y_{t-1}+\dots+a_pY_{t-p}+\varepsilon_t,\quad t\in\mathbb{Z}. \qquad(2.2)$$
The value of an AR($p$)-process at time $t$ is, therefore, regressed on its own past $p$ values plus a random shock.
The Stationarity Condition
While by Theorem 2.1.6 MA($q$)-processes are automatically stationary, this is not true for AR($p$)-processes (see Exercise 2.26). The following result provides a sufficient condition on the constants $a_1,\dots,a_p$ implying the existence of a uniquely determined stationary solution $(Y_t)$ of (2.2).

Theorem 2.2.3. The AR($p$)-equation (2.2) with the given constants $a_1,\dots,a_p$ and white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ has a stationary solution $(Y_t)_{t\in\mathbb{Z}}$ if all $p$ roots of the equation $1-a_1z-a_2z^2-\dots-a_pz^p=0$ are outside of the unit circle. In this case, the stationary solution is almost surely uniquely determined by
$$Y_t:=\sum_{u\ge 0}b_u\varepsilon_{t-u},\quad t\in\mathbb{Z},$$
where $(b_u)_{u\ge 0}$ is the absolutely summable inverse causal filter of $c_0=1$, $c_u=-a_u$, $u=1,\dots,p$ and $c_u=0$ elsewhere.

Proof. The existence of an absolutely summable causal filter follows from Theorem 2.1.11. The stationarity of $Y_t=\sum_{u\ge 0}b_u\varepsilon_{t-u}$ is a consequence of Theorem 2.1.6, and its uniqueness follows from
$$\varepsilon_t=Y_t-a_1Y_{t-1}-\dots-a_pY_{t-p},\quad t\in\mathbb{Z},$$
and equation (2.1).
The condition that all roots of the characteristic equation of an AR($p$)-process $Y_t=\sum_{u=1}^{p}a_uY_{t-u}+\varepsilon_t$ are outside of the unit circle, i.e.,
$$1-a_1z-a_2z^2-\dots-a_pz^p\ne 0\quad\text{for }|z|\le 1, \qquad(2.3)$$
will be referred to in the following as the stationarity condition for an AR($p$)-process.
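For small $p$ the condition (2.3) can be checked by hand or with a few lines of code. The following sketch computes the moduli of the two roots of $1-a_1z-a_2z^2=0$ for the coefficients $a_1=0.6$, $a_2=-0.3$ used in Plot 2.2.3a below (the data step assumes $a_2\ne 0$).

/* stationarity check for an AR(2): roots of 1 - a1*z - a2*z**2 = 0 */
DATA roots;
   a1=0.6; a2=-0.3;                 /* coefficients of Plot 2.2.3a, a2 ne 0       */
   disc=a1**2+4*a2;                 /* discriminant of a2*z**2 + a1*z - 1 = 0     */
   IF disc>=0 THEN DO;              /* two real roots                             */
      m1=ABS((-a1+SQRT(disc))/(2*a2));
      m2=ABS((-a1-SQRT(disc))/(2*a2));
   END;
   ELSE DO;                         /* complex conjugate roots                    */
      m1=SQRT(-1/a2);               /* common modulus = sqrt(product of the roots) */
      m2=m1;
   END;
   stationary=(m1>1 & m2>1);        /* 1 if condition (2.3) is satisfied          */

PROC PRINT DATA=roots;
   VAR a1 a2 m1 m2 stationary;
RUN;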
Note that a stationary solution $(Y_t)$ of (2.2) exists in general if no root $z_i$ of the characteristic equation lies on the unit sphere. If there are roots inside the unit circle, then the stationary solution is noncausal, i.e., $Y_t$ is correlated with future values of $\varepsilon_s$, $s>t$. This is frequently regarded as unnatural.
Example 2.2.4. The AR(1)-process $Y_t=aY_{t-1}+\varepsilon_t$, $t\in\mathbb{Z}$, with $a\ne 0$ has the characteristic equation $1-az=0$ with the obvious solution $z_1=1/a$. The process $(Y_t)$, therefore, satisfies the stationarity condition iff $|z_1|>1$, i.e., iff $|a|<1$. In this case we obtain from Lemma 2.1.10 that the absolutely summable inverse causal filter of $a_0=1$, $a_1=-a$ and $a_u=0$ elsewhere is given by $b_u=a^u$, $u\ge 0$, and thus, with probability one
$$Y_t=\sum_{u\ge 0}b_u\varepsilon_{t-u}=\sum_{u\ge 0}a^u\varepsilon_{t-u}.$$
Denote by $\sigma^2$ the variance of $\varepsilon_0$. From Theorem 2.1.6 we obtain the autocovariance function of $(Y_t)$
$$\gamma(s)=\sum_u\sum_w b_ub_w\mathrm{Cov}(\varepsilon_0,\varepsilon_{s+w-u})=\sum_{u\ge 0}b_ub_{u-s}\mathrm{Cov}(\varepsilon_0,\varepsilon_0)=\sigma^2a^s\sum_{u\ge s}a^{2(u-s)}=\sigma^2\frac{a^s}{1-a^2},\quad s=0,1,2,\dots$$
and $\gamma(-s)=\gamma(s)$. In particular we obtain $\gamma(0)=\sigma^2/(1-a^2)$ and thus, the autocorrelation function of $(Y_t)$ is given by
$$\rho(s)=a^{|s|},\quad s\in\mathbb{Z}.$$
The autocorrelation function of an AR(1)-process $Y_t=aY_{t-1}+\varepsilon_t$ with $|a|<1$ therefore decreases at an exponential rate. Its sign is alternating if $a\in(-1,0)$.
Plot 2.2.1a: Autocorrelation functions of AR(1)-processes $Y_t=aY_{t-1}+\varepsilon_t$ with different values of $a$.
1 /* ar1_autocorrelation .sas */
2 TITLE1 ’Autocorrelation functions of AR(1)-processes ’;
3
4 /* Generate data for different autocorrelation functions */
5 DATA data1;
6 DO a=-0.7, 0.5, 0.9;
7 DO s=0 TO 20;
8 rho=a**s;
9 OUTPUT;
10 END;
11 END;
12
13 /* Graphical options */
14 SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3 L=1;
15 SYMBOL2 C=GREEN V=DOT I=JOIN H=0.3 L=2;
16 SYMBOL3 C=GREEN V=DOT I=JOIN H=0.3 L=33;
17 AXIS1 LABEL=(’s’);
18 AXIS2 LABEL=(F=CGREEK ’r’ F=COMPLEX H=1 ’a’ H=2 ’(s)’);
19 LEGEND1 LABEL=(’a=’) SHAPE=SYMBOL (10 ,0.6);
20
21 /* Plot autocorrelation functions */
22 PROC GPLOT DATA=data1;
23 PLOT rho*s=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1 VREF =0;
24 RUN; QUIT;
Program 2.2.1: Computing autocorrelation functions of AR(1)-
processes.
The data step evaluates rho for three different values of a and the range of s from 0 to 20 using two loops. The plot is generated by the procedure GPLOT. The LABEL option in the AXIS2 statement uses, in addition to the greek font CGREEK, the font COMPLEX assuming this to be the default text font (GOPTION FTEXT=COMPLEX). The SHAPE option SHAPE=SYMBOL(10,0.6) in the LEGEND statement defines width and height of the symbols presented in the legend.
The following figure illustrates the significance of the stationarity condition $|a|<1$ of an AR(1)-process. Realizations $Y_t=aY_{t-1}+\varepsilon_t$, $t=1,\dots,10$, are displayed for $a=0.5$ and $a=1.5$, where $\varepsilon_1,\varepsilon_2,\dots,\varepsilon_{10}$ are independent standard normal in each case and $Y_0$ is assumed to be zero. While for $a=0.5$ the sample path follows the constant zero closely, which is the expectation of each $Y_t$, the observations $Y_t$ decrease rapidly in case of $a=1.5$.
Plot 2.2.2a: Realizations $Y_t=0.5Y_{t-1}+\varepsilon_t$ and $Y_t=1.5Y_{t-1}+\varepsilon_t$, $t=1,\dots,10$, with $\varepsilon_t$ independent standard normal and $Y_0=0$.
1 /* ar1_plot.sas */
2 TITLE1 ’Realizations of AR(1)-processes ’;
3
4 /* Generated AR(1)-processes */
5 DATA data1;
6 DO a=0.5, 1.5;
7 t=0; y=0; OUTPUT;
8 DO t=1 TO 10;
9 y=a*y+RANNOR (1);
10 OUTPUT;
11 END;
12 END;
13
14 /* Graphical options */
15 SYMBOL1 C=GREEN V=DOT I=JOIN H=0.4 L=1;
16 SYMBOL2 C=GREEN V=DOT I=JOIN H=0.4 L=2;
17 AXIS1 LABEL=(’t’) MINOR=NONE;
18 AXIS2 LABEL=(’Y’ H=1 ’t’);
19 LEGEND1 LABEL=(’a=’) SHAPE=SYMBOL (10 ,0.6);
20
21 /* Plot the AR(1)-processes */
22 PROC GPLOT DATA=data1(WHERE=(t>0));
23 PLOT y*t=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1;
24 RUN; QUIT;
Program 2.2.2: Simulating AR(1)-processes.
The data are generated within two loops, the first one over the two values for a. The variable y is initialized with the value 0 corresponding to t=0. The realizations for t=1, ..., 10 are created within the second loop over t and with the help of the function RANNOR which returns pseudo random numbers distributed as standard normal. The argument 1 is the initial seed to produce a stream of random numbers. A positive value of this seed always produces the same series of random numbers, a negative value generates a different series each time the program is submitted. A value of y is calculated as the sum of a times the actual value of y and the random number and stored in a new observation. The resulting data set has 22 observations and 3 variables (a, t and y).

In the plot created by PROC GPLOT the initial observations are dropped using the WHERE data set option. Only observations fulfilling the condition t>0 are read into the data set used here. To suppress minor tick marks between the integers 0,1, ...,10 the option MINOR in the AXIS1 statement is set to NONE.
The Yule–Walker Equations
The Yule–Walker equations entail the recursive computation of the autocorrelation function $\rho$ of an AR($p$)-process satisfying the stationarity condition (2.3).

Lemma 2.2.5. Let $Y_t=\sum_{u=1}^{p}a_uY_{t-u}+\varepsilon_t$ be an AR($p$)-process, which satisfies the stationarity condition (2.3). Its autocorrelation function $\rho$ then satisfies for $s=1,2,\dots$ the recursion
$$\rho(s)=\sum_{u=1}^{p}a_u\rho(s-u), \qquad(2.4)$$
known as Yule–Walker equations.
Proof. With $\mu:=E(Y_0)$ we have for $t\in\mathbb{Z}$
$$Y_t-\mu=\sum_{u=1}^{p}a_u(Y_{t-u}-\mu)+\varepsilon_t-\mu\Bigl(1-\sum_{u=1}^{p}a_u\Bigr), \qquad(2.5)$$
and taking expectations of (2.5) gives $\mu(1-\sum_{u=1}^{p}a_u)=E(\varepsilon_0)=:\nu$ due to the stationarity of $(Y_t)$. By multiplying equation (2.5) with $Y_{t-s}-\mu$ for $s>0$ and taking expectations again we obtain
$$\gamma(s)=E\bigl((Y_t-\mu)(Y_{t-s}-\mu)\bigr)=\sum_{u=1}^{p}a_uE\bigl((Y_{t-u}-\mu)(Y_{t-s}-\mu)\bigr)+E\bigl((\varepsilon_t-\nu)(Y_{t-s}-\mu)\bigr)=\sum_{u=1}^{p}a_u\gamma(s-u)$$
for the autocovariance function $\gamma$ of $(Y_t)$. The final equation follows from the fact that $Y_{t-s}$ and $\varepsilon_t$ are uncorrelated for $s>0$. This is a consequence of Theorem 2.2.3, by which almost surely $Y_{t-s}=\sum_{u\ge 0}b_u\varepsilon_{t-s-u}$ with an absolutely summable causal filter $(b_u)$ and thus, $\mathrm{Cov}(Y_{t-s},\varepsilon_t)=\sum_{u\ge 0}b_u\mathrm{Cov}(\varepsilon_{t-s-u},\varepsilon_t)=0$, see Theorem 2.1.5. Dividing the above equation by $\gamma(0)$ now yields the assertion.
Since $\rho(-s)=\rho(s)$, equations (2.4) can be represented as
$$\begin{pmatrix}\rho(1)\\ \rho(2)\\ \rho(3)\\ \vdots\\ \rho(p)\end{pmatrix}=\begin{pmatrix}1 & \rho(1) & \rho(2) & \dots & \rho(p-1)\\ \rho(1) & 1 & \rho(1) & \dots & \rho(p-2)\\ \rho(2) & \rho(1) & 1 & \dots & \rho(p-3)\\ \vdots & & & \ddots & \vdots\\ \rho(p-1) & \rho(p-2) & \rho(p-3) & \dots & 1\end{pmatrix}\begin{pmatrix}a_1\\ a_2\\ a_3\\ \vdots\\ a_p\end{pmatrix} \qquad(2.6)$$
This matrix equation offers an estimator of the coefficients $a_1,\dots,a_p$ by replacing the autocorrelations $\rho(j)$ by their empirical counterparts $r(j)$, $1\le j\le p$. Equation (2.6) then formally becomes $r=Ra$, where $r=(r(1),\dots,r(p))^T$, $a=(a_1,\dots,a_p)^T$ and
$$R:=\begin{pmatrix}1 & r(1) & r(2) & \dots & r(p-1)\\ r(1) & 1 & r(1) & \dots & r(p-2)\\ \vdots & & & \ddots & \vdots\\ r(p-1) & r(p-2) & r(p-3) & \dots & 1\end{pmatrix}.$$
If the $p\times p$-matrix $R$ is invertible, we can rewrite the formal equation $r=Ra$ as $R^{-1}r=a$, which motivates the estimator
$$\hat a:=R^{-1}r \qquad(2.7)$$
of the vector $a=(a_1,\dots,a_p)^T$ of the coefficients.
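The estimator (2.7) amounts to solving one linear system. A minimal sketch for $p=2$ is given below; it assumes that SAS/IML is available and uses made-up values for $r(1)$ and $r(2)$ instead of autocorrelations estimated from data.

/* Yule-Walker estimate (2.7) for p=2 with illustrative r(1), r(2) */
PROC IML;
   Rmat={1   0.5,
         0.5 1  };               /* matrix R built from r(1)=0.5            */
   rvec={0.5, 0.2};              /* vector (r(1), r(2))'                    */
   a_hat=SOLVE(Rmat, rvec);      /* solves R*a = r, i.e. a_hat = R**(-1)*r  */
   PRINT a_hat;
QUIT;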
The Partial Autocorrelation Coefficients
We have seen that the autocorrelation function $\rho(k)$ of an MA($q$)-process vanishes for $k>q$, see Lemma 2.2.1. This is not true for an AR($p$)-process, whereas the partial autocorrelation coefficients will share this property. Note that the correlation matrix
$$P_k:=\bigl(\mathrm{Corr}(Y_i,Y_j)\bigr)_{1\le i,j\le k}=\begin{pmatrix}1 & \rho(1) & \rho(2) & \dots & \rho(k-1)\\ \rho(1) & 1 & \rho(1) & \dots & \rho(k-2)\\ \rho(2) & \rho(1) & 1 & \dots & \rho(k-3)\\ \vdots & & & \ddots & \vdots\\ \rho(k-1) & \rho(k-2) & \rho(k-3) & \dots & 1\end{pmatrix} \qquad(2.8)$$
is positive semidefinite for any $k\ge 1$. If we suppose that $P_k$ is positive definite, then it is invertible, and the equation
$$\begin{pmatrix}\rho(1)\\ \vdots\\ \rho(k)\end{pmatrix}=P_k\begin{pmatrix}a_{k1}\\ \vdots\\ a_{kk}\end{pmatrix} \qquad(2.9)$$
has the unique solution
$$a_k:=\begin{pmatrix}a_{k1}\\ \vdots\\ a_{kk}\end{pmatrix}=P_k^{-1}\begin{pmatrix}\rho(1)\\ \vdots\\ \rho(k)\end{pmatrix}.$$
The number $a_{kk}$ is called partial autocorrelation coefficient at lag $k$, denoted by $\alpha(k)$, $k\ge 1$. Observe that for $k\ge p$ the vector $(a_1,\dots,a_p,0,\dots,0)\in\mathbb{R}^k$, with $k-p$ zeros added to the vector of coefficients $(a_1,\dots,a_p)$, is by the Yule–Walker equations (2.4) a solution of the equation (2.9). Thus we have $\alpha(p)=a_p$, $\alpha(k)=0$ for $k>p$. Note that the coefficient $\alpha(k)$ also occurs as the coefficient of $Y_{n-k}$ in the best linear one-step forecast $\sum_{u=0}^{k}c_uY_{n-u}$ of $Y_{n+1}$, see equation (2.18) in Section 2.3.
If the empirical counterpart $R_k$ of $P_k$ is invertible as well, then
$$\hat a_k:=R_k^{-1}r_k,$$
with $r_k:=(r(1),\dots,r(k))^T$, is an obvious estimate of $a_k$. The $k$-th component
$$\hat\alpha(k):=\hat a_{kk} \qquad(2.10)$$
of $\hat a_k=(\hat a_{k1},\dots,\hat a_{kk})$ is the empirical partial autocorrelation coefficient at lag $k$. It can be utilized to estimate the order $p$ of an AR($p$)-process, since $\hat\alpha(p)\approx\alpha(p)=a_p$ is different from zero, whereas $\hat\alpha(k)\approx\alpha(k)=0$ for $k>p$ should be close to zero.
Example 2.2.6. The Yule–Walker equations (2.4) for an AR(2)-process $Y_t=a_1Y_{t-1}+a_2Y_{t-2}+\varepsilon_t$ are for $s=1,2$
$$\rho(1)=a_1+a_2\rho(1),\qquad\rho(2)=a_1\rho(1)+a_2$$
with the solutions
$$\rho(1)=\frac{a_1}{1-a_2},\qquad\rho(2)=\frac{a_1^2}{1-a_2}+a_2,$$
and thus, the partial autocorrelation coefficients are
$$\alpha(1)=\rho(1),\quad\alpha(2)=a_2,\quad\alpha(j)=0,\ j\ge 3.$$
The recursion (2.4) entails the computation of $\rho(s)$ for an arbitrary $s$ from the two values $\rho(1)$ and $\rho(2)$.
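A small sketch of this recursive computation for the coefficients $a_1=0.6$, $a_2=-0.3$ of the process simulated below is given here; the resulting data set could be plotted with PROC GPLOT in the usual way.

/* theoretical ACF of Y_t = 0.6*Y_{t-1} - 0.3*Y_{t-2} + e_t via recursion (2.4) */
DATA acf2;
   a1=0.6; a2=-0.3;
   s=0; rho=1;          OUTPUT;
   s=1; rho=a1/(1-a2);  OUTPUT;    /* Yule-Walker solution for rho(1)        */
   rholag=1;                       /* rho(s-2), here rho(0)                  */
   DO s=2 TO 20;
      rhonew=a1*rho+a2*rholag;     /* rho(s) = a1*rho(s-1) + a2*rho(s-2)     */
      rholag=rho; rho=rhonew;
      OUTPUT;
   END;
RUN;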
The following figure displays realizations of the AR(2)-process $Y_t=0.6Y_{t-1}-0.3Y_{t-2}+\varepsilon_t$ for $1\le t\le 200$, conditional on $Y_{-1}=Y_0=0$. The random shocks $\varepsilon_t$ are iid standard normal. The corresponding partial autocorrelation function is shown in Plot 2.2.4a.
Plot 2.2.3a: Realization of the AR(2)-process $Y_t=0.6Y_{t-1}-0.3Y_{t-2}+\varepsilon_t$, conditional on $Y_{-1}=Y_0=0$. The $\varepsilon_t$, $1\le t\le 200$, are iid standard normal.
1 /* ar2_plot.sas */
2 TITLE1 ’Realisation of an AR(2)-process ’;
3
4 /* Generated AR(2)-process */
5 DATA data1;
6 t=-1; y=0; OUTPUT;
7 t=0; y1=y; y=0; OUTPUT;
8 DO t=1 TO 200;
9 y2=y1;
10 y1=y;
11 y=0.6*y1 -0.3*y2+RANNOR (1);
12 OUTPUT;
13 END;
14
15 /* Graphical options */
16 SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3;
17 AXIS1 LABEL=(’t’);
18 AXIS2 LABEL=(’Y’ H=1 ’t’);
19
20 /* Plot the AR(2)-processes */
21 PROC GPLOT DATA=data1(WHERE=(t>0));
22 PLOT y*t / HAXIS=AXIS1 VAXIS=AXIS2;
23 RUN; QUIT;
Program 2.2.3: Simulating AR(2)-processes.
The two initial values of y are defined and stored in an observation by the OUTPUT statement. The second observation contains an additional value y1 for $y_{t-1}$. Within the loop the values y2 (for $y_{t-2}$), y1 and y are updated one after the other. The data set used by PROC GPLOT again just contains the observations with $t>0$.
Plot 2.2.4a: Empirical partial autocorrelation function of the AR(2)-
data in Plot 2.2.3a
1 /* ar2_epa.sas */
2 TITLE1 ’Empirical partial autocorrelation function ’;
3 TITLE2 ’of simulated AR(2)-process data ’;
4 /* Note that this program requires data1 generated by the previous
→program (ar2_plot.sas) */
5
6 /* Compute partial autocorrelation function */
7 PROC ARIMA DATA=data1(WHERE=(t>0));
8 IDENTIFY VAR=y NLAG =50 OUTCOV=corr NOPRINT;
9
10 /* Graphical options */
11 SYMBOL1 C=GREEN V=DOT I=JOIN H=0.7;
12 AXIS1 LABEL=(’k’);
13 AXIS2 LABEL=(’a(k)’);
14
15 /* Plot autocorrelation function */
16 PROC GPLOT DATA=corr;
17 PLOT PARTCORR*LAG / HAXIS=AXIS1 VAXIS=AXIS2 VREF =0;
18 RUN; QUIT;
Program 2.2.4: Computing the empirical partial autocorrelation
function of AR(2)-data.
This program requires to be submitted to SAS for execution within a joint session with Program 2.2.3 (ar2_plot.sas), because it uses the temporary data step data1 generated there. Otherwise you have to add the block of statements to this program concerning the data step. Like in Program 1.3.1 (sunspot_correlogram.sas) the procedure ARIMA with the IDENTIFY statement is used to create a data set. Here we are interested in the variable PARTCORR containing the values of the empirical partial autocorrelation function from the simulated AR(2)-process data. This variable is plotted against the lag stored in variable LAG.
ARMA-Processes
Moving averages MA($q$) and autoregressive AR($p$)-processes are special cases of so called autoregressive moving averages. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise, $p,q\ge 0$ integers and $a_0,\dots,a_p,b_0,\dots,b_q\in\mathbb{R}$. A real valued stochastic process $(Y_t)_{t\in\mathbb{Z}}$ is said to be an autoregressive moving average process of order $p$, $q$, denoted by ARMA($p,q$), if it satisfies the equation
$$Y_t=a_1Y_{t-1}+a_2Y_{t-2}+\dots+a_pY_{t-p}+\varepsilon_t+b_1\varepsilon_{t-1}+\dots+b_q\varepsilon_{t-q}. \qquad(2.11)$$
An ARMA($p,0$)-process with $p\ge 1$ is obviously an AR($p$)-process, whereas an ARMA($0,q$)-process with $q\ge 1$ is a moving average MA($q$). The polynomials
$$A(z):=1-a_1z-\dots-a_pz^p \qquad(2.12)$$
and
$$B(z):=1+b_1z+\dots+b_qz^q, \qquad(2.13)$$
are the characteristic polynomials of the autoregressive part and of the moving average part of an ARMA($p,q$)-process $(Y_t)$, which we can represent in the form
$$Y_t-a_1Y_{t-1}-\dots-a_pY_{t-p}=\varepsilon_t+b_1\varepsilon_{t-1}+\dots+b_q\varepsilon_{t-q}.$$
Denote by $Z_t$ the right-hand side of the above equation, i.e., $Z_t:=\varepsilon_t+b_1\varepsilon_{t-1}+\dots+b_q\varepsilon_{t-q}$. This is a MA($q$)-process and, therefore, stationary by Theorem 2.1.6. If all $p$ roots of the equation $A(z)=1-a_1z-\dots-a_pz^p=0$ are outside of the unit circle, then we deduce from Theorem 2.1.11 that the filter $c_0=1$, $c_u=-a_u$, $u=1,\dots,p$, $c_u=0$ elsewhere, has an absolutely summable causal inverse filter $(d_u)_{u\ge 0}$. Consequently we obtain from the equation $Z_t=Y_t-a_1Y_{t-1}-\dots-a_pY_{t-p}$ and (2.1) that with $b_0=1$, $b_w=0$ if $w>q$
$$Y_t=\sum_{u\ge 0}d_uZ_{t-u}=\sum_{u\ge 0}d_u(\varepsilon_{t-u}+b_1\varepsilon_{t-1-u}+\dots+b_q\varepsilon_{t-q-u})=\sum_{u\ge 0}\sum_{w\ge 0}d_ub_w\varepsilon_{t-w-u}=\sum_{v\ge 0}\Bigl(\sum_{u+w=v}d_ub_w\Bigr)\varepsilon_{t-v}=\sum_{v\ge 0}\Bigl(\sum_{w=0}^{\min(v,q)}b_wd_{v-w}\Bigr)\varepsilon_{t-v}=:\sum_{v\ge 0}\alpha_v\varepsilon_{t-v}$$
is the almost surely uniquely determined stationary solution of the ARMA($p,q$)-equation (2.11) for a given white noise $(\varepsilon_t)$.
The condition that all $p$ roots of the characteristic equation $A(z)=1-a_1z-a_2z^2-\dots-a_pz^p=0$ of the ARMA($p,q$)-process $(Y_t)$ are outside of the unit circle will again be referred to in the following as the stationarity condition (2.3). In this case, the process $Y_t=\sum_{v\ge 0}\alpha_v\varepsilon_{t-v}$, $t\in\mathbb{Z}$, is the almost surely uniquely determined stationary solution of the ARMA($p,q$)-equation (2.11), which is called causal.
The MA($q$)-process $Z_t=\varepsilon_t+b_1\varepsilon_{t-1}+\dots+b_q\varepsilon_{t-q}$ is by definition invertible if all $q$ roots of the polynomial $B(z)=1+b_1z+\dots+b_qz^q$ are outside of the unit circle. Theorem 2.1.11 and equation (2.1) imply in this case the existence of an absolutely summable causal filter $(g_u)_{u\ge 0}$ such that with $a_0=-1$
$$\varepsilon_t=\sum_{u\ge 0}g_uZ_{t-u}=\sum_{u\ge 0}g_u(Y_{t-u}-a_1Y_{t-1-u}-\dots-a_pY_{t-p-u})=-\sum_{v\ge 0}\Bigl(\sum_{w=0}^{\min(v,p)}a_wg_{v-w}\Bigr)Y_{t-v}.$$
In this case the ARMA($p,q$)-process $(Y_t)$ is said to be invertible.
The Autocovariance Function of an ARMA-Process
In order to deduce the autocovariance function of an ARMA($p,q$)-process $(Y_t)$, which satisfies the stationarity condition (2.3), we compute at first the absolutely summable coefficients
$$\alpha_v=\sum_{w=0}^{\min(q,v)}b_wd_{v-w},\quad v\ge 0,$$
in the above representation $Y_t=\sum_{v\ge 0}\alpha_v\varepsilon_{t-v}$. The characteristic polynomial $D(z)$ of the absolutely summable causal filter $(d_u)_{u\ge 0}$ coincides by Lemma 2.1.9 for $0<|z|<1$ with $1/A(z)$, where $A(z)$ is given in (2.12). Thus we obtain with $B(z)$ as given in (2.13) for $0<|z|<1$
$$A(z)(B(z)D(z))=B(z)$$
$$\Leftrightarrow\quad\Bigl(\sum_{u=0}^{p}a_uz^u\Bigr)\Bigl(\sum_{v\ge 0}\alpha_vz^v\Bigr)=\sum_{w=0}^{q}b_wz^w$$
$$\Leftrightarrow\quad\sum_{w\ge 0}\Bigl(\sum_{u+v=w}a_u\alpha_v\Bigr)z^w=\sum_{w\ge 0}b_wz^w$$
$$\Leftrightarrow\quad\sum_{w\ge 0}\Bigl(\sum_{u=0}^{w}a_u\alpha_{w-u}\Bigr)z^w=\sum_{w\ge 0}b_wz^w$$
$$\Leftrightarrow\quad\begin{cases}\alpha_0=1\\ \alpha_w-\sum_{u=1}^{w}a_u\alpha_{w-u}=b_w & \text{for }1\le w\le p\\ \alpha_w-\sum_{u=1}^{p}a_u\alpha_{w-u}=0 & \text{for }w>p.\end{cases} \qquad(2.14)$$
Example 2.2.7. For the ARMA(1, 1)-process $Y_t-aY_{t-1}=\varepsilon_t+b\varepsilon_{t-1}$ with $|a|<1$ we obtain from (2.14)
$$\alpha_0=1,\quad\alpha_1-a=b,\quad\alpha_w-a\alpha_{w-1}=0,\ w\ge 2.$$
This implies $\alpha_0=1$, $\alpha_w=a^{w-1}(b+a)$, $w\ge 1$, and, hence,
$$Y_t=\varepsilon_t+(b+a)\sum_{w\ge 1}a^{w-1}\varepsilon_{t-w}.$$
Theorem 2.2.8. Suppose that $Y_t=\sum_{u=1}^{p}a_uY_{t-u}+\sum_{v=0}^{q}b_v\varepsilon_{t-v}$, $b_0:=1$, is an ARMA($p,q$)-process, which satisfies the stationarity condition (2.3). Its autocovariance function $\gamma$ then satisfies the recursion
$$\gamma(s)-\sum_{u=1}^{p}a_u\gamma(s-u)=\sigma^2\sum_{v=s}^{q}b_v\alpha_{v-s},\quad 0\le s\le q,$$
$$\gamma(s)-\sum_{u=1}^{p}a_u\gamma(s-u)=0,\quad s\ge q+1, \qquad(2.15)$$
where $\alpha_v$, $v\ge 0$, are the coefficients in the representation $Y_t=\sum_{v\ge 0}\alpha_v\varepsilon_{t-v}$, which we computed in (2.14) and $\sigma^2$ is the variance of $\varepsilon_0$.

By the preceding result the autocorrelation function $\rho$ of the ARMA($p,q$)-process $(Y_t)$ satisfies
$$\rho(s)=\sum_{u=1}^{p}a_u\rho(s-u),\quad s\ge q+1,$$
which coincides with the autocorrelation function of the stationary AR($p$)-process $X_t=\sum_{u=1}^{p}a_uX_{t-u}+\varepsilon_t$, c.f. Lemma 2.2.5.
Proof of Theorem 2.2.8. Put $\mu:=E(Y_0)$ and $\nu:=E(\varepsilon_0)$. Then we have
$$Y_t-\mu=\sum_{u=1}^{p}a_u(Y_{t-u}-\mu)+\sum_{v=0}^{q}b_v(\varepsilon_{t-v}-\nu),\quad t\in\mathbb{Z}.$$
Recall that $Y_t=\sum_{u=1}^{p}a_uY_{t-u}+\sum_{v=0}^{q}b_v\varepsilon_{t-v}$, $t\in\mathbb{Z}$. Taking expectations on both sides we obtain $\mu=\sum_{u=1}^{p}a_u\mu+\sum_{v=0}^{q}b_v\nu$, which yields now the equation displayed above. Multiplying both sides with $Y_{t-s}-\mu$, $s\ge 0$, and taking expectations, we obtain
$$\mathrm{Cov}(Y_{t-s},Y_t)=\sum_{u=1}^{p}a_u\mathrm{Cov}(Y_{t-s},Y_{t-u})+\sum_{v=0}^{q}b_v\mathrm{Cov}(Y_{t-s},\varepsilon_{t-v}),$$
which implies
$$\gamma(s)-\sum_{u=1}^{p}a_u\gamma(s-u)=\sum_{v=0}^{q}b_v\mathrm{Cov}(Y_{t-s},\varepsilon_{t-v}).$$
From the representation $Y_{t-s}=\sum_{w\ge 0}\alpha_w\varepsilon_{t-s-w}$ and Theorem 2.1.5 we obtain
$$\mathrm{Cov}(Y_{t-s},\varepsilon_{t-v})=\sum_{w\ge 0}\alpha_w\mathrm{Cov}(\varepsilon_{t-s-w},\varepsilon_{t-v})=\begin{cases}0 & \text{if }v<s\\ \sigma^2\alpha_{v-s} & \text{if }v\ge s.\end{cases}$$
This implies
$$\gamma(s)-\sum_{u=1}^{p}a_u\gamma(s-u)=\sum_{v=s}^{q}b_v\mathrm{Cov}(Y_{t-s},\varepsilon_{t-v})=\begin{cases}\sigma^2\sum_{v=s}^{q}b_v\alpha_{v-s} & \text{if }s\le q\\ 0 & \text{if }s>q,\end{cases}$$
which is the assertion.
Example 2.2.9. For the ARMA(1, 1)-process $Y_t-aY_{t-1}=\varepsilon_t+b\varepsilon_{t-1}$ with $|a|<1$ we obtain from Example 2.2.7 and Theorem 2.2.8 with $\sigma^2=\mathrm{Var}(\varepsilon_0)$
$$\gamma(0)-a\gamma(1)=\sigma^2(1+b(b+a)),\qquad\gamma(1)-a\gamma(0)=\sigma^2b,$$
and thus
$$\gamma(0)=\sigma^2\frac{1+2ab+b^2}{1-a^2},\qquad\gamma(1)=\sigma^2\frac{(1+ab)(a+b)}{1-a^2}.$$
For $s\ge 2$ we obtain from (2.15)
$$\gamma(s)=a\gamma(s-1)=\dots=a^{s-1}\gamma(1).$$
Plot 2.2.5a: Autocorrelation functions of ARMA(1, 1)-processes with $a=0.8/-0.8$, $b=0.5/0/-0.5$ and $\sigma^2=1$.
1 /* arma11_autocorrelation .sas */
2 TITLE1 ’Autocorrelation functions of ARMA (1,1)-processes ’;
3
4 /* Compute autocorrelations functions for different ARMA (1,1)-
→processes */
5 DATA data1;
6 DO a=-0.8, 0.8;
7 DO b=-0.5, 0, 0.5;
8 s=0; rho=1;
9 q=COMPRESS(’(’ || a || ’,’ || b || ’)’);
10 OUTPUT;
11 s=1; rho =(1+a*b)*(a+b)/(1+2*a*b+b*b);
12 q=COMPRESS(’(’ || a || ’,’ || b || ’)’);
13 OUTPUT;
14 DO s=2 TO 10;
15 rho=a*rho;
16 q=COMPRESS(’(’ || a || ’,’ || b || ’)’);
17 OUTPUT;
18 END;
19 END;
20 END;
21
22 /* Graphical options */
23 SYMBOL1 C=RED V=DOT I=JOIN H=0.7 L=1;
24 SYMBOL2 C=YELLOW V=DOT I=JOIN H=0.7 L=2;
25 SYMBOL3 C=BLUE V=DOT I=JOIN H=0.7 L=33;
26 SYMBOL4 C=RED V=DOT I=JOIN H=0.7 L=3;
27 SYMBOL5 C=YELLOW V=DOT I=JOIN H=0.7 L=4;
28 SYMBOL6 C=BLUE V=DOT I=JOIN H=0.7 L=5;
29 AXIS1 LABEL=(F=CGREEK ’r’ F=COMPLEX ’(k)’);
30 AXIS2 LABEL=(’lag k’) MINOR=NONE;
31 LEGEND1 LABEL=(’(a,b)=’) SHAPE=SYMBOL (10 ,0.8);
32
33 /* Plot the autocorrelation functions */
34 PROC GPLOT DATA=data1;
35 PLOT rho*s=q / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
36 RUN; QUIT;
Program 2.2.5: Computing autocorrelation functions of ARMA(1, 1)-
processes.
In the data step the values of the autocorrelation function belonging to an ARMA(1, 1)-process are calculated for two different values of a, the coefficient of the AR(1)-part, and three different values of b, the coefficient of the MA(1)-part. Pure AR(1)-processes result for the value b=0. For the arguments (lags) s=0 and s=1 the computation is done directly, for the rest up to s=10 a loop is used for a recursive computation. For the COMPRESS statement see Program 1.1.3 (logistic.sas).

The second part of the program uses PROC GPLOT to plot the autocorrelation function, using known statements and options to customize the output.
ARIMA-Processes
Suppose that the time series $(Y_t)$ has a polynomial trend of degree $d$. Then we can eliminate this trend by considering the process $(\Delta^dY_t)$, obtained by $d$ times differencing as described in Section 1.2. If the filtered process $(\Delta^dY_t)$ is an ARMA($p,q$)-process satisfying the stationarity condition (2.3), the original process $(Y_t)$ is said to be an autoregressive integrated moving average of order $p$, $d$, $q$, denoted by ARIMA($p,d,q$). In this case constants $a_1,\dots,a_p$, $b_0=1$, $b_1,\dots,b_q\in\mathbb{R}$ exist such that
$$\Delta^dY_t=\sum_{u=1}^{p}a_u\Delta^dY_{t-u}+\sum_{w=0}^{q}b_w\varepsilon_{t-w},\quad t\in\mathbb{Z},$$
where $(\varepsilon_t)$ is a white noise.
Example 2.2.10. An ARIMA(1, 1, 1)-process $(Y_t)$ satisfies
$$\Delta Y_t=a\Delta Y_{t-1}+\varepsilon_t+b\varepsilon_{t-1},\quad t\in\mathbb{Z},$$
where $|a|<1$, $b\ne 0$ and $(\varepsilon_t)$ is a white noise, i.e.,
$$Y_t-Y_{t-1}=a(Y_{t-1}-Y_{t-2})+\varepsilon_t+b\varepsilon_{t-1},\quad t\in\mathbb{Z}.$$
This implies $Y_t=(a+1)Y_{t-1}-aY_{t-2}+\varepsilon_t+b\varepsilon_{t-1}$.
A random walk $X_t=X_{t-1}+\varepsilon_t$ is obviously an ARIMA(0, 1, 0)-process.
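Such a process is easy to simulate and to bring back to its stationary ARMA-part by differencing. The following sketch uses the illustrative parameters $a=0.5$, $b=0.3$; the specification VAR=y(1) in the IDENTIFY statement tells PROC ARIMA to analyse the first differences.

/* simulate an ARIMA(1,1,1)-process and inspect its first differences (sketch) */
DATA arima111;
   a=0.5; b=0.3;                /* illustrative parameters, |a|<1, b ne 0    */
   y=0; d=0; e1=0;              /* starting values                           */
   DO t=1 TO 200;
      e=RANNOR(1);
      d=a*d+e+b*e1;             /* d = Delta Y_t follows an ARMA(1,1)        */
      y=y+d;                    /* integrate the differences                 */
      e1=e;
      OUTPUT;
   END;

PROC ARIMA DATA=arima111;
   IDENTIFY VAR=y(1) NLAG=10;   /* correlograms of the differenced series    */
RUN; QUIT;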
Consider $Y_t=S_t+R_t$, $t\in\mathbb{Z}$, where the random component $(R_t)$ is a stationary process and the seasonal component $(S_t)$ is periodic of length $s$, i.e., $S_t=S_{t+s}=S_{t+2s}=\dots$ for $t\in\mathbb{Z}$. Then the process $(Y_t)$ is in general not stationary, but $Y_t^*:=Y_t-Y_{t-s}$ is. If this seasonally adjusted process $(Y_t^*)$ is an ARMA($p,q$)-process satisfying the stationarity condition (2.3), then the original process $(Y_t)$ is called a seasonal ARMA($p,q$)-process with period length $s$, denoted by SARMA$_s$($p,q$). One frequently encounters a time series with a trend as well as a periodic seasonal component. A stochastic process $(Y_t)$ with the property that $(\Delta^d(Y_t-Y_{t-s}))$ is an ARMA($p,q$)-process is, therefore, called a SARIMA($p,d,q$)-process. This is a quite common assumption in practice.
Cointegration
In the sequel we will frequently use the notation that a time series $(Y_t)$ is I($d$), $d=0,1$, if the sequence of differences $(\Delta^dY_t)$ of order $d$ is a stationary process. By the difference $\Delta^0Y_t$ of order zero we denote the undifferenced process $Y_t$, $t\in\mathbb{Z}$.
Suppose that the two time series $(Y_t)$ and $(Z_t)$ satisfy
$$Y_t=aW_t+\varepsilon_t,\qquad Z_t=W_t+\delta_t,\qquad t\in\mathbb{Z},$$
for some real number $a\ne 0$, where $(W_t)$ is I(1), and $(\varepsilon_t)$, $(\delta_t)$ are uncorrelated white noise processes, i.e., $\mathrm{Cov}(\varepsilon_t,\delta_s)=0$, $t,s\in\mathbb{Z}$.
Then $(Y_t)$ and $(Z_t)$ are both I(1), but
$$X_t:=Y_t-aZ_t=\varepsilon_t-a\delta_t,\quad t\in\mathbb{Z},$$
is I(0).
The fact that the combination of two nonstationary series yields a stationary process arises from a common component (W_t), which is I(1). More generally, two I(1) series (Y_t), (Z_t) are said to be cointegrated, if there exist constants µ, α_1, α_2 with α_1, α_2 different from 0, such that the process

X_t = µ + α_1 Y_t + α_2 Z_t,  t ∈ Z,

is I(0). Without loss of generality, we can choose α_1 = 1 in this case.
Such cointegrated time series are often encountered in macroeconomics
(Granger (1981), Engle and Granger (1987)). Consider, for example,
prices for the same commodity in different parts of a country. Principles of supply and demand, along with the possibility of arbitrage, mean that, while the two price processes may fluctuate more or less randomly, the distance between them will, in equilibrium, be relatively constant (typically about zero).
The link between cointegration and error correction can vividly be described by the humorous tale of the drunkard and his dog, cf. Murray (1994). In the same way that a drunkard seems to follow a random walk, an unleashed dog wanders aimlessly. We can, therefore, model their ways by random walks

Y_t = Y_{t−1} + ε_t  and  Z_t = Z_{t−1} + δ_t,

where the individual single steps (ε_t), (δ_t) of man and dog are uncorrelated white noise processes. Random walks are not stationary, since their variances increase, and so both processes (Y_t) and (Z_t) are not stationary.
And if the dog belongs to the drunkard? We assume the dog to be unleashed and thus, the distance Y_t − Z_t between the drunk and his dog is a random variable. It seems reasonable to assume that these distances form a stationary process, i.e., that (Y_t) and (Z_t) are cointegrated with constants α_1 = 1 and α_2 = −1.

We model the cointegrated walks above more tritely by assuming the existence of constants c, d ∈ R such that

Y_t − Y_{t−1} = ε_t + c(Y_{t−1} − Z_{t−1})  and
Z_t − Z_{t−1} = δ_t + d(Y_{t−1} − Z_{t−1}).

The additional terms on the right-hand side of these equations are the error correction terms.
Cointegration requires that both variables in question be I(1), but
that a linear combination of them be I(0). This means that the first
step is to figure out if the series themselves are I(1), typically by using
unit root tests. If one or both are not I(1), cointegration is not an
option.
Whether two processes (Y_t) and (Z_t) are cointegrated can be tested by means of a linear regression approach. This is based on the cointegration regression

Y_t = β_0 + β_1 Z_t + ε_t,

where (ε_t) is a stationary process and β_0, β_1 ∈ R are the cointegration constants.
One can use the ordinary least squares estimates β̂_0, β̂_1 of the target parameters β_0, β_1, which satisfy

Σ_{t=1}^n (Y_t − β̂_0 − β̂_1 Z_t)² = min_{β_0, β_1 ∈ R} Σ_{t=1}^n (Y_t − β_0 − β_1 Z_t)²,

and one checks whether the estimated residuals

ε̂_t = Y_t − β̂_0 − β̂_1 Z_t

are generated by a stationary process.

A general strategy for examining cointegrated series can now be summarized as follows (a brief SAS sketch follows the list):

1. Determine that the two series are I(1).

2. Compute ε̂_t = Y_t − β̂_0 − β̂_1 Z_t using ordinary least squares.

3. Examine ε̂_t for stationarity, using

• the Durbin–Watson test,

• standard unit root tests such as Dickey–Fuller or augmented Dickey–Fuller.
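A minimal SAS sketch of these three steps might look as follows; the data set name (data3) and the variables supply and price are those of the Hog Data example below, and the lag choices of the augmented Dickey–Fuller tests are illustrative assumptions.

/* Step 1: (augmented) Dickey-Fuller unit root tests for both series */
PROC ARIMA DATA=data3;
   IDENTIFY VAR=supply STATIONARITY=(ADF=(0,1,2));
   IDENTIFY VAR=price  STATIONARITY=(ADF=(0,1,2));
RUN;

/* Step 2: cointegration regression, keeping the residuals */
PROC REG DATA=data3;
   MODEL supply=price;
   OUTPUT OUT=resid R=ehat;
RUN;

/* Step 3: unit root test applied to the estimated residuals */
PROC ARIMA DATA=resid;
   IDENTIFY VAR=ehat STATIONARITY=(ADF=(0,1,2));
RUN; QUIT;

The Phillips–Ouliaris test computed in Program 2.2.7 below combines the last two steps in a single call of PROC AUTOREG.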
Example 2.2.11. (Hog Data) Quenouille’s (1957) Hog Data list the
annual hog supply and hog prices in the U.S. between 1867 and 1948.
Do they provide a typical example of cointegrated series? A discussion
can be found in Box and Tiao (1977).
Plot 2.2.6a: Hog Data: hog supply and hog prices.
1 /* hog.sas */
2 TITLE1 ’Hog supply , hog prices and differences ’;
3 TITLE2 ’Hog Data (1867 -1948) ’;
4
5 /* Read in the two data sets */
6 DATA data1;
7 INFILE ’c:\data\hogsuppl.txt ’;
8 INPUT supply @@;
9
10 DATA data2;
11 INFILE ’c:\data\hogprice.txt ’;
12 INPUT price @@;
13
14 /* Merge data sets , generate year and compute differences */
15 DATA data3;
16 MERGE data1 data2;
17 year=_N_ +1866;
18 diff=supply -price;
19
20 /* Graphical options */
21 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;
22 AXIS1 LABEL=(ANGLE =90 ’h o g s u p p l y’);
23 AXIS2 LABEL=(ANGLE =90 ’h o g p r i c e s’);
24 AXIS3 LABEL=(ANGLE =90 ’d i f f e r e n c e s’);
25
26 /* Generate three plots */
27 GOPTIONS NODISPLAY;
28 PROC GPLOT DATA=data3 GOUT=abb;
29 PLOT supply*year / VAXIS=AXIS1;
30 PLOT price*year / VAXIS=AXIS2;
31 PLOT diff*year / VAXIS=AXIS3 VREF =0;
32 RUN;
33
34 /* Display them in one output */
35 GOPTIONS DISPLAY;
36 PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
37 TEMPLATE=V3;
38 TREPLAY 1:GPLOT 2: GPLOT1 3: GPLOT2;
39 RUN; DELETE _ALL_; QUIT;
Program 2.2.6: Plotting the Hog Data.
The supply data and the price data read
in from two external files are merged in
data3. Year is an additional variable with val-
ues 1867, 1868, . . . , 1948. By PROC GPLOT hog
supply, hog prices and their differences diff
are plotted in three different plots stored in the
graphics catalog abb. The horizontal line at the
zero level is plotted by the option VREF=0. The
plots are put into a common graphic using PROC
GREPLAY and the template V3. Note that the la-
bels of the vertical axes are spaced out as SAS
sets their characters too close otherwise.
Hog supply (=: y_t) and hog price (=: z_t) obviously increase in time t and do, therefore, not seem to be realizations of stationary processes; nevertheless, as they behave similarly, a linear combination of both might be stationary. In this case, hog supply and hog price would be cointegrated.

This phenomenon can easily be explained as follows. A high price z_t at time t is a good reason for farmers to breed more hogs, thus leading to a large supply y_{t+1} in the next year t + 1. This makes the price z_{t+1} fall with the effect that farmers will reduce their supply of hogs in the following year t + 2. However, when hogs are in short supply, their price z_{t+2} will rise etc. There is obviously some error correction mechanism inherent in these two processes, and the observed cointegration helps us to detect its existence.
The AUTOREG Procedure
Dependent Variable supply
Ordinary Least Squares Estimates
SSE 338324.258 DFE 80
MSE 4229 Root MSE 65.03117
SBC 924.172704 AIC 919.359266
Regress R-Square 0.3902 Total R-Square 0.3902
Durbin -Watson 0.5839
Phillips -Ouliaris
Cointegration Test
Lags Rho Tau
1 -28.9109 -4.0142
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 515.7978 26.6398 19.36 <.0001
price 1 0.2059 0.0288 7.15 <.0001
Listing 2.2.7a: Phillips–Ouliaris test for cointegration of Hog Data.
1 /* hog_cointegration.sas */
2 TITLE1 ’Testing for cointegration ’;
3 TITLE2 ’Hog Data (1867 -1948) ’;
4 /* Note that this program needs data3 generated by the previous
→program (hog.sas) */
5
6 /* Compute Phillips-Ouliaris test for cointegration */
7 PROC AUTOREG DATA=data3;
8 MODEL supply=price / STATIONARITY =( PHILLIPS);
9 RUN; QUIT;
Program 2.2.7: Phillips–Ouliaris test for cointegration of Hog Data.
The procedure AUTOREG (for autoregressive
models) uses data3 from Program 2.2.6
(hog.sas). In the MODEL statement a regression
of supply on price is defined and the option
STATIONARITY=(PHILLIPS) makes SAS calcu-
late the statistics of the Phillips–Ouliaris test for
cointegration of order 1.
The output of the above program contains some characteristics of the regression, the Phillips–Ouliaris test statistics and the regression coefficients with their t-ratios. The Phillips–Ouliaris test statistics need some further explanation.

The null hypothesis of the Phillips–Ouliaris cointegration test is no cointegration. Unfortunately SAS does not provide the p-value, but only the values of the test statistics denoted by RHO and TAU. Tables of critical values of these test statistics can be found in Phillips and Ouliaris (1990). Note that in the original paper the two test statistics are denoted by Ẑ_α and Ẑ_t. The hypothesis is to be rejected if RHO or TAU are below the critical value for the desired type I error level α. For this one has to differentiate between the following cases.
(1) If the estimated cointegrating regression does not contain any intercept, i.e. β_0 = 0, and none of the explanatory variables has a nonzero trend component, then use the following table for critical values of RHO and TAU. This is the so-called standard case.

α    0.15    0.125   0.1     0.075   0.05    0.025   0.01
RHO  -10.74  -11.57  -12.54  -13.81  -15.64  -18.88  -22.83
TAU  -2.26   -2.35   -2.45   -2.58   -2.76   -3.05   -3.39
(2) If the estimated cointegrating regression contains an intercept, i.e. β_0 ≠ 0, and none of the explanatory variables has a nonzero trend component, then use the following table for critical values of RHO and TAU. This case is referred to as demeaned.

α    0.15    0.125   0.1     0.075   0.05    0.025   0.01
RHO  -14.91  -15.93  -17.04  -18.48  -20.49  -23.81  -28.32
TAU  -2.86   -2.96   -3.07   -3.20   -3.37   -3.64   -3.96
(3) If the estimated cointegrating regression contains an intercept, i.e. β_0 ≠ 0, and at least one of the explanatory variables has a nonzero trend component, then use the following table for critical values of RHO and TAU. This case is said to be demeaned and detrended.

α    0.15    0.125   0.1     0.075   0.05    0.025   0.01
RHO  -20.79  -21.81  -23.19  -24.75  -27.09  -30.84  -35.42
TAU  -3.33   -3.42   -3.52   -3.65   -3.80   -4.07   -4.36
In our example with an arbitrary β_0 and a visible trend in the investigated time series, the RHO-value is −28.9109 and the TAU-value is −4.0142. Both are smaller than the critical values of −27.09 and −3.80 in the above table of the demeaned and detrended case and thus lead to a rejection of the null hypothesis of no cointegration at the 5%-level.
For further information on cointegration we refer to Chapter 19 of the
time series book by Hamilton (1994).
ARCH- and GARCH-Processes
In particular the monitoring of stock prices gave rise to the idea that the volatility of a time series (Y_t) might not be a constant but rather a random variable, which depends on preceding realizations. The following approach to model such a change in volatility is due to Engle (1982).

We assume the multiplicative model

Y_t = σ_t Z_t,  t ∈ Z,

where the Z_t are independent and identically distributed random variables with

E(Z_t) = 0  and  E(Z_t²) = 1,  t ∈ Z.

The scale σ_t is supposed to be a function of the past p values of the series:

σ_t² = a_0 + Σ_{j=1}^p a_j Y²_{t−j},  t ∈ Z,

where p ∈ {0, 1, . . . } and a_0 > 0, a_j ≥ 0, 1 ≤ j ≤ p − 1, a_p > 0 are constants.
The particular choice p = 0 yields obviously a white noise model for (Y_t). Common choices for the distribution of Z_t are the standard normal distribution or the (standardized) t-distribution, which in the non-standardized form has the density

f_m(x) := Γ((m + 1)/2) / ( Γ(m/2) √(πm) ) · (1 + x²/m)^{−(m+1)/2},  x ∈ R.

The number m ≥ 1 is the degree of freedom of the t-distribution. The scale σ_t in the above model is determined by the past observations Y_{t−1}, . . . , Y_{t−p}, and the innovation on this scale is then provided by Z_t. We assume moreover that the process (Y_t) is a causal one in the sense that Z_t and Y_s, s < t, are independent. Some autoregressive structure is, therefore, inherent in the process (Y_t). Conditional on Y_{t−j} = y_{t−j}, 1 ≤ j ≤ p, the variance of Y_t is a_0 + Σ_{j=1}^p a_j y²_{t−j} and, thus, the conditional variances of the process will generally be different. The process Y_t = σ_t Z_t is, therefore, called an autoregressive and conditional heteroscedastic process of order p, ARCH(p)-process for short.
If, in addition, the causal process (Y_t) is stationary, then we obviously have

E(Y_t) = E(σ_t) E(Z_t) = 0

and

σ² := E(Y_t²) = E(σ_t²) E(Z_t²) = a_0 + Σ_{j=1}^p a_j E(Y²_{t−j}) = a_0 + σ² Σ_{j=1}^p a_j,

which yields

σ² = a_0 / (1 − Σ_{j=1}^p a_j).

A necessary condition for the stationarity of the process (Y_t) is, therefore, the inequality Σ_{j=1}^p a_j < 1. Note, moreover, that the preceding arguments immediately imply that the Y_t and Y_s are uncorrelated for different values of s and t, but they are not independent.
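As a small numerical illustration with hypothetical parameter values, an ARCH(1)-process with a_0 = 0.0004 and a_1 = 0.5 satisfies this condition and has the stationary variance

% hypothetical ARCH(1) parameters a_0 = 0.0004, a_1 = 0.5; the condition a_1 < 1 holds
\sigma^2 \;=\; \frac{a_0}{1 - \sum_{j=1}^{p} a_j} \;=\; \frac{0.0004}{1 - 0.5} \;=\; 0.0008 ,
% for a_1 >= 1 the denominator would be nonpositive and no stationary variance exists

so the unconditional standard deviation is 0.0008^{1/2} ≈ 0.028.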
The following lemma is crucial. It embeds the ARCH(p)-processes to
a certain extent into the class of AR(p)-processes, so that our above
tools for the analysis of autoregressive processes can be applied here
as well.
Lemma 2.2.12. Let (Y_t) be a stationary and causal ARCH(p)-process with constants a_0, a_1, . . . , a_p. If the process of squared random variables (Y_t²) is a stationary one, then it is an AR(p)-process:

Y_t² = a_1 Y²_{t−1} + · · · + a_p Y²_{t−p} + ε_t,

where (ε_t) is a white noise with E(ε_t) = a_0, t ∈ Z.
Proof. From the assumption that (Y_t) is an ARCH(p)-process we obtain

ε_t := Y_t² − Σ_{j=1}^p a_j Y²_{t−j} = σ_t² Z_t² − σ_t² + a_0 = a_0 + σ_t²(Z_t² − 1),  t ∈ Z.

This implies E(ε_t) = a_0 and

E((ε_t − a_0)²) = E(σ_t⁴) E((Z_t² − 1)²) = E( (a_0 + Σ_{j=1}^p a_j Y²_{t−j})² ) E((Z_t² − 1)²) =: c,

independent of t by the stationarity of (Y_t²). For h ∈ N the causality of (Y_t) finally implies

E((ε_t − a_0)(ε_{t+h} − a_0)) = E(σ_t² σ²_{t+h} (Z_t² − 1)(Z²_{t+h} − 1)) = E(σ_t² σ²_{t+h} (Z_t² − 1)) E(Z²_{t+h} − 1) = 0,

i.e., (ε_t) is a white noise with E(ε_t) = a_0.
The process (Y_t²) satisfies, therefore, the stationarity condition (2.3) if all p roots of the equation 1 − Σ_{j=1}^p a_j z^j = 0 are outside of the unit circle. Hence, we can estimate the order p using an estimate as in (2.10) of the partial autocorrelation function of (Y_t²). The Yule–Walker equations provide us, for example, with an estimate of the coefficients a_1, . . . , a_p, which then can be utilized to estimate the expectation a_0 of the error ε_t.
Note that conditional on Y_{t−1} = y_{t−1}, . . . , Y_{t−p} = y_{t−p}, the distribution of Y_t = σ_t Z_t is a normal one if the Z_t are normally distributed. In this case it is possible to write down explicitly the joint density of the vector (Y_{p+1}, . . . , Y_n), conditional on Y_1 = y_1, . . . , Y_p = y_p (Exercise 2.36). A numerical maximization of this density with respect to a_0, a_1, . . . , a_p then leads to a maximum likelihood estimate of the vector of constants; see also Section 2.3.
A generalized ARCH-process, GARCH(p, q) for short (Bollerslev (1986)), adds an autoregressive structure to the scale σ_t by assuming the representation

σ_t² = a_0 + Σ_{j=1}^p a_j Y²_{t−j} + Σ_{k=1}^q b_k σ²_{t−k},

where the constants b_k are nonnegative. The set of parameters a_j, b_k can again be estimated by conditional maximum likelihood as before if a parametric model for the distribution of the innovations Z_t is assumed.
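Such a model can be fitted with PROC AUTOREG in the same way as the ARCH model in Program 2.2.9 below. A minimal sketch for a GARCH(1,1)-model with t-distributed innovations, assuming the log returns are stored in the variable y of data1:

/* Sketch: fit a GARCH(1,1) model with t-distributed innovations */
PROC AUTOREG DATA=data1;
   MODEL y = / NOINT GARCH=(P=1,Q=1) DIST=T;
RUN; QUIT;

Note that, as remarked for Program 2.2.9, SAS's Q= counts the ARCH terms a_j and its P= the GARCH terms b_k, so the call above fits one term of each kind.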
Example 2.2.13. (Hongkong Data). The daily Hang Seng closing index was recorded between July 16th, 1981 and September 30th, 1983, leading to a total amount of 552 observations p_t. The daily log returns are defined as

y_t := log(p_t) − log(p_{t−1}),

where we now have a total of n = 551 observations. The expansion log(1 + ε) ≈ ε implies that

y_t = log( 1 + (p_t − p_{t−1})/p_{t−1} ) ≈ (p_t − p_{t−1})/p_{t−1},

provided that p_{t−1} and p_t are close to each other. In this case we can interpret the return as the difference of indices on subsequent days, relative to the initial one.
We use an ARCH(3) model for the generation of y_t, which seems to be a plausible choice by the partial autocorrelations plot. If one assumes t-distributed innovations Z_t, SAS estimates the distribution's degrees of freedom and displays the reciprocal in the TDFI-line, here m = 1/0.1780 = 5.61 degrees of freedom. We then obtain the estimates a_0 = 0.000214, a_1 = 0.147593, a_2 = 0.278166 and a_3 = 0.157807. The SAS output also contains some general regression model information from an ordinary least squares estimation approach, some specific information for the (G)ARCH approach and, as mentioned above, the estimates for the ARCH model parameters in combination with t ratios and approximated p-values. The following plots show the returns of the Hang Seng index, their squares, the pertaining partial autocorrelation function and the parameter estimates.
Plot 2.2.8a: Log returns of Hang Seng index and their squares.
1 /* hongkong_plot.sas */
2 TITLE1 ’Daily log returns and their squares ’;
3 TITLE2 ’Hongkong Data ’;
4
5 /* Read in the data , compute log return and their squares */
6 DATA data1;
7 INFILE ’c:\data\hongkong.txt ’;
8 INPUT p;
9 t=_N_;
10 y=DIF(LOG(p));
11 y2=y**2;
12
13 /* Graphical options */
14 SYMBOL1 C=RED V=DOT H=0.5 I=JOIN L=1;
15 AXIS1 LABEL=(’y’ H=1 ’t’) ORDER =(-.12 TO .10 BY .02);
16 AXIS2 LABEL=(’y2’ H=1 ’t’);
17
18 /* Generate two plots */
19 GOPTIONS NODISPLAY;
20 PROC GPLOT DATA=data1 GOUT=abb;
21 PLOT y*t / VAXIS=AXIS1;
22 PLOT y2*t / VAXIS=AXIS2;
23 RUN;
24
25 /* Display them in one output */
26 GOPTIONS DISPLAY;
27 PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
28 TEMPLATE=V2;
29 TREPLAY 1:GPLOT 2: GPLOT1;
30 RUN; DELETE _ALL_; QUIT;
Program 2.2.8: Plotting the log returns and their squares.
In the DATA step the observed values of the
Hang Seng closing index are read into the vari-
able p from an external file. The time index vari-
able t uses the SAS variable _N_, and the log
transformed and differenced values of the in-
dex are stored in the variable y, their squared
values in y2.
After defining different axis labels, two plots
are generated by two PLOT statements in PROC
GPLOT, but they are not displayed. By means of
PROC GREPLAY the plots are merged vertically in
one graphic.
Plot 2.2.9a: Partial autocorrelations of squares of log returns of Hang
Seng index.
The AUTOREG Procedure
Dependent Variable = Y
Ordinary Least Squares Estimates
SSE 0.265971 DFE 551
MSE 0.000483 Root MSE 0.021971
SBC -2643.82 AIC -2643.82
Reg Rsq 0.0000 Total Rsq 0.0000
Durbin -Watson 1.8540
NOTE: No intercept term is used. R-squares are redefined.
GARCH Estimates
SSE 0.265971 OBS 551
MSE 0.000483 UVAR 0.000515
Log L 1706.532 Total Rsq 0.0000
SBC -3381.5 AIC -3403.06
Normality Test 119.7698 Prob >Chi -Sq 0.0001
Variable DF B Value Std Error t Ratio Approx Prob
ARCH0 1 0.000214 0.000039 5.444 0.0001
ARCH1 1 0.147593 0.0667 2.213 0.0269
ARCH2 1 0.278166 0.0846 3.287 0.0010
ARCH3 1 0.157807 0.0608 2.594 0.0095
TDFI 1 0.178074 0.0465 3.833 0.0001
Listing 2.2.9b: Parameter estimates in the ARCH(3)-model for stock
returns.
1 /* hongkong_pa.sas */
2 TITLE1 ’ARCH (3)-model ’;
3 TITLE2 ’Hongkong Data ’;
4 /* Note that this program needs data1 generated by the previous
→program (hongkong_plot.sas) */
5
6 /* Compute partial autocorrelation function */
7 PROC ARIMA DATA=data1;
8 IDENTIFY VAR=y2 NLAG =50 OUTCOV=data2;
9
10 /* Graphical options */
11 SYMBOL1 C=RED V=DOT H=0.5 I=JOIN;
12
13 /* Plot partial autocorrelation function */
14 PROC GPLOT DATA=data2;
15 PLOT partcorr*lag / VREF =0;
16 RUN;
17
18 /* Estimate ARCH (3)-model */
19 PROC AUTOREG DATA=data1;
20 MODEL y = / NOINT GARCH=(q=3) DIST=T;
21 RUN; QUIT;
Program 2.2.9: Analyzing the log returns.
To identify the order of a possibly underlying
ARCH process for the daily log returns of the
Hang Seng closing index, the empirical partial
autocorrelations of their squared values, which
are stored in the variable y2 of the data set
data1 in Program 2.2.8 (hongkong plot.sas),
are calculated by means of PROC ARIMA and the
IDENTIFY statement. The subsequent proce-
dure GPLOT displays these partial autocorrela-
tions. A horizontal reference line helps to de-
cide whether a value is substantially different
from 0.
PROC AUTOREG is used to analyze the ARCH(3)
model for the daily log returns. The MODEL state-
ment specifies the dependent variable y. The
option NOINT suppresses an intercept parame-
ter, GARCH=(q=3) selects the ARCH(3) model
and DIST=T determines a t distribution for the
innovations Z_t in the model equation. Note that,
in contrast to our notation, SAS uses the letter
q for the ARCH model order.
2.3 Specification of ARMA-Models: The Box–Jenkins Program

The aim of this section is to fit a time series model (Y_t)_{t∈Z} to a given set of data y_1, . . . , y_n collected in time t. We suppose that the data y_1, . . . , y_n are (possibly) variance-stabilized as well as trend or seasonally adjusted. We assume that they were generated by clipping Y_1, . . . , Y_n from an ARMA(p, q)-process (Y_t)_{t∈Z}, which we will fit to the data in the following. As noted in Section 2.2, we could also fit the model Y_t = Σ_{v≥0} α_v ε_{t−v} to the data, where (ε_t) is a white noise. But then we would have to determine infinitely many parameters α_v, v ≥ 0. By the principle of parsimony it seems, however, reasonable to fit only the finite number of parameters of an ARMA(p, q)-process.

The Box–Jenkins program consists of four steps:

1. Order selection: choice of the parameters p and q.

2. Estimation of coefficients: the coefficients a_1, . . . , a_p and b_1, . . . , b_q are estimated.

3. Diagnostic check: the fit of the ARMA(p, q)-model with the estimated coefficients is checked.

4. Forecasting: the prediction of future values of the original process.

The four steps are discussed in the following.
Order Selection

The order q of a moving average MA(q)-process can be estimated by means of the empirical autocorrelation function r(k), i.e., by the correlogram. Lemma 2.2.1 (iv) shows that the autocorrelation function ρ(k) vanishes for k ≥ q + 1. This suggests to choose the order q such that r(q) is clearly different from zero, whereas r(k) for k ≥ q + 1 is quite close to zero. This, however, is obviously a rather vague selection rule.

The order p of an AR(p)-process can be estimated in an analogous way using the empirical partial autocorrelation function α̂(k), k ≥ 1, as defined in (2.10). Since α̂(p) should be close to the p-th coefficient a_p of the AR(p)-process, which is different from zero, whereas α̂(k) ≈ 0 for k > p, the above rule can be applied again with r replaced by α̂.

The choice of the orders p and q of an ARMA(p, q)-process is a bit more challenging. In this case one commonly takes the pair (p, q) minimizing some function, which is based on an estimate σ̂²_{p,q} of the variance of ε_0. Popular functions are Akaike's Information Criterion

AIC(p, q) := log(σ̂²_{p,q}) + 2(p + q + 1)/(n + 1),

the Bayesian Information Criterion

BIC(p, q) := log(σ̂²_{p,q}) + (p + q) log(n + 1)/(n + 1)

and the Hannan–Quinn Criterion

HQ(p, q) := log(σ̂²_{p,q}) + 2(p + q) c log(log(n + 1))/(n + 1)  with c > 1.

AIC and BIC are discussed in Section 9.3 of Brockwell and Davis (1991) for Gaussian processes (Y_t), where the joint distribution of an arbitrary vector (Y_{t_1}, . . . , Y_{t_k}) with t_1 < · · · < t_k is multivariate normal, see below. For the HQ-criterion we refer to Hannan and Quinn (1979). Note that the variance estimate σ̂²_{p,q} will in general become arbitrarily small as p + q increases. The additive terms in the above criteria serve, therefore, as penalties for large values, thus helping to prevent overfitting of the data by choosing p and q too large.
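An information-criterion based pre-selection of p and q can be sketched in SAS with the MINIC option of the IDENTIFY statement in PROC ARIMA, which tabulates an information criterion over a grid of tentative orders; the data set name, variable name and search ranges below are illustrative assumptions.

/* Sketch: tabulate an information criterion over the grid 0<=p,q<=5 */
PROC ARIMA DATA=data1;
   IDENTIFY VAR=y MINIC P=(0:5) Q=(0:5);
RUN; QUIT;

The pair (p, q) with the smallest entry in the resulting table is then a candidate model, to be confirmed by the diagnostic checks described below.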
Estimation of Coefficients

Suppose we fixed the orders p and q of an ARMA(p, q)-process (Y_t)_{t∈Z}, with Y_1, . . . , Y_n now modelling the data y_1, . . . , y_n. In the next step we will derive estimators of the constants a_1, . . . , a_p, b_1, . . . , b_q in the model

Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q},  t ∈ Z.
The Gaussian Model: Maximum Likelihood Estimator

We assume first that (Y_t) is a Gaussian process and thus, the joint distribution of (Y_1, . . . , Y_n) is an n-dimensional normal distribution

P{Y_i ≤ s_i, i = 1, . . . , n} = ∫_{−∞}^{s_1} · · · ∫_{−∞}^{s_n} φ_{µ,Σ}(x_1, . . . , x_n) dx_n . . . dx_1

for arbitrary s_1, . . . , s_n ∈ R. Here

φ_{µ,Σ}(x_1, . . . , x_n) = (2π)^{−n/2} (det Σ)^{−1/2} exp( −(1/2) ((x_1, . . . , x_n) − µ) Σ^{−1} ((x_1, . . . , x_n) − µ)^T )

for arbitrary x_1, . . . , x_n ∈ R is the density of the n-dimensional normal distribution with mean vector µ = (µ, . . . , µ)^T ∈ R^n and covariance matrix Σ = (γ(i − j))_{1≤i,j≤n}, denoted by N(µ, Σ), where µ = E(Y_0) and γ is the autocovariance function of the stationary process (Y_t).

The number φ_{µ,Σ}(x_1, . . . , x_n) reflects the probability that the random vector (Y_1, . . . , Y_n) realizes close to (x_1, . . . , x_n). Precisely, we have for ε ↓ 0

P{Y_i ∈ [x_i − ε, x_i + ε], i = 1, . . . , n} = ∫_{x_1−ε}^{x_1+ε} · · · ∫_{x_n−ε}^{x_n+ε} φ_{µ,Σ}(z_1, . . . , z_n) dz_n . . . dz_1 ≈ 2^n ε^n φ_{µ,Σ}(x_1, . . . , x_n).

The likelihood principle is the fact that a random variable tends to attain its most likely value and thus, if the vector (Y_1, . . . , Y_n) actually attained the value (y_1, . . . , y_n), the unknown underlying mean vector µ and covariance matrix Σ ought to be such that φ_{µ,Σ}(y_1, . . . , y_n) is maximized. The computation of these parameters leads to the maximum likelihood estimator of µ and Σ.
We assume that the process (Y_t) satisfies the stationarity condition (2.3), in which case Y_t = Σ_{v≥0} α_v ε_{t−v}, t ∈ Z, is invertible, where (ε_t) is a white noise and the coefficients α_v depend only on a_1, . . . , a_p and b_1, . . . , b_q. Consequently we have for s ≥ 0

γ(s) = Cov(Y_0, Y_s) = Σ_{v≥0} Σ_{w≥0} α_v α_w Cov(ε_{−v}, ε_{s−w}) = σ² Σ_{v≥0} α_v α_{s+v}.
The matrix

Σ' := σ^{−2} Σ,

therefore, depends only on a_1, . . . , a_p and b_1, . . . , b_q. We can now write the density φ_{µ,Σ}(x_1, . . . , x_n) as a function of ϑ := (σ², µ, a_1, . . . , a_p, b_1, . . . , b_q) ∈ R^{p+q+2} and (x_1, . . . , x_n) ∈ R^n:

p(x_1, . . . , x_n|ϑ) := φ_{µ,Σ}(x_1, . . . , x_n) = (2πσ²)^{−n/2} (det Σ')^{−1/2} exp( −(1/(2σ²)) Q(ϑ|x_1, . . . , x_n) ),

where

Q(ϑ|x_1, . . . , x_n) := ((x_1, . . . , x_n) − µ) Σ'^{−1} ((x_1, . . . , x_n) − µ)^T

is a quadratic function. The likelihood function pertaining to the outcome (y_1, . . . , y_n) is

L(ϑ|y_1, . . . , y_n) := p(y_1, . . . , y_n|ϑ).

A parameter ϑ̂ maximizing the likelihood function,

L(ϑ̂|y_1, . . . , y_n) = sup_ϑ L(ϑ|y_1, . . . , y_n),

is then a maximum likelihood estimator of ϑ.

Due to the strict monotonicity of the logarithm, maximizing the likelihood function is in general equivalent to the maximization of the loglikelihood function

l(ϑ|y_1, . . . , y_n) = log L(ϑ|y_1, . . . , y_n).

The maximum likelihood estimator ϑ̂ therefore satisfies

l(ϑ̂|y_1, . . . , y_n) = sup_ϑ l(ϑ|y_1, . . . , y_n) = sup_ϑ ( −(n/2) log(2πσ²) − (1/2) log(det Σ') − (1/(2σ²)) Q(ϑ|y_1, . . . , y_n) ).

The computation of a maximizer is a numerical and usually computer intensive problem.
Example 2.3.1. The AR(1)-process Y_t = aY_{t−1} + ε_t with |a| < 1 has by Example 2.2.4 the autocovariance function

γ(s) = σ² a^s/(1 − a²),  s ≥ 0,

and thus,

Σ' = (1/(1 − a²)) ·
  [ 1         a         a²        . . .  a^{n−1} ]
  [ a         1         a                a^{n−2} ]
  [ .         .         .                .       ]
  [ a^{n−1}   a^{n−2}   a^{n−3}   . . .  1       ].

The inverse matrix is

Σ'^{−1} =
  [  1    −a      0      0     . . .   0   ]
  [ −a    1+a²   −a      0             0   ]
  [  0    −a     1+a²   −a             0   ]
  [  .     .      .      .             .   ]
  [  0    . . .         −a     1+a²   −a   ]
  [  0     0     . . .   0     −a      1   ].

Check that the determinant of Σ'^{−1} is det(Σ'^{−1}) = 1 − a² = 1/det(Σ'), see Exercise 2.40. If (Y_t) is a Gaussian process, then the likelihood function of ϑ = (σ², µ, a) is given by

L(ϑ|y_1, . . . , y_n) = (2πσ²)^{−n/2} (1 − a²)^{1/2} exp( −(1/(2σ²)) Q(ϑ|y_1, . . . , y_n) ),

where

Q(ϑ|y_1, . . . , y_n) = ((y_1, . . . , y_n) − µ) Σ'^{−1} ((y_1, . . . , y_n) − µ)^T
= (y_1 − µ)² + (y_n − µ)² + (1 + a²) Σ_{i=2}^{n−1} (y_i − µ)² − 2a Σ_{i=1}^{n−1} (y_i − µ)(y_{i+1} − µ).
Nonparametric Approach: Least Squares

If E(ε_t) = 0, then

Ŷ_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + b_1 ε_{t−1} + · · · + b_q ε_{t−q}

would obviously be a reasonable one-step forecast of the ARMA(p, q)-process

Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q},

based on Y_{t−1}, . . . , Y_{t−p} and ε_{t−1}, . . . , ε_{t−q}. The prediction error is given by the residual

Y_t − Ŷ_t = ε_t.

Suppose that ε̂_t is an estimator of ε_t, t ≤ n, which depends on the choice of constants a_1, . . . , a_p, b_1, . . . , b_q and satisfies the recursion

ε̂_t = y_t − a_1 y_{t−1} − · · · − a_p y_{t−p} − b_1 ε̂_{t−1} − · · · − b_q ε̂_{t−q}.
The function

S²(a_1, . . . , a_p, b_1, . . . , b_q) = Σ_{t=−∞}^n ε̂_t² = Σ_{t=−∞}^n (y_t − a_1 y_{t−1} − · · · − a_p y_{t−p} − b_1 ε̂_{t−1} − · · · − b_q ε̂_{t−q})²

is the residual sum of squares and the least squares approach suggests to estimate the underlying set of constants by minimizers a_1, . . . , a_p, b_1, . . . , b_q of S². Note that the residuals ε̂_t and the constants are nested.

We have no observation y_t available for t ≤ 0. But from the assumption E(ε_t) = 0 and thus E(Y_t) = 0, it is reasonable to backforecast y_t by zero and to put ε̂_t := 0 for t ≤ 0, leading to

S²(a_1, . . . , a_p, b_1, . . . , b_q) = Σ_{t=1}^n ε̂_t².
The estimated residuals ε̂_t can then be computed from the recursion

ε̂_1 = y_1,
ε̂_2 = y_2 − a_1 y_1 − b_1 ε̂_1,
ε̂_3 = y_3 − a_1 y_2 − a_2 y_1 − b_1 ε̂_2 − b_2 ε̂_1,
  ...
ε̂_j = y_j − a_1 y_{j−1} − · · · − a_p y_{j−p} − b_1 ε̂_{j−1} − · · · − b_q ε̂_{j−q},

where j now runs from max{p, q} to n.
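For fixed trial values of the coefficients this recursion is easily carried out in a SAS DATA step. The following minimal sketch does it for an ARMA(1,1)-model with the hypothetical trial coefficients a1 = 0.4 and b1 = −0.6, assuming a data set data1 with the (adjusted) observations in the variable y; minimizing S² over the coefficients would then be left to a numerical routine such as the conditional least squares method of PROC ARIMA.

/* Sketch: recursive residuals and residual sum of squares for ARMA(1,1) */
DATA resid;
   SET data1;
   RETAIN ehat 0 ylag 0 s2 0;     /* backforecasts: y_t = 0 and ehat_t = 0 for t <= 0 */
   a1 = 0.4; b1 = -0.6;           /* hypothetical trial coefficients                  */
   enew = y - a1*ylag - b1*ehat;  /* ehat_t = y_t - a1*y_{t-1} - b1*ehat_{t-1}        */
   s2   = s2 + enew**2;           /* running residual sum of squares S^2              */
   ehat = enew;                   /* carry ehat_t over to the next iteration          */
   ylag = y;                      /* carry y_t over to the next iteration             */
RUN;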
The coefficients a_1, . . . , a_p of a pure AR(p)-process can be estimated directly, using the Yule–Walker equations as described in (2.7).
Diagnostic Check

Suppose that the orders p and q as well as the constants a_1, . . . , a_p, b_1, . . . , b_q have been chosen in order to model an ARMA(p, q)-process underlying the data. The Portmanteau-test of Box and Pierce (1970) checks, whether the estimated residuals ε̂_t, t = 1, . . . , n, behave approximately like realizations from a white noise process. To this end one considers the pertaining empirical autocorrelation function

r̂_ε(k) := Σ_{j=1}^{n−k} (ε̂_j − ε̄)(ε̂_{j+k} − ε̄) / Σ_{j=1}^n (ε̂_j − ε̄)²,  k = 1, . . . , n − 1,

where ε̄ = n^{−1} Σ_{j=1}^n ε̂_j, and checks, whether the values r̂_ε(k) are sufficiently close to zero. This decision is based on

Q(K) := n Σ_{k=1}^K r̂²_ε(k),

which follows asymptotically for n → ∞ a χ²-distribution with K − p − q degrees of freedom if (Y_t) is actually an ARMA(p, q)-process (see e.g. Section 9.4 in Brockwell and Davis (1991)). The parameter K must be chosen such that the sample size n − k in r̂_ε(k) is large enough to give a stable estimate of the autocorrelation function. The ARMA(p, q)-model is rejected if the p-value 1 − χ²_{K−p−q}(Q(K)) is too small, since in this case the value Q(K) is unexpectedly large. By χ²_{K−p−q} we denote the distribution function of the χ²-distribution with K − p − q degrees of freedom. To accelerate the convergence to the χ²_{K−p−q} distribution under the null hypothesis of an ARMA(p, q)-process, one often replaces the Box–Pierce statistic Q(K) by the Box–Ljung statistic (Ljung and Box (1978))

Q*(K) := n Σ_{k=1}^K ( ((n + 2)/(n − k))^{1/2} r̂_ε(k) )² = n(n + 2) Σ_{k=1}^K r̂²_ε(k)/(n − k)

with weighted empirical autocorrelations.
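In SAS such residual checks are printed automatically when a model is fitted with PROC ARIMA; the following minimal sketch assumes a data set data1 with the adjusted series in the variable y and a tentative ARMA(1,1)-model.

/* Sketch: fit a tentative ARMA(1,1)-model and inspect the residual checks */
PROC ARIMA DATA=data1;
   IDENTIFY VAR=y NLAG=24;
   ESTIMATE P=1 Q=1 METHOD=ML;  /* the output includes an autocorrelation check
                                   of the residuals based on Ljung-Box statistics */
RUN; QUIT;

Large p-values in that check speak in favour of the fitted ARMA(p, q)-model.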
Forecasting

We want to determine weights c*_0, . . . , c*_{n−1} ∈ R such that for h ∈ N

E( (Y_{n+h} − Σ_{u=0}^{n−1} c*_u Y_{n−u})² ) = min_{c_0,...,c_{n−1} ∈ R} E( (Y_{n+h} − Σ_{u=0}^{n−1} c_u Y_{n−u})² ).

Then Ŷ_{n+h} := Σ_{u=0}^{n−1} c*_u Y_{n−u} with minimum mean squared error is said to be a best (linear) h-step forecast of Y_{n+h}, based on Y_1, . . . , Y_n. The following result provides a sufficient condition for the optimality of weights.

Lemma 2.3.2. Let (Y_t) be an arbitrary stochastic process with finite second moments. If the weights c*_0, . . . , c*_{n−1} have the property that

E( Y_i ( Y_{n+h} − Σ_{u=0}^{n−1} c*_u Y_{n−u} ) ) = 0,  i = 1, . . . , n,    (2.16)

then Ŷ_{n+h} := Σ_{u=0}^{n−1} c*_u Y_{n−u} is a best h-step forecast of Y_{n+h}.
Proof. Let Ỹ_{n+h} := Σ_{u=0}^{n−1} c_u Y_{n−u} be an arbitrary forecast, based on Y_1, . . . , Y_n. Then we have

E((Y_{n+h} − Ỹ_{n+h})²)
= E((Y_{n+h} − Ŷ_{n+h} + Ŷ_{n+h} − Ỹ_{n+h})²)
= E((Y_{n+h} − Ŷ_{n+h})²) + 2 Σ_{u=0}^{n−1} (c*_u − c_u) E(Y_{n−u}(Y_{n+h} − Ŷ_{n+h})) + E((Ŷ_{n+h} − Ỹ_{n+h})²)
= E((Y_{n+h} − Ŷ_{n+h})²) + E((Ŷ_{n+h} − Ỹ_{n+h})²)
≥ E((Y_{n+h} − Ŷ_{n+h})²).
Suppose that (Y_t) is a stationary process with mean zero and autocorrelation function ρ. The equations (2.16) are then of Yule–Walker type

ρ(h + s) = Σ_{u=0}^{n−1} c*_u ρ(s − u),  s = 0, 1, . . . , n − 1,

or, in matrix language

( ρ(h), ρ(h + 1), . . . , ρ(h + n − 1) )^T = P_n ( c*_0, c*_1, . . . , c*_{n−1} )^T    (2.17)

with the matrix P_n as defined in (2.8). If this matrix is invertible, then

( c*_0, . . . , c*_{n−1} )^T := P_n^{−1} ( ρ(h), . . . , ρ(h + n − 1) )^T    (2.18)

is the uniquely determined solution of (2.17).
If we put h = 1, then equation (2.18) implies that c*_{n−1} equals the partial autocorrelation coefficient α(n). In this case, α(n) is the coefficient of Y_1 in the best linear one-step forecast Ŷ_{n+1} = Σ_{u=0}^{n−1} c*_u Y_{n−u} of Y_{n+1}.
Example 2.3.3. Consider the MA(1)-process Y_t = ε_t + aε_{t−1} with E(ε_0) = 0. Its autocorrelation function is by Example 2.2.2 given by ρ(0) = 1, ρ(1) = a/(1 + a²), ρ(u) = 0 for u ≥ 2. The matrix P_n equals therefore

P_n =
  [ 1          a/(1+a²)   0          0          . . .  0        ]
  [ a/(1+a²)   1          a/(1+a²)   0                 0        ]
  [ 0          a/(1+a²)   1          a/(1+a²)          0        ]
  [ .          .          .          .                 .        ]
  [ 0          . . .                 1                 a/(1+a²) ]
  [ 0          0          . . .      a/(1+a²)          1        ].

Check that the matrix P_n = (Corr(Y_i, Y_j))_{1≤i,j≤n} is positive definite, x^T P_n x > 0 for any x ∈ R^n unless x = 0 (Exercise 2.40), and thus, P_n is invertible. The best forecast of Y_{n+1} is by (2.18), therefore, Σ_{u=0}^{n−1} c*_u Y_{n−u} with

( c*_0, . . . , c*_{n−1} )^T = P_n^{−1} ( a/(1+a²), 0, . . . , 0 )^T,

which is a/(1 + a²) times the first column of P_n^{−1}. The best forecast of Y_{n+h} for h ≥ 2 is by (2.18) the constant 0. Note that Y_{n+h} is for h ≥ 2 uncorrelated with Y_1, . . . , Y_n and thus not really predictable by Y_1, . . . , Y_n.
Theorem 2.3.4. Suppose that Y_t = Σ_{u=1}^p a_u Y_{t−u} + ε_t, t ∈ Z, is a stationary AR(p)-process, which satisfies the stationarity condition (2.3) and has zero mean E(Y_0) = 0. Let n ≥ p. The best one-step forecast is

Ŷ_{n+1} = a_1 Y_n + a_2 Y_{n−1} + · · · + a_p Y_{n+1−p}

and the best two-step forecast is

Ŷ_{n+2} = a_1 Ŷ_{n+1} + a_2 Y_n + · · · + a_p Y_{n+2−p}.

The best h-step forecast for arbitrary h ≥ 2 is recursively given by

Ŷ_{n+h} = a_1 Ŷ_{n+h−1} + · · · + a_{h−1} Ŷ_{n+1} + a_h Y_n + · · · + a_p Y_{n+h−p}.
Proof. Since (Y_t) satisfies the stationarity condition (2.3), it is invertible by Theorem 2.2.3, i.e., there exists an absolutely summable causal filter (b_u)_{u≥0} such that Y_t = Σ_{u≥0} b_u ε_{t−u}, t ∈ Z, almost surely. This implies in particular E(Y_t ε_{t+h}) = Σ_{u≥0} b_u E(ε_{t−u} ε_{t+h}) = 0 for any h ≥ 1, cf. Theorem 2.1.5. Hence we obtain for i = 1, . . . , n

E((Y_{n+1} − Ŷ_{n+1})Y_i) = E(ε_{n+1} Y_i) = 0,

from which the assertion for h = 1 follows by Lemma 2.3.2. The case of an arbitrary h ≥ 2 is now a consequence of the recursion

E((Y_{n+h} − Ŷ_{n+h})Y_i)
= E( ( ε_{n+h} + Σ_{u=1}^{min(h−1,p)} a_u Y_{n+h−u} − Σ_{u=1}^{min(h−1,p)} a_u Ŷ_{n+h−u} ) Y_i )
= Σ_{u=1}^{min(h−1,p)} a_u E( (Y_{n+h−u} − Ŷ_{n+h−u}) Y_i ) = 0,  i = 1, . . . , n,

and Lemma 2.3.2.
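As a small illustration of the recursion in Theorem 2.3.4, consider an AR(1)-process Y_t = aY_{t−1} + ε_t with the hypothetical coefficient a = 0.5; the forecasts then decay geometrically towards the mean zero:

\hat Y_{n+1} = 0.5\,Y_n, \qquad
\hat Y_{n+2} = 0.5\,\hat Y_{n+1} = 0.25\,Y_n, \qquad
\hat Y_{n+h} = 0.5^{\,h}\,Y_n \;\longrightarrow\; 0 \quad (h \to \infty).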
A repetition of the arguments in the preceding proof implies the following result, which shows that for an ARMA(p, q)-process the forecast of Y_{n+h} for h > q is controlled only by the AR-part of the process.

Theorem 2.3.5. Suppose that Y_t = Σ_{u=1}^p a_u Y_{t−u} + ε_t + Σ_{v=1}^q b_v ε_{t−v}, t ∈ Z, is an ARMA(p, q)-process, which satisfies the stationarity condition (2.3) and has zero mean, precisely E(ε_0) = 0. Suppose that n + q − p ≥ 0. The best h-step forecast of Y_{n+h} for h > q satisfies the recursion

Ŷ_{n+h} = Σ_{u=1}^p a_u Ŷ_{n+h−u}.
Example 2.3.6. We illustrate the best forecast of the ARMA(1, 1)-process

Y_t = 0.4Y_{t−1} + ε_t − 0.6ε_{t−1},  t ∈ Z,

with E(Y_t) = E(ε_t) = 0. First we need the optimal 1-step forecasts Ŷ_i for i = 1, . . . , n. These are defined by putting unknown values of Y_t with an index t ≤ 0 equal to their expected value, which is zero. We, thus, obtain

Ŷ_1 := 0,  ε̂_1 := Y_1 − Ŷ_1 = Y_1,
Ŷ_2 := 0.4Y_1 + 0 − 0.6ε̂_1 = −0.2Y_1,  ε̂_2 := Y_2 − Ŷ_2 = Y_2 + 0.2Y_1,
Ŷ_3 := 0.4Y_2 + 0 − 0.6ε̂_2 = 0.4Y_2 − 0.6(Y_2 + 0.2Y_1) = −0.2Y_2 − 0.12Y_1,  ε̂_3 := Y_3 − Ŷ_3,
  ...

until Ŷ_i and ε̂_i are defined for i = 1, . . . , n. The actual forecast is then given by

Ŷ_{n+1} = 0.4Y_n + 0 − 0.6ε̂_n = 0.4Y_n − 0.6(Y_n − Ŷ_n),
Ŷ_{n+2} = 0.4Ŷ_{n+1} + 0 + 0,
  ...
Ŷ_{n+h} = 0.4Ŷ_{n+h−1} = · · · = 0.4^{h−1} Ŷ_{n+1} → 0,  h → ∞,

where ε_t with index t ≥ n + 1 is replaced by zero, since it is uncorrelated with Y_i, i ≤ n.

In practice one replaces the usually unknown coefficients a_u, b_v in the above forecasts by their estimated values.
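In SAS, forecasts with estimated coefficients are produced by the FORECAST statement of PROC ARIMA. A minimal sketch for an ARMA(1,1)-model, assuming the adjusted series is stored in the variable y of data1:

/* Sketch: estimate an ARMA(1,1)-model and compute 12 forecasts */
PROC ARIMA DATA=data1;
   IDENTIFY VAR=y;
   ESTIMATE P=1 Q=1 METHOD=ML;
   FORECAST LEAD=12 OUT=fcast;   /* forecasts and confidence limits go to OUT= */
RUN; QUIT;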
2.4 State-Space Models

In state-space models we have, in general, a nonobservable target process (X_t) and an observable process (Y_t). They are linked by the assumption that (Y_t) is a linear function of (X_t) with an added noise, where the linear function may vary in time. The aim is the derivation of best linear estimates of X_t, based on (Y_s)_{s≤t}.

Many models of time series such as ARMA(p, q)-processes can be embedded in state-space models if we allow in the following sequences of random vectors X_t ∈ R^k and Y_t ∈ R^m.
A multivariate state-space model is now defined by the state equation

X_{t+1} = A_t X_t + B_t ε_{t+1} ∈ R^k,    (2.19)

describing the time-dependent behavior of the state X_t ∈ R^k, and the observation equation

Y_t = C_t X_t + η_t ∈ R^m.    (2.20)

We assume that (A_t), (B_t) and (C_t) are sequences of known matrices, (ε_t) and (η_t) are uncorrelated sequences of white noises with mean vectors 0 and known covariance matrices Cov(ε_t) = E(ε_t ε_t^T) =: Q_t, Cov(η_t) = E(η_t η_t^T) =: R_t.

We suppose further that X_0 and ε_t, η_t, t ≥ 1, are uncorrelated, where two random vectors W ∈ R^p and V ∈ R^q are said to be uncorrelated if their components are, i.e., if the matrix of their covariances vanishes,

E((W − E(W))(V − E(V))^T) = 0.

By E(W) we denote the vector of the componentwise expectations of W. We say that a time series (Y_t) has a state-space representation if it satisfies the representations (2.19) and (2.20).
Example 2.4.1. Let (η_t) be a white noise in R and put

Y_t := µ_t + η_t

with linear trend µ_t = a + bt. This simple model can be represented as a state-space model as follows. Define the state vector X_t as

X_t := (µ_t, 1)^T,

and put

A := [ 1  b ]
     [ 0  1 ].

From the recursion µ_{t+1} = µ_t + b we then obtain the state equation

X_{t+1} = (µ_{t+1}, 1)^T = A (µ_t, 1)^T = AX_t,

and with

C := (1, 0)

the observation equation

Y_t = (1, 0)(µ_t, 1)^T + η_t = CX_t + η_t.

Note that the state X_t is nonstochastic, i.e., B_t = 0. This model is moreover time-invariant, since the matrices A, B := B_t and C do not depend on t.
Example 2.4.2. An AR(p)-process

Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t

with a white noise (ε_t) has a state-space representation with state vector

X_t = (Y_t, Y_{t−1}, . . . , Y_{t−p+1})^T.

If we define the p × p-matrix A by

A := [ a_1  a_2  . . .  a_{p−1}  a_p ]
     [ 1    0    . . .  0        0   ]
     [ 0    1           0        0   ]
     [ .    .           .        .   ]
     [ 0    0    . . .  1        0   ]

and the p × 1-matrices B, C^T by

B := (1, 0, . . . , 0)^T =: C^T,

then we have the state equation

X_{t+1} = AX_t + Bε_{t+1}

and the observation equation

Y_t = CX_t.
Example 2.4.3. For the MA(q)-process

Y_t = ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}

we define the non observable state

X_t := (ε_t, ε_{t−1}, . . . , ε_{t−q})^T ∈ R^{q+1}.

With the (q + 1) × (q + 1)-matrix

A := [ 0  0  0  . . .  0  0 ]
     [ 1  0  0  . . .  0  0 ]
     [ 0  1  0         0  0 ]
     [ .  .  .         .  . ]
     [ 0  0  0  . . .  1  0 ],

the (q + 1) × 1-matrix

B := (1, 0, . . . , 0)^T

and the 1 × (q + 1)-matrix

C := (1, b_1, . . . , b_q)

we obtain the state equation

X_{t+1} = AX_t + Bε_{t+1}

and the observation equation

Y_t = CX_t.
Example 2.4.4. Combining the above results for AR(p)- and MA(q)-processes, we obtain a state-space representation of ARMA(p, q)-processes

Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}.

In this case the state vector can be chosen as

X_t := (Y_t, Y_{t−1}, . . . , Y_{t−p+1}, ε_t, ε_{t−1}, . . . , ε_{t−q+1})^T ∈ R^{p+q}.

We define the (p + q) × (p + q)-matrix

A := [ a_1  a_2  . . .  a_{p−1}  a_p   b_1  b_2  . . .  b_{q−1}  b_q ]
     [ 1    0    . . .  0        0     0    0    . . .  0        0   ]
     [ 0    1           0        0     0    0           0        0   ]
     [ .    .           .        .     .    .           .        .   ]
     [ 0    0    . . .  1        0     0    0    . . .  0        0   ]
     [ 0    0    . . .  0        0     0    0    . . .  0        0   ]
     [ 0    0    . . .  0        0     1    0    . . .  0        0   ]
     [ 0    0    . . .  0        0     0    1           0        0   ]
     [ .    .           .        .     .    .           .        .   ]
     [ 0    0    . . .  0        0     0    0    . . .  1        0   ],

the (p + q) × 1-matrix

B := (1, 0, . . . , 0, 1, 0, . . . , 0)^T

with the entry 1 at the first and (p + 1)-th position and the 1 × (p + q)-matrix

C := (1, 0, . . . , 0).

Then we have the state equation

X_{t+1} = AX_t + Bε_{t+1}

and the observation equation

Y_t = CX_t.
The Kalman-Filter

The key problem in the state-space model (2.19), (2.20) is the estimation of the nonobservable state X_t. It is possible to derive an estimate of X_t recursively from an estimate of X_{t−1} together with the last observation Y_t, known as the Kalman recursions (Kalman 1960). We obtain in this way a unified prediction technique for each time series model that has a state-space representation.
We want to compute the best linear prediction

X̂_t := D_1 Y_1 + · · · + D_t Y_t    (2.21)

of X_t, based on Y_1, . . . , Y_t, i.e., the k × m-matrices D_1, . . . , D_t are such that the mean squared error is minimized:

E((X_t − X̂_t)^T (X_t − X̂_t))
= E( (X_t − Σ_{j=1}^t D_j Y_j)^T (X_t − Σ_{j=1}^t D_j Y_j) )
= min over all k × m-matrices D'_1, . . . , D'_t of E( (X_t − Σ_{j=1}^t D'_j Y_j)^T (X_t − Σ_{j=1}^t D'_j Y_j) ).    (2.22)
By repeating the arguments in the proof of Lemma 2.3.2 we will prove the following result. It states that X̂_t is a best linear prediction of X_t based on Y_1, . . . , Y_t if each component of the vector X_t − X̂_t is orthogonal to each component of the vectors Y_s, 1 ≤ s ≤ t, with respect to the inner product E(XY) of two random variables X and Y.

Lemma 2.4.5. If the estimate X̂_t defined in (2.21) satisfies

E((X_t − X̂_t)Y_s^T) = 0,  1 ≤ s ≤ t,    (2.23)

then it minimizes the mean squared error (2.22).

Note that E((X_t − X̂_t)Y_s^T) is a k × m-matrix, which is generated by multiplying each component of X_t − X̂_t ∈ R^k with each component of Y_s ∈ R^m.
Proof. Let X'_t = Σ_{j=1}^t D'_j Y_j ∈ R^k be an arbitrary linear combination of Y_1, . . . , Y_t. Then we have

E((X_t − X'_t)^T (X_t − X'_t))
= E( ( X_t − X̂_t + Σ_{j=1}^t (D_j − D'_j)Y_j )^T ( X_t − X̂_t + Σ_{j=1}^t (D_j − D'_j)Y_j ) )
= E((X_t − X̂_t)^T (X_t − X̂_t)) + 2 Σ_{j=1}^t E((X_t − X̂_t)^T (D_j − D'_j)Y_j)
  + E( ( Σ_{j=1}^t (D_j − D'_j)Y_j )^T ( Σ_{j=1}^t (D_j − D'_j)Y_j ) )
≥ E((X_t − X̂_t)^T (X_t − X̂_t)),

since in the second-to-last line the final term is nonnegative and the second one vanishes by the property (2.23).
Let now X̂_{t−1} be the best linear prediction of X_{t−1} based on Y_1, . . . , Y_{t−1}. Then

X̃_t := A_{t−1} X̂_{t−1}    (2.24)

is the best linear prediction of X_t based on Y_1, . . . , Y_{t−1}, which is easy to see. We simply replaced ε_t in the state equation by its expectation 0. Note that ε_t and Y_s are uncorrelated if s < t, i.e., E((X_t − X̃_t)Y_s^T) = 0 for 1 ≤ s ≤ t − 1.

From this we obtain that

Ỹ_t := C_t X̃_t

is the best linear prediction of Y_t based on Y_1, . . . , Y_{t−1}, since E((Y_t − Ỹ_t)Y_s^T) = E((C_t(X_t − X̃_t) + η_t)Y_s^T) = 0, 1 ≤ s ≤ t − 1; note that η_t and Y_s are uncorrelated if s < t.
Define now by

∆_t := E((X_t − X̂_t)(X_t − X̂_t)^T)  and  ∆̃_t := E((X_t − X̃_t)(X_t − X̃_t)^T)

the covariance matrices of the approximation errors. Then we have

∆̃_t = E( (A_{t−1}(X_{t−1} − X̂_{t−1}) + B_{t−1}ε_t)(A_{t−1}(X_{t−1} − X̂_{t−1}) + B_{t−1}ε_t)^T )
= E( A_{t−1}(X_{t−1} − X̂_{t−1})(A_{t−1}(X_{t−1} − X̂_{t−1}))^T ) + E( (B_{t−1}ε_t)(B_{t−1}ε_t)^T )
= A_{t−1}∆_{t−1}A_{t−1}^T + B_{t−1}Q_t B_{t−1}^T,

since ε_t and X_{t−1} − X̂_{t−1} are obviously uncorrelated. In complete analogy one shows that

E((Y_t − Ỹ_t)(Y_t − Ỹ_t)^T) = C_t ∆̃_t C_t^T + R_t.
Suppose that we have observed Y_1, . . . , Y_{t−1}, and that we have predicted X_t by X̃_t = A_{t−1}X̂_{t−1}. Assume that we now also observe Y_t. How can we use this additional information to improve the prediction X̃_t of X_t? To this end we add a matrix K_t such that we obtain the best prediction X̂_t based on Y_1, . . . , Y_t:

X̃_t + K_t(Y_t − Ỹ_t) = X̂_t,    (2.25)

i.e., we have to choose the matrix K_t according to Lemma 2.4.5 such that X_t − X̂_t and Y_s are uncorrelated for s = 1, . . . , t. In this case, the matrix K_t is called the Kalman gain.
Lemma 2.4.6. The matrix K_t in (2.25) is a solution of the equation

K_t(C_t ∆̃_t C_t^T + R_t) = ∆̃_t C_t^T.    (2.26)
Proof. The matrix K_t has to be chosen such that X_t − X̂_t and Y_s are uncorrelated for s = 1, . . . , t, i.e.,

0 = E((X_t − X̂_t)Y_s^T) = E((X_t − X̃_t − K_t(Y_t − Ỹ_t))Y_s^T),  s ≤ t.

Note that an arbitrary k × m-matrix K_t satisfies

E( (X_t − X̃_t − K_t(Y_t − Ỹ_t))Y_s^T ) = E((X_t − X̃_t)Y_s^T) − K_t E((Y_t − Ỹ_t)Y_s^T) = 0,  s ≤ t − 1.

In order to fulfill the above condition, the matrix K_t needs to satisfy only

0 = E((X_t − X̃_t)Y_t^T) − K_t E((Y_t − Ỹ_t)Y_t^T)
= E((X_t − X̃_t)(Y_t − Ỹ_t)^T) − K_t E((Y_t − Ỹ_t)(Y_t − Ỹ_t)^T)
= E((X_t − X̃_t)(C_t(X_t − X̃_t) + η_t)^T) − K_t E((Y_t − Ỹ_t)(Y_t − Ỹ_t)^T)
= E((X_t − X̃_t)(X_t − X̃_t)^T)C_t^T − K_t E((Y_t − Ỹ_t)(Y_t − Ỹ_t)^T)
= ∆̃_t C_t^T − K_t(C_t ∆̃_t C_t^T + R_t).

But this is the assertion of Lemma 2.4.6. Note that Ỹ_t is a linear combination of Y_1, . . . , Y_{t−1} and that η_t and X_t − X̃_t as well as η_t and Y_s are uncorrelated for s ≤ t − 1.
If the matrix C_t ∆̃_t C_t^T + R_t is invertible, then

K_t := ∆̃_t C_t^T (C_t ∆̃_t C_t^T + R_t)^{−1}

is the uniquely determined Kalman gain. We have, moreover, for a Kalman gain

∆_t = E((X_t − X̂_t)(X_t − X̂_t)^T)
= E( (X_t − X̃_t − K_t(Y_t − Ỹ_t))(X_t − X̃_t − K_t(Y_t − Ỹ_t))^T )
= ∆̃_t + K_t E((Y_t − Ỹ_t)(Y_t − Ỹ_t)^T)K_t^T − E((X_t − X̃_t)(Y_t − Ỹ_t)^T)K_t^T − K_t E((Y_t − Ỹ_t)(X_t − X̃_t)^T)
= ∆̃_t + K_t(C_t ∆̃_t C_t^T + R_t)K_t^T − ∆̃_t C_t^T K_t^T − K_t C_t ∆̃_t
= ∆̃_t − K_t C_t ∆̃_t

by (2.26) and the arguments in the proof of Lemma 2.4.6.
The recursion in the discrete Kalman filter is done in two steps: from X̂_{t−1} and ∆_{t−1} one computes in the prediction step first

X̃_t = A_{t−1}X̂_{t−1},
Ỹ_t = C_t X̃_t,
∆̃_t = A_{t−1}∆_{t−1}A_{t−1}^T + B_{t−1}Q_t B_{t−1}^T.    (2.27)

In the updating step one then computes K_t and the updated values X̂_t, ∆_t:

K_t = ∆̃_t C_t^T (C_t ∆̃_t C_t^T + R_t)^{−1},
X̂_t = X̃_t + K_t(Y_t − Ỹ_t),
∆_t = ∆̃_t − K_t C_t ∆̃_t.    (2.28)
An obvious problem is the choice of the initial values X̃_1 and ∆̃_1. One frequently puts X̃_1 = 0 and ∆̃_1 as the diagonal matrix with constant entries σ² > 0. The number σ² reflects the degree of uncertainty about the underlying model. Simulations as well as theoretical results show, however, that the estimates X̂_t are often not affected by the initial values X̃_1 and ∆̃_1 if t is large, see for instance Example 2.4.7 below.

If in addition we require that the state-space model (2.19), (2.20) is completely determined by some parametrization ϑ of the distribution of (Y_t) and (X_t), then we can estimate the matrices of the Kalman filter in (2.27) and (2.28) under suitable conditions by a maximum likelihood estimate of ϑ; see e.g. Section 8.5 of Brockwell and Davis (2002) or Section 4.5 in Janacek and Swift (1993).
By iterating the 1-step prediction X̃_t = A_{t−1}X̂_{t−1} of X̂_t in (2.24) h times, we obtain the h-step prediction of the Kalman filter

X̃_{t+h} := A_{t+h−1}X̃_{t+h−1},  h ≥ 1,

with the initial value X̃_{t+0} := X̂_t. The pertaining h-step prediction of Y_{t+h} is then

Ỹ_{t+h} := C_{t+h}X̃_{t+h},  h ≥ 1.
Example 2.4.7. Let (η_t) be a white noise in R with E(η_t) = 0, E(η_t²) = σ² > 0 and put for some µ ∈ R

Y_t := µ + η_t,  t ∈ Z.

This process can be represented as a state-space model by putting X_t := µ, with state equation X_{t+1} = X_t and observation equation Y_t = X_t + η_t, i.e., A_t = 1 = C_t and B_t = 0. The prediction step (2.27) of the Kalman filter is now given by

X̃_t = X̂_{t−1},  Ỹ_t = X̃_t,  ∆̃_t = ∆_{t−1}.

Note that all these values are in R. The h-step predictions X̃_{t+h}, Ỹ_{t+h} are, therefore, given by X̃_t. The update step (2.28) of the Kalman filter is

K_t = ∆_{t−1}/(∆_{t−1} + σ²),
X̂_t = X̂_{t−1} + K_t(Y_t − X̂_{t−1}),
∆_t = ∆_{t−1} − K_t ∆_{t−1} = ∆_{t−1}σ²/(∆_{t−1} + σ²).

Note that ∆_t = E((X_t − X̂_t)²) ≥ 0 and thus,

0 ≤ ∆_t = ∆_{t−1}σ²/(∆_{t−1} + σ²) ≤ ∆_{t−1}

is a decreasing and bounded sequence. Its limit ∆ := lim_{t→∞} ∆_t consequently exists and satisfies

∆ = ∆σ²/(∆ + σ²),

i.e., ∆ = 0. This means that the mean squared error E((X_t − X̂_t)²) = E((µ − X̂_t)²) vanishes asymptotically, no matter how the initial values X̃_1 and ∆̃_1 are chosen. Further we have lim_{t→∞} K_t = 0, which means that additional observations Y_t do not contribute to X̂_t if t is large. Finally, we obtain for the mean squared error of the h-step prediction Ỹ_{t+h} of Y_{t+h}

E((Y_{t+h} − Ỹ_{t+h})²) = E((µ + η_{t+h} − X̂_t)²) = E((µ − X̂_t)²) + E(η²_{t+h}) → σ²,  t → ∞.
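This scalar filter is simple enough to be run in a plain DATA step. The following minimal sketch assumes observations y in a data set data1; the initial values xhat = 0 and delta = 10 and the noise variance sigma2 = 1 are arbitrary assumptions.

/* Sketch: scalar Kalman filter for Y_t = mu + eta_t (constant state) */
DATA kalman;
   SET data1;
   RETAIN xhat 0 delta 10;                  /* initial values Xhat_0, Delta_0 (assumed) */
   sigma2 = 1;                              /* assumed observation noise variance       */
   k      = delta/(delta + sigma2);         /* Kalman gain K_t                          */
   xhat   = xhat + k*(y - xhat);            /* updated estimate Xhat_t                  */
   delta  = delta*sigma2/(delta + sigma2);  /* updated error variance Delta_t           */
RUN;

In line with the discussion above, both k and delta shrink towards zero as t grows, whatever starting values are chosen.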
Example 2.4.8. The following figure displays the Airline Data from Example 1.3.1 together with 12-step forecasts based on the Kalman filter. The original data y_t, t = 1, . . . , 144 were log-transformed, x_t = log(y_t), to stabilize the variance; first order differences ∆x_t = x_t − x_{t−1} were used to eliminate the trend and, finally, z_t = ∆x_t − ∆x_{t−12} were computed to remove the seasonal component of 12 months. The Kalman filter was applied to forecast z_t, t = 145, . . . , 156, and the results were transformed in the reverse order of the preceding steps to predict the initial values y_t, t = 145, . . . , 156.
Plot 2.4.1a: Airline Data and predicted values using the Kalman filter.
1 /* airline_kalman.sas */
2 TITLE1 ’Original and Forecasted Data ’;
3 TITLE2 ’Airline Data ’;
4
5 /* Read in the data and compute log -transformation */
6 DATA data1;
7 INFILE ’c:\data\airline.txt ’;
8 INPUT y;
9 yl=LOG(y);
10 t=_N_;
11
12 /* Compute trend and seasonally adjusted data set */
13 PROC STATESPACE DATA=data1 OUT=data2 LEAD =12;
14 VAR yl(1,12); ID t;
15
16 /* Compute forecasts by inverting the log -transformation */
17 DATA data3;
18 SET data2;
19 yhat=EXP(FOR1);
20
21 /* Merge data sets */
22 DATA data4(KEEP=t y yhat);
23 MERGE data1 data3;
24 BY t;
25
26 /* Graphical options */
27 LEGEND1 LABEL=(’’) VALUE=(’original ’ ’forecast ’);
28 SYMBOL1 C=BLACK V=DOT H=0.7 I=JOIN L=1;
29 SYMBOL2 C=BLACK V=CIRCLE H=1.5 I=JOIN L=1;
30 AXIS1 LABEL=(ANGLE =90 ’Passengers ’);
31 AXIS2 LABEL=(’January 1949 to December 1961’);
32
33 /* Plot data and forecasts */
34 PROC GPLOT DATA=data4;
35 PLOT y*t=1 yhat*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=
→LEGEND1;
36 RUN; QUIT;
Program 2.4.1: Applying the Kalman filter.
In the first data step the Airline Data are read
into data1. Their logarithm is computed and
stored in the variable yl. The variable t con-
tains the observation number.
The statement VAR yl(1,12) of the PROC
STATESPACE procedure tells SAS to use first or-
der differences of the initial data to remove their
trend and to adjust them to a seasonal compo-
nent of 12 months. The data are identified by
the time index set to t. The results are stored in
the data set data2 with forecasts of 12 months
after the end of the input data. This is invoked
by LEAD=12.
data3 contains the exponentially trans-
formed forecasts, thereby inverting the log-
transformation in the first data step.
Finally, the two data sets are merged and dis-
played in one plot.
Exercises
2.1. Suppose that the complex random variables Y and Z are square
integrable. Show that
Cov(aY +b, Z) = a Cov(Y, Z), a, b ∈ C.
2.2. Give an example of a stochastic process (Y_t) such that for arbitrary t_1, t_2 ∈ Z and k ≠ 0

E(Y_{t_1}) = E(Y_{t_1+k})  but  Cov(Y_{t_1}, Y_{t_2}) ≠ Cov(Y_{t_1+k}, Y_{t_2+k}).
2.3. (i) Let (X_t), (Y_t) be stationary processes such that Cov(X_t, Y_s) = 0 for t, s ∈ Z. Show that for arbitrary a, b ∈ C the linear combinations (aX_t + bY_t) yield a stationary process.

(ii) Suppose that the decomposition Z_t = X_t + Y_t, t ∈ Z holds. Show that stationarity of (Z_t) does not necessarily imply stationarity of (X_t).
2.4. (i) Show that the process Y_t = Xe^{iat}, a ∈ R, is stationary, where X is a complex valued random variable with mean zero and finite variance.

(ii) Show that the random variable Y = be^{iU} has mean zero, where U is a uniformly distributed random variable on (0, 2π) and b ∈ C.
2.5. Let Z_1, Z_2 be independent and normal N(µ_i, σ_i²), i = 1, 2, distributed random variables and choose λ ∈ R. For which means µ_1, µ_2 ∈ R and variances σ_1², σ_2² > 0 is the cosinoid process

Y_t = Z_1 cos(2πλt) + Z_2 sin(2πλt),  t ∈ Z

stationary?
2.6. Show that the autocovariance function γ: Z → C of a complex-valued stationary process (Y_t)_{t∈Z}, which is defined by

γ(h) = E(Y_{t+h} Ȳ_t) − E(Y_{t+h}) E(Ȳ_t),  h ∈ Z,

has the following properties: γ(0) ≥ 0, |γ(h)| ≤ γ(0), γ(h) = γ̄(−h), i.e., γ is a Hermitian function, and Σ_{1≤r,s≤n} z_r γ(r − s) z̄_s ≥ 0 for z_1, . . . , z_n ∈ C, n ∈ N, i.e., γ is a positive semidefinite function.
2.7. Suppose that Y_t, t = 1, . . . , n, is a stationary process with mean µ. Then µ̂_n := n^{−1} Σ_{t=1}^n Y_t is an unbiased estimator of µ. Express the mean square error E(µ̂_n − µ)² in terms of the autocovariance function γ and show that E(µ̂_n − µ)² → 0 if γ(n) → 0, n → ∞.
2.8. Suppose that (Y_t)_{t∈Z} is a stationary process and denote by

c(k) := (1/n) Σ_{t=1}^{n−|k|} (Y_t − Ȳ)(Y_{t+|k|} − Ȳ)  for |k| = 0, . . . , n − 1,  and  c(k) := 0 for |k| ≥ n,

the empirical autocovariance function at lag k, k ∈ Z.

(i) Show that c(k) is a biased estimator of γ(k) (even if the factor n^{−1} is replaced by (n − k)^{−1}), i.e., E(c(k)) ≠ γ(k).

(ii) Show that the k-dimensional empirical covariance matrix

C_k := [ c(0)     c(1)     . . .  c(k−1) ]
       [ c(1)     c(0)            c(k−2) ]
       [ .        .               .      ]
       [ c(k−1)   c(k−2)   . . .  c(0)   ]

is positive semidefinite. (If the factor n^{−1} in the definition of c(j) is replaced by (n − j)^{−1}, j = 1, . . . , k, the resulting covariance matrix may not be positive semidefinite.) Hint: Consider k ≥ n and write C_k = n^{−1} A A^T with a suitable k × 2k-matrix A. Show further that C_m is positive semidefinite if C_k is positive semidefinite for k > m.

(iii) If c(0) > 0, then C_k is nonsingular, i.e., C_k is positive definite.
2.9. Suppose that (Y_t) is a stationary process with autocovariance function γ_Y. Express the autocovariance function of the difference filter of first order ∆Y_t = Y_t − Y_{t−1} in terms of γ_Y. Find it when γ_Y(k) = λ^{|k|}.
2.10. Let (Y_t)_{t∈Z} be a stationary process with mean zero. If its autocovariance function satisfies γ(τ) = γ(0) for some τ > 0, then γ is periodic with length τ, i.e., γ(t + τ) = γ(t), t ∈ Z.
2.11. Let (Y_t) be a stochastic process such that for t ∈ Z

P{Y_t = 1} = p_t = 1 − P{Y_t = −1},  0 < p_t < 1.

Suppose in addition that (Y_t) is a Markov process, i.e., for any t ∈ Z, k ≥ 1

P(Y_t = y_0 | Y_{t−1} = y_1, . . . , Y_{t−k} = y_k) = P(Y_t = y_0 | Y_{t−1} = y_1).

(i) Is (Y_t)_{t∈N} a stationary process?

(ii) Compute the autocovariance function in case P(Y_t = 1 | Y_{t−1} = 1) = λ and p_t = 1/2.
2.12. Let (ε_t)_t be a white noise process with independent ε_t ∼ N(0, 1) and define

ε̃_t = ε_t, if t is even,  and  ε̃_t = (ε²_{t−1} − 1)/√2, if t is odd.

Show that (ε̃_t)_t is a white noise process with E(ε̃_t) = 0 and Var(ε̃_t) = 1, where the ε̃_t are neither independent nor identically distributed. Plot the path of (ε_t)_t and (ε̃_t)_t for t = 1, . . . , 100 and compare!
2.13. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise. The process $Y_t = \sum_{s=1}^{t} \varepsilon_s$ is said to be a random walk. Plot the path of a random walk with normal $N(\mu, \sigma^2)$ distributed $\varepsilon_t$ for each of the cases $\mu < 0$, $\mu = 0$ and $\mu > 0$.
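A minimal DATA step sketch for simulating such a path is given below; the drift 0.1, the seed 4711 and the sample size 100 are illustrative choices only (set the drift to zero or to a negative value for the other two cases).

/* random_walk.sas (sketch) */
TITLE1 'Random walk with positive drift';

DATA walk;
   y=0;
   DO t=1 TO 100;
      e=0.1+RANNOR(4711);   /* e_t ~ N(0.1, 1); RANNOR generates N(0,1) */
      y=y+e;                /* Y_t = Y_{t-1} + e_t */
      OUTPUT;
   END;

SYMBOL1 V=NONE C=GREEN I=JOIN;
PROC GPLOT DATA=walk;
   PLOT y*t=1;
RUN; QUIT;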
2.14. Let $(a_u)$ and $(b_v)$ be absolutely summable filters and let $(Z_t)$ be a stochastic process with $\sup_{t\in\mathbb{Z}} E(Z_t^2) < \infty$. Put for $t \in \mathbb{Z}$
$$X_t = \sum_u a_u Z_{t-u}, \qquad Y_t = \sum_v b_v Z_{t-v}.$$
Then we have
$$E(X_t Y_t) = \sum_u \sum_v a_u b_v\, E(Z_{t-u} Z_{t-v}).$$
Hint: Use the general inequality $|xy| \le (x^2 + y^2)/2$.
2.15. Let $Y_t = aY_{t-1} + \varepsilon_t$, $t \in \mathbb{Z}$, be an AR(1)-process with $|a| > 1$. Compute the autocorrelation function of this process.
2.16. Compute the orders $p$ and the coefficients $a_u$ of the process $Y_t = \sum_{u=0}^{p} a_u\varepsilon_{t-u}$ with $\operatorname{Var}(\varepsilon_0) = 1$ and autocovariance function $\gamma(1) = 2$, $\gamma(2) = 1$, $\gamma(3) = -1$ and $\gamma(t) = 0$ for $t \ge 4$. Is this process invertible?
2.17. The autocorrelation function $\rho$ of an arbitrary MA(q)-process satisfies
$$-\frac{1}{2} \le \sum_{v=1}^{q} \rho(v) \le \frac{q}{2}.$$
Give examples of MA(q)-processes where the lower bound and the upper bound are attained, i.e., these bounds are sharp.
2.18. Let $(Y_t)_{t\in\mathbb{Z}}$ be a stationary stochastic process with $E(Y_t) = 0$, $t \in \mathbb{Z}$, and
$$\rho(t) = \begin{cases} 1 & \text{if } t = 0,\\ \rho(1) & \text{if } t = 1,\\ 0 & \text{if } t > 1, \end{cases}$$
where $|\rho(1)| < 1/2$. Then there exist $a \in (-1, 1)$ and a white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ such that
$$Y_t = \varepsilon_t + a\varepsilon_{t-1}.$$
Hint: Example 2.2.2.
2.19. Find two MA(1)-processes with the same autocovariance func-
tions.
2.20. Suppose that $Y_t = \varepsilon_t + a\varepsilon_{t-1}$ is a noninvertible MA(1)-process, where $|a| > 1$. Define the new process
$$\tilde\varepsilon_t = \sum_{j=0}^{\infty} (-a)^{-j} Y_{t-j}$$
and show that $(\tilde\varepsilon_t)$ is a white noise. Show that $\operatorname{Var}(\tilde\varepsilon_t) = a^2 \operatorname{Var}(\varepsilon_t)$ and that $(Y_t)$ has the invertible representation
$$Y_t = \tilde\varepsilon_t + a^{-1}\tilde\varepsilon_{t-1}.$$
2.21. Plot the autocorrelation functions of MA(p)-processes for dif-
ferent values of p.
2.22. Generate and plot AR(3)-processes $(Y_t)$, $t = 1, \ldots, 500$, where the roots of the characteristic polynomial have the following properties:
(i) all roots are outside the unit disk,
(ii) all roots are inside the unit disk,
(iii) all roots are on the unit circle,
(iv) two roots are outside, one root inside the unit disk,
(v) one root is outside, one root is inside the unit disk and one root
is on the unit circle,
(vi) all roots are outside the unit disk but close to the unit circle.
2.23. Show that the AR(2)-process $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \varepsilon_t$ for $a_1 = 1/3$ and $a_2 = 2/9$ has the autocorrelation function
$$\rho(k) = \frac{16}{21}\Big(\frac{2}{3}\Big)^{|k|} + \frac{5}{21}\Big(-\frac{1}{3}\Big)^{|k|}, \quad k \in \mathbb{Z},$$
and for $a_1 = a_2 = 1/12$ the autocorrelation function
$$\rho(k) = \frac{45}{77}\Big(\frac{1}{3}\Big)^{|k|} + \frac{32}{77}\Big(-\frac{1}{4}\Big)^{|k|}, \quad k \in \mathbb{Z}.$$
2.24. Let $(\varepsilon_t)$ be a white noise with $E(\varepsilon_0) = \mu$, $\operatorname{Var}(\varepsilon_0) = \sigma^2$ and put
$$Y_t = \varepsilon_t - Y_{t-1}, \quad t \in \mathbb{N}, \quad Y_0 = 0.$$
Show that
$$\operatorname{Corr}(Y_s, Y_t) = (-1)^{s+t} \min\{s, t\}/\sqrt{st}.$$
2.25. An AR(2)-process $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \varepsilon_t$ satisfies the stationarity condition (2.3) if the pair $(a_1, a_2)$ is in the triangle
$$\Delta := \big\{ (\alpha, \beta) \in \mathbb{R}^2 : -1 < \beta < 1,\ \alpha + \beta < 1 \text{ and } \beta - \alpha < 1 \big\}.$$
Hint: Use the fact that necessarily $\rho(1) \in (-1, 1)$.
2.26. (i) Let $(Y_t)$ denote the unique stationary solution of the autoregressive equations
$$Y_t = aY_{t-1} + \varepsilon_t, \quad t \in \mathbb{Z},$$
with $|a| > 1$. Then $(Y_t)$ is given by the expression $Y_t = -\sum_{j=1}^{\infty} a^{-j}\varepsilon_{t+j}$ (see the proof of Lemma 2.1.10). Define the new process
$$\tilde\varepsilon_t = Y_t - \frac{1}{a} Y_{t-1},$$
and show that $(\tilde\varepsilon_t)$ is a white noise with $\operatorname{Var}(\tilde\varepsilon_t) = \operatorname{Var}(\varepsilon_t)/a^2$. These calculations show that $(Y_t)$ is the (unique stationary) solution of the causal AR-equations
$$Y_t = \frac{1}{a} Y_{t-1} + \tilde\varepsilon_t, \quad t \in \mathbb{Z}.$$
Thus, every AR(1)-process with $|a| > 1$ can be represented as an AR(1)-process with $|a| < 1$ and a new white noise.

(ii) Show that for $|a| = 1$ the above autoregressive equations have no stationary solutions. A stationary solution exists if the white noise process is degenerated, i.e., $E(\varepsilon_t^2) = 0$.
2.27. (i) Consider the process
$$\tilde{Y}_t := \begin{cases} \varepsilon_1 & \text{for } t = 1,\\ aY_{t-1} + \varepsilon_t & \text{for } t > 1, \end{cases}$$
i.e., $\tilde{Y}_t$, $t \ge 1$, equals the AR(1)-process $Y_t = aY_{t-1} + \varepsilon_t$, conditional on $Y_0 = 0$. Compute $E(\tilde{Y}_t)$, $\operatorname{Var}(\tilde{Y}_t)$ and $\operatorname{Cov}(Y_t, Y_{t+s})$. Is there something like asymptotic stationarity for $t \to \infty$?

(ii) Choose $a \in (-1, 1)$, $a \ne 0$, and compute the correlation matrix of $Y_1, \ldots, Y_{10}$.
2.28. Use the IML function ARMASIM to simulate the stationary AR(2)-process
$$Y_t = -0.3Y_{t-1} + 0.3Y_{t-2} + \varepsilon_t.$$
Estimate the parameters $a_1 = -0.3$ and $a_2 = 0.3$ by means of the Yule–Walker equations using the SAS procedure PROC ARIMA.
2.29. Show that the value at lag 2 of the partial autocorrelation function of the MA(1)-process
$$Y_t = \varepsilon_t + a\varepsilon_{t-1}, \quad t \in \mathbb{Z},$$
is
$$\alpha(2) = -\frac{a^2}{1 + a^2 + a^4}.$$
2.30. (Unemployed1 Data) Plot the empirical autocorrelations and
partial autocorrelations of the trend and seasonally adjusted Unem-
ployed1 Data from the building trade, introduced in Example 1.1.1.
Apply the Box–Jenkins program. Is a fit of a pure MA(q)- or AR(p)-
process reasonable?
2.31. Plot the autocorrelation functions of ARMA(p, q)-processes for
different values of p, q using the IML function ARMACOV. Plot also their
empirical counterparts.
2.32. Compute the autocovariance function of an ARMA(1, 2)-process.
2.33. Derive the least squares normal equations for an AR(p)-process
and compare them with the Yule–Walker equations.
2.34. Show that the density of the t-distribution with m degrees of
freedom converges to the density of the standard normal distribution
as m tends to infinity. Hint: Apply the dominated convergence theo-
rem (Lebesgue).
2.35. Let $(Y_t)_t$ be a stationary and causal ARCH(1)-process with $|a_1| < 1$.

(i) Show that $Y_t^2 = a_0 \sum_{j=0}^{\infty} a_1^j\, Z_t^2 Z_{t-1}^2 \cdots Z_{t-j}^2$ with probability one.

(ii) Show that $E(Y_t^2) = a_0/(1 - a_1)$.

(iii) Evaluate $E(Y_t^4)$ and deduce that $E(Z_1^4)\,a_1^2 < 1$ is a sufficient condition for $E(Y_t^4) < \infty$.

Hint: Theorem 2.1.5.
2.36. Determine the joint density of $Y_{p+1}, \ldots, Y_n$ for an ARCH(p)-process $Y_t$ with normal distributed $Z_t$, given that $Y_1 = y_1, \ldots, Y_p = y_p$. Hint: Recall that the joint density $f_{X,Y}$ of a random vector $(X, Y)$ can be written in the form $f_{X,Y}(x, y) = f_{Y|X}(y|x) f_X(x)$, where $f_{Y|X}(y|x) := f_{X,Y}(x, y)/f_X(x)$ if $f_X(x) > 0$, and $f_{Y|X}(y|x) := f_Y(y)$ else, is the (conditional) density of $Y$ given $X = x$, and $f_X$, $f_Y$ is the (marginal) density of $X$, $Y$.
2.37. Generate an ARCH(1)-process $(Y_t)_t$ with $a_0 = 1$ and $a_1 = 0.5$. Plot $(Y_t)_t$ as well as $(Y_t^2)_t$ and its partial autocorrelation function. What is the value of the partial autocorrelation coefficient at lag 1 and lag 2? Use PROC ARIMA to estimate the parameter of the AR(1)-process $(Y_t^2)_t$ and apply the Box–Ljung test.
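One possible way to generate such a series is the recursion $Y_t = Z_t\sqrt{a_0 + a_1 Y_{t-1}^2}$ with independent standard normal $Z_t$, as in Exercise 2.35; the following DATA step sketch uses this recursion, where the seed 42 and the sample size 500 are arbitrary illustrative choices.

/* arch1_simulation.sas (sketch) */
DATA arch1;
   a0=1; a1=0.5; y=0;
   DO t=1 TO 500;
      z=RANNOR(42);               /* Z_t ~ N(0,1) */
      y=z*SQRT(a0+a1*y*y);        /* ARCH(1) recursion */
      y2=y*y;
      OUTPUT;
   END;

/* Partial autocorrelations of the squared process */
PROC ARIMA DATA=arch1;
   IDENTIFY VAR=y2 NLAG=10;
RUN; QUIT;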
2.38. (Hong Kong Data) Fit a GARCH(p, q)-model to the daily Hang Seng closing index of Hong Kong stock prices from July 16, 1981, to September 30, 1983. Consider in particular the cases p = q = 2 and p = 3, q = 2.
2.39. (Zurich Data) The daily value of the Zurich stock index was
recorded between January 1st, 1988 and December 31st, 1988. Use
a difference filter of first order to remove a possible trend. Plot the
(trend-adjusted) data, their squares, the pertaining partial autocor-
relation function and parameter estimates. Can the squared process
be considered as an AR(1)-process?
2.40. (i) Show that the matrix $\Sigma'^{-1}$ in Example 2.3.1 has the determinant $1 - a^2$.

(ii) Show that the matrix $P_n$ in Example 2.3.3 has the determinant $(1 + a^2 + a^4 + \cdots + a^{2n})/(1 + a^2)^n$.
2.41. (Car Data) Apply the Box–Jenkins program to the Car Data.
2.42. Consider the two state-space models
$$X_{t+1} = A_t X_t + B_t \varepsilon_{t+1}, \qquad Y_t = C_t X_t + \eta_t$$
and
$$\tilde{X}_{t+1} = \tilde{A}_t \tilde{X}_t + \tilde{B}_t \tilde\varepsilon_{t+1}, \qquad \tilde{Y}_t = \tilde{C}_t \tilde{X}_t + \tilde\eta_t,$$
where $(\varepsilon_t^T, \eta_t^T, \tilde\varepsilon_t^T, \tilde\eta_t^T)^T$ is a white noise. Derive a state-space representation for $(Y_t^T, \tilde{Y}_t^T)^T$.
2.43. Find the state-space representation of an ARIMA(p, d, q)-process $(Y_t)_t$. Hint: $Y_t = \Delta^d Y_t - \sum_{j=1}^{d} (-1)^j \binom{d}{j} Y_{t-j}$ and consider the state vector $Z_t := (X_t, Y_{t-1})^T$, where $X_t \in \mathbb{R}^{p+q}$ is the state vector of the ARMA(p, q)-process $\Delta^d Y_t$ and $Y_{t-1} := (Y_{t-d}, \ldots, Y_{t-1})^T$.
2.44. Assume that the matrices $A$ and $B$ in the state-space model (2.19) are independent of $t$ and that all eigenvalues of $A$ are in the interior of the unit circle $\{z \in \mathbb{C} : |z| \le 1\}$. Show that the unique stationary solution of equation (2.19) is given by the infinite series $X_t = \sum_{j=0}^{\infty} A^j B\varepsilon_{t-j+1}$. Hint: The condition on the eigenvalues is equivalent to $\det(I_r - Az) \ne 0$ for $|z| \le 1$. Show that there exists some $\varepsilon > 0$ such that $(I_r - Az)^{-1}$ has the power series representation $\sum_{j=0}^{\infty} A^j z^j$ in the region $|z| < 1 + \varepsilon$.
2.45. Apply PROC STATESPACE to the simulated data of the AR(2)-process in Exercise 2.28.
2.46. (Gas Data) Apply PROC STATESPACE to the gas data. Can
they be stationary? Compute the one-step predictors and plot them
together with the actual data.
Chapter 3

The Frequency Domain Approach of a Time Series
The preceding sections focussed on the analysis of a time series in the
time domain, mainly by modelling and fitting an ARMA(p, q)-process
to stationary sequences of observations. Another approach towards
the modelling and analysis of time series is via the frequency domain:
A series is often the sum of a whole variety of cyclic components, of which we have already added a long term cyclic one and a short term seasonal one to our model (1.2). In the following we show that a time
series can be completely decomposed into cyclic components. Such
cyclic components can be described by their periods and frequencies.
The period is the interval of time required for one cycle to complete.
The frequency of a cycle is its number of occurrences during a fixed
time unit; in electronic media, for example, frequencies are commonly
measured in hertz, which is the number of cycles per second, abbre-
viated by Hz. The analysis of a time series in the frequency domain
aims at the detection of such cycles and the computation of their
frequencies.
Note that in this chapter the results are formulated for any data $y_1, \ldots, y_n$, which for mathematical reasons need not be generated by a stationary process. Nevertheless it is reasonable to apply the
results only to realizations of stationary processes, since the empiri-
cal autocovariance function occurring below has no interpretation for
non-stationary processes, see Exercise 1.18.
3.1 Least Squares Approach with Known Frequencies
A function f : R −→ R is said to be periodic with period P > 0
if f(t + P) = f(t) for any t ∈ R. A smallest period is called a
fundamental one. The reciprocal value λ = 1/P of a fundamental
period is the fundamental frequency. An arbitrary (time) interval of
length L consequently shows Lλ cycles of a periodic function f with
fundamental frequency λ. Popular examples of periodic functions
are sine and cosine, which both have the fundamental period P =
2π. Their fundamental frequency, therefore, is λ = 1/(2π). The
predominant family of periodic functions within time series analysis
are the harmonic components
$$m(t) := A\cos(2\pi\lambda t) + B\sin(2\pi\lambda t), \quad A, B \in \mathbb{R},\ \lambda > 0,$$
which have period $1/\lambda$ and frequency $\lambda$. A linear combination of harmonic components
$$g(t) := \mu + \sum_{k=1}^{r} \big( A_k \cos(2\pi\lambda_k t) + B_k \sin(2\pi\lambda_k t) \big), \quad \mu \in \mathbb{R},$$
will be named a harmonic wave of length r.
Example 3.1.1. (Star Data). To analyze physical properties of a pul-
sating star, the intensity of light emitted by this pulsar was recorded
at midnight during 600 consecutive nights. The data are taken from
Newton (1988). It turns out that a harmonic wave of length two fits
the data quite well. The following figure displays the first 160 data
y
t
and the sum of two harmonic components with period 24 and 29,
respectively, plus a constant term µ = 17.07 fitted to these data, i.e.,
˜ y
t
= 17.07 −1.86 cos(2π(1/24)t) + 6.82 sin(2π(1/24)t)
+ 6.09 cos(2π(1/29)t) + 8.01 sin(2π(1/29)t).
The derivation of these particular frequencies and coefficients will be
the content of this section and the following ones. For easier access
we begin with the case of known frequencies but unknown constants.
Plot 3.1.1a: Intensity of light emitted by a pulsating star and a fitted
harmonic wave.
Model: MODEL1
Dependent Variable: LUMEN
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob >F
Model 4 48400 12100 49297.2 <.0001
Error 595 146.04384 0.24545
C Total 599 48546
Root MSE 0.49543 R-square 0.9970
Dep Mean 17.09667 Adj R-sq 0.9970
C.V. 2.89782
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Prob > |T|
Intercept 1 17.06903 0.02023 843.78 <.0001
sin24 1 6.81736 0.02867 237.81 <.0001
cos24 1 -1.85779 0.02865 -64.85 <.0001
sin29 1 8.01416 0.02868 279.47 <.0001
cos29 1 6.08905 0.02865 212.57 <.0001
Listing 3.1.1b: Regression results of fitting a harmonic wave.
1 /* star_harmonic.sas */
2 TITLE1 ’Harmonic wave ’;
3 TITLE2 ’Star Data ’;
4
5 /* Read in the data and compute harmonic waves to which data are to
→ be fitted */
6 DATA data1;
7 INFILE ’c:\data\star.txt ’;
8 INPUT lumen @@;
9 t=_N_;
10 pi=CONSTANT(’PI ’);
11 sin24=SIN(2*pi*t/24);
12 cos24=COS(2*pi*t/24);
13 sin29=SIN(2*pi*t/29);
14 cos29=COS(2*pi*t/29);
15
16 /* Compute a regression */
17 PROC REG DATA=data1;
18 MODEL lumen=sin24 cos24 sin29 cos29;
19 OUTPUT OUT=regdat P=predi;
20
21 /* Graphical options */
22 SYMBOL1 C=GREEN V=DOT I=NONE H=.4;
23 SYMBOL2 C=RED V=NONE I=JOIN;
24 AXIS1 LABEL=(ANGLE =90 ’lumen ’);
25 AXIS2 LABEL=(’t’);
26
27 /* Plot data and fitted harmonic wave */
28 PROC GPLOT DATA=regdat(OBS =160);
29 PLOT lumen*t=1 predi*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
30 RUN; QUIT;
Program 3.1.1: Fitting a harmonic wave.
The number π is generated by the SAS func-
tion CONSTANT with the argument ’PI’. It is then
stored in the variable pi. This is used to define
the variables cos24, sin24, cos29 and sin29
for the harmonic components. The other variables here are lumen, read from an external file, and t, generated by the automatic variable _N_.
The PROC REG statement causes SAS to regress the dependent variable lumen, given on the left side of the MODEL statement, on the harmonic components listed on the right side. A temporary data file
named regdat is generated by the OUTPUT
statement. It contains the original variables
of the source data step and the values pre-
dicted by the regression for lumen in the vari-
able predi.
The last part of the program creates a plot of
the observed lumen values and a curve of the predicted values, restricted to the first 160 observations.
The output of Program 3.1.1 (star harmonic.sas) is the standard text
output of a regression with an ANOVA table and parameter esti-
mates. For further information on how to read the output, we refer
to Chapter 3 of Falk et al. (2002).
In a first step we will fit a harmonic component with fixed frequency $\lambda$ to mean value adjusted data $y_t - \bar{y}$, $t = 1, \ldots, n$. To this end, we put with arbitrary $A, B \in \mathbb{R}$
$$m(t) = A m_1(t) + B m_2(t),$$
where
$$m_1(t) := \cos(2\pi\lambda t), \qquad m_2(t) := \sin(2\pi\lambda t).$$
In order to get a proper fit uniformly over all $t$, it is reasonable to choose the constants $A$ and $B$ as minimizers of the residual sum of squares
$$R(A, B) := \sum_{t=1}^{n} (y_t - \bar{y} - m(t))^2.$$
Taking partial derivatives of the function $R$ with respect to $A$ and $B$ and equating them to zero, we obtain that the minimizing pair $A, B$ has to satisfy the normal equations
$$A c_{11} + B c_{12} = \sum_{t=1}^{n} (y_t - \bar{y})\cos(2\pi\lambda t),$$
$$A c_{21} + B c_{22} = \sum_{t=1}^{n} (y_t - \bar{y})\sin(2\pi\lambda t),$$
where
$$c_{ij} = \sum_{t=1}^{n} m_i(t) m_j(t).$$
If $c_{11}c_{22} - c_{12}c_{21} \ne 0$, the uniquely determined pair of solutions $A, B$ of these equations is
$$A = A(\lambda) = n\,\frac{c_{22}C(\lambda) - c_{12}S(\lambda)}{c_{11}c_{22} - c_{12}c_{21}}, \qquad B = B(\lambda) = n\,\frac{c_{21}C(\lambda) - c_{11}S(\lambda)}{c_{12}c_{21} - c_{11}c_{22}},$$
where
$$C(\lambda) := \frac{1}{n}\sum_{t=1}^{n}(y_t - \bar{y})\cos(2\pi\lambda t), \qquad S(\lambda) := \frac{1}{n}\sum_{t=1}^{n}(y_t - \bar{y})\sin(2\pi\lambda t) \qquad (3.1)$$
are the empirical (cross-) covariances of $(y_t)_{1\le t\le n}$ and $(\cos(2\pi\lambda t))_{1\le t\le n}$ and of $(y_t)_{1\le t\le n}$ and $(\sin(2\pi\lambda t))_{1\le t\le n}$, respectively. As we will see, these cross-covariances $C(\lambda)$ and $S(\lambda)$ are fundamental to the analysis of a time series in the frequency domain.
The solutions $A$ and $B$ become particularly simple in the case of Fourier frequencies $\lambda = k/n$, $k = 0, 1, 2, \ldots, [n/2]$, where $[x]$ denotes the greatest integer less than or equal to $x \in \mathbb{R}$. If $k \ne 0$ and $k \ne n/2$ in the case of an even sample size $n$, then we obtain from (3.2) below that $c_{12} = c_{21} = 0$ and $c_{11} = c_{22} = n/2$ and thus
$$A = 2C(\lambda), \qquad B = 2S(\lambda).$$
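As a small numerical illustration of these formulas, the following SAS/IML sketch computes $C(\lambda)$, $S(\lambda)$ and the coefficients $A = 2C(\lambda)$, $B = 2S(\lambda)$ for a Fourier frequency $\lambda = k/n$ directly from the definitions; the simulated cosine series, the seed and the choice $k = 2$ are merely illustrative assumptions.

/* fourier_coefficients.sas (sketch) */
PROC IML;
   n=100; t=T(1:n); pi=CONSTANT('PI');
   y=5*COS(2*pi*2/n*t)+RANNOR(J(n,1,123));   /* toy series dominated by frequency 2/n */
   ybar=SUM(y)/n;
   k=2; lambda=k/n;                          /* a Fourier frequency with 0 < k < n/2 */
   C=SUM((y-ybar)#COS(2*pi*lambda*t))/n;     /* empirical cross-covariances (3.1) */
   S=SUM((y-ybar)#SIN(2*pi*lambda*t))/n;
   A=2*C; B=2*S;                             /* coefficients at Fourier frequencies */
   PRINT C S A B;
QUIT;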
Harmonic Waves with Fourier Frequencies
Next we will fit harmonic waves to data y
1
, . . . , y
n
, where we restrict
ourselves to Fourier frequencies, which facilitates the computations.
The following lemma will be crucial.
Lemma 3.1.2. For arbitrary 0 ≤ k, m ≤ [n/2] we have
$$\sum_{t=1}^{n} \cos\!\Big(2\pi\frac{k}{n}t\Big)\cos\!\Big(2\pi\frac{m}{n}t\Big) = \begin{cases} n, & k = m = 0 \text{ or } n/2, \text{ if } n \text{ is even},\\ n/2, & k = m \ne 0 \text{ and } \ne n/2, \text{ if } n \text{ is even},\\ 0, & k \ne m, \end{cases}$$
$$\sum_{t=1}^{n} \sin\!\Big(2\pi\frac{k}{n}t\Big)\sin\!\Big(2\pi\frac{m}{n}t\Big) = \begin{cases} 0, & k = m = 0 \text{ or } n/2, \text{ if } n \text{ is even},\\ n/2, & k = m \ne 0 \text{ and } \ne n/2, \text{ if } n \text{ is even},\\ 0, & k \ne m, \end{cases}$$
$$\sum_{t=1}^{n} \cos\!\Big(2\pi\frac{k}{n}t\Big)\sin\!\Big(2\pi\frac{m}{n}t\Big) = 0.$$
Proof. Exercise 3.3.
The above lemma implies that the $2[n/2] + 1$ vectors in $\mathbb{R}^n$
$$(\sin(2\pi(k/n)t))_{1\le t\le n}, \quad k = 1, \ldots, [n/2],$$
and
$$(\cos(2\pi(k/n)t))_{1\le t\le n}, \quad k = 0, \ldots, [n/2],$$
span the space $\mathbb{R}^n$. Note that by Lemma 3.1.2 in the case of $n$ odd the above $2[n/2] + 1 = n$ vectors are linearly independent, whereas in the case of an even sample size $n$ the vector $(\sin(2\pi(k/n)t))_{1\le t\le n}$ with $k = n/2$ is the null vector $(0, \ldots, 0)$ and the remaining $n$ vectors are again linearly independent. As a consequence we obtain that for a given set of data $y_1, \ldots, y_n$, there exist in any case uniquely determined coefficients $A_k$ and $B_k$, $k = 0, \ldots, [n/2]$, with $B_0 := 0$ such that
$$y_t = \sum_{k=0}^{[n/2]} \Big( A_k \cos\!\Big(2\pi\frac{k}{n}t\Big) + B_k \sin\!\Big(2\pi\frac{k}{n}t\Big) \Big), \quad t = 1, \ldots, n. \qquad (3.2)$$
Next we determine these coefficients $A_k$, $B_k$. They are obviously minimizers of the residual sum of squares
$$R := \sum_{t=1}^{n} \Big( y_t - \sum_{k=0}^{[n/2]} \Big( \alpha_k \cos\!\Big(2\pi\frac{k}{n}t\Big) + \beta_k \sin\!\Big(2\pi\frac{k}{n}t\Big) \Big) \Big)^2$$
with respect to $\alpha_k$, $\beta_k$. Taking partial derivatives, equating them to zero and taking into account the linear independence of the above vectors, we obtain that these minimizers solve the normal equations
$$\sum_{t=1}^{n} y_t \cos\!\Big(2\pi\frac{k}{n}t\Big) = \alpha_k \sum_{t=1}^{n} \cos^2\!\Big(2\pi\frac{k}{n}t\Big), \quad k = 0, \ldots, [n/2],$$
$$\sum_{t=1}^{n} y_t \sin\!\Big(2\pi\frac{k}{n}t\Big) = \beta_k \sum_{t=1}^{n} \sin^2\!\Big(2\pi\frac{k}{n}t\Big), \quad k = 1, \ldots, [(n-1)/2].$$
The solutions of this system are by Lemma 3.1.2 given by
$$A_k = \begin{cases} \dfrac{2}{n}\sum_{t=1}^{n} y_t \cos\!\Big(2\pi\dfrac{k}{n}t\Big), & k = 1, \ldots, [(n-1)/2],\\[1ex] \dfrac{1}{n}\sum_{t=1}^{n} y_t \cos\!\Big(2\pi\dfrac{k}{n}t\Big), & k = 0 \text{ and } k = n/2, \text{ if } n \text{ is even}, \end{cases}$$
$$B_k = \frac{2}{n}\sum_{t=1}^{n} y_t \sin\!\Big(2\pi\frac{k}{n}t\Big), \quad k = 1, \ldots, [(n-1)/2]. \qquad (3.3)$$
One can substitute $A_k$, $B_k$ in $R$ to obtain directly that $R = 0$ in this case. A popular equivalent formulation of (3.3) is
$$y_t = \frac{\tilde{A}_0}{2} + \sum_{k=1}^{[n/2]} \Big( A_k \cos\!\Big(2\pi\frac{k}{n}t\Big) + B_k \sin\!\Big(2\pi\frac{k}{n}t\Big) \Big), \quad t = 1, \ldots, n, \qquad (3.4)$$
with $A_k$, $B_k$ as in (3.3) for $k = 1, \ldots, [n/2]$, $B_{n/2} = 0$ for an even $n$, and
$$\tilde{A}_0 = 2A_0 = \frac{2}{n}\sum_{t=1}^{n} y_t = 2\bar{y}.$$
Up to the factor 2, the coefficients $A_k$, $B_k$ coincide with the empirical covariances $C(k/n)$ and $S(k/n)$, $k = 1, \ldots, [(n-1)/2]$, defined in (3.1). This follows from the equations (Exercise 3.2)
$$\sum_{t=1}^{n} \cos\!\Big(2\pi\frac{k}{n}t\Big) = \sum_{t=1}^{n} \sin\!\Big(2\pi\frac{k}{n}t\Big) = 0, \quad k = 1, \ldots, [n/2]. \qquad (3.5)$$
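The exact fit $R = 0$ asserted above can also be checked numerically. The following SAS/IML sketch computes the coefficients (3.3) for a small series of odd length and reconstructs $y_t$ via the representation (3.2); the data vector is an arbitrary illustrative choice.

/* fourier_reconstruction.sas (sketch) */
PROC IML;
   y={3, 1, 4, 1, 5, 9, 2, 6, 5};            /* arbitrary data, n=9 (odd) */
   n=NROW(y); t=T(1:n); pi=CONSTANT('PI');
   rec=J(n,1,SUM(y)/n);                      /* k=0 term: A_0 equals the mean */
   DO k=1 TO FLOOR((n-1)/2);
      A=2/n*SUM(y#COS(2*pi*k/n*t));          /* coefficients (3.3) */
      B=2/n*SUM(y#SIN(2*pi*k/n*t));
      rec=rec+A*COS(2*pi*k/n*t)+B*SIN(2*pi*k/n*t);
   END;
   err=MAX(ABS(y-rec));                      /* maximal reconstruction error, about 0 */
   PRINT err;
QUIT;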
3.2 The Periodogram
In the preceding section we exactly fitted a harmonic wave with Fourier frequencies $\lambda_k = k/n$, $k = 0, \ldots, [n/2]$, to a given series $y_1, \ldots, y_n$. Example 3.1.1 shows that a harmonic wave including only two frequencies already fits the Star Data quite well. There is a general tendency that a time series is governed by different frequencies $\lambda_1, \ldots, \lambda_r$ with $r < [n/2]$, which on their part can be approximated by Fourier frequencies $k_1/n, \ldots, k_r/n$ if $n$ is sufficiently large. The question which
frequencies actually govern a time series leads to the intensity of a
frequency λ. This number reflects the influence of the harmonic com-
ponent with frequency λ on the series. The intensity of the Fourier
frequency λ = k/n, 1 ≤ k ≤ [n/2], is defined via its residual sum of
squares. We have by Lemma 3.1.2, (3.5) and the normal equations
$$\sum_{t=1}^{n} \Big( y_t - \bar{y} - A_k \cos\!\Big(2\pi\frac{k}{n}t\Big) - B_k \sin\!\Big(2\pi\frac{k}{n}t\Big) \Big)^2 = \sum_{t=1}^{n} (y_t - \bar{y})^2 - \frac{n}{2}\big( A_k^2 + B_k^2 \big), \quad k = 1, \ldots, [(n-1)/2],$$
and
$$\sum_{t=1}^{n} (y_t - \bar{y})^2 = \frac{n}{2}\sum_{k=1}^{[n/2]} \big( A_k^2 + B_k^2 \big).$$
The number $\frac{n}{2}(A_k^2 + B_k^2) = 2n\big( C^2(k/n) + S^2(k/n) \big)$ is therefore the contribution of the harmonic component with Fourier frequency $k/n$, $k = 1, \ldots, [(n-1)/2]$, to the total variation $\sum_{t=1}^{n}(y_t - \bar{y})^2$. It is called
the intensity of the frequency k/n. Further insight is gained from the
Fourier analysis in Theorem 3.2.4. For general frequencies λ ∈ R we
define its intensity now by
$$I(\lambda) = n\big( C(\lambda)^2 + S(\lambda)^2 \big) = \frac{1}{n}\bigg[ \Big( \sum_{t=1}^{n} (y_t - \bar{y})\cos(2\pi\lambda t) \Big)^2 + \Big( \sum_{t=1}^{n} (y_t - \bar{y})\sin(2\pi\lambda t) \Big)^2 \bigg]. \qquad (3.6)$$
This function is called the periodogram. The following Theorem im-
plies in particular that it is sufficient to define the periodogram on the
interval [0, 1]. For Fourier frequencies we obtain from (3.3) and (3.5)
$$I(k/n) = \frac{n}{4}\big( A_k^2 + B_k^2 \big), \quad k = 1, \ldots, [(n-1)/2].$$
Theorem 3.2.1. We have
1. I(0) = 0,
2. I is an even function, i.e., I(λ) = I(−λ) for any λ ∈ R,
3. I has the period 1.
Proof. Part (i) follows from sin(0) = 0 and cos(0) = 1, while (ii) is a
consequence of cos(−x) = cos(x), sin(−x) = −sin(x), x ∈ R. Part
(iii) follows from the fact that 2π is the fundamental period of sin and
cos.
Theorem 3.2.1 implies that the function I(λ) is completely determined
by its values on [0, 0.5]. The periodogram is often defined on the scale
$[0, 2\pi]$ instead of $[0, 1]$ by putting $I^*(\omega) := 2I(\omega/(2\pi))$; this version is, for example, used in SAS. In view of Theorem 3.2.4 below we prefer $I(\lambda)$, however.
The following figure displays the periodogram of the Star Data from
Example 3.1.1. It has two obvious peaks at the Fourier frequencies
21/600 = 0.035 ≈ 1/28.57 and 25/600 = 1/24 ≈ 0.04167. This
indicates that essentially two cycles with period 24 and 28 or 29 are
inherent in the data. A least squares approach for the determination of
the coefficients $A_i$, $B_i$, $i = 1, 2$, with frequencies 1/24 and 1/29, as done
in Program 3.1.1 (star harmonic.sas) then leads to the coefficients in
Example 3.1.1.
Plot 3.2.1a: Periodogram of the Star Data.
--------------------------------------------------------------
COS_01
34.1933
--------------------------------------------------------------
PERIOD COS_01 SIN_01 P LAMBDA
28.5714 -0.91071 8.54977 11089.19 0.035000
24.0000 -0.06291 7.73396 8972.71 0.041667
30.0000 0.42338 -3.76062 2148.22 0.033333
27.2727 -0.16333 2.09324 661.25 0.036667
31.5789 0.20493 -1.52404 354.71 0.031667
26.0870 -0.05822 1.18946 212.73 0.038333
Listing 3.2.1b: The constant $\tilde{A}_0 = 2A_0 = 2\bar{y}$ and the six Fourier frequencies $\lambda = k/n$ with largest $I(k/n)$-values, their inverses and the Fourier coefficients pertaining to the Star Data.
1 /* star_periodogram.sas */
2 TITLE1 ’Periodogram ’;
3 TITLE2 ’Star Data ’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\star.txt ’;
8 INPUT lumen @@;
9
10 /* Compute the periodogram */
11 PROC SPECTRA DATA=data1 COEF P OUT=data2;
12 VAR lumen;
13
14 /* Adjusting different periodogram definitions */
15 DATA data3;
16 SET data2(FIRSTOBS =2);
17 p=P_01 /2;
18 lambda=FREQ /(2* CONSTANT(’PI ’));
19 DROP P_01 FREQ;
20
21 /* Graphical options */
22 SYMBOL1 V=NONE C=GREEN I=JOIN;
23 AXIS1 LABEL=(’I(’ F=CGREEK ’l)’);
24 AXIS2 LABEL=(F=CGREEK ’l’);
25
26 /* Plot the periodogram */
27 PROC GPLOT DATA=data3(OBS =50);
28 PLOT p*lambda =1 / VAXIS=AXIS1 HAXIS=AXIS2;
29
30 /* Sort by periodogram values */
31 PROC SORT DATA=data3 OUT=data4;
32 BY DESCENDING p;
33
34 /* Print largest periodogram values */
35 PROC PRINT DATA=data2(OBS=1) NOOBS;
36 VAR COS_01;
37 PROC PRINT DATA=data4(OBS=6) NOOBS;
38 RUN; QUIT;
Program 3.2.1: Computing and plotting the periodogram.
The first step is to read the star data from an ex-
ternal file into a data set. Using the SAS proce-
dure SPECTRA with the options P (periodogram),
COEF (Fourier coefficients), OUT=data2 and the
VAR statement specifying the variable of in-
terest, an output data set is generated. It
contains periodogram data P 01 evaluated at
the Fourier frequencies, a FREQ variable for
these frequencies, the pertaining period in the
variable PERIOD and the variables COS 01 and
SIN 01 with the coefficients for the harmonic
waves. Because SAS uses different definitions
for the frequencies and the periodogram, here
in data3 new variables lambda (dividing FREQ
by 2π to eliminate an additional factor 2π) and
p (dividing P 01 by 2) are created and the variables that are no longer needed are dropped. By means of
the data set option FIRSTOBS=2 the first obser-
vation of data2 is excluded from the resulting
data set.
The following PROC GPLOT just takes the first
50 observations of data3 into account. This
means a restriction of lambda up to 50/600 =
1/12, the part with the greatest peaks in the pe-
riodogram.
The procedure SORT generates a new data set
data4 containing the same observations as the
input data set data3, but they are sorted in de-
scending order of the periodogram values. The
two PROC PRINT statements at the end make SAS print the data sets data2 and data4.
The first part of the output is the coefficient $\tilde{A}_0$, which is equal to two times the mean of the lumen data. The results for the Fourier frequencies with the six greatest periodogram values constitute the second part of the output. Note that the meanings of COS_01 and SIN_01 are slightly different from the definitions of $A_k$ and $B_k$ in (3.3), because SAS lets the index run from 0 to $n-1$ instead of 1 to $n$.
The Fourier Transform
From Euler's equation $e^{iz} = \cos(z) + i\sin(z)$, $z \in \mathbb{R}$, we obtain for $\lambda \in \mathbb{R}$
$$D(\lambda) := C(\lambda) - iS(\lambda) = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})\, e^{-i2\pi\lambda t}.$$
The periodogram is a function of $D(\lambda)$, since $I(\lambda) = n|D(\lambda)|^2$. Unlike the periodogram, the number $D(\lambda)$ contains the complete information about $C(\lambda)$ and $S(\lambda)$, since both values can be recovered from the complex number $D(\lambda)$, being its real and negative imaginary part. In the following we view the data $y_1, \ldots, y_n$ again as a clipping from an infinite series $y_t$, $t \in \mathbb{Z}$. Let $a := (a_t)_{t\in\mathbb{Z}}$ be an absolutely summable sequence of real numbers. For such a sequence $a$ the complex valued function
$$f_a(\lambda) = \sum_{t\in\mathbb{Z}} a_t e^{-i2\pi\lambda t}, \quad \lambda \in \mathbb{R},$$
is said to be its Fourier transform. It links the empirical autocovariance function to the periodogram: it will turn out in Theorem 3.2.3 that the latter is the Fourier transform of the former. Note that $\sum_{t\in\mathbb{Z}} |a_t e^{-i2\pi\lambda t}| = \sum_{t\in\mathbb{Z}} |a_t| < \infty$, since $|e^{ix}| = 1$ for any $x \in \mathbb{R}$. The Fourier transform of $a_t = (y_t - \bar{y})/n$, $t = 1, \ldots, n$, and $a_t = 0$ elsewhere is then given by $D(\lambda)$. The following elementary properties of the Fourier transform are immediate consequences of the arguments in the proof of Theorem 3.2.1. In particular we obtain that the Fourier transform is already determined by its values on $[0, 0.5]$.
Theorem 3.2.2. We have

1. $f_a(0) = \sum_{t\in\mathbb{Z}} a_t$,

2. $f_a(-\lambda)$ and $f_a(\lambda)$ are conjugate complex numbers, i.e., $f_a(-\lambda) = \overline{f_a(\lambda)}$,

3. $f_a$ has the period 1.
Autocorrelation Function and Periodogram
Information about cycles that are inherent in given data, can also be
deduced from the empirical autocorrelation function. The following
figure displays the autocorrelation function of the Bankruptcy Data,
introduced in Exercise 1.17.
Plot 3.2.2a: Autocorrelation function of the Bankruptcy Data.
1 /* bankruptcy_correlogram .sas */
2 TITLE1 ’Correlogram ’;
3 TITLE2 ’Bankruptcy Data ’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt ’;
8 INPUT year bankrupt;
9
10 /* Compute autocorrelation function */
11 PROC ARIMA DATA=data1;
12 IDENTIFY VAR=bankrupt NLAG =64 OUTCOV=corr NOPRINT;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’r(k)’);
16 AXIS2 LABEL=(’k’);
17 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;
18
19 /* Plot auto correlation function */
20 PROC GPLOT DATA=corr;
21 PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF =0;
22 RUN; QUIT;
Program 3.2.2: Computing the autocorrelation function.
After reading the data from an external file into
a data step, the procedure ARIMA calculates the empirical autocorrelation function and stores it in a new data set. The correlogram is
generated using PROC GPLOT.
The next figure displays the periodogram of the Bankruptcy Data.
Plot 3.2.3a: Periodogram of the Bankruptcy Data.
1 /* bankruptcy_periodogram .sas */
2 TITLE1 ’Periodogram ’;
3 TITLE2 ’Bankruptcy Data ’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt ’;
8 INPUT year bankrupt;
9
10 /* Compute the periodogram */
11 PROC SPECTRA DATA=data1 P OUT=data2;
12 VAR bankrupt;
13
14 /* Adjusting different periodogram definitions */
15 DATA data3;
16 SET data2(FIRSTOBS =2);
17 p=P_01 /2;
18 lambda=FREQ /(2* CONSTANT(’PI ’));
19
20 /* Graphical options */
21 SYMBOL1 V=NONE C=GREEN I=JOIN;
22 AXIS1 LABEL=(’I’ F=CGREEK ’(l)’) ;
23 AXIS2 ORDER =(0 TO 0.5 BY 0.05) LABEL=(F=CGREEK ’l’);
24
25 /* Plot the periodogram */
26 PROC GPLOT DATA=data3;
27 PLOT p*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
28 RUN; QUIT;
Program 3.2.3: Computing the periodogram.
This program again first reads the data
and then starts a spectral analysis by
PROC SPECTRA. Due to the reasons men-
tioned in the comments to Program 3.2.1
(star periodogram.sas) there are some trans-
formations of the periodogram and the fre-
quency values generated by PROC SPECTRA
done in data3. The graph results from the
statements in PROC GPLOT.
The autocorrelation function of the Bankruptcy Data has extreme
values at about multiples of 9 years, which indicates a period of length
9. This is underlined by the periodogram in Plot 3.2.3a, which has
a peak at λ = 0.11, corresponding to a period of 1/0.11 ∼ 9 years
as well. As mentioned above, there is actually a close relationship
between the empirical autocovariance function and the periodogram.
The corresponding result for the theoretical autocovariances is given
in Chapter 4.
Theorem 3.2.3. Denote by $c$ the empirical autocovariance function of $y_1, \ldots, y_n$, i.e., $c(k) = n^{-1}\sum_{j=1}^{n-k}(y_j - \bar{y})(y_{j+k} - \bar{y})$, $k = 0, \ldots, n-1$, where $\bar{y} := n^{-1}\sum_{j=1}^{n} y_j$. Then we have, with $c(-k) := c(k)$,
$$I(\lambda) = c(0) + 2\sum_{k=1}^{n-1} c(k)\cos(2\pi\lambda k) = \sum_{k=-(n-1)}^{n-1} c(k)\, e^{-i2\pi\lambda k}.$$
Proof. From the equation $\cos(x_1)\cos(x_2) + \sin(x_1)\sin(x_2) = \cos(x_1 - x_2)$ for $x_1, x_2 \in \mathbb{R}$ we obtain
$$I(\lambda) = \frac{1}{n}\sum_{s=1}^{n}\sum_{t=1}^{n} (y_s - \bar{y})(y_t - \bar{y})\big[ \cos(2\pi\lambda s)\cos(2\pi\lambda t) + \sin(2\pi\lambda s)\sin(2\pi\lambda t) \big] = \frac{1}{n}\sum_{s=1}^{n}\sum_{t=1}^{n} a_{st},$$
where $a_{st} := (y_s - \bar{y})(y_t - \bar{y})\cos(2\pi\lambda(s-t))$. Since $a_{st} = a_{ts}$ and $\cos(0) = 1$ we have moreover
$$I(\lambda) = \frac{1}{n}\sum_{t=1}^{n} a_{tt} + \frac{2}{n}\sum_{k=1}^{n-1}\sum_{j=1}^{n-k} a_{j,j+k} = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})^2 + 2\sum_{k=1}^{n-1}\Big( \frac{1}{n}\sum_{j=1}^{n-k} (y_j - \bar{y})(y_{j+k} - \bar{y}) \Big)\cos(2\pi\lambda k) = c(0) + 2\sum_{k=1}^{n-1} c(k)\cos(2\pi\lambda k).$$
The complex representation of the periodogram is then obvious:
$$\sum_{k=-(n-1)}^{n-1} c(k)\, e^{-i2\pi\lambda k} = c(0) + \sum_{k=1}^{n-1} c(k)\big( e^{i2\pi\lambda k} + e^{-i2\pi\lambda k} \big) = c(0) + \sum_{k=1}^{n-1} 2c(k)\cos(2\pi\lambda k) = I(\lambda).$$
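The identity can be illustrated numerically. The following SAS/IML sketch evaluates $I(\lambda)$ once from the definition (3.6) and once from the empirical autocovariances as in Theorem 3.2.3, and prints both values; the toy series and the frequency $\lambda = 0.2$ are illustrative assumptions.

/* periodogram_identity.sas (sketch) */
PROC IML;
   y={2, 7, 1, 8, 2, 8, 1, 8, 2, 8};          /* arbitrary toy series */
   n=NROW(y); t=T(1:n); pi=CONSTANT('PI');
   yc=y-SUM(y)/n;                             /* mean adjusted data */
   lambda=0.2;
   /* direct definition (3.6) */
   I1=(SUM(yc#COS(2*pi*lambda*t))**2+SUM(yc#SIN(2*pi*lambda*t))**2)/n;
   /* via the empirical autocovariances */
   I2=SUM(yc##2)/n;                           /* c(0) */
   DO k=1 TO n-1;
      ck=SUM(yc[1:n-k]#yc[1+k:n])/n;          /* c(k) */
      I2=I2+2*ck*COS(2*pi*lambda*k);
   END;
   PRINT I1 I2;                               /* the two values coincide */
QUIT;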
Inverse Fourier Transform
The empirical autocovariance function can be recovered from the pe-
riodogram, which is the content of our next result. Since the pe-
riodogram is the Fourier transform of the empirical autocovariance
function, this result is a special case of the inverse Fourier transform
in Theorem 3.2.5 below.
Theorem 3.2.4. The periodogram
$$I(\lambda) = \sum_{k=-(n-1)}^{n-1} c(k)\, e^{-i2\pi\lambda k}, \quad \lambda \in \mathbb{R},$$
satisfies the inverse formula
$$c(k) = \int_0^1 I(\lambda)\, e^{i2\pi\lambda k}\, d\lambda, \quad |k| \le n-1.$$
In particular for $k = 0$ we obtain
$$c(0) = \int_0^1 I(\lambda)\, d\lambda.$$
The sample variance $c(0) = n^{-1}\sum_{j=1}^{n}(y_j - \bar{y})^2$ equals, therefore, the area under the curve $I(\lambda)$, $0 \le \lambda \le 1$. The integral $\int_{\lambda_1}^{\lambda_2} I(\lambda)\, d\lambda$ can be interpreted as that portion of the total variance $c(0)$ which is contributed by the harmonic waves with frequencies $\lambda \in [\lambda_1, \lambda_2]$, where $0 \le \lambda_1 < \lambda_2 \le 1$. The periodogram consequently shows the distribution of the total variance among the frequencies $\lambda \in [0, 1]$. A peak of the periodogram at a frequency $\lambda_0$ implies, therefore, that a large part of the total variation $c(0)$ of the data can be explained by the harmonic wave with that frequency $\lambda_0$.
The following result is the inverse formula for general Fourier trans-
forms.
Theorem 3.2.5. For an absolutely summable sequence $a := (a_t)_{t\in\mathbb{Z}}$ with Fourier transform $f_a(\lambda) = \sum_{t\in\mathbb{Z}} a_t e^{-i2\pi\lambda t}$, $\lambda \in \mathbb{R}$, we have
$$a_t = \int_0^1 f_a(\lambda)\, e^{i2\pi\lambda t}\, d\lambda, \quad t \in \mathbb{Z}.$$
Proof. The dominated convergence theorem implies
$$\int_0^1 f_a(\lambda)\, e^{i2\pi\lambda t}\, d\lambda = \int_0^1 \Big( \sum_{s\in\mathbb{Z}} a_s e^{-i2\pi\lambda s} \Big) e^{i2\pi\lambda t}\, d\lambda = \sum_{s\in\mathbb{Z}} a_s \int_0^1 e^{i2\pi\lambda(t-s)}\, d\lambda = a_t,$$
since
$$\int_0^1 e^{i2\pi\lambda(t-s)}\, d\lambda = \begin{cases} 1, & \text{if } s = t,\\ 0, & \text{if } s \ne t. \end{cases}$$
The inverse Fourier transformation shows that the complete sequence
$(a_t)_{t\in\mathbb{Z}}$
can be recovered from its Fourier transform. This implies
in particular that the Fourier transforms of absolutely summable se-
quences are uniquely determined. The analysis of a time series in
the frequency domain is, therefore, equivalent to its analysis in the
time domain, which is based on an evaluation of its autocovariance
function.
Aliasing
Suppose that we observe a continuous time process $(Z_t)_{t\in\mathbb{R}}$ only through its values at $k\Delta$, $k \in \mathbb{Z}$, where $\Delta > 0$ is the sampling interval, i.e., we actually observe $(Y_k)_{k\in\mathbb{Z}} = (Z_{k\Delta})_{k\in\mathbb{Z}}$. Take, for example, $Z_t := \cos(2\pi(9/11)t)$, $t \in \mathbb{R}$. The following figure shows that at $k \in \mathbb{Z}$, i.e., $\Delta = 1$, the observations $Z_k$ coincide with $X_k$, where $X_t := \cos(2\pi(2/11)t)$, $t \in \mathbb{R}$. With the sampling interval $\Delta = 1$, the observations $Z_k$ with high frequency $9/11$ can, therefore, not be distinguished from the $X_k$, which have low frequency $2/11$.
Plot 3.2.4a: Aliasing of cos(2π(9/11)k) and cos(2π(2/11)k).
1 /* aliasing.sas */
2 TITLE1 ’Aliasing ’;
3
4 /* Generate harmonic waves */
5 DATA data1;
6 DO t=1 TO 14 BY .01;
7 y1=COS(2* CONSTANT(’PI ’) *2/11*t);
8 y2=COS(2* CONSTANT(’PI ’) *9/11*t);
9 OUTPUT;
10 END;
11
12 /* Generate points of intersection */
13 DATA data2;
14 DO t0=1 TO 14;
15 y0=COS(2* CONSTANT(’PI ’) *2/11* t0);
16 OUTPUT;
17 END;
18
19 /* Merge the data sets */
20 DATA data3;
21 MERGE data1 data2;
22
23 /* Graphical options */
24 SYMBOL1 V=DOT C=GREEN I=NONE H=.8;
25 SYMBOL2 V=NONE C=RED I=JOIN;
26 AXIS1 LABEL=NONE;
27 AXIS2 LABEL=(’t’);
28
29 /* Plot the curves with point of intersection */
30 PROC GPLOT DATA=data3;
31 PLOT y0*t0=1 y1*t=2 y2*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 VREF
→=0;
32 RUN; QUIT;
Program 3.2.4: Aliasing.
In the first data step a tight grid for the cosine
waves with frequencies 2/11 and 9/11 is gen-
erated. In the second data step the values of
the cosine wave with frequency 2/11 are gen-
erated just for integer values of t symbolizing
the observation points.
After merging the two data sets the two waves
are plotted using the JOIN option in the SYMBOL
statement while the values at the observation
points are displayed in the same graph by dot
symbols.
This phenomenon that a high frequency component takes on the values of a lower one is called aliasing. It is caused by the choice of the sampling interval $\Delta$, which is 1 in Plot 3.2.4a. If a series is not just a constant, then the shortest observable period is $2\Delta$. The highest observable frequency $\lambda^*$ with period $1/\lambda^*$, therefore, satisfies $1/\lambda^* \ge 2\Delta$, i.e., $\lambda^* \le 1/(2\Delta)$. This frequency $1/(2\Delta)$ is known as the Nyquist frequency. The sampling interval $\Delta$ should, therefore, be chosen small enough, so that $1/(2\Delta)$ is above the frequencies under study. If, for example, a process is a harmonic wave with frequencies $\lambda_1, \ldots, \lambda_p$, then $\Delta$ should be chosen such that $\lambda_i \le 1/(2\Delta)$, $1 \le i \le p$, in order to visualize $p$ different frequencies. For the periodic curves in Plot 3.2.4a this means to choose $9/11 \le 1/(2\Delta)$, or $\Delta \le 11/18$.
Exercises
3.1. Let y(t) = Acos(2πλt) +Bsin(2πλt) be a harmonic component.
Show that y can be written as y(t) = αcos(2πλt −ϕ), where α is the
amplitude, i.e., the maximum departure of the wave from zero, and ϕ
is the phase displacement.
3.2. Show that
$$\sum_{t=1}^{n} \cos(2\pi\lambda t) = \begin{cases} n, & \lambda \in \mathbb{Z},\\ \cos(\pi\lambda(n+1))\,\dfrac{\sin(\pi\lambda n)}{\sin(\pi\lambda)}, & \lambda \notin \mathbb{Z}, \end{cases}$$
$$\sum_{t=1}^{n} \sin(2\pi\lambda t) = \begin{cases} 0, & \lambda \in \mathbb{Z},\\ \sin(\pi\lambda(n+1))\,\dfrac{\sin(\pi\lambda n)}{\sin(\pi\lambda)}, & \lambda \notin \mathbb{Z}. \end{cases}$$
Hint: Compute $\sum_{t=1}^{n} e^{i2\pi\lambda t}$, where $e^{i\varphi} = \cos(\varphi) + i\sin(\varphi)$ is the complex valued exponential function.
3.3. Verify Lemma 3.1.2. Hint: Exercise 3.2.
3.4. Suppose that the time series $(y_t)_t$ satisfies the additive model with seasonal component
$$s(t) = \sum_{k=1}^{s} A_k \cos\!\Big(2\pi\frac{k}{s}t\Big) + \sum_{k=1}^{s} B_k \sin\!\Big(2\pi\frac{k}{s}t\Big).$$
Show that $s(t)$ is eliminated by the seasonal differencing $\Delta_s y_t = y_t - y_{t-s}$.
3.5. Fit a harmonic component with frequency $\lambda$ to a time series $y_1, \ldots, y_N$, where $\lambda \notin \mathbb{Z}$ and $\lambda - 0.5 \notin \mathbb{Z}$. Compute the least squares estimator and the pertaining residual sum of squares.
3.6. Put $y_t = t$, $t = 1, \ldots, n$. Show that
$$I(k/n) = \frac{n}{4\sin^2(\pi k/n)}, \quad k = 1, \ldots, [(n-1)/2].$$
Hint: Use the equations
$$\sum_{t=1}^{n-1} t\sin(\theta t) = \frac{\sin(n\theta)}{4\sin^2(\theta/2)} - \frac{n\cos((n-0.5)\theta)}{2\sin(\theta/2)},$$
$$\sum_{t=1}^{n-1} t\cos(\theta t) = \frac{n\sin((n-0.5)\theta)}{2\sin(\theta/2)} - \frac{1 - \cos(n\theta)}{4\sin^2(\theta/2)}.$$
3.7. (Unemployed1 Data) Plot the periodogram of the first order dif-
ferences of the numbers of unemployed in the building trade as intro-
duced in Example 1.1.1.
3.8. (Airline Data) Plot the periodogram of the variance stabilized
and trend adjusted Airline Data, introduced in Example 1.3.1. Add
a seasonal adjustment and compare the periodograms.
3.9. The contribution of the autocovariance $c(k)$, $k \ge 1$, to the periodogram can be illustrated by plotting the functions $\pm\cos(2\pi\lambda k)$, $\lambda \in [0, 0.5]$.

(i) Which conclusion about the intensities of large or small frequencies can be drawn from a positive value $c(1) > 0$ or a negative one $c(1) < 0$?

(ii) Which effect has an increase of $|c(2)|$ if all other parameters remain unaltered?

(iii) What can you say about the effect of an increase of $c(k)$ on the periodogram at the values $0, 1/k, 2/k, 3/k, \ldots$ and the intermediate values $1/(2k), 3/(2k), 5/(2k), \ldots$? Illustrate the effect at a time series with seasonal component $k = 12$.
3.10. Establish a version of the inverse Fourier transform in real
terms.
3.11. Let $a = (a_t)_{t\in\mathbb{Z}}$ and $b = (b_t)_{t\in\mathbb{Z}}$ be absolutely summable sequences.

(i) Show that for $\alpha a + \beta b := (\alpha a_t + \beta b_t)_{t\in\mathbb{Z}}$, $\alpha, \beta \in \mathbb{R}$,
$$f_{\alpha a + \beta b}(\lambda) = \alpha f_a(\lambda) + \beta f_b(\lambda).$$

(ii) For $ab := (a_t b_t)_{t\in\mathbb{Z}}$ we have
$$f_{ab}(\lambda) = f_a * f_b(\lambda) := \int_0^1 f_a(\mu) f_b(\lambda - \mu)\, d\mu.$$

(iii) Show that for $a * b := \big(\sum_{s\in\mathbb{Z}} a_s b_{t-s}\big)_{t\in\mathbb{Z}}$ (convolution)
$$f_{a*b}(\lambda) = f_a(\lambda) f_b(\lambda).$$
3.12. (Fast Fourier Transform (FFT)) The Fourier transform of a finite sequence $a_0, a_1, \ldots, a_{N-1}$ can be represented under suitable conditions as the composition of Fourier transforms. Put
$$f(s/N) = \sum_{t=0}^{N-1} a_t e^{-i2\pi st/N}, \quad s = 0, \ldots, N-1,$$
which is the Fourier transform of length $N$. Suppose that $N = KM$ with $K, M \in \mathbb{N}$. Show that $f$ can be represented as a Fourier transform of length $K$, computed for a Fourier transform of length $M$.
Hint: Each $t, s \in \{0, \ldots, N-1\}$ can uniquely be written as
$$t = t_0 + t_1 K, \quad t_0 \in \{0, \ldots, K-1\},\ t_1 \in \{0, \ldots, M-1\},$$
$$s = s_0 + s_1 M, \quad s_0 \in \{0, \ldots, M-1\},\ s_1 \in \{0, \ldots, K-1\}.$$
Sum over $t_0$ and $t_1$.
3.13. (Star Data) Suppose that the Star Data are only observed
weekly (i.e., keep only every seventh observation). Is an aliasing effect
observable?
Chapter 4

The Spectrum of a Stationary Process
In this chapter we investigate the spectrum of a real valued stationary process, which is the Fourier transform of its (theoretical) autocovariance function. Its empirical counterpart, the periodogram, was investigated in the preceding sections, cf. Theorem 3.2.3.
Let $(Y_t)_{t\in\mathbb{Z}}$ be a (real valued) stationary process with absolutely summable autocovariance function $\gamma(t)$, $t \in \mathbb{Z}$. Its Fourier transform
$$f(\lambda) := \sum_{t\in\mathbb{Z}} \gamma(t)\, e^{-i2\pi\lambda t} = \gamma(0) + 2\sum_{t\in\mathbb{N}} \gamma(t)\cos(2\pi\lambda t), \quad \lambda \in \mathbb{R},$$
is called spectral density or spectrum of the process $(Y_t)_{t\in\mathbb{Z}}$. By the inverse Fourier transform in Theorem 3.2.5 we have
$$\gamma(t) = \int_0^1 f(\lambda)\, e^{i2\pi\lambda t}\, d\lambda = \int_0^1 f(\lambda)\cos(2\pi\lambda t)\, d\lambda.$$
For $t = 0$ we obtain
$$\gamma(0) = \int_0^1 f(\lambda)\, d\lambda,$$
which shows that the spectrum is a decomposition of the variance $\gamma(0)$. In Section 4.3 we will in particular compute the spectrum of an ARMA-process. As a preparatory step we investigate properties of spectra for arbitrary absolutely summable filters.
4.1 Characterizations of Autocovariance Functions
Recall that the autocovariance function $\gamma : \mathbb{Z} \to \mathbb{R}$ of a stationary process $(Y_t)_{t\in\mathbb{Z}}$ is given by
$$\gamma(h) = E(Y_{t+h}Y_t) - E(Y_{t+h})\,E(Y_t), \quad h \in \mathbb{Z},$$
with the properties
$$\gamma(0) \ge 0, \quad |\gamma(h)| \le \gamma(0), \quad \gamma(h) = \gamma(-h), \quad h \in \mathbb{Z}. \qquad (4.1)$$
The following result characterizes an autocovariance function in terms of positive semidefiniteness.

Theorem 4.1.1. A symmetric function $K : \mathbb{Z} \to \mathbb{R}$ is the autocovariance function of a stationary process $(Y_t)_{t\in\mathbb{Z}}$ iff $K$ is a positive semidefinite function, i.e., $K(-n) = K(n)$ and
$$\sum_{1\le r,s\le n} x_r K(r-s) x_s \ge 0 \qquad (4.2)$$
for arbitrary $n \in \mathbb{N}$ and $x_1, \ldots, x_n \in \mathbb{R}$.
Proof. It is easy to see that (4.2) is a necessary condition for K to be
the autocovariance function of a stationary process, see Exercise 4.19.
It remains to show that (4.2) is sufficient, i.e., we will construct a
stationary process, whose autocovariance function is K.
We will define a family of finite-dimensional normal distributions,
which satisfies the consistency condition of Kolmogorov’s theorem, cf.
Theorem 1.2.1 in Brockwell and Davies (1991). This result implies the
existence of a process (V
t
)
t∈Z
, whose finite dimensional distributions
coincide with the given family.
Define the $n \times n$-matrix
$$K^{(n)} := \big( K(r-s) \big)_{1\le r,s\le n},$$
which is positive semidefinite. Consequently there exists an $n$-dimensional normal distributed random vector $(V_1, \ldots, V_n)$ with mean vector zero and covariance matrix $K^{(n)}$. Define now for each $n \in \mathbb{N}$ and $t \in \mathbb{Z}$ a distribution function on $\mathbb{R}^n$ by
$$F_{t+1,\ldots,t+n}(v_1, \ldots, v_n) := P\{V_1 \le v_1, \ldots, V_n \le v_n\}.$$
This defines a family of distribution functions indexed by consecutive integers. Let now $t_1 < \cdots < t_m$ be arbitrary integers. Choose $t \in \mathbb{Z}$ and $n \in \mathbb{N}$ such that $t_i = t + n_i$, where $1 \le n_1 < \cdots < n_m \le n$. We define now
$$F_{t_1,\ldots,t_m}\big( (v_i)_{1\le i\le m} \big) := P\{V_{n_i} \le v_i,\ 1 \le i \le m\}.$$
Note that $F_{t_1,\ldots,t_m}$ does not depend on the special choice of $t$ and $n$ and thus, we have defined a family of distribution functions indexed by $t_1 < \cdots < t_m$ on $\mathbb{R}^m$ for each $m \in \mathbb{N}$, which obviously satisfies the consistency condition of Kolmogorov's theorem. This result implies the existence of a process $(V_t)_{t\in\mathbb{Z}}$, whose finite dimensional distribution at $t_1 < \cdots < t_m$ has distribution function $F_{t_1,\ldots,t_m}$. This process has, therefore, mean vector zero and covariances $E(V_{t+h}V_t) = K(h)$, $h \in \mathbb{Z}$.
Spectral Distribution Function and Spectral Density
The preceding result provides a characterization of an autocovariance
function in terms of positive semidefiniteness. The following char-
acterization of positive semidefinite functions is known as Herglotz’s
theorem. We use in the following the notation $\int_0^1 g(\lambda)\, dF(\lambda)$ in place of $\int_{(0,1]} g(\lambda)\, dF(\lambda)$.
Theorem 4.1.2. A symmetric function $\gamma : \mathbb{Z} \to \mathbb{R}$ is positive semidefinite iff it can be represented as an integral
$$\gamma(h) = \int_0^1 e^{i2\pi\lambda h}\, dF(\lambda) = \int_0^1 \cos(2\pi\lambda h)\, dF(\lambda), \quad h \in \mathbb{Z}, \qquad (4.3)$$
where $F$ is a real valued measure generating function on $[0, 1]$ with $F(0) = 0$. The function $F$ is uniquely determined.
The uniquely determined function $F$, which is a right-continuous, increasing and bounded function, is called the spectral distribution function of $\gamma$. If $F$ has a derivative $f$ and, thus, $F(\lambda) = F(\lambda) - F(0) = \int_0^\lambda f(x)\, dx$ for $0 \le \lambda \le 1$, then $f$ is called the spectral density of $\gamma$. Note that the property $\sum_{h\ge 0} |\gamma(h)| < \infty$ already implies the existence of a spectral density of $\gamma$, cf. Theorem 3.2.5.
Recall that $\gamma(0) = \int_0^1 dF(\lambda) = F(1)$ and thus, the autocorrelation function $\rho(h) = \gamma(h)/\gamma(0)$ has the above integral representation, but with $F$ replaced by the distribution function $F/\gamma(0)$.
Proof of Theorem 4.1.2. We establish first the uniqueness of $F$. Let $G$ be another measure generating function with $G(\lambda) = 0$ for $\lambda \le 0$ and constant for $\lambda \ge 1$ such that
$$\gamma(h) = \int_0^1 e^{i2\pi\lambda h}\, dF(\lambda) = \int_0^1 e^{i2\pi\lambda h}\, dG(\lambda), \quad h \in \mathbb{Z}.$$
Let now $\psi$ be a continuous function on $[0, 1]$. From calculus we know (cf. Section 4.24 in Rudin (1974)) that we can find for arbitrary $\varepsilon > 0$ a trigonometric polynomial $p_\varepsilon(\lambda) = \sum_{h=-N}^{N} a_h e^{i2\pi\lambda h}$, $0 \le \lambda \le 1$, such that
$$\sup_{0\le\lambda\le 1} |\psi(\lambda) - p_\varepsilon(\lambda)| \le \varepsilon.$$
As a consequence we obtain that
$$\int_0^1 \psi(\lambda)\, dF(\lambda) = \int_0^1 p_\varepsilon(\lambda)\, dF(\lambda) + r_1(\varepsilon) = \int_0^1 p_\varepsilon(\lambda)\, dG(\lambda) + r_1(\varepsilon) = \int_0^1 \psi(\lambda)\, dG(\lambda) + r_2(\varepsilon),$$
where $r_i(\varepsilon) \to 0$ as $\varepsilon \to 0$, $i = 1, 2$, and, thus,
$$\int_0^1 \psi(\lambda)\, dF(\lambda) = \int_0^1 \psi(\lambda)\, dG(\lambda).$$
Since $\psi$ was an arbitrary continuous function, this in turn together with $F(0) = G(0) = 0$ implies $F = G$.
Suppose now that $\gamma$ has the representation (4.3). We have for arbitrary $x_i \in \mathbb{R}$, $i = 1, \ldots, n$,
$$\sum_{1\le r,s\le n} x_r \gamma(r-s) x_s = \int_0^1 \sum_{1\le r,s\le n} x_r x_s\, e^{i2\pi\lambda(r-s)}\, dF(\lambda) = \int_0^1 \Big| \sum_{r=1}^{n} x_r e^{i2\pi\lambda r} \Big|^2\, dF(\lambda) \ge 0,$$
i.e., $\gamma$ is positive semidefinite.
Suppose conversely that $\gamma : \mathbb{Z} \to \mathbb{R}$ is a positive semidefinite function. This implies that for $0 \le \lambda \le 1$ and $N \in \mathbb{N}$ (Exercise 4.2)
$$f_N(\lambda) := \frac{1}{N}\sum_{1\le r,s\le N} e^{-i2\pi\lambda r}\gamma(r-s)\, e^{i2\pi\lambda s} = \frac{1}{N}\sum_{|m|<N} (N - |m|)\gamma(m)\, e^{-i2\pi\lambda m} \ge 0.$$
Put now
$$F_N(\lambda) := \int_0^\lambda f_N(x)\, dx, \quad 0 \le \lambda \le 1.$$
Then we have for each $h \in \mathbb{Z}$
$$\int_0^1 e^{i2\pi\lambda h}\, dF_N(\lambda) = \sum_{|m|<N} \Big( 1 - \frac{|m|}{N} \Big)\gamma(m) \int_0^1 e^{i2\pi\lambda(h-m)}\, d\lambda = \begin{cases} \Big( 1 - \dfrac{|h|}{N} \Big)\gamma(h), & \text{if } |h| < N,\\ 0, & \text{if } |h| \ge N. \end{cases} \qquad (4.4)$$
Since $F_N(1) = \gamma(0) < \infty$ for any $N \in \mathbb{N}$, we can apply Helly's selection theorem (cf. Billingsley (1968), page 226ff) to deduce the existence of a measure generating function $\tilde{F}$ and a subsequence $(F_{N_k})_k$ such that $F_{N_k}$ converges weakly to $\tilde{F}$, i.e.,
$$\int_0^1 g(\lambda)\, dF_{N_k}(\lambda) \xrightarrow{k\to\infty} \int_0^1 g(\lambda)\, d\tilde{F}(\lambda)$$
for every continuous and bounded function $g : [0, 1] \to \mathbb{R}$ (cf. Theorem 2.1 in Billingsley (1968)). Put now $F(\lambda) := \tilde{F}(\lambda) - \tilde{F}(0)$. Then $F$ is a measure generating function with $F(0) = 0$ and
$$\int_0^1 g(\lambda)\, d\tilde{F}(\lambda) = \int_0^1 g(\lambda)\, dF(\lambda).$$
If we replace $N$ in (4.4) by $N_k$ and let $k$ tend to infinity, we now obtain representation (4.3).
Example 4.1.3. A white noise $(\varepsilon_t)_{t\in\mathbb{Z}}$ has the autocovariance function
$$\gamma(h) = \begin{cases} \sigma^2, & h = 0,\\ 0, & h \in \mathbb{Z}\setminus\{0\}. \end{cases}$$
Since
$$\int_0^1 \sigma^2 e^{i2\pi\lambda h}\, d\lambda = \begin{cases} \sigma^2, & h = 0,\\ 0, & h \in \mathbb{Z}\setminus\{0\}, \end{cases}$$
the process $(\varepsilon_t)$ has by Theorem 4.1.2 the constant spectral density $f(\lambda) = \sigma^2$, $0 \le \lambda \le 1$. This is the name giving property of the white noise process: As white light is characteristically perceived to belong to objects that reflect nearly all incident energy throughout the visible spectrum, a white noise process weighs all possible frequencies equally.
Corollary 4.1.4. A symmetric function $\gamma : \mathbb{Z} \to \mathbb{R}$ is the autocovariance function of a stationary process $(Y_t)_{t\in\mathbb{Z}}$ iff it satisfies one of the following two (equivalent) conditions:

1. $\gamma(h) = \int_0^1 e^{i2\pi\lambda h}\, dF(\lambda)$, $h \in \mathbb{Z}$, where $F$ is a measure generating function on $[0, 1]$ with $F(0) = 0$.

2. $\sum_{1\le r,s\le n} x_r \gamma(r-s) x_s \ge 0$ for each $n \in \mathbb{N}$ and $x_1, \ldots, x_n \in \mathbb{R}$.

Proof. Theorem 4.1.2 shows that (i) and (ii) are equivalent. The assertion is then a consequence of Theorem 4.1.1.
Corollary 4.1.5. A symmetric function $\gamma : \mathbb{Z} \to \mathbb{R}$ with $\sum_{t\in\mathbb{Z}} |\gamma(t)| < \infty$ is the autocovariance function of a stationary process iff
$$f(\lambda) := \sum_{t\in\mathbb{Z}} \gamma(t)\, e^{-i2\pi\lambda t} \ge 0, \quad \lambda \in [0, 1].$$
The function $f$ is in this case the spectral density of $\gamma$.
Proof. Suppose first that $\gamma$ is an autocovariance function. Since $\gamma$ is in this case positive semidefinite by Theorem 4.1.1, and $\sum_{t\in\mathbb{Z}} |\gamma(t)| < \infty$ by assumption, we have (Exercise 4.2)
$$0 \le f_N(\lambda) := \frac{1}{N}\sum_{1\le r,s\le N} e^{-i2\pi\lambda r}\gamma(r-s)\, e^{i2\pi\lambda s} = \sum_{|t|<N} \Big( 1 - \frac{|t|}{N} \Big)\gamma(t)\, e^{-i2\pi\lambda t} \to f(\lambda) \quad \text{as } N \to \infty,$$
see Exercise 4.8. The function $f$ is consequently nonnegative. The inverse Fourier transform in Theorem 3.2.5 implies $\gamma(t) = \int_0^1 f(\lambda)\, e^{i2\pi\lambda t}\, d\lambda$, $t \in \mathbb{Z}$, i.e., $f$ is the spectral density of $\gamma$.
Suppose on the other hand that $f(\lambda) = \sum_{t\in\mathbb{Z}} \gamma(t)\, e^{-i2\pi\lambda t} \ge 0$, $0 \le \lambda \le 1$. The inverse Fourier transform implies $\gamma(t) = \int_0^1 f(\lambda)\, e^{i2\pi\lambda t}\, d\lambda = \int_0^1 e^{i2\pi\lambda t}\, dF(\lambda)$, where $F(\lambda) = \int_0^\lambda f(x)\, dx$, $0 \le \lambda \le 1$. Thus we have established representation (4.3), which implies that $\gamma$ is positive semidefinite, and, consequently, $\gamma$ is by Corollary 4.1.4 the autocovariance function of a stationary process.
Example 4.1.6. Choose a number $\rho \in \mathbb{R}$. The function
$$\gamma(h) = \begin{cases} 1, & \text{if } h = 0,\\ \rho, & \text{if } h \in \{-1, 1\},\\ 0, & \text{elsewhere}, \end{cases}$$
is the autocovariance function of a stationary process iff $|\rho| \le 0.5$. This follows from
$$f(\lambda) = \sum_{t\in\mathbb{Z}} \gamma(t)\, e^{i2\pi\lambda t} = \gamma(0) + \gamma(1)\, e^{i2\pi\lambda} + \gamma(-1)\, e^{-i2\pi\lambda} = 1 + 2\rho\cos(2\pi\lambda) \ge 0$$
for $\lambda \in [0, 1]$ iff $|\rho| \le 0.5$. Note that the function $\gamma$ is the autocorrelation function of an MA(1)-process, cf. Example 2.2.2.
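The boundary case $|\rho| = 0.5$ can be visualized by evaluating $f(\lambda) = 1 + 2\rho\cos(2\pi\lambda)$ over a grid of frequencies. The following DATA step sketch, with the illustrative choices $\rho = 0.4$ and $\rho = 0.6$, shows that the second choice produces negative values near $\lambda = 0.5$ and can therefore not be a spectral density.

/* ma1_spectral_check.sas (sketch) */
DATA check;
   pi=CONSTANT('PI');
   DO lambda=0 TO 0.5 BY 0.01;
      f04=1+2*0.4*COS(2*pi*lambda);   /* nonnegative for all lambda */
      f06=1+2*0.6*COS(2*pi*lambda);   /* drops below zero near lambda=0.5 */
      OUTPUT;
   END;
PROC PRINT DATA=check(WHERE=(f06<0));
RUN; QUIT;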
The spectral distribution function of a stationary process satisfies (Exercise 4.10)
$$F(0.5 + \lambda) - F(0.5^-) = F(0.5) - F((0.5 - \lambda)^-), \quad 0 \le \lambda < 0.5,$$
where $F(x^-) := \lim_{\varepsilon\downarrow 0} F(x - \varepsilon)$ is the left-hand limit of $F$ at $x \in (0, 1]$. If $F$ has a derivative $f$, we obtain from the above symmetry $f(0.5 + \lambda) = f(0.5 - \lambda)$ or, equivalently, $f(1 - \lambda) = f(\lambda)$ and, hence,
$$\gamma(h) = \int_0^1 \cos(2\pi\lambda h)\, dF(\lambda) = 2\int_0^{0.5} \cos(2\pi\lambda h) f(\lambda)\, d\lambda.$$
The autocovariance function of a stationary process is, therefore, determined by the values $f(\lambda)$, $0 \le \lambda \le 0.5$, if the spectral density exists. Recall, moreover, that the smallest nonconstant period $P_0$ visible through observations evaluated at time points $t = 1, 2, \ldots$ is $P_0 = 2$, i.e., the largest observable frequency is the Nyquist frequency $\lambda_0 = 1/P_0 = 0.5$, cf. the end of Section 3.2. Hence, the spectral density $f(\lambda)$ matters only for $\lambda \in [0, 0.5]$.
Remark 4.1.7. The preceding discussion shows that a function $f : [0, 1] \to \mathbb{R}$ is the spectral density of a stationary process iff $f$ satisfies the following three conditions:

(i) $f(\lambda) \ge 0$,

(ii) $f(\lambda) = f(1 - \lambda)$,

(iii) $\int_0^1 f(\lambda)\, d\lambda < \infty$.
4.2 Linear Filters and Frequencies
The application of a linear filter to a stationary time series has a quite
complex effect on its autocovariance function, see Theorem 2.1.6. Its
effect on the spectral density, if it exists, turns, however, out to be
quite simple. We use in the following again the notation $\int_0^\lambda g(x)\, dF(x)$ in place of $\int_{(0,\lambda]} g(x)\, dF(x)$.
Theorem 4.2.1. Let $(Z_t)_{t\in\mathbb{Z}}$ be a stationary process with spectral distribution function $F_Z$ and let $(a_t)_{t\in\mathbb{Z}}$ be an absolutely summable filter with Fourier transform $f_a$. The linear filtered process $Y_t := \sum_{u\in\mathbb{Z}} a_u Z_{t-u}$, $t \in \mathbb{Z}$, then has the spectral distribution function
$$F_Y(\lambda) := \int_0^\lambda |f_a(x)|^2\, dF_Z(x), \quad 0 \le \lambda \le 1. \qquad (4.5)$$
If in addition $(Z_t)_{t\in\mathbb{Z}}$ has a spectral density $f_Z$, then
$$f_Y(\lambda) := |f_a(\lambda)|^2 f_Z(\lambda), \quad 0 \le \lambda \le 1, \qquad (4.6)$$
is the spectral density of $(Y_t)_{t\in\mathbb{Z}}$.
Proof. Theorem 2.1.6 yields that $(Y_t)_{t\in\mathbb{Z}}$ is stationary with autocovariance function
$$\gamma_Y(t) = \sum_{u\in\mathbb{Z}}\sum_{w\in\mathbb{Z}} a_u a_w\, \gamma_Z(t - u + w), \quad t \in \mathbb{Z},$$
where $\gamma_Z$ is the autocovariance function of $(Z_t)$. Its spectral representation (4.3) implies
$$\gamma_Y(t) = \sum_{u\in\mathbb{Z}}\sum_{w\in\mathbb{Z}} a_u a_w \int_0^1 e^{i2\pi\lambda(t-u+w)}\, dF_Z(\lambda) = \int_0^1 \Big( \sum_{u\in\mathbb{Z}} a_u e^{-i2\pi\lambda u} \Big)\Big( \sum_{w\in\mathbb{Z}} a_w e^{i2\pi\lambda w} \Big) e^{i2\pi\lambda t}\, dF_Z(\lambda) = \int_0^1 |f_a(\lambda)|^2 e^{i2\pi\lambda t}\, dF_Z(\lambda) = \int_0^1 e^{i2\pi\lambda t}\, dF_Y(\lambda).$$
Theorem 4.1.2 now implies that $F_Y$ is the uniquely determined spectral distribution function of $(Y_t)_{t\in\mathbb{Z}}$. The second to last equality yields in addition the spectral density (4.6).
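For instance, applying the theorem to the filter $a_0 = 1$, $a_1 = b$, $a_u = 0$ elsewhere, and a white noise $(\varepsilon_t)$ with variance $\sigma^2$, the resulting MA(1)-process $Y_t = \varepsilon_t + b\varepsilon_{t-1}$ has the transfer function $f_a(\lambda) = 1 + b e^{-i2\pi\lambda}$ and thus, by (4.6) and Example 4.1.3, the spectral density $f_Y(\lambda) = \sigma^2|1 + b e^{-i2\pi\lambda}|^2 = \sigma^2(1 + b^2 + 2b\cos(2\pi\lambda))$; dividing by $\gamma_Y(0) = \sigma^2(1 + b^2)$ recovers the function $1 + 2\rho\cos(2\pi\lambda)$ of Example 4.1.6 with $\rho = b/(1 + b^2)$.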
Transfer Function and Power Transfer Function
Since the spectral density is a measure of intensity of a frequency $\lambda$ inherent in a stationary process (see the discussion of the periodogram in Section 3.2), the effect (4.6) of applying a linear filter $(a_t)$ with Fourier transform $f_a$ can easily be interpreted. While the intensity of $\lambda$ is diminished by the filter $(a_t)$ iff $|f_a(\lambda)| < 1$, its intensity is amplified iff $|f_a(\lambda)| > 1$. The Fourier transform $f_a$ of $(a_t)$ is, therefore, also called transfer function, and the function $g_a(\lambda) := |f_a(\lambda)|^2$ is referred to as the gain or power transfer function of the filter $(a_t)_{t\in\mathbb{Z}}$.
Example 4.2.2. The simple moving average of length three
$$a_u = \begin{cases} 1/3, & u \in \{-1, 0, 1\},\\ 0, & \text{elsewhere}, \end{cases}$$
has the transfer function
$$f_a(\lambda) = \frac{1}{3} + \frac{2}{3}\cos(2\pi\lambda)$$
and the power transfer function
$$g_a(\lambda) = \begin{cases} 1, & \lambda = 0,\\ \Big( \dfrac{\sin(3\pi\lambda)}{3\sin(\pi\lambda)} \Big)^2, & \lambda \in (0, 0.5] \end{cases}$$
(see Exercise 4.13 and Theorem 3.2.2). This power transfer function is plotted in Plot 4.2.1a below. It shows that frequencies $\lambda$ close to zero, i.e., those corresponding to a large period, remain essentially unaltered. Frequencies $\lambda$ close to 0.5, which correspond to a short period, are, however, damped by the approximate factor 0.1 when the moving average $(a_u)$ is applied to a process. The frequency $\lambda = 1/3$ is completely eliminated, since $g_a(1/3) = 0$.
Plot 4.2.1a: Power transfer function of the simple moving average of
length three.
1 /* power_transfer_sma3 .sas */
2 TITLE1 ’Power transfer function ’;
3 TITLE2 ’of the simple moving average of length 3’;
4
5 /* Compute power transfer function */
6 DATA data1;
7 DO lambda =.001 TO .5 BY .001;
8 g=(SIN(3* CONSTANT(’PI ’)*lambda)/(3* SIN(CONSTANT(’PI ’)*lambda)))
→**2;
9 OUTPUT;
10 END;
11
12 /* Graphical options */
13 AXIS1 LABEL=(’g’ H=1 ’a’ H=2 ’(’ F=CGREEK ’l)’);
14 AXIS2 LABEL=(F=CGREEK ’l’);
15 SYMBOL1 V=NONE C=GREEN I=JOIN;
16
17 /* Plot power transfer function */
18 PROC GPLOT DATA=data1;
19 PLOT g*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
20 RUN; QUIT;
Program 4.2.1: Plotting power transfer function.
Example 4.2.3. The first order difference filter
$$a_u = \begin{cases} 1, & u = 0,\\ -1, & u = 1,\\ 0, & \text{elsewhere}, \end{cases}$$
has the transfer function
$$f_a(\lambda) = 1 - e^{-i2\pi\lambda}.$$
Since
$$f_a(\lambda) = e^{-i\pi\lambda}\big( e^{i\pi\lambda} - e^{-i\pi\lambda} \big) = i e^{-i\pi\lambda}\, 2\sin(\pi\lambda),$$
its power transfer function is
$$g_a(\lambda) = 4\sin^2(\pi\lambda).$$
The first order difference filter, therefore, damps frequencies close to zero but amplifies those close to 0.5.
Plot 4.2.2a: Power transfer function of the first order difference filter.
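No program is listed for Plot 4.2.2a in the text; a DATA step in the style of Program 4.2.1 (power transfer sma3.sas) that could produce such a figure is sketched below, where the file name and titles are illustrative assumptions.

/* power_transfer_diff1.sas (sketch) */
TITLE1 'Power transfer function';
TITLE2 'of the first order difference filter';

DATA data1;
   DO lambda=0 TO .5 BY .001;
      g=4*SIN(CONSTANT('PI')*lambda)**2;   /* g_a(lambda) = 4 sin^2(pi lambda) */
      OUTPUT;
   END;

SYMBOL1 V=NONE C=GREEN I=JOIN;
PROC GPLOT DATA=data1;
   PLOT g*lambda;
RUN; QUIT;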
Example 4.2.4. The preceding example immediately carries over to the seasonal difference filter of arbitrary length $s \ge 0$, i.e.,
$$a_u^{(s)} = \begin{cases} 1, & u = 0,\\ -1, & u = s,\\ 0, & \text{elsewhere}, \end{cases}$$
which has the transfer function
$$f_{a^{(s)}}(\lambda) = 1 - e^{-i2\pi\lambda s}$$
and the power transfer function
$$g_{a^{(s)}}(\lambda) = 4\sin^2(\pi\lambda s).$$
Plot 4.2.3a: Power transfer function of the seasonal difference filter of
order 12.
Since $\sin^2(x) = 0$ iff $x = k\pi$ and $\sin^2(x) = 1$ iff $x = (2k+1)\pi/2$, $k \in \mathbb{Z}$, the power transfer function $g_{a^{(s)}}(\lambda)$ satisfies for $k \in \mathbb{Z}$
$$g_{a^{(s)}}(\lambda) = \begin{cases} 0, & \text{iff } \lambda = k/s,\\ 4, & \text{iff } \lambda = (2k+1)/(2s). \end{cases}$$
This implies, for example, in the case of $s = 12$ that those frequencies which are multiples of $1/12 \approx 0.0833$ are eliminated, whereas the midpoint frequencies $k/12 + 1/24$ are amplified. This means that the seasonal difference filter on the one hand does what we would like it to do, namely to eliminate the frequency $1/12$, but on the other hand it generates unwanted side effects by eliminating also multiples of $1/12$ and by amplifying midpoint frequencies. This observation gives rise to the problem whether one can construct linear filters that have prescribed properties.
Least Squares Based Filter Design
A low pass filter aims at eliminating high frequencies, a high pass filter aims at eliminating small frequencies and a band pass filter allows only frequencies in a certain band $[\lambda_0 - \Delta, \lambda_0 + \Delta]$ to pass through. They consequently should have the ideal power transfer functions
$$g_{\text{low}}(\lambda) = \begin{cases} 1, & \lambda \in [0, \lambda_0],\\ 0, & \lambda \in (\lambda_0, 0.5], \end{cases} \qquad g_{\text{high}}(\lambda) = \begin{cases} 0, & \lambda \in [0, \lambda_0),\\ 1, & \lambda \in [\lambda_0, 0.5], \end{cases} \qquad g_{\text{band}}(\lambda) = \begin{cases} 1, & \lambda \in [\lambda_0 - \Delta, \lambda_0 + \Delta],\\ 0, & \text{elsewhere}, \end{cases}$$
where $\lambda_0$ is the cut off frequency in the first two cases and $[\lambda_0 - \Delta, \lambda_0 + \Delta]$ is the cut off interval with bandwidth $2\Delta > 0$ in the final one. Therefore, the question naturally arises whether there actually exist filters which have a prescribed power transfer function. One possible approach for fitting a linear filter with weights $a_u$ to a given transfer function $f$ is offered by utilizing least squares. Since only filters of finite length matter in applications, one chooses a transfer function
$$f_a(\lambda) = \sum_{u=r}^{s} a_u e^{-i2\pi\lambda u}$$
with fixed integers $r, s$ and fits this function $f_a$ to $f$ by minimizing the integrated squared error
$$\int_0^{0.5} |f(\lambda) - f_a(\lambda)|^2\, d\lambda$$
in $(a_u)_{r\le u\le s} \in \mathbb{R}^{s-r+1}$. This is achieved for the choice (Exercise 4.16)
$$a_u = 2\operatorname{Re}\Big( \int_0^{0.5} f(\lambda)\, e^{i2\pi\lambda u}\, d\lambda \Big), \quad u = r, \ldots, s,$$
which is formally the real part of the inverse Fourier transform of $f$.
Example 4.2.5. For the low pass filter with cut off frequency
0 < λ_0 < 0.5 and ideal transfer function

    f(λ) = 1_{[0,λ_0]}(λ)

we obtain the weights

    a_u = 2 ∫_0^{λ_0} cos(2πλu) dλ = { 2λ_0, u = 0;  sin(2πλ_0 u)/(πu), u ≠ 0 }.
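As a small illustration, the following DATA step (a sketch in the style
of the other programs; the data set name is ours) tabulates these weights
for λ_0 = 1/10 and u = −20, ..., 20, the values underlying the next plot:

/* Sketch: least squares low pass filter weights for lambda_0 = 1/10 */
DATA lpweights;
   lambda0=1/10;
   DO u=-20 TO 20;
      IF u=0 THEN a=2*lambda0;
      ELSE a=SIN(2*CONSTANT('PI')*lambda0*u)/(CONSTANT('PI')*u);
      OUTPUT;
   END;

PROC PRINT DATA=lpweights;
RUN;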
Plot 4.2.4a: Transfer function of least squares fitted low pass filter
with cut off frequency λ_0 = 1/10 and r = −20, s = 20.
1 /* transfer.sas */
2 TITLE1 ’Transfer function ’;
3 TITLE2 ’of least squares fitted low pass filter ’;
4
5 /* Compute transfer function */
6 DATA data1;
7 DO lambda =0 TO .5 BY .001;
8 f=2*1/10;
9 DO u=1 TO 20;
10 f=f+2*1/( CONSTANT(’PI ’)*u)*SIN (2* CONSTANT(’PI ’) *1/10*u)*COS(2*
→CONSTANT(’PI ’)*lambda*u);
11 END;
12 OUTPUT;
13 END;
14
15 /* Graphical options */
16 AXIS1 LABEL=(’f’ H=1 ’a’ H=2 F=CGREEK ’(l)’);
17 AXIS2 LABEL=(F=CGREEK ’l’);
18 SYMBOL1 V=NONE C=GREEN I=JOIN L=1;
19
20 /* Plot transfer function */
21 PROC GPLOT DATA=data1;
22 PLOT f*lambda / VAXIS=AXIS1 HAXIS=AXIS2 VREF =0;
23 RUN; QUIT;
Program 4.2.4: Transfer function of least squares fitted low pass filter.
The programs in Section 4.2 (Linear Filters and
Frequencies) are just made for the purpose
of generating graphics, which demonstrate the
shape of power transfer functions or, in case of
Program 4.2.4 (transfer.sas), of a transfer func-
tion. They all consist of two parts, a DATA step
and a PROC step.
In the DATA step values of the power transfer
function are calculated and stored in a variable
g by a DO loop over lambda from 0 to 0.5 with
a small increment. In Program 4.2.4 (trans-
fer.sas) it is necessary to use a second DO loop
within the first one to calculate the sum used in
the definition of the transfer function f.
Two AXIS statements defining the axis labels
and a SYMBOL statement precede the procedure
PLOT, which generates the plot of g or f versus
lambda.
The transfer function in Plot 4.2.4a, which approximates the ideal
transfer function 1_{[0,0.1]}, shows higher oscillations near the cut off
point λ_0 = 0.1. This is known as Gibbs' phenomenon and requires further
smoothing of the fitted transfer function if these oscillations are to
be damped (cf. Section 6.4 of Bloomfield (1976)).
4.3 Spectral Densities of ARMA-Processes
Theorem 4.2.1 enables us to compute the spectral density of an ARMA-
process.
Theorem 4.3.1. Suppose that

    Y_t = a_1 Y_{t−1} + ··· + a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + ··· + b_q ε_{t−q},   t ∈ Z,

is a stationary ARMA(p, q)-process, where (ε_t) is a white noise with
variance σ^2. Put

    A(z) := 1 − a_1 z − a_2 z^2 − ··· − a_p z^p,
    B(z) := 1 + b_1 z + b_2 z^2 + ··· + b_q z^q

and suppose that the process (Y_t) satisfies the stationarity condition
(2.3), i.e., the roots of the equation A(z) = 0 are outside of the unit
circle. The process (Y_t) then has the spectral density

    f_Y(λ) = σ^2 |B(e^{−i2πλ})|^2 / |A(e^{−i2πλ})|^2
           = σ^2 |1 + Σ_{v=1}^{q} b_v e^{−i2πλv}|^2 / |1 − Σ_{u=1}^{p} a_u e^{−i2πλu}|^2.     (4.7)

Proof. Since the process (Y_t) is supposed to satisfy the stationarity
condition (2.3) it is causal, i.e., Y_t = Σ_{v≥0} α_v ε_{t−v}, t ∈ Z, for some
absolutely summable constants α_v, v ≥ 0, see Section 2.2. The white
noise process (ε_t) has by Example 4.1.3 the spectral density f_ε(λ) = σ^2
and, thus, (Y_t) has by Theorem 4.2.1 a spectral density f_Y. The
application of Theorem 4.2.1 to the process

    X_t := Y_t − a_1 Y_{t−1} − ··· − a_p Y_{t−p} = ε_t + b_1 ε_{t−1} + ··· + b_q ε_{t−q}

then implies that (X_t) has the spectral density

    f_X(λ) = |A(e^{−i2πλ})|^2 f_Y(λ) = |B(e^{−i2πλ})|^2 f_ε(λ).

Since the roots of A(z) = 0 are assumed to be outside of the unit
circle, we have |A(e^{−i2πλ})| ≠ 0 and, thus, the assertion of Theorem
4.3.1 follows.
The preceding result with a_1 = ··· = a_p = 0 implies that an MA(q)-process
has the spectral density

    f_Y(λ) = σ^2 |B(e^{−i2πλ})|^2.

With b_1 = ··· = b_q = 0, Theorem 4.3.1 implies that a stationary
AR(p)-process, which satisfies the stationarity condition (2.3), has the
spectral density

    f_Y(λ) = σ^2 / |A(e^{−i2πλ})|^2.

Example 4.3.2. The stationary ARMA(1, 1)-process

    Y_t = aY_{t−1} + ε_t + bε_{t−1}

with |a| < 1 has the spectral density

    f_Y(λ) = σ^2 (1 + 2b cos(2πλ) + b^2) / (1 − 2a cos(2πλ) + a^2).

The MA(1)-process, in which case a = 0, has consequently the spectral
density

    f_Y(λ) = σ^2 (1 + 2b cos(2πλ) + b^2),

and the stationary AR(1)-process with |a| < 1, for which b = 0, has the
spectral density

    f_Y(λ) = σ^2 / (1 − 2a cos(2πλ) + a^2).
The following figures display spectral densities of ARMA(1, 1)-processes
for various a and b with σ^2 = 1.

Plot 4.3.1a: Spectral densities of ARMA(1, 1)-processes
Y_t = aY_{t−1} + ε_t + bε_{t−1} with fixed a and various b; σ^2 = 1.
1 /* arma11_sd.sas */
2 TITLE1 ’Spectral densities of ARMA (1,1)-processes ’;
3
4 /* Compute spectral densities of ARMA (1,1)-processes */
5 DATA data1;
6 a=.5;
7 DO b=-.9, -.2, 0, .2, .5;
8 DO lambda =0 TO .5 BY .005;
9 f=(1+2*b*COS(2* CONSTANT(’PI ’)*lambda)+b*b)/(1-2*a*COS(2*
→CONSTANT(’PI ’)*lambda)+a*a);
10 OUTPUT;
11 END;
12 END;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’f’ H=1 ’Y’ H=2 F=CGREEK ’(l)’);
16 AXIS2 LABEL=(F=CGREEK ’l’);
17 SYMBOL1 V=NONE C=GREEN I=JOIN L=4;
18 SYMBOL2 V=NONE C=GREEN I=JOIN L=3;
19 SYMBOL3 V=NONE C=GREEN I=JOIN L=2;
20 SYMBOL4 V=NONE C=GREEN I=JOIN L=33;
21 SYMBOL5 V=NONE C=GREEN I=JOIN L=1;
22 LEGEND1 LABEL=(’a=0.5, b=’);
23
24 /* Plot spectral densities of ARMA (1,1)-processes */
25 PROC GPLOT DATA=data1;
26 PLOT f*lambda=b / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
27 RUN; QUIT;
Program 4.3.1: Computation of spectral densities.
As in Section 4.2 (Linear Filters and Frequencies) the programs here just
generate graphics. In the DATA step some loops over the varying parameter
and over lambda are used to calculate the values of the spectral
densities of the corresponding processes. Here several SYMBOL statements
and a LEGEND statement are necessary to distinguish the different curves
generated by PROC GPLOT.

Plot 4.3.2a: Spectral densities of ARMA(1, 1)-processes with parameter b
fixed and various a. Corresponding file: arma11_sd2.sas.

Plot 4.3.3a: Spectral densities of MA(1)-processes Y_t = ε_t + bε_{t−1} for
various b. Corresponding file: ma1_sd.sas.

Plot 4.3.4a: Spectral densities of AR(1)-processes Y_t = aY_{t−1} + ε_t for
various a. Corresponding file: ar1_sd.sas.
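The file ar1_sd.sas is not reproduced here; a minimal sketch in the style
of Program 4.3.1, using the AR(1) spectral density from Example 4.3.2
with σ^2 = 1, might look as follows (the data set name and the chosen
parameter values are ours, not necessarily those of the original file):

/* Sketch: spectral densities of AR(1)-processes for various a */
DATA ar1;
   DO a=-.7, -.3, 0, .3, .7;
      DO lambda=0 TO .5 BY .005;
         f=1/(1-2*a*COS(2*CONSTANT('PI')*lambda)+a*a);
         OUTPUT;
      END;
   END;

SYMBOL1 V=NONE C=GREEN I=JOIN L=1;
SYMBOL2 V=NONE C=GREEN I=JOIN L=2;
SYMBOL3 V=NONE C=GREEN I=JOIN L=3;
SYMBOL4 V=NONE C=GREEN I=JOIN L=4;
SYMBOL5 V=NONE C=GREEN I=JOIN L=33;
AXIS1 LABEL=('f' H=1 'Y' H=2 F=CGREEK '(l)');
AXIS2 LABEL=(F=CGREEK 'l');

PROC GPLOT DATA=ar1;
   PLOT f*lambda=a / VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;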
Exercises
4.1. Formulate and prove Theorem 4.1.1 for Hermitian functions K
and complex-valued stationary processes. Hint for the sufficiency part:
Let K_1 be the real part and K_2 be the imaginary part of K. Consider
the real-valued 2n × 2n-matrices

    M^{(n)} = (1/2) [ K_1^{(n)}   K_2^{(n)}
                     −K_2^{(n)}   K_1^{(n)} ],
    K_l^{(n)} = ( K_l(r − s) )_{1≤r,s≤n},   l = 1, 2.

Then M^{(n)} is a positive semidefinite matrix (check that
z^T K^{(n)} z̄ = (x, y)^T M^{(n)} (x, y), z = x + iy, x, y ∈ R^n). Proceed
as in the proof of Theorem 4.1.1: Let (V_1, ..., V_n, W_1, ..., W_n) be a
2n-dimensional normal distributed random vector with mean vector zero and
covariance matrix M^{(n)} and define for n ∈ N the family of finite
dimensional distributions

    F_{t+1,...,t+n}(v_1, w_1, ..., v_n, w_n)
        := P{V_1 ≤ v_1, W_1 ≤ w_1, ..., V_n ≤ v_n, W_n ≤ w_n},

t ∈ Z. By Kolmogorov's theorem there exists a bivariate Gaussian
process (V_t, W_t)_{t∈Z} with mean vector zero and covariances

    E(V_{t+h} V_t) = E(W_{t+h} W_t) = (1/2) K_1(h),
    E(V_{t+h} W_t) = −E(W_{t+h} V_t) = (1/2) K_2(h).

Conclude by showing that the complex-valued process Y_t := V_t − iW_t,
t ∈ Z, has the autocovariance function K.
4.2. Suppose that A is a real positive semidefinite n × n-matrix, i.e.,
x^T Ax ≥ 0 for x ∈ R^n. Show that A is also positive semidefinite for
complex numbers, i.e., z^T A z̄ ≥ 0 for z ∈ C^n.

4.3. Use (4.3) to show that for 0 < a < 0.5

    γ(h) = { sin(2πah)/(2πh), h ∈ Z \ {0};  a, h = 0 }

is the autocovariance function of a stationary process. Compute its
spectral density.

4.4. Compute the autocovariance function of a stationary process with
spectral density

    f(λ) = (0.5 − |λ − 0.5|)/0.5^2,   0 ≤ λ ≤ 1.

4.5. Suppose that F and G are measure generating functions defined on
some interval [a, b] with F(a) = G(a) = 0 and

    ∫_{[a,b]} ψ(x) F(dx) = ∫_{[a,b]} ψ(x) G(dx)

for every continuous function ψ : [a, b] → R. Show that F = G. Hint:
Approximate the indicator function 1_{[a,t]}(x), x ∈ [a, b], by continuous
functions.
4.6. Generate a white noise process and plot its periodogram.
4.7. A real valued stationary process (Y_t)_{t∈Z} is supposed to have the
spectral density f(λ) = a + bλ, λ ∈ [0, 0.5]. Which conditions must be
satisfied by a and b? Compute the autocovariance function of (Y_t)_{t∈Z}.

4.8. (Cesàro convergence) Show that lim_{N→∞} Σ_{t=1}^{N} a_t = S implies
lim_{N→∞} Σ_{t=1}^{N−1} (1 − t/N)a_t = S. Hint:

    Σ_{t=1}^{N−1} (1 − t/N)a_t = (1/N) Σ_{s=1}^{N−1} Σ_{t=1}^{s} a_t.

4.9. Suppose that (Y_t)_{t∈Z} and (Z_t)_{t∈Z} are stationary processes such
that Y_r and Z_s are uncorrelated for arbitrary r, s ∈ Z. Denote by F_Y
and F_Z the pertaining spectral distribution functions and put
X_t := Y_t + Z_t, t ∈ Z. Show that the process (X_t) is also stationary and
compute its spectral distribution function.

4.10. Let (Y_t) be a real valued stationary process with spectral
distribution function F. Show that for any function g : [−0.5, 0.5] → C
with ∫_0^1 |g(λ − 0.5)|^2 dF(λ) < ∞

    ∫_0^1 g(λ − 0.5) dF(λ) = ∫_0^1 g(0.5 − λ) dF(λ).

In particular we have

    F(0.5 + λ) − F(0.5−) = F(0.5) − F((0.5 − λ)−).

Hint: Verify the equality first for g(x) = exp(i2πhx), h ∈ Z, and then
use the fact that, on compact sets, the trigonometric polynomials are
uniformly dense in the space of continuous functions, which in turn form
a dense subset in the space of square integrable functions. Finally,
consider the function g(x) = 1_{[0,ξ]}(x), 0 ≤ ξ ≤ 0.5 (cf. the hint in
Exercise 4.5).
4.11. Let (X_t) and (Y_t) be stationary processes with mean zero and
absolutely summable covariance functions. If their spectral densities f_X
and f_Y satisfy f_X(λ) ≤ f_Y(λ) for 0 ≤ λ ≤ 1, show that

(i) Γ_{n,Y} − Γ_{n,X} is a positive semidefinite matrix, where Γ_{n,X} and
Γ_{n,Y} are the covariance matrices of (X_1, ..., X_n)^T and
(Y_1, ..., Y_n)^T respectively, and

(ii) Var(a^T (X_1, ..., X_n)) ≤ Var(a^T (Y_1, ..., Y_n)) for all
a = (a_1, ..., a_n)^T ∈ R^n.
4.12. Compute the gain function of the filter

    a_u = { 1/4, u ∈ {−1, 1};  1/2, u = 0;  0, elsewhere }.

4.13. The simple moving average

    a_u = { 1/(2q + 1), |u| ≤ q;  0, elsewhere }

has the gain function

    g_a(λ) = { 1, λ = 0;  ( sin((2q+1)πλ) / ((2q+1) sin(πλ)) )^2, λ ∈ (0, 0.5] }.

Is this filter for large q a low pass filter? Plot its power transfer
functions for q = 5, 10, 20. Hint: Exercise 3.2.

4.14. Compute the gain function of the exponential smoothing filter

    a_u = { α(1 − α)^u, u ≥ 0;  0, u < 0 },

where 0 < α < 1. Plot this function for various α. What is the effect
of α → 0 or α → 1?

4.15. Let (X_t)_{t∈Z} be a stationary process, (a_u)_{u∈Z} an absolutely
summable filter and put Y_t := Σ_{u∈Z} a_u X_{t−u}, t ∈ Z. If (b_w)_{w∈Z} is
another absolutely summable filter, then the process Z_t = Σ_{w∈Z} b_w Y_{t−w}
has the spectral distribution function

    F_Z(λ) = ∫_0^λ |F_a(µ)|^2 |F_b(µ)|^2 dF_X(µ)

(cf. Exercise 3.11 (iii)).

4.16. Show that the function

    D(a_r, ..., a_s) = ∫_0^{0.5} |f(λ) − f_a(λ)|^2 dλ

with f_a(λ) = Σ_{u=r}^{s} a_u e^{−i2πλu}, a_u ∈ R, is minimized for

    a_u := 2 Re( ∫_0^{0.5} f(λ) e^{i2πλu} dλ ),   u = r, ..., s.

Hint: Put f(λ) = f_1(λ) + if_2(λ) and differentiate with respect to a_u.
4.17. Compute in analogy to Example 4.2.5 the transfer functions of
the least squares high pass and band pass filter. Plot these functions.
Is Gibbs’ phenomenon observable?
4.18. An AR(2)-process

    Y_t = a_1 Y_{t−1} + a_2 Y_{t−2} + ε_t

satisfying the stationarity condition (2.3) (cf. Exercise 2.23) has the
spectral density

    f_Y(λ) = σ^2 / ( 1 + a_1^2 + a_2^2 + 2(a_1 a_2 − a_1) cos(2πλ) − 2a_2 cos(4πλ) ).

Plot this function for various choices of a_1, a_2.
4.19. Show that (4.2) is a necessary condition for K to be the auto-
covariance function of a stationary process.
Chapter 5

Statistical Analysis in the Frequency Domain
We will deal in this chapter with the problem of testing for a white
noise and the estimation of the spectral density. The empirical coun-
terpart of the spectral density, the periodogram, will be basic for both
problems, though it will turn out that it is not a consistent estimate.
Consistency requires extra smoothing of the periodogram via a linear
filter.
5.1 Testing for a White Noise
Our initial step in a statistical analysis of a time series in the
frequency domain is to test whether the data are generated by a white
noise (ε_t)_{t∈Z}. We start with the model

    Y_t = µ + A cos(2πλt) + B sin(2πλt) + ε_t,

where we assume that the ε_t are independent and normal distributed with
mean zero and variance σ^2. We will test the null hypothesis

    A = B = 0

against the alternative

    A ≠ 0 or B ≠ 0,

where the frequency λ, the variance σ^2 > 0 and the intercept µ ∈ R are
unknown. Since the periodogram is used for the detection of highly
intensive frequencies inherent in the data, it seems plausible to apply
it to the preceding testing problem as well. Note that (Y_t)_{t∈Z} is a
stationary process only under the null hypothesis A = B = 0.
The Distribution of the Periodogram
In the following we will derive the tests by Fisher and Bartlett–
Kolmogorov–Smirnov for the above testing problem. In a preparatory
step we compute the distribution of the periodogram.
Lemma 5.1.1. Let ε_1, ..., ε_n be independent and identically normal
distributed random variables with mean µ ∈ R and variance σ^2 > 0.
Denote by ε̄ := n^{−1} Σ_{t=1}^{n} ε_t the sample mean of ε_1, ..., ε_n and by

    C_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) cos(2π(k/n)t),
    S_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) sin(2π(k/n)t)

the cross covariances with Fourier frequencies k/n, 1 ≤ k ≤ [(n − 1)/2],
cf. (3.1). Then the 2[(n − 1)/2] random variables

    C_ε(k/n), S_ε(k/n),   1 ≤ k ≤ [(n − 1)/2],

are independent and identically N(0, σ^2/(2n))-distributed.

Proof. Note that with m := [(n − 1)/2] we have

    v := ( C_ε(1/n), S_ε(1/n), ..., C_ε(m/n), S_ε(m/n) )^T
       = A n^{−1} (ε_t − ε̄)_{1≤t≤n} = A (I_n − n^{−1} E_n) (ε_t)_{1≤t≤n},

where the 2m × n-matrix A is given by

    A := (1/n) [ cos(2π(1/n)·1)  cos(2π(1/n)·2)  ...  cos(2π(1/n)·n)
                 sin(2π(1/n)·1)  sin(2π(1/n)·2)  ...  sin(2π(1/n)·n)
                       ...              ...                ...
                 cos(2π(m/n)·1)  cos(2π(m/n)·2)  ...  cos(2π(m/n)·n)
                 sin(2π(m/n)·1)  sin(2π(m/n)·2)  ...  sin(2π(m/n)·n) ].

I_n is the n × n-unity matrix and E_n is the n × n-matrix with each entry
being 1. The vector v is, therefore, normal distributed with mean vector
zero and covariance matrix

    σ^2 A (I_n − n^{−1} E_n)(I_n − n^{−1} E_n)^T A^T
        = σ^2 A (I_n − n^{−1} E_n) A^T = σ^2 A A^T = (σ^2/(2n)) I_{2m},

which is a consequence of (3.5) and the orthogonality properties from
Lemma 3.1.2; see e.g. Definition 2.1.2 in Falk et al. (2002).

Corollary 5.1.2. Let ε_1, ..., ε_n be as in the preceding lemma and let

    I_ε(k/n) = n ( C_ε^2(k/n) + S_ε^2(k/n) )

be the pertaining periodogram, evaluated at the Fourier frequencies k/n,
1 ≤ k ≤ [(n − 1)/2]. The random variables I_ε(k/n)/σ^2 are independent
and identically standard exponential distributed, i.e.,

    P{I_ε(k/n)/σ^2 ≤ x} = { 1 − exp(−x), x > 0;  0, x ≤ 0 }.

Proof. Lemma 5.1.1 implies that

    √(2n/σ^2) C_ε(k/n)  and  √(2n/σ^2) S_ε(k/n)

are independent standard normal random variables and, thus,

    2 I_ε(k/n)/σ^2 = ( √(2n/σ^2) C_ε(k/n) )^2 + ( √(2n/σ^2) S_ε(k/n) )^2

is χ^2-distributed with two degrees of freedom. Since this distribution
has the distribution function 1 − exp(−x/2), x ≥ 0, the assertion
follows; see e.g. Theorem 2.1.7 in Falk et al. (2002).
Denote by U_{1:m} ≤ U_{2:m} ≤ ··· ≤ U_{m:m} the ordered values pertaining
to independent and uniformly on (0, 1) distributed random variables
U_1, ..., U_m. It is a well known result in the theory of order statistics
that the distribution of the vector (U_{j:m})_{1≤j≤m} coincides with that
of ((Z_1 + ··· + Z_j)/(Z_1 + ··· + Z_{m+1}))_{1≤j≤m}, where Z_1, ..., Z_{m+1}
are independent and identically exponential distributed random variables;
see, for example, Theorem 1.6.7 in Reiss (1989). The following result,
which will be basic for our further considerations, is, therefore, an
immediate consequence of Corollary 5.1.2; see also Exercise 5.3. By =_D
we denote equality in distribution.

Theorem 5.1.3. Let ε_1, ..., ε_n be independent N(µ, σ^2)-distributed
random variables and denote by

    S_j := Σ_{k=1}^{j} I_ε(k/n) / Σ_{k=1}^{m} I_ε(k/n),   j = 1, ..., m := [(n − 1)/2],

the cumulated periodogram. Note that S_m = 1. Then we have

    (S_1, ..., S_{m−1}) =_D (U_{1:m−1}, ..., U_{m−1:m−1}).

The vector (S_1, ..., S_{m−1}) has, therefore, the Lebesgue-density

    f(s_1, ..., s_{m−1}) = { (m − 1)!, if 0 < s_1 < ··· < s_{m−1} < 1;  0, elsewhere }.
The following consequence of the preceding result is obvious.
Corollary 5.1.4. The empirical distribution function of S_1, ..., S_{m−1}
is distributed like that of U_1, ..., U_{m−1}, i.e.,

    F̂_{m−1}(x) := (1/(m−1)) Σ_{j=1}^{m−1} 1_{(0,x]}(S_j)
               =_D (1/(m−1)) Σ_{j=1}^{m−1} 1_{(0,x]}(U_j),   x ∈ [0, 1].

Corollary 5.1.5. Put S_0 := 0 and

    M_m := max_{1≤j≤m} (S_j − S_{j−1}) = max_{1≤j≤m} I_ε(j/n) / Σ_{k=1}^{m} I_ε(k/n).

The maximum spacing M_m has the distribution function

    G_m(x) := P{M_m ≤ x} = Σ_{j=0}^{m} (−1)^j \binom{m}{j} (max{0, 1 − jx})^{m−1},   x > 0.
Proof. Put

    V_j := S_j − S_{j−1},   j = 1, ..., m.

By Theorem 5.1.3 the vector (V_1, ..., V_m) is distributed like the
lengths of the m consecutive intervals into which [0, 1] is partitioned
by the m − 1 random points U_1, ..., U_{m−1}:

    (V_1, ..., V_m) =_D (U_{1:m−1}, U_{2:m−1} − U_{1:m−1}, ..., 1 − U_{m−1:m−1}).

The probability that M_m is less than or equal to x equals the
probability that all spacings V_j are less than or equal to x, and this
is provided by the covering theorem as stated in Theorem 3 in Section
I.9 of Feller (1971).
Fisher’s Test
The preceding results suggest to test the hypothesis Y_t = ε_t with ε_t
independent and N(µ, σ^2)-distributed, by testing for the uniform
distribution on [0, 1]. Precisely, we will reject this hypothesis if
Fisher's κ-statistic

    κ_m := max_{1≤j≤m} I(j/n) / ( (1/m) Σ_{k=1}^{m} I(k/n) ) = m M_m

is significantly large, i.e., if one of the values I(j/n) is significantly
larger than the average over all. The hypothesis is, therefore, rejected
at error level α if

    κ_m > c_α   with   1 − G_m(c_α) = α.

This is Fisher's test for hidden periodicities. Common values are
α = 0.01 and α = 0.05. Table 5.1.1, taken from Fuller (1976), lists
several critical values c_α.
Note that these quantiles can be approximated by corresponding quantiles
of a Gumbel distribution if m is large (Exercise 5.12).

    m    c_0.05   c_0.01  |    m    c_0.05   c_0.01
   10    4.450    5.358   |  150    7.832    9.372
   15    5.019    6.103   |  200    8.147    9.707
   20    5.408    6.594   |  250    8.389    9.960
   25    5.701    6.955   |  300    8.584   10.164
   30    5.935    7.237   |  350    8.748   10.334
   40    6.295    7.663   |  400    8.889   10.480
   50    6.567    7.977   |  500    9.123   10.721
   60    6.785    8.225   |  600    9.313   10.916
   70    6.967    8.428   |  700    9.473   11.079
   80    7.122    8.601   |  800    9.612   11.220
   90    7.258    8.750   |  900    9.733   11.344
  100    7.378    8.882   | 1000    9.842   11.454

Table 5.1.1: Critical values c_α of Fisher's test for hidden periodicities.
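Since SAS reports only the value of κ_m (see Program 5.1.1 below), the
comparison with a critical value has to be done separately. The following
DATA step is a sketch of such a decision (all names and the hard-coded
numbers are our own): it uses the value κ_m = 6.573 arising for the
adjusted Airline Data in Example 5.1.6 below, where m = 65, and a rough
critical value for α = 0.05 read off between the rows m = 60 (6.785) and
m = 70 (6.967) of Table 5.1.1:

/* Sketch: decision of Fisher's test for a given kappa and critical value */
DATA fisher_decision;
   kappa=6.573;               /* Fisher's kappa reported by PROC SPECTRA   */
   c_alpha=6.9;               /* rough critical value for m=65, alpha=0.05 */
   reject=(kappa > c_alpha);  /* 1 = reject the white noise hypothesis     */
PROC PRINT DATA=fisher_decision;
RUN;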
The Bartlett–Kolmogorov–Smirnov Test
Denote again by S_j the cumulated periodogram as in Theorem 5.1.3.
If actually Y_t = ε_t with ε_t independent and identically N(µ, σ^2)-
distributed, then we know from Corollary 5.1.4 that the empirical
distribution function F̂_{m−1} of S_1, ..., S_{m−1} behaves stochastically
exactly like that of m − 1 independent and uniformly on (0, 1)
distributed random variables. Therefore, with the Kolmogorov–Smirnov
statistic

    ∆_{m−1} := sup_{x∈[0,1]} |F̂_{m−1}(x) − x|

we can measure the maximum difference between the empirical distribution
function and the theoretical one F(x) = x, x ∈ [0, 1]. The following rule
is quite common. For m > 30, i.e., n > 62, the hypothesis Y_t = ε_t with
ε_t being independent and N(µ, σ^2)-distributed is rejected if
∆_{m−1} > c_α/√(m−1), where c_{0.05} = 1.36 and c_{0.01} = 1.63 are the
critical values for the levels α = 0.05 and α = 0.01.
This Bartlett–Kolmogorov–Smirnov test can also be carried out visually
by plotting for x ∈ [0, 1] the sample distribution function F̂_{m−1}(x)
and the band

    y = x ± c_α/√(m−1).

The hypothesis Y_t = ε_t is rejected if F̂_{m−1}(x) is for some x ∈ [0, 1]
outside of this confidence band.

Example 5.1.6. (Airline Data). We want to test, whether the variance
stabilized, trend eliminated and seasonally adjusted Airline Data from
Example 1.3.1 were generated from a white noise (ε_t)_{t∈Z}, where ε_t are
independent and identically normal distributed. The Fisher test statistic
has the value κ_m = 6.573 and does, therefore, not reject the hypothesis
at the levels α = 0.05 and α = 0.01, where m = 65. The Bartlett–
Kolmogorov–Smirnov test, however, rejects this hypothesis at both levels,
since ∆_{64} = 0.2319 > 1.36/√64 = 0.17 and also ∆_{64} > 1.63/√64 = 0.20375.
SPECTRA Procedure
----- Test for White Noise for variable DLOGNUM -----
Fisher ’s Kappa: M*MAX(P(*))/SUM(P(*))
Parameters: M = 65
MAX(P(*)) = 0.028
SUM(P(*)) = 0.275
Test Statistic: Kappa = 6.5730
Bartlett ’s Kolmogorov -Smirnov Statistic:
Maximum absolute difference of the standardized
partial sums of the periodogram and the CDF of a
uniform (0,1) random variable.
Test Statistic = 0.2319
Listing 5.1.1a: Fisher’s κ and the Bartlett-Kolmogorov-Smirnov test
with m = 65 for testing a white noise generation of the adjusted
Airline Data.
1 /* airline_whitenoise.sas */
2 TITLE1 ’Tests for white noise ’;
3 TITLE2 ’for the trend und seasonal ’;
4 TITLE3 ’adjusted Airline Data ’;
5
6 /* Read in the data and compute log -transformation as well as
→seasonal and trend adjusted data */
7 DATA data1;
8 INFILE ’c:\data\airline.txt ’;
9 INPUT num @@;
10 dlognum=DIF12(DIF(LOG(num)));
11
12 /* Compute periodogram and test for white noise */
13 PROC SPECTRA DATA=data1 P WHITETEST OUT=data2;
14 VAR dlognum;
15 RUN; QUIT;
Program 5.1.1: Testing for white noise.
In the DATA step the raw data of the airline passengers are read into
the variable num. A log-transformation, building the first order
difference for trend elimination and the 12th order difference for
elimination of a seasonal component lead to the variable dlognum, which
is supposed to be generated by a stationary process.
Then PROC SPECTRA is applied to this variable, whereby the options P and
OUT=data2 generate a data set containing the periodogram data. The
option WHITETEST causes SAS to carry out the two tests for a white
noise, Fisher's test and the Bartlett–Kolmogorov–Smirnov test. SAS only
provides the values of the test statistics but no decision. One has to
compare these values with the critical values from Table 5.1.1 (Critical
values for Fisher's Test in the script) and the approximative ones
c_α/√(m−1).
The following figure visualizes the rejection at both levels by the
Bartlett-Kolmogorov-Smirnov test.
Plot 5.1.2a: Bartlett–Kolmogorov–Smirnov test with m = 65 testing for a
white noise generation of the adjusted Airline Data. Solid line/broken
line = confidence bands for F̂_{m−1}(x), x ∈ [0, 1], at levels α = 0.05/0.01.
1 /* airline_whitenoise_plot .sas */
2 TITLE1 ’Visualisation of the test for white noise ’;
3 TITLE2 ’for the trend und seasonal adjusted ’;
4 TITLE3 ’Airline Data ’;
5 /* Note that this program needs data2 generated by the previous
→program (airline_whitenoise.sas) */
6
7 /* Calculate the sum of the periodogram */
8 PROC MEANS DATA=data2(FIRSTOBS =2) NOPRINT;
9 VAR P_01;
10 OUTPUT OUT=data3 SUM=psum;
11
12 /* Compute empirical distribution function of cumulated periodogram
→ and its confidence bands */
13 DATA data4;
14 SET data2(FIRSTOBS =2);
15 IF _N_=1 THEN SET data3;
16 RETAIN s 0;
17 s=s+P_01/psum;
18 fm=_N_/(_FREQ_ -1);
19 yu_01=fm +1.63/ SQRT(_FREQ_ -1);
20 yl_01=fm -1.63/ SQRT(_FREQ_ -1);
21 yu_05=fm +1.36/ SQRT(_FREQ_ -1);
22 yl_05=fm -1.36/ SQRT(_FREQ_ -1);
23
24 /* Graphical options */
25 SYMBOL1 V=NONE I=STEPJ C=GREEN;
26 SYMBOL2 V=NONE I=JOIN C=RED L=2;
27 SYMBOL3 V=NONE I=JOIN C=RED L=1;
28 AXIS1 LABEL=(’x’) ORDER =(.0 TO 1.0 BY .1);
29 AXIS2 LABEL=NONE;
30
31 /* Plot empirical distribution function of cumulated periodogram
→with its confidence bands */
32 PROC GPLOT DATA=data4;
33 PLOT fm*s=1 yu_01*fm=2 yl_01*fm=2 yu_05*fm=3 yl_05*fm=3 / OVERLAY
→ HAXIS=AXIS1 VAXIS=AXIS2;
34 RUN; QUIT;
Program 5.1.2: Testing for white noise with confidence bands.
This program uses the data set data2 created by Program 5.1.1
(airline_whitenoise.sas), where the first observation belonging to the
frequency 0 is dropped. PROC MEANS calculates the sum (keyword SUM) of
the SAS periodogram variable P_01 and stores it in the variable psum of
the data set data3. The NOPRINT option suppresses the printing of the
output.
The next DATA step combines every observation of data2 with this sum by
means of the IF statement. Furthermore a variable s is initialized with
the value 0 by the RETAIN statement and then the portion of each
periodogram value from the sum is cumulated. The variable fm contains
the values of the empirical distribution function calculated by means of
the automatically generated variable _N_ containing the number of
observations and the variable _FREQ_, which was created by PROC MEANS
and contains the number m. The values of the upper and lower band are
stored in the y variables.
The last part of this program contains SYMBOL and AXIS statements and
PROC GPLOT to visualize the Bartlett–Kolmogorov–Smirnov statistic. The
empirical distribution of the cumulated periodogram is represented as a
step function due to the I=STEPJ option in the SYMBOL1 statement.
5.2 Estimating Spectral Densities
We suppose in the following that (Y_t)_{t∈Z} is a stationary real valued
process with mean µ and absolutely summable autocovariance function γ.
According to Corollary 4.1.5, the process (Y_t) has the continuous
spectral density

    f(λ) = Σ_{h∈Z} γ(h) e^{−i2πλh}.

In the preceding section we computed the distribution of the empirical
counterpart of a spectral density, the periodogram, in the particular
case, when (Y_t) is a Gaussian white noise. In this section we will
investigate the limit behavior of the periodogram for arbitrary
independent random variables (Y_t).
Asymptotic Properties of the Periodogram
In order to establish asymptotic properties of the periodogram, the
following modification is quite useful. For Fourier frequencies k/n,
0 ≤ k ≤ [n/2], we put

    I_n(k/n) = (1/n) | Σ_{t=1}^{n} Y_t e^{−i2π(k/n)t} |^2
             = (1/n) [ ( Σ_{t=1}^{n} Y_t cos(2π(k/n)t) )^2
                     + ( Σ_{t=1}^{n} Y_t sin(2π(k/n)t) )^2 ].     (5.1)

Up to k = 0, this coincides by (3.6) with the definition of the
periodogram as given in (3.7). From Theorem 3.2.3 we obtain the
representation

    I_n(k/n) = { n Ȳ_n^2,                            k = 0;
                 Σ_{|h|<n} c(h) e^{−i2π(k/n)h},       k = 1, ..., [n/2] }     (5.2)

with Ȳ_n := n^{−1} Σ_{t=1}^{n} Y_t and the sample autocovariance function

    c(h) = (1/n) Σ_{t=1}^{n−|h|} ( Y_t − Ȳ_n )( Y_{t+|h|} − Ȳ_n ).

By representation (5.1) and the equations (3.5), the value I_n(k/n) does
not change for k ≠ 0 if we replace the sample mean Ȳ_n in c(h) by the
theoretical mean µ. This leads to the equivalent representation of the
periodogram for k ≠ 0

    I_n(k/n) = Σ_{|h|<n} (1/n) ( Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) e^{−i2π(k/n)h}
             = (1/n) Σ_{t=1}^{n} (Y_t − µ)^2
               + 2 Σ_{h=1}^{n−1} (1/n) ( Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) cos(2π(k/n)h).     (5.3)

We define now the periodogram for λ ∈ [0, 0.5] as a piecewise constant
function

    I_n(λ) = I_n(k/n)   if   k/n − 1/(2n) < λ ≤ k/n + 1/(2n).     (5.4)
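For a concrete series y_1, ..., y_n stored in a SAS data set, the
periodogram at the Fourier frequencies can be evaluated directly from
(5.1); the following DATA step is a rough sketch of such a computation
(the data set 'series' with variable y and the array dimension are our
own assumptions, and PROC SPECTRA of course performs the same task far
more efficiently):

/* Sketch: periodogram I_n(k/n) computed directly from (5.1) */
DATA perio;
   SET series NOBS=n END=last;          /* assumed data set with variable y */
   ARRAY yy{200} _TEMPORARY_;           /* adjust the dimension to n        */
   yy{_N_}=y;
   IF last THEN DO k=1 TO FLOOR(n/2);
      c=0; s=0;
      DO t=1 TO n;
         c=c+yy{t}*COS(2*CONSTANT('PI')*(k/n)*t);
         s=s+yy{t}*SIN(2*CONSTANT('PI')*(k/n)*t);
      END;
      lambda=k/n;
      i_n=(c**2+s**2)/n;                /* value of I_n(k/n) as in (5.1)    */
      OUTPUT;
   END;
   KEEP lambda i_n;
RUN;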
The following result shows that the periodogram I_n(λ) is for λ ≠ 0 an
asymptotically unbiased estimator of the spectral density f(λ).

Theorem 5.2.1. Let (Y_t)_{t∈Z} be a stationary process with absolutely
summable autocovariance function γ. Then we have with µ = E(Y_t)

    E(I_n(0)) − nµ^2  →  f(0)   as n → ∞,
    E(I_n(λ))  →  f(λ),   λ ≠ 0,   as n → ∞.

If µ = 0, then the convergence E(I_n(λ)) → f(λ) holds uniformly on [0, 0.5].

Proof. By representation (5.2) and the Cesàro convergence result
(Exercise 4.8) we have

    E(I_n(0)) − nµ^2 = (1/n) Σ_{t=1}^{n} Σ_{s=1}^{n} E(Y_t Y_s) − nµ^2
                     = (1/n) Σ_{t=1}^{n} Σ_{s=1}^{n} Cov(Y_t, Y_s)
                     = Σ_{|h|<n} (1 − |h|/n) γ(h)  →  Σ_{h∈Z} γ(h) = f(0)   as n → ∞.

Define now for λ ∈ [0, 0.5] the auxiliary function

    g_n(λ) := k/n,   if   k/n − 1/(2n) < λ ≤ k/n + 1/(2n),   k ∈ Z.     (5.5)

Then we obviously have

    I_n(λ) = I_n(g_n(λ)).     (5.6)

Choose now λ ∈ (0, 0.5]. Since g_n(λ) → λ as n → ∞, it follows that
g_n(λ) > 0 for n large enough. By (5.3) and (5.6) we obtain for such n

    E(I_n(λ)) = Σ_{|h|<n} (1/n) Σ_{t=1}^{n−|h|} E( (Y_t − µ)(Y_{t+|h|} − µ) ) e^{−i2πg_n(λ)h}
              = Σ_{|h|<n} (1 − |h|/n) γ(h) e^{−i2πg_n(λ)h}.

Since Σ_{h∈Z} |γ(h)| < ∞, the series Σ_{|h|<n} γ(h) exp(−i2πλh) converges
to f(λ) uniformly for 0 ≤ λ ≤ 0.5. Kronecker's Lemma (Exercise 5.8)
implies moreover

    | Σ_{|h|<n} (|h|/n) γ(h) e^{−i2πλh} | ≤ Σ_{|h|<n} (|h|/n) |γ(h)|  →  0   as n → ∞,

and, thus, the series

    f_n(λ) := Σ_{|h|<n} (1 − |h|/n) γ(h) e^{−i2πλh}

converges to f(λ) uniformly in λ as well. From g_n(λ) → λ as n → ∞ and
the continuity of f we obtain for λ ∈ (0, 0.5]

    | E(I_n(λ)) − f(λ) | = | f_n(g_n(λ)) − f(λ) |
                         ≤ | f_n(g_n(λ)) − f(g_n(λ)) | + | f(g_n(λ)) − f(λ) |
                         →  0   as n → ∞.

Note that |g_n(λ) − λ| ≤ 1/(2n). The uniform convergence in case of
µ = 0 then follows from the uniform convergence of g_n(λ) to λ and the
uniform continuity of f on the compact interval [0, 0.5].
In the following result we compute the asymptotic distribution of
the periodogram for independent and identically distributed random
variables with zero mean, which are not necessarily Gaussian ones.
The Gaussian case was already established in Corollary 5.1.2.
Theorem 5.2.2. Let Z_1, ..., Z_n be independent and identically
distributed random variables with mean E(Z_t) = 0 and variance
E(Z_t^2) = σ^2 < ∞. Denote by I_n(λ) the pertaining periodogram as
defined in (5.6).

1. The random vector (I_n(λ_1), ..., I_n(λ_r)) with 0 < λ_1 < ··· <
λ_r < 0.5 converges in distribution for n → ∞ to the distribution of r
independent and identically exponential distributed random variables
with mean σ^2.

2. If E(Z_t^4) = ησ^4 < ∞, then we have for k = 0, ..., [n/2]

    Var(I_n(k/n)) = { 2σ^4 + n^{−1}(η − 3)σ^4,   k = 0 or k = n/2, if n even;
                      σ^4 + n^{−1}(η − 3)σ^4,    elsewhere }     (5.7)

and

    Cov(I_n(j/n), I_n(k/n)) = n^{−1}(η − 3)σ^4,   j ≠ k.     (5.8)

For N(0, σ^2)-distributed random variables Z_t we have η = 3 (Exercise
5.9) and, thus, I_n(k/n) and I_n(j/n) are for j ≠ k uncorrelated.
Actually, we established in Corollary 5.1.2 that they are independent in
this case.

Proof. Put for λ ∈ (0, 0.5)

    A_n(λ) := A_n(g_n(λ)) := √(2/n) Σ_{t=1}^{n} Z_t cos(2πg_n(λ)t),
    B_n(λ) := B_n(g_n(λ)) := √(2/n) Σ_{t=1}^{n} Z_t sin(2πg_n(λ)t),

with g_n defined in (5.5). Since

    I_n(λ) = (1/2) ( A_n^2(λ) + B_n^2(λ) ),

it suffices, by repeating the arguments in the proof of Corollary 5.1.2,
to show that

    ( A_n(λ_1), B_n(λ_1), ..., A_n(λ_r), B_n(λ_r) )  →_D  N(0, σ^2 I_{2r}),

where I_{2r} denotes the 2r × 2r-unity matrix and →_D convergence in
distribution. Since g_n(λ) → λ as n → ∞, we have g_n(λ) > 0 for
λ ∈ (0, 0.5) if n is large enough. The independence of Z_t together with
the definition of g_n and the orthogonality equations in Lemma 3.1.2 imply

    Var(A_n(λ)) = Var(A_n(g_n(λ))) = σ^2 (2/n) Σ_{t=1}^{n} cos^2(2πg_n(λ)t) = σ^2.

For ε > 0 we have

    (1/n) Σ_{t=1}^{n} E( Z_t^2 cos^2(2πg_n(λ)t) 1_{ {|Z_t cos(2πg_n(λ)t)| > ε √(nσ^2/2)} } )
        ≤ (1/n) Σ_{t=1}^{n} E( Z_t^2 1_{ {|Z_t| > ε √(nσ^2/2)} } )
        = E( Z_1^2 1_{ {|Z_1| > ε √(nσ^2/2)} } )  →  0   as n → ∞,

i.e., the triangular array (2/n)^{1/2} Z_t cos(2πg_n(λ)t), 1 ≤ t ≤ n,
n ∈ N, satisfies the Lindeberg condition implying A_n(λ) →_D N(0, σ^2);
see, for example, Theorem 7.2 in Billingsley (1968). Similarly one shows
that B_n(λ) →_D N(0, σ^2) as well. Since the random vector
(A_n(λ_1), B_n(λ_1), ..., A_n(λ_r), B_n(λ_r)) has by (3.2) the covariance
matrix σ^2 I_{2r}, its asymptotic joint normality follows easily by
applying the Cramér–Wold device, cf. Theorem 7.7 in Billingsley (1968),
and proceeding as before. This proves part (i) of Theorem 5.2.2.
From the definition of the periodogram in (5.1) we conclude as in the
proof of Theorem 3.2.3

    I_n(k/n) = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} Z_s Z_t e^{−i2π(k/n)(s−t)}

and, thus, we obtain

    E(I_n(j/n) I_n(k/n))
        = (1/n^2) Σ_{s=1}^{n} Σ_{t=1}^{n} Σ_{u=1}^{n} Σ_{v=1}^{n} E(Z_s Z_t Z_u Z_v)
          e^{−i2π(j/n)(s−t)} e^{−i2π(k/n)(u−v)}.

We have

    E(Z_s Z_t Z_u Z_v) = { ησ^4,  s = t = u = v;
                           σ^4,   s = t ≠ u = v,  s = u ≠ t = v,  s = v ≠ t = u;
                           0,     elsewhere }

and

    e^{−i2π(j/n)(s−t)} e^{−i2π(k/n)(u−v)}
        = { 1,                                        s = t, u = v;
            e^{−i2π((j+k)/n)s} e^{i2π((j+k)/n)t},     s = u, t = v;
            e^{−i2π((j−k)/n)s} e^{i2π((j−k)/n)t},     s = v, t = u }.

This implies

    E(I_n(j/n) I_n(k/n))
        = ησ^4/n + (σ^4/n^2) ( n(n − 1) + | Σ_{t=1}^{n} e^{−i2π((j+k)/n)t} |^2
                                        + | Σ_{t=1}^{n} e^{−i2π((j−k)/n)t} |^2 − 2n )
        = (η − 3)σ^4/n + σ^4 ( 1 + (1/n^2) | Σ_{t=1}^{n} e^{i2π((j+k)/n)t} |^2
                                 + (1/n^2) | Σ_{t=1}^{n} e^{i2π((j−k)/n)t} |^2 ).

From E(I_n(k/n)) = n^{−1} Σ_{t=1}^{n} E(Z_t^2) = σ^2 we finally obtain

    Cov(I_n(j/n), I_n(k/n))
        = (η − 3)σ^4/n + (σ^4/n^2) ( | Σ_{t=1}^{n} e^{i2π((j+k)/n)t} |^2
                                   + | Σ_{t=1}^{n} e^{i2π((j−k)/n)t} |^2 ),

from which (5.7) and (5.8) follow by using (3.5).
Remark 5.2.3. Theorem 5.2.2 can be generalized to filtered processes
Y_t = Σ_{u∈Z} a_u Z_{t−u}, with (Z_t)_{t∈Z} as in Theorem 5.2.2. In this case
one has to replace σ^2, which equals by Example 4.1.3 the constant
spectral density f_Z(λ), in (i) by the spectral density f_Y(λ_i),
1 ≤ i ≤ r. If in addition Σ_{u∈Z} |a_u| |u|^{1/2} < ∞, then we have in (ii)
the expansions

    Var(I_n(k/n)) = { 2f_Y^2(k/n) + O(n^{−1/2}),   k = 0 or k = n/2, if n is even;
                      f_Y^2(k/n) + O(n^{−1/2}),    elsewhere },

and

    Cov(I_n(j/n), I_n(k/n)) = O(n^{−1}),   j ≠ k,

where I_n is the periodogram pertaining to Y_1, ..., Y_n. The above terms
O(n^{−1/2}) and O(n^{−1}) are uniformly bounded in k and j by a constant C.
We omit the highly technical proof and refer to Section 10.3 of
Brockwell and Davis (1991).
Recall that the class of processes Y_t = Σ_{u∈Z} a_u Z_{t−u} is a fairly
rich one, which contains in particular ARMA-processes, see Section 2.2
and Remark 2.1.12.
Discrete Spectral Average Estimator
The preceding results show that the periodogram is not a consistent
estimator of the spectral density. The law of large numbers together
with the above remark motivates, however, that consistency can be
achieved for a smoothed version of the periodogram such as a simple
moving average

    Σ_{|j|≤m} (1/(2m + 1)) I_n((k + j)/n),

which puts equal weights 1/(2m + 1) on adjacent values. Dropping the
condition of equal weights, we define a general linear smoother by the
linear filter

    f̂_n(k/n) := Σ_{|j|≤m} a_{jn} I_n((k + j)/n).     (5.9)

The sequence m = m(n), defining the adjacent points of k/n, has to
satisfy

    m → ∞  and  m/n → 0   as n → ∞,     (5.10)

and the weights a_{jn} have the properties

    (a) a_{jn} ≥ 0,
    (b) a_{jn} = a_{−jn},
    (c) Σ_{|j|≤m} a_{jn} = 1,
    (d) Σ_{|j|≤m} a_{jn}^2 → 0   as n → ∞.     (5.11)

For the simple moving average we have, for example,

    a_{jn} = { 1/(2m + 1), |j| ≤ m;  0, elsewhere }

and Σ_{|j|≤m} a_{jn}^2 = 1/(2m + 1) → 0 as n → ∞. For λ ∈ [0, 0.5] we put
I_n(0.5 + λ) := I_n(0.5 − λ), which defines the periodogram also on
[0.5, 1]. If (k + j)/n is outside of the interval [0, 1], then
I_n((k + j)/n) is understood as the periodic extension of I_n with period
1. This also applies to the spectral density f. The estimator

    f̂_n(λ) := f̂_n(g_n(λ)),

with g_n as defined in (5.5), is called the discrete spectral average
estimator. The following result states its consistency for linear
processes.
Theorem 5.2.4. Let Y_t = Σ_{u∈Z} b_u Z_{t−u}, t ∈ Z, where the Z_t are iid
with E(Z_t) = 0, E(Z_t^4) < ∞ and Σ_{u∈Z} |b_u| |u|^{1/2} < ∞. Then we have
for 0 ≤ µ, λ ≤ 0.5

    (i)  lim_{n→∞} E( f̂_n(λ) ) = f(λ),

    (ii) lim_{n→∞} Cov( f̂_n(λ), f̂_n(µ) ) / ( Σ_{|j|≤m} a_{jn}^2 )
           = { 2f^2(λ),  λ = µ = 0 or 0.5;
               f^2(λ),   0 < λ = µ < 0.5;
               0,        λ ≠ µ }.

Condition (d) in (5.11) on the weights together with (ii) in the
preceding result entails that Var(f̂_n(λ)) → 0 as n → ∞ for any
λ ∈ [0, 0.5]. Together with (i) we, therefore, obtain that the mean
squared error of f̂_n(λ) vanishes asymptotically:

    MSE(f̂_n(λ)) = E( ( f̂_n(λ) − f(λ) )^2 )
                = Var(f̂_n(λ)) + Bias^2(f̂_n(λ))  →  0   as n → ∞.
Proof. By the definition of the spectral density estimator in (5.9) we
have

    | E(f̂_n(λ)) − f(λ) |
        = | Σ_{|j|≤m} a_{jn} ( E( I_n(g_n(λ) + j/n) ) − f(λ) ) |
        = | Σ_{|j|≤m} a_{jn} ( E( I_n(g_n(λ) + j/n) ) − f(g_n(λ) + j/n)
                             + f(g_n(λ) + j/n) − f(λ) ) |,

where (5.10) together with the uniform convergence of g_n(λ) to λ implies

    max_{|j|≤m} | g_n(λ) + j/n − λ |  →  0   as n → ∞.

Choose ε > 0. The spectral density f of the process (Y_t) is continuous
(Exercise 5.16), and hence we have

    max_{|j|≤m} | f(g_n(λ) + j/n) − f(λ) | < ε/2

if n is sufficiently large. From Theorem 5.2.1 we know that in the case
E(Y_t) = 0

    max_{|j|≤m} | E( I_n(g_n(λ) + j/n) ) − f(g_n(λ) + j/n) | < ε/2

if n is large. Condition (c) in (5.11) together with the triangular
inequality implies | E(f̂_n(λ)) − f(λ) | < ε for large n. This implies
part (i).

From the definition of f̂_n we obtain

    Cov( f̂_n(λ), f̂_n(µ) )
        = Σ_{|j|≤m} Σ_{|k|≤m} a_{jn} a_{kn} Cov( I_n(g_n(λ) + j/n), I_n(g_n(µ) + k/n) ).

If λ ≠ µ and n is sufficiently large we have g_n(λ) + j/n ≠ g_n(µ) + k/n
for arbitrary |j|, |k| ≤ m. According to Remark 5.2.3 there exists a
universal constant C_1 > 0 such that

    | Cov( f̂_n(λ), f̂_n(µ) ) | = | Σ_{|j|≤m} Σ_{|k|≤m} a_{jn} a_{kn} O(n^{−1}) |
                               ≤ C_1 n^{−1} ( Σ_{|j|≤m} a_{jn} )^2
                               ≤ C_1 ((2m + 1)/n) Σ_{|j|≤m} a_{jn}^2,

where the final inequality is an application of the Cauchy–Schwarz
inequality. Since m/n → 0 as n → ∞, we have established (ii) in the case
λ ≠ µ. Suppose now 0 < λ = µ < 0.5. Utilizing again Remark 5.2.3 we have

    Var(f̂_n(λ)) = Σ_{|j|≤m} a_{jn}^2 ( f^2(g_n(λ) + j/n) + O(n^{−1/2}) )
                  + Σ_{|j|≤m} Σ_{|k|≤m} a_{jn} a_{kn} O(n^{−1}) + o(n^{−1})
                =: S_1(λ) + S_2(λ) + o(n^{−1}).

Repeating the arguments in the proof of part (i) one shows that

    S_1(λ) = ( Σ_{|j|≤m} a_{jn}^2 ) f^2(λ) + o( Σ_{|j|≤m} a_{jn}^2 ).

Furthermore, with a suitable constant C_2 > 0, we have

    |S_2(λ)| ≤ C_2 (1/n) ( Σ_{|j|≤m} a_{jn} )^2 ≤ C_2 ((2m + 1)/n) Σ_{|j|≤m} a_{jn}^2.

Thus we have established the assertion of part (ii) also in the case
0 < λ = µ < 0.5. The remaining cases λ = µ = 0 and λ = µ = 0.5 are shown
in a similar way (Exercise 5.17).
The preceding result requires zero mean variables Y_t. This might,
however, be too restrictive in practice. Due to (3.5), the periodograms
of (Y_t)_{1≤t≤n}, (Y_t − µ)_{1≤t≤n} and (Y_t − Ȳ)_{1≤t≤n} coincide at
Fourier frequencies different from zero. At frequency λ = 0, however,
they will differ in general. To estimate f(0) consistently also in the
case µ = E(Y_t) ≠ 0, one puts

    f̂_n(0) := a_{0n} I_n(1/n) + 2 Σ_{j=1}^{m} a_{jn} I_n((1 + j)/n).     (5.12)

Each time the value I_n(0) occurs in the moving average (5.9), it is
replaced by f̂_n(0). Since the resulting estimator of the spectral
density involves only Fourier frequencies different from zero, we can
assume without loss of generality that the underlying variables Y_t have
zero mean.

Example 5.2.5. (Sunspot Data). We want to estimate the spectral density
underlying the Sunspot Data. These data, also known as the Wolf or
Wölfer (a student of Wolf) Data, are the yearly sunspot numbers between
1749 and 1924. For a discussion of these data and further literature we
refer to Wei (1990), Example 6.2. Plot 5.2.1a shows the pertaining
periodogram and Plot 5.2.1b displays the discrete spectral average
estimator with weights a_{0n} = a_{1n} = a_{2n} = 3/21, a_{3n} = 2/21 and
a_{4n} = 1/21, n = 176. These weights pertain to a simple moving average
of length 3 of a simple moving average of length 7. The smoothed version
joins the two peaks close to the frequency λ = 0.1 visible in the
periodogram. The observation that a periodogram has the tendency to
split a peak is known as the leakage phenomenon.
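That the stated weights indeed arise from iterating the two moving
averages can be checked by convolving the weight sequences (1/3, 1/3, 1/3)
and (1/7, ..., 1/7); the following DATA step is a small sketch of this
check (the data set and variable names are our own):

/* Sketch: weights of a moving average of length 3 of a moving average of length 7 */
DATA dsaweights;
   ARRAY a{9};                       /* convolved weights for j = -4, ..., 4 */
   DO i=1 TO 9; a{i}=0; END;
   DO u=1 TO 3;
      DO v=1 TO 7;
         a{u+v-1}=a{u+v-1}+(1/3)*(1/7);
      END;
   END;
   OUTPUT;
   KEEP a1-a9;
PROC PRINT DATA=dsaweights;
RUN;

The printed values are 1/21, 2/21, 3/21, 3/21, 3/21, 3/21, 3/21, 2/21,
1/21, i.e., exactly the weights a_{jn} quoted in Example 5.2.5.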
Plot 5.2.1a: Periodogram for Sunspot Data.
Plot 5.2.1b: Discrete spectral average estimate for Sunspot Data.
1 /* sunspot_dsae.sas */
2 TITLE1 ’Periodogram and spectral density estimate ’;
3 TITLE2 ’Woelfer Sunspot Data ’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\sunspot.txt ’;
8 INPUT num @@;
9
10 /* Computation of peridogram and estimation of spectral density */
11 PROC SPECTRA DATA=data1 P S OUT=data2;
12 VAR num;
13 WEIGHTS 1 2 3 3 3 2 1;
14
15 /* Adjusting different periodogram definitions */
16 DATA data3;
17 SET data2(FIRSTOBS =2);
18 lambda=FREQ /(2* CONSTANT(’PI ’));
19 p=P_01 /2;
20 s=S_01 /2*4* CONSTANT(’PI ’);
21
22 /* Graphical options */
23 SYMBOL1 I=JOIN C=RED V=NONE L=1;
24 AXIS1 LABEL=(F=CGREEK ’l’) ORDER =(0 TO .5 BY .05);
25 AXIS2 LABEL=NONE;
26
27 /* Plot periodogram and estimated spectral density */
28 PROC GPLOT DATA=data3;
29 PLOT p*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
30 PLOT s*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
31 RUN; QUIT;
Program 5.2.1: Periodogram and discrete spectral average estimate
for Sunspot Data.
In the DATA step the data of the sunspots are read into the variable
num. Then PROC SPECTRA is applied to this variable, whereby the options
P (see Program 5.1.1, airline_whitenoise.sas) and S generate a data set
stored in data2 containing the periodogram data and the estimate of the
spectral density, which SAS computes with the weights given in the
WEIGHTS statement. Note that SAS automatically normalizes these weights.
In the following DATA step the slightly different definition of the
periodogram used by SAS is adjusted to the definition used here (see
Program 3.2.1, star_periodogram.sas). Both plots are then printed with
PROC GPLOT.

A mathematically convenient way to generate weights a_{jn} which satisfy
the conditions (5.11) is the use of a kernel function. Let
K : [−1, 1] → [0, ∞) be a symmetric function, i.e., K(−x) = K(x),
x ∈ [−1, 1], which satisfies ∫_{−1}^{1} K^2(x) dx < ∞. Let now
m = m(n) → ∞ be an arbitrary sequence of integers with m/n → 0 as n → ∞
and put

    a_{jn} := K(j/m) / Σ_{i=−m}^{m} K(i/m),   −m ≤ j ≤ m.     (5.13)

These weights satisfy the conditions (5.11) (Exercise 5.18).
Take for example K(x) := 1 − |x|, −1 ≤ x ≤ 1. Then we obtain

    a_{jn} = (m − |j|)/m^2,   −m ≤ j ≤ m.
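As a sketch, the following DATA step computes the weights (5.13) for the
kernel K(x) = 1 − |x| and a given bandwidth m (the data set name and the
choice m = 10 are our own):

/* Sketch: weights a_jn from the kernel K(x) = 1 - |x| with bandwidth m */
DATA kernelweights;
   m=10;
   ksum=0;
   DO i=-m TO m;                     /* normalizing constant sum_i K(i/m)  */
      ksum=ksum+(1-ABS(i/m));
   END;
   DO j=-m TO m;
      a=(1-ABS(j/m))/ksum;           /* equals (m-|j|)/m**2                */
      OUTPUT;
   END;
   KEEP j a;
PROC PRINT DATA=kernelweights;
RUN;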
Example 5.2.6. (i) The truncated kernel is defined by

    K_T(x) = { 1, |x| ≤ 1;  0, elsewhere }.

(ii) The Bartlett or triangular kernel is given by

    K_B(x) := { 1 − |x|, |x| ≤ 1;  0, elsewhere }.

(iii) The Blackman–Tukey kernel (1959) is defined by

    K_BT(x) = { 1 − 2a + 2a cos(x), |x| ≤ 1;  0, elsewhere },

where 0 < a ≤ 1/4. The particular choice a = 1/4 yields the
Tukey–Hanning kernel.

(iv) The Parzen kernel (1957) is given by

    K_P(x) := { 1 − 6|x|^2 + 6|x|^3, |x| < 1/2;  2(1 − |x|)^3, 1/2 ≤ |x| ≤ 1;  0, elsewhere }.

We refer to Andrews (1991) for a discussion of these kernels.

Example 5.2.7. We consider realizations of the MA(1)-process
Y_t = ε_t − 0.6ε_{t−1} with ε_t independent and standard normal for
t = 1, ..., n = 160. Example 4.3.2 implies that the process (Y_t) has the
spectral density f(λ) = 1 − 1.2 cos(2πλ) + 0.36. We estimate f(λ) by
means of the Tukey–Hanning kernel.
Plot 5.2.2a: Discrete spectral average estimator (broken line) with
Blackman–Tukey kernel with parameters r = 10, a = 1/4 and under-
lying spectral density f(λ) = 1 −1.2 cos(2πλ) + 0.36 (solid line).
1 /* ma1_blackman_tukey.sas */
2 TITLE1 ’Spectral density and Blackman -Tukey estimator ’;
3 TITLE2 ’of MA(1)-process ’;
4
5 /* Generate MA(1)-process */
6 DATA data1;
7 DO t=0 TO 160;
8 e=RANNOR (1);
9 y=e-.6* LAG(e);
10 OUTPUT;
11 END;
12
13 /* Estimation of spectral density */
14 PROC SPECTRA DATA=data1(FIRSTOBS =2) S OUT=data2;
15 VAR y;
16 WEIGHTS TUKEY 10 0;
17 RUN;
18
19 /* Adjusting different definitions */
20 DATA data3;
21 SET data2;
22 lambda=FREQ /(2* CONSTANT(’PI ’));
23 s=S_01 /2*4* CONSTANT(’PI ’);
24
25 /* Compute underlying spectral density */
26 DATA data4;
27 DO l=0 TO .5 BY .01;
28 f=1 -1.2* COS(2* CONSTANT(’PI ’)*l)+.36;
29 OUTPUT;
30 END;
31
32 /* Merge the data sets */
33 DATA data5;
34 MERGE data3(KEEP=s lambda) data4;
35
36 /* Graphical options */
37 AXIS1 LABEL=NONE;
38 AXIS2 LABEL=(F=CGREEK ’l’) ORDER =(0 TO .5 BY .1);
39 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
40 SYMBOL2 I=JOIN C=RED V=NONE L=3;
41
42 /* Plot underlying and estimated spectral density */
43 PROC GPLOT DATA=data5;
44 PLOT f*l=1 s*lambda =2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
45 RUN; QUIT;
Program 5.2.2: Computing discrete spectral average estimator with
Blackman–Tukey kernel.
In the first DATA step the realizations of an
MA(1)-process with the given parameters are
created. Thereby the function RANNOR, which
generates standard normally distributed data,
and LAG, which accesses the value of e of the
preceding loop, are used.
As in Program 5.2.1 (sunspot_dsae.sas) PROC
SPECTRA computes the estimator of the spec-
tral density (after dropping the first observa-
tion) by the option S and stores them in data2.
The weights used here come from the Tukey–
Hanning kernel with a specified bandwidth of
m = 10. The second number after the TUKEY
option can be used to refine the choice of the
bandwidth. Since this is not needed here it is
set to 0.
The next DATA step adjusts the different
definitions of the spectral density used
here and by SAS (see Program 3.2.1,
star_periodogram.sas). The following DATA step
generates the values of the underlying spec-
tral density. These are merged with the values
of the estimated spectral density and then dis-
played by PROC GPLOT.
Confidence Intervals for Spectral Densities
The random variables I_n((k + j)/n)/f((k + j)/n), 0 < k + j < n/2, will
by Remark 5.2.3 for large n approximately behave like independent and
standard exponential distributed random variables X_j. This suggests
that the distribution of the discrete spectral average estimator

    f̂(k/n) = Σ_{|j|≤m} a_{jn} I_n((k + j)/n)

can be approximated by that of the weighted sum
Σ_{|j|≤m} a_{jn} X_j f((k + j)/n). Tukey (1949) showed that the
distribution of this weighted sum can in turn be approximated by that of
cY with a suitably chosen c > 0, where Y follows a gamma distribution
with parameters p := ν/2 > 0 and b = 1/2, i.e.,

    P{Y ≤ t} = (b^p/Γ(p)) ∫_0^t x^{p−1} exp(−bx) dx,   t ≥ 0,

where Γ(p) := ∫_0^∞ x^{p−1} exp(−x) dx denotes the gamma function. The
parameters ν and c are determined by the method of moments as follows:
ν and c are chosen such that cY has mean f(k/n) and its variance equals
the leading term of the variance expansion of f̂(k/n) in Theorem 5.2.4
(Exercise 5.21):

    E(cY) = cν = f(k/n),
    Var(cY) = 2c^2 ν = f^2(k/n) Σ_{|j|≤m} a_{jn}^2.

The solutions are obviously

    c = (f(k/n)/2) Σ_{|j|≤m} a_{jn}^2    and    ν = 2 / Σ_{|j|≤m} a_{jn}^2.

Note that the gamma distribution with parameters p = ν/2 and b = 1/2
equals the χ^2-distribution with ν degrees of freedom if ν is an
integer. The number ν is, therefore, called the equivalent degree of
freedom. Observe that ν/f(k/n) = 1/c; the random variable
ν f̂(k/n)/f(k/n) = f̂(k/n)/c now approximately follows a χ^2(ν)-
distribution with the convention that χ^2(ν) is the gamma distribution
with parameters p = ν/2 and b = 1/2 if ν is not an integer. The interval

    ( ν f̂(k/n)/χ^2_{1−α/2}(ν),  ν f̂(k/n)/χ^2_{α/2}(ν) )     (5.14)

is a confidence interval for f(k/n) of approximate level 1 − α,
α ∈ (0, 1). By χ^2_q(ν) we denote the q-quantile of the χ^2(ν)-
distribution, i.e., P{Y ≤ χ^2_q(ν)} = q, 0 < q < 1. Taking logarithms in
(5.14), we obtain the confidence interval for log(f(k/n))

    C_{ν,α}(k/n) := ( log(f̂(k/n)) + log(ν) − log(χ^2_{1−α/2}(ν)),
                      log(f̂(k/n)) + log(ν) − log(χ^2_{α/2}(ν)) ).

This interval has constant length log(χ^2_{1−α/2}(ν)/χ^2_{α/2}(ν)). Note
that C_{ν,α}(k/n) is a level (1 − α)-confidence interval only for
log(f(λ)) at a fixed Fourier frequency λ = k/n, with 0 < k < [n/2], but
not simultaneously for λ ∈ (0, 0.5).

Example 5.2.8. In continuation of Example 5.2.7 we want to estimate the
spectral density f(λ) = 1 − 1.2 cos(2πλ) + 0.36 of the MA(1)-process
Y_t = ε_t − 0.6ε_{t−1} using the discrete spectral average estimator
f̂_n(λ) with the weights 1, 3, 6, 9, 12, 15, 18, 20, 21, 21, 21, 20, 18,
15, 12, 9, 6, 3, 1, each divided by 231. These weights are generated by
iterating simple moving averages of lengths 3, 7 and 11. Plot 5.2.3a
displays the logarithms of the estimates, of the true spectral density
and the pertaining confidence intervals.
Plot 5.2.3a: Logarithms of discrete spectral average estimates (broken
line), of spectral density f(λ) = 1 − 1.2 cos(2πλ) + 0.36 (solid line) of
MA(1)-process Y_t = ε_t − 0.6ε_{t−1}, t = 1, ..., n = 160, and confidence
intervals of level 1 − α = 0.95 for log(f(k/n)).
1 /* ma1_logdsae.sas */
2 TITLE1 ’Logarithms of spectral density ,’;
3 TITLE2 ’of their estimates and confidence intervals ’;
4 TITLE3 ’of MA(1)-process ’;
5
6 /* Generate MA(1)-process */
7 DATA data1;
8 DO t=0 TO 160;
9 e=RANNOR (1);
10 y=e-.6* LAG(e);
11 OUTPUT;
12 END;
13
14 /* Estimation of spectral density */
15 PROC SPECTRA DATA=data1(FIRSTOBS =2) S OUT=data2;
16 VAR y; WEIGHTS 1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1;
17 RUN;
18
19 /* Adjusting different definitions and computation of confidence
→bands */
20 DATA data3; SET data2;
21 lambda=FREQ /(2* CONSTANT(’PI ’));
22 log_s_01=LOG(S_01 /2*4* CONSTANT(’PI ’));
23 nu =2/(3763/53361);
24 c1=log_s_01+LOG(nu)-LOG(CINV (.975,nu));
25 c2=log_s_01+LOG(nu)-LOG(CINV (.025,nu));
26
27 /* Compute underlying spectral density */
28 DATA data4;
29 DO l=0 TO .5 BY 0.01;
30 log_f=LOG ((1 -1.2* COS(2* CONSTANT(’PI ’)*l)+.36));
31 OUTPUT;
32 END;
33
34 /* Merge the data sets */
35 DATA data5;
36 MERGE data3(KEEP=log_s_01 lambda c1 c2) data4;
37
38 /* Graphical options */
39 AXIS1 LABEL=NONE;
40 AXIS2 LABEL=(F=CGREEK ’l’) ORDER =(0 TO .5 BY .1);
41 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
42 SYMBOL2 I=JOIN C=RED V=NONE L=2;
43 SYMBOL3 I=JOIN C=GREEN V=NONE L=33;
44
45 /* Plot underlying and estimated spectral density */
46 PROC GPLOT DATA=data5;
47 PLOT log_f*l=1 log_s_01*lambda =2 c1*lambda =3 c2*lambda =3 /
→OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
48 RUN; QUIT;
Program 5.2.3: Computation of logarithms of discrete spectral average
estimates, of spectral density and confidence intervals.
This program starts identically to Program 5.2.2
(ma1_blackman_tukey.sas) with the generation of an MA(1)-process and the
computation of the spectral density estimator. Only this time the
weights are directly given to SAS.
In the next DATA step the usual adjustment of the frequencies is done.
This is followed by the computation of ν according to its definition.
The logarithm of the confidence intervals is calculated with the help of
the function CINV which returns quantiles of a χ^2-distribution with ν
degrees of freedom.
The rest of the program, which displays the logarithm of the estimated
spectral density, of the underlying density and of the confidence
intervals, is analogous to Program 5.2.2 (ma1_blackman_tukey.sas).
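The value 3763/53361 hard-coded in line 23 of Program 5.2.3 is precisely
Σ_{|j|≤m} a_{jn}^2 for the weights of Example 5.2.8 (note that
231^2 = 53361). The following DATA step is a small sketch, with names of
our own choosing, that recomputes this sum and the resulting equivalent
degree of freedom ν:

/* Sketch: equivalent degree of freedom nu = 2 / sum of squared weights */
DATA nu_check;
   ARRAY w{19} _TEMPORARY_ (1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1);
   asum2=0;
   DO j=1 TO 19;
      asum2=asum2+(w{j}/231)**2;     /* weights a_jn = w_j/231             */
   END;
   nu=2/asum2;                       /* equals 2/(3763/53361), about 28.4  */
   OUTPUT;
   KEEP asum2 nu;
PROC PRINT DATA=nu_check;
RUN;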
Exercises
5.1. For independent random variables X, Y having continuous dis-
tribution functions it follows that P{X = Y } = 0. Hint: Fubini’s
theorem.
5.2. Let X_1, ..., X_n be iid random variables with values in R and
distribution function F. Denote by X_{1:n} ≤ ··· ≤ X_{n:n} the pertaining
order statistics. Then we have

    P{X_{k:n} ≤ t} = Σ_{j=k}^{n} \binom{n}{j} F(t)^j (1 − F(t))^{n−j},   t ∈ R.

The maximum X_{n:n} has in particular the distribution function F^n, and
the minimum X_{1:n} has distribution function 1 − (1 − F)^n. Hint:
{X_{k:n} ≤ t} = { Σ_{j=1}^{n} 1_{(−∞,t]}(X_j) ≥ k }.

5.3. Suppose in addition to the conditions in Exercise 5.2 that F has a
(Lebesgue) density f. The ordered vector (X_{1:n}, ..., X_{n:n}) then has
a density

    f_n(x_1, ..., x_n) = n! Π_{j=1}^{n} f(x_j),   x_1 < ··· < x_n,

and zero elsewhere. Hint: Denote by S_n the group of permutations of
{1, ..., n}, i.e., (τ(1), ..., τ(n)) with τ ∈ S_n is a permutation of
(1, ..., n). Put for τ ∈ S_n the set B_τ := {X_{τ(1)} < ··· < X_{τ(n)}}.
These sets are disjoint and we have P( ∪_{τ∈S_n} B_τ ) = 1 since
P{X_j = X_k} = 0 for j ≠ k (cf. Exercise 5.1).
5.4. (i) Let X and Y be independent, standard normal distributed random
variables. Show that the vector (X, Z)^T := (X, ρX + √(1 − ρ^2) Y)^T,
−1 < ρ < 1, is normal distributed with mean vector (0, 0) and covariance
matrix

    ( 1  ρ
      ρ  1 ),

and that X and Z are independent if and only if they are uncorrelated
(i.e., ρ = 0).

(ii) Suppose that X and Y are normal distributed and uncorrelated. Does
this imply the independence of X and Y? Hint: Let X be N(0, 1)-
distributed and define the random variable Y = V X with V independent of
X and P{V = −1} = 1/2 = P{V = 1}.

5.5. Generate 100 independent and standard normal random variables ε_t
and plot the periodogram. Is the hypothesis that the observations were
generated by a white noise rejected at level α = 0.05 (0.01)? Visualize
the Bartlett–Kolmogorov–Smirnov test by plotting the empirical
distribution function of the cumulated periodograms S_j, 1 ≤ j ≤ 48,
together with the pertaining bands for α = 0.05 and α = 0.01

    y = x ± c_α/√(m−1),   x ∈ (0, 1).

5.6. Generate the values

    Y_t = cos(2π(1/6)t) + ε_t,   t = 1, ..., 300,

where ε_t are independent and standard normal. Plot the data and the
periodogram. Is the hypothesis Y_t = ε_t rejected?
5.7. (Share Data) Test the hypothesis that the share data were generated
by independent and identically normal distributed random variables and
plot the periodogram. Plot also the original data.

5.8. (Kronecker's lemma) Let (a_j)_{j≥0} be an absolutely summable
complex valued filter. Show that lim_{n→∞} Σ_{j=0}^{n} (j/n)|a_j| = 0.

5.9. The normal distribution N(0, σ^2) satisfies

    (i)   ∫ x^{2k+1} dN(0, σ^2)(x) = 0,   k ∈ N ∪ {0}.
    (ii)  ∫ x^{2k} dN(0, σ^2)(x) = 1 · 3 · ··· · (2k − 1) σ^{2k},   k ∈ N.
    (iii) ∫ |x|^{2k+1} dN(0, σ^2)(x) = (2^{k+1}/√(2π)) k! σ^{2k+1},   k ∈ N ∪ {0}.

5.10. Show that a χ^2(ν)-distributed random variable Y satisfies
E(Y) = ν and Var(Y) = 2ν. Hint: Exercise 5.9.
5.11. (Slutzky’s lemma) Let X, X
n
, n ∈ N, be random variables
in R with distribution functions F
X
and F
X
n
, respectively. Suppose
that X
n
converges in distribution to X (denoted by X
n

D
X) i.e.,
F
X
n
(t) → F
X
(t) for every continuity point of F
X
as n → ∞. Let
Y
n
, n ∈ N, be another sequence of random variables which converges
stochastically to some constant c ∈ R, i.e., lim
n→∞
P{|Y
n
−c| > ε} = 0
for arbitrary ε > 0. This implies
Exercises 211
(i) X
n
+Y
n

D
X +c.
(ii) X
n
Y
n

D
cX.
(iii) X
n
/Y
n

D
X/c, if c = 0.
This entails in particular that stochastic convergence implies conver-
gence in distribution. The reverse implication is not true in general.
Give an example.
5.12. Show that the distribution function F
m
of Fisher’s test statistic
κ
m
satisfies under the condition of independent and identically normal
observations ε
t
F
m
(x+ln(m)) = P{κ
m
≤ x+ln(m)}
m→∞
−→ exp(−e
−x
) =: G(x), x ∈ R.
The limiting distribution G is known as the Gumbel distribution.
Hence we have P{κ
m
> x} = 1 − F
m
(x) ≈ 1 − exp(−me
−x
). Hint:
Exercise 5.2 and 5.11.
5.13. What effect does an outlier have on the periodogram? Check this for the
simple model $(Y_t)_{t=1,\dots,n}$ ($t_0 \in \{1, \dots, n\}$)
$$
Y_t =
\begin{cases}
\varepsilon_t, & t \neq t_0 \\
\varepsilon_t + c, & t = t_0,
\end{cases}
$$
where the $\varepsilon_t$ are independent and identically normal $N(0, \sigma^2)$-distributed
and $c \neq 0$ is an arbitrary constant. Show to this end
$$
\mathrm{E}(I_Y(k/n)) = \mathrm{E}(I_\varepsilon(k/n)) + c^2/n,
$$
$$
\mathrm{Var}(I_Y(k/n)) = \mathrm{Var}(I_\varepsilon(k/n)) + 2c^2\sigma^2/n, \qquad k = 1, \dots, [(n-1)/2].
$$
5.14. Suppose that $U_1, \dots, U_n$ are uniformly distributed on $(0, 1)$ and let
$\hat{F}_n$ denote the pertaining empirical distribution function. Show that
$$
\sup_{0 \le x \le 1} |\hat{F}_n(x) - x|
= \max_{1 \le k \le n} \Bigl\{ \max\Bigl( U_{k:n} - \frac{k-1}{n},\; \frac{k}{n} - U_{k:n} \Bigr) \Bigr\}.
$$
5.15. (Monte Carlo Simulation) For $m$ large we have under the hypothesis
$$
P\{\sqrt{m-1}\, \Delta_{m-1} > c_\alpha\} \approx \alpha.
$$
For different values of $m$ ($> 30$) generate 1000 times the test statistic
$\sqrt{m-1}\, \Delta_{m-1}$ based on independent random variables and check how often
this statistic exceeds the critical values $c_{0.05} = 1.36$ and $c_{0.01} = 1.63$.
Hint: Exercise 5.14.
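A Monte Carlo experiment of this kind can be set up in SAS along the following
lines. The sketch uses the representation of the Kolmogorov-Smirnov statistic from
Exercise 5.14, applied to simulated uniform variables; the data set and variable
names, the seed and the fixed choice m = 50 are illustrative assumptions only.

/* Monte Carlo for sqrt (m -1) * Delta_ {m -1} , here m =50 */
DATA mc ;
   m =50; n =m -1;
   ARRAY u (49) u1 - u49 ;              /* n = m -1 = 49 uniforms */
   DO rep =1 TO 1000;
      DO k =1 TO n ; u ( k ) = RANUNI (4711) ; END ;
      CALL SORTN ( OF u1 - u49 ) ;        /* order statistics */
      delta =0;
      DO k =1 TO n ;
         d = MAX ( u ( k ) -(k -1) /n , k /n - u ( k ) ) ;
         IF d > delta THEN delta = d ;
      END ;
      ks = SQRT ( n ) * delta ;
      exceed05 =( ks >1.36) ;
      exceed01 =( ks >1.63) ;
      OUTPUT ;
   END ;
   KEEP rep ks exceed05 exceed01 ;
RUN ;

PROC MEANS DATA = mc MEAN ;              /* relative exceedance frequencies */
   VAR exceed05 exceed01 ;
RUN ;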
5.16. In the situation of Theorem 5.2.4 show that the spectral density $f$ of
$(Y_t)_t$ is continuous.
5.17. Complete the proof of Theorem 5.2.4 (ii) for the remaining cases
λ = µ = 0 and λ = µ = 0.5.
5.18. Verify that the weights (5.13) defined via a kernel function sat-
isfy the conditions (5.11).
5.19. Use the IML function ARMASIM to simulate the process
$$
Y_t = 0.3\, Y_{t-1} + \varepsilon_t - 0.5\, \varepsilon_{t-1}, \qquad 1 \le t \le 150,
$$
where $\varepsilon_t$ are independent and standard normal. Plot the periodogram and
estimates of the log spectral density together with confidence intervals. Compare
the estimates with the log spectral density of $(Y_t)_{t \in \mathbb{Z}}$.
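The simulation could start roughly as follows. This is only a sketch: the exact
argument conventions of ARMASIM (in particular the sign convention and the leading 1
in the coefficient vectors) should be checked against the SAS/IML documentation, the
seed and the weight sequence are arbitrary, and the names are free choices.

PROC IML ;
   /* ARMA (1 ,1) : (1 - 0.3 B ) Y_t = (1 - 0.5 B ) eps_t */
   phi   = {1 -0.3};                  /* AR polynomial coefficients */
   theta = {1 -0.5};                  /* MA polynomial coefficients */
   y = ARMASIM ( phi , theta , 0 , 1 , 150 , 4711) ;
   CREATE series FROM y [ COLNAME ={ " y " }];
   APPEND FROM y ;
   CLOSE series ;
QUIT ;

/* periodogram and discrete spectral average estimator */
PROC SPECTRA DATA = series P S OUT = spec ;
   WEIGHTS 1 2 3 2 1;                 /* an example weight sequence */
   VAR y ;
RUN ;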
5.20. Compute the distribution of the periodogram $I_\varepsilon(1/2)$ for independent
and identically normal $N(0, \sigma^2)$-distributed random variables
$\varepsilon_1, \dots, \varepsilon_n$ in case of an even sample size $n$.
5.21. Suppose that Y follows a gamma distribution with parameters
p and b. Calculate the mean and the variance of Y .
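For orientation, and assuming the parametrization with density
$f(x) = \frac{b^p}{\Gamma(p)}\, x^{p-1} e^{-bx}$, $x > 0$ (if the text uses $b$ as a
scale parameter, replace $b$ by $1/b$), the moments follow from the Gamma integral:
\[
  \mathrm{E}(Y^k) = \int_0^\infty x^{k}\, \frac{b^p}{\Gamma(p)}\, x^{p-1} e^{-bx}\, dx
  = \frac{\Gamma(p+k)}{\Gamma(p)\, b^{k}},
\]
\[
  \mathrm{E}(Y) = \frac{p}{b}, \qquad
  \mathrm{Var}(Y) = \frac{p(p+1)}{b^2} - \frac{p^2}{b^2} = \frac{p}{b^2}.
\]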
5.22. Compute the length of the confidence interval $C_{\nu,\alpha}(k/n)$ for fixed
$\alpha$ (preferably $\alpha = 0.05$) but for various $\nu$. For the calculation of
$\nu$ use the weights generated by the kernel $K(x) = 1 - |x|$, $-1 \le x \le 1$
(see equation (5.13)).
5.23. Show that for $y \in \mathbb{R}$
$$
\frac{1}{n} \Bigl| \sum_{t=0}^{n-1} e^{i2\pi y t} \Bigr|^2
= \sum_{|t| < n} \Bigl( 1 - \frac{|t|}{n} \Bigr) e^{i2\pi y t} = K_n(y),
$$
where
$$
K_n(y) =
\begin{cases}
n, & y \in \mathbb{Z} \\[4pt]
\dfrac{1}{n} \Bigl( \dfrac{\sin(\pi y n)}{\sin(\pi y)} \Bigr)^2, & y \notin \mathbb{Z},
\end{cases}
$$
is the Fejér kernel of order $n$. Verify that it has the properties
(i) $K_n(y) \ge 0$,
(ii) the Fejér kernel is a periodic function of period length one,
(iii) $K_n(y) = K_n(-y)$,
(iv) $\int_{-0.5}^{0.5} K_n(y)\, dy = 1$,
(v) $\int_{-\delta}^{\delta} K_n(y)\, dy \xrightarrow{\,n \to \infty\,} 1$, $\quad \delta > 0$.
5.24. (Nile Data) The Nile Data record the lowest annual water levels of the river
Nile between 715 and 1284. These data are among the longest time series in
hydrology. Can the trend removed Nile Data be considered as being generated by a
white noise, or are there hidden periodicities? Estimate the spectral density in
this case. Use discrete spectral estimators as well as lag window spectral density
estimators. Compare with the spectral density of an AR(1)-process.
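For the practical part of this exercise the following skeleton may serve as a
starting point; the file name of the Nile data, the linear detrending step and the
chosen weights are illustrative assumptions only.

/* read the data and remove a linear trend ( illustrative ) */
DATA nile ;
   INFILE ’c :\ data \ nile . txt ’;
   INPUT level @@ ;
   t = _N_ ;
RUN ;

PROC REG DATA = nile NOPRINT ;
   MODEL level = t ;
   OUTPUT OUT = detrended R = resid ;    /* residuals = trend removed series */
RUN ;

/* periodogram , white noise test and discrete spectral average estimator */
PROC SPECTRA DATA = detrended P S WHITETEST OUT = spec ;
   WEIGHTS 1 2 3 2 1;                 /* an example weight sequence */
   VAR resid ;
RUN ;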
Bibliography
[1] Andrews, D.W.K. (1991). Heteroscedasticity and autocorrelation consistent
covariance matrix estimation. Econometrica 59, 817–858.
[2] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley, New
York.
[3] Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. John
Wiley, New York.
[4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity.
J. Econometrics 31, 307–327.
[5] Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations (with dis-
cussion). J. R. Stat. Soc. B 26, 211–252.
[6] Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and
Control. Rev. ed. Holden–Day, San Francisco.
[7] Box, G.E.P. and Pierce, D.A. (1970). Distribution of residual autocorrelations in
autoregressive-integrated moving average time series models. J. Amer. Statist.
Assoc. 65, 1509–1526.
[8] Box, G.E.P. and Tiao, G.C. (1977). A canonical analysis of multiple time series.
Biometrika 64, 355–365.
[9] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods.
2nd ed. Springer Series in Statistics, Springer, New York.
[10] Brockwell, P.J. and Davis, R.A. (2002). Introduction to Time Series and Fore-
casting. 2nd ed. Springer Texts in Statistics, Springer, New York.
[11] Cleveland, W.P. and Tiao, G.C. (1976). Decomposition of seasonal time series:
a model for the census X–11 program. J. Amer. Statist. Assoc. 71, 581–587.
[12] Conway, J.B. (1975). Functions of One Complex Variable, 2nd. printing.
Springer, New York.
[13] Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with esti-
mates of the variance of UK inflation. Econometrica 50, 987–1007.
[14] Engle, R.F. and Granger, C.W.J. (1987). Co–integration and error correction:
representation, estimation and testing. Econometrica 55, 251–276.
[15] Falk, M., Marohn, F. and Tewes, B. (2002). Foundations of Statistical Analyses
and Applications with SAS. Birkhäuser, Basel.
[16] Fan, J. and Gijbels, I. (1996). Local Polynomial Modeling and Its Application
— Theory and Methodologies. Chapman and Hall, New York.
[17] Feller, W. (1971). An Introduction to Probability Theory and Its Applications.
Vol. II, 2nd. ed. John Wiley, New York.
[18] Fuller, W.A. (1976). Introduction to Statistical Time Series. John Wiley, New
York.
[19] Granger, C.W.J. (1981). Some properties of time series data and their use in
econometric model specification. J. Econometrics 16, 121–130.
[20] Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press,
Princeton.
[21] Hannan, E.J. and Quinn, B.G. (1979). The determination of the order of an
autoregression. J.R. Statist. Soc. B 41, 190–195.
[22] Janacek, G. and Swift, L. (1993). Time Series. Forecasting, Simulation, Ap-
plications. Ellis Horwood, New York.
[23] Kalman, R.E. (1960). A new approach to linear filtering. Trans. ASME J. of
Basic Engineering 82, 35–45.
[24] Kendall, M.G. and Ord, J.K. (1993). Time Series. 3rd ed. Arnold, Sevenoaks.
[25] Ljung, G.M. and Box, G.E.P. (1978). On a measure of lack of fit in time series
models. Biometrika 65, 297–303.
[26] Murray, M.P. (1994). A drunk and her dog: an illustration of cointegration
and error correction. The American Statistician 48, 37–39.
[27] Newton, H.J. (1988). Timeslab: A Time Series Analysis Laboratory. Brooks/-
Cole, Pacific Grove.
[28] Parzen, E. (1957). On consistent estimates of the spectrum of a stationary
time series. Ann. Math. Statist. 28, 329–348.
[29] Phillips, P.C. and Ouliaris, S. (1990). Asymptotic properties of residual based
tests for cointegration. Econometrica 58, 165–193.
[30] Phillips, P.C. and Perron, P. (1988). Testing for a unit root in time series
regression. Biometrika 75, 335–346.
[31] Quenouille, M.H. (1957). The Analysis of Multiple Time Series. Griffin, Lon-
don.
[32] Reiss, R.D. (1989). Approximate Distributions of Order Statistics. With Ap-
plications to Nonparametric Statistics. Springer Series in Statistics, Springer,
New York.
[33] Rudin, W. (1974). Real and Complex Analysis, 2nd. ed. McGraw-Hill, New
York.
[34] SAS Institute Inc. (1992). SAS Procedures Guide, Version 6, Third Edition.
SAS Institute Inc. Cary, N.C.
[35] Shiskin, J. and Eisenpress, H. (1957). Seasonal adjustment by electronic com-
puter methods. J. Amer. Statist. Assoc. 52, 415–449.
[36] Shiskin, J., Young, A.H. and Musgrave, J.C. (1967). The X–11 variant of census
method II seasonal adjustment program. Technical paper no. 15, Bureau of the
Census, U.S. Dept. of Commerce.
[37] Simonoff, J.S. (1996). Smoothing Methods in Statistics. Springer Series in Sta-
tistics, Springer, New York.
[38] Tintner, G. (1958). Eine neue Methode für die Schätzung der logistischen Funk-
tion. Metrika 1, 154–157.
[39] Tukey, J. (1949). The sampling theory of power spectrum estimates. Proc.
Symp. on Applications of Autocorrelation Analysis to Physical Problems,
NAVEXOS-P-735, Office of Naval Research, Washington, 47–67.
[40] Wallis, K.F. (1974). Seasonal adjustment and relations between variables. J.
Amer. Statist. Assoc. 69, 18–31.
[41] Wei, W.W.S. (1990). Time Series Analysis. Univariate and Multivariate Meth-
ods. Addison–Wesley, New York.
Index
Autocorrelation
-function, 34
Autocovariance
-function, 34
Bandwidth, 165
Box–Cox Transformation, 39
Cauchy–Schwarz inequality, 36
Census
U.S. Bureau of the, 21
X-11 Program, 21
X-12 Program, 23
Cointegration, 79
regression, 80
Complex number, 45
Confidence band, 185
Conjugate complex number, 45
Correlogram, 34
Covariance generating function,
52, 53
Covering theorem, 183
Cramér–Wold device, 193
Critical value, 183
Cycle
Juglar, 23
Césaro convergence, 175
Data
Airline, 36, 114, 185
Bankruptcy, 43, 140
Car, 124
Electricity, 29
Gas, 125
Hog, 80, 83
Hogprice, 80
Hogsuppl, 80
Hongkong, 88
Income, 13
Nile, 213
Population1, 9
Population2, 40
Public Expenditures, 41
Share, 210
Star, 128, 136
Sunspot, iii, 35, 199
temperatures, 20
Unemployed Females, 41
Unemployed1, 2, 19, 24, 43
Unemployed2, 41
Wolf, see Sunspot Data
Wölfer, see Sunspot Data
Difference
seasonal, 31
Discrete spectral average estima-
tor, 195, 196
Distribution
χ²-, 181, 205
t-, 86
exponential, 181, 182
gamma, 205
Gumbel, 183
uniform, 182
Distribution function
empirical, 182
Drunkard’s walk, 79
Equivalent degree of freedom, 206
Error correction, 79, 83
Euler’s equation, 139
Exponential Smoother, 32
Fatou’s lemma, 49
Filter
absolutely summable, 47
band pass, 165
difference, 28
exponential smoother, 32
high pass, 165
inverse, 55
linear, 16, 195
low pass, 165
low-pass, 17
Fisher’s kappa statistic, 183
Forecast
by an exponential smoother,
34
Fourier transform, 139
inverse, 145
Function
quadratic, 97
allometric, 13
Cobb–Douglas, 13
logistic, 6
Mitscherlich, 11
positive semidefinite, 152
transfer, 159
Gain, see Power transfer func-
tion
Gamma function, 86, 205
Gibb’s phenomenon, 167
Gumbel distribution, 211
Hang Seng
closing index, 88
Helly’s selection theorem, 155
Henderson moving average, 23
Herglotz’s theorem, 153
Imaginary part
of a complex number, 45
Information criterion
Akaike’s, 95
Bayesian, 95
Hannan-Quinn, 95
Innovation, 86
Input, 17
Intercept, 13
Kalman
filter, 109, 112
h-step prediction, 113
prediction step of, 112
updating step of, 113
gain, 111, 112
recursions, 109
Kernel
Bartlett, 202
Blackman–Tukey, 202, 203
Fejer, 213
function, 201
Parzen, 202
triangular, 202
truncated, 202
Tukey–Hanning, 202
Kolmogorov’s theorem, 152, 153
Kolmogorov–Smirnov statistic, 184
Kronecker’s lemma, 210
Leakage phenomenon, 199
Least squares, 98
estimate, 80
filter design, 165
Likelihood function, 97
Lindeberg condition, 193
Linear process, 196
Log returns, 88
Loglikelihood function, 97
Maximum impregnation, 8
Maximum likelihood estimator,
88, 97
Moving average, 195
North Rhine-Westphalia, 8
Nyquist frequency, 158
Observation equation, 105
Order statistics, 182
Output, 17
Periodogram, 136, 181
cumulated, 182
Power transfer function, 159
Principle of parsimony, 94
Process
ARCH(p), 86
GARCH(p, q), 88
autoregressive, 60
stationary, 47
autoregressive moving aver-
age, 71
Random walk, 78
Real part
of a complex number, 45
Residual sum of squares, 99
Slope, 13
Slutzky’s lemma, 210
Spectral
density, 151, 154
of AR(p)-processes, 168
of ARMA-processes, 167
of MA(q)-processes, 168
distribution function, 153
Spectrum, 151
State, 105
equation, 105
State-space
model, 105
representation, 105
Test
augmented Dickey–Fuller, 80
Bartlett–Kolmogorov–Smirnov,
184
Bartlett-Kolmogorov-Smirnov,
184
Box–Ljung, 100
Box–Pierce, 100
Dickey–Fuller, 80
Durbin–Watson, 80
of Fisher for hidden period-
icities, 183
of Phillips–Ouliaris for coin-
tegration, 83
Portmanteau, 100
Time series
seasonally adjusted, 19
Time Series Analysis, 1
Unit root test, 80
Volatility, 85
Weights, 16
White noise
spectral density of a, 156
X-11 Program, see Census
X-12 Program, see Census
Yule–Walker equations, 65
SAS-Index
| |, 8
@@, 36
_FREQ_, 188
_N_, 21, 38, 91, 130, 188
ADDITIVE, 25
ANGLE, 5
AXIS, 5, 25, 36, 63, 65, 167, 188
C=color, 5
CGREEK, 63
CINV, 208
COEF, 138
COMPLEX, 63
COMPRESS, 8, 77
CONSTANT, 130
CORR, 36
DATA, 4, 8, 91
DATE, 25
DECOMP, 21
DELETE, 31
DISPLAY, 31
DIST, 93
DO, 8, 167
DOT, 5
F=font, 8
FIRSTOBS, 138
FORMAT, 21
FREQ, 138
FTEXT, 63
GARCH, 93
GOPTION, 63
GOPTIONS, 31
GOUT=, 31
GREEK, 8
GREEN, 5
H=height, 5, 8
I=display style, 5, 188
IDENTIFY, 36, 71, 93
IF, 188
IGOUT, 31
INPUT, 25, 36
INTNX, 21, 25
JOIN, 5, 147
Kernel
Tukey–Hanning, 204
L=line type, 8
LABEL, 5, 63, 171
LAG, 36, 71, 204
LEAD=, 116
LEGEND, 8, 25, 63
LOG, 39
MERGE, 11
MINOR, 36, 65
MODEL, 11, 84, 93, 130
MONTHLY, 25
NDISPLAY, 31
NLAG=, 36
NOFS, 31
NOINT, 93
NONE, 65
NOOBS, 4
NOPRINT, 188
OBS, 4
ORDER, 36
OUT, 21, 138, 186
OUTCOV, 36
OUTDECOMP, 21
OUTEST, 11
OUTPUT, 8, 70, 130
P (periodogram), 138, 186, 201
P 0 (periodogram variable), 188
PARAMETERS, 11
PARTCORR, 71
PERIOD, 138
PHILLIPS, 84
PI, 130
PLOT, 6, 8, 91, 167
PROC, 4
ARIMA, 36, 71, 93, 141
AUTOREG, 84, 93
GPLOT, 5, 8, 11, 31, 36, 63,
65, 70, 77, 82, 91, 93, 138,
141, 143, 171, 188, 201,
204
GREPLAY, 31, 82, 91
MEANS, 188
NLIN, 11
PRINT, 4, 138
REG, 130
SPECTRA, 138, 143, 186, 201,
204
STATESPACE, 116
TIMESERIES, 21
X11, 25
QUIT, 4
RANNOR, 65, 204
RETAIN, 188
RUN, 4, 31
S (spectral density), 201, 204
SEASONALITY, 21
SHAPE, 63
SORT, 138
STATIONARITY, 84
STEPJ (display style), 188
SUM, 188
SYMBOL, 5, 8, 25, 63, 147, 167,
171, 188
T, 93
TC=template catalog, 31
TEMPLATE, 31
TITLE, 4
TREPLAY, 31
TUKEY, 204
V3 (template), 82
V= display style, 5
VAR, 4, 116, 138
VREF=display style, 36, 82
W=width, 5
WEIGHTS, 201
WHERE, 11, 65
WHITE, 31
WHITETEST, 186
GNU Free
Documentation License
Version 1.2, November 2002
Copyright © 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional and
useful document ”free” in the sense of freedom: to assure everyone the effective
freedom to copy and redistribute it, with or without modifying it, either commer-
cially or noncommercially. Secondarily, this License preserves for the author and
publisher a way to get credit for their work, while not being considered responsible
for modifications made by others.
This License is a kind of ”copyleft”, which means that derivative works of the
document must themselves be free in the same sense. It complements the GNU
General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software,
because free software needs free documentation: a free program should come with
manuals providing the same freedoms that the software does. But this License is
not limited to software manuals; it can be used for any textual work, regardless of
subject matter or whether it is published as a printed book. We recommend this
License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a
notice placed by the copyright holder saying it can be distributed under the terms
of this License. Such a notice grants a world-wide, royalty-free license, unlimited in
duration, to use that work under the conditions stated herein. The ”Document”,
below, refers to any such manual or work. Any member of the public is a licensee,
and is addressed as ”you”. You accept the license if you copy, modify or distribute
the work in a way requiring permission under copyright law.
A ”Modified Version” of the Document means any work containing the Document
or a portion of it, either copied verbatim, or with modifications and/or translated
into another language.
A ”Secondary Section” is a named appendix or a front-matter section of the
Document that deals exclusively with the relationship of the publishers or authors of
the Document to the Document’s overall subject (or to related matters) and contains
nothing that could fall directly within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical connection with the
subject or with related matters, or of legal, commercial, philosophical, ethical or
political position regarding them.
The ”Invariant Sections” are certain Secondary Sections whose titles are desig-
nated, as being those of Invariant Sections, in the notice that says that the Docu-
ment is released under this License. If a section does not fit the above definition of
Secondary then it is not allowed to be designated as Invariant. The Document may
contain zero Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.
The ”Cover Texts” are certain short passages of text that are listed, as Front-
Cover Texts or Back-Cover Texts, in the notice that says that the Document is
released under this License. A Front-Cover Text may be at most 5 words, and a
Back-Cover Text may be at most 25 words.
A ”Transparent” copy of the Document means a machine-readable copy, rep-
resented in a format whose specification is available to the general public, that is
suitable for revising the document straightforwardly with generic text editors or (for
images composed of pixels) generic paint programs or (for drawings) some widely
available drawing editor, and that is suitable for input to text formatters or for
automatic translation to a variety of formats suitable for input to text formatters.
A copy made in an otherwise Transparent file format whose markup, or absence
of markup, has been arranged to thwart or discourage subsequent modification by
readers is not Transparent. An image format is not Transparent if used for any
substantial amount of text. A copy that is not ”Transparent” is called ”Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a pub-
licly available DTD, and standard-conforming simple HTML, PostScript or PDF
designed for human modification. Examples of transparent image formats include
PNG, XCF and JPG. Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, SGML or XML for which the
DTD and/or processing tools are not generally available, and the machine-generated
HTML, PostScript or PDF produced by some word processors for output purposes
only.
The ”Title Page” means, for a printed book, the title page itself, plus such fol-
lowing pages as are needed to hold, legibly, the material this License requires to
appear in the title page. For works in formats which do not have any title page
as such, ”Title Page” means the text near the most prominent appearance of the
work’s title, preceding the beginning of the body of the text.
A section ”Entitled XYZ” means a named subunit of the Document whose title
either is precisely XYZ or contains XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific section name mentioned
below, such as ”Acknowledgements”, ”Dedications”, ”Endorsements”, or
”History”.) To ”Preserve the Title” of such a section when you modify the
Document means that it remains a section ”Entitled XYZ” according to this defi-
nition.
The Document may include Warranty Disclaimers next to the notice which states
that this License applies to the Document. These Warranty Disclaimers are con-
sidered to be included by reference in this License, but only as regards disclaiming
warranties: any other implication that these Warranty Disclaimers may have is void
and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license
notice saying this License applies to the Document are reproduced in all copies, and
that you add no other conditions whatsoever to those of this License. You may not
use technical measures to obstruct or control the reading or further copying of the
copies you make or distribute. However, you may accept compensation in exchange
for copies. If you distribute a large enough number of copies you must also follow
the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers)
of the Document, numbering more than 100, and the Document’s license notice
requires Cover Texts, you must enclose the copies in covers that carry, clearly and
legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover
Texts on the back cover. Both covers must also clearly and legibly identify you as
the publisher of these copies. The front cover must present the full title with all
words of the title equally prominent and visible. You may add other material on
the covers in addition. Copying with changes limited to the covers, as long as they
preserve the title of the Document and satisfy these conditions, can be treated as
verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue
the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than
100, you must either include a machine-readable Transparent copy along with each
Opaque copy, or state in or with each Opaque copy a computer-network location
from which the general network-using public has access to download using public-
standard network protocols a complete Transparent copy of the Document, free of
added material. If you use the latter option, you must take reasonably prudent
steps, when you begin distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location until at least
one year after the last time you distribute an Opaque copy (directly or through your
agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide
you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the condi-
tions of sections 2 and 3 above, provided that you release the Modified Version under
precisely this License, with the Modified Version filling the role of the Document,
thus licensing distribution and modification of the Modified Version to whoever pos-
sesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of
the Document, and from those of previous versions (which should, if there
were any, be listed in the History section of the Document). You may use
the same title as a previous version if the original publisher of that version
gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible
for authorship of the modifications in the Modified Version, together with at
least five of the principal authors of the Document (all of its principal authors,
if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version,
as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the
other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the
public permission to use the Modified Version under the terms of this License,
in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required
Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled ”History”, Preserve its Title, and add to it
an item stating at least the title, year, new authors, and publisher of the
Modified Version as given on the Title Page. If there is no section Entitled
”History” in the Document, create one stating the title, year, authors, and
publisher of the Document as given on its Title Page, then add an item
describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access
to a Transparent copy of the Document, and likewise the network locations
given in the Document for previous versions it was based on. These may be
placed in the ”History” section. You may omit a network location for a work
that was published at least four years before the Document itself, or if the
original publisher of the version it refers to gives permission.
K. For any section Entitled ”Acknowledgements” or ”Dedications”, Preserve the
Title of the section, and preserve in the section all the substance and tone of
each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text
and in their titles. Section numbers or the equivalent are not considered part
of the section titles.
M. Delete any section Entitled ”Endorsements”. Such a section may not be
included in the Modified Version.
N. Do not retitle any existing section to be Entitled ”Endorsements” or to con-
flict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may
at your option designate some or all of these sections as invariant. To do this, add
their titles to the list of Invariant Sections in the Modified Version’s license notice.
These titles must be distinct from any other section titles.
You may add a section Entitled ”Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties–for example, statements
of peer review or that the text has been approved by an organization as the author-
itative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage
of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in
the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover
Text may be added by (or through arrangements made by) any one entity. If the
Document already includes a cover text for the same cover, previously added by
you or by arrangement made by the same entity you are acting on behalf of, you
may not add another; but you may replace the old one, on explicit permission from
the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give per-
mission to use their names for publicity for or to assert or imply endorsement of
any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that
you include in the combination all of the Invariant Sections of all of the original
documents, unmodified, and list them all as Invariant Sections of your combined
work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple iden-
tical Invariant Sections may be replaced with a single copy. If there are multiple
Invariant Sections with the same name but different contents, make the title of
each such section unique by adding at the end of it, in parentheses, the name of
the original author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of Invariant Sections in
the license notice of the combined work.
In the combination, you must combine any sections Entitled ”History” in the various
original documents, forming one section Entitled ”History”; likewise combine any
sections Entitled ”Acknowledgements”, and any sections Entitled ”Dedications”.
You must delete all sections Entitled ”Endorsements”.
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various
documents with a single copy that is included in the collection, provided that you
follow the rules of this License for verbatim copying of each of the documents in all
other respects.
You may extract a single document from such a collection, and distribute it in-
dividually under this License, provided you insert a copy of this License into the
extracted document, and follow this License in all other respects regarding verbatim
copying of that document.
7. AGGREGATION WITH INDEPENDENT
WORKS
A compilation of the Document or its derivatives with other separate and indepen-
dent documents or works, in or on a volume of a storage or distribution medium, is
called an ”aggregate” if the copyright resulting from the compilation is not used to
limit the legal rights of the compilation’s users beyond what the individual works
permit. When the Document is included in an aggregate, this License does not ap-
ply to the other works in the aggregate which are not themselves derivative works
of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Doc-
ument, then if the Document is less than one half of the entire aggregate, the Doc-
ument’s Cover Texts may be placed on covers that bracket the Document within
the aggregate, or the electronic equivalent of covers if the Document is in elec-
tronic form. Otherwise they must appear on printed covers that bracket the whole
aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations
of the Document under the terms of section 4. Replacing Invariant Sections with
translations requires special permission from their copyright holders, but you may
include translations of some or all Invariant Sections in addition to the original
versions of these Invariant Sections. You may include a translation of this License,
and all the license notices in the Document, and any Warranty Disclaimers, provided
that you also include the original English version of this License and the original
versions of those notices and disclaimers. In case of a disagreement between the
translation and the original version of this License or a notice or disclaimer, the
original version will prevail.
If a section in the Document is Entitled ”Acknowledgements”, ”Dedications”, or
”History”, the requirement (section 4) to Preserve its Title (section 1) will typically
require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as ex-
pressly provided for under this License. Any other attempt to copy, modify, sub-
license or distribute the Document is void, and will automatically terminate your
rights under this License. However, parties who have received copies, or rights, from
you under this License will not have their licenses terminated so long as such parties
remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free
Documentation License from time to time. Such new versions will be similar in
spirit to the present version, but may differ in detail to address new problems or
concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Doc-
ument specifies that a particular numbered version of this License ”or any later
version” applies to it, you have the option of following the terms and conditions
either of that specified version or of any later version that has been published (not
as a draft) by the Free Software Foundation. If the Document does not specify a
version number of this License, you may choose any version ever published (not as
a draft) by the Free Software Foundation.
ADDENDUM: How to use this License for
your documents
To use this License in a document you have written, include a copy of the License
in the document and put the following copyright and license notices just after the
title page:
Copyright © YEAR YOUR NAME. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.2 or any later version pub-
lished by the Free Software Foundation; with no Invariant Sections,
no Front-Cover Texts, and no Back-Cover Texts. A copy of the li-
cense is included in the section entitled ”GNU Free Documentation
License”.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace
the ”with...Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other combination of
the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend
releasing these examples in parallel under your choice of free software license, such
as the GNU General Public License, to permit their use in free software.

A First Course on Time Series Analysis— Examples with SAS by Chair of Statistics, University of W¨rzburg. u Version 2006.Feb.01 Copyright c 2006 Michael Falk. Editors Programs Layout and Design Michael Falk, Frank Marohn, Ren´ Michel, Daniel Hofe mann, Maria Macke Bernward Tewes, Ren´ Michel, Daniel Hofmann e Peter Dinges

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no FrontCover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ”GNU Free Documentation License”.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. Windows is a trademark, Microsoft is a registered trademark of the Microsoft Corporation. The authors accept no responsibility for errors in the programs mentioned of their consequences.

Preface
The analysis of real data by means of statistical methods with the aid of a software package common in industry and administration usually is not an integral part of mathematics studies, but it will certainly be part of a future professional work. The practical need for an investigation of time series data is exemplified by the following plot, which displays the yearly sunspot numbers between 1749 and 1924. These data are also known as the Wolf or W¨lfer (a student of Wolf) Data. For a discussion of these data and o further literature we refer to Wei (1990), Example 5.2.5.

Plot 1: Sunspot data The present book links up elements from time series analysis with a selection of statistical procedures used in general practice including the

iv statistical software package SAS (Statistical Analysis System). Consequently this book addresses students of statistics as well as students of other branches such as economics, demography and engineering, where lectures on statistics belong to their academic training. But it is also intended for the practician who, beyond the use of statistical tools, is interested in their mathematical background. Numerous problems illustrate the applicability of the presented statistical procedures, where SAS gives the solutions. The programs used are explicitly listed and explained. No previous experience is expected neither in SAS nor in a special computer system so that a short training period is guaranteed. This book is meant for a two semester course (lecture, seminar or practical training) where the first two chapters can be dealt with in the first semester. They provide the principal components of the analysis of a time series in the time domain. Chapters 3, 4 and 5 deal with its analysis in the frequency domain and can be worked through in the second term. In order to understand the mathematical background some terms are useful such as convergence in distribution, stochastic convergence, maximum likelihood estimator as well as a basic knowledge of the test theory, so that work on the book can start after an introductory lecture on stochastics. Each chapter includes exercises. An exhaustive treatment is recommended. Due to the vast field a selection of the subjects was necessary. Chapter 1 contains elements of an exploratory time series analysis, including the fit of models (logistic, Mitscherlich, Gompertz curve) to a series of data, linear filters for seasonal and trend adjustments (difference filters, Census X − 11 Program) and exponential filters for monitoring a system. Autocovariances and autocorrelations as well as variance stabilizing techniques (Box–Cox transformations) are introduced. Chapter 2 provides an account of mathematical models of stationary sequences of random variables (white noise, moving averages, autoregressive processes, ARIMA models, cointegrated sequences, ARCH- and GARCH-processes, state-space models) together with their mathematical background (existence of stationary processes, covariance generating function, inverse and causal filters, stationarity condition, Yule–Walker equations, partial autocorrelation). The Box–Jenkins program for the specification of ARMAmodels is discussed in detail (AIC, BIC and HQC information cri-

Chapter 4 gives an account of the analysis of the spectrum of the stationary process (spectral distribution function. filter design) and the spectral densities of ARMA-processes are computed. Extra . kernel estimator.v terion). the line . The analysis of a series of data in the frequency domain starts in Chapter 3 (harmonic waves. discrete spectral average estimator.font . periodogram. Some basic elements of a statistical analysis of a series of data in the frequency domain are provided in Chapter 5. Gaussian processes and maximum likelihod estimation in Gaussian models are introduced as well as least squares estimators as a nonparametric alternative. The Kalman filter as a unified prediction technique closes the analysis of a time series in the time domain. ( Also . Bartlett–Kolmogorov–Smirnov test) together with the estimation of the spectral density (periodogram. For better clearness the SAS-specific part. including the diagrams generated with SAS.) 7 Program 2: Sample program .long lines will be broken into smaller lines with →continuation marked by an arrow and indentation . low pass and high pass filters. Fourier frequencies. which are introduced at the end of Chapter 2. This book is consecutively subdivided in a statistical part and a SASspecific part. The proof of the fact that the periodogram is the Fourier transform of the empirical autocovariance function is given. confidence intervals). spectral density. Fourier transform and its inverse). This links the analysis in the time domain with the analysis in the frequency domain. Herglotz’s theorem). The diagnostic check includes the Box– Ljung test. is between two horizontal bars. The problem of testing for a white noise is dealt with (Fisher’s κ-statistic. The effects of a linear filter are studied (transfer and power transfer function. separating it from the rest of the text. */ Program code will be set in typewriter . Many models of time series can be embedded in statespace models. 1 2 3 4 5 6 /* This is a sample comment . */ /* The first comment in each program will be its name .number is missing in this case .

you will find a step-by-step explanation of the above program. The keywords will be set in typewriter-font. . Please note that SAS cannot be explained as a whole this way. Only the actually used commands will be mentioned.vi In this area.

. . . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . .2 Moving Averages and Autoregressive Processes . . . 135 Exercises . . 1 2 16 34 40 45 45 58 2. . . . . 2 Models of Time Series 2. . . . . . . .4 State-Space Models . . 116 3 The Frequency Domain Approach of a Time Series 127 3. . 1. . . . 2. . .1 Least Squares Approach with Known Frequencies . . . . . .2 Linear Filtering of Time Series . 152 4. . . . . . . . . . . . . . .1 Linear Filters and Stochastic Processes . . . . . . . 105 Exercises .Contents 1 Elements of Exploratory Time Series Analysis 1. . . . . 128 3. . . . . . . . . . .1 The Additive Model for a Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . 148 4 The Spectrum of a Stationary Process 151 4. . . . . 158 . . . . . . . . . .3 Specification of ARMA-Models: The Box–Jenkins Program . . . . . . . . . .2 Linear Filters and Frequencies . . .3 Autocovariances and Autocorrelations . . . . . . . . . . . . .2 The Periodogram . . . . . . . . . . . 94 2. . . . . . . . . . . . . . . . . .1 Characterizations of Autocovariance Functions . . . . . . . . 1. . . .

. . . . . . . . . . . .3 Spectral Densities of ARMA-Processes . . 173 5 Statistical Analysis in the Frequency Domain 179 5. 188 Exercises . . . . . . . . . . . . . .1 Testing for a White Noise . . . . . . . . . . . . 208 Bibliography Index SAS-Index GNU Free Documentation Licence 215 217 222 225 . . . . . . . . . . . . . . . . . . . 179 5. . . . . . . . . . . .2 Estimating Spectral Densities . . . . 167 Exercises . . . . . . . . . .viii Contents 4. . . . . . .

Geophysics is continuously observing the shaking or trembling of the earth in order to predict possibly impending earthquakes. therefore. Among these is the wish to gain a better understanding of the data generating mechanism. The newspapers’ business sections report daily stock prices.Chapter Elements of Exploratory Time Series Analysis A time series is a sequence of observations that are arranged according to the time of their outcome. daily maximum and minimum temperatures and annual rainfall. This requires proper methods that are summarized under time series analysis. numerous reasons to record and to analyze the data of a time series. their dispersion varies in time. There are. weekly interest rates. excluded from the analysis of time series. they are often governed by a trend and they have cyclic components. the prediction of future values or the optimal control of a system. Parameters in a manufacturing process are permanently monitored in order to carry out an on-line inspection in quality assurance. 1 . The characteristic property of a time series is the fact that the data are not generated independently. obviously. monthly rates of unemployment and annual turnovers. An electroencephalogram traces brain waves made by an electroencephalograph in order to detect a cerebral disease. an electrocardiogram traces heart waves. The social sciences survey annual death and birth rates. The annual crop yield of sugar-beets and their price per ton for example is recorded in agriculture. Statistical procedures that suppose independent and identically distributed data are. Meteorology records hourly wind speeds. the number of accidents in the home and various forms of criminal activities.

and decline. are the monthly numbers of unemployed workers in the building trade in Germany from July 1975 to September 1979. and Zt reflects some nonrandom long term cyclic influence. St describes some nonrandom short term cyclic influence like a seasonal component whereas Rt is a random variable grasping all the deviations from the ideal non-stochastic model yt = Tt + Zt + St . (1. yn is the assumption that these data are realizations of random variables Yt that are themselves sums of four components Yt = T t + Z t + S t + R t . n. MONTH July August September October November December January February March T 1 2 3 4 5 6 7 8 9 UNEMPLYD 60572 52461 47357 48320 60219 84418 119916 124350 87309 .2 Elements of Exploratory Time Series Analysis 1. . 51. (1. . (Unemployed1 Data).2) describing the long term behavior of the time series. . . The variables Tt and Zt are often summarized as Gt = T t + Z t . . t = 1. .1 The Additive Model for a Time Series The additive model for a given time series y1 . Example 1. . . t = 1. called trend.1. .1) where Tt is a (monotone) function of t. growth. We suppose in the following that the expectation E(Rt ) of the error variable exists and equals zero. . The following data yt . reflecting the assumption that the random deviations above or below the nonrandom model balance each other on the average. recovery. Note that E(Rt ) = 0 can always be achieved by appropriately modifying one or more of the nonrandom components. Think of the famous business cycle usually consisting of recession.1. . .

txt ’.step ) */ DATA data1 . TITLE2 ’ Unemployed1 Data ’. INPUT month $ t unemplyd .1. 1 2 3 4 5 6 7 8 /* unemployed1_listing . sas */ TITLE1 ’ Listing ’.1.1 The Additive Model for a Time Series April May June July August September October November December January February March April May June July August September October November December January February March April May June July August September October November December January February March April May June July August September 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 57035 39903 34053 29905 28068 26634 29259 38942 65036 110728 108931 71517 54428 42911 37123 33044 30755 28742 31968 41427 63685 99189 104240 75304 43622 33990 26819 25291 24538 22685 23945 28245 47017 90920 89340 47792 28448 19139 16728 16523 16622 15499 3 Listing 1.1a: Unemployed1 Data. /* Read in the data ( Data . . INFILE ’c :\ data \ unemployed1 .

’dress up’ of the display etc. The DATA step started with the DATA statement creates a temporary dataset named data1.4 Elements of Exploratory Time Series Analysis 9 10 11 12 /* Print the data ( Proc . Entering RUN. For each variable one value per line is read from the source into the computer’s memory. The following plot of the Unemployed1 Data shows a seasonal component and a downward trend. The QUIT. we will use the syntax of MS-DOS. The statement PROC procedurename DATA=filename. A line starting with an asterisk * and ending with a semicolon . at any point of the program tells SAS that a unit of work (DATA step or PROC) ended.step ) */ PROC PRINT DATA = data1 NOOBS . Its printing is actually suppressed here and in the following. RUN . where the first one contains character values. An optional VAR statement determines the order (from left to right) in which variables are displayed. This is determined by the $ sign behind the variable name.1. statement at the end terminates the processing of SAS. Three variables are defined here. The purpose of INFILE is to link the DATA step to a raw dataset outside the program. . The SAS internal observation number (OBS) is printed by default. all variables in the data set will be printed in the order they were defined to SAS. These comment statements may occur at any point of the program except within raw data or another statement.1: Listing of Unemployed1 Data. The pathname of this dataset depends on the operating system. it comes with numerous options that allow control of the variables to be printed out. is ignored. This program consists of two main parts. SAS then stops reading the program and begins to execute the unit. QUIT . NOOBS suppresses the column of observation numbers on each line of output. Program 1. a DATA and a PROC step. The TITLE statement generates a title. INPUT tells SAS how to read the data. The PRINT procedure lists the data. The period from July 1975 to September 1979 might be too short to indicate a possibly underlying long term business cycle. invokes a procedure that is linked to the data from filename. which is most commonly known. Without the option DATA=filename the most recently created file is used. If not specified (like here).

/* Read in the data */ DATA data1 . sas */ TITLE1 ’ Plot ’. Variables can be plotted by using the GPLOT procedure. ANGLE=90 causes a rotation of the label of 90◦ so that it parallels the (vertical) axis in this example. txt ’. Program 1.4 W =1. where the graphical output is controlled by numerous options.1. QUIT .2a: Unemployed1 Data. /* Graphical Options */ AXIS1 LABEL =( ANGLE =90 ’ unemployed ’) . TITLE2 ’ Unemployed1 Data ’. V=DOT C=GREEN I=JOIN H=0.4 W=1 tell SAS to plot green dots of height 0. /* Plot the data */ PROC GPLOT DATA = data1 .1 The Additive Model for a Time Series 5 Plot 1. INPUT month $ t unemplyd . The SYMBOL statement defines the manner in which the data are displayed. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 /* unemployed1_plot .1. PLOT unemplyd * t / VAXIS = AXIS1 HAXIS = AXIS2 . The AXIS statements with the LABEL options control labelling of the vertical and horizontal axes. SYMBOL1 V = DOT C = GREEN I = JOIN H =0.4 and to join them with a line of width . INFILE ’c :\ data \ unemployed1 . AXIS2 LABEL =( ’t ’) .2: Plot of Unemployed1 Data.1. RUN .

3) However. βp ) . β3 ) := β3 . . . The parameters β1 . .. . . The PLOT statement in the GPLOT procedure is of the form PLOT y-variable*x-variable / options. . β1 . . A common assumption is that the function f depends on several (unknown) parameters β1 . . A common approach is a least squares estimate β1 . (1. βp ). The Logistic Function The function flog (t) := flog (t. f (t) = f (t. They contain ˆ information about the goodness of the fit of our model to the data.. . β1 .. (1. . . βp satisfying ˆ ˆ yt − f (t. we have E(Yt ) = Tt =: f (t). β2 . . β1 . .βp t yt − f (t.5) with β1 . where the options here define the horizontal and the vertical axes. the type of the function f is known.6 Elements of Exploratory Time Series Analysis 1. if it exists at all. 1 + β2 exp(−β1 t) t ∈ R. . (1. .. and assuming E(Rt ) = 0. .e. . . β3 ∈ R \ {0} is the widely used logistic function. . The ˆ ˆ value yt := f (t. . . . ˆ The observed differences yt − yt are called residuals. βp ) can serve as a prediction of a future yt . i. βp ) 2 = min t β1 . β2 . Models with a Nonlinear Trend In the additive model Yt = Tt +Rt . where the nonstochastic component is only the trend Tt reflecting the growth of a system. is a numerical problem. β1 ..4) 2 whose computation. In the following we list several popular examples of trend functions. . . . . . βp are then to be estimated from the set of realizations yt of the random ˆ ˆ variables Yt . .. . βp . β1 .

sas */ TITLE1 ’ Plots of the Logistic Function ’.1 . AXIS1 LABEL =( H =2 ’f ’ H =1 ’log ’ H =2 ’( t ) ’) . f_log = beta3 /(1+ beta2 * EXP ( . s = COMPRESS ( ’( ’ || beta1 || ’. DO beta2 =0. SYMBOL2 C = GREEN V = NONE I = JOIN L =2.1 The Additive Model for a Time Series 7 Plot 1. DO t = -10 TO 10 BY 0. END . 1.1. /* Graphical Options */ SYMBOL1 C = GREEN V = NONE I = JOIN L =1.beta1 * t ) ) .5. END . 11 12 13 14 15 16 17 18 19 20 21 22 .’ || beta2 || ’. beta3 =1. β2 . 1. SYMBOL3 C = GREEN V = NONE I = JOIN L =3. END .5 . OUTPUT .’ || beta3 →|| ’) ’) . β3 1 2 3 4 5 6 7 8 9 10 /* logistic . DO beta1 = 0. SYMBOL4 C = GREEN V = NONE I = JOIN L =33.3a: The logistic function flog with different values of β1 .1. /* Generate the data for different logistic functions */ DATA data1 .

23 24 25 26 27 28 29 Program 1. β3 by an appropriate linear least squares approach.b ’ → H =1 ’3 ’ H =2 ’) = ’) . Here the label is modified by using a greek font (F=CGREEK) and generating smaller letters of height 1 for the indices. which is a federal state of Germany. This also automatically generates a legend. RUN . where the data file data1 is generated.2 and 1. . β2 . computed at the grid t = −10. OUTPUT then stores the values of interest of f log. .3. which is customized by the LEGEND1 statement. They generate lines of different line types (L=1. A function is plotted by computing its values at numerous grid points and then joining them. 2. .5) to the population growth of the area of North Rhine-Westphalia (NRW). The computation is done in the DATA step. 10 and indexed by the vector s of the different choices of parameters. For each value of s SAS takes a new SYMBOL statement. This is done by nested DO loops. Note that 1 1 + β2 exp(−β1 t) = flog (t) β3 1 − exp(−β1 ) 1 + β2 exp(−β1 (t − 1)) + exp(−β1 ) β3 β3 1 − exp(−β1 ) 1 + exp(−β1 ) = β3 flog (t − 1) b . . while assuming a normal height of 2 (H=1 and H=2). This can serve as a basis for estimating the parameters β1 . PLOT f_log * t = s / VAXIS = AXIS1 HAXIS = AXIS2 LEGEND = LEGEND1 . It contains the values of f log.8 Elements of Exploratory Time Series Analysis AXIS2 LABEL =( ’t ’) . The last feature is also used in the axis statement. 33). . if β1 > 0. see Exercises 1. We obviously have limt→∞ flog (t) = β3 . (1. −9. The value β3 often resembles the maximum impregnation or growth of a system. t and s (and the other variables) in the data set data1.5. LEGEND1 LABEL =( F = CGREEK H =2 ’(b ’ H =1 ’1 ’ H =2 ’. b ’ H =1 ’2 ’ H =2 ’. /* Plot the functions */ PROC GPLOT DATA = data1 . The operator || merges two strings and COMPRESS removes the empty space in the string. In the following example we fit the logistic trend model (1. QUIT . The four functions are plotted by the GPLOT procedure by adding =s in the PLOT statement.1. 3.3: Generating plots of the logistic function flog .6) =a+ flog (t − 1) = This means that there is a linear relationship among 1/flog (t).

1 The Additive Model for a Time Series Example 1.2.1 shows the population sizes yt in millions of the area of North-Rhine-Westphalia in 5 years steps from 1935 to 1980 as well as their predicted values yt .1.200 12.881 8 16.772 10. The following plot shows the data and the fitted logistic curve.1436 exp(−0.1675 t) ˆ with the estimated saturation size β3 = 21.1.176 17. (Population1 Data).158 7 16.5016 = 1 + 1.709 4 12.710 Table 1.827 3 11.661 15.442 14.914 16.4) for a logistic model.5016.565 5 14. .158 10 17.044 17.1: Population1 Data As a prediction of the population size at time t we obtain in the logistic model yt := ˆ ˆ β3 ˆ ˆ 1 + β2 exp(−β1 t) 21. Table 1.384 6 15.1.926 13. 9 Year 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 t Population sizes yt Predicted values yt ˆ (in millions) (in millions) 1 11.548 9 17.930 2 12. ˆ obtained from a least squares estimation as described in (1.1.694 15.059 11.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 /* population1 . /* Compute parameters for fitted logistic function */ PROC NLIN DATA = data1 OUTEST = estimate . PARAMETERS beta1 =1 beta2 =1 beta3 =20. /* Read in the data */ DATA data1 . INPUT year t pop . SET estimate ( WHERE =( _TYPE_ = ’ FINAL ’) ) . INFILE ’c :\ data \ population1 . /* Merge data sets */ DATA data3 . MODEL pop = beta3 /(1+ beta2 * EXP ( .beta1 * t1 ) ) . . /* Generate fitted logistic function */ DATA data2 . txt ’.beta1 * t ) ) . sas */ TITLE1 ’ Population sizes and logistic fit ’. OUTPUT . END .1.2. RUN .10 Elements of Exploratory Time Series Analysis Plot 1.4a: NRW population sizes and fitted logistic function. MERGE data1 data2 . TITLE2 ’ Population1 Data ’. f_log = beta3 /(1+ beta2 * EXP ( . DO t1 =0 TO 11 BY 0.

t ≥ 0. β3 ) := β1 + β2 exp(β3 t). 1).8) where β1 . t ≥ 0. the second data step generates the fitted logistic function values. The procedure NLIN fits nonlinear regression models by least squares. β2 ∈ R and β3 < 0. The MODEL statement defines the prediction equation by declaring the dependent variable and defining an expression that evaluates predicted values. (1. after they were stored together in a new data set data3 merging data1 and data2 with the MERGE statement. QUIT . β2 . Since β3 is negative we have limt→∞ fM (t) = β1 and thus the parameter β1 is the saturation value of the system. (1.7) where β1 .1. A PARAMETERS statement must follow the PROC NLIN statement. . SYMBOL2 V = NONE C = GREEN I = JOIN W =1.1. β2 ∈ R and β3 ∈ (0. /* Plot data with fitted function */ PROC GPLOT DATA = data3 . RUN . 11 28 29 30 31 32 33 34 35 36 37 Program 1. β1 . The Mitscherlich Function The Mitscherlich function is typically used for modelling the long term growth of a system: fM (t) := fM (t. The OUTEST option names the data set to contain the parameter estimates produced by NLIN. β2 . Using the final estimates of PROC NLIN by the SET statement in combination with the WHERE data set option. The (initial) value of the system at the time t = 0 is fM (0) = β1 + β2 . SYMBOL1 V = DOT C = GREEN I = NONE .4: NRW population sizes and fitted logistic function. β1 . β3 ) := exp(β1 + β2 β3 ). The Gompertz Curve A further quite common function for modelling the increase or decrease of a system is the Gompertz curve t fG (t) := fG (t. AXIS2 LABEL =( ’t ’) .1 The Additive Model for a Time Series /* Graphical options */ AXIS1 LABEL =( ANGLE =90 ’ population in millions ’) . The options in the GPLOT statement cause the data points and the predicted function to be shown in one plot. Each parameter=value expression specifies the starting values of the parameter. PLOT pop * t =1 f_log * t1 =2 / OVERLAY VAXIS = AXIS1 HAXIS = AXIS2 .

Plot 1.1.5a: Gompertz curves with different parameters.

/* gompertz.sas */
TITLE1 'Gompertz curves';

/* Generate the data for different Gompertz functions */
DATA data1;
   DO beta1=1;
      DO beta2=-1, 1;
         DO beta3=0.05, 0.5;
            s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
            DO t=0 TO 4 BY 0.05;
               f_g=EXP(beta1+beta2*beta3**t);
               OUTPUT;
            END;
         END;
      END;
   END;

/* Graphical Options */
SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
SYMBOL4 C=GREEN V=NONE I=JOIN L=33;

AXIS1 LABEL=(H=2 'f' H=1 'G' H=2 '(t)');
AXIS2 LABEL=('t');
LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ',b' H=1 '2' H=2 ',b' H=1 '3' H=2 ') =');

/* Plot the functions */
PROC GPLOT DATA=data1;
   PLOT f_g*t=s / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
RUN; QUIT;

Program 1.1.5: Plotting the Gompertz curves.

We obviously have

    log(f_G(t)) = β_1 + β_2 β_3^t = β_1 + β_2 exp(log(β_3) t),

and thus log(f_G) is a Mitscherlich function with parameters β_1, β_2, log(β_3). The saturation size obviously is exp(β_1).

The Allometric Function

The allometric function

    f_a(t) := f_a(t, β_1, β_2) := β_2 t^{β_1},   t ≥ 0,        (1.9)

with β_1 ∈ R, β_2 > 0, is a common trend function in biometry and economics. It can be viewed as a particular Cobb–Douglas function, which is a popular econometric model to describe the output produced by a system depending on an input. Since

    log(f_a(t)) = log(β_2) + β_1 log(t),   t > 0,

is a linear function of log(t), with slope β_1 and intercept log(β_2), we can assume a linear regression model for the logarithmic data log(y_t),

    log(y_t) = log(β_2) + β_1 log(t) + ε_t,   t ≥ 1,

where ε_t are the error variables.

Example 1.1.3. (Income Data). Table 1.1.2 shows the (accumulated) annual average increases of gross and net incomes in thousands DM (deutsche mark) in Germany, starting in 1960.

Table 1.1.2: Income Data (years 1960 to 1970, t = 0, 1, ..., 10; accumulated annual average increases of gross incomes x_t and net incomes y_t, in thousands DM).

We assume that the increase of the net income y_t is an allometric function of the time t and obtain

    log(y_t) = log(β_2) + β_1 log(t) + ε_t,   t = 1, ..., 10.        (1.10)

The least squares estimates of β_1 and log(β_2) in the above linear regression model are (see, for example, Theorem 3.2.2 in Falk et al. (2002))

    β̂_1 = Σ_{t=1}^{10} (log(t) − mean(log(t)))(log(y_t) − mean(log(y))) / Σ_{t=1}^{10} (log(t) − mean(log(t)))² = 1.019,

where mean(log(t)) := 10^{-1} Σ_{t=1}^{10} log(t) and mean(log(y)) := 10^{-1} Σ_{t=1}^{10} log(y_t), and

    log(β̂_2) = mean(log(y)) − β̂_1 mean(log(t)) = −0.7549.

We estimate β_2 therefore by β̂_2 = exp(−0.7549) = 0.4700. The predicted value ŷ_t corresponding to the time t is hence

    ŷ_t = 0.47 t^{1.019}.        (1.11)
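The least squares fit of the log-linear model (1.10) can also be obtained directly with a regression procedure. The following lines are only a minimal sketch: the data set name income and the variable names t and net are assumptions for this illustration and do not appear in the text's programs.

/* A sketch: least squares fit of the log-linear model (1.10) */
DATA income_log;
   SET income;
   IF t > 0;                  /* t = 0 must be dropped, since log(0) is not defined */
   logt = LOG(t);
   logy = LOG(net);
RUN;

PROC REG DATA=income_log;
   MODEL logy = logt;         /* the intercept estimates log(beta2), the slope estimates beta1 */
RUN;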

Table 1.1.3 lists the residuals y_t − ŷ_t, by which one can judge the goodness of fit of the model (1.11).

Table 1.1.3: Residuals of Income Data (residuals y_t − ŷ_t for t = 1, ..., 10).

A popular measure for assessing the fit is the squared multiple correlation coefficient or R²-value

    R² := 1 − Σ_{t=1}^{n} (y_t − ŷ_t)² / Σ_{t=1}^{n} (y_t − ȳ)²,        (1.12)

where ȳ := n^{-1} Σ_{t=1}^{n} y_t is the average of the observations y_t (cf Section 3.3 in Falk et al. (2002)). In the linear regression model with ŷ_t based on the least squares estimates of the parameters, R² is necessarily between zero and one, with the implication R² = 1 if and only if Σ_{t=1}^{n} (y_t − ŷ_t)² = 0 (see Exercise 1.4). A value of R² close to 1 is in favor of the fitted model. The model (1.10) has R² equal to 0.9934, and (1.11) has R² = 0.9789. Note, however, that the initial model (1.9) is not linear and β̂_2 is not the least squares estimate, in which case R² is no longer necessarily between zero and one and has therefore to be viewed with care as a crude measure of fit.

The annual average gross income in 1960 was 6148 DM and the corresponding net income was 5178 DM. The actual average gross and net incomes were therefore x̃_t := x_t + 6.148 and ỹ_t := y_t + 5.178.
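As a small numerical illustration of how (1.12) can be evaluated, the following PROC IML sketch computes the R²-value for a toy series; the observed values y and the predicted values yhat are made-up numbers and are not taken from the Income Data.

PROC IML;
   y    = {1, 2, 2, 4, 5};                   /* observed values (toy example) */
   yhat = {1.1, 1.9, 2.4, 3.8, 4.8};         /* predicted values (toy example) */
   r2 = 1 - SSQ(y - yhat) / SSQ(y - y[:]);   /* R^2 as defined in (1.12) */
   PRINT r2;
QUIT;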

The estimated model based on the above predicted values ŷ_t is then

    ỹ_t = ŷ_t + 5.178 = 0.47 t^{1.019} + 5.178.

Note that the residuals ỹ_t − (ŷ_t + 5.178) = y_t − ŷ_t are not influenced by adding the constant 5.178 to y_t. It is apparent from the residuals in Table 1.1.3 that the net income y_t is an almost perfect multiple of t for t between 1 and 9, whereas the large increase y_10 in 1970 seems to be an outlier. Actually, in 1969 the German government had changed and in 1970 a long strike in Germany caused an enormous increase in the income of civil servants. The above models might nevertheless help judging the average tax payer's situation between 1960 and 1970 and predicting his future one.

1.2 Linear Filtering of Time Series

In the following we consider the additive model (1.1) and assume that there is no long term cyclic component. Nevertheless, we allow a trend, in which case the smooth nonrandom component G_t equals the trend function T_t. Our model is, therefore, the decomposition

    Y_t = T_t + S_t + R_t,   t = 1, 2, ...        (1.13)

with E(R_t) = 0. Given realizations y_t, t = 1, 2, ..., n, of this time series, the aim of this section is the derivation of estimators T̂_t, Ŝ_t of the nonrandom functions T_t and S_t and to remove them from the time series by considering y_t − T̂_t or y_t − Ŝ_t instead. These series are referred to as the trend or seasonally adjusted time series. The data y_t are decomposed in smooth parts and irregular parts that fluctuate around zero.

Linear Filters

Let a_{−r}, a_{−r+1}, ..., a_s be arbitrary real numbers, where r, s ≥ 0, r + s + 1 ≤ n. The linear transformation

    Y*_t := Σ_{u=−r}^{s} a_u Y_{t−u},   t = s + 1, ..., n − r,

is referred to as a linear filter with weights a_{−r}, ..., a_s. The Y_t are called input and the Y*_t are called output. Obviously, there are less output data than input data, if (r, s) ≠ (0, 0). A positive value s > 0 or r > 0 causes a truncation at the beginning or at the end of the time series, respectively; see Example 1.2.1 below. For convenience, we call the vector of weights (a_u) = (a_{−r}, ..., a_s)^T a (linear) filter.

A filter (a_u), whose weights sum up to one, Σ_{u=−r}^{s} a_u = 1, is called moving average. The particular cases a_u = 1/(2s+1), u = −s, ..., s, with an odd number of equal weights, or a_u = 1/(2s), u = −s+1, ..., s−1, a_{−s} = a_s = 1/(4s), aiming at an even number of weights, are simple moving averages of order 2s+1 and 2s, respectively.

Filtering a time series aims at smoothing the irregular part of a time series, thus detecting trends or seasonal components, which might otherwise be covered by fluctuations. While for example a digital speedometer in a car can provide its instantaneous velocity, thereby showing considerably large fluctuations, an analog instrument that comes with a hand and a built-in smoothing filter reduces these fluctuations but takes a while to adjust. The latter instrument is much more comfortable to read and its information, reflecting a trend, is sufficient in most cases.

To compute the output of a simple moving average of order 2s+1, the following obvious equation is useful:

    Y*_{t+1} = Y*_t + (1/(2s+1)) (Y_{t+s+1} − Y_{t−s}).

This filter is a particular example of a low-pass filter, which preserves the slowly varying trend component of a series but removes from it the rapidly fluctuating or high frequency component. There is a trade-off between the two requirements that the irregular fluctuation should be reduced by a filter, thus leading, for example, to a large choice of s in a simple moving average, and that the long term variation in the data should not be distorted by oversmoothing, i.e., by a too large choice of s.
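A simple moving average can be sketched in a single data step with the LAG function. The following lines are only a minimal sketch for order 3; the data set name data1 and the variable name y are assumptions for the illustration and do not appear in the text's programs.

/* A sketch: simple moving average of order 3 */
DATA ma3;
   SET data1;
   ylag1 = LAG1(y);
   ylag2 = LAG2(y);
   ma = (y + ylag1 + ylag2) / 3;   /* average of y_{t-2}, y_{t-1}, y_t, i.e. the simple
                                      moving average centered at time t-1 */
   tcenter = _N_ - 1;              /* time point to which this average refers */
RUN;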

If we assume, for example, a time series Y_t = T_t + R_t without seasonal component, a simple moving average of order 2s+1 leads to

    Y*_t = (1/(2s+1)) Σ_{u=−s}^{s} Y_{t−u} = (1/(2s+1)) Σ_{u=−s}^{s} T_{t−u} + (1/(2s+1)) Σ_{u=−s}^{s} R_{t−u} =: T*_t + R*_t,

where by some law of large numbers argument R*_t ∼ E(R_t) = 0, if s is large. But T*_t might then no longer reflect T_t. A small choice of s, however, has the effect that R*_t is not yet close to its expectation.

Seasonal Adjustment

A simple moving average of a time series Y_t = T_t + S_t + R_t now decomposes as

    Y*_t = T*_t + S*_t + R*_t,

where S*_t is the pertaining moving average of the seasonal components. Suppose, moreover, that S_t is a p-periodic function, i.e., S_t = S_{t+p}, t = 1, ..., n − p. Take for instance monthly average temperatures Y_t measured at fixed points, in which case it is reasonable to assume a periodic seasonal component S_t with period p = 12 months. A simple moving average of order p then yields a constant value S*_t = S, t = p, p+1, ..., n − p. By adding this constant S to the trend function T_t and putting T'_t := T_t + S, we can assume in the following that S = 0. Thus we obtain for the differences

    D_t := Y_t − Y*_t ∼ S_t + R_t

and, hence, averaging these differences yields

    D̄_t := (1/n_t) Σ_{j=0}^{n_t−1} D_{t+jp} ∼ S_t,   t = p+1, ..., 2p,        D̄_t := D̄_{t−p}  for t > 2p,

where n_t is the number of periods available for the computation of D̄_t. Thus,

    Ŝ_t := D̄_t − (1/p) Σ_{j=1}^{p} D̄_j ∼ S_t − (1/p) Σ_{j=1}^{p} S_j = S_t        (1.14)

. Table 1. satisfying 1 p p−1 19 ˆ St+j j=0 1 =0= p p−1 St+j . which we assume to be zero by adding this constant to the trend function. Example 1. dj = dt + 12 j=1 12 .1. .1: Table of dt .1 it is obviously reasonable to assume a periodic seasonal component with p = 12 months. We obtain for these data 1 ¯ st = d t − ˆ 12 ¯ ¯ ¯ 3867 = dt + 322. 45. Dt and the estimates St of St .2. . . .2.1.1. For the 51 Unemployed1 Data in Example 1. = Yt−6 + 12 2 2 u=−5 5 t = 7. then has a constant seasonal component.1 contains ¯ ˆ the values of Dt .2 Linear Filtering of Time Series is an estimator of St = St+p = St+2p = . dt and of estimates st of the seasonal compoˆ nent St in the Unemployed1 Data. j=0 ˆ The differences Yt − St with a seasonal component close to zero are then the seasonally adjusted time series.2.25. dt (rounded values) Month January February March April May June July August September October November December 1976 53201 59929 24768 -3848 -19300 -23455 -26413 -27225 -27358 -23967 -14300 11540 1977 56974 54934 17320 42 -11680 -17516 -21058 -22670 -24646 -21397 -10846 12213 1978 48469 54102 25678 -5429 -14189 -20116 -20605 -20393 -20478 -17440 -11889 7923 1979 52611 51727 10808 – – – – – – – – – ¯ dt (rounded) 52814 55173 19643 -3079 -15056 -20362 -22692 -23429 -24161 -20935 -12345 10559 st (rounded) ˆ 53136 55495 19966 -2756 -14734 -20040 -22370 -23107 -23839 -20612 -12023 10881 ¯ Table 1. . A simple moving average of order 12 Yt∗ 1 1 1 Yt−u + Yt+6 .

date = INTNX ( ’ month ’ . 14 15 16 .2. INFILE ’c :\ data \ temperatures . txt ’. INPUT temperature .2. /* Read in the data and generate SAS . TITLE2 ’ Temperature data ’. /* Make seasonal adjustment */ PROC TIMESERIES DATA = temperatures OUT = series SEASONALITY =12 →OUTDECOMP = deseason . ’01 jan95 ’d ..2. The monthly average temperatures near W¨rzburg. 1 2 3 4 5 6 7 8 9 10 11 12 13 /* temperatures .1a. VAR temperature .formatted date */ DATA temperatures . Germany were recorded from the 1st of u January 1995 to the 31st of December 2004. sas */ TITLE1 ’ Original and seasonally adjusted data ’. DECOMP / MODE = ADD .20 Elements of Exploratory Time Series Analysis Example 1. (Temperatures Data).1a: Monthly average temperatures near W¨rzburg and seau sonally adjusted values. The data together with their seasonally adjusted counterparts can be seen in Figure 1. FORMAT date yymon . Plot 1.2. _N_ -1) .

is used to determine the distance from the starting value. With MODE=ADD an additive model of the time series is assumed. which counts the number of cases. called the Census X-11 Program. We give a brief summary of this program following Wallis (1974). QUIT . /* Graphical options */ AXIS1 LABEL =( ANGLE =90 ’ temperatures ’) . The SAS procedure TIMESERIES together with the statement DECOMP computes a seasonally adjusted series. /* Plot data and seasonally adjusted series */ PROC GPLOT data = plotseries . Bureau of the Census has developed a program for seasonal adjustment of economic time series. which results in a moving average with symmetric weights. The Census X-11 Program In the fifties of the 20th century the U. SYMBOL1 V = DOT C = GREEN I = JOIN H =1 W =1. The seasonally adjusted values can be referenced by SA and are plotted together with the original series against the date in the final step.2 Linear Filtering of Time Series /* Merge necessary data for plot */ DATA plotseries . The temporarily created variable N . consisting of four digits for the year and three for the month. The census procedure . which is stored in the file after the OUTDECOMP option. In the option SEASONALITY the underlying period is specified. It is based on monthly observations and assumes an additive model Yt = T t + S t + R t as in (1.1: Seasonal adjustment of Temperatures Data. AXIS2 LABEL =( ’ Date ’) . MERGE temperatures deseason ( KEEP = SA ) . In the data step the values for the variable temperature are read from an external file.1. SYMBOL2 V = STAR C = GREEN I = JOIN H =1 W =1.13) with a seasonal component St of period p = 12. Depending on the data it can be any natural number. The FORMAT statement attributes the format yymon to this variable. The default is a multiplicative model. The original series together with an automated time variable (just a counter) is stored in the file specified in the OUT option. 21 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Program 1. By means of the function INTNX.S. RUN . a new variable in a date format is generated containing monthly data starting from the 1st of January 1995. PLOT temperature * date =1 SA * date =2 / OVERLAY VAXIS = AXIS1 HAXIS = →AXIS2 .2.

(3) Apply a moving average of order 5 to each month separately by computing 1 (1) (1) (1) (1) (1) ¯ (1) Dt := Dt−24 + 2Dt−12 + 3Dt + 2Dt+12 + Dt+24 ∼ St . 3. but it adds iterations and various moving averages. Note that the moving average with weights (1. The X-11 Program essentially works as the seasonal adjustment described above. 2. A theoretical justification based on stochastic models is provided by Cleveland and Tiao (1976). . 2. 1)/9 is a simple moving average of length 3 of simple moving averages of length 3. 12 2 2 (5) The differences Yt (1) (1) ˆ(1) := Yt − St ∼ Tt + Rt then are the preliminary seasonally adjusted series. 9 which gives an estimate of the seasonal component St . ¯ (4) The Dt are adjusted to approximately sum up to 0 over any 12-months period by putting 1 1 ¯ (1) 1 ¯ (1) ¯ (1) ˆ(1) ¯ (1) ¯ (1) St := Dt − Dt−6 + Dt−5 + · · · + Dt+5 + Dt+6 . quite in the manner as before.22 Elements of Exploratory Time Series Analysis is discussed in Shiskin and Eisenpress (1957). The different steps of this program are (1) Compute a simple moving average Yt∗ of order 12 to leave essentially a trend Yt∗ ∼ Tt . (2) The difference Dt := Yt − Yt∗ ∼ St + Rt then leaves approximately the seasonal plus irregular component. Young and Musgrave (1967). a complete description is given by Shiskin.

2 Linear Filtering of Time Series (6) The adjusted data Yt are further smoothed by a Henderson moving average Yt∗∗ of order 9. where seven years is a typical length of business cycles observed in economics (Juglar cycle)2 . 1)/15. Observe that this leads to averages at time t of the past and future seven years. the vector of weights is (1. (8) A moving average of order 7 is applied to each month separately 3 (2) (1) 23 ¯ (2) Dt := u=−3 au Dt−12u . 3. 3.S. (9) Step (4) is repeated yielding approximately centered estimates ˆ(2) St of the seasonal components. we refer to the SAS online documentation for details.1 and higher as PROC X12. This gives a second estimate of the seasonal component St . (2) where the weights au come from a simple moving average of order 3 applied to a simple moving average of order 5 of the original data. 13. 3. (10) The differences Yt (2) ˆ(2) := Yt − St then finally give the seasonally adjusted series.html . i.. The U. roughly. Yt is a moving average of length 165.com/books/231book/ch05j. 2.1.e. It is implemented in SAS version 8.drfurfero. Depending on the length of the Henderson moving average used in (2) step (6). 169 or 179 of the original data. Bureau of Census has recently released an extended version of the X-11 Program called Census X-12-ARIMA. (7) The differences Dt := Yt − Yt∗∗ ∼ St + Rt then leave a second estimate of the sum of the seasonal and irregular components. 2. or 23. 2 http://www.

txt ’. sas */ TITLE1 ’ Original and X11 seasonal adjusted data ’. it is not clear a priori how the seasonal adjustment filter described above behaves. . TITLE2 ’ Unemployed1 Data ’.24 Elements of Exploratory Time Series Analysis We will see in Example 4.4 that linear filters may cause unexpected effects and so. /* Read in the data and generated SAS . end-corrections are necessary.Program */ PROC X11 DATA = data1 . FORMAT date yymon . which cover the important problem of adjusting current observations. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 (2) /* unemployed1_x11 . _N_ -1) .formatted date */ DATA data1 . Moreover. This can be done by some extrapolation. MONTHLY DATE = date ADDITIVE . date = INTNX ( ’ month ’ . ’01 jul75 ’d .. INPUT month $ t upd . /* Apply X -11 .2.2. seasonally adjusted by the X-11 procedure.2a: Plot of the Unemployed1 Data yt and of yt . INFILE ’c :\ data \ unemployed1 . Plot 1.

QUIT . Best Local Polynomial Fit A simple moving average works well for a locally almost linear time series. . SYMBOL1 V = DOT C = GREEN I = JOIN H =1 W =1. A local polynomial estimator of order p < 2k + 1 is the minimizer β0 . yt+k from a time series. In the data step values for the variables month. /* Plot data and adjusted data */ PROC GPLOT DATA = data2 . .2 Linear Filtering of Time Series VAR upd . DATE defines the date variable and ADDITIVE selects an additive model (default: multiplicative model). . Two AXIS and two SYMBOL statements are used to customize the graphic containing two plots. t and upd are read from an external file. . AXIS2 LABEL =( ’ Date ’) . .2. . yt . 25 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Program 1. see Program 1. The results for this analysis for the variable upd (unemployed) are stored in a data set named data2. By means of the function INTNX. Consider 2k + 1 consecutive data yt−k . but it may have problems to reflect a more twisted shape. A LEGEND statement defines the text that explains the symbols. The last part of this SAS program consists of statements for generating the plot. . .1 (temperatures.2. OUTPUT OUT = data2 B1 = upd D11 = updx11 . RUN . where month is defined as a character variable by the succeeding $ in the INPUT statement. PLOT upd * date =1 updx11 * date =2 / OVERLAY VAXIS = AXIS1 HAXIS = AXIS2 LEGEND = LEGEND1 .1. SYMBOL2 V = STAR C = GREEN I = JOIN H =1 W =1.15) .sas). . . βp satisfying k u=−k (yt+u − β0 − β1 u − · · · − βp up )2 = min . The MONTHLY statement selects an algorithm for monthly data. This suggests fitting higher order local polynomials. LEGEND1 LABEL = NONE VALUE =( ’ original ’ ’ adjusted ’) .2: Application of the X-11 procedure to the Unemployed1 Data. (1. The SAS procedure X11 applies the Census X11 Program to the data. a date variable is generated. containing the original data in the variable upd and the final results of the X-11 Program in updx11. /* Graphical options */ AXIS1 LABEL =( ANGLE =90 ’ unemployed ’) . . . the original data and the by X11 seasonally adjusted data.

If we differentiate the left hand side of (1.15) with respect to each β_j and set the derivatives equal to zero, we see that the minimizers satisfy the p + 1 linear equations

    β_0 Σ_{u=−k}^{k} u^j + β_1 Σ_{u=−k}^{k} u^{j+1} + ··· + β_p Σ_{u=−k}^{k} u^{j+p} = Σ_{u=−k}^{k} u^j y_{t+u}        (1.16)

for j = 0, ..., p. These p + 1 equations, which are called normal equations, can be written in matrix form as X^T X β = X^T y, where

    X = ( 1   −k      (−k)²       ...   (−k)^p    )
        ( 1   −k+1    (−k+1)²     ...   (−k+1)^p  )
        ( .    .        .                 .       )
        ( 1    k        k²        ...    k^p      )        (1.17)

is the design matrix, β = (β_0, ..., β_p)^T and y = (y_{t−k}, ..., y_{t+k})^T. The rank of X^T X equals that of X, since their null spaces coincide (Exercise 1.11). Thus, the matrix X^T X is invertible iff the columns of X are linearly independent. But this is an immediate consequence of the fact that a polynomial of degree p has at most p different roots (Exercise 1.12). The normal equations (1.16) have, therefore, the unique solution

    β̂ = (X^T X)^{−1} X^T y.        (1.18)

The linear prediction of y_{t+u}, based on u, u², ..., u^p, is

    ŷ_{t+u} = (1, u, ..., u^p) β̂ = Σ_{j=0}^{p} β̂_j u^j.

Choosing u = 0 we obtain in particular that β̂_0 = ŷ_t is a predictor of the central observation y_t among y_{t−k}, ..., y_{t+k}. The local polynomial approach consists now in replacing y_t by the intercept β̂_0. Though it seems as if this local polynomial fit requires a great deal of computational effort by calculating β̂_0 for each y_t, it turns out that it is actually a moving average.

We summarize our considerations in the following result.

Theorem 1.2.3. Fitting locally by least squares a polynomial of degree p to 2k + 1 > p consecutive data points y_{t−k}, ..., y_{t+k} and predicting y_t by the resulting intercept β̂_0, leads to a moving average (c_u) of order 2k + 1, given by the first row of the matrix (X^T X)^{−1} X^T.

Proof. First observe that we can write by (1.18)

    β̂_0 = Σ_{u=−k}^{k} c_u y_{t+u}

with some c_u ∈ R which do not depend on the values y_u of the time series and hence, (c_u) is a linear filter. Next we show that the c_u sum up to 1. Choose to this end y_{t+u} = 1 for u = −k, ..., k. Then β_0 = 1, β_1 = ··· = β_p = 0 is an obvious solution of the minimization problem (1.15). Since this solution is unique, we obtain

    1 = β̂_0 = Σ_{u=−k}^{k} c_u

and thus, (c_u) is a moving average. As can be seen in Exercise 1.13 it actually has symmetric weights.

Example 1.2.4. Fitting locally a polynomial of degree 2 to five consecutive data points leads to the moving average (Exercise 1.13)

    (c_u) = (1/35) (−3, 12, 17, 12, −3)^T.

An extensive discussion of local polynomial fit is in Kendall and Ord (1993), Sections 3.2-3.13. An outline of various aspects such as the choice of the degree of the polynomial and further background material is given in Section 5.2 of Simonoff (1996). For a book-length treatment of local polynomial estimation we refer to Fan and Gijbels (1996).
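The weights in Example 1.2.4 can also be checked numerically. The following PROC IML lines are a small sketch, not part of the text's programs: they set up the design matrix (1.17) for p = 2 and k = 2 and print the first row of (X^T X)^{−1} X^T.

PROC IML;
   k = 2;
   u = T(DO(-k, k, 1));                /* u = -2, -1, 0, 1, 2 as a column vector */
   X = J(2*k+1, 1, 1) || u || u##2;    /* design matrix (1.17) for p = 2 */
   W = INV(X`*X) * X`;
   c = W[1, ];                         /* first row = the moving average weights c_u */
   PRINT c;                            /* equals (-3 12 17 12 -3)/35 */
QUIT;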

. . n. For a polynomial f (t) := c0 + c1 t + · · · + cp tp of degree p. a1 = −1 is the first order difference filter. . t = p. k The preceding lemma shows that differencing reduces the degree of a polynomial. Lemma 1.28 Elements of Exploratory Time Series Analysis will next show that also a polynomial trend function can be removed by a suitable linear filter. and ∆q f (t) := ∆(∆q−1 f (t)). The linear filter ∆Yt = Yt − Yt−1 with weights a0 = 1. weights a0 = 1. for example. is a polynomial of degree at most p − q. The recursively defined filter ∆p Yt = ∆(∆p−1 Yt ). Proof.5. The assertion is an immediate consequence of the binomial expansion p (t − 1) = p k=0 p k t (−1)p−k = tp − ptp−1 + · · · + (−1)p . . . ∆2 f (t) := ∆f (t) − ∆f (t − 1) = ∆(∆f (t)) is a polynomial of degree not greater than p − 2. Hence. a1 = −2. is the difference filter of order p. The difference filter of second order has.2. 1 ≤ q ≤ p. The function ∆p f (t) is therefore a constant. the difference ∆f (t) := f (t) − f (t − 1) is a polynomial of degree at most p − 1. a2 = 1 ∆2 Yt = ∆Yt − ∆Yt−1 = Yt − Yt−1 − Yt−1 + Yt−2 = Yt − 2Yt−1 + Yt−2 .

Time series in economics often have a trend function that can be removed by a first or second order difference filter. While the original data show an increasing trend. (Electricity Data). then the difference filter ∆p Yt of order p removes this trend up to a constant.2. . the second order differences fluctuate around zero having no more trend.6. The following plots show the total annual output of electricity production in Germany between 1955 and 1979 in millions of kilowatt-hours as well as their first and second order differences.1. 29 Example 1. but there is now an increasing variability visible in the data.2 Linear Filtering of Time Series p If a time series Yt has a polynomial trend Tt = k=0 ck tk for some constants ck .

TITLE2 ’ Electricity Data ’. /* Read in the data . INFILE ’c :\ data \ electric . txt ’. sas */ TITLE1 ’ First and second order differences ’.5 W =1. delta2 = DIF ( delta1 ) . /* Graphical options */ AXIS1 LABEL = NONE . SYMBOL1 V = DOT C = GREEN I = JOIN H =0. first and second order differences. delta1 = DIF ( sum ) . compute moving average of length as 12 as well as first and second order differences */ DATA data1 ( KEEP = year sum delta1 delta2 ) .2.30 Elements of Exploratory Time Series Analysis Plot 1. .3a: Annual electricity output. INPUT year t jan feb mar apr may jun jul aug sep oct nov dec . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 /* e l e c t r i c i t y _ d i f f e r e n c e s . sum = jan + feb + mar + apr + may + jun + jul + aug + sep + oct + nov + dec .

2 Linear Filtering of Time Series 31 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 /* Generate three plots */ GOPTIONS NODISPLAY . To display the three plots of sum. TEMPLT . The option NOFS (no full-screen) suppresses the opening of a GREPLAY window. The DELETE statement after RUN deletes all entries in the input graphics catalog. PLOT delta2 * year / VAXIS = AXIS1 VREF =0. For a time series Yt = Tt + St + Rt with a periodic seasonal component St = St+p = St+2p = . Using the DIF function. PLOT sum * year / VAXIS = AXIS1 HAXIS = AXIS2 . they are first plotted using the procedure GPLOT. the resulting variables delta1 and delta2 contain the first and second order differences of the original annual sums. An additional differencing of proper length can moreover remove a polynomial trend. the raw data are read from a file. DELETE _ALL_ .3: Computation of first and second order differences for the Electricity Data. RUN . After changing the GOPTIONS back to DISPLAY. The TEMPLATE statement selects a template from this catalog. Here these border lines are suppressed by defining WHITE as the border color.2. Note that SAS by default prints borders. PLOT delta1 * year / VAXIS = AXIS1 VREF =0. QUIT . delta1 and delta2 against the variable year within one graphic. PROC GPLOT DATA = data1 GOUT = fig . /* Display them in one output */ GOPTIONS DISPLAY . GPLOT.TEMPLT causes SAS to take the standard template catalog. in order to separate the different plots. PROC GREPLAY NOFS IGOUT = fig TC = SASHELP . . RUN . . while TC=SASHELP. The option IGOUT determines the input graphics catalog. TEMPLATE = V3 . the procedure GREPLAY is invoked. the sum must be evaluated to get the annual output. too. In the first data step. Program 1. while GOPTIONS NODISPLAY causes no output of this procedure. Here the option GOUT=fig stores the plots in a graphics catalog named fig. The subsequent two line mode statements are read instead. TREPLAY 1: GPLOT 2: GPLOT1 3: GPLOT2 . . the difference Yt∗ := Yt − Yt−p obviously removes the seasonal component. which puts three graphics one below the other. The TREPLAY statement connects the defined areas and the plots of the the graphics catalog. Note that the order of seasonal and trend adjusting makes no difference. Because the electric production is stored in different variables for each month of a year. GPLOT1 and GPLOT2 are the graphical outputs in the chronological order of the GPLOT procedure.1.

Exponential Smoother

Let Y_0, ..., Y_n be a time series and let α ∈ [0, 1] be a constant. The linear filter

    Y*_t = αY_t + (1 − α)Y*_{t−1},   t = 1, ..., n,

with Y*_0 = Y_0 is called exponential smoother.

Lemma 1.2.7. For an exponential smoother with constant α ∈ [0, 1] we have

    Y*_t = α Σ_{j=0}^{t−1} (1 − α)^j Y_{t−j} + (1 − α)^t Y_0,   t = 1, 2, ..., n.

Proof. The assertion follows from induction. We have for t = 1 by definition Y*_1 = αY_1 + (1 − α)Y_0. If the assertion holds for t, we obtain for t + 1

    Y*_{t+1} = αY_{t+1} + (1 − α)Y*_t
             = αY_{t+1} + (1 − α)( α Σ_{j=0}^{t−1} (1 − α)^j Y_{t−j} + (1 − α)^t Y_0 )
             = α Σ_{j=0}^{t} (1 − α)^j Y_{t+1−j} + (1 − α)^{t+1} Y_0.

The parameter α determines the smoothness of the filtered time series. A value of α close to 1 puts most of the weight on the actual observation Y_t, resulting in a highly fluctuating series Y*_t. On the other hand, an α close to 0 reduces the influence of Y_t and puts most of the weight to the past observations, yielding a smooth series Y*_t. An exponential smoother is typically used for monitoring a system. Take, for example, a car having an analog speedometer with a hand. It is more convenient for the driver if the movements of this hand are smoothed, which can be achieved by α close to zero. But this, on the other hand, has the effect that an essential alteration of the speed can be read from the speedometer only with a certain delay.
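The recursion of the exponential smoother can be sketched in a single data step; the data set name data1 and the variable name y are again assumptions for the illustration and do not appear in the text's programs.

/* A sketch: exponential smoother Y*_t = alpha*Y_t + (1-alpha)*Y*_{t-1} with Y*_0 = Y_0 */
DATA expsmooth;
   SET data1;
   RETAIN ystar;                 /* keep Y*_{t-1} across observations */
   alpha = 0.3;                  /* smoothing parameter, 0 <= alpha <= 1 */
   IF _N_ = 1 THEN ystar = y;    /* the first observation plays the role of Y*_0 = Y_0 */
   ELSE ystar = alpha*y + (1 - alpha)*ystar;
RUN;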

. Y1 .2. After a change point N . . (i) Suppose that the random variables Y 0 . The influence of these variables on the . (1. then t−1 E((Yt∗ − µ) ) = α 2 2 j=0 (1 − α)2j σ 2 + (1 − α)2t σ 2 1 − (1 − α)2t + (1 − α)2t σ 2 2 1 − (1 − α) 2 t→∞ σ α < σ2.e. .19) = µ(1 − (1 − α)t ) + µ(1 − α)t = µ. t < N . . . which will vanish as t increases.2 Linear Filtering of Time Series Corollary 1. . where the expectation of Yt changes for t ≥ N from µ to λ = µ. satisfy E(Yt ) = µ for 0 ≤ t ≤ N − 1. Then we have for the exponentially smoothed variables with smoothing parameter α ∈ (0. is due to the still inherent influence of past observations Yt . Yn have common expectation µ and common variance σ 2 > 0. the filtered variables Yt∗ are.20) (ii) Suppose that the random variables Y0 .. the smoothness of the filtered series Yt∗ . then this expectation carries over to Yt∗ . and E(Yt ) = λ for t ≥ N . 1) t−1 33 E(Yt∗ ) =α j=0 (1 − α)j µ + µ(1 − α)t (1. This bias.21) The preceding result quantifies the influence of the parameter α on the expectation and on the variance i. however. where we assume for the sake of a simple computation of the variance that the Yt are uncorrelated. Then we have for t ≥ N t−N t−1 E(Yt∗ ) =α j=0 (1 − α) λ + α j j=t−N +1 (1 − α)j µ + (1 − α)t µ = λ(1 − (1 − α)t−N +1 ) + µ (1 − α)t−N +1 (1 − (1 − α)N −1 ) + (1 − α)t t→∞ −→ λ. biased.1.8. If the Yt are in addition uncorrelated. If the variables Yt have common expectation µ. −→ 2−α = σ 2 α2 (1. .

The price for the gain in correctness of the expectation is, however, a higher variability of Y*_t (see Exercise 1.16). An exponential smoother is often also used to make forecasts, explicitly by predicting Y_{t+1} through Y*_t. The forecast error Y_{t+1} − Y*_t =: e_{t+1} then satisfies the equation Y*_{t+1} = α e_{t+1} + Y*_t. This forecast is based on the assumption of equal expectations and should, therefore, be used for a trend adjusted series.

1.3 Autocovariances and Autocorrelations

Autocovariances and autocorrelations are measures of dependence between variables in a time series. Suppose that Y_1, ..., Y_n are square integrable random variables with the property that the covariance Cov(Y_{t+k}, Y_t) = E((Y_{t+k} − E(Y_{t+k}))(Y_t − E(Y_t))) of observations with lag k does not depend on t. Then

    γ(k) := Cov(Y_{k+1}, Y_1) = Cov(Y_{k+2}, Y_2) = ...,   k = 0, 1, ...,

is called autocovariance function and

    ρ(k) := γ(k) / γ(0),   k = 0, 1, ...,

is called autocorrelation function.

Let y_1, ..., y_n be realizations of a time series Y_1, ..., Y_n. The empirical counterpart of the autocovariance function is

    c(k) := (1/n) Σ_{t=1}^{n−k} (y_{t+k} − ȳ)(y_t − ȳ)   with   ȳ = (1/n) Σ_{t=1}^{n} y_t,

and the empirical autocorrelation is defined by

    r(k) := c(k) / c(0) = Σ_{t=1}^{n−k} (y_{t+k} − ȳ)(y_t − ȳ) / Σ_{t=1}^{n} (y_t − ȳ)².

See Exercise 2.8 (ii) for the particular role of the factor 1/n in place of 1/(n−k) in the definition of c(k). The graph of the function r(k), k = 0, 1, ..., n−1, is called correlogram.
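The empirical quantities c(k) and r(k) can be computed directly along these formulas. The following PROC IML lines are a toy sketch with made-up numbers; in practice one would rather use PROC ARIMA as in the program below.

PROC IML;
   y = {2, 4, 3, 5, 7, 6, 8, 9};           /* a made-up toy series */
   n = NROW(y);  ybar = y[:];
   r = J(4, 1, 0);
   DO k = 0 TO 3;
      a  = y[(k+1):n] - ybar;
      b  = y[1:(n-k)] - ybar;
      ck = SUM(a # b) / n;                 /* empirical autocovariance c(k) */
      IF k = 0 THEN c0 = ck;
      r[k+1] = ck / c0;                    /* empirical autocorrelation r(k) */
   END;
   PRINT r;
QUIT;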

1a: Correlogram of the first order differences of the Sunspot Data. The following plot is the correlogram of the first order differences of the Sunspot Data. generate year of observation and compute first order differences */ DATA data1 . diff1 = DIF ( spot ) . INPUT spot @@ . The description can be found on page 199. /* Graphical options */ AXIS1 LABEL =( ’ r ( k ) ’) .1. TITLE2 ’ Sunspot Data ’.3. It shows high and decreasing correlations at regular intervals.3 Autocovariances and Autocorrelations trend adjusted series. IDENTIFY VAR = diff1 NLAG =49 OUTCOV = corr NOPRINT . txt ’. /* Read in the data . INFILE ’c :\ data \ sunspot . date =1748+ _N_ . 35 Plot 1. . /* Compute autocorrelation function */ PROC ARIMA DATA = data1 . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 /* sunspot_correlogram */ TITLE1 ’ Correlogram of first order differences ’.

In the data step. which displays monthly totals in thousands of international airline passengers from January . QUIT . SYMBOL1 V = DOT C = GREEN I = JOIN H =0.1: Generating the correlogram of first order differences for the Sunspot Data. 19 20 21 22 23 24 25 Program 1.5 W =1. Here we just need the autocorrelation of delta.3. These two are used in the following GPLOT procedure to obtain a plot of the autocorrelation function. RUN . /* Plot autocorrelation function */ PROC GPLOT DATA = corr . PLOT CORR * LAG / VAXIS = AXIS1 HAXIS = AXIS2 VREF =0.1.3. The autocovariance function γ obviously satisfies γ(0) ≥ 0 and. by the Cauchy-Schwarz inequality |γ(k)| = | E((Yt+k − E(Yt+k ))(Yt − E(Yt )))| ≤ E(|Yt+k − E(Yt+k )||Yt − E(Yt )|) ≤ Var(Yt+k )1/2 Var(Yt )1/2 = γ(0) for k ≥ 0. which will be calculated up to a lag of 49 (NLAG=49) by the IDENTIFY statement. Example 1. Plot 1. The specification @@ suppresses the automatic line feed of the INPUT statement after every entry in each row.36 Elements of Exploratory Time Series Analysis AXIS2 LABEL =( ’k ’) ORDER =(0 12 24 36 48) MINOR =( N =11) . The option OUTCOV=corr causes SAS to create a data set corr containing among others the variables LAG and CORR. The following procedure ARIMA is a crucial one in time series analysis. Variance Stabilizing Transformation The scatterplot of the points (t. yt ) sometimes shows a variation of the data yt depending on their height. The ORDER option in the AXIS2 statement specifies the values to appear on the horizontal axis as well as their order.2a. Thus we obtain for the autocovariance function the inequality |ρ(k)| ≤ 1 = ρ(0). (Airline Data). The variable date and the first order differences of the variable of interest spot are calculated. and the MINOR option determines the number of minor tick marks between two major ticks.3. the raw data are read into the variable spot. VREF=0 generates a horizontal reference line through the value 0 on the vertical axis.

a discussion can be found in Section 9. exemplifies the above mentioned dependence. 13 14 15 .2. txt ’. 1 2 3 4 5 6 7 8 9 10 11 12 /* airline_plot . /* Graphical options */ AXIS1 LABEL = NONE ORDER =(0 12 24 36 48 60 72 84 96 108 120 132 144) → MINOR =( N =5) . INFILE ’c :\ data \ airline . SYMBOL1 V = DOT C = GREEN I = JOIN H =0.3 Autocovariances and Autocorrelations 1949 to December 1960. These Airline Data are taken from Box and Jenkins (1976).2 of Brockwell and Davis (1991). TITLE2 ’ Airline Data ’. AXIS2 LABEL =( ANGLE =90 ’ total in thousands ’) . /* Read in the data */ DATA data1 .3.1. INPUT y . sas */ TITLE1 ’ Monthly totals from January 49 to December 60 ’.2a: Monthly totals in thousands of international airline passengers from January 1949 to December 1960. t = _N_ . 37 Plot 1.

which are symbolized by small dots.38 Elements of Exploratory Time Series Analysis /* Plot the data */ PROC GPLOT DATA = data1 . .3a: Logarithm of Airline Data xt = log(yt ).2: Plotting the airline passengers In the first data step. it counts the observations. The variation of the data yt obviously increases with their height. the monthly passenger totals are read into the variable y. RUN . The logtransformed data xt = log(yt ). The passenger totals are plotted against t with a line joining the data points. PLOT y * t / HAXIS = AXIS1 VAXIS = AXIS2 . 16 17 18 19 Program 1. displayed in the following figure. sas */ TITLE1 ’ Logarithmic transformation ’. 1 2 3 /* airline_log .3. show no dependence of variability from height. On the horizontal axis a label is suppressed. the temporarily created SAS variable N is used. To get a time variable t. QUIT .3. Plot 1. however. TITLE2 ’ Airline Data ’.

/* Read in the data and compute log-transformed data */
DATA data1;
   INFILE 'c:\data\airline.txt';
   INPUT y;
   x = LOG(y);
   t = _N_;

/* Graphical options */
AXIS1 LABEL=NONE ORDER=(0 12 24 36 48 60 72 84 96 108 120 132 144) MINOR=(N=5);
AXIS2 LABEL=NONE;
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;

/* Plot log-transformed data */
PROC GPLOT DATA=data1;
   PLOT x*t / HAXIS=AXIS1 VAXIS=AXIS2;
RUN; QUIT;

Program 1.3.3: Computing and plotting the logarithm of Airline Data.

The plot of the log-transformed data is done in the same manner as for the original data in Program 1.3.2 (airline_plot.sas). The only differences are the log-transformation by means of the LOG function and the suppressed label on the vertical axis.

The fact that taking the logarithm of data often reduces their variability can be illustrated as follows. Suppose, for example, that the data were generated by random variables, which are of the form Y_t = σ_t Z_t, t ∈ Z, where σ_t > 0 is a scale factor depending on t, and Z_t, t ∈ Z, are independent copies of a positive random variable Z with variance 1. The variance of Y_t is in this case σ_t², whereas the variance of log(Y_t) = log(σ_t) + log(Z_t) is a constant, namely the variance of log(Z), if it exists. A transformation of the data, which reduces the dependence of the variability on their height, is called variance stabilizing. The logarithm is a particular case of the general Box–Cox (1964) transformation T_λ of a time series (Y_t), where the parameter λ ≥ 0 is chosen by the statistician:

    T_λ(Y_t) := (Y_t^λ − 1)/λ,   Y_t ≥ 0, λ > 0,
    T_λ(Y_t) := log(Y_t),        Y_t > 0, λ = 0.

Note that lim_{λ↓0} T_λ(Y_t) = T_0(Y_t) = log(Y_t) if Y_t > 0 (Exercise 1.19).
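The Box–Cox transformation for a fixed λ can be sketched in one data step; the data set name data1 and the positive variable y are assumptions for this illustration, and λ is simply set by hand here rather than chosen by any estimation procedure.

/* A sketch: Box-Cox transformation T_lambda for a chosen lambda */
DATA boxcox;
   SET data1;
   lambda = 0.5;
   IF lambda > 0 THEN ty = (y**lambda - 1) / lambda;
   ELSE ty = LOG(y);             /* the limiting case lambda = 0 */
RUN;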

usually precedes any further data manipulation such as trend or seasonal adjustment. . A variance stabilizing transformation of the data. ˆ 1. use β2 from Exercise 1. . Show that in a linear regression model yt = β1 xt +β2 . . Compute the least ˆ squares estimates a. Exercises 1. β2 and yt := β1 xt + β2 is necessarily between ˆ zero and one with R2 = 1 if and only if yt = yt .5) zt := 1/yt ∼ 1/ E(Yt ) = 1/flog (t). Use PROC NLIN and do an ex post-analysis. . β3 using PROC GPLOT.3. 1.5).1 lists total population numbers of North Rhine-Westphalia between 1961 and 1979. ˆ of a. β3 using PROC ˆ ˜ REG. Plot the Mitscherlich function for different values of β1 . Then we have the linear regression model zt = a + bzt−1 + εt . . t = 1. . . where εt is the error variable. 1. see also Exercise 1.1. n (see ˆ (1. .3. n. if necessary. Compare these two procedures by their residual sums of squares.5. b and motivate the estimates β1 := − log(ˆ ˆ b b). Suppose a logistic ˆ ˆ trend for these data and compute the estimators β1 . the squared multiple correlation coefficient R2 based on the least ˆ ˆ ˆ ˆ squares estimates β1 . Motivate ˆ the following substitute of β2 n ˜ β2 = t=1 ˆ β3 − y t ˆ exp −β1 t yt n t=1 ˆ exp −2β1 t as an estimate of the parameter β2 in the logistic trend model (1. . t = 0. The estimate β2 defined above suffers from the drawback that all ˆ observations yt have to be strictly less than the estimate β3 . (Population2 Data) Table 1. .12)). .3. proposed by Tintner (1958).3 and do an ex post-analysis. ˆ3 := (1 − exp(−β1 ))/ˆ as well as ˆ β a n+1ˆ 1 ˆ β2 := exp β1 + 2 n n t=1 ˆ β3 −1 log yt . . 1.2. t = 1.4. . Since some observations exceed β3 . n. β2 . Put in the logistic trend model (1.40 Elements of Exploratory Time Series Analysis Popular choices of the parameter λ are 0 and 1/2.

(Income Data) Suppose an allometric trend function for the income data in Example 1. .835 5 17.1.10. 1. Compute simple moving averages of order 3 and 5 to estimate a possible trend.7. 1.176 9 17.3.2 lists total numbers of unemployed (in thousands) in West Germany between 1950 and 1993.3 lists West Germany’s public expenditures (in billion D-Marks) between 1961 and 1990.091 7 17. Plot the ˆ ˆ data yt versus β2 tβ1 .9. Estimate the parameters also with PROC NLIN and compare the results.044 6 17. 1. Which one gives the better fit? 1. Compare a logistic trend function with an allometric one.1: Population2 Data.052 10 17.002 t 41 Table 1. (Unemployed Females Data) Use PROC X11 to analyze the monthly unemployed females between ages 16 and 19 in the United States from January 1961 to December 1985 (in thousands).8. Show that the rank of a matrix A equals the rank of AT A. (Unemployed2 Data) Table 1.3 and do a regression analysis.3. 1. Give an update equation for a simple moving average of (even) order 2s.Exercises Year 1961 1963 1965 1967 1969 1971 1973 1975 1977 1979 Total Population in millions 1 15. Plot the original data as well as the filtered ones and compare the curves.6.280 3 16.3.11.661 4 16. To this end compute the R2 -coefficient. (Public Expenditures Data) Table 1. 1.920 2 16.223 8 17.

8 1979 669.6 140.1 Table 1.2 1965 170.3 1987 949.4 1962 129.3 1972 341.7 1978 620. Year Public Expenditures 1961 113.42 Elements of Exploratory Time Series Analysis Year Unemployed 1950 1869 1960 271 1970 149 1074 1975 1980 889 1985 2304 1988 2242 1989 2038 1990 1883 1991 1689 1992 1808 2270 1993 Table 1. .4 1963 1964 153.3: Public Expenditures Data.6 1966 1967 193.4 1981 766.6 1968 211.1 1989 1018.0 1973 386.9 1990 1118.2 181.3 1970 264.1 1969 233.1 Year Public Expenditures 1976 546.2: Unemployed2 Data.5 1986 912.0 1983 816.2 1982 796.8 1975 509.1 1971 304.5 1974 444.2 1977 582.3.8 1980 722.3.0 1985 875.4 1984 849.6 1988 991.

1.13. Compute under the assumptions of Corollary 1.8 the variance of an exponentially filtered variable Yt∗ after a change point t = N with σ 2 := E(Yt − µ)2 for t < N and τ 2 := E(Yt − λ)2 for t ≥ N .. t = 1. . What is the limit for t → ∞? 1. 12.12.2. c−u = cu . −3)T . where b0 .Exercises 1. 17. shares this property. 12. (iii) (cu ) is symmetric. i. if i + j is odd. . Let (cu ) be the moving average derived by the best local polynomial fit.15. .17. 1.4 lists the percentages to annual bancruptcies among all US companies between 1867 and 1932: Compute and plot the empirical autocovariance function and the empirical autocorrelation function using the SAS procedures PROC ARIMA and PROC GPLOT. 1. 1. Show that (i) fitting locally a polynomial of degree 2 to five consecutive data points leads to (cu ) = 1 (−3.14. 35 43 (ii) the inverse matrix A−1 of an invertible m × m-matrix A = (aij )1≤i. (Bankruptcy Data) Table 1. 1) and compare the results.j≤m with the property that aij = 0. Compare the results with those of PROC X11. b1 = 0 and the εt are independent 2 normal random variables with mean µ and variance σ1 if t ≤ 69 but 2 2 variance σ2 = σ1 if t ≥ 70.3. .17) are linear independent. The p + 1 columns of the design matrix X in (1. (Unemployed1 Data) Compute a seasonal and trend adjusted time series for the Unemployed1 Data in the building trade.e.16. To this end compute seasonal differences and first order differences. . Use the SAS function RANNOR to generate a time series Yt = b0 +b1 t+εt . 100. Plot the exponentially filtered variables Yt∗ for different values of the smoothing parameter α ∈ (0.

19.21 1. that the correlogram has no interpretation for non-stationary processes (see Exercise 1.33 0.32 1.18.92 1. .3. λ↓0 Yt > 0 for the Box–Cox transformation Tλ .26 0. Plot the correlogram for different values of n.83 0.59 0. t = 1. r(k) = 1 − 3 + 2 n n(n2 − 1) k = 0. 1.77 1.20 1. n is given by k(k 2 − 1) k . This example shows.4: Bankruptcy Data.25 0.01 Table 1.33 1.93 1.84 0.31 0.36 0.49 1.17).90 0.83 0.00 1.10 1. .80 1.94 0.06 1.08 0.79 0.55 1.04 Elements of Exploratory Time Series Analysis 0.88 1.99 1.01 1.02 1.16 1.01 1.85 1. .21 0.93 0.97 1. n.97 1.53 0.28 0.99 0.95 1. .09 0.58 1.61 1.88 0.38 1. Show that lim Tλ (Yt ) = T0 (Yt ) = log(Yt ).94 1.87 0.44 1. . 1.19 1.83 0.02 1.98 0.81 0.07 0.61 0.00 1.08 0. . .94 0.33 1. Verify that the empirical correlation r(k) at lag k for the trend yt = t.07 0. .10 1.92 0.77 0. .04 0.

In addition we have E(Y ) = E(Y ). In the following we will introduce several models for such a stochastic process Yt with index set Z. . . Yn can be viewed as a clipping from a sequence of random variables . v ∈ R}. and in this case we define the expectation of Y by E(Y ) := E(Y(1) ) + i E(Y(2) ) ∈ C. Y−2 . . Y1 . Y0 . we define the covariance . The random variable Y is called integrable if the real valued random variables Y(1) . Here a and b are complex numbers and Z is a further integrable complex ¯ valued random variable. To carry the equation Var(X) = Cov(X. whose range is the set of complex numbers C = √ {u + iv : u. . up to monotonicity. . Since ¯ |a|2 := u2 + v 2 = a¯ = aa.Chapter Models of Time Series Each time series Y1 . where a = u − iv denotes the conjugate complex number of a = u + iv. . we can decompose Y as Y = Y(1) + iY(2) . X) for a real random variable X over to complex ones. .1 Linear Filters and Stochastic Processes For mathematical convenience we will consider complex valued random variables Y . . Y−1 . The complex random variable Y is called square integrable if this number is finite. where i = −1. the usual properties such as E(aY + bZ) = a E(Y ) + b E(Z) of its real counterpart. . . we define the variance of Y by a ¯ Var(Y ) := E((Y − E(Y ))(Y − E(Y ))) ≥ 0. 2 2. Y2 . where Y(1) = Re(Y ) is the real part of Y and Y(2) = Im(Y ) is its imaginary part. Therefore. This expectation has. Y(2) both have finite expectations.

Z) = Cov(Z. Observe that Re(e−iϑ Y ) = Re (cos(ϑ) − 2 Y(2) )1/2 = |Y | by the Cauchy–Schwarz inequality for real numbers. . 2 2 The second inequality of the lemma follows from |Y | = (Y(1) +Y(2) )1/2 ≤ |Y(1) | + |Y(2) |. Z) := E((Y − E(Y ))(Z − E(Z))). Lemma 2. as it is for real valued random variables. but it satisfies Cov(Y. The next result is a consequence of the preceding lemma and the Cauchy–Schwarz inequality for real valued random variables. | Cov(Y. We write E(Y ) in polar coordinates E(Y ) = reiϑ . 2π). For any square integrable complex valued random variable we have | E(Y Z)| ≤ E(|Y ||Z|) ≤ E(|Y |2 )1/2 E(|Z|2 )1/2 and thus. For any integrable complex valued random variable Y = Y(1) + iY(2) we have | E(Y )| ≤ E(|Y |) ≤ E(|Y(1) |) + E(|Y(2) |). The following lemma implies that the Cauchy–Schwarz inequality carries over to complex valued random variables. Thus we obtain 2 i sin(ϑ))(Y(1) +iY(2) ) = cos(ϑ)Y(1) +sin(ϑ)Y(2) ≤ (cos2 (ϑ)+sin2 (ϑ))1/2 (Y(1) + | E(Y )| = r = E(e−iϑ Y ) = E Re(e−iϑ Y ) ≤ E(|Y |). where r = | E(Y )| and ϑ ∈ [0.1. Z)| ≤ Var(Y )1/2 Var(Z)1/2 . Y ). Z by Cov(Y. Z) is no longer symmetric with respect to Y and Z.46 Models of Time Series of complex square integrable random variables Y. Note that the covariance Cov(Y.1. Corollary 2. Proof.1.2.

In Section 1.2 we defined linear filters of a time series. Y0 ) = γ(s − t). Y0 ) =: γ(t − s) = Cov(Y0 . k ∈ Z E(Yt1 ) = E(Yt1 +k ) and E(Yt1 Y t2 ) = E(Yt1 +k Y t2 +k ). s) : = Cov(Yt . A stationary process (εt )t∈Z of square integrable and uncorrelated real valued random variables is called white noise i. which were based on a finite number of real valued weights.1 Linear Filters and Stochastic Processes 47 Stationary Processes A stochastic process (Yt )t∈Z of square integrable complex valued random variables is said to be (weakly) stationary if for any t1 . Cov(εt .e. Yt = u=−∞ au εt−u is well defined. the autocovariance function of a stationary process can be viewed as a function of a single argument satisfying γ(t) = γ(−t). Yt−s ) = Cov(Ys−t . is called a general linear process. t2 . thus. σ ≥ 0 such that E(εt ) = µ. Existence of General Linear Processes ∞ We will show that u=−∞ |au εt−u | < ∞ with probability one for ∞ t ∈ Z and. and thus. εs ) = 0 for t = s and there exist µ ∈ R. Then (at )t∈Z is said to be an absolutely summable (linear) filter and Yt := ∞ u=−∞ au εt−u := u≥0 au εt−u + u≥1 a−u εt+u . Suppose that (εt )t∈Z is a white noise and let (at )t∈Z be a sequence of complex numbers satisfying ∞ t=−∞ |at | := t≥0 |at |+ t≥1 |a−t | < ∞. t ∈ Z. E((εt − µ)2 ) = σ 2 .. Denote by . In the following we consider linear filters with an infinite number of complex valued weights. t ∈ Z.2. t ∈ Z γ(t. The autocovariance function satisfies moreover for s. Ys ) = Cov(Yt−s . t ∈ Z. The random variables of a stationary process (Yt )t∈Z have identical means and variances.

P) the set of all complex valued square integrable random variables. Then there exists X ∈ L2 such that limn→∞ Xn = X with probability one. Let Xn . and put ||Y ||2 := E(|Y |2 )1/2 . defined on some probability space (Ω. the Cauchy–Schwarz inequality and Corollary 2. we check that X ∈ L2 : E(|X|2 ) = E( lim |Xn |2 ) n→∞ 2 ≤E n→∞ lim k≤n |Xk − Xk−1 | |Xk − Xk−1 | = lim E n→∞ k≤n 2 = lim n→∞ k. This implies that k≥1 |Xk − Xk−1 | < ∞ with probability one and hence. be a sequence in L2 such that ||Xn+1 − Xn ||2 ≤ 2−n for each n ∈ N. P). By the monotone convergence theorem.3. Finally. the limit limn→∞ k≤n (Xk − Xk−1 ) = limn→∞ Xn = X exists in C almost surely.48 Models of Time Series L2 := L2 (Ω. n ∈ N.1. A. Write Xn = k≤n (Xk − Xk−1 ). .j≤n E(|Xk − Xk−1 | |Xj − Xj−1 |) ||Xk − Xk−1 ||2 ||Xj − Xj−1 ||2 ||Xk − Xk−1 ||2 2 ≤ lim sup n→∞ k.2 we have E k≥1 |Xk − Xk−1 | = k≥1 E(|Xk − Xk−1 |) ≤ k≥1 k≥1 ||Xk − Xk−1 ||2 ≤ ||X1 ||2 + 2−k < ∞. which is the L2 -pseudonorm on L2 . Lemma 2. A.j≤n = lim sup n→∞ k≤n = k≥1 ||Xk − Xk−1 ||2 < ∞.1. Proof. where X0 := 0.

Suppose that (Zt )t∈Z is a complex valued stochastic process such that supt E(|Zt |) < ∞ and let (at )t∈Z be an absolutely summable filter. then we have E(|Yt |2 ) < ∞. and (iii) ||Yt − n→∞ n u=−n au Zt−u ||2 −→ 0. and thus we have limn→∞ ||Xn − X||2 = 0.4. Theorem 2.3 there exists a random variable X ∈ L2 such that limk→∞ Xnk = X with probability one. Proof. Yt := u∈Z au Zt−u exists almost surely in C. 49 By Lemma 2. Then there exists a random variable X ∈ L2 such that limn→∞ ||X −Xn ||2 = 0. supt E(|Zt |2 ) < ∞. The space (L2 . t ∈ Z. Fatou’s lemma implies ||Xn − X||2 = E(|Xn − X|2 ) 2 k→∞ = E lim inf |Xn − Xnk |2 ≤ lim inf ||Xn − Xnk ||2 . n→∞ n u=−n au Zt−u |) −→ 0. t ∈ Z. 2 The following result implies in particular that a general linear process is well defined. The monotone convergence theorem implies n E u∈Z |au | |Zt−u | ≤ lim n→∞ u=−n |au | sup E(|Zt−u |) < ∞ t∈Z . We have moreover E(|Yt |) < ∞.1. . suppose that Xn ∈ L2 . n ∈ N. has the property that for arbitrary ε > 0 one can find an integer N (ε) ∈ N such that ||Xn −Xm ||2 < ε if n. Then we have u∈Z |au Zt−u | < ∞ with probability one for t ∈ Z and. thus.e.5. || · ||2 ) is complete i. If.1 Linear Filters and Stochastic Processes Theorem 2. .1. m ≥ N (ε). We can obviously find integers n1 < n2 < .2.. 2 k→∞ The right-hand side of this inequality becomes arbitrarily small if we choose n large enough. and (i) E(Yt ) = limn→∞ (ii) E(|Yt − n u=−n au E(Zt−u ). m ≥ nk . t ∈ Z. in addition.1. such that ||Xn − Xm ||2 ≤ 2−k ifn. Proof.

n→∞ Put K := supt E(|Zt |2 ) < ∞. the dominated convergence theorem implies (ii) and therefore (i): n | E(Yt ) − u=−n au E(Zt−u )| = | E(Yt ) − E(Xn (t))| ≤ E(|Yt − Xn (t)|) −→ 0. This implies P {|Yt − X(t)| ≥ ε} ≤ P {|Yt − Xn (t)| + |Xn (t) − X(t)| ≥ ε} ≤ P {|Yt − Xn (t)| ≥ ε/2} + P {|X(t) − Xn (t)| ≥ ε/2} −→n→∞ 0 ≤ K2   n+m |u|=n+1 |au | ≤ K 2  |u|≥n |au | < ε . By the inequality |Yt − Xn (t)| ≤ 2 u∈Z |au ||Zt−u |. thus. For the proof of (iii) it remains to show that X(t) = Yt almost surely.1. Then we have |Yt − Xn (t)| −→ 0 almost surely. The Cauchy–Schwarz inequality implies for m. and Chebyshev’s inequality yields P {|X(t) − Xn (t)| ≥ ε} ≤ ε−2 ||X(t) − Xn (t)||2 −→n→∞ 0 for arbitrary ε > 0.50 Models of Time Series and. Markov’s inequality implies P {|Yt − Xn (t)| ≥ ε} ≤ ε−1 E(|Yt − Xn (t)|) −→n→∞ 0 by (ii). n ∈ N. Theorem 2. t ∈ Z. Put Xn (t) := n→∞ n u=−n au Zt−u .4 now implies the existence of a random variable X(t) ∈ L2 with limn→∞ ||Xn (t) − X(t)||2 = 0. we have u∈Z |au ||Zt−u | < ∞ with probability one as well as E(|Yt |) ≤ E( u∈Z |au ||Zt−u |) < ∞. n ∈ N and ε > 0    E(|Xn+m (t) − Xn (t)|2 ) = E  = |u|=n+1 |w|=n+1 n+m 2 |u|=n+1 n+m n+m  au Zt−u  2 ¯ au aw E(Zt−u Zt−w ) ¯  2 if n is chosen sufficiently large.

t ∈ Z. therefore. Then Yt = u au Zt−u . ¯ Proof.1.1. which completes the proof of Theorem 2. We can. is also stationary with a u µZ µY = E(Y0 ) = u 51 and autocovariance function γY (t) = u w au aw γZ (t + w − u). ¯ .1. thus.2. t∈Z and.5 immediately implies E(Yt ) = ( u au )µZ and part (iii) implies for t. Theorem 2. s ∈ Z n n E((Yt − µY )(Ys − µY )) = lim Cov n→∞ n au Zt−u .5. Note that E(|Zt |2 ) = E |Zt − µZ + µZ |2 = E (Zt − µZ + µZ )(Zt − µz + µz ) = E |Zt − µZ |2 + |µZ |2 = γZ (0) + |µZ |2 sup E(|Zt |2 ) < ∞.5.6. u=−n n w=−n aw Zs−w = lim = lim = u n→∞ au aw Cov(Zt−u .1 Linear Filters and Stochastic Processes and thus Yt = X(t) almost surely. now apply Theorem 2. Zs−w ) ¯ u=−n w=−n n n u=−n w=−n n→∞ au aw γZ (t − s + w − u) ¯ w au aw γZ (t − s + w − u). Part (i) of Theorem 2. Suppose that (Zt )t∈Z is a stationary process with mean µZ := E(Z0 ) and autocovariance function γZ and let (at ) be an absolutely summable filter.1.

6 implies for t ∈ Z Cov(Yt . Note that |γZ (t)| ≤ γZ (0) < ∞ and thus. Suppose that Yt = u au εt−u . We assume that there exists a real number r > 1 such that G(z) is defined for all z ∈ C in the annulus 1/r < |z| < r. if r −1 < |z| < r for some r > 1. Proof. The process (Yt ) then has the covariance generating function G(z) = σ 2 u au z u u au z −u . the covariance generating function of a stationary process is a constant function if and only if this process is a white noise i. Theorem 2.g. . The covariance generating function will help us to compute the autocovariances of filtered processes. only on the difference t−s.1. i. γ(t) = 0 for t = 0.11 in Conway (1975)). is a general linear process with u |au ||z u | < ∞.e. (Yt ) is a stationary process. Put σ 2 := Var(ε0 ). therefore.7. t ∈ Z. u w |au aw γZ (t−s+ ¯ 2 w − u)| ≤ γZ (0)( u |au |) < ∞.e.. 1. Chapter V.. Y0 ) = =σ u 2 u w au aw γε (t + w − u) ¯ ¯ au au−t .1.52 Models of Time Series The covariance of Yt and Ys depends. Theorem 2. known as a Laurent series in complex analysis. The Covariance Generating Function The covariance generating function of a stationary process with autocovariance function γ is defined as the double series G(z) := t∈Z γ(t)z t = t≥0 γ(t)z t + t≥1 γ(−t)z −t . Since the coefficients of a Laurent series are uniquely determined (see e. ¯ r−1 < |z| < r.

9 σ2 γ(2) = γ(−2) = . ¯ Example 2. The Characteristic Polynomial Let (au ) be an absolutely summable filter.8. The covariance generating function of the simple moving average Yt = u au εt−u with a−1 = a0 = a1 = 1/3 and au = 0 elsewhere is then given by σ 2 −1 G(z) = (z + z 0 + z 1 )(z 1 + z 0 + z −1 ) 9 σ 2 −2 = (z + 2z −1 + 3z 0 + 2z 1 + z 2 ).2. The Laurent series A(z) := u∈Z au z u . z ∈ R.1 Linear Filters and Stochastic Processes This implies G(z) = σ 2 t u 53 au au−t z t ¯ |au |2 + |au |2 + t = σ2 u au au−t z t + ¯ t≥1 u t≤−1 u au au−t z t ¯ au at z u−t ¯ u t≥u+1 = σ2 u au at z u−t + ¯ u t≤u−1 = σ2 u au at z u−t = σ 2 ¯ u au z u t at z −t . 3 2σ 2 . 9 γ(k) = 0 elsewhere. γ(1) = γ(−1) = This explains the name covariance generating function.1. 9 Then the autocovariances are just the coefficients in the above series σ2 γ(0) = . Let (εt )t∈Z be a white noise with Var(ε0 ) =: σ 2 > 0.

Let (au ) and (bu ) be absolutely summable filters with characteristic polynomials A1 (z) and A2 (z). 1. then A(z) exists for all complex z such that |z| ≥ 1. v ∈ Z.54 Models of Time Series is called characteristic polynomial of (au ). which both exist on some annulus r < |z| < R.g. Proof.1.9. then A(z) exists for all z = 0. for example. Filtering (Yt )t∈Z by means of (bu ) leads to bw Yt−w = w w u bw au Zt−w−u = v ( u+w=v bw au )Zt−v . Inverse Filters Let now (au ) and (bu ) be absolutely summable filters and denote by Yt := u au Zt−u . The product filter (cv ) = ( u+w=v bw au ) then has the characteristic polynomial A(z) = A1 (z)A2 (z). the filtered stationary sequence. If. . By repeating the above arguments we obtain A(z) = v u+w=v bw au z v = A1 (z)A2 (z).11 in Conway (1975)). is an absolutely summable filter: |bw au | = ( u v |cv | ≤ v u+w=v |au |)( w |bw |) < ∞. We call (cv ) the product filter of (au ) and (bu ). In the first case the coefficients au are uniquely determined by the function A(z) (see e. If au = 0 for all large |u|. where (Zu )u∈Z is a stationary process. Lemma 2. where cv := u+w=v bw au . Chapter V. (au ) is absolutely summable with au = 0 for u ≥ 1. We know from complex analysis that A(z) exists either for all z in some annulus r < |z| < R or almost nowhere.

Suppose now that (au) and (bu) are absolutely summable filters with characteristic polynomials A1(z) and A2(z), which both exist on some annulus r < |z| < R, where they satisfy A1(z) A2(z) = 1. Since 1 = Σv cv z^v if c0 = 1 and cv = 0 elsewhere, the uniquely determined coefficients of the characteristic polynomial of the product filter of (au) and (bu) are given by
Σ_{u+w=v} bw au = 1 if v = 0 and 0 if v ≠ 0. (2.1)
In this case we obtain for a stationary process (Zt) that almost surely Yt = Σu au Zt−u and Σw bw Yt−w = Zt, t ∈ Z. The filter (bu) is, therefore, called the inverse filter of (au).

Causal Filters

An absolutely summable filter (au)u∈Z is called causal if au = 0 for u < 0.

Lemma 2.1.10. Let a ∈ C. The filter (au) with a0 = 1, a1 = −a and au = 0 elsewhere has an absolutely summable and causal inverse filter (bu)u≥0 if and only if |a| < 1. In this case we have bu = a^u, u ≥ 0.

Proof. The characteristic polynomial of (au) is A1(z) = 1 − az, z ∈ C. Since the characteristic polynomial A2(z) of an inverse filter satisfies A1(z) A2(z) = 1 on some annulus, we have A2(z) = 1/(1 − az). Observe now that
1/(1 − az) = Σ_{u≥0} a^u z^u, if |z| < 1/|a|.
As a consequence, if |a| < 1, then A2(z) = Σ_{u≥0} a^u z^u exists for all |z| < 1 and the inverse causal filter (a^u)u≥0 is absolutely summable, i.e., Σ_{u≥0} |a|^u < ∞. If |a| ≥ 1, then A2(z) = Σ_{u≥0} a^u z^u exists for all |z| < 1/|a|, but Σ_{u≥0} |a|^u = ∞, which completes the proof.

Theorem 2.1.11. Let a1, a2, . . . , ap ∈ C, ap ≠ 0. The filter (au) with coefficients a0 = 1, a1, . . . , ap and au = 0 elsewhere has an absolutely summable and causal inverse filter if the p roots z1, . . . , zp ∈ C of
A(z) = 1 + a1 z + a2 z² + · · · + ap z^p = 0
are outside of the unit circle, i.e., |zi| > 1 for 1 ≤ i ≤ p.

Proof. We know from the Fundamental Theorem of Algebra that the equation A(z) = 0 has exactly p roots z1, . . . , zp ∈ C (see e.g. Conway (1975), Chapter IV, 3.5), which are all different from zero, since A(0) = 1. Hence we can write (see e.g. Conway (1975), Chapter IV, 3.6)
A(z) = ap (z − z1) · · · (z − zp) = c (1 − z/z1)(1 − z/z2) · · · (1 − z/zp),
where c := ap (−1)^p z1 · · · zp. In case of |zi| > 1 we can write for |z| < |zi|
1/(1 − z/zi) = Σ_{u≥0} (1/zi)^u z^u,
where the coefficients (1/zi)^u, u ≥ 0, are absolutely summable. In case of |zi| < 1, we have for |z| > |zi|
1/(1 − z/zi) = −(zi/z) · 1/(1 − zi/z) = −(zi/z) Σ_{u≥0} zi^u z^{−u} = −Σ_{u≤−1} (1/zi)^u z^u,
where the filter with coefficients −(1/zi)^u, u ≤ −1, is not a causal one. In case of |zi| = 1, we have for |z| < 1
1/(1 − z/zi) = Σ_{u≥0} (1/zi)^u z^u,
where the coefficients (1/zi)^u, u ≥ 0, are not absolutely summable. Since the coefficients of a Laurent series are uniquely determined, the factor 1 − z/zi has an inverse 1/(1 − z/zi) = Σ_{u≥0} bu z^u on some annulus with Σ_{u≥0} |bu| < ∞ if |zi| > 1. A small analysis implies that this argument carries over to the product
1/A(z) = 1/(c (1 − z/z1) · · · (1 − z/zp)),

which has an expansion 1/A(z) = Σ_{u≥0} bu z^u on some annulus with Σ_{u≥0} |bu| < ∞ if each factor has such an expansion, and thus, the proof is complete.

Remark 2.1.12. Note that the roots z1, . . . , zp of A(z) = 1 + a1 z + · · · + ap z^p are complex valued and thus, the coefficients bu of the inverse causal filter will, in general, be complex valued as well. The preceding proof shows, however, that if ap and each zi are real numbers, then the coefficients bu, u ≥ 0, are real as well. The preceding proof shows, moreover, that a filter (au) with complex coefficients a0, a1, . . . , ap ∈ C and au = 0 elsewhere has an absolutely summable inverse filter if no root z ∈ C of the equation A(z) = a0 + a1 z + · · · + ap z^p = 0 has length 1, i.e., |z| ≠ 1 for each root. The additional condition |z| > 1 for each root then implies that the inverse filter is a causal one.

Example 2.1.13. The filter with coefficients a0 = 1, a1 = −0.7 and a2 = 0.1 has the characteristic polynomial A(z) = 1 − 0.7z + 0.1z² = 0.1(z − 2)(z − 5), with z1 = 2, z2 = 5 being the roots of A(z) = 0. Theorem 2.1.11 implies the existence of an absolutely summable inverse causal filter, whose coefficients can be obtained by expanding 1/A(z) as a power series of z:
1/A(z) = 1/((1 − z/2)(1 − z/5)) = (Σ_{u≥0} (1/2)^u z^u)(Σ_{w≥0} (1/5)^w z^w)
= Σ_{v≥0} (Σ_{w=0}^{v} (1/2)^{v−w} (1/5)^w) z^v = Σ_{v≥0} (1/2)^v (Σ_{w=0}^{v} (2/5)^w) z^v
= Σ_{v≥0} (1/2)^v (1 − (2/5)^{v+1})/(1 − 2/5) z^v = Σ_{v≥0} (10/3)((1/2)^{v+1} − (1/5)^{v+1}) z^v.
The preceding expansion implies that bv := (10/3)(2^{−(v+1)} − 5^{−(v+1)}), v ≥ 0, are the coefficients of the inverse causal filter.
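The expansion can easily be checked numerically. The following DATA step is a minimal illustrative sketch (it is not one of the numbered programs of this book; the cut-off at lag 10 is arbitrary): it computes the coefficients bv and convolves them with the filter (1, −0.7, 0.1), which should return 1 at v = 0 and values numerically equal to zero otherwise, in accordance with (2.1).

/* inverse_filter_check.sas (illustrative sketch) */
DATA check;
   ARRAY b{0:10} b0-b10;                       /* coefficients of the inverse causal filter */
   DO v=0 TO 10;
      b{v}=(10/3)*(2**(-(v+1))-5**(-(v+1)));
   END;
   DO v=0 TO 10;
      c=b{v};                                  /* a0*b(v)   with a0=1    */
      IF v>=1 THEN c=c-0.7*b{v-1};             /* a1*b(v-1) with a1=-0.7 */
      IF v>=2 THEN c=c+0.1*b{v-2};             /* a2*b(v-2) with a2=0.1  */
      OUTPUT;                                  /* c is the product filter at lag v */
   END;
   KEEP v c;
RUN;

PROC PRINT DATA=check NOOBS;
RUN;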

2.2 Moving Averages and Autoregressive Processes

Let a1, . . . , aq ∈ R with aq ≠ 0 and let (εt)t∈Z be a white noise. The process
Yt := εt + a1 εt−1 + · · · + aq εt−q
is said to be a moving average of order q, denoted by MA(q). Put a0 = 1. Theorems 2.1.6 and 2.1.7 imply that a moving average Yt = Σ_{u=0}^{q} au εt−u is a stationary process with covariance generating function
G(z) = σ² (Σ_{u=0}^{q} au z^u)(Σ_{w=0}^{q} aw z^{−w}) = σ² Σ_{u=0}^{q} Σ_{w=0}^{q} au aw z^{u−w} = σ² Σ_{v=−q}^{q} (Σ_{w=0}^{q−v} av+w aw) z^v, z ∈ C,
where σ² = Var(ε0). The coefficients of this expansion provide the autocovariance function γ(v) = Cov(Y0, Yv), v ∈ Z, which cuts off after lag q.

Lemma 2.2.1. Suppose that Yt = Σ_{u=0}^{q} au εt−u, t ∈ Z, is a MA(q)-process. Put µ := E(ε0) and σ² := Var(ε0). Then we have
(i) E(Yt) = µ Σ_{u=0}^{q} au,
(ii) γ(v) = Cov(Yv, Y0) = σ² Σ_{w=0}^{q−v} av+w aw for 0 ≤ v ≤ q, γ(v) = 0 for v > q, and γ(−v) = γ(v),
(iii) Var(Y0) = γ(0) = σ² Σ_{w=0}^{q} aw²,

(iv) ρ(v) = γ(v)/γ(0) = (Σ_{w=0}^{q−v} av+w aw)/(Σ_{w=0}^{q} aw²) for 0 < v ≤ q, ρ(0) = 1, ρ(v) = 0 for v > q, and ρ(−v) = ρ(v).

Example 2.2.2. The MA(1)-process Yt = εt + aεt−1 with a ≠ 0 has the autocorrelation function
ρ(v) = 1 for v = 0, ρ(v) = a/(1 + a²) for v = ±1, and ρ(v) = 0 elsewhere.
Since a/(1 + a²) = (1/a)/(1 + (1/a)²), the autocorrelation functions of the two MA(1)-processes with parameters a and 1/a coincide. We have, moreover, |ρ(1)| ≤ 1/2 for an arbitrary MA(1)-process and thus, a large value of the empirical autocorrelation function r(1), which exceeds 1/2 essentially, might indicate that an MA(1)-model for a given data set is not a correct assumption.

Invertible Processes

The MA(q)-process Yt = Σ_{u=0}^{q} au εt−u, t ∈ Z, with a0 = 1 and aq ≠ 0, is said to be invertible if all q roots z1, . . . , zq ∈ C of A(z) = Σ_{u=0}^{q} au z^u = 0 are outside of the unit circle, i.e., if |zi| > 1 for 1 ≤ i ≤ q.
Theorem 2.1.11 and representation (2.1) imply that the white noise process (εt), pertaining to an invertible MA(q)-process Yt = Σ_{u=0}^{q} au εt−u, can be obtained by means of an absolutely summable and causal filter (bu)u≥0 via
εt = Σ_{u≥0} bu Yt−u, t ∈ Z,
with probability one. In particular the MA(1)-process Yt = εt − aεt−1 is invertible iff |a| < 1, and in this case we have by Lemma 2.1.10 with probability one
εt = Σ_{u≥0} a^u Yt−u, t ∈ Z.
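The bound |ρ(1)| ≤ 1/2 of Example 2.2.2 can also be made plausible by simulation. The following sketch (an illustrative program with arbitrary seed, sample size and parameter, not part of the original text) generates an MA(1)-series with a = 0.6 and lets PROC ARIMA print its empirical autocorrelations; r(1) should then be close to 0.6/(1 + 0.36) ≈ 0.44.

/* ma1_simulation.sas (illustrative sketch) */
DATA ma1;
   e1=RANNOR(42);                     /* epsilon_0 */
   DO t=1 TO 500;
      e=RANNOR(42);                   /* epsilon_t, standard normal */
      y=e+0.6*e1;                     /* Y_t = epsilon_t + a*epsilon_(t-1) with a=0.6 */
      e1=e;
      OUTPUT;
   END;
   KEEP t y;
RUN;

PROC ARIMA DATA=ma1;
   IDENTIFY VAR=y NLAG=10;            /* prints the empirical autocorrelations r(k) */
RUN; QUIT;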

(2. . . . In this case. regressed on its own past p values plus a random shock.2) The value of an AR(p)-process at time t is. the stationary solution is almost surely uniquely determined by Yt := u≥0 bu εt−u . denoted by AR(p) if there exist a1 . . ap implying the existence of a uniquely determined stationary solution (Yt ) of (2. (2. and a white noise (εt ) such that Yt = a1 Yt−1 + · · · + ap Yt−p + εt .3.11. where (bu )u≥0 is the absolutely summable inverse causal filter of c0 = 1. u = 1.26). . The following result provides a sufficient condition on the constants a1 . Proof. p and cu = 0 elsewhere. u=1 1 − a1 z − a2 z 2 − · · · − ap z p = 0 for |z| ≤ 1. cu = −au .6. and equation (2. t ∈ Z. . .1. The condition that all roots of the characteristic equation of an AR(p)process Yt = p au Yt−u + εt are outside of the unit circle i. The existence of an absolutely summable causal filter follows from Theorem 2. . .e. therefore. .2. The stationarity of Yt = u≥0 bu εt−u is a consequence of Theorem 2. ..2) with the given constants a1 .1. . . .6 MA(q)-processes are automatically stationary. Theorem 2. ap and white noise (εt )t∈Z has a stationary solution (Yt )t∈Z if all p roots of the equation 1 − a1 z − a2 z 2 − · · · − ap z p = 0 are outside of the unit circle.3) t ∈ Z. and its uniqueness follows from εt = Yt − a1 Yt−1 − · · · − ap Yt−p . .2). The Stationarity Condition While by Theorem 2. t ∈ Z. this is not true for AR(p)-processes (see Exercise 2. ap ∈ R with ap = 0. .1). . The AR(p)-equation (2.1.60 Models of Time Series Autoregressive Processes A real valued stochastic process (Yt ) is said to be an autoregressive process of order p.

The condition that all roots of the characteristic equation of an AR(p)-process Yt = Σ_{u=1}^{p} au Yt−u + εt are outside of the unit circle, i.e.,
1 − a1 z − a2 z² − · · · − ap z^p ≠ 0 for |z| ≤ 1, (2.3)
will be referred to in the following as the stationarity condition for an AR(p)-process. Note that a stationary solution (Yt) of (2.1) exists in general if no root zi of the characteristic equation lies on the unit sphere. If there are solutions in the unit circle, then the stationary solution is noncausal, i.e., Yt is correlated with future values of εs, s > t. This is frequently regarded as unnatural.

Example 2.2.4. The AR(1)-process Yt = aYt−1 + εt, t ∈ Z, with a ≠ 0 has the characteristic equation 1 − az = 0 with the obvious solution z1 = 1/a. The process (Yt), therefore, satisfies the stationarity condition iff |z1| > 1, i.e., iff |a| < 1. In this case we obtain from Lemma 2.1.10 that the absolutely summable inverse causal filter of a0 = 1, a1 = −a and au = 0 elsewhere is given by bu = a^u, u ≥ 0, and thus, with probability one
Yt = Σ_{u≥0} bu εt−u = Σ_{u≥0} a^u εt−u.
Denote by σ² the variance of ε0. From Theorem 2.1.6 we obtain the autocovariance function of (Yt)
γ(s) = Σu Σw bu bw Cov(ε0, εs+w−u) = Σ_{u≥s} bu bu−s Cov(ε0, ε0) = σ² a^s Σ_{u≥s} a^{2(u−s)} = σ² a^s/(1 − a²), s = 0, 1, 2, . . . ,
and γ(−s) = γ(s). In particular we obtain γ(0) = σ²/(1 − a²) and thus, the autocorrelation function of (Yt) is given by
ρ(s) = a^{|s|}, s ∈ Z.
The autocorrelation function of an AR(1)-process Yt = aYt−1 + εt with |a| < 1 therefore decreases at an exponential rate. Its sign is alternating if a ∈ (−1, 0).

sas */ TITLE1 ’ Autocorrelation functions of AR (1) . SYMBOL2 C = GREEN V = DOT I = JOIN H =0. .2. END .7 .62 Models of Time Series Plot 2.1a: Autocorrelation functions of AR(1)-processes Yt = aYt−1 + εt with different values of a. rho = a ** s . AXIS2 LABEL =( F = CGREEK ’r ’ F = COMPLEX H =1 ’a ’ H =2 ’( s ) ’) .processes ’.3 L =33.6) . 0. /* Generate data for different autocorrelation functions */ DATA data1 . OUTPUT .3 L =1.9. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 /* ar1_autocorrelation . END . /* Plot autocorrelation functions */ PROC GPLOT DATA = data1 . DO s =0 TO 20.5 .0. SYMBOL3 C = GREEN V = DOT I = JOIN H =0. 0.3 L =2. LEGEND1 LABEL =( ’ a = ’) SHAPE = SYMBOL (10 . AXIS1 LABEL =( ’s ’) . DO a = -0. /* Graphical options */ SYMBOL1 C = GREEN V = DOT I = JOIN H =0.

The SHAPE option SHAPE=SYMBOL(10. Computing autocorrelation functions of AR(1)- The data step evaluates rho for three different values of a and the range of s from 0 to 20 using two loops. . . The following figure illustrates the significance of the stationarity condition |a| < 1 of an AR(1)-process.0. The plot is generated by the procedure GPLOT. the font COMPLEX assuming this to be the default text font (GOPTION FTEXT=COMPLEX). RUN .5. . Realizations Yt = aYt−1 + εt . .5 the sample path follows the constant zero closely. t = 1. While for a = 0. ε10 are independent standard normal in each case and Y0 is assumed to be zero. which is the expectation of each Yt . 63 23 24 Program 2.5 and a = 1. are displayed for a = 0.2. ε2 . where ε1 . .2 Moving Averages and Autoregressive Processes PLOT rho * s = a / HAXIS = AXIS1 VAXIS = AXIS2 LEGEND = LEGEND1 VREF =0.2. 10.5. . .1: processes. . QUIT . . The LABEL option in the AXIS2 statement uses. the observations Yt decrease rapidly in case of a = 1.6) in the LEGEND statement defines width and height of the symbols presented in the legend. in addition to the greek font CGREEK.

LEGEND1 LABEL =( ’ a = ’) SHAPE = SYMBOL (10 . Program 2.processes ’.2: Simulating AR(1)-processes. y = a * y + RANNOR (1) .processes */ DATA data1 . .5Yt−1 + εt and Yt = 1. DO t =1 TO 10.4 L =1. RUN .5. . AXIS2 LABEL =( ’Y ’ H =1 ’t ’) . OUTPUT . DO a =0. t =0. . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 /* ar1_plot .6) . END . /* Plot the AR (1) .2. /* Graphical options */ SYMBOL1 C = GREEN V = DOT I = JOIN H =0. .2a: Realizations Yt = 0.2. 1. t = 1.64 Models of Time Series Plot 2.0.processes */ PROC GPLOT DATA = data1 ( WHERE =( t >0) ) . END .5Yt−1 + εt . QUIT . AXIS1 LABEL =( ’t ’) MINOR = NONE .4 L =2. 10.5 . y =0. . PLOT y * t = a / HAXIS = AXIS1 VAXIS = AXIS2 LEGEND = LEGEND1 . OUTPUT . sas */ TITLE1 ’ Realizations of AR (1) . SYMBOL2 C = GREEN V = DOT I = JOIN H =0. /* Generated AR (1) . with εt independent standard normal and Y0 = 0.

. Its autocorrelation function ρ then satisfies for s = 1. A positive value of this seed always produces the same series of random numbers. a negative value generates a different series each time the program is submitted.. the first one over the two values for a. . Only observations fulfilling the condition t>0 are read into the data set used here.5. Let Yt = u=1 au Yt−u + εt be an AR(p)-process.5) with .2 Moving Averages and Autoregressive Processes The data are generated within two loops.2. A value of y is calculated as the sum of a times the actual value of y and the random number and stored in a new observation... . 2. t and y). In the plot created by PROC GPLOT the initial observations are dropped using the WHERE data set option. To suppress minor tick marks between the integers 0. .. p Lemma 2. u=1 (2.3). the recursion p ρ(s) = u=1 au ρ(s − u). which satisfies the stationarity condition (2. 10 are created within the second loop over t and with the help of the function RANNOR which returns pseudo random numbers distributed as standard normal. The resulting data set has 22 observations and 3 variables (a. With µ := E(Y0 ) we have for t ∈ Z p p Yt − µ = u=1 au (Yt−u − µ) + εt − µ 1 − au . The variable y is initialized with the value 0 corresponding to t=0.2.5) p and taking expectations of (2.1. . (2. 65 The Yule–Walker Equations The Yule–Walker equations entail the recursive computation of the autocorrelation function ρ of an AR(p)-process satisfying the stationarity condition (2.5) gives µ(1 − u=1 au ) = E(ε0 ) =: ν due to the stationarity of (Yt ).3).4) known as Yule–Walker equations... The realizations for t=1.10 the option MINOR in the AXIS1 statement is set to NONE. The argument 1 is the initial seed to produce a stream of random numbers. Proof. By multiplying equation (2.

Yt−s − µ for s > 0 and taking expectations again we obtain
γ(s) = E((Yt − µ)(Yt−s − µ)) = Σ_{u=1}^{p} au E((Yt−u − µ)(Yt−s − µ)) + E((εt − ν)(Yt−s − µ)) = Σ_{u=1}^{p} au γ(s − u)
for the autocovariance function γ of (Yt). The final equation follows from the fact that Yt−s and εt are uncorrelated for s > 0. This is a consequence of Theorem 2.2.3, by which almost surely Yt−s = Σ_{u≥0} bu εt−s−u with an absolutely summable causal filter (bu) and thus,
Cov(Yt−s, εt) = Σ_{u≥0} bu Cov(εt−s−u, εt) = 0,
see Theorem 2.1.5. Dividing the above equation by γ(0) now yields the assertion.

Since ρ(−s) = ρ(s), equations (2.4) can be represented as

(ρ(1), ρ(2), ρ(3), . . . , ρ(p))^T =
( 1        ρ(1)     ρ(2)    . . .  ρ(p−1)
  ρ(1)     1        ρ(1)    . . .  ρ(p−2)
  ρ(2)     ρ(1)     1       . . .  ρ(p−3)
  . . .
  ρ(p−1)   ρ(p−2)   ρ(p−3)  . . .  1      ) (a1, a2, a3, . . . , ap)^T. (2.6)

This matrix equation offers an estimator of the coefficients a1, . . . , ap by replacing the autocorrelations ρ(j) by their empirical counterparts r(j), 1 ≤ j ≤ p. Equation (2.6) then formally becomes r = Ra, where r = (r(1), . . . , r(p))^T, a = (a1, . . . , ap)^T and
R := ( 1        r(1)     . . .  r(p−1)
       r(1)     1        . . .  r(p−2)
       . . .
       r(p−1)   r(p−2)   . . .  1      ).
If the p × p-matrix R is invertible, we can rewrite the formal equation r = Ra as R^{−1} r = a, which motivates the estimator
â := R^{−1} r (2.7)
of the vector a = (a1, . . . , ap)^T of the coefficients.
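For p = 2 the estimator (2.7) can be evaluated by hand or, for instance, with PROC IML; the following fragment is only an illustrative sketch with assumed empirical autocorrelations r(1) = 0.46 and r(2) = −0.02, and it is not part of the original programs of this book.

/* yule_walker_sketch.sas (illustrative) */
PROC IML;
   r={0.46, -0.02};                    /* assumed empirical autocorrelations r(1), r(2) */
   Rmat={1 0.46, 0.46 1};              /* matrix R built from r(0)=1 and r(1)=0.46      */
   a_hat=INV(Rmat)*r;                  /* Yule-Walker estimate (2.7) of (a1, a2)        */
   PRINT a_hat;
QUIT;

With these assumed values one obtains approximately â1 ≈ 0.60 and â2 ≈ −0.29.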

The Partial Autocorrelation Coefficients

We have seen that the autocorrelation function ρ(k) of an MA(q)-process vanishes for k > q. This is not true for an AR(p)-process, whereas the partial autocorrelation coefficients will share this property. Note that the correlation matrix
P_k := (Corr(Yi, Yj))_{1≤i,j≤k} =
( 1        ρ(1)     ρ(2)    . . .  ρ(k−1)
  ρ(1)     1        ρ(1)    . . .  ρ(k−2)
  ρ(2)     ρ(1)     1       . . .  ρ(k−3)
  . . .
  ρ(k−1)   ρ(k−2)   ρ(k−3)  . . .  1      ) (2.8)
is positive semidefinite for any k ≥ 1. If we suppose that P_k is positive definite, then it is invertible, and the equation
(ρ(1), . . . , ρ(k))^T = P_k (ak1, . . . , akk)^T (2.9)
has the unique solution
a_k := (ak1, . . . , akk)^T = P_k^{−1} (ρ(1), . . . , ρ(k))^T.
The number akk is called partial autocorrelation coefficient at lag k, denoted by α(k), k ≥ 1. Observe that for k ≥ p the vector (a1, . . . , ap, 0, . . . , 0) ∈ R^k, with k − p zeros added to the vector of coefficients (a1, . . . , ap), is by the Yule–Walker equations (2.4) (see Lemma 2.2.5) a solution of the equation (2.9). Thus we have α(p) = ap and α(k) = 0 for k > p. Note that the coefficient α(k) also occurs as the coefficient of Yn−k in the best linear one-step forecast Σ_{u=0}^{k} cu Yn−u of Yn+1, see equation (2.18) in Section 2.3.

If the empirical counterpart R_k of P_k is invertible as well, then
â_k := R_k^{−1} r_k,
with r_k := (r(1), . . . , r(k))^T, is an obvious estimate of a_k. The k-th component
α̂(k) := âkk (2.10)
of â_k = (âk1, . . . , âkk) is the empirical partial autocorrelation coefficient at lag k. It can be utilized to estimate the order p of an AR(p)-process, since α̂(p) ≈ α(p) = ap is different from zero, whereas α̂(k) ≈ α(k) = 0 for k > p should be close to zero.

Example 2.2.6. The Yule–Walker equations (2.4) for an AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt are for s = 1, 2
ρ(1) = a1 + a2 ρ(1), ρ(2) = a1 ρ(1) + a2,
with the solutions
ρ(1) = a1/(1 − a2), ρ(2) = a1²/(1 − a2) + a2.
The recursion (2.4) entails the computation of ρ(s) for an arbitrary s from the two values ρ(1) and ρ(2). The corresponding partial autocorrelation coefficients are
α(1) = ρ(1), α(2) = a2, α(j) = 0, j ≥ 3.
The following figure displays realizations of the AR(2)-process Yt = 0.6Yt−1 − 0.3Yt−2 + εt for 1 ≤ t ≤ 200, conditional on Y−1 = Y0 = 0. The random shocks εt are iid standard normal. The corresponding partial autocorrelation function is shown in Plot 2.2.4a.
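As a small numerical check (added here for illustration, not part of the original text): for a1 = 0.6 and a2 = −0.3 the above formulas give ρ(1) = 0.6/1.3 ≈ 0.46 and ρ(2) = 0.36/1.3 − 0.3 ≈ −0.02, so the autocorrelations die out quickly, while the partial autocorrelations are exactly α(1) ≈ 0.46, α(2) = −0.3 and α(j) = 0 for j ≥ 3. The empirical partial autocorrelation function in Plot 2.2.4a should therefore show roughly two values clearly different from zero, the second one close to −0.3.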

processes */ PROC GPLOT DATA = data1 ( WHERE =( t >0) ) .3: Simulating AR(2)-processes.3.3a: Realization of the AR(2)-process Yt = 0.process ’. conditional on Y−1 = Y0 = 0. DO t =1 TO 200. AXIS2 LABEL =( ’Y ’ H =1 ’t ’) . y =0. t = -1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 /* ar2_plot . . are iid standard normal.3* y2 + RANNOR (1) . AXIS1 LABEL =( ’t ’) .2. OUTPUT .2 Moving Averages and Autoregressive Processes 69 Plot 2. t =0.3Yt−2 + εt . QUIT . END . y1 = y .2. OUTPUT . y =0. y2 = y1 .process */ DATA data1 . y1 = y . OUTPUT . /* Graphical options */ SYMBOL1 C = GREEN V = DOT I = JOIN H =0. RUN . 1 ≤ t ≤ 200. y =0. The εt . sas */ TITLE1 ’ Realisation of an AR (2) .6Yt−1 − 0. Program 2. /* Generated AR (2) . PLOT y * t / HAXIS = AXIS1 VAXIS = AXIS2 . /* Plot the AR (2) .6* y1 -0.2.

sas */ TITLE1 ’ Empirical partial autocorrelation function ’.process data ’. /* Note that this program requires data1 generated by the previous →program ( ar2_plot .2. AXIS1 LABEL =( ’k ’) .4a: Empirical partial autocorrelation function of the AR(2)data in Plot 2. /* Graphical options */ SYMBOL1 C = GREEN V = DOT I = JOIN H =0. y1 and y are updated one after the other. IDENTIFY VAR = y NLAG =50 OUTCOV = corr NOPRINT . sas ) */ /* Compute partial autocorrelation function */ PROC ARIMA DATA = data1 ( WHERE =( t >0) ) .70 The two initial values of y are defined and stored in an observation by the OUTPUT statement. Within the loop the Models of Time Series values y2 (for yt−2 ).7. TITLE2 ’ of simulated AR (2) . .2. The second observation contains an additional value y1 for yt−1 . The data set used by PROC GPLOT again just contains the observations with t > 0. Plot 2.3a 1 2 3 4 5 6 7 8 9 10 11 12 /* ar2_epa .

   AXIS2 LABEL=('a(k)');

   /* Plot partial autocorrelation function */
   PROC GPLOT DATA=corr;
      PLOT PARTCORR*LAG / HAXIS=AXIS1 VAXIS=AXIS2 VREF=0;
   RUN; QUIT;

Program 2.2.4: Computing the empirical partial autocorrelation function of AR(2)-data.

This program requires to be submitted to SAS for execution within a joint session with Program 2.2.3 (ar2_plot.sas), because it uses the temporary data step data1 generated there. Otherwise you have to add the block of statements to this program concerning the data step. Like in the program sunspot_correlogram.sas the procedure ARIMA with the IDENTIFY statement is used to create a data set. Here we are interested in the variable PARTCORR containing the values of the empirical partial autocorrelation function from the simulated AR(2)-process data. This variable is plotted against the lag stored in the variable LAG.

ARMA-Processes

Moving averages MA(q) and autoregressive AR(p)-processes are special cases of so called autoregressive moving averages. Let (εt)t∈Z be a white noise, p, q ≥ 0 integers and a0, . . . , ap, b0, . . . , bq ∈ R. A real valued stochastic process (Yt)t∈Z is said to be an autoregressive moving average process of order p, q, denoted by ARMA(p, q), if it satisfies the equation
Yt = a1 Yt−1 + a2 Yt−2 + · · · + ap Yt−p + εt + b1 εt−1 + · · · + bq εt−q, (2.11)
which we can represent in the form
Yt − a1 Yt−1 − · · · − ap Yt−p = εt + b1 εt−1 + · · · + bq εt−q.
An ARMA(p, 0)-process with p ≥ 1 is obviously an AR(p)-process, whereas an ARMA(0, q)-process with q ≥ 1 is a moving average MA(q). The polynomials
A(z) := 1 − a1 z − · · · − ap z^p (2.12)
and
B(z) := 1 + b1 z + · · · + bq z^q (2.13)
are the characteristic polynomials of the autoregressive part and of the moving average part of an ARMA(p, q)-process (Yt).

Denote by Zt the right-hand side of the above equation, i.e., Zt := εt + b1 εt−1 + · · · + bq εt−q, t ∈ Z. This is a MA(q)-process and, therefore, stationary by Theorem 2.1.6. If all p roots of the equation A(z) = 1 − a1 z − · · · − ap z^p = 0 are outside of the unit circle, then we deduce from Theorem 2.1.11 that the filter c0 = 1, cu = −au, u = 1, . . . , p, cu = 0 elsewhere, has an absolutely summable causal inverse filter (du)u≥0. Consequently we obtain from the equation Zt = Yt − a1 Yt−1 − · · · − ap Yt−p and (2.1) that with b0 = 1, bw = 0 if w > q
Yt = Σ_{u≥0} du Zt−u = Σ_{u≥0} du (εt−u + b1 εt−1−u + · · · + bq εt−q−u)
= Σ_{u≥0} Σ_{w≥0} du bw εt−w−u = Σ_{v≥0} (Σ_{u+w=v} du bw) εt−v
= Σ_{v≥0} (Σ_{w=0}^{min(v,q)} bw dv−w) εt−v =: Σ_{v≥0} αv εt−v
is the almost surely uniquely determined stationary solution of the ARMA(p, q)-equation (2.11) for a given white noise (εt).
The condition that all p roots of the characteristic equation A(z) = 1 − a1 z − a2 z² − · · · − ap z^p = 0 of the ARMA(p, q)-process (Yt) are outside of the unit circle will again be referred to in the following as the stationarity condition (2.3). In this case, the process Yt = Σ_{v≥0} αv εt−v, t ∈ Z, is the almost surely uniquely determined stationary solution of the ARMA(p, q)-equation (2.11), which is called causal.
The MA(q)-process Zt = εt + b1 εt−1 + · · · + bq εt−q is by definition invertible if all q roots of the polynomial B(z) = 1 + b1 z + · · · + bq z^q are outside of the unit circle. Theorem 2.1.11 and equation (2.1) imply in this case the existence of an absolutely summable causal filter (gu)u≥0 such that with a0 = −1
εt = Σ_{u≥0} gu Zt−u = Σ_{u≥0} gu (Yt−u − a1 Yt−1−u − · · · − ap Yt−p−u) = −Σ_{v≥0} (Σ_{w=0}^{min(v,p)} aw gv−w) Yt−v.
In this case the ARMA(p, q)-process (Yt) is said to be invertible.

The Autocovariance Function of an ARMA-Process

In order to deduce the autocovariance function of an ARMA(p, q)-process (Yt), which satisfies the stationarity condition (2.3), we compute at first the absolutely summable coefficients
αv = Σ_{w=0}^{min(q,v)} bw dv−w, v ≥ 0,
in the above representation Yt = Σ_{v≥0} αv εt−v. The characteristic polynomial D(z) of the absolutely summable causal filter (du)u≥0 coincides by Lemma 2.1.9 for 0 < |z| < 1 with 1/A(z), where A(z) is given in (2.12). Thus we obtain with B(z) as given in (2.13) for 0 < |z| < 1
A(z)(B(z)D(z)) = B(z)
⇔ (−Σ_{u=0}^{p} au z^u)(Σ_{v≥0} αv z^v) = Σ_{w=0}^{q} bw z^w
⇔ Σ_{w≥0} (−Σ_{u+v=w} au αv) z^w = Σ_{w≥0} bw z^w
⇔ Σ_{w≥0} (−Σ_{u=0}^{w} au αw−u) z^w = Σ_{w≥0} bw z^w
⇔ α0 = 1,
   αw − Σ_{u=1}^{w} au αw−u = bw for 1 ≤ w ≤ p,
   αw − Σ_{u=1}^{p} au αw−u = 0 for w > p. (2.14)

Example 2.2.7. For the ARMA(1, 1)-process Yt − aYt−1 = εt + bεt−1 with |a| < 1 we obtain from (2.14)
α0 = 1, α1 − a = b, αw − aαw−1 = 0, w ≥ 2.
This implies α0 = 1, αw = a^{w−1}(b + a), w ≥ 1, and, hence,
Yt = εt + (b + a) Σ_{w≥1} a^{w−1} εt−w.

Theorem 2.2.8. Suppose that Yt = Σ_{u=1}^{p} au Yt−u + Σ_{v=0}^{q} bv εt−v, b0 := 1, t ∈ Z, is an ARMA(p, q)-process, which satisfies the stationarity condition (2.3). Its autocovariance function γ then satisfies the recursion
γ(s) − Σ_{u=1}^{p} au γ(s − u) = σ² Σ_{v=s}^{q} bv αv−s, 0 ≤ s ≤ q,
γ(s) − Σ_{u=1}^{p} au γ(s − u) = 0, s ≥ q + 1, (2.15)
where αv, v ≥ 0, are the coefficients in the representation Yt = Σ_{v≥0} αv εt−v, which we computed in (2.14), and σ² is the variance of ε0.
By the preceding result the autocorrelation function ρ of the ARMA(p, q)-process (Yt) satisfies
ρ(s) = Σ_{u=1}^{p} au ρ(s − u), s ≥ q + 1,
which coincides with the autocorrelation function of the stationary AR(p)-process Xt = Σ_{u=1}^{p} au Xt−u + εt, c.f. Lemma 2.2.5.

Proof of Theorem 2.2.8. Put µ := E(Y0) and ν := E(ε0). Recall that Yt = Σ_{u=1}^{p} au Yt−u + Σ_{v=0}^{q} bv εt−v, t ∈ Z. Taking expectations on both sides we obtain µ = Σ_{u=1}^{p} au µ + Σ_{v=0}^{q} bv ν, which yields now the equation
Yt − µ = Σ_{u=1}^{p} au (Yt−u − µ) + Σ_{v=0}^{q} bv (εt−v − ν), t ∈ Z.
Multiplying both sides with Yt−s − µ, s ≥ 0, and taking expectations, we obtain
Cov(Yt−s, Yt) = Σ_{u=1}^{p} au Cov(Yt−s, Yt−u) + Σ_{v=0}^{q} bv Cov(Yt−s, εt−v),
which implies
γ(s) − Σ_{u=1}^{p} au γ(s − u) = Σ_{v=0}^{q} bv Cov(Yt−s, εt−v).

From the representation Yt−s = Σ_{w≥0} αw εt−s−w and Theorem 2.1.5 we obtain
Cov(Yt−s, εt−v) = Σ_{w≥0} αw Cov(εt−s−w, εt−v) = 0 if v < s, and = σ² αv−s if v ≥ s.
This implies
γ(s) − Σ_{u=1}^{p} au γ(s − u) = Σ_{v=s}^{q} bv Cov(Yt−s, εt−v) = σ² Σ_{v=s}^{q} bv αv−s if s ≤ q, and = 0 if s > q,
which is the assertion.

Example 2.2.9. For the ARMA(1, 1)-process Yt − aYt−1 = εt + bεt−1 with |a| < 1 we obtain from Example 2.2.7 and Theorem 2.2.8 with σ² = Var(ε0)
γ(0) − aγ(1) = σ²(1 + b(b + a)), γ(1) − aγ(0) = σ²b,
and thus
γ(0) = σ²(1 + 2ab + b²)/(1 − a²), γ(1) = σ²(1 + ab)(a + b)/(1 − a²).
For s ≥ 2 we obtain from (2.15)
γ(s) = aγ(s − 1) = · · · = a^{s−1}γ(1).
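As a quick numerical illustration of these formulas (added here; the values a = 0.8, b = −0.5 and σ² = 1 are arbitrary): γ(0) = (1 + 2(0.8)(−0.5) + 0.25)/(1 − 0.64) = 0.45/0.36 = 1.25 and γ(1) = (1 + (0.8)(−0.5))(0.8 − 0.5)/0.36 = 0.18/0.36 = 0.5, hence ρ(1) = 0.4 and ρ(s) = 0.8^{s−1} · 0.4 for s ≥ 2, a geometrically decaying pattern of the kind displayed in Plot 2.2.5a.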

’ || b || ’) ’) . END .8.2.5.5 and σ 2 = 1.8 . 1)-processes with a = 0.5a: Autocorrelation functions of ARMA(1. SYMBOL5 C = YELLOW V = DOT I = JOIN H =0. q = COMPRESS ( ’( ’ || a || ’.7 L =2. SYMBOL2 C = YELLOW V = DOT I = JOIN H =0.5 .1) . rho =1. q = COMPRESS ( ’( ’ || a || ’.7 L =1. 0 .7 L =33.5/0/ − 0.8/ − 0.7 L =3. /* Graphical options */ SYMBOL1 C = RED V = DOT I = JOIN H =0. rho =(1+ a * b ) *( a + b ) /(1+2* a * b + b * b ) .1) →processes */ DATA data1 . 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 . s =0. DO a = -0.’ || b || ’) ’) . 0. OUTPUT . OUTPUT . END . rho = a * rho . /* Compute autocorrelations functions for different ARMA (1 . SYMBOL3 C = BLUE V = DOT I = JOIN H =0. 0. q = COMPRESS ( ’( ’ || a || ’. s =1. END . SYMBOL4 C = RED V = DOT I = JOIN H =0.’ || b || ’) ’) .processes ’. b = 0. DO b = -0.8.7 L =4.76 Models of Time Series Plot 2. DO s =2 TO 10. sas */ TITLE1 ’ Autocorrelation functions of ARMA (1 . OUTPUT . 1 2 3 4 /* a r m a 1 1 _ a u t o c o r r e l a t i o n .

Program 2.2.5: Computing autocorrelation functions of ARMA(1, 1)-processes.

In the data step the values of the autocorrelation function belonging to an ARMA(1, 1) process are calculated for two different values of a, the coefficient of the AR(1)-part, and three different values of b, the coefficient of the MA(1)-part. Pure AR(1)-processes result for the value b=0. For the arguments (lags) s=0 and s=1 the computation is done directly, for the rest up to s=10 a loop is used for a recursive computation. For the COMPRESS statement see the program logistic.sas. The second part of the program uses PROC GPLOT to plot the autocorrelation function, using known statements and options to customize the output.

ARIMA-Processes

Suppose that the time series (Yt) has a polynomial trend of degree d. Then we can eliminate this trend by considering the process (∆^d Yt), t ∈ Z, obtained by d times differencing as described in Section 1.2. If the filtered process (∆^d Yt) is an ARMA(p, q)-process satisfying the stationarity condition (2.3), the original process (Yt) is said to be an autoregressive integrated moving average of order p, d, q, denoted by ARIMA(p, d, q). In this case constants a1, . . . , ap, b0 = 1, b1, . . . , bq ∈ R exist such that
∆^d Yt = Σ_{u=1}^{p} au ∆^d Yt−u + Σ_{w=0}^{q} bw εt−w, t ∈ Z,
where (εt) is a white noise.

Example 2.2.10. An ARIMA(1, 1, 1)-process (Yt) satisfies
∆Yt = a∆Yt−1 + εt + bεt−1, t ∈ Z,

where |a| < 1, b ≠ 0 and (εt) is a white noise, i.e.,
Yt − Yt−1 = a(Yt−1 − Yt−2) + εt + bεt−1, t ∈ Z.
This implies Yt = (a + 1)Yt−1 − aYt−2 + εt + bεt−1. A random walk Xt = Xt−1 + εt is obviously an ARIMA(0, 1, 0)-process.
Consider Yt = St + Rt, t ∈ Z, where the random component (Rt) is a stationary process and the seasonal component (St) is periodic of length s, i.e., St = St+s = St+2s = . . . for t ∈ Z. Then the process (Yt) is in general not stationary, but Yt* := Yt − Yt−s is. If this seasonally adjusted process (Yt*) is an ARMA(p, q)-process satisfying the stationarity condition (2.3), then the original process (Yt) is called a seasonal ARMA(p, q)-process with period length s, denoted by SARMAs(p, q). One frequently encounters a time series with a trend as well as a periodic seasonal component. A stochastic process (Yt) with the property that (∆^d(Yt − Yt−s)) is an ARMA(p, q)-process is, therefore, called a SARIMA(p, d, q)-process. This is a quite common assumption in practice.

Cointegration

In the sequel we will frequently use the notation that a time series (Yt) is I(d), d = 0, 1, . . . , if the sequence of differences (∆^d Yt) of order d is a stationary process. By the difference ∆^0 Yt of order zero we denote the undifferenced process Yt.
Suppose that the two time series (Yt) and (Zt) satisfy
Yt = aWt + εt, Zt = Wt + δt, t ∈ Z,
for some real number a ≠ 0, where (Wt) is I(1), and (εt), (δt) are uncorrelated white noise processes, i.e., Cov(εt, δs) = 0, t, s ∈ Z. Then (Yt) and (Zt) are both I(1), but Xt := Yt − aZt = εt − aδt, t ∈ Z, is I(0). The fact that the combination of two nonstationary series yields a stationary process arises from a common component (Wt), which is

I(1). More generally, two I(1) series (Yt), (Zt) are said to be cointegrated, if there exist constants µ, α1, α2 with α1, α2 different from 0, such that the process
Xt = µ + α1 Yt + α2 Zt, t ∈ Z,
is I(0). Without loss of generality, we can choose α1 = 1 in this case.
Such cointegrated time series are often encountered in macroeconomics (Granger (1981), Engle and Granger (1987)). Consider, for example, prices for the same commodity in different parts of a country. Principles of supply and demand, along with the possibility of arbitrage, mean that, while the process may fluctuate more-or-less randomly, the distance between them will, in equilibrium, be relatively constant (typically about zero).
The link between cointegration and error correction can vividly be described by the humorous tale of the drunkard and his dog, c.f. Murray (1994). In the same way a drunkard seems to follow a random walk, an unleashed dog wanders aimlessly. We can, therefore, model their ways by random walks
Yt = Yt−1 + εt and Zt = Zt−1 + δt,
where the individual single steps (εt), (δt) of man and dog are uncorrelated white noise processes. Random walks are not stationary, since their variances increase, and so both processes (Yt) and (Zt) are not stationary.
And if the dog belongs to the drunkard? We assume the dog to be unleashed and thus, the distance Yt − Zt between the drunk and his dog is a random variable. It seems reasonable to assume that these distances form a stationary process, i.e., that (Yt) and (Zt) are cointegrated with constants α1 = 1 and α2 = −1. We model the cointegrated walks above more tritely by assuming the existence of constants c, d ∈ R such that
Yt − Yt−1 = εt + c(Yt−1 − Zt−1) and Zt − Zt−1 = δt + d(Yt−1 − Zt−1).
The additional terms on the right-hand side of these equations are the error correction terms.

Cointegration requires that both variables in question be I(1), but that a linear combination of them be I(0). This means that the first step is to figure out if the series themselves are I(1), typically by using unit root tests. If one or both are not I(1), cointegration is not an option.
Whether two processes (Yt) and (Zt) are cointegrated can be tested by means of a linear regression approach. This is based on the cointegration regression
Yt = β0 + β1 Zt + εt,
where (εt) is a stationary process and β0, β1 ∈ R are the cointegration constants. One can use the ordinary least squares estimates β̂0, β̂1 of the target parameters β0, β1, which satisfy
Σ_{t=1}^{n} (Yt − β̂0 − β̂1 Zt)² = min_{β0,β1∈R} Σ_{t=1}^{n} (Yt − β0 − β1 Zt)²,
and one checks, whether the estimated residuals
ε̂t = Yt − β̂0 − β̂1 Zt
are generated by a stationary process.
A general strategy for examining cointegrated series can now be summarized as follows (see also the illustrative sketch below):
1. Determine that the two series are I(1), typically by using unit root tests.
2. Compute ε̂t = Yt − β̂0 − β̂1 Zt using ordinary least squares.
3. Examine ε̂t for stationarity, using
• the Durbin–Watson test
• standard unit root tests such as Dickey–Fuller or augmented Dickey–Fuller.

Example 2.2.11. (Hog Data) Quenouille's (1957) Hog Data list the annual hog supply and hog prices in the U.S. between 1867 and 1948. Do they provide a typical example of cointegrated series? A discussion can be found in Box and Tiao (1977).
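Before the Hog Data themselves are analysed, the strategy can be tried on simulated series. The following fragment is only an illustrative sketch (the constants, the seed and all names are arbitrary and not taken from the text): it generates two cointegrated random walks as in the construction at the beginning of this subsection, computes the residuals of the cointegration regression by ordinary least squares and applies the Phillips–Ouliaris option of PROC AUTOREG that is also used for the Hog Data below.

/* cointegration_sketch.sas (illustrative) */
DATA sim;
   w=0;                                     /* common I(1) component W_t         */
   DO t=1 TO 200;
      w=w+RANNOR(4711);
      y=2*w+RANNOR(4711);                   /* Y_t = a*W_t + epsilon_t with a=2  */
      z=w+RANNOR(4711);                     /* Z_t = W_t + delta_t               */
      OUTPUT;
   END;
RUN;

/* Step 2: cointegration regression, residuals estimate epsilon_t */
PROC REG DATA=sim NOPRINT;
   MODEL y=z;
   OUTPUT OUT=resid R=e_hat;
RUN; QUIT;

/* Step 3: test the cointegrating relation for stationarity */
PROC AUTOREG DATA=sim;
   MODEL y=z / STATIONARITY=(PHILLIPS);
RUN; QUIT;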

2. /* Merge data sets . txt ’. INPUT supply @@ . txt ’.2. INFILE ’c :\ data \ hogsuppl . sas */ TITLE1 ’ Hog supply . generate year and compute differences */ . /* Read in the two data sets */ DATA data1 .6a: Hog Data: hog supply and hog prices. DATA data2 . INPUT price @@ . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 /* hog . hog prices and differences ’. TITLE2 ’ Hog Data (1867 -1948) ’.2 Moving Averages and Autoregressive Processes 81 Plot 2. INFILE ’c :\ data \ hogprice .

TREPLAY 1: GPLOT 2: GPLOT1 3: GPLOT2 . not seem to be realizations of stationary processes. AXIS3 LABEL =( ANGLE =90 ’d i f f e r e n c e s ’) . Year is an additional variable with values 1867. . nevertheless. . PLOT supply * year / VAXIS = AXIS1 . /* Display them in one output */ GOPTIONS DISPLAY . hog prices and their differences diff are plotted in three different plots stored in the graphics catalog abb. their .2. Models of Time Series 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 /* Graphical options */ SYMBOL1 V = DOT C = GREEN I = JOIN H =0. PLOT price * year / VAXIS = AXIS2 . MERGE data1 data2 . PROC GPLOT DATA = data3 GOUT = abb . therefore. Hog supply (=: yt ) and hog price (=: zt ) obviously increase in time t and do. a linear combination of both might be stationary.82 DATA data3 .5 W =1. TEMPLT . RUN . 1868. /* Generate three plots */ GOPTIONS NODISPLAY . diff = supply . PLOT diff * year / VAXIS = AXIS3 VREF =0. A high price zt at time t is a good reason for farmers to breed more hogs. This makes the price zt+1 fall with the effect that farmers will reduce their supply of hogs in the following year t + 2. as they behave similarly. By PROC GPLOT hog supply. AXIS1 LABEL =( ANGLE =90 ’h o g s u p p l y ’) . when hogs are in short supply. Program 2. In this case. The horizontal line at the zero level is plotted by the option VREF=0. Note that the labels of the vertical axes are spaced out as SAS sets their characters too close otherwise. However. DELETE _ALL_ . AXIS2 LABEL =( ANGLE =90 ’h o g p r i c e s ’) . The supply data and the price data read in from two external files are merged in data3. RUN . PROC GREPLAY NOFS IGOUT = abb TC = SASHELP . thus leading to a large supply yt+1 in the next year t + 1. . .price . 1932. The plots are put into a common graphic using PROC GREPLAY and the template V3. year = _N_ +1866. TEMPLATE = V3 .6: Plotting the Hog Data. This phenomenon can easily be explained as follows. QUIT . hog supply and hog price would be cointegrated.

2.258 4229 924. RUN . and the observed cointegration helps us to detect its existence. sas */ TITLE1 ’ Testing for cointegration ’. 1 2 3 4 5 6 7 8 9 /* hog_cointegration .2.5839 DFE Root MSE AIC Total R .7a: Phillips–Ouliaris test for cointegration of Hog Data.36 7.2059 Standard Error 26.Square Durbin . There is obviously some error correction mechanism inherent in these two processes. Program 2.3902 0.0001 <. sas ) */ /* Compute Phillips .7: Phillips–Ouliaris test for cointegration of Hog Data. /* Note that this program needs data3 generated by the previous →program ( hog .Square 80 65.2 Moving Averages and Autoregressive Processes price zt+2 will rise etc.0001 Listing 2.03117 919.7978 0.15 Approx Pr > | t | <.3902 Phillips .0288 t Value 19.Ouliaris .Ouliaris Cointegration Test Lags 1 Rho -28.2.359266 0.test for conintegration */ PROC AUTOREG DATA = data3 . 83 The AUTOREG Procedure Dependent Variable supply Ordinary Least Squares Estimates SSE MSE SBC Regress R . MODEL supply = price / STATIONARITY =( PHILLIPS ) . TITLE2 ’ Hog Data (1867 -1948) ’.172704 0.Watson 338324. .0142 Variable Intercept price DF 1 1 Estimate 515.6398 0.9109 Tau -4. QUIT .

i. i. β0 = 0. but only the values of the test statistics denoted by RHO and TAU.15 0. Tables of critical values of these test statistics can be found in Phillips and Ouliaris (1990).07 -3. This case is referred to as demeaned. The Phillips-Ouliaris test statistics need some further explanation.26 -2.2.88 -22.01 RHO -10.35 -2.57 -12.sas).05 -3. the Phillips-Ouliaris test statistics and the regression coefficients with their t-ratios. and none of the explanatory variables has a nonzero trend component.075 0.01 RHO -14.39 (2) If the estimated cointegrating regression contains an intercept.64 -18.025 0.91 -15.6 (hog.96 .075 0. α 0.05 0. For this one has to differentiate between the following cases. Note that in the original paper the two test statistics ˆ ˆ are denoted by Zα and Zt .37 -3.81 -15.83 TAU -2.81 -28. and none of the explanatory variables has a nonzero trend component.48 -20. then use the following table for critical values of RHO and TAU. The hypothesis is to be rejected if RHO or TAU are below the critical value for the desired type I level error α.1 0. The output of the above program contains some characteristics of the regression.84 The procedure AUTOREG (for autoregressive models) uses data3 from Program 2.93 -17. Unfortunately SAS does not provide the p-value.e.32 TAU -2. then use the following table for critical values of RHO and TAU.1 0. β0 = 0.96 -3.49 -23.15 0. The hypothesis of the Phillips–Ouliaris cointegration test is no cointegration.58 -2.54 -13.e.04 -18.45 -2.125 0.025 0. This is the so-called standard case. α 0.64 -3. (1) If the estimated cointegrating regression does not contain any intercept. In the MODEL statement a regression from supply on price is defined and the option Models of Time Series STATIONARITY=(PHILLIPS) makes SAS calculate the statistics of the Phillips–Ouliaris test for cointegration of order 1.125 0.76 -3.74 -11.20 -3.05 0.86 -2.

and at least one of the explanatory variables has a nonzero trend component. the RHO-value is −28. .01 RHO -20.75 -27. i.9109 and the TAU-value −4.19 -24.07 -4.075 0.33 -3.025 0. lead to a rejection of the null hypothesis of no cointegration at the 5%-level.65 -3. 85 ARCH. t ∈ Z. where the Zt are independent and identically distributed random variables with E(Zt ) = 0 and E(Zt2 ) = 1. which depends on preceding realizations.2 Moving Averages and Autoregressive Processes (3) If the estimated cointegrating regression contains an intercept.125 0. β0 = 0. Both are smaller than the critical values of −27.79 -21.and GARCH-Processes In particular the monitoring of stock prices gave rise to the idea that the volatility of a time series (Yt ) might not be a constant but rather a random variable. t ∈ Z.42 -3.36 In our example with an arbitrary β0 and a visible trend in the investigated time series.84 -35.80 -4. The scale σt is supposed to be a function of the past p values of the series: p 2 σt = a0 + j=1 2 aj Yt−j . t ∈ Z.0142. We assume the multiplicative model Y t = σ t Zt .52 -3.2.80 in the above table of the demeaned and detrended case and thus.05 0. α 0.42 TAU -3. For further information on cointegration we refer to Chapter 19 of the time series book by Hamilton (1994).09 and −3. The following approach to model such a change in volatility is due to Engle (1982).1 0.15 0.e. then use the following table for critical values of RHO and TAU.81 -23.09 -30. This case is said to be demeaned and detrended.

where p ∈ {0, 1, 2, . . .} and a0 > 0, aj ≥ 0, 1 ≤ j ≤ p − 1, ap > 0 are constants. The particular choice p = 0 yields obviously a white noise model for (Yt). Common choices for the distribution of Zt are the standard normal distribution or the (standardized) t-distribution, which in the non-standardized form has the density
fm(x) := Γ((m + 1)/2)/(Γ(m/2) √(πm)) (1 + x²/m)^{−(m+1)/2}, x ∈ R.
The number m ≥ 1 is the degree of freedom of the t-distribution.
The scale σt in the above model is determined by the past observations Yt−1, . . . , Yt−p, and the innovation on this scale is then provided by Zt. We assume moreover that the process (Yt) is a causal one in the sense that Zt and Ys, s < t, are independent. Some autoregressive structure is, therefore, inherent in the process (Yt). Conditional on Yt−j = yt−j, 1 ≤ j ≤ p, the variance of Yt is a0 + Σ_{j=1}^{p} aj y²t−j and, thus, the conditional variances of the process will generally be different. The process Yt = σt Zt is, therefore, called an autoregressive and conditional heteroscedastic process of order p, ARCH(p)-process for short.
If, in addition, the causal process (Yt) is stationary, then we obviously have E(Yt) = E(σt) E(Zt) = 0 and
σ² := E(Yt²) = E(σt²) E(Zt²) = a0 + Σ_{j=1}^{p} aj E(Y²t−j) = a0 + σ² Σ_{j=1}^{p} aj,
which yields
σ² = a0/(1 − Σ_{j=1}^{p} aj).
A necessary condition for the stationarity of the process (Yt) is, therefore, the inequality Σ_{j=1}^{p} aj < 1.
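A stationary ARCH(1)-series can, for instance, be simulated as follows; this is only an illustrative sketch (the constants a0 = 0.5, a1 = 0.4 and the seed are arbitrary), meant to show the recursion σt² = a0 + a1 Y²t−1 at work. Refitting such a series with PROC AUTOREG and the options NOINT GARCH=(Q=1), in analogy to Program 2.2.9 below, should return estimates close to the chosen constants.

/* arch1_simulation.sas (illustrative sketch) */
DATA arch1;
   y1=0;                                  /* starting value Y_0 */
   DO t=1 TO 500;
      sigma2=0.5+0.4*y1**2;               /* sigma_t^2 = a0 + a1*Y_(t-1)^2          */
      y=SQRT(sigma2)*RANNOR(1);           /* Y_t = sigma_t*Z_t, Z_t standard normal */
      y1=y;
      OUTPUT;
   END;
   KEEP t y;
RUN;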

Note, moreover, that the preceding arguments immediately imply that the Yt and Ys are uncorrelated for different values of s and t, but they are not independent.
The following lemma is crucial. It embeds the ARCH(p)-processes to a certain extent into the class of AR(p)-processes, so that our above tools for the analysis of autoregressive processes can be applied here as well.

Lemma 2.2.12. Let (Yt) be a stationary and causal ARCH(p)-process with constants a0, a1, . . . , ap. If the process of squared random variables (Yt²) is a stationary one, then it is an AR(p)-process:
Yt² = a1 Y²t−1 + · · · + ap Y²t−p + εt,
where (εt) is a white noise with E(εt) = a0, t ∈ Z.

Proof. From the assumption that (Yt) is an ARCH(p)-process we obtain
εt := Yt² − Σ_{j=1}^{p} aj Y²t−j = σt² Zt² − σt² + a0 = a0 + σt²(Zt² − 1), t ∈ Z.
This implies E(εt) = a0 and
E((εt − a0)²) = E(σt⁴) E((Zt² − 1)²) = E((a0 + Σ_{j=1}^{p} aj Y²t−j)²) E((Zt² − 1)²) =: c,
independent of t by the stationarity of (Yt²). For h ∈ N the causality of (Yt) finally implies
E((εt − a0)(εt+h − a0)) = E(σt² σ²t+h (Zt² − 1)(Z²t+h − 1)) = E(σt² σ²t+h (Zt² − 1)) E(Z²t+h − 1) = 0,
i.e., (εt) is a white noise with E(εt) = a0.
The process (Yt²) satisfies, therefore, the stationarity condition (2.3) if all p roots of the equation 1 − Σ_{j=1}^{p} aj z^j = 0 are outside of the unit circle. Hence, we can estimate the order p using an estimate as

. . . In this case it is possible to write down explicitly the joint density of the vector (Yp+1 . The Yule– Walker equations provide us. . . conditional on Y1 = y1 . . 1981 and September 30th. . (Hongkong Data). The set of parameters aj . bk can again be estimated by conditional maximum likelihood as before if a parametric model for the distribution of the innovations Zt is assumed. Yn ). Example 2. . 1983. relative to the initial one.10) of the partial autocorrelation function of (Yt2 ). leading to a total amount of 552 observations pt .88 Models of Time Series in (2. . . Yt−p = yt−p . The expansion log(1 + ε) ≈ ε implies that yt = log 1 + pt − pt−1 pt − pt−1 ≈ . ap then leads to a maximum likelihood estimate of the vector of constants. In this case we can interpret the return as the difference of indices on subsequent days. The daily log returns are defined as yt := log(pt ) − log(pt−1 ). A generalized ARCH-process. GARCH(p.36). for example. . ap . .3. where the constants bk are nonnegative. which then can be utilized to estimate the expectation a0 of the error εt . adds an autoregressive structure to the scale σt by assuming the representation p q 2 σt = a0 + j=1 2 aj Yt−j + k=1 2 bk σt−k . Note that conditional on Yt−1 = yt−1 . q) for short (Bollerslev (1986)). with an estimate of the coefficients a1 . . . A numerical maximization of this density with respect to a0 . where we now have a total of n = 551 observations. pt−1 pt−1 provided that pt−1 and pt are close to each other. . the distribution of Yt = σt Zt is a normal one if the Zt are normally distributed. . .2. The daily Hang Seng closing index was recorded between July 16th. . . . a1 . . see also Section 2. Yp = yp (Exercise 2.13.

If one assumes t-distributed innovations Zt .278166 and a3 = 0. The following plots show the returns of the Hang Seng index. which seems to be a plausible choice by the partial autocorrelations plot.157807.1780 = 5.2 Moving Averages and Autoregressive Processes We use an ARCH(3) model for the generation of yt .2. 89 .61 degrees of freedom. SAS estimates the distribution’s degrees of freedom and displays the reciprocal in the TDFI-line. Following we obtain the estimates a0 = 0.000214. some specific information for the (G)ARCH approach and as mentioned above the estimates for the ARCH model parameters in combination with t ratios and approximated p-values. here m = 1/0. their squares.147593. a1 = 0. The SAS output also contains some general regression model information from an ordinary least squares estimation approach. a2 = 0. the pertaining partial autocorrelation function and the parameter estimates.

INPUT p . sas */ TITLE1 ’ Daily log returns and their squares ’.02) . t = _N_ . /* Graphical options */ SYMBOL1 C = RED V = DOT H =0. y = DIF ( LOG ( p ) ) .2.12 TO . /* Read in the data . TITLE2 ’ Hongkong Data ’. AXIS2 LABEL =( ’ y2 ’ H =1 ’t ’) . y2 = y **2. /* Generate two plots */ .5 I = JOIN L =1.90 Models of Time Series Plot 2. txt ’. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 /* hongkong_plot . compute log return and their squares */ DATA data1 .10 BY .8a: Log returns of Hang Seng index and their squares. AXIS1 LABEL =( ’y ’ H =1 ’t ’) ORDER =( -. INFILE ’c :\ data \ hongkong .

PLOT y * t / VAXIS = AXIS1 .2. /* Display them in one output */ GOPTIONS DISPLAY . TREPLAY 1: GPLOT 2: GPLOT1 . .2. TEMPLATE = V2 . 91 19 20 21 22 23 24 25 26 27 28 29 30 Program 2. In the DATA step the observed values of the Hang Seng closing index are read into the variable p from an external file. QUIT . After defining different axis labels. DELETE _ALL_ . The time index variable t uses the SAS-variable N . RUN . PLOT y2 * t / VAXIS = AXIS2 . PROC GPLOT DATA = data1 GOUT = abb . two plots are generated by two PLOT statements in PROC GPLOT. their squared values in y2. By means of PROC GREPLAY the plots are merged vertically in one graphic.8: Plotting the log returns and their squares.2 Moving Averages and Autoregressive Processes GOPTIONS NODISPLAY . RUN . but they are not displayed. and the log transformed and differenced values of the index are stored in the variable y. PROC GREPLAY NOFS IGOUT = abb TC = SASHELP . TEMPLT .

2.Sq 0.92 Models of Time Series Plot 2.0000 Durbin .000483 1706.7698 OBS 551 UVAR 0.squares are redefined . R . The AUTOREG Procedure Dependent Variable = Y Ordinary Least Squares Estimates SSE 0.000483 SBC -2643.0000 NOTE : No intercept term is used .82 0.532 -3381.5 119.9a: Partial autocorrelations of squares of log returns of Hang Seng index.265971 MSE 0.0000 AIC -3403. GARCH Estimates SSE MSE Log L SBC Normality Test 0.Watson 1.82 Reg Rsq 0.021971 -2643.8540 DFE Root MSE AIC Total Rsq 551 0.06 Prob > Chi .0001 Variable DF B Value Std Error t Ratio Approx Prob .000515 Total Rsq 0.265971 0.

To identify the order of a possibly underlying ARCH process for the daily log returns of the Hang Seng closing index. sas ) */ /* Compute partial autocorrelation function */ PROC ARIMA DATA = data1 . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 /* hongkong_pa .287 2. PLOT partcorr * lag / VREF =0. /* Graphical options */ SYMBOL1 C = RED V = DOT H =0.0608 0. QUIT . SAS uses the letter q for the ARCH model order.0010 0. A horizontal reference line helps to decide whether a value is substantially different from 0.9b: Parameter estimates in the ARCH(3)-model for stock returns.0001 93 Listing 2.model ’. the empirical partial autocorrelations of their squared values. The MODEL statement specifies the dependent variable y. RUN . MODEL y = / NOINT GARCH =( q =3) DIST = T .3 The Box–Jenkins Program ARCH0 ARCH1 ARCH2 ARCH3 TDFI 1 1 1 1 1 0.0667 0. which are stored in the variable y2 of the data set data1 in Program 2. in contrast to our notation.model */ PROC AUTOREG DATA = data1 .0846 0.594 3.444 2. sas */ TITLE1 ’ ARCH (3) .833 0.278166 0. /* Plot partial autocorrelation function */ PROC GPLOT DATA = data2 .000039 0. The option NOINT suppresses an intercept parameter.0095 0. Program 2.2.147593 0.178074 0. are calculated by means of PROC ARIMA and the IDENTIFY statement. The subsequent procedure GPLOT displays these partial autocorrelations.0269 0.000214 0.157807 0.5 I = JOIN . .0001 0. IDENTIFY VAR = y2 NLAG =50 OUTCOV = data2 . Note that.8 (hongkong plot.0465 5.2.2.213 3. RUN . TITLE2 ’ Hongkong Data ’.9: Analyzing the log returns. PROC AUTOREG is used to analyze the ARCH(3) model for the daily log returns.2. /* Estimate ARCH (3) .sas). /* Note that this program needs data1 generated by the previous →program ( hongkong_plot . GARCH=(q=3) selects the ARCH(3) model and DIST=T determines a t distribution for the innovations Zt in the model equation.

. Models of Time Series 2. . ap and b1 . we could also fit the model Yt = v≥0 αv εt−v to the data. however. . yn are (possibly) variance-stabilized as well as trend or seasonally adjusted.3 Specification of ARMA-Models: The Box– Jenkins Program The aim of this section is to fit a time series model (Yt )t∈Z to a given set of data y1 . Forecasting: The prediction of future values of the original process. . Diagnostic check: The fit of the ARMA(p. Order selection: Choice of the parameters p and q. q)-process (Yt )t∈Z . This suggests to choose the order q such that r(q) is clearly different from zero. But then we would have to determine infinitely many parameters α v . The order p of an AR(p)-process can be estimated in an analogous way using the empirical partial autocorrelation function α(k). This. as ˆ defined in (2. 2. v ≥ 0. The Box–Jenkins program consists of four steps: 1. q)-process. which we will fit to the data in the following. whereas r(k) for k ≥ q + 1 is quite close to zero. . k ≥ 1. . . reasonable to fit only the finite number of parameters of an ARMA(p.1 (iv) shows that the autocorrelation function ρ(k) vanishes for k ≥ q + 1. bq are estimated. . .2. . Yn from an ARMA(p. Since α(p) should be close to the p-th coefficient a p ˆ . . 3. q)-model with the estimated coefficients is checked. by the correlogram. however. As noted in Section 2. By the principle of parsimony it seems. is obviously a rather vague selection rule. yn collected in time t. We suppose that the data y1 . . . . .2.. The four steps are discussed in the following. 4.10). Lemma 2. Estimation of coefficients: The coefficients a1 . . where (εt ) is a white noise.94 Chapter 2.e. . . . Order Selection The order q of a moving average MA(q)-process can be estimated by means of the empirical autocorrelation function r(k) i. We assume that they were generated by clipping Y1 . .

whereas α(k) ≈ 0 ˆ for k > p. Note that the variance estimate σp.3 Specification of ARMA-Models: The Box–Jenkins Program of the AR(p)-process. which is different from zero. q). q)-process (Yt )t∈Z . b1 . yn . . Ytk ) with t1 < · · · < tk is multivariate normal. . minimizing some function. thus helping to prevent overfitting of the data by choosing p and q too large. The additive terms in the above criteria serve. q) := log(ˆp. . . q) := log(ˆp. q) := log(ˆp. n+1 95 AIC and BIC are discussed in Section 9. q)-process is a bit more challenging. see below. . . . In the next step we will derive estimators of the constants a1 . ˆ The choice of the orders p and q of an ARMA(p. (p + q) log(n + 1) n+1 p+q+1 .q ) + 2 σ2 the Bayesian Information Criterion BIC(p. . . . . .q ) + σ2 and the Hannan-Quinn Criterion HQ(p. which is based on an estimate σp. . Popular functions are Akaike’s Information Criterion AIC(p. ap . t ∈ Z. . Estimation of Coefficients Suppose we fixed the order p and q of an ARMA(p. where the joint distribution of an arbitrary vector (Yt1 . . with Y1 .3 of Brockwell and Davis (1991) for Gaussian processes (Yt ).q ) + σ2 2(p + q)c log(log(n + 1)) n+1 with c > 1. .q of the ˆ2 variance of ε0 . For the HQ-criterion we refer to Hannan and Quinn (1979). . the above rule can be applied again with r replaced by α. In this case one commonly takes the pair (p. Yn now modelling the data y1 .q will in general ˆ2 become arbitrarily small as p + q increases. . bq in the model Yt = a1 Yt−1 + · · · + ap Yt−p + εt + b1 εt−1 + · · · + bq εt−q . therefore.2. as penalties for large values. . . .

n} = . .Σ (x1 . yn ). . Ys ) = v≥0 w≥0 αv αw Cov(ε−v . . . . . . if the vector (Y1 . −∞ −∞ ϕµ. .j≤n denoted by N (µ. . . . t ∈ Z. the joint distribution of (Y1 . . We assume that the process (Yt ) satisfies the stationarity condition (2. . . xn ). Precisely. . . . i = 1. . . . i = 1. .Σ (x1 . xn ∈ R is the density of the n-dimensional normal distribution with mean vector µ = (µ. . zn ) dzn . µ)T ∈ Rn and covariance matrix Σ = (γ(i − j))1≤i. the unknown underlying mean vector µ and covariance matrix Σ ought to be such that ϕµ. .Σ (x1 . xn ) − µ)T 2 (2π)n/2 (det Σ)1/2 for arbitrary x1 . . . . . Yn ) realizes close to (x1 . . . . Σ). . . we have for ε↓0 P {Yi ∈ [xi − ε.. . . . . xn ) − µ)Σ−1 ((x1 . yn ) is maximized. The number ϕµ. . Here ϕµ. in which case Yt = v≥0 αv εt−v . is invertible. . ap and b1 . . . dz1 ≈ 2n εn ϕµ. .Σ (x1 . . where (εt ) is a white noise and the coefficients αv depend only on a1 . . . . .. . xn ) reflects the probability that the random vector (Y1 . . . . dx1 for arbitrary s1 . .3). . Consequently we have for s ≥ 0 γ(s) = Cov(Y0 .Σ (y1 . . . . bq . εs−w ) = σ 2 αv αs+v . . . Yn ) is a n-dimensional normal distribution s1 sn P {Yi ≤ si . Models of Time Series The Gaussian Model: Maximum Likelihood Estimator We assume first that (Yt ) is a Gaussian process and thus. . . . . The computation of these parameters leads to the maximum likelihood estimator of µ and Σ. . .Σ (z1 . . n} x1 +ε xn +ε = x1 −ε . .96 Chapter 2. . xn ) dxn . . . . Yn ) actually attained the value (y1 . sn ∈ R. . . . xn −ε ϕµ. . The likelihood principle is the fact that a random variable tends to attain its most likely value and thus. . . xi + ε]. . xn ). . . . .. xn ) 1 1 = exp − ((x1 .. where µ = E(Y0 ) and γ is the autocovariance function of the stationary process (Y t ). v≥0 . . . . .

xn |ϑ) := ϕµ. ˆ The maximum likelihood estimator ϑ therefore satisfies ˆ l(ϑ|y1 . a1 . . . . µ. . yn ) . . b1 . . . . . . 1 Q(ϑ|x1 . . . . . depends only on a1 . xn ) := ((x1 . . . . maximizing the likelihood function is in general equivalent to the maximization of the loglikelihood function l(ϑ|y1 . yn ) = log L(ϑ|y1 . . . ap and b1 . ap . bq . . ˆ A parameter ϑ maximizing the likelihood function ˆ L(ϑ|y1 . . . . xn ) − µ)Σ −1 ((x1 . . . .2. . . . yn ) := p(y1 . . . . . . . . . ϑ is then a maximum likelihood estimator of ϑ. yn ) = sup L(ϑ|y1 . . . . yn ) ϑ n 1 1 = sup − log(2πσ 2 ) − log(det Σ ) − 2 Q(ϑ|y1 . . . .3 Specification of ARMA-Models: The Box–Jenkins Program The matrix 97 Σ := σ −2 Σ. . xn ) = (2πσ 2 )−n/2 (det Σ )−1/2 exp − where Q(ϑ|x1 . . yn ). . . The likelihood function pertaining to the outcome (y1 . . . . . . yn ) = sup l(ϑ|y1 . . . . yn |ϑ). . yn ). 2σ 2 . . . . . Due to the strict monotonicity of the logarithm. . therefore. . . 2 2 2σ ϑ The computation of a maximizer is a numerical and usually computer intensive problem. xn ) − µ)T is a quadratic function. . . . . xn ) ∈ Rn p(x1 . . bq ) ∈ Rp+q+2 and (x1 . . . . . . . . We can write now the density ϕµ. xn ) . . . .Σ (x1 .Σ (x1 . . . . xn ) as a function of ϑ := (σ 2 . . . yn ) is L(ϑ|y1 . . . . . .

. . . . . 0 −a 0 0   2 1 + a −a 0  . yn ) − µ)Σ −1 ((y1 . 2σ 2 = (y1 − µ) + (yn − µ) + (1 + a ) 2 2 2 (yi − µ) − 2a (yi − µ)(yi+1 − µ). yn ) = (2πσ 2 )−n/2 (1 − a2 )1/2 exp − where Q(ϑ|y1 .2.3. yn ) = ((y1 . yn ) − µ)T n−1 i=2 n−1 2 i=1 The inverse matrix is  1 −a −a 1 + a2   0 −a −1 Σ = .98 Chapter 2. then ˆ Yt = a1 Yt−1 + · · · + ap Yt−p + b1 εt−1 + · · · + bq εt−q .. . a) is given by L(ϑ|y1 ..... . 0 0  1 a a2 . .4 the autocovariance function as γ(s) = σ . Nonparametric Approach: Least Squares If E(εt ) = 0. see Exercise 2. . an−1  a 1 a an−2  1  .. . . .  . and thus. .. .. . Σ = . then the likelihood function of ϑ = (σ 2 . ..  . Check that the determinant of Σ −1 is det(Σ −1 ) = 1−a2 = 1/ det(Σ ). . . 0 −a 1  1 Q(ϑ|y1 . . . . 1  0 0 . 1 − a2 2 s ≥ 0. . .  . Models of Time Series Example 2.40.  2 −a 1 + a −a . The AR(1)-process Yt = aYt−1 + εt with |a| < 1 has by Example 2. yn ) . .. ..1. . .  1 − a2  . . . . n−1 n−2 n−3 a a a . If (Yt ) is a Gaussian process. . µ.  0 .

3 Specification of ARMA-Models: The Box–Jenkins Program would obviously be a reasonable one-step forecast of the ARMA(p. ˆ ˆ . ˆ We have no observation yt available for t ≤ 0. . ap . . ˆt The estimated residuals εt can then be computed from the recursion ˆ ε1 = y1 ˆ ε 2 = y 2 − a 1 y1 − b 1 ε 1 ˆ ˆ ε 3 = y 3 − a 1 y2 − a 2 y1 − b 1 ε 2 − b 2 ε 1 ˆ ˆ ˆ . based on Yt−1 . . bq of S 2 . b1 . ˆ ˆ ˆ The function S 2 (a1 . But from the assumption E(εt ) = 0 and thus E(Yt ) = 0. ap . . . . . b1 . . . . . . . . ap . . .2. leading to ˆ n S (a1 . bq ) n = t=−∞ n ε2 ˆt (yt − a1 yt−1 − · · · − ap yt−p − b1 εt−1 − · · · − bq εt−q )2 ˆ ˆ = t=−∞ is the residual sum of squares and the least squares approach suggests to estimate the underlying set of constants by minimizers a1 . Note that the residuals εt and the constants are nested. . ap . . . it is reasonable to backforecast yt by zero and to put εt := 0 for t ≤ 0. εt−q . t ≤ n. . ˆ εj = yj − a1 yj−1 − · · · − ap yj−p − b1 εj−1 − · · · − bq εj−q . . which depends on the ˆ choice of constants a1 . . . . . . . . . bq ) = t=1 2 ε2 . b1 . . . . . The prediction error is given by the residual ˆ Yt − Yt = ε t . . . q)process Yt = a1 Yt−1 + · · · + ap Yt−p + εt + b1 εt−1 + · · · + bq εt−q . . Yt−p and εt−1 . . 99 Suppose that εt is an estimator of εt . . . bq and satisfies the recursion εt = yt − a1 yt−1 − · · · − ap yt−p − b1 εt−1 − · · · − bq εt−q . . . b1 . .
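The residual recursion and the residual sum of squares S² just described translate directly into code. The following hedged sketch (Python/numpy assumed; the minimization over the coefficients would be handed to any numerical optimizer and is only indicated) backforecasts the unknown values y_t and ε_t for t ≤ 0 by zero, exactly as in the text.

import numpy as np

def residuals(y, a, b):
    # eps_t = y_t - a_1 y_{t-1} - ... - a_p y_{t-p} - b_1 eps_{t-1} - ... - b_q eps_{t-q},
    # with y_t := 0 and eps_t := 0 for t <= 0
    y = np.asarray(y, dtype=float)
    n, p, q = len(y), len(a), len(b)
    eps = np.zeros(n)
    for t in range(n):
        ar = sum(a[u] * y[t - 1 - u] for u in range(p) if t - 1 - u >= 0)
        ma = sum(b[v] * eps[t - 1 - v] for v in range(q) if t - 1 - v >= 0)
        eps[t] = y[t] - ar - ma
    return eps

def s2(params, y, p, q):
    # residual sum of squares S^2(a_1, ..., a_p, b_1, ..., b_q), to be minimized numerically
    a, b = params[:p], params[p:p + q]
    return np.sum(residuals(y, a, b) ** 2)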

. . q)-process underlying the data. whether the estimated residuals εt . bq have been chosen in order to model an ARMA(p. The parameter K must be chosen such that the sample size n − k in rε (k) is ˆ large enough to give a stable estimate of the autocorrelation function. using the Yule-Walker equations as described in (2. since in this case the value Q(K) is unexpectedly large. n − 1. . . . . 2 By χ2 K−p−q we denote the distribution function of the χ -distribution with K − p − q degrees of freedom. and checks.7). . t = 1. ap of a pure AR(p)-process can be estimated directly. . .4 in Brockwell and Davis (1991)). To this end one considers the pertaining empirical autocorrelation function ˆ rε (k) := n−k ε ¯ ε j=1 (ˆj − ε)(ˆj+k − n ε ¯2 j=1 (ˆj − ε) ε) ¯ . To accelerate the convergence to 2 the χK−p−q distribution under the null hypothesis of an ARMA(p. n. behave approximately ˆ like realizations from a white noise process. k = 1.g. The Portmanteau-test of Box and Pierce (1970) checks. . . ˆ2 which follows asymptotically for n → ∞ a χ2 -distribution with K − p − q degrees of freedom if (Yt ) is actually an ARMA(p. q} to n. . Section 9. Diagnostic Check Suppose that the orders p and q as well as the constants a1 . . b1 . q)-model is rejected if the p-value 1 − χ2 K−p−q (Q(K)) is too small. Models of Time Series where j now runs from max{p. . ˆ where ε = n−1 n εj . . The coefficients a1 . whether the values rε (k) are suf¯ j=1 ˆ ficiently close to zero. ap . The ARMA(p. This decision is based on K Q(K) := n k=1 rε (k). q)-process (see e. . . . . one often replaces the Box–Pierce statistic Q(K) by the Box– Ljung statistic (Ljung and Box (1978)) K Q (K) := n k=1 ∗ n+2 n−k 1/2 2 K rε (k) ˆ = n(n + 2) k=1 1 2 r (k) ˆ n−k ε with weighted empirical autocorrelations. q)process. .100 Chapter 2. .
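The diagnostic check can be sketched numerically as well. The following illustration (Python with numpy and scipy assumed; the function name portmanteau is ours) computes the empirical autocorrelations of the estimated residuals, the Box–Pierce statistic Q(K), the Box–Ljung statistic Q*(K) and the p-value from the χ²-distribution with K − p − q degrees of freedom.

import numpy as np
from scipy.stats import chi2

def portmanteau(resid, K, p, q):
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    rc = resid - resid.mean()
    denom = np.sum(rc ** 2)
    r = np.array([np.sum(rc[:n - k] * rc[k:]) / denom for k in range(1, K + 1)])
    q_bp = n * np.sum(r ** 2)                                        # Box-Pierce Q(K)
    q_lb = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, K + 1)))  # Box-Ljung Q*(K)
    pval = chi2.sf(q_lb, K - p - q)
    return q_bp, q_lb, pval   # reject the ARMA(p, q)-model if the p-value is too small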

16) are then of Yule-Walker type n−1 ρ(h + s) = u=0 c∗ ρ(s − u).. . ..16) Proof. Yn . The equations (2. n. . . . . . 1.cn−1 ∈R min E  Yn+h − cu Yn−u .3.. . .. based on Y1 . .2. . i = 1. . The following result provides a sufficient condition for the optimality of weights. . . Then we have n−1 ∗ u=0 cu Yn−u is a best h-step forecast n−1 ˜ Yn+h := u=0 cu Yn−u be an arbitrary forecast. of Yn+h . . If the weights c∗ . E  Yn+h − c∗ Yn−u u = c0 . u s = 0. . . c∗ have the property that n−1 0 n−1 E Yi Yn+h − ˆ then Yn+h := c∗ Yn−u u u=0 = 0.3 Specification of ARMA-Models: The Box–Jenkins Program 101 Forecasting We want to determine weights c∗ . . Yn . . . Let (Yt ) be an arbitrary stochastic process with finite second moments. . Lemma 2. c∗ ∈ R such that for h ∈ N 0 n−1     2 2 n−1 u=0 n−1 u=0 n−1 ˆ Then Yn+h := u=0 c∗ Yn−u with minimum mean squared error is said u to be a best (linear) h-step forecast of Yn+h .2. based on ˜ E((Yn+h − Yn+h )2 ) ˆ ˆ ˜ = E((Yn+h − Yn+h + Yn+h − Yn+h )2 ) n−1 u=0 ˆ = E((Yn+h − Yn+h ) ) + 2 2 2 ˆ (c∗ − cu ) E(Yn−u (Yn+h − Yn+h )) u ˆ ˜ + E((Yn+h − Yn+h ) ) ˆ ˆ ˜ = E((Yn+h − Yn+h )2 ) + E((Yn+h − Yn+h )2 ) ˆ ≥ E((Yn+h − Yn+h )2 ). (2. n − 1. Suppose that (Yt ) is a stationary process with mean zero and autocorrelation function ρ. . . . . Let Y1 .

. If we put h = 1. The best forecast of Yn+1 is by (2. ..18). P n is invertible...   . then  ∗    c0 ρ(h) .j≤n is positive definite. If this matrix is invertible.  = P −1  .  .   .  . then equation (2. n c∗ ρ(h + n − 1) n−1 is the uniquely determined solution of (2.   . 0 a  a 0   1+a2 1 1+a2 0   0 a a 1 1+a2 0    1+a2 Pn =  . ρ(1) = a/(1 + a2 ). xT P n x > 0 for any x ∈ Rn unless x = 0 (Exercise 2.17) Check that the matrix P n = (Corr(Yi ..18) implies that c∗ equals the n−1 partial autocorrelation coefficient α(n). .2.  .40). In this case.17).   ∗  c0 ρ(h)  ∗   ρ(h + 1)   = P n  c.102 or.  a   0 . . α(n) is the coefn−1 ∗ ˆ ficient of Y1 in the best linear one-step forecast Yn+1 = u=0 cu Yn−u of Yn+1 . . Consider the MA(1)-process Yt = εt + aεt−1 with E(ε0 ) = 0. ... Its autocorrelation function is by Example 2. in matrix language  Chapter 2. and thus. .1   .  := P −1   . therefore. . 1 1+a2  a 1 0 0 .3. The matrix P n equals therefore   a 1 1+a2 0 0 .. . n  .3. ∗ cn−1 ρ(h + n − 1) (2. Models of Time Series with the matrix P n as defined in (2. n−1 ∗ u=0 cu Yn−u with  a   ∗  1+a2 c0  0  ..8). 1+a2 . ∗ cn−1 0 Example 2.18) .  . Yj ))1≤i. (2. ρ(u) = 0 for u ≥ 2. .2 given by ρ(0) = 1.
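Equation (2.18) determines the forecast weights by solving a linear system with the correlation matrix P_n. A hedged sketch (Python/numpy assumed; forecast_weights is an illustrative helper name and a = 0.5 an arbitrary choice) for the MA(1)-case of Example 2.3.3:

import numpy as np

def forecast_weights(rho, n, h):
    # solve P_n c = (rho(h), ..., rho(h + n - 1))^T as in (2.17)/(2.18);
    # the best h-step forecast is then sum_u c[u] * Y_{n-u}
    P = np.array([[rho(abs(i - j)) for j in range(n)] for i in range(n)])
    rhs = np.array([rho(h + u) for u in range(n)])
    return np.linalg.solve(P, rhs)

# MA(1)-process Y_t = eps_t + a eps_{t-1}
a = 0.5
rho_ma1 = lambda k: 1.0 if k == 0 else (a / (1 + a ** 2) if k == 1 else 0.0)
c = forecast_weights(rho_ma1, n=10, h=1)
# for h = 1 the last entry c[-1] equals the partial autocorrelation coefficient alpha(10)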

.. .p) min(h−1.3 Specification of ARMA-Models: The Box–Jenkins Program −1 which is a/(1 + a2 ) times the first column of P n . is a stationary AR(p)-process. it is invertible by Theorem 2.   = u=1 au E ˆ Yn+h−u − Yn+h−u Yi = 0. . p Theorem 2. t ∈ Z. The case of an arbitrary h ≥ 2 is now a consequence of the recursion ˆ E((Yn+h − Yn+h )Yi )  = E εn+h + min(h−1. Suppose that Yt = u=1 au Yt−u + εt . .2. Since (Yt ) satisfies the stationarity condition (2. . almost surely.5. Y n . This implies in particular E(Yt εt+h ) = u≥0 bu E(εt−u εt+h ) = 0 for any h ≥ 1. cf.e.2.3. . Proof.18) the constant 0. . t ∈ Z. . Theorem 2.p) min(h−1. . . there exists an absolutely summable causal filter (bu )u≥0 such that Yt = u≥0 bu εt−u . The best h-step forecast for arbitrary h ≥ 2 is recursively given by ˆ ˆ ˆ Yn+h = a1 Yn+h−1 + · · · + ah−1 Yn+1 + ah Yn + · · · + ap Yn+h−p .3).3) and has zero mean E(Y0 ) = 0. Note that Yn+h is for h ≥ 2 uncorrelated with Y1 . Let n ≥ p.2. and Lemma 2. . . which satisfies the stationarity condition (2. .3. . . .1.3. Hence we obtain for i = 1.p) u=1 au Yn+h−u − u=1 ˆ au Yn+h−u  Yi  i = 1. The best forecast of Yn+h for h ≥ 2 is by (2. n.3 i. The best one-step forecast is ˆ Yn+1 = a1 Yn + a2 Yn−1 + · · · + ap Yn+1−p 103 and the best two-step forecast is ˆ ˆ Yn+2 = a1 Yn+1 + a2 Yn + · · · + ap Yn+2−p . n ˆ E((Yn+1 − Yn+1 )Yi ) = E(εn+1 Yi ) = 0 from which the assertion for h = 1 follows by Lemma 2. Yn and thus not really predictable by Y1 . .4.2.

4Yn − 0.3) and has zero mean. 1)process Yt = 0.4Yn+1 + 0 + 0. obtain Y1 := 0.2Y1 . n.4Yn+h−1 = · · · = 0.4Yt−1 + εt − 0. which satisfies the stationarity condition (2. Suppose that n + q − p ≥ 0.3.6ˆ1 ε = −0. Yn+h = 0. q Theorem 2. thus. is an ARMA(p. q)-process.3. . precisely E(ε0 ) = 0.6εt−1 . until Yi and εi are defined for i = 1.2Y1 ) = −0.4Yn + 0 − 0. . . First we need the optimal 1-step forecast Yi for i = 1. which shows that for an ARMA(p. .104 Chapter 2. .4Y2 + 0 − 0.5. ε Yn+2 = 0.12Y1 .6ˆ2 ε = 0. . with E(Yt ) = E(εt ) = 0.4Y1 + 0 − 0. . These are defined by putting unknown values of Yt with an index t ≤ 0 equal to their expected value. Example 2. Y2 := 0.4Y2 − 0. .6(Y2 + 0. We illustrate the best forecast of the ARMA(1. n.2Y2 − 0.6(Yn − Yn ). which is zero.6ˆn = 0. . Models of Time Series A repetition of the arguments in the preceding proof implies the following result. . . Suppose that Yt = p au Yt−u + εt + v=1 bv εt−v . ε1 := Y1 − Y1 = Y1 . . .4h−1 Yn+1 −→ 0. ˆ ε2 := Y2 − Y2 = Y2 + 0. . u=1 t ∈ Z.2Y1 . We. Y3 := 0.6. The actual forecast is then ˆ given by Yn+1 = 0. q)-process the forecast of Yn+h for h > q is controlled only by the AR-part of the process. . . ˆ . The best h-step forecast of Yn+h for h > q satisfies the recursion p ˆ Yn+h = u=1 ˆ au Yn+h−u . h→∞ . ˆ ε3 := Y3 − Y3 . t ∈ Z.
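Example 2.3.6 continues directly after the following sketch, which merely reproduces its recursions numerically (Python/numpy assumed; arma11_forecasts is an illustrative name). With a = 0.4 and b = −0.6 the residuals ε̂_t, the one-step forecast and the h-step forecasts are computed; for h ≥ 2 only the AR-part remains and the forecasts tend to zero.

import numpy as np

def arma11_forecasts(y, a=0.4, b=-0.6, h_max=5):
    # Y_t = a Y_{t-1} + eps_t + b eps_{t-1}; Example 2.3.6 uses a = 0.4, b = -0.6
    y = np.asarray(y, dtype=float)
    eps = np.zeros(len(y))
    eps[0] = y[0]                  # eps_1 = y_1, since y_0 and eps_0 are set to zero
    for t in range(1, len(y)):
        eps[t] = y[t] - a * y[t - 1] - b * eps[t - 1]
    forecasts = [a * y[-1] + b * eps[-1]]      # one-step forecast 0.4 y_n - 0.6 eps_n
    for _ in range(h_max - 1):
        forecasts.append(a * forecasts[-1])    # h-step forecast, h >= 2: AR-part only
    return eps, np.array(forecasts)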

Here εt with index t ≥ n + 1 is replaced by zero, since it is uncorrelated with Yi, i ≤ n. In practice one replaces the usually unknown coefficients au, bv in the above forecasts by their estimated values.

2.4 State-Space Models

In state-space models we have, in general, a nonobservable target process (Xt) and an observable process (Yt). They are linked by the assumption that (Yt) is a linear function of (Xt) with an added noise, where the linear function may vary in time. The aim is the derivation of best linear estimates of Xt, based on (Ys)s≤t.

Many models of time series such as ARMA(p, q)-processes can be embedded in state-space models if we allow in the following sequences of random vectors Xt ∈ Rk and Yt ∈ Rm.

A multivariate state-space model is now defined by the state equation

Xt+1 = At Xt + Bt εt+1 ∈ Rk,   (2.19)

describing the time-dependent behavior of the state Xt ∈ Rk, and the observation equation

Yt = Ct Xt + ηt ∈ Rm.   (2.20)

We assume that (At), (Bt) and (Ct) are sequences of known matrices, and that (εt) and (ηt) are uncorrelated sequences of white noises with mean vectors 0 and known covariance matrices Cov(εt) = E(εt εt^T) =: Qt, Cov(ηt) = E(ηt ηt^T) =: Rt. Here two random vectors W ∈ Rp and V ∈ Rq are said to be uncorrelated if their components are, i.e., if the matrix of their covariances vanishes, E((W − E(W))(V − E(V))^T) = 0. By E(W) we denote the vector of the componentwise expectations of W. We suppose further that X0 and εt, ηt, t ≥ 1, are uncorrelated.

We say that a time series (Yt) has a state-space representation if it satisfies the representations (2.19) and (2.20).

106 Chapter 2. 1 Note that the state Xt is nonstochastic i..e. Yt−p+1 )T . An AR(p)-process Yt = a1 Yt−1 + · · · + ap Yt−p + εt with a white noise (εt ) has a state-space representation with state vector Xt = (Yt . .2. . Bt = 0.. 0 0 .. µt . This model is moreover time-invariant..4. since the matrices A. ap−1 1 0 . This simple model can be represented as a state-space model as follows. Models of Time Series Example 2. Let (ηt ) be a white noise in R and put Yt := µt + ηt with linear trend µt = a + bt. . 0  0 A :=  0 1 . . 0 . 1 From the recursion µt+1 Xt+1 = and with C := (1. . 0) µt + ηt = CXt + ηt .. Yt−1 . . . B := Bt and C do not depend on t. . . . .. Define the state vector Xt as Xt := and put 1 b 0 1 = µt + b we then obtain the state equation A := µt+1 1 = 1 b 0 1 µt 1 = AXt . . . 1  ap 0  0 . Example 2.. If we define the p × p-matrix A by  a1 a2 . .1. 0) the observation equation Yt = (1.4.

2. bq ) we obtain the state equation Xt+1 = AXt + Bεt+1 and the observation equation Yt = CXt . .. 0. . εt−q )T ∈ Rq+1 .4 State-Space Models and the p × 1-matrices B. . 0)T and the 1 × (q + 1)-matrix C := (1. . .. . . . . C T by B := (1. . With the (q + 1) × (q + 1)-matrix  0 0 0 1 0 0  A := 0 1 0 . For the MA(q)-process Yt = εt + b1 εt−1 + · · · + bq εt−q we define the non observable state Xt := (εt . .3. . Example 2. . . . . ... 0 0 0  . . . then we have the state equation Xt+1 = AXt + Bεt+1 and the observation equation Yt = CXt . . 0 0  0 0 . . .4. 0 0 . b1 . εt−1 . 1 0 107 the (q + 1) × 1-matrix B := (1. . 0)T =: C T ... . . . 0. . . .

. . .. . 0 . . . 0 . . ... .. 0 .. . 0   1 0 . . . . .. . . . . . . q)-processes Yt = a1 Yt−1 + · · · + ap Yt−p + εt + b1 εt−1 + · · · + bq εt−q . . . 0 0 0  .  0 . 0 .. 0 0 the (p + q) × 1-matrix B := (1.. .. 0. Yt−p+1 . . . . . . . .. . .. . .4. .. εt .. . .. ..108 Chapter 2. . εt−1 . . .. We define the (p + q) × (p + q)-matrix  a1 a2 . 0 1 . .. . εt−q+1 )T ∈ Rp+q . . .. . 0. .. . . In this case the state vector can be chosen as Xt := (Yt . . . . . 0 0  ..  b2 . ap−1 ap b1 1 0 . ..  0 .4.... .. . . . .. . 1 . .. we obtain a state-space representation of ARMA(p... . . 0  . ... . 0.. 0 1 0 0 A :=  0 . . . 1. . bq−1 bq .. 0 . 0).. . . 0 1 0 . ... . ... . 0. ... 0  . . Then we have the state equation Xt+1 = AXt + Bεt+1 and the observation equation Yt = CXt .. 0)T with the entry 1 at the first and (p + 1)-th position and the 1 × (p + q)matrix C := (1. 0. . Yt−1 . Combining the above results for AR(p) and MA(q)processes. . . Models of Time Series Example 2..
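The matrices of this ARMA(p, q) state-space representation can be assembled mechanically. A hedged sketch (Python/numpy assumed, with p ≥ 1 and q ≥ 1 as in the example; the function name arma_state_space is ours):

import numpy as np

def arma_state_space(a, b):
    # state vector X_t = (Y_t, ..., Y_{t-p+1}, eps_t, ..., eps_{t-q+1})^T in R^{p+q},
    # state equation X_{t+1} = A X_t + B eps_{t+1}, observation equation Y_t = C X_t
    a = np.asarray(a, dtype=float)    # a_1, ..., a_p
    b = np.asarray(b, dtype=float)    # b_1, ..., b_q
    p, q = len(a), len(b)
    A = np.zeros((p + q, p + q))
    A[0, :p] = a                      # first row: a_1 ... a_p b_1 ... b_q
    A[0, p:] = b
    for i in range(1, p):
        A[i, i - 1] = 1.0             # shift within the Y-block
    for i in range(1, q):
        A[p + i, p + i - 1] = 1.0     # shift within the eps-block
    B = np.zeros(p + q)
    B[0] = 1.0
    B[p] = 1.0                        # new noise eps_{t+1} enters Y_{t+1} and the eps-block
    C = np.zeros(p + q)
    C[0] = 1.0
    return A, B, C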

21) of Xt . We want to compute the best linear prediction ˆ X t := D1 Y1 + · · · + Dt Yt (2.20) is the estimation of the nonobservable state Xt . (2. .e.Dt min Dj Yj ) (Xt − T Dj Yj )). It is possible to derive an estimate of Xt recursively from an estimate of Xt−1 together with the last observation Yt .. with respect to the inner product E(XY ) of two random variable X and Y. . based on Y1 . .20) satisfies ˆ E((Xt − X t )YsT ) = 0. It states that X t is a best linear prediction of ˆ Xt based on Y1 .5. . (2. ˆ Note that E((Xt − X t )YsT ) is a k × m-matrix. Yt i. 1 ≤ s ≤ t.. ˆ Lemma 2. . . known as the Kalman recursions (Kalman 1960).19). If the estimate X t defined in (2.4. . which is generated by ˆ multiplying each component of Xt − X t ∈ Rk with each component of Ys ∈ Rm .3. Yt if each component of the vector Xt − X t is orthogonal to each component of the vectors Ys .. . . .22) j=1 By repeating the arguments in the proof of Lemma 2.2 we will prove ˆ the following result. (2.4 State-Space Models 109 The Kalman-Filter The key problem in the state-space model (2. 1 ≤ s ≤ t. Dt are such that the mean squared error is minimized ˆ ˆ E((Xt − X t )T (Xt − X t )) t t = E((Xt − = j=1 Dj Yj ) (Xt − E((Xt − T Dj Yj )) j=1 t j=1 t k×m−matrices D1 . . the k × m-matrices D1 . We obtain in this way a unified prediction technique for each time series model that has a state-space representation. .. .2..22).23) then it minimizes the mean squared error (2.

since in the second-to-last line the final term is nonnegative and the second one vanishes by the property (2. Models of Time Series Proof. Then ˜ ˆ X t := At−1 X t−1 (2. . which is easy to see. . .110 Chapter 2. ˜ E((Xt − X t )YsT ) = 0 for 1 ≤ s ≤ t − 1.24) is the best linear prediction of Xt based on Y1 . Yt−1 . . . . . From this we obtain that ˜ ˜ Y t := Ct X t is the best linear prediction of Yt based on Y1 . . Yt−1 .e. . since E((Yt − ˜ ˜ Y t )YsT ) = E((Ct (Xt − X t ) + ηt )YsT ) = 0. Note that εt and Ys are uncorrelated if s < t i. Yt . Define now by ˆ ˆ ˜ ˜ ˜ ∆t := E((Xt −X t )(Xt −X t )T ) and ∆t := E((Xt −X t )(Xt −X t )T ).23). Yt−1 . . Then we have E((Xt − Xt )T (Xt − Xt )) t =E ˆ Xt − X t + T j=1 (Dj − Dj )Yj t T t ˆ Xt − X t + j=1 (Dj − Dj )Yj ˆ ˆ = E((Xt − X t ) (Xt − X t )) + 2 t j=1 ˆ E((Xt − X t )T (Dj − Dj )Yj ) +E j=1 (Dj − Dj )Yj T t j=1 (Dj − Dj )Yj ˆ ˆ ≥ E((Xt − X t )T (Xt − X t )). . . . ˆ Let now X t−1 be the best linear prediction of Xt−1 based on Y1 .. We simply replaced εt in the state equation by its expectation 0. . note that ηt and Ys are uncorrelated if s < t. 1 ≤ s ≤ t − 1. Let Xt = t Dj Yj ∈ Rk be an arbitrary linear combination j=1 of Y1 . . . .

. How can we use this additional information to improve the prediction ˜ X t of Xt ? To this end we add a matrix Kt such that we obtain the ˆ best prediction X t based on Y1 ..24) is a solution of the equation ˜ ˜ Kt (Ct ∆t CtT + Rt ) = ∆t CtT . the matrix Kt is called the Kalman gain. t−1 111 ˆ since εt and Xt−1 − X t−1 are obviously uncorrelated. Suppose that we have observed Y1 ..2.25) i. s ≤ t − 1. and that we have pre˜ ˆ dicted Xt by X t = At−1 X t−1 . . t. . t. . . Assume that we now also observe Yt . . . Yt−1 .4. (2. The matrix Kt in (2.6. .e. Yt : ˜ ˜ ˆ X t + Kt (Yt − Y t ) = X t (2. . s ≤ t. In this case. Then we have ˜ ˆ ˆ ∆t = E((At−1 (Xt−1 − X t−1 ) + Bt−1 εt )(At−1 (Xt−1 − X t−1 ) + Bt−1 εt )T ) ˆ ˆ = E(At−1 (Xt−1 − X t−1 )(At−1 (Xt−1 − X t−1 ))T ) + E((Bt−1 εt )(Bt−1 εt )T ) T = At−1 ∆t−1 AT + Bt−1 Qt Bt−1 . . i. .4. . Lemma 2. Note that an arbitrary k × m-matrix Kt satisfies ˜ ˜ E (Xt − X t − Kt (Yt − Y t ))YsT ˜ ˜ = E((Xt − X t )YsT ) − Kt E((Yt − Y t )YsT ) = 0. In complete analogy one shows that ˜ ˜ ˜ E((Yt − Y t )(Yt − Y t )T ) = Ct ∆t CtT + Rt .4 State-Space Models the covariance matrices of the approximation errors. .5 such ˆ that Xt − X t and Ys are uncorrelated for s = 1. . .e. . we have to choose the matrix Kt according to Lemma 2. ˆ ˜ ˜ 0 = E((Xt − X t )YsT ) = E((Xt − X t − Kt (Yt − Y t ))YsT ). The matrix Kt has to be chosen such that Xt − X t and Ys are uncorrelated for s = 1.26) ˆ Proof.

the matrix Kt needs to satisfy only ˜ ˜ 0 = E((Xt − X t )YtT ) − Kt E((Yt − Y t )YtT ) ˜ ˜ ˜ ˜ = E((Xt − X t )(Yt − Y t )T ) − Kt E((Yt − Y t )(Yt − Y t )T ) ˜ ˜ = ∆t CtT − Kt (Ct ∆t CtT + Rt ).4. .6. moreover. The recursion in the discrete Kalman filter is done in two steps: From ˆ X t−1 and ∆t−1 one computes in the prediction step first ˜ ˆ X t = At−1 X t−1 . T ˜ ∆t = At−1 ∆t−1 AT + Bt−1 Qt Bt−1 . . Note that Y t is a linear ˜ combination of Y1 . .6.27) . then ˜ ˜ Kt := ∆t CtT (Ct ∆t CtT + Rt )−1 is the uniquely determined Kalman gain.4. Yt−1 and that ηt and Xt − X t as well as ηt and Ys are uncorrelated for s ≤ t − 1. ˜ If the matrix Ct ∆t CtT + Rt is invertible.112 Chapter 2. . ˜ ˜ ˜ ˜ = E((Xt − X t )(Ct (Xt − X t ) + ηt )T ) − Kt E((Yt − Y t )(Yt − Y t )T ) ˜ ˜ ˜ ˜ = E((Xt − X t )(Xt − X t )T )CtT − Kt E((Yt − Y t )(Yt − Y t )T ) ˜ But this is the assertion of Lemma 2.26) and the arguments in the proof of Lemma 2. ˜ ˜ Y t = Ct X t . We have. for a Kalman gain ˆ ˆ ∆t = E((Xt − X t )(Xt − X t )T ) ˜ ˜ = ∆t + Kt (Ct ∆t CtT + Rt )KtT ˜ ˜ − ∆t CtT KtT − Kt Ct ∆t ˜ ˜ = ∆t − Kt Ct ∆t ˜ ˜ ˜ = ∆t + Kt E((Yt − Y t )(Yt − Y t )T )KtT ˜ ˜ ˜ ˜ − E((Xt − X t )(Yt − Y t )T )KtT − Kt E((Yt − Y t )(Xt − X t )T ) ˜ ˜ ˜ ˜ = E (Xt − X t − Kt (Yt − Y t ))(Xt − X t − Kt (Yt − Y t ))T by (2. Models of Time Series In order to fulfill the above condition. t−1 (2.

In the updating step one then computes Kt and the updated values X̂t, Δt:

Kt = Δ̃t Ct^T (Ct Δ̃t Ct^T + Rt)^{−1},
X̂t = X̃t + Kt (Yt − Ỹt),
Δt = Δ̃t − Kt Ct Δ̃t.   (2.28)

An obvious problem is the choice of the initial values X̃1 and Δ̃1. One frequently puts X̃1 = 0 and Δ̃1 as the diagonal matrix with constant entries σ² > 0. The number σ² reflects the degree of uncertainty about the underlying model. Simulations as well as theoretical results show, however, that the estimates X̂t are often not affected by the initial values X̃1 and Δ̃1 if t is large; see for instance Example 2.4.7 below.

If in addition we require that the state-space model (2.19), (2.20) is completely determined by some parametrization ϑ of the distribution of (Yt) and (Xt), then we can estimate the matrices of the Kalman filter in (2.27) and (2.28) under suitable conditions by a maximum likelihood estimate of ϑ; see e.g. Section 8.5 of Brockwell and Davis (2002) or Section 4.5 in Janacek and Swift (1993).

By iterating the 1-step prediction X̃t = At−1 X̂t−1 of X̂t in (2.24) h times, we obtain the h-step prediction of the Kalman filter

X̃t+h := At+h−1 X̃t+h−1,   h ≥ 1,

with the initial value X̃t+0 := X̂t. The pertaining h-step prediction of Yt+h is then

Ỹt+h := Ct+h X̃t+h,   h ≥ 1.

Example 2.4.7. Let (ηt) be a white noise in R with E(ηt) = 0, E(ηt²) = σ² > 0 and put for some µ ∈ R

Yt := µ + ηt,   t ∈ Z.

This process can be represented as a state-space model by putting Xt := µ, with state equation Xt+1 = Xt and observation equation Yt = Xt + ηt, i.e., At = 1 = Ct and Bt = 0. The prediction step (2.27) of the Kalman filter is now given by

X̃t = X̂t−1,   Ỹt = X̃t,   Δ̃t = Δt−1.
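Before Example 2.4.7 is continued, here is a hedged numerical sketch of the recursions (2.27) and (2.28) for a time-invariant model (Python/numpy assumed; the function name kalman_filter, the argument layout and the shapes of the matrices are our illustrative choices, not part of the text). The initial prediction values X̃1 and Δ̃1 are passed in by the user, for instance X̃1 = 0 and Δ̃1 = σ²·I as suggested above.

import numpy as np

def kalman_filter(y, A, B, C, Q, R, x_pred, S_pred):
    # prediction and updating steps (2.27) and (2.28);
    # y has shape (T, m); x_pred, S_pred are the initial values for X~_1 and Delta~_1
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    T, k = y.shape[0], A.shape[0]
    x_filt = np.zeros((T, k))
    for t in range(T):
        # updating step: Kalman gain, X^_t and Delta_t
        S_y = C @ S_pred @ C.T + R
        K = S_pred @ C.T @ np.linalg.inv(S_y)
        x_filt[t] = x_pred + K @ (y[t] - C @ x_pred)
        S_filt = S_pred - K @ C @ S_pred
        # prediction step for t + 1
        x_pred = A @ x_filt[t]
        S_pred = A @ S_filt @ A.T + B @ Q @ B.T
    return x_filt

Combined with the state-space matrices of Example 2.4.4 and suitable Q, R, this yields one-step predictions Ỹt = C X̃t for an ARMA model; it is only a sketch of the recursion, not a substitute for PROC STATESPACE.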

144 were log-transformed xt = log(yt ) to stabilize the variance. ∆t−1 + σ 2 ˆ Note that ∆t = E((Xt − Xt )2 ) ≥ 0 and thus.8.. Example 2. finally. given by Xt . t = 1. Yt+h ˜ are. zt = ∆xt − ∆xt−12 were computed to remove the seasonal component of 12 months. The . which means ˆ that additional observations Yt do not contribute to Xt if t is large. The update step (2. Its limit ∆ := limt→∞ ∆t consequently exists and satisfies ∆=∆ σ2 ∆ + σ2 ˆ i. σ2 0 ≤ ∆t = ∆t−1 ≤ ∆t−1 ∆t−1 + σ 2 is a decreasing and bounded sequence.1 together with 12-step forecasts based on the Kalman filter. . . we obtain for the mean squared error of the h-step prediction ˜ Yt+h of Yt+h ˜ ˆ E((Yt+h − Yt+h )2 ) = E((µ + ηt+h − Xt )2 ) t→∞ 2 ˆ = E((µ − Xt )2 ) + E(ηt+h ) −→ σ 2 .28) of the Kalman filter is Kt = ∆t−1 ∆t−1 + σ 2 ˆ ˆ ˆ Xt = Xt−1 + Kt (Yt − Xt−1 ) ∆t = ∆t−1 − Kt ∆t−1 σ2 = ∆t−1 . no matter how the initial values ˜ ˜ X1 and ∆1 are chosen. therefore. This means that the mean squared error E((Xt − Xt )2 ) = ˆ E((µ − Xt )2 ) vanishes asymptotically. The h-step predictions Xt+h . The original data yt .4.3. . Models of Time Series ˜ ˜ Note that all these values are in R.114 Chapter 2.e. first order differences ∆xt = xt − xt−1 were used to eliminate the trend and. Further we have limt→∞ Kt = 0. . ∆ = 0. Finally. The following figure displays the Airline Data from Example 1.

/* Read in the data and compute log . . yl = LOG ( y ) . . t = 145. .2. . sas */ TITLE1 ’ Original and Forecasted Data ’. .1a: Airline Data and predicted values using the Kalman filter. VAR yl (1 . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 /* airline_kalman . SET data2 .4 State-Space Models Kalman filter was applied to forecast zt .transformation */ DATA data1 . . 156. and the results were transformed in the reverse order of the preceding steps to predict the initial values yt . /* Compute trend and seasonally adjusted data set */ PROC STATESPACE DATA = data1 OUT = data2 LEAD =12. t = _N_ . . TITLE2 ’ Airline Data ’. 156.4. .12) . . INPUT y .transformation */ DATA data3 . /* Compute forecasts by inverting the log . INFILE ’c :\ data \ airline . 115 Plot 2. yhat = EXP ( FOR1 ) . txt ’. ID t . t = 145.

AXIS2 LABEL =( ’ January 1949 to December 1961 ’) . This is invoked by LEAD=12. The variable t contains the observation number. Yt2 +k ). 36 Program 2.5 I = JOIN L =1. Z) = a Cov(Y.12) of the PROC STATESPACE procedure tells SAS to use first order differences of the initial data to remove their trend and to adjust them to a seasonal component of 12 months. SYMBOL2 C = BLACK V = CIRCLE H =1. The statement VAR yl(1. Show that for arbitrary a. Models of Time Series 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 /* Graphical options */ LEGEND1 LABEL =( ’ ’) VALUE =( ’ original ’ ’ forecast ’) . Yt2 ) = Cov(Yt1 +k . Finally.7 I = JOIN L =1. (Yt ) be stationary processes such that Cov(Xt . 2. the two data sets are merged and displayed in one plot. In the first data step the Airline Data are read into data1. t2 ∈ Z and k = 0 E(Yt1 ) = E(Yt1 +k ) but Cov(Yt1 . Their logarithm is computed and stored in the variable yl. thereby inverting the logtransformation in the first data step. SYMBOL1 C = BLACK V = DOT H =0. Exercises 2. Chapter 2. PLOT y * t =1 yhat * t =2 / OVERLAY VAXIS = AXIS1 HAXIS = AXIS2 LEGEND = →LEGEND1 . Show that Cov(aY + b. (i) Let (Xt ). /* Plot data and forecasts */ PROC GPLOT DATA = data4 . b ∈ C the linear combinations (aXt + bYt ) yield a stationary process. The data are identified by the time index set to t. BY t .1. b ∈ C.2.116 /* Meger data sets */ DATA data4 ( KEEP = t y yhat ) . data3 contains the exponentially transformed forecasts. MERGE data1 data3 .3. Ys ) = 0 for t. AXIS1 LABEL =( ANGLE =90 ’ Passengers ’) . QUIT . RUN . a. The results are stored in the data set data2 with forecasts of 12 months after the end of the input data. Give an example of a stochastic process (Yt ) such that for arbitrary t1 . .1: Applying the Kalman filter. Suppose that the complex random variables Y and Z are square integrable. s ∈ Z. 2. Z).4.

e. |γ(h)| ≤ γ(0). h ∈ Z. z i. . Show that the autocovariance function γ : Z → C of a complexvalued stationary process (Yt )t∈Z . (ii) Show that the random variable Y = beiU has mean zero. γ is a Hermitian function.8. n → ∞. 2π) and b ∈ C. which is defined by ¯ ¯ γ(h) = E(Yt+h Yt ) − E(Yt+h ) E(Yt ). n ∈ N...4 State-Space Models (ii) Suppose that the decomposition Zt = Xt + Yt . Let Z1 .6. distributed random variables and choose λ ∈ R. |k| = 0. 2. . . Suppose that Yt .2. t ∈ Z holds.5. 2 2. the empirical autocovariance function at lag k. σ2 > 0 is the cosinoid process 117 Yt = Z1 cos(2πλt) + Z2 sin(2πλt). has the following properties: γ(0) ≥ 0. |k| ≥ n. n − 1. k ∈ Z. . . γ(h) = γ(−h). . For which means µ1 . . Express the ˆ t=1 mean square error E(ˆ n − µ)2 in terms of the autocovariance function µ γ and show that E(ˆ n − µ)2 → 0 if γ(n) → 0. Z2 be independent and normal N (µi . t = 1. n. (i) Show that the process Yt = Xeiat . .7. µ2 ∈ R 2 2 and variances σ1 . .s≤n zr γ(r − s)¯s ≥ 0 for z1 . is stationary.e. is a stationary process with mean µ. 2. Then µn := n−1 n Yt is an unbiased estimator of µ. i. a ∈ R. zn ∈ C. Show that stationarity of (Zt ) does not necessarily imply stationarity of (Xt ). . µ 2. where X is a complex valued random variable with mean zero and finite variance. Suppose that (Yt )t∈Z is a stationary process and denote by c(k) := 1 n n−|k| t=1 (Yt 0. . and 1≤r. i = 1. stationary? t∈Z 2. γ is a positive semidefinite function. where U is a uniformly distributed random variable on (0. 2.4. . ¯ ¯ − Y )(Yt+|k| − Y ). . σi ).

.. Let (Yt )t∈Z be a stationary process with mean zero. Express the autocovariance function of the difference filter of first order ∆Yt = Yt − Yt−1 in terms of γY . 2. . (iii) If c(0) > 0. i. 2. . .. .) Hint: Consider k ≥ n and write Ck = n−1 AAT with a suitable k × 2k-matrix A. If its autocovariance function satisfies γ(τ ) = 0 for some τ > 0... k. . E(c(k)) = γ(k). Suppose that (Yt ) is a stationary process with autocovariance function γY .e. Let (Yt ) be a stochastic process such that for t ∈ Z P {Yt = 1} = pt = 1 − P {Yt = −1}..e.e. the resulting covariance matrix may not be positive semidefinite. . . . (ii) Show that the k-dimensional empirical covariance matrix   c(0) c(1) . Find it when γY (k) = λ|k| . Models of Time Series (i) Show that c(k) is a biased estimator of γ(k) (even if the factor n−1 is replaced by (n − k)−1 ) i.9. c(k − 1) c(k − 2) . Suppose in addition that (Yt ) is a Markov process. . (If the factor n−1 in the definition of c(j) is replaced by (n − j)−1 . . k≥1 P (Yt = y0 |Yt−1 = y1 . Show further that Cm is positive semidefinite if Ck is positive semidefinite for k > m. 2. c(0) is positive semidefinite. . . γ(t + τ ) = γ(t). i. .10. Yt−k = yk ) = P (Yt = y0 |Yt−1 = y1 ).. (i) Is (Yt )t∈N a stationary process? (ii) Compute the autocovariance function in case P (Yt = 1|Yt−1 = 1) = λ and pt = 1/2. 0 < pt < 1.e. j = 1. then γ is periodic with length τ .118 Chapter 2. i.11. c(k − 1)  c(1) c(0) c(k − 2)  Ck :=  . . Ck is positive definite. then Ck is nonsingular. . .   . . t ∈ Z. for any t ∈ Z.

Plot the path of a random walk with normal N (µ. Hint: Use the general inequality |xy| ≤ (x2 + y 2 )/2.2. The autocorrelation function ρ of an arbitrary MA(q)-process satisfies q 1 1 − ≤ ρ(v) ≤ q.13. The process Yt = s=1 εs is said to be a random walk. 2 2 v=1 Give examples of MA(q)-processes.. ˜ Plot the path of (εt )t and (˜t )t for t = 1. Yt = v bv Zt−v . where the εt are neither independent nor identically distributed.e.17. if t is even. γ(2) = 1. µ = 0 and µ > 0. Compute the orders p and the coefficients au of the process Yt = p u=0 au εt−u with Var(ε0 ) = 1 and autocovariance function γ(1) = 2. t ∈ Z be an AR(1)-process with |a| > 1. . 100 and compare! ε t 2. .14. i. 2.16. Then we have E(Xt Yt ) = u v au bv E(Zt−u Zt−v ). σ 2 ) distributed εt for each of the cases µ < 0. Let Yt = aYt−1 + εt . . Let (εt )t∈Z be a white noise. Show that (˜t )t is a white noise process with E(˜t ) = 0 and Var(˜t ) = ε ε ε 1.12. Let (au ) be absolutely summable filters and let (Zt ) be a stochastic process with supt∈Z E(Zt2 ) < ∞. . 2. these bounds are sharp. if t is odd.4 State-Space Models 2. . where the lower bound and the upper bound are attained. √ εt = ˜ 2 (εt−1 − 1)/ 2.15. 119 2. γ(3) = −1 and γ(t) = 0 for t ≥ 4. Compute the autocorrelation function of this process. Is this process invertible? 2. Put for t ∈ Z Xt = u au Zt−u . Let (εt )t be a white noise process with independent εt ∼ N (0. 1) and define εt .

t = 1. Define the new process εt = ˜ ∞ j=0 (−a)−j Yt−j and show that (˜t ) is a white noise.18. . Find two MA(1)-processes with the same autocovariance functions. Generate and plot AR(3)-processes (Yt ).21. where |a| > 1. . (iii) all roots are on the unit circle.20. (ii) all roots are inside the unit disk.120 Chapter 2. . Then there exists a ∈ (−1. (iv) two roots are outside. Hint: Example 2. 2. Let (Yt )t∈Z be a stationary stochastic process with E(Yt ) = 0. 500 where the roots of the characteristic polynomial have the following properties: (i) all roots are outside the unit disk. and  1 if t = 0  ρ(t) = ρ(1) if t = 1  0 if t > 1. t ∈ Z. where |ρ(1)| < 1/2. Suppose that Yt = εt + aεt−1 is a noninvertible MA(1)-process.19. Plot the autocorrelation functions of MA(p)-processes for different values of p. .22. one root inside the unit disk. 2. ˜ ˜ 2.2. Show that Var(˜t ) = a2 Var(εt ) ε ε and (Yt ) has the invertible representation Yt = εt + a−1 εt−1 . 2.2. . Models of Time Series 2. 1) and a white noise (εt )t∈Z such that Yt = εt + aεt−1 .

Hint: Use the fact that necessarily ρ(1) ∈ (−1.25. 2. ˜ a . ∞ −j j=1 a εt+j with |a| > 1. 2.24.1.2.3). a2 ) is in the triangle ∆ := (α. Show that the AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt for a1 = 1/3 and a2 = 2/9 has the autocorrelation function ρ(k) = 16 2 21 3 |k| 121 + 5 1 − 21 3 |k| .10). α + β < 1 and β − α < 1 . Then (Yt ) is given by the expression Yt = − (see the proof of Lemma 2. 1).4 State-Space Models (v) one root is outside. k∈Z and for a1 = a2 = 1/12 the autocorrelation function ρ(k) = 45 1 77 3 |k| + 1 32 − 77 4 |k| . k ∈ Z. Y0 = 0. one root is inside the unit disk and one root is on the unit circle. Var(ε0 ) = σ 2 and put Yt = εt − Yt−1 . t ∈ Z. (i) Let (Yt ) denote the unique stationary solution of the autoregressive equations Yt = aYt−1 + εt . β) ∈ R2 : −1 < β < 1.23. 2. t}/ st. (vi) all roots are outside the unit disk but close to the unit circle. Yt ) = (−1)s+t min{s. 2. Define the new process 1 εt = Yt − Yt−1 . if the pair (a1 .26. Show that t ∈ N. An AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt satisfies the stationarity condition (2. √ Corr(Ys . Let (εt ) be a white noise with E(ε0 ) = µ.

i.3Yt−1 + 0. α(2) = − 1 + a2 + a4 . Y10 .3 and a2 = 0. Is there something like asymptotic stationarity for t → ∞? (ii) Choose a ∈ (−1.e.3Yt−2 + εt . t 2. Compute E(Yt ).. E(ε2 ) = 0. is t∈Z a2 . 2. Use the IML function ARMASIM to simulate the stationary AR(2)process Yt = −0. Var(Yt ) and Cov(Yt . 1). Models of Time Series and show that (˜t ) is a white noise with Var(˜t ) = Var(εt )/a2 . . and compute the correlation matrix of Y1 . Show that the value at lag 2 of the partial autocorrelation function of the MA(1)-process Yt = εt + aεt−1 . . . These ε ε calculations show that (Yt ) is the (unique stationary) solution of the causal AR-equations 1 Yt = Yt−1 + εt .e. ˜ i.29. equals the AR(1)-process Yt = aYt−1 + εt . Estimate the parameters a1 = −0. 2.27. Yt . conditional ˜ ˜ on Y0 = 0. Thus. every AR(1)-process with |a| > 1 can be represented as an AR(1)-process with |a| < 1 and a new white noise.122 Chapter 2. t ≥ 1.. ˜ a t ∈ Z. A stationary solution exists if the white noise process is degenerated. (ii) Show that for |a| = 1 the above autoregressive equations have no stationary solutions.28.3 by means of the Yule–Walker equations using the SAS procedure PROC ARIMA. . Yt+s ). a = 0. (i) Consider the process ˜ Yt := ε1 aYt−1 + εt for t = 1 for t > 1.

or AR(p)process reasonable? 2.1. . . Determine the joint density of Yp+1 .36. Hint: Apply the dominated convergence theorem (Lebesgue). Plot also their empirical counterparts. Yn for an ARCH(p)process Yt with normal distributed Zt given that Y1 = y1 . . y) = fY |X (y|x)fX (x). Show that the density of the t-distribution with m degrees of freedom converges to the density of the standard normal distribution as m tends to infinity. Apply the Box–Jenkins program. q)-processes for different values of p. Y ) can be written in the form fX. . Is a fit of a pure MA(q). Compute the autocovariance function of an ARMA(1.31. ∞ j 2 2 j=0 a1 Zt Zt−1 2 · · · · · Zt−j with probability 123 (ii) Show that E(Yt2 ) = a0 /(1 − a1 ). 2. q using the IML function ARMACOV. 2. Yp = yp .1. 2. Plot the autocorrelation functions of ARMA(p. Hint: Theorem 2. Hint: Recall that the joint density fX. where fY |X (y|x) := fX.1.4 State-Space Models 2. . is the (conditional) density of Y given X = x and fX .33.30. Y . . . y)/fX (x) if fX (x) > 0. fY is the (marginal) density of X. else. 2. 4 (iii) Evaluate E(Yt4 ) and deduce that E(Z1 )a2 < 1 is a sufficient 1 condition for E(Yt4 ) < ∞.Y (x.32.34. . (i) Show that Yt2 = a0 one.35.Y (x. 2)-process.2. introduced in Example 1. and fY |X (y|x) := fY (y).5. (Unemployed1 Data) Plot the empirical autocorrelations and partial autocorrelations of the trend and seasonally adjusted Unemployed1 Data from the building trade. Derive the least squares normal equations for an AR(p)-process and compare them with the Yule–Walker equations. 2.Y of a random vector (X. . Let (Yt )t be a stationary and causal ARCH(1)-process with |a1 | < 1.

38. 2. Consider the two state-space models Xt+1 = At Xt + Bt εt+1 Y t = C t Xt + η t and ˜ ˜ ˜ ˜ ˜ Xt+1 = At Xt + Bt εt+1 ˜ ˜ ˜ ˜ Y t = C t Xt + η t . (Zurich Data) The daily value of the Zurich stock index was recorded between January 1st. (Car Data) Apply the Box–Jenkins program to the Car Data. What is the value of the partial autocorrelation coefficient at lag 1 and lag 2? Use PROC ARIMA to estimate the parameter of the AR(1)process (Yt2 )t and apply the Box–Ljung test. ηt )T is a white noise. .1 has the determinant 1 − a2 . Consider in particular the cases p = q = 2 and p = 3.3. 1981. YtT )T .42. 2.39. (Hong Kong Data) Fit an GARCH(p.40.37. 1988 and December 31st.3. to September 31. Plot (Yt )t as well as (Yt2 )t and its partial autocorrelation function.5.41. q)-model to the daily Hang Seng closing index of Hong Kong stock prices from July 16. 1983. (ii) Show that the matrix Pn in Example 2. their squares. T ˜t ˜T where (εT . q = 2. 2. 1988. Can the squared process be considered as an AR(1)-process? 2. 2. εT . (i) Show that the matrix Σ −1 in Example 2. ηt . Plot the (trend-adjusted) data.124 Chapter 2. Derive a state-space repret ˜ sentation for (YtT .3 has the determinant (1 + a2 + a4 + · · · + a2n )/(1 + a2 )n . Generate an ARCH(1)-process (Yt )t with a0 = 1 and a1 = 0. Models of Time Series 2. the pertaining partial autocorrelation function and parameter estimates. Use a difference filter of first order to remove a possible trend.

46. Yt−1 ) .26. .4 State-Space Models 2. Hint: Yt = ∆d Yt − j=1 (−1)j d Yt−j and consider the state d T vector Zt := (Xt . Find the state-space representation of an ARIMA(p. Show that there exists some ε > 0 such that (Ir − Az)−1 has the power series representation ∞ j j j=0 A z in the region |z| < 1 + ε. Hint: The condition on the eigenvalues is equivalent to det(Ir − Az) = 0 for |z| ≤ 1. 2. Can they be stationary? Compute the one-step predictors and plot them together with the actual data. . .45. 2.2. 2. where Xt ∈ Rp+q is the state vector of the ARMA(p. 125 . d. q)-process d (Yt )t . . q)-process ∆d Yt and Yt−1 := (Yt−d . (Gas Data) Apply PROC STATESPACE to the gas data. Show that the unique stationary solution of equation (2. Assume that the matrices A and B in the state-space model (2. Apply PROC STATESPACE to the simulated data of the AR(2)process in Exercise 2.19) is given by the infinite series ∞ j Xt = j=0 A Bεt−j+1 . Yt−1 )T .44.19) are independent of t and that all eigenvalues of A are in the interior of the unit circle {z ∈ C : |z| ≤ 1}.43.


Chapter 3

The Frequency Domain Approach of a Time Series

The preceding sections focussed on the analysis of a time series in the time domain, mainly by modelling and fitting an ARMA(p, q)-process to stationary sequences of observations. Another approach towards the modelling and analysis of a time series is via the frequency domain: a series is often the sum of a whole variety of cyclic components, of which we had already added to our model (1.2) a long term cyclic one or a short term seasonal one. In the following we show that a time series can be completely decomposed into cyclic components. Such cyclic components can be described by their periods and frequencies. The period is the interval of time required for one cycle to complete. The frequency of a cycle is its number of occurrences during a fixed time unit; in electronic media, for example, frequencies are commonly measured in hertz, abbreviated by Hz, which is the number of cycles per second. The analysis of a time series in the frequency domain aims at the detection of such cycles and the computation of their frequencies.

Note that in this chapter the results are formulated for any data y1, . . . , yn, which need for mathematical reasons not to be generated by a stationary process. Nevertheless it is reasonable to apply the results only to realizations of stationary processes, since the empirical autocovariance function occurring below has no interpretation for non-stationary processes; see Exercise 1.18.

Popular examples of periodic functions are sine and cosine. (Star Data). An arbitrary (time) interval of length L consequently shows Lλ cycles of a periodic function f with fundamental frequency λ.07 − 1. which have period 1/λ and frequency λ.86 cos(2π(1/24)t) + 6. the intensity of light emitted by this pulsar was recorded at midnight during 600 consecutive nights. will be named a harmonic wave of length r. A smallest period is called a fundamental one. A linear combination of harmonic components r g(t) := µ + k=1 Ak cos(2πλk t) + Bk sin(2πλk t) .01 sin(2π(1/29)t). respectively. . The predominant family of periodic functions within time series analysis are the harmonic components m(t) := A cos(2πλt) + B sin(2πλt). which both have the fundamental period P = 2π. Example 3.07 fitted to these data.1. µ ∈ R.09 cos(2π(1/29)t) + 8. yt = 17. plus a constant term µ = 17. For easier access we begin with the case of known frequencies but unknown constants.. B ∈ R.128 The Frequency Domain Approach of a Time Series 3. The derivation of these particular frequencies and coefficients will be the content of this section and the following ones. It turns out that a harmonic wave of length two fits the data quite well.82 sin(2π(1/24)t) ˜ + 6. A.e. Their fundamental frequency. is λ = 1/(2π). i.1 Least Squares Approach with Known Frequencies A function f : R −→ R is said to be periodic with period P > 0 if f (t + P ) = f (t) for any t ∈ R. The reciprocal value λ = 1/P of a fundamental period is the fundamental frequency.1. λ > 0. The data are taken from Newton (1988). therefore. To analyze physical properties of a pulsating star. The following figure displays the first 160 data yt and the sum of two harmonic components with period 24 and 29.

sq 0. Model : MODEL1 Dependent Variable : LUMEN Analysis of Variance Sum of Squares 48400 146.square Adj R .24545 Source Model Error C Total Root MSE Dep Mean C.9970 Parameter Estimates Parameter Estimate Standard Error Variable DF t Value Prob > | T | .1 Least Squares Approach with Known Frequencies 129 Plot 3.89782 Mean Square 12100 0.2 Prob > F <. DF 4 595 599 F Value 49297.1a: Intensity of light emitted by a pulsating star and a fitted harmonic wave.3.V.49543 17.1.0001 R .9970 0.09667 2.04384 48546 0.

The last part of the program creates a plot of the observed lumen values and a curve of the predicted values restricted on the first 160 observations. .1. The other variables here are lumen read from an external file and t generated by N .57 <. 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Program 3.78 237.1.1b: Regression results of fitting a harmonic wave. QUIT . This is used to define the variables cos24. /* Compute a regression */ PROC REG DATA = data1 . txt ’. t = _N_ . cos24 = COS (2* pi * t /24) .0001 Listing 3.0001 <.47 212.4. /* Plot data and fitted harmonic wave */ PROC GPLOT DATA = regdat ( OBS =160) . TITLE2 ’ Star Data ’. sin29 = SIN (2* pi * t /29) . OUTPUT OUT = regdat P = predi .85779 8. It contains the original variables of the source data step and the values predicted by the regression for lumen in the variable predi.0001 <.85 279. /* Graphical options */ SYMBOL1 C = GREEN V = DOT I = NONE H =. SYMBOL2 C = RED V = NONE I = JOIN . AXIS1 LABEL =( ANGLE =90 ’ lumen ’) .81736 -1.06903 6. A temporary data file named regdat is generated by the OUTPUT statement. cos29 = COS (2* pi * t /29) .02868 0. /* Read in the data and compute harmonic waves to which data are to → be fitted */ DATA data1 . cos29 and sin29 for the harmonic components.0001 <. sas */ TITLE1 ’ Harmonic wave ’. sin24 = SIN (2* pi * t /24) . It is then stored in the variable pi.0001 <. INFILE ’c :\ data \ star . AXIS2 LABEL =( ’t ’) . PLOT lumen * t =1 predi * t =2 / OVERLAY VAXIS = AXIS1 HAXIS = AXIS2 . MODEL lumen = sin24 cos24 sin29 cos29 .02865 0.130 Intercept sin24 cos24 sin29 cos29 1 1 1 1 1 The Frequency Domain Approach of a Time Series 17.01416 6.02867 0.02865 843.1: Fitting a harmonic wave. INPUT lumen @@ .81 -64.08905 0.02023 0. pi = CONSTANT ( ’ PI ’) . sin24. The PROC REG statement causes SAS to make a regression from the independent variable lumen defined on the left side of the MODEL statement on the harmonic components which are on the right side. The number π is generated by the SAS function CONSTANT with the argument ’PI’. RUN . 1 2 3 4 5 /* star_harmonic .
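The same two-frequency fit can be sketched outside SAS. In the following hedged illustration (Python/numpy assumed; the helper name fit_harmonic_wave is ours, and the frequencies 1/24 and 1/29 are those of the example), the constants are obtained as ordinary least squares coefficients of a design matrix with cosine and sine columns.

import numpy as np

def fit_harmonic_wave(y, freqs):
    # least squares fit of mu + sum_k (A_k cos(2 pi lambda_k t) + B_k sin(2 pi lambda_k t)), t = 1, ..., n
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    cols = [np.ones(len(y))]
    for lam in freqs:
        cols.append(np.cos(2 * np.pi * lam * t))
        cols.append(np.sin(2 * np.pi * lam * t))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef   # (mu, A_1, B_1, A_2, B_2, ...) and the fitted values

# e.g. coef, fitted = fit_harmonic_wave(lumen, [1.0 / 24, 1.0 / 29])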

B ∈ R m(t) = Am1 (t) + Bm2 (t).1 (star harmonic. For further information on how to read the output. . (2002). n.1. we ¯ put with arbitrary A. . ¯ Ac21 + Bc22 = t=1 where n cij = t=1 mi (t)mj (t). we obtain that the minimizing pair A. B . B) := t=1 (yt − y − m(t))2 . To this end. B has to satisfy the normal equations n Ac11 + Bc12 = t=1 n (yt − y ) cos(2πλt) ¯ (yt − y ) sin(2πλt). ¯ Taking partial derivatives of the function R with respect to A and B and equating them to zero. .3. where m1 (t) := cos(2πλt). If c11 c22 − c12 c21 = 0.sas) is the standard text output of a regression with an ANOVA table and parameter estimates. m2 (t) = sin(2πλt). it is reasonable to choose the constants A and B as minimizers of the residual sum of squares n R(A. t = 1. . the uniquely determined pair of solutions A. we refer to Chapter 3 of Falk et al. In a first step we will fit a harmonic component with fixed frequency λ to mean value adjusted data yt − y .1 Least Squares Approach with Known Frequencies 131 The output of Program 3. In order to get a proper fit uniformly over all t.

. yn . [n/2].132 The Frequency Domain Approach of a Time Series of these equations is c22 C(λ) − c12 S(λ) c11 c22 − c12 c21 c21 C(λ) − c11 S(λ) B = B(λ) = n . The solutions A and B become particularly simple in the case of Fourier frequencies λ = k/n. 1.1. If k = 0 and k = n/2 in the case of an even sample size n.2.2) below that c12 = c21 = 0 and c11 = c22 = n/2 and thus A = 2C(λ). Lemma 3. ¯ (yt − y ) sin(2πλt) ¯ (3. . Harmonic Waves with Fourier Frequencies Next we will fit harmonic waves to data y1 . then we obtain from (3. For arbitrary 0 ≤ k. B = 2S(λ). . respectively. where we restrict ourselves to Fourier frequencies. where [x] denotes the greatest integer less than or equal to x ∈ R. which facilitates the computations. m ≤ [n/2] we have . The following lemma will be crucial. k = 0. .1) are the empirical (cross-) covariances of (yt )1≤t≤n and (cos(2πλt))1≤t≤n and of (yt )1≤t≤n and (sin(2πλt))1≤t≤n . . . As we will see. these cross-covariances C(λ) and S(λ) are fundamental to the analysis of a time series in the frequency domain. 2. . c12 c21 − c11 c22 A = A(λ) = n where 1 C(λ) := n S(λ) := 1 n n t=1 n t=1 (yt − y ) cos(2πλt). .

. . t=1 n t=1 k = m = 0 or n/2. The above lemma implies that the 2[n/2] + 1 vectors in Rn (sin(2π(k/n)t))1≤t≤n . if n is even k = m = 0 and = n/2.3. Note that by Lemma 3. . there exist in any case uniquely determined coefficients Ak and Bk . . t=1  0. [n/2]. . As a consequence we obtain that for a given set of data y1 . . . .2 in the case of n odd the above 2[n/2] + 1 = n vectors are linearly independent. . [n/2]. span the space Rn . . . k = 0. n  k m sin 2π t sin 2π t = n/2. yt = k=0 k k Ak cos 2π t + Bk sin 2π t n n . if n is even k=m k = m = 0 or n/2. . k = 0. yn . (3. . Exercise 3. . n n Proof. . . with B0 := 0 such that [n/2] k = 1. . . . t = 1. if n is even k = m = 0 and = n/2. They are obviously minimizers of the residual sum of squares n [n/2] R := t=1 yt − k=0 k k αk cos 2π t + βk sin 2π t n n 2 .  n n 0. n. .3. . and (cos(2π(k/n)t))1≤t≤n . .1.2) Next we determine these coefficients Ak . whereas in the case of an even sample size n the vector (sin(2π(k/n)t))1≤t≤n with k = n/2 is the null vector (0. . [n/2]. 0) and the remaining n vectors are again linearly independent. n  m k cos 2π t cos 2π t = n/2.  n n 0.1 Least Squares Approach with Known Frequencies 133  n. Bk . if n is even k=m k m cos 2π t sin 2π t = 0. .

3) for k = 1. Bk in R to obtain directly that R = 0 in this case. k = 1.1). [(n − 1)/2]. k = 0 and k = n/2. equating them to zero and taking into account the linear independence of the above vectors. n k = 0. . . . y A0 = 2A0 = n t=1 (3. This follows from the equations (Exercise 3. . the coefficients Ak . [(n − 1)/2]. [(n − 1)/2].3) is [n/2] ˜ A0 k k yt = + Ak cos 2π t + Bk sin 2π t 2 n n k=1 . k = 1. . n k sin2 2π t . . n k = 1. n k = 1.5) .2) n t=1 k cos 2π t = n n t=1 k sin 2π t = 0. . with Ak . . . [n/2]. . we obtain that these minimizers solve the normal equations n t=1 n t=1 k yt cos 2π t = αk n k yt sin 2π t = βk n n t=1 n t=1 k cos2 2π t . The solutions of this system are by Lemma 3. . . . . Taking partial derivatives. . .134 The Frequency Domain Approach of a Time Series with respect to αk . . t = 1. . defined in (3. . Up to the factor 2.3) One can substitute Ak . Bk coincide with the empirical covariances C(k/n) and S(k/n).4) = 0 for an even n. [n/2]. . . (3. if n is even t=1 k yt sin 2π t . [(n − 1)/2] Ak = 2 Bk = n 1 k 2π n t . n. . [n/2] k = 1. βk . . (3. A popular equivalent formulation of (3. . . .2 given by  2 n n n n t=1 yt cos n t=1 yt cos k 2π n t . Bk as in (3. . . . Bn/2 and n 2 ˜ yt = 2¯. . . .1.
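The coefficients in (3.3) and the exact representation (3.4) can be checked numerically. A hedged sketch (Python/numpy assumed; n is taken odd so that the special case k = n/2 does not occur, and the function name fourier_representation is ours):

import numpy as np

def fourier_representation(y):
    # A_k, B_k as in (3.3) and the exact harmonic-wave representation for odd n
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    m = (n - 1) // 2
    A0 = 2.0 * y.mean()
    A = np.array([2.0 / n * np.sum(y * np.cos(2 * np.pi * k / n * t)) for k in range(1, m + 1)])
    B = np.array([2.0 / n * np.sum(y * np.sin(2 * np.pi * k / n * t)) for k in range(1, m + 1)])
    fit = A0 / 2 + sum(A[k - 1] * np.cos(2 * np.pi * k / n * t)
                       + B[k - 1] * np.sin(2 * np.pi * k / n * t) for k in range(1, m + 1))
    return A0, A, B, fit

y = np.random.randn(101)
A0, A, B, fit = fourier_representation(y)
print(np.max(np.abs(fit - y)))   # zero up to rounding error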

yn . The question which frequencies actually govern a time series leads to the intensity of a frequency λ. . Example 3. . . . For general frequencies λ ∈ R we define its intensity now by I(λ) = n C(λ)2 + S(λ)2 1 = n n t=1 (yt − y ) cos(2πλt) ¯ 2 n + t=1 (yt − y ) sin(2πλt) ¯ 2 .2 The Periodogram In the preceding section we exactly fitted a harmonic wave with Fourier frequencies λk = k/n. The following Theorem implies in particular that it is sufficient to define the periodogram on the . . k = 0. 1 ≤ k ≤ [n/2]. [(n − 1)/2]. There is a general tendency that a time series is governed by different frequencies λ1 .1.1.1 shows that a harmonic wave including only two frequencies already fits the Star Data quite well. (3.2 The Periodogram 135 3. . k = 1. .3. which on their part can be approximated by Fourier frequencies k1 /n. . . . kr /n if n is sufficiently large. It is called ¯ t=1 the intensity of the frequency k/n. to a given series y1 . The intensity of the Fourier frequency λ = k/n. k k=1 2 2 The number (n/2)(A2 +Bk ) = 2n(C (k/n)+S 2 (k/n)) is therefore the k contribution of the harmonic component with Fourier frequency k/n. . This number reflects the influence of the harmonic component with frequency λ on the series.2.5) and the normal equations n t=1 k k yt − y − Ak cos 2π t − Bk sin 2π t ¯ n n n 2 = t=1 (yt − y )2 − ¯ n n 2 2 A k + Bk . . . . . . . . [(n − 1)/2]. [n/2]. (3. is defined via its residual sum of squares. . . Further insight is gained from the Fourier analysis in Theorem 3.2. to the total variation n (yt − y )2 . . . .4. 2 [n/2] k = 1. λr with r < [n/2]. and n (yt − y ) = ¯ 2 t=1 2 2 A 2 + Bk .6) This function is called the periodogram. . We have by Lemma 3.

for example. [(n − 1)/2]. For Fourier frequencies we obtain from (3. 1] by putting I ∗ (ω) := 2I(ω/(2π)). This indicates that essentially two cycles with period 24 and 28 or 29 are inherent in the data. Proof.035 ≈ 1/28. The following figure displays the periodogram of the Star Data from Example 3. x ∈ R. .04167.1. 0. A least squares approach for the determination of the coefficients Ai . i. 2 with frequencies 1/24 and 1/29 as done in Program 3. I(0) = 0.1 implies that the function I(λ) is completely determined by its values on [0.4 below we prefer I(λ).1 (star harmonic.1. It has two obvious peaks at the Fourier frequencies 21/600 = 0. i = 1.1.3) and (3. 2π] instead of [0. We have 1. Part (i) follows from sin(0) = 0 and cos(0) = 1. 3. .57 and 25/600 = 1/24 ≈ 0. Theorem 3. . Theorem 3.5) I(k/n) = n 2 2 A k + Bk . .. 2. 1]. sin(−x) = − sin(x).2.2. In view of Theorem 3.1. used in SAS. I(λ) = I(−λ) for any λ ∈ R. while (ii) is a consequence of cos(−x) = cos(x). The periodogram is often defined on the scale [0. this version is. I has the period 1. 4 k = 1. . Bi .2.5]. I is an even function.136 The Frequency Domain Approach of a Time Series interval [0.1.e. however.sas) then leads to the coefficients in Example 3.1. Part (iii) follows from the fact that 2π is the fundamental period of sin and cos.
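The periodogram can also be evaluated directly from its definition. A hedged sketch (Python/numpy assumed; periodogram is an illustrative helper name); at the Fourier frequencies k/n the values agree with n(A_k² + B_k²)/4.

import numpy as np

def periodogram(y, lambdas):
    # I(lambda) = n * (C(lambda)^2 + S(lambda)^2)
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    yc = y - y.mean()
    lambdas = np.atleast_1d(lambdas)
    C = np.array([np.sum(yc * np.cos(2 * np.pi * lam * t)) / n for lam in lambdas])
    S = np.array([np.sum(yc * np.sin(2 * np.pi * lam * t)) / n for lam in lambdas])
    return n * (C ** 2 + S ** 2)

# e.g. fourier_freqs = np.arange(1, len(y) // 2 + 1) / len(y); I = periodogram(y, fourier_freqs)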

0000 27.25 354.76062 2.19 8972.1933 -------------------------------------------------------------PERIOD COS_01 SIN_01 P LAMBDA 28.035000 0. TITLE2 ’ Star Data ’. INFILE ’c :\ data \ star .05822 8. .91071 -0.54977 7.1a: Periodogram of the Star Data.2 The Periodogram 137 Plot 3. txt ’.038333 ˜ Listing 3.2727 31.09324 -1.033333 0.0870 -0.20493 -0.71 212.031667 0. sas */ TITLE1 ’ Periodogram ’.2. /* Read in the data */ DATA data1 .16333 0.036667 0.2.5714 24.1b: The constant A0 = 2A0 = 2¯ and the six Fourier y frequencies λ = k/n with largest I(k/n)-values. INPUT lumen @@ .18946 11089.0000 30.22 661. 1 2 3 4 5 6 7 8 /* star_periodogram .041667 0.42338 -0.06291 0.73 0.52404 1.71 2148. their inverses and the Fourier coefficients pertaining to the Star Data.73396 -3.3.5789 26. -------------------------------------------------------------COS_01 34.

Using the SAS procedure SPECTRA with the options P (periodogram). The procedure SORT generates a new data set data4 containing the same observations as the input data set data3. PROC PRINT DATA = data4 ( OBS =6) NOOBS . RUN . p = P_01 /2. This means a restriction of lambda up to 50/600 = 1/12. PLOT p * lambda =1 / VAXIS = AXIS1 HAXIS = AXIS2 . The two PROC PRINT statements at the end make SAS print to datasets data2 and data4. Because SAS uses different definitions for the frequencies and the periodogram. a FREQ variable for this frequencies. but they are sorted in descending order of the periodogram values. . AXIS2 LABEL =( F = CGREEK ’l ’) .1: Computing and plotting the periodogram. VAR COS_01 . /* Plot the periodogram */ PROC GPLOT DATA = data3 ( OBS =50) . lambda = FREQ /(2* CONSTANT ( ’ PI ’) ) . /* Print largest periodogram values */ PROC PRINT DATA = data2 ( OBS =1) NOOBS . It contains periodogram data P 01 evaluated at the Fourier frequencies. The following PROC GPLOT just takes the first 50 observations of data3 into account. The first step is to read the star data from an external file into a data set. /* Sort by periodogram values */ PROC SORT DATA = data3 OUT = data4 . here in data3 new variables lambda (dividing FREQ by 2π to eliminate an additional factor 2π) and p (dividing P 01 by 2) are created and the no more needed ones are dropped. the part with the greatest peaks in the periodogram. VAR lumen . COEF (Fourier coefficients). QUIT . an output data set is generated. DROP P_01 FREQ . /* Graphical options */ SYMBOL1 V = NONE C = GREEN I = JOIN . AXIS1 LABEL =( ’ I ( ’ F = CGREEK ’l ) ’) .2. /* Adjusting different periodogram definitions */ DATA data3 . BY DESCENDING p . the pertaining period in the variable PERIOD and the variables COS 01 and SIN 01 with the coefficients for the harmonic waves. SET data2 ( FIRSTOBS =2) . By means of the data set option FIRSTOBS=2 the first observation of data2 is excluded from the resulting data set. OUT=data2 and the VAR statement specifying the variable of interest.138 The Frequency Domain Approach of a Time Series 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 /* Compute the periodogram */ PROC SPECTRA DATA = data1 COEF P OUT = data2 . Program 3.

n.3. since both values can be recovered from the complex number D(λ).2. being its real and negative imaginary part. . . The results for the Fourier frequencies with the six greatest periodogram values constitute the second part of the output. because SAS lets the index run from 0 to n − 1 instead of 1 to n. t = 1. ¯ D(λ) := C(λ) − iS(λ) = n t=1 The periodogram is a function of D(λ). Theorem 3. fa (0) = t∈Z at . .2. the number D(λ) contains the complete information about C(λ) and S(λ). 0. λ ∈ R. t ∈ Z.2. since I(λ) = n|D(λ)|2 . .3). since |eix | = 1 for any x ∈ R.1. Note that −i2πλt | = t∈Z |at | < ∞. For such a sequence a the complex valued function fa (λ) = at e−i2πλt . We have 1. . t∈Z is said to be its Fourier transform. It links the empirical autocovariance function to the periodogram as it will turn out in Theorem 3. Let a := (at )t∈Z be an absolutely summable sequence of real numbers.2 The Periodogram ˜ The first part of the output is the coefficient A0 which is equal to two times the mean of the lumen data. . Note that the meaning of COS 01 and SIN 01 are slightly different from the definitions of Ak and Bk in (3. The t∈Z |at e Fourier transform of at = (yt − y )/n. In the following we view the data y1 . z ∈ R. The following elementary properties of the Fourier transform are immediate consequences of the arguments in the proof of Theorem 3. In particular we obtain that the Fourier transform is already determined by its values on [0. yn again as a clipping from an infinite series yt . .3 that the latter is the Fourier transform of the first.5]. Unlike the periodogram. and at = 0 else¯ where is then given by D(λ). . . 139 The Fourier Transform From Euler’s equation eiz = cos(z) + i sin(z). we obtain for λ∈R n 1 (yt − y )e−i2πλt .2.


2. f_a(−λ) and f_a(λ) are conjugate complex numbers, i.e., f_a(−λ) is the complex conjugate of f_a(λ),

3. f_a has the period 1.

In particular we obtain that the Fourier transform is already determined by its values on [0, 0.5].
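These properties can be checked on a small example, added here purely for illustration: for the weights a_0 = a_1 = 1/2 and a_t = 0 elsewhere, the Fourier transform is f_a(λ) = (1 + e^{−i2πλ})/2. Indeed f_a(0) = 1 = Σ_t a_t, f_a(−λ) = (1 + e^{i2πλ})/2 is the complex conjugate of f_a(λ), f_a(λ + 1) = f_a(λ), and |f_a(λ)|² = (2 + 2 cos(2πλ))/4 = cos²(πλ).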

Autocorrelation Function and Periodogram
Information about cycles that are inherent in given data can also be deduced from the empirical autocorrelation function. The following figure displays the autocorrelation function of the Bankruptcy Data, introduced in Exercise 1.17.

Plot 3.2.2a: Autocorrelation function of the Bankruptcy Data.

/* bankruptcy_correlogram.sas */ TITLE1 'Correlogram'; TITLE2 'Bankruptcy Data'; /* Read in the data */ DATA data1; INFILE 'c:\data\bankrupt.txt';

INPUT year bankrupt ; /* Compute autocorrelation function */ PROC ARIMA DATA = data1 ; IDENTIFY VAR = bankrupt NLAG =64 OUTCOV = corr NOPRINT ; /* Graphical options */ AXIS1 LABEL =( ’ r ( k ) ’) ; AXIS2 LABEL =( ’k ’) ; SYMBOL1 V = DOT C = GREEN I = JOIN H =0.4 W =1; /* Plot auto correlation function */ PROC GPLOT DATA = corr ; PLOT CORR * LAG / VAXIS = AXIS1 HAXIS = AXIS2 VREF =0; RUN ; QUIT ;



Program 3.2.2: Computing the autocorrelation function.

After reading the data from an external file into a data set, the procedure ARIMA calculates the empirical autocorrelation function and stores it in a new data set. The correlogram is generated using PROC GPLOT.

The next figure displays the periodogram of the Bankruptcy Data.


Plot 3.2.3a: Periodogram of the Bankruptcy Data.

/* bankruptcy_periodogram.sas */ TITLE1 'Periodogram'; TITLE2 'Bankruptcy Data'; /* Read in the data */ DATA data1; INFILE 'c:\data\bankrupt.txt'; INPUT year bankrupt; /* Compute the periodogram */ PROC SPECTRA DATA=data1 P OUT=data2; VAR bankrupt; /* Adjusting different periodogram definitions */ DATA data3; SET data2(FIRSTOBS=2); p=P_01/2; lambda=FREQ/(2*CONSTANT('PI')); /* Graphical options */ SYMBOL1 V=NONE C=GREEN I=JOIN; AXIS1 LABEL=('I' F=CGREEK '(l)'); AXIS2 ORDER=(0 TO 0.5 BY 0.05) LABEL=(F=CGREEK 'l'); /* Plot the periodogram */ PROC GPLOT DATA=data3; PLOT p*lambda / VAXIS=AXIS1 HAXIS=AXIS2;

RUN ; QUIT ;


Program 3.2.3: Computing the periodogram.
This program again first reads the data and then starts a spectral analysis by PROC SPECTRA. For the reasons mentioned in the comments to Program 3.2.1 (star_periodogram.sas), the periodogram and frequency values generated by PROC SPECTRA are transformed in data3. The graph results from the statements in PROC GPLOT.

The autocorrelation function of the Bankruptcy Data has extreme values at about multiples of 9 years, which indicates a period of length 9. This is underlined by the periodogram in Plot 3.2.3a, which has a peak at λ = 0.11, corresponding to a period of 1/0.11 ∼ 9 years as well. As mentioned above, there is actually a close relationship between the empirical autocovariance function and the periodogram. The corresponding result for the theoretical autocovariances is given in Chapter 4.

Theorem 3.2.3. Denote by c the empirical autocovariance function of y_1, . . . , y_n, i.e., c(k) = n^{-1} Σ_{j=1}^{n−k} (y_j − ȳ)(y_{j+k} − ȳ), k = 0, . . . , n − 1, where ȳ := n^{-1} Σ_{j=1}^{n} y_j. Then we have, with c(−k) := c(k),

I(λ) = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk) = Σ_{k=−(n−1)}^{n−1} c(k) e^{−i2πλk}, λ ∈ R.

Proof. From the equation cos(x_1) cos(x_2) + sin(x_1) sin(x_2) = cos(x_1 − x_2) for x_1, x_2 ∈ R we obtain

I(λ) = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} (y_s − ȳ)(y_t − ȳ) [cos(2πλs) cos(2πλt) + sin(2πλs) sin(2πλt)]
     = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} a_{st},

where a_{st} := (y_s − ȳ)(y_t − ȳ) cos(2πλ(s − t)). Since a_{st} = a_{ts} and cos(0) = 1 we have moreover

I(λ) = (1/n) Σ_{t=1}^{n} a_{tt} + (2/n) Σ_{k=1}^{n−1} Σ_{j=1}^{n−k} a_{j,j+k}
     = (1/n) Σ_{t=1}^{n} (y_t − ȳ)² + 2 Σ_{k=1}^{n−1} [ (1/n) Σ_{j=1}^{n−k} (y_j − ȳ)(y_{j+k} − ȳ) ] cos(2πλk)
     = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk).

The complex representation of the periodogram is then obvious:

Σ_{k=−(n−1)}^{n−1} c(k) e^{−i2πλk} = c(0) + Σ_{k=1}^{n−1} c(k) (e^{i2πλk} + e^{−i2πλk}) = c(0) + Σ_{k=1}^{n−1} c(k) 2 cos(2πλk) = I(λ).

Inverse Fourier Transform

The empirical autocovariance function can be recovered from the periodogram, which is the content of our next result. Since the periodogram is the Fourier transform of the empirical autocovariance function, this result is a special case of the inverse Fourier transform in Theorem 3.2.5 below.

Theorem 3.2.4. The periodogram

I(λ) = Σ_{k=−(n−1)}^{n−1} c(k) e^{−i2πλk}, λ ∈ R,

satisfies the inverse formula

c(k) = ∫_0^1 I(λ) e^{i2πλk} dλ, |k| ≤ n − 1.

In particular for k = 0 we obtain

c(0) = ∫_0^1 I(λ) dλ.

The sample variance c(0) = n^{-1} Σ_{j=1}^{n} (y_j − ȳ)² equals, therefore, the area under the curve I(λ), 0 ≤ λ ≤ 1. The integral ∫_{λ_1}^{λ_2} I(λ) dλ can be interpreted as that portion of the total variance c(0) which is contributed by the harmonic waves with frequencies λ ∈ [λ_1, λ_2], where 0 ≤ λ_1 < λ_2 ≤ 1. The periodogram consequently shows the distribution of the total variance among the frequencies λ ∈ [0, 1]. A peak of the periodogram at a frequency λ_0 implies, therefore, that a large part of the total variation c(0) of the data can be explained by the harmonic wave with that frequency λ_0.

The following result is the inverse formula for general Fourier transforms.

Theorem 3.2.5. For an absolutely summable sequence a := (a_t)_{t∈Z} with Fourier transform f_a(λ) = Σ_{t∈Z} a_t e^{−i2πλt}, λ ∈ R, we have

a_t = ∫_0^1 f_a(λ) e^{i2πλt} dλ, t ∈ Z.

Proof. The dominated convergence theorem implies

∫_0^1 f_a(λ) e^{i2πλt} dλ = ∫_0^1 ( Σ_{s∈Z} a_s e^{−i2πλs} ) e^{i2πλt} dλ = Σ_{s∈Z} a_s ∫_0^1 e^{i2πλ(t−s)} dλ = a_t,

since

∫_0^1 e^{i2πλ(t−s)} dλ = 1 if s = t, and = 0 if s ≠ t.

The inverse Fourier transformation shows that the complete sequence (a_t)_{t∈Z} can be recovered from its Fourier transform. This implies in particular that the Fourier transforms of absolutely summable sequences are uniquely determined. The analysis of a time series in the frequency domain is, therefore, equivalent to its analysis in the time domain, which is based on an evaluation of its autocovariance function.
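This equivalence can also be checked numerically. The following sketch is not part of the original programs; the data set data1, the variable y, the number of lags and the frequency grid are placeholders. It lets PROC ARIMA write the empirical autocovariances to a data set and then evaluates the right-hand side of Theorem 3.2.3, c(0) + 2 Σ_k c(k) cos(2πλk), truncated at the requested number of lags; up to this truncation and up to the different frequency and scaling conventions of PROC SPECTRA discussed for Program 3.2.1, the result should reproduce the periodogram.

/* Sketch: periodogram from the empirical autocovariances (hypothetical data set and variable) */
PROC ARIMA DATA=data1;
   IDENTIFY VAR=y NLAG=60 OUTCOV=covs NOPRINT;   /* NLAG must be smaller than the series length */
RUN;

DATA perio;                                       /* evaluate c(0)+2*sum c(k)*cos(2*pi*lambda*k) */
   SET covs END=last;
   ARRAY s{50} s1-s50;                            /* one accumulator per frequency lambda=j/100 */
   RETAIN s1-s50 0;
   DO j=1 TO 50;
      lambda=j/100;
      IF LAG=0 THEN s{j}=s{j}+COV;
      ELSE s{j}=s{j}+2*COV*COS(2*CONSTANT('PI')*lambda*LAG);
   END;
   IF last THEN DO j=1 TO 50;                     /* write one observation per frequency */
      lambda=j/100;
      p=s{j};
      OUTPUT;
   END;
   KEEP lambda p;
RUN;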

Aliasing
Suppose that we observe a continuous time process (Zt )t∈R only through its values at k∆, k ∈ Z, where ∆ > 0 is the sampling interval, i.e., we actually observe (Yk )k∈Z = (Zk∆ )k∈Z . Take, for example, Zt := cos(2π(9/11)t), t ∈ R. The following figure shows that at k ∈ Z, i.e., ∆ = 1, the observations Zk coincide with Xk , where Xt := cos(2π(2/11)t), t ∈ R. With the sampling interval ∆ = 1, the observations Zk with high frequency 9/11 can, therefore, not be distinguished from the Xk , which have low frequency 2/11.
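The identity behind this picture is elementary: since 9/11 = 1 − 2/11 and cos(2πk − x) = cos(x) for every integer k, one has cos(2π(9/11)k) = cos(2πk − 2π(2/11)k) = cos(2π(2/11)k) for all k ∈ Z.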

Plot 3.2.4a: Aliasing of cos(2π(9/11)k) and cos(2π(2/11)k).

/* aliasing . sas */ TITLE1 ’ Aliasing ’; /* Generate harmonic waves */ DATA data1 ; DO t =1 TO 14 BY .01; y1 = COS (2* CONSTANT ( ’ PI ’) *2/11* t ) ; y2 = COS (2* CONSTANT ( ’ PI ’) *9/11* t ) ; OUTPUT ; END ; /* Generate points of intersection */ DATA data2 ; DO t0 =1 TO 14; y0 = COS (2* CONSTANT ( ’ PI ’) *2/11* t0 ) ; OUTPUT ; END ; /* Merge the data sets */ DATA data3 ; MERGE data1 data2 ; /* Graphical options */ SYMBOL1 V = DOT C = GREEN I = NONE H =.8; SYMBOL2 V = NONE C = RED I = JOIN ; AXIS1 LABEL = NONE ; AXIS2 LABEL =( ’t ’) ; /* Plot the curves with point of intersection */ PROC GPLOT DATA = data3 ; PLOT y0 * t0 =1 y1 * t =2 y2 * t =2 / OVERLAY VAXIS = AXIS1 HAXIS = AXIS2 VREF →=0; RUN ; QUIT ;




Program 3.2.4: Aliasing.
In the first data step a tight grid for the cosine waves with frequencies 2/11 and 9/11 is generated. In the second data step the values of the cosine wave with frequency 2/11 are generated just for integer values of t symbolizing the observation points. After merging the two data sets the two waves are plotted using the JOIN option in the SYMBOL statement while the values at the observation points are displayed in the same graph by dot symbols.

This phenomenon, that a high frequency component takes on the values of a lower one, is called aliasing. It is caused by the choice of the sampling interval ∆, which is 1 in Plot 3.2.4a. If a series is not just a constant, then the shortest observable period is 2∆. The highest observable frequency λ* with period 1/λ*, therefore, satisfies 1/λ* ≥ 2∆, i.e., λ* ≤ 1/(2∆). This frequency 1/(2∆) is known as the Nyquist frequency. The sampling interval ∆ should, therefore, be chosen small enough, so that 1/(2∆) is above the frequencies under study. If, for example, a process is a harmonic wave with frequencies λ_1, . . . , λ_p, then ∆ should be chosen such that λ_i ≤ 1/(2∆), 1 ≤ i ≤ p, in order to visualize p different frequencies. For the periodic curves in Plot 3.2.4a this means to choose 9/11 ≤ 1/(2∆) or ∆ ≤ 11/18.
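In numbers: with ∆ = 1 the Nyquist frequency is 1/(2 · 1) = 0.5 < 9/11, so the wave with frequency 9/11 cannot be identified; it is folded back to the frequency 1 − 9/11 = 2/11 ≤ 0.5, which is exactly the aliasing displayed in Plot 3.2.4a. A sampling interval such as ∆ = 0.5 ≤ 11/18 would raise the Nyquist frequency to 1 and make both frequencies 2/11 and 9/11 observable.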

Exercises
3.1. Let y(t) = A cos(2πλt) + B sin(2πλt) be a harmonic component. Show that y can be written as y(t) = α cos(2πλt − φ), where α is the amplitude, i.e., the maximum departure of the wave from zero, and φ is the phase displacement.

3.2. Show that

Σ_{t=1}^{n} cos(2πλt) = n if λ ∈ Z, and = cos(πλ(n + 1)) sin(πλn) / sin(πλ) if λ ∉ Z,

Σ_{t=1}^{n} sin(2πλt) = 0 if λ ∈ Z, and = sin(πλ(n + 1)) sin(πλn) / sin(πλ) if λ ∉ Z.

Hint: Compute Σ_{t=1}^{n} e^{i2πλt}, where e^{iφ} = cos(φ) + i sin(φ) is the complex valued exponential function.

3.3. Verify Lemma 3.1.2. Hint: Exercise 3.2.

3.4. Suppose that the time series (y_t)_t satisfies the additive model with seasonal component

s(t) = Σ_{k=1}^{s} A_k cos(2π(k/s)t) + Σ_{k=1}^{s} B_k sin(2π(k/s)t).

Show that s(t) is eliminated by the seasonal differencing ∆_s y_t = y_t − y_{t−s}.

3.5. Fit a harmonic component with frequency λ to a time series y_1, . . . , y_N, where λ ∉ Z and λ − 0.5 ∉ Z. Compute the least squares estimator and the pertaining residual sum of squares.

3.6. (Unemployed1 Data) Plot the periodogram of the first order differences of the numbers of unemployed in the building trade, as introduced in Example 1.1.1. Add a seasonal adjustment and compare the periodograms.

3.7. (Airline Data) Plot the periodogram of the variance stabilized and trend adjusted Airline Data, introduced in Example 1.3.1.

3.8. The contribution of the autocovariance c(k), k ≥ 1, to the periodogram can be illustrated by plotting the functions ± cos(2πλk), λ ∈ [0, 0.5].

(i) Which conclusion about the intensities of large or small frequencies can be drawn from a positive value c(1) > 0 or a negative one c(1) < 0?

(ii) Which effect has an increase of |c(2)| if all other parameters remain unaltered?

(iii) What can you say about the effect of an increase of c(k) on the periodogram at the values 1/k, 2/k, 3/k, . . . and the intermediate values 1/(2k), 3/(2k), 5/(2k)? Illustrate the effect at a time series with seasonal component k = 12.

3.9. Put y_t = t, t = 1, . . . , n. Show that

I(k/n) = n / (4 sin²(πk/n)), k = 1, . . . , [(n − 1)/2].

Hint: Use the equations

Σ_{t=1}^{n−1} t sin(θt) = sin(nθ) / (4 sin²(θ/2)) − n cos((n − 0.5)θ) / (2 sin(θ/2)),

Σ_{t=1}^{n−1} t cos(θt) = n sin((n − 0.5)θ) / (2 sin(θ/2)) − (1 − cos(nθ)) / (4 sin²(θ/2)).

3.10. Establish a version of the inverse Fourier transform in real terms.

3.11. Let a = (a_t)_{t∈Z} and b = (b_t)_{t∈Z} be absolutely summable sequences.

(i) Show that for αa + βb := (αa_t + βb_t)_{t∈Z}, α, β ∈ R,

f_{αa+βb}(λ) = αf_a(λ) + βf_b(λ).

(ii) For ab := (a_t b_t)_{t∈Z} we have

f_{ab}(λ) = f_a ∗ f_b(λ) := ∫_0^1 f_a(µ) f_b(λ − µ) dµ.

(iii) Show that for a ∗ b := (Σ_{s∈Z} a_s b_{t−s})_{t∈Z} (convolution)

f_{a∗b}(λ) = f_a(λ) f_b(λ).

3.12. (Fast Fourier Transform (FFT)) The Fourier transform of a finite sequence a_0, a_1, . . . , a_{N−1} can be represented under suitable conditions as the composition of Fourier transforms. Put f(s/N) = Σ_{t=0}^{N−1} a_t e^{−i2πst/N}, s = 0, . . . , N − 1, which is the Fourier transform of length N. Suppose that N = KM with K, M ∈ N. Show that f can be represented as a Fourier transform of length K, computed for a Fourier transform of length M. Hint: Each t, s ∈ {0, . . . , N − 1} can uniquely be written as t = t_0 + t_1 K, t_0 ∈ {0, . . . , K − 1}, t_1 ∈ {0, . . . , M − 1}, and s = s_0 + s_1 M, s_0 ∈ {0, . . . , M − 1}, s_1 ∈ {0, . . . , K − 1}. Sum over t_0 and t_1.

3.13. (Star Data) Suppose that the Star Data are only observed weekly (i.e., keep only every seventh observation). Is an aliasing effect observable?

Chapter 4

The Spectrum of a Stationary Process

In this chapter we investigate the spectrum of a real valued stationary process, which is the Fourier transform of its (theoretical) autocovariance function. Its empirical counterpart, the periodogram, was investigated in the preceding sections, cf. Theorem 3.2.3.

Let (Y_t)_{t∈Z} be a (real valued) stationary process with an absolutely summable autocovariance function γ(t), t ∈ Z. Its Fourier transform

f(λ) := Σ_{t∈Z} γ(t) e^{−i2πλt} = γ(0) + 2 Σ_{t∈N} γ(t) cos(2πλt), λ ∈ R,

is called the spectral density or spectrum of the process (Y_t)_{t∈Z}. By the inverse Fourier transform in Theorem 3.2.5 we have

γ(t) = ∫_0^1 f(λ) e^{i2πλt} dλ = ∫_0^1 f(λ) cos(2πλt) dλ.

For t = 0 we obtain

γ(0) = ∫_0^1 f(λ) dλ,

which shows that the spectrum is a decomposition of the variance γ(0). In Section 4.3 we will in particular compute the spectrum of an ARMA-process. As a preparatory step we investigate properties of spectra for arbitrary absolutely summable filters.

Proof. . whose finite dimensional distributions coincide with the given family. . Theorem 4. which satisfies the consistency condition of Kolmogorov’s theorem. see Exercise 4. xn ∈ R. It is easy to see that (4.. Define the n × n-matrix K (n) := K(r − s) 1≤r.1) h ∈ Z. with the properties γ(0) ≥ 0.1 Characterizations of Autocovariance Functions Recall that the autocovariance function γ : Z → R of a stationary process (Yt )t∈Z is given by γ(h) = E(Yt+h Yt ) − E(Yt+h ) E(Yt ).e. .1.2) is a necessary condition for K to be the autocovariance function of a stationary process. i.s≤n . . We will define a family of finite-dimensional normal distributions. |γ(h)| ≤ γ(0). i.s≤n for arbitrary n ∈ N and x1 . γ(h) = γ(−h). .2) 1≤r. which is positive semidefinite. Vn ) with mean vector .1 in Brockwell and Davies (1991). cf. Theorem 1. we will construct a stationary process.2) is sufficient. .19. Consequently there exists an n-dimensional normal distributed random vector (V1 ..2. A symmetric function K : Z → R is the autocovariance function of a stationary process (Yt )t∈Z iff K is a positive semidefinite function. .1. h ∈ Z. The following result characterizes an autocovariance function in terms of positive semidefiniteness.152 The Spectrum of a Stationary Process 4. This result implies the existence of a process (Vt )t∈Z .e. whose autocovariance function is K. K(−n) = K(n) and xr K(r − s)xs ≥ 0 (4. It remains to show that (4. (4. .

.tm does not depend on the special choice of t and n and thus. . F (λ) = F (λ) − F (0) = . This process has... vn ) := P {V1 ≤ v1 .1] g(λ) dF (λ).2. h ∈ Z. thus. whose finite dimensional distribution at t1 < · · · < tm has distribution function Ft1 . Define now for each n ∈ N and t ∈ Z a distribution function on Rn by Ft+1.. 1] with F (0) = 0. increasing and bounded function. where 1 ≤ n1 < · · · < nm ≤ n.4.t+n (v1 .3) where F is a real valued measure generating function on [0.. .. therefore.. The function F is uniquely determined. 1 ≤ i ≤ m}. 153 Spectral Distribution Function and Spectral Density The preceding result provides a characterization of an autocovariance function in terms of positive semidefiniteness. ..1.tm .. .. .. The following characterization of positive semidefinite functions is known as Herglotz’s 1 theorem. We use in the following the notation 0 g(λ) dF (λ) in place of (0. Theorem 4. .. (4. If F has a derivative f and. we have defined a family of distribution functions indexed by t1 < · · · < tm on Rm for each m ∈ N. Note that Ft1 . Let now t1 < · · · < tm be arbitrary integers. The uniquely determined function F ...tm ((vi )1≤i≤m ) := P {Vni ≤ vi .. which obviously satisfies the consistency condition of Kolmogorov’s theorem. A symmetric function γ : Z → R is positive semidefinite iff it can be represented as an integral 1 γ(h) = 0 e i2πλh 1 dF (λ) = 0 cos(2πλh) dF (λ).1 Characterizations of Autocovariance Functions zero and covariance matrix K (n) . We define now Ft1 . which is a right-continuous. Vn ≤ vn }. This result implies the existence of a process (Vt )t∈Z . This defines a family of distribution functions indexed by consecutive integers. .. Choose t ∈ Z and n ∈ N such that ti = t + ni . mean vector zero and covariances E(Vt+h Vt ) = K(h). is called the spectral distribution function of γ. h ∈ Z. .

Proof of Theorem 4. cf. Theorem 3. this in turn together with F (0) = G(0) = 0 implies F = G. such h=−N ah e that 0≤λ≤1 sup |ψ(λ) − pε (λ)| ≤ ε. 1 1 ψ(λ) dF (λ) = 0 0 ψ(λ) dG(λ). Since ψ was an arbitrary continuous function. Section 4.24 in Rudin (1974)) that we can find for arbitrary ε > 0 i2πλh a trigonometric polynomial pε (λ) = N . thus. and. 1].5. i = 1. 0 ≤ λ ≤ 1. . From calculus we know (cf.154 λ 0 f (x) dx The Spectrum of a Stationary Process for 0 ≤ λ ≤ 1. 2. Let G be another measure generating function with G(λ) = 0 for λ ≤ 0 and constant for λ ≥ 1 such that 1 γ(h) = 0 e i2πλh 1 dF (λ) = 0 ei2πλh dG(λ).2.2. As a consequence we obtain that 1 1 ψ(λ) dF (λ) = 0 0 1 pε (λ) dF (λ) + r1 (ε) pε (λ) dG(λ) + r1 (ε) 0 1 = = 0 ψ(λ) dG(λ) + r2 (ε). then f is called the spectral density of γ. Note that the property h≥0 |γ(h)| < ∞ already implies the existence of a spectral density of γ.1. where ri (ε) → 0 as ε → 0. Let now ψ be a continuous function on [0. the autocorrelation function ρ(h) = γ(h)/γ(0) has the above integral representation. We establish first the uniqueness of F . 1 Recall that γ(0) = 0 dF (λ) = F (1) and thus. h ∈ Z. but with F replaced by the distribution function F/γ(0).

.3). . fN (x) dx.e. 1 0 Then we have for each h ∈ Z 1 e 0 i2πλh dFN (λ) = |m|<N |m| 1− γ(m) N |h| N ei2πλ(h−m) dλ = 0..1 Characterizations of Autocovariance Functions Suppose now that γ has the representation (4. 2 0 1≤r. 1 0 g(λ) dFNk (λ) −→ k→∞ 0 1 ˜ g(λ) dF (λ) for every continuous and bounded function g : [0.1 in Billingsley (1968)).s≤N e−i2πλr γ(r − s)ei2πλs |m|<N λ (N − |m|)γ(m)e−i2πλm ≥ 0.s≤n n 1 0 r=1 i. Billingsley (1968). 1] → R (cf. Then F is . page 226ff) to deduce the ex˜ istence of a measure generating function F and a subsequence (FNk )k ˜ such that FNk converges weakly to F i. This implies that for 0 ≤ λ ≤ 1 and N ∈ N (Exercise 4.4) Since FN (1) = γ(0) < ∞ for any N ∈ N. n 1 1≤r. We have for arbitrary xi ∈ R. γ is positive semidefinite. i = 1. we can apply Helly’s selection theorem (cf.2) fN (λ) : = = Put now FN (λ) := 0 1 N 1 N 1≤r. if |h| < N if |h| ≥ N . Suppose conversely that γ : Z → R is a positive semidefinite function.s≤n 155 xr γ(r − s)xs = = xr xs ei2πλ(r−s) dF (λ) xr ei2πλr dF (λ) ≥ 0. 0 ≤ λ ≤ 1.e. . 1− γ(h). . Theorem ˜ ˜ 2. (4.. Put now F (λ) := F (λ) − F (0).4.

h = 0 0. The assertion is then a consequence of Theorem 4.4) by Nk and let k tend to infinity. 1]. If we replace N in (4.1. where F is a measure generating function on [0. 1] with F (0) = 0.3. The function f is in this case the spectral density of γ. h ∈ Z. σ2.1. 1 σ e 2 i2πλh dλ = the process (εt ) has by Theorem 4. This is the name giving property of the white noise process: As the white light is characteristically perceived to belong to objects that reflect nearly all incident energy throughout the visible spectrum. A white noise (εt )t∈Z has the autocovariance function γ(h) = Since 0 σ2. 0 ≤ λ ≤ 1. h ∈ Z \ {0}.1. we now obtain representation (4.5.1. h = 0 0. Corollary 4. .4. xn ∈ R.1. γ(h) = 0 ei2πλh dF (λ).2 the constant spectral density f (λ) = σ 2 . . 2. 1≤r. Theorem 4. h ∈ Z \ {0}.2 shows that (i) and (ii) are equivalent. A symmetric function γ : Z → R is the autocovariance function of a stationary process (Yt )t∈Z . Proof.1. λ ∈ [0.s≤n xr γ(r 1 − s)xs ≥ 0 for each n ∈ N and x1 . a white noise process weighs all possible frequencies equally.156 The Spectrum of a Stationary Process a measure generating function with F (0) = 0 and 1 0 ˜ g(λ) dF (λ) = 0 1 g(λ) dF (λ). Corollary 4. iff it satisfies one of the following two (equivalent) conditions: 1.1.3). . Example 4. A symmetric function γ : Z → R with t∈Z |γ(t)| < ∞ is the autocovariance function of a stationary process iff f (λ) := t∈Z γ(t)e−i2πλt ≥ 0. . .

consequently.5 + λ) − F (0. and. Thus we have established representation (4. γ is by Corollary 4. elsewhere f (λ) = t∈Z is the autocovariance function of a stationary process iff |ρ| ≤ 0. 1}  0.1 Characterizations of Autocovariance Functions Proof. Example 2.1.3).1. The function f is consequently nonnegative.e. 0 ≤ t∈Z γ(t)e 1 λ ≤ 1. i2πλt Suppose on the other hand that f (λ) = ≥ 0. Since γ is in this case positive semidefinite by Theorem 4.5− ) = F (0. and t∈Z |γ(t)| < ∞ by assumption. The function  1. Choose a number ρ ∈ R.. if h ∈ {−1. which implies that γ is positive semidefinite.5.1. 0 ≤ λ ≤ 1. N 1− see Exercise 4. The spectral distribution function of a stationary process satisfies (Exercise 4.5 − λ)− ). where F (λ) = 0 f (x) dx.1. The in1 verse Fourier transform in Theorem 3. This follows from γ(t)ei2πλt = γ(0) + γ(1)ei2πλ + γ(−1)e−i2πλ = 1 + 2ρ cos(2πλ) ≥ 0 for λ ∈ [0.5.5) − F ((0. t ∈ Z i. The inverse Fourier transform implies γ(t) = 0 f (λ)ei2πλt dλ λ 1 = 0 ei2πλt dF (λ).4. . 1] iff |ρ| ≤ 0.2.2. 0 ≤ λ < 0.10) F (0. cf.s≤N e−i2πλr γ(r − s)ei2πλs |t| γ(t)e−i2πλt → f (λ) as N → ∞.6.5 implies γ(t) = 0 f (λ)ei2πλt dλ. Example 4. if h = 0  γ(h) = ρ. f is the spectral density of γ.5.4 the autocovariance function of a stationary process. Suppose first that γ is an autocovariance function.8.2.2) 0 ≤ fN (λ) : = = |t|<N 157 1 N 1≤r. we have (Exercise 4. Note that the function γ is the autocorrelation function of an MA(1)-process.

where F(x−) := lim_{ε↓0} F(x − ε) is the left-hand limit of F at x ∈ (0, 1]. If F has a derivative f, we obtain from the above symmetry f(0.5 + λ) = f(0.5 − λ) or, equivalently, f(1 − λ) = f(λ) and, hence,

γ(h) = ∫_0^1 cos(2πλh) dF(λ) = 2 ∫_0^{0.5} cos(2πλh) f(λ) dλ.

The autocovariance function of a stationary process is, therefore, determined by the values f(λ), 0 ≤ λ ≤ 0.5, if the spectral density exists. Recall, moreover, that the smallest nonconstant period P_0 visible through observations evaluated at time points t = 1, 2, . . . is P_0 = 2, i.e., the largest observable frequency is the Nyquist frequency λ_0 = 1/P_0 = 0.5, cf. the end of Section 3.2. Hence, the spectral density f(λ) matters only for λ ∈ [0, 0.5].

Remark 4.1.7. The preceding discussion shows that a function f : [0, 1] → R is the spectral density of a stationary process iff f satisfies the following three conditions:

(i) f(λ) ≥ 0,

(ii) f(λ) = f(1 − λ),

(iii) ∫_0^1 f(λ) dλ < ∞.

4.2 Linear Filters and Frequencies

The application of a linear filter to a stationary time series has a quite complex effect on its autocovariance function, see Theorem 2.1.6. Its effect on the spectral density, if it exists, turns out to be quite simple. We use in the following again the notation ∫_0^λ g(x) dF(x) in place of ∫_{(0,λ]} g(x) dF(x).

Theorem 4.2.1. Let (Z_t)_{t∈Z} be a stationary process with spectral distribution function F_Z and let (a_t)_{t∈Z} be an absolutely summable filter with Fourier transform f_a. The linear filtered process Y_t := Σ_{u∈Z} a_u Z_{t−u}, t ∈ Z, then has the spectral distribution function

F_Y(λ) := ∫_0^λ |f_a(x)|² dF_Z(x), 0 ≤ λ ≤ 1.   (4.5)

If in addition (Z_t)_{t∈Z} has a spectral density f_Z, then

f_Y(λ) := |f_a(λ)|² f_Z(λ), 0 ≤ λ ≤ 1,   (4.6)

is the spectral density of (Y_t)_{t∈Z}.

Proof. Theorem 2.1.6 yields that (Y_t)_{t∈Z} is stationary with autocovariance function

γ_Y(t) = Σ_{u∈Z} Σ_{w∈Z} a_u a_w γ_Z(t − u + w), t ∈ Z,

where γ_Z is the autocovariance function of (Z_t). Its spectral representation (4.3) implies

γ_Y(t) = Σ_{u∈Z} Σ_{w∈Z} a_u a_w ∫_0^1 e^{i2πλ(t−u+w)} dF_Z(λ)
       = ∫_0^1 ( Σ_{u∈Z} a_u e^{−i2πλu} ) ( Σ_{w∈Z} a_w e^{i2πλw} ) e^{i2πλt} dF_Z(λ)
       = ∫_0^1 |f_a(λ)|² e^{i2πλt} dF_Z(λ)
       = ∫_0^1 e^{i2πλt} dF_Y(λ).

Theorem 4.1.2 now implies that F_Y is the uniquely determined spectral distribution function of (Y_t)_{t∈Z}. The second to last equality yields in addition the spectral density (4.6).

Transfer Function and Power Transfer Function

Since the spectral density is a measure of intensity of a frequency λ inherent in a stationary process (see the discussion of the periodogram in Section 3.2), the effect (4.6) of applying a linear filter (a_t) with Fourier transform f_a can easily be interpreted. While the intensity of λ is diminished by the filter (a_t) iff |f_a(λ)| < 1, its intensity is amplified iff |f_a(λ)| > 1. The Fourier transform f_a of (a_t) is, therefore, also called transfer function, and the function g_a(λ) := |f_a(λ)|² is referred to as the gain or power transfer function of the filter (a_t)_{t∈Z}.

Example 4.2.2. The simple moving average of length three,

a_u = 1/3 if u ∈ {−1, 0, 1}, and 0 elsewhere,

has the transfer function

f_a(λ) = 1/3 + (2/3) cos(2πλ)

and the power transfer function

g_a(λ) = 1 for λ = 0, and g_a(λ) = ( sin(3πλ) / (3 sin(πλ)) )² for λ ∈ (0, 0.5]

(see Exercise 4.13). This power transfer function is plotted in Plot 4.2.1a below. It shows that frequencies λ close to zero, i.e., those corresponding to a large period, remain essentially unaltered. Frequencies λ close to 0.5, which correspond to a short period, are, however, damped by the approximate factor 0.1, when the moving average (a_u) is applied to a process. The frequency λ = 1/3 is completely eliminated, since g_a(1/3) = 0.
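A quick check of the last two statements: f_a(1/3) = 1/3 + (2/3) cos(2π/3) = 1/3 − 1/3 = 0, so g_a(1/3) = 0, and f_a(0.5) = 1/3 − 2/3 = −1/3, so g_a(0.5) = 1/9 ≈ 0.1, which is the damping factor mentioned above.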

Plot 4.2.1a: Power transfer function of the simple moving average of length three.

/* power_transfer_sma3.sas */
TITLE1 'Power transfer function';
TITLE2 'of the simple moving average of length 3';

/* Compute power transfer function */
DATA data1;
   DO lambda=.001 TO .5 BY .001;
      g=(SIN(3*CONSTANT('PI')*lambda)/(3*SIN(CONSTANT('PI')*lambda)))**2;
      OUTPUT;
   END;

/* Graphical options */
AXIS1 LABEL=('g' H=1 'a' H=2 '(' F=CGREEK 'l)');
AXIS2 LABEL=(F=CGREEK 'l');
SYMBOL1 V=NONE C=GREEN I=JOIN;

/* Plot power transfer function */
PROC GPLOT DATA=data1;
   PLOT g*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;

Program 4.2.1: Plotting the power transfer function.

Example 4.2.3. The first order difference filter,

a_u = 1 if u = 0, −1 if u = 1, and 0 elsewhere,

has the transfer function

f_a(λ) = 1 − e^{−i2πλ}.

Since f_a(λ) = e^{−iπλ}(e^{iπλ} − e^{−iπλ}) = i e^{−iπλ} 2 sin(πλ), its power transfer function is

g_a(λ) = 4 sin²(πλ).

The first order difference filter, therefore, damps frequencies close to zero but amplifies those close to 0.5.
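This power transfer function can be plotted along the lines of Program 4.2.1 (power_transfer_sma3.sas). The following sketch does this; the file name is hypothetical and the graphical options of the original programs are omitted.

/* power_transfer_diff1.sas (sketch) */
TITLE1 'Power transfer function';
TITLE2 'of the first order difference filter';

/* Compute power transfer function g(lambda) = 4*sin(pi*lambda)**2 */
DATA data1;
   DO lambda=0 TO .5 BY .001;
      g=4*SIN(CONSTANT('PI')*lambda)**2;
      OUTPUT;
   END;

/* Plot power transfer function */
PROC GPLOT DATA=data1;
   PLOT g*lambda;
RUN; QUIT;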

Plot 4.2.2a: Power transfer function of the first order difference filter.

Example 4.2.4. The preceding example immediately carries over to the seasonal difference filter of arbitrary length s ≥ 0, i.e.,

a_u^{(s)} = 1 if u = 0, −1 if u = s, and 0 elsewhere,

which has the transfer function

f_{a^{(s)}}(λ) = 1 − e^{−i2πλs}

and the power transfer function

g_{a^{(s)}}(λ) = 4 sin²(πλs).


Plot 4.2.3a: Power transfer function of the seasonal difference filter of order 12.

Since sin²(x) = 0 iff x = kπ and sin²(x) = 1 iff x = (2k + 1)π/2, k ∈ Z, the power transfer function g_{a^{(s)}}(λ) satisfies for k ∈ Z

g_{a^{(s)}}(λ) = 0 iff λ = k/s, and g_{a^{(s)}}(λ) = 4 iff λ = (2k + 1)/(2s).

This implies, for example, in the case of s = 12 that those frequencies, which are multiples of 1/12 = 0.0833, are eliminated, whereas the midpoint frequencies k/12 + 1/24 are amplified. This means that the seasonal difference filter on the one hand does what we would like it to do, namely to eliminate the frequency 1/12, but on the other hand it generates unwanted side effects by eliminating also multiples of 1/12 and by amplifying midpoint frequencies. This observation gives rise to the problem, whether one can construct linear filters that have prescribed properties.
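In numbers, for s = 12: g_{a^{(12)}}(1/12) = 4 sin²(π) = 0 and g_{a^{(12)}}(1/24) = 4 sin²(π/2) = 4, so the seasonal frequency 1/12 and its multiples are removed, while the midpoint frequency 1/24 is amplified by the factor 4.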

4.2 Linear Filters and Frequencies


Least Squares Based Filter Design
A low pass filter aims at eliminating high frequencies, a high pass filter aims at eliminating small frequencies and a band pass filter allows only frequencies in a certain band [λ_0 − ∆, λ_0 + ∆] to pass through. They consequently should have the ideal power transfer functions

g_low(λ) = 1 for λ ∈ [0, λ_0] and 0 for λ ∈ (λ_0, 0.5],

g_high(λ) = 0 for λ ∈ [0, λ_0) and 1 for λ ∈ [λ_0, 0.5],

g_band(λ) = 1 for λ ∈ [λ_0 − ∆, λ_0 + ∆] and 0 elsewhere,

where λ_0 is the cut off frequency in the first two cases and [λ_0 − ∆, λ_0 + ∆] is the cut off interval with bandwidth 2∆ > 0 in the final one. Therefore, the question naturally arises, whether there actually exist filters which have a prescribed power transfer function. One possible approach for fitting a linear filter with weights a_u to a given transfer function f is offered by utilizing least squares. Since only filters of finite length matter in applications, one chooses a transfer function

f_a(λ) = Σ_{u=r}^{s} a_u e^{−i2πλu}

with fixed integers r, s and fits this function f_a to f by minimizing the integrated squared error

∫_0^{0.5} |f(λ) − f_a(λ)|² dλ

in (a_u)_{r≤u≤s} ∈ R^{s−r+1}. This is achieved for the choice (Exercise 4.16)

a_u = 2 Re( ∫_0^{0.5} f(λ) e^{i2πλu} dλ ), u = r, . . . , s,

which is formally the real part of the inverse Fourier transform of f.


Example 4.2.5. For the low pass filter with cut off frequency 0 < λ_0 < 0.5 and ideal transfer function f(λ) = 1_{[0,λ_0]}(λ) we obtain the weights

a_u = 2 ∫_0^{λ_0} cos(2πλu) dλ = 2λ_0 if u = 0, and (1/(πu)) sin(2πλ_0 u) if u ≠ 0.
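For the cut off frequency λ_0 = 1/10 used in the following plot this gives, for instance, a_0 = 0.2, a_1 = sin(0.2π)/π ≈ 0.187, a_2 = sin(0.4π)/(2π) ≈ 0.151 and a_5 = sin(π)/(5π) = 0; since the weights decay only like 1/u, a rather long filter (here r = −20, s = 20) is needed.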

Plot 4.2.4a: Transfer function of least squares fitted low pass filter with cut off frequency λ0 = 1/10 and r = −20, s = 20.

/* transfer . sas */ TITLE1 ’ Transfer function ’; TITLE2 ’ of least squares fitted low pass filter ’; /* Compute transfer function */ DATA data1 ; DO lambda =0 TO .5 BY .001; f =2*1/10;

DO u =1 TO 20; f = f +2*1/( CONSTANT ( ’ PI ’) * u ) * SIN (2* CONSTANT ( ’ PI ’) *1/10* u ) * COS (2* →CONSTANT ( ’ PI ’) * lambda * u ) ; END ; OUTPUT ; END ; /* Graphical options */ AXIS1 LABEL =( ’f ’ H =1 ’a ’ H =2 F = CGREEK ’( l ) ’) ; AXIS2 LABEL =( F = CGREEK ’l ’) ; SYMBOL1 V = NONE C = GREEN I = JOIN L =1; /* Plot transfer function */ PROC GPLOT DATA = data1 ; PLOT f * lambda / VAXIS = AXIS1 HAXIS = AXIS2 VREF =0; RUN ; QUIT ;


Program 4.2.4: Transfer function of least squares fitted low pass filter.
The programs in Section 4.2 (Linear Filters and Frequencies) are just made for the purpose of generating graphics, which demonstrate the shape of power transfer functions or, in case of Program 4.2.4 (transfer.sas), of a transfer function. They all consist of two parts, a DATA step and a PROC step. In the DATA step values of the power transfer function are calculated and stored in a variable g by a DO loop over lambda from 0 to 0.5 with a small increment. In Program 4.2.4 (transfer.sas) it is necessary to use a second DO loop within the first one to calculate the sum used in the definition of the transfer function f. Two AXIS statements defining the axis labels and a SYMBOL statement precede the procedure PLOT, which generates the plot of g or f versus lambda.

The transfer function in Plot 4.2.4a, which approximates the ideal transfer function 1[0,0.1] , shows higher oscillations near the cut off point λ0 = 0.1. This is known as Gibbs’ phenomenon and requires further smoothing of the fitted transfer function if these oscillations are to be damped (cf. Section 6.4 of Bloomfield (1976)).

4.3 Spectral Densities of ARMA-Processes

Theorem 4.2.1 enables us to compute the spectral density of an ARMA-process.

Theorem 4.3.1. Suppose that

Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}, t ∈ Z,

is a stationary ARMA(p, q)-process, where (ε_t) is a white noise with variance σ². Put A(z) := 1 − a_1 z − a_2 z² − · · · − a_p z^p, B(z) := 1 + b_1 z + b_2 z² + · · · + b_q z^q and suppose that the process (Y_t) satisfies the stationarity condition (2.3), i.e., the roots of the equation A(z) = 0 are outside of the unit circle. The process (Y_t) then has the spectral density

f_Y(λ) = σ² |B(e^{−i2πλ})|² / |A(e^{−i2πλ})|² = σ² |1 + Σ_{v=1}^{q} b_v e^{−i2πλv}|² / |1 − Σ_{u=1}^{p} a_u e^{−i2πλu}|².   (4.7)

Proof. Since the process (Y_t) is supposed to satisfy the stationarity condition (2.3) it is causal, i.e., Y_t = Σ_{v≥0} α_v ε_{t−v}, t ∈ Z, for some absolutely summable constants α_v, v ≥ 0, see Section 2.2. The white noise process (ε_t) has by Example 4.1.3 the spectral density f_ε(λ) = σ² and, thus, (Y_t) has by Theorem 4.2.1 a spectral density f_Y. The application of Theorem 4.2.1 to the process

X_t := Y_t − a_1 Y_{t−1} − · · · − a_p Y_{t−p} = ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}

then implies that (X_t) has the spectral density

f_X(λ) = |A(e^{−i2πλ})|² f_Y(λ) = |B(e^{−i2πλ})|² f_ε(λ).

Since the roots of A(z) = 0 are assumed to be outside of the unit circle, we have |A(e^{−i2πλ})| ≠ 0 and, thus, the assertion of Theorem 4.3.1 follows.

The preceding result with a_1 = · · · = a_p = 0 implies that an MA(q)-process has the spectral density f_Y(λ) = σ² |B(e^{−i2πλ})|². With b_1 = · · · = b_q = 0, Theorem 4.3.1 implies that a stationary AR(p)-process, which satisfies the stationarity condition (2.3), has the spectral density

f_Y(λ) = σ² / |A(e^{−i2πλ})|².

Example 4.3.2. The stationary ARMA(1, 1)-process

Y_t = aY_{t−1} + ε_t + bε_{t−1}

with |a| < 1 has the spectral density

f_Y(λ) = σ² (1 + 2b cos(2πλ) + b²) / (1 − 2a cos(2πλ) + a²).

The MA(1)-process, in which case a = 0, has, consequently, the spectral density

f_Y(λ) = σ² (1 + 2b cos(2πλ) + b²),

and the stationary AR(1)-process with |a| < 1, for which b = 0, has the spectral density

f_Y(λ) = σ² / (1 − 2a cos(2πλ) + a²).

The following figures display spectral densities of ARMA(1, 1)-processes for various a and b with σ 2 = 1.
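Evaluating the density at the boundary frequencies shows what to expect in these plots: for a = 0.5, b = 0.5 and σ² = 1 one obtains f_Y(0) = 2.25/0.25 = 9 and f_Y(0.5) = 0.25/2.25 ≈ 0.11, so this process concentrates its variance at the low frequencies, whereas changing the signs of a and b interchanges the two values and shifts the mass towards the high frequencies.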


Plot 4.3.1a: Spectral densities of ARMA(1, 1)-processes Yt = aYt−1 + εt + bεt−1 with fixed a and various b; σ 2 = 1.

/* arma11_sd . sas */ TITLE1 ’ Spectral densities of ARMA (1 ,1) - processes ’; /* Compute spectral densities of ARMA (1 ,1) - processes */ DATA data1 ; a =.5; DO b = -.9 , -.2 , 0 , .2 , .5; DO lambda =0 TO .5 BY .005; f =(1+2* b * COS (2* CONSTANT ( ’ PI ’) * lambda ) + b * b ) /(1 -2* a * COS (2* →CONSTANT ( ’ PI ’) * lambda ) + a * a ) ; OUTPUT ; END ; END ; /* Graphical options */ AXIS1 LABEL =( ’f ’ H =1 ’Y ’ H =2 F = CGREEK ’( l ) ’) ; AXIS2 LABEL =( F = CGREEK ’l ’) ; SYMBOL1 V = NONE C = GREEN I = JOIN L =4; SYMBOL2 V = NONE C = GREEN I = JOIN L =3; SYMBOL3 V = NONE C = GREEN I = JOIN L =2; SYMBOL4 V = NONE C = GREEN I = JOIN L =33; SYMBOL5 V = NONE C = GREEN I = JOIN L =1; LEGEND1 LABEL =( ’ a =0.5 , b = ’) ; /* Plot spectral densities of ARMA (1 ,1) - processes */


PROC GPLOT DATA=data1;
   PLOT f*lambda=b / VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
RUN; QUIT;

Program 4.3.1: Computation of spectral densities.

Like in Section 4.2 (Linear Filters and Frequencies), the programs here just generate graphics. In the DATA step some loops over the varying parameter and over lambda are used to calculate the values of the spectral densities of the corresponding processes. Here SYMBOL statements are necessary and a LABEL statement to distinguish the different curves generated by PROC GPLOT.

Plot 4.3.2a: Spectral densities of ARMA(1, 1)-processes with parameter b fixed and various a. Corresponding file: arma11_sd2.sas.

Plot 4.3.3a: Spectral densities of MA(1)-processes Y_t = ε_t + bε_{t−1} for various b. Corresponding file: ma1_sd.sas.

W1 . z = x + iy. 2. Formulate and prove Theorem 4. Then M (n) is a positive semidefinite matrix (check that z T K z = ¯ T (n) n (x. Consider the real-valued 2n × 2n-matrices M (n) 1 = 2 K1 K2 (n) (n) . . .3. .4a: Spectral densities of AR(1)-processes Yt = aYt−1 + εt for various a.1. Vn . Corresponding file: ar1 sd. y ∈ R ). −K2 K1 (n) (n) Kl (n) = Kl (r − s) 1≤r. y) M (x. .1 for Hermitian functions K and complex-valued stationary processes.Exercises 173 Plot 4. l = 1. Exercises 4. .s≤n . .1.1: Let (V1 . y). Hint for the sufficiency part: Let K1 be the real part and K2 be the imaginary part of K.sas. .1. Wn ) be a 2n-dimensional normal distributed random vector with mean vector zero and covariance matrix M (n) and define for n ∈ N the family of finite dimensional . . Proceed as in the proof of Theorem 4. x.

b] → R.2. b] with F (a) = G(a) = 0 and ψ(x) F (dx) = [a.t+n (v1 . W1 ≤ w1 .5 γ(h) = sin(2πah) 2πh . 0. b]. Suppose that F and G are measure generating functions defined on some interval [a. Show that A is also positive semidefinite for complex numbers i. Hint: Approximate the indicator function 1[a. Compute the autocovariance function of a stationary process with spectral density f (λ) = 0.3) to show that for 0 < a < 0.e. Vn ≤ vn . Wt )t∈Z with mean vector zero and covariances 1 E(Vt+h Vt ) = E(Wt+h Wt ) = K1 (h) 2 1 E(Vt+h Wt ) = − E(Wt+h Vt ) = K2 (h).5. 4. Use (4. by continuous functions. Show that F=G. . h ∈ Z \ {0} h=0 is the autocovariance function of a stationary process. The Spectrum of a Stationary Process Ft+1. z T A¯ ≥ 0 for z ∈ Cn . . 2 Conclude by showing that the complex-valued process Yt := Vt − iWt .b] ψ(x) G(dx) for every continuous function ψ : [a..52 0 ≤ λ ≤ 1.3.5 − |λ − 0. . 4. Wn ≤ wn }. Compute its spectral density.174 distributions Chapter 4. . has the autocovariance function K. . 4.. t ∈ Z. . By Kolmogorov’s theorem there exists a bivariate Gaussian process (Vt .e.b] [a. . a.4.. xT Ax ≥ 0 for x ∈ Rn . . Suppose that A is a real positive semidefinite n × n-matrix i. . wn ) := P {V1 ≤ v1 . w1 . vn .t] (x). x ∈ [a.. z 4.5| . t ∈ Z...

5) dF (λ) = 0 g(0. Suppose that (Yt )t∈Z and (Zt )t∈Z are stationary processes such that Yr and Zs are uncorrelated for arbitrary r. consider the function g(x) = 1[0. which in turn form a dense subset in the space of square integrable functions. . N 4. . A real valued stationary process (Yt )t∈Z is supposed to have the spectral density f (λ) = a + bλ. (C´saro convergence) Show that limN →∞ t=1 at = S implies e limN →∞ N −1 (1−t/N )at = S. the hint in Exercise 4. .5. Which conditions must be satisfied by a and b? Compute the autocovariance function of (Yt )t∈Z . If their spectral densities f X and fY satisfy fX (λ) ≤ fY (λ) for 0 ≤ λ ≤ 1.5] → C 1 with 0 |g(λ − 0. Let (Yt ) be a real valued stationary process with spectral distribution function F . on compact sets. Hint: N −1 (1−t/N )at = (1/N ) N −1 t=1 t=1 s=1 175 s t=1 at .9. . Hint: Verify the equality first for g(x) = exp(i2πhx). show that (i) Γn.5)|2 dF (λ) < ∞ 1 1 0 g(λ − 0.5 − λ) dF (λ). Finally. Show that for any function g : [−0.5]. .X and Γn. s ∈ Z. .7.X is a positive semidefinite matrix. h ∈ Z. Show that the process (Xt ) is also stationary and compute its spectral distribution function. . λ ∈ [0. the trigonometric polynomials are uniformly dense in the space of continuous functions.11. 4.Exercises 4. 4. 0 ≤ ξ ≤ 0.Y are the covariance matrices of (X1 . Let (Xt ) and (Yt ) be stationary processes with mean zero and absolute summable covariance functions.5). 0. Denote by FY and FZ the pertaining spectral distribution functions and put X t := Yt + Zt . 4. . Generate a white noise process and plot its periodogram.5 + λ) − F (0. In particular we have F (0.Y −Γn. Yn )T respectively. and .5 − λ)− ). Xn )T and (Y1 .10. and then use the fact that. where Γn.6. 4. 0.5 (cf.5) − F ((0.8. t ∈ Z.ξ] (x).5− ) = F (0.

Yn )) for all a = (a1 . 4. .  0 au = has the gain function ga (λ) =  1.5]. Xn )) ≤ Var(aT (Y1 . u ≤ |q| 0 elsewhere sin((2q+1)πλ) (2q+1) sin(πλ) .11 (iii)). . then the process Zt = w∈Z bw Yt−w has the spectral distribution function λ FZ (λ) = 0 |Fa (µ)|2 |Fb (µ)|2 dFX (µ) (cf. an )T ∈ Rn . 4. 1} u=0 elsewhere. t ∈ Z. Exercise 3. . u < 0. . .  au = 1/2.13. Plot this function for various α. . Is this filter for large q a low pass filter? Plot its power transfer functions for q = 5/10/20. Compute the gain function of the exponential smoothing filter au = α(1 − α)u . . . Compute the gain function  1/4. λ ∈ (0. 4.15. . . Let (Xt )t∈Z be a stationary process. 0. If (bw )w∈Z is another absolutely summable filter.  λ=0 2 of the filter u ∈ {−1. . The Spectrum of a Stationary Process (ii) Var(aT (X1 . u ≥ 0 0. What is the effect of α → 0 or α → 1? 4. . The simple moving average 1/(2q + 1).12.176 Chapter 4. . where 0 < α < 1.14.2. (au )u∈Z an absolutely summable filter and put Yt := u∈Z au Xt−u . Hint: Exercise 3.

Show that (4. An AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt satisfying the stationarity condition (2. . . a2 .2.2) is a necessary condition for K to be the autocovariance function of a stationary process. Plot these functions.23) has the spectral density fY (λ) = σ2 . au ∈ R.18.3) (cf. Compute in analogy to Example 4. Exercise 2.5 177 D(ar . is minimized for 0. 4. . Hint: Put f (λ) = f1 (λ) + if2 (λ) and differentiate with respect to au .16. . . . 1 + a2 + a2 + 2(a1 a2 − a1 ) cos(2πλ) − 2a2 cos(4πλ) 2 1 Plot this function for various choices of a1 . .Exercises 4.17. u = r.5 the transfer functions of the least squares high pass and band pass filter. 4. as ) = 0 |f (λ) − fa (λ)|2 dλ with fa (λ) = s u=r au e−i2πλu . Show that the function 0. s.19.5 au := 2 Re 0 f (λ)ei2πλu dλ . . Is Gibbs’ phenomenon observable? 4. .


We will test the nullhypothesis A=B=0 against the alternative A = 0 or B = 0. though it will turn out that it is not a consistent estimate. Consistency requires extra smoothing of the periodogram via a linear filter. We start with the model Yt = µ + A cos(2πλt) + B sin(2πλt) + εt . it seems plausible to apply it to the preceding testing problem as well. Since the periodogram is used for the detection of highly intensive frequencies inherent in the data. . will be basic for both problems.Chapter Statistical Analysis in the Frequency Domain We will deal in this chapter with the problem of testing for a white noise and the estimation of the spectral density. Note that (Yt )t∈Z is a stationary process only under the nullhypothesis A = B = 0. The empirical counterpart of the spectral density. whether the data are generated by a white noise (εt )t∈Z .1 Testing for a White Noise Our initial step in a statistical analysis of a time series in the frequency domain is to test. the periodogram. the variance σ 2 > 0 and the intercept µ ∈ R are unknown. where the frequency λ. 5 5. where we assume that the εt are independent and normal distributed with mean zero and variance σ 2 .

. Sε (m/n) = An−1 (εt − ε)1≤t≤n ¯ = A(I n − n−1 E n )(εt )1≤t≤n . . . Lemma 5. .1. . In a preparatory step we compute the distribution of the periodogram. σ 2 /(2n))-distributed. . Sε (k/n).   m . sin 2π n n  . . . . . 1 ≤ k ≤ [(n − 1)/2]. Cε (m/n). . 1 ≤ k ≤ [(n − 1)/2]. Sε (1/n). (3. Let ε1 . are independent and identically N (0. Denote by ε := n−1 n εt the sample mean of ε1 . . . ¯ n t=1 k (εt − ε) sin 2π t ¯ n t=1 n n the cross covariances with Fourier frequencies k/n. Proof.1. . . . 2π m n 2π m n cos 2π m 2 n sin 2π m 2 n   . . . . cf. Note that with m := [(n − 1)/2] we have v := Cε (1/n). sin   . cos 2π n n   m . Then the 2[(n − 1)/2] random variables Cε (k/n). . . . εn be independent and identically normal distributed random variables with mean µ ∈ R and variance σ 2 > 0. where the 2m × n-matrix A is given by   sin  1 A :=  n  cos  sin  1 cos 2π n 1 2π n 1 cos 2π n 2 1 . . .180 Statistical Analysis in the Frequency Domain The Distribution of the Periodogram In the following we will derive the tests by Fisher and Bartlett– Kolmogorov–Smirnov for the above testing problem. εn and by ¯ t=1 Cε Sε k n k n 1 = n 1 = n k (εt − ε) cos 2π t .1). . cos 2π n n 1 2π n n 1 2π n 2 T sin .

. evaluated at the Fourier frequencies k/n. . P {Iε (k/n)/σ 2 ≤ x} = Proof. ..2. .1.1. Let ε1 . thus. normal distributed with mean vector zero and covariance matrix σ 2 A(I n − n−1 E n )(I n − n−1 E n )T AT = σ 2 AAT σ2 = I n. x ≥ 0.7 in Falk et al.2 in Falk et al. see e.1.1 Testing for a White Noise I n is the n × n-unity matrix and E n is the n × n-matrix with each entry being 1. 1 ≤ k ≤ [(n − 1)/2]. x > 0 0. Corollary 5.2. the assertion follows. The random variables Iε (k/n)/σ 2 are independent and identically standard exponential distributed i. σ2 2n Sε (k/n) σ2 1 − exp(−x). (2002). Definition 2. x ≤ 0. (2002). are independent standard normal random variables and.e. Theorem 2.1. Since this distribution has the distribution function 1 − exp(−x/2). εn be as in the preceding lemma and let 2 2 Iε (k/n) = n Cε (k/n) + Sε (k/n) be the pertaining periodogram. The vector v is. 2n = σ 2 A(I n − n−1 E n )AT 181 which is a consequence of (3.1.g.1 implies that 2n Cε (k/n). see e. 2Iε (k/n) = σ2 2n Cε (k/n) σ2 2 + 2n Sε (k/n) σ2 2 is χ2 -distributed with two degrees of freedom.g.5. Lemma 5.5) and the orthogonality properties from Lemma 3. . therefore.

Corollary 5. Um−1:m−1 ). The empirical distribution function of S 1 . the Lebesgue-density f (s1 . 1]. Then we have S1 . therefore.6. . The following consequence of the preceding result is obvious.5. . . is. Sm−1 ) has. j=1 x ∈ [0. i. . . Sm−1 =D U1:m−1 . for example.. Let ε1 . Put S0 := 0 and Mm := max (Sj − Sj−1 ) = 1≤j≤m max1≤j≤m Iε (j/n) . . . . . The following result. . which will be basic for our further considerations. . Um−1 . . .x] (Sj ) =D m−1 m−1 1(0. Note that Sm = 1. . Zm+1 are independent and identically exponential distributed random variables.1. Corollary 5. . . . .3. an immediate consequence of Corollary 5. where Z1 . . . . Sm−1 is distributed like that of U1 . .182 Statistical Analysis in the Frequency Domain Denote by U1:m ≤ U2:m ≤ · · · ≤ Um:m the ordered values pertaining to independent and uniformly on (0.1. By =D we denote equality in distribution. The vector (S1 . m k=1 Iε (k/n) . εn be independent N (µ. 1) distributed random variables U1 . . m k=1 Iε (k/n) j = 1.x] (Uj ). . σ 2 )-distributed random variables and denote by Sj := j k=1 Iε (k/n) . .7 in Reiss (1989). . . the cumulated periodogram.1. . . . . Theorem 5.e. .1. m := [(n − 1)/2]. .2. . It is a well known result in the theory of order statistics that the distribution of the vector (Uj:m )1≤j≤m coincides with that of ((Z1 + · · · + Zj )/(Z1 + · · · + Zm+1 ))1≤j≤m . if 0 < s1 < · · · < sm−1 < 1 0 elsewhere. . see also Exercise 5.3. ˆ Fm−1 (x) := 1 m−1 m−1 j=1 1 1(0. Um . therefore. Theorem 1.4. sm−1 ) = (m − 1)!. . . . . see. .

i.e. σ 2 )-distributed. Fisher’s Test The preceding results suggest to test the hypothesis Yt = εt with εt independent and N (µ. 1]. . .3 the vector (V1 .1.05. .1 Testing for a White Noise The maximum spacing Mm has the distribution function m 183 Gm (x) := P {Mm ≤ x} = Proof.1. . . Precisely. . . . Um−1 : (V1 . By Theorem 5. j = 1. Put (−1)j j=0 m (max{0. and this is provided by the covering theorem as stated in Theorem 3 in Section I. Table 5. Common values are α = 0. The hypothesis is. Vm ) =D (U1:m−1 . . . 1 − Um−1:m−1 ). therefore. . . . .01 and = 0. .12). m. by testing for the uniform distribution on [0. This is Fisher’s test for hidden periodicities. . rejected at error level α if κm > c α with 1 − Gm (cα ) = α. if one of the values I(j/n) is significantly larger than the average over all.9 of Feller (1971). 1−jx})m−1 . . we will reject this hypothesis if Fisher’s κ-statistic max1≤j≤m I(j/n) = mMm κm := (1/m) m I(k/n) k=1 is significantly large. Vm ) is distributed like the length of the m consecutive intervals into which [0. U2:m−1 − U1:m−1 . 1] is partitioned by the m − 1 random points U1 . .. Note that these quantiles can be approximated by corresponding quantiles of a Gumbel distribution if m is large (Exercise 5. The probability that Mm is less than or equal to x equals the probability that all spacings Vj are less than or equal to x. . lists several critical values cα . . taken from Fuller (1976). . j x > 0. Vj := Sj − Sj−1 .1.5.

701 5.01 m 5. σ 2 )-distributed √ is rejected if ∆m−1 > cα / m − 1. This Bartlett-Kolmogorov-Smirnov test can also be carried out visuˆ ally by plotting for x ∈ [0.358 150 6. σ 2 )distributed.01 9. If actually Yt = εt with εt independent and identically N (µ.889 9.408 5.103 200 6.019 5..63 are the critical values for the levels α = 0.389 8. y =x± √ m−1 . the hypothesis Yt = εt with εt being independent and N (µ.4 that the empirical ˆ distribution function Fm−1 of S1 .36 and c0.473 9.450 5.079 11.184 m 10 15 20 25 30 40 50 60 70 80 90 100 Statistical Analysis in the Frequency Domain c0.594 250 6.967 7.225 600 8.584 8.977 500 8.295 6.122 7. 1] the sample distribution function Fm−1 (x) and the band cα .707 9.344 11.313 9. .916 11.601 800 8.378 c0. For m > 30 i.1: Critical values cα of Fisher’s test for hidden periodicities. where c0. Therefore. .3.785 6. with the Kolmogorov–Smirnov statistic ˆ ∆m−1 := sup |Fm−1 (x) − x| x∈[0.1.1. Sm−1 behaves stochastically exactly like that of m−1 independent and uniformly on (0.334 10. The following rule is quite common.960 10.05 = 1.237 350 7.147 8.258 7.663 400 7.05 4. 1) distributed random variables. 1].220 11.612 9.733 9.372 9.123 9. x ∈ [0.882 1000 c0.750 900 8.480 10.1.567 6. .01 = 1.832 8.935 6.05 7. then we know from Corollary 5.955 300 7.428 700 8.e. n > 62.454 Table 5.748 8.842 c0. The Bartlett–Kolmogorov–Smirnov Test Denote again by Sj the cumulated periodogram as in Theorem 5.721 10.05 and α = 0.01.1] we can measure the maximum difference between the empirical distribution function and the theoretical one F (x) = x.164 10. .

Example 5.1.6. (Airline Data). We want to test, whether the variance stabilized, trend eliminated and seasonally adjusted Airline Data from Example 1.3.1 were generated from a white noise (ε_t)_{t∈Z}, where the ε_t are independent and identically normal distributed. The Fisher test statistic has the value κ_m = 6.573 and does, therefore, not reject the hypothesis at the levels α = 0.05 and α = 0.01, where m = 65. The Bartlett–Kolmogorov–Smirnov test, however, rejects this hypothesis at both levels, since ∆_64 = 0.2319 > 1.36/√64 = 0.17 and also ∆_64 > 1.63/√64 = 0.20375.

SPECTRA Procedure
----Test for White Noise for variable DLOGNUM----
Fisher's Kappa: M*MAX(P(*))/SUM(P(*))
Parameters: M = 65
MAX(P(*)) = 0.028
SUM(P(*)) = 0.275
Test Statistic: Kappa = 6.5730
Bartlett's Kolmogorov-Smirnov Statistic:
Maximum absolute difference of the standardized
partial sums of the periodogram and the CDF of a
uniform(0,1) random variable.
Test Statistic = 0.2319

Listing 5.1.1a: Fisher's κ and the Bartlett-Kolmogorov-Smirnov test with m = 65 for testing a white noise generation of the adjusted Airline Data.

/* airline_whitenoise.sas */
TITLE1 'Tests for white noise';
TITLE2 'for the trend und seasonal';
TITLE3 'adjusted Airline Data';

/* Read in the data and compute log-transformation as well as seasonal and trend adjusted data */
DATA data1;
   INFILE 'c:\data\airline.txt';

   INPUT num @@;
   dlognum=DIF12(DIF(LOG(num)));

/* Compute periodogram and test for white noise */
PROC SPECTRA DATA=data1 P WHITETEST OUT=data2;
   VAR dlognum;
RUN; QUIT;

Program 5.1.1: Testing for white noise.

In the DATA step the raw data of the airline passengers are read into the variable num. A log-transformation, building the first order difference for trend elimination and the 12th order difference for elimination of a seasonal component lead to the variable dlognum, which is supposed to be generated by a stationary process. Then PROC SPECTRA is applied to this variable, whereby the options P and OUT=data2 generate a data set containing the periodogram data. The option WHITETEST causes SAS to carry out the two tests for a white noise, Fisher's test and the Bartlett-Kolmogorov-Smirnov test. SAS only provides the values of the test statistics but no decision. One has to compare these values with the critical values from Table 5.1.1 and the approximative ones c_α/√(m − 1). The following figure visualizes the rejection at both levels by the Bartlett-Kolmogorov-Smirnov test.
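As a remark on Listing 5.1.1a: the reported κ can be reproduced from the displayed parameters, κ = M · MAX(P(*))/SUM(P(*)) = 65 · 0.028/0.275 ≈ 6.6, which agrees with the value 6.5730 once the rounding of the displayed maximum and sum is taken into account, and the approximative critical values used above are 1.36/√64 = 0.17 and 1.63/√64 ≈ 0.204.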

05/0. 1]. SET data2 ( FIRSTOBS =2) . sas ) */ /* Calculate the sum of the periodogram */ PROC MEANS DATA = data2 ( FIRSTOBS =2) NOPRINT .63/ SQRT ( _FREQ_ -1) .63/ SQRT ( _FREQ_ -1) . 13 14 15 16 17 18 19 20 21 . /* Compute empirical distribution function of cumulated periodogram → and its confidence bands */ DATA data4 .5. RETAIN s 0. VAR P_01 . yu_01 = fm +1.36/ SQRT ( _FREQ_ -1) .01. TITLE3 ’ Airline Data ’.1 Testing for a White Noise 187 Plot 5. sas */ TITLE1 ’ Visualisation of the test for white noise ’.2a: Bartlett-Kolmogorov-Smirnov test with m = 65 testing for a white noise generation of the adjusted Airline Data. x ∈ [0. /* Note that this program needs data2 generated by the previous →program ( airline_whitenoise . yu_05 = fm +1. s = s + P_01 / psum . yl_01 = fm -1. Solid ˆ line/broken line = confidence bands for Fm−1 (x). fm = _N_ /( _FREQ_ -1) . IF _N_ =1 THEN SET data3 .1. at levels α = 0. OUTPUT OUT = data3 SUM = psum . 1 2 3 4 5 6 7 8 9 10 11 12 /* a i r l i n e _ w h i t e n o i s e _ p l o t . TITLE2 ’ for the trend und seasonal adjusted ’.

The empirical distribution of the cumulated periodogram is represented as a step function due to the I=STEPJ option in the SYMBOL1 statement. The next DATA step combines every observation of data2 with this sum by means of the IF statement. PROC MEANS calculates the sum (keyword SUM) of the SAS periodogram variable P 0 and stores it in the variable psum of the data set data3. The NOPRINT option suppresses the printing of the output. In the preceding section we computed the distribution of the empirical counterpart of a spectral density.0 BY . The last part of this program contains SYMBOL and AXIS statements and PROC GPLOT to visualize the Bartlett-Kolmogorov-Smirnov statistic. which was created by PROC MEANS and contains the number m. AXIS2 LABEL = NONE . The variable fm contains the values of the empirical distribution function calculated by means of the automatically generated variable N containing the number of observation and the variable FREQ . AXIS1 LABEL =( ’x ’) ORDER =(. The values of the upper and lower band are stored in the y variables.sas).1) . 5. SYMBOL2 V = NONE I = JOIN C = RED L =2. Furthermore a variable s is initialized with the value 0 by the RETAIN statement and then the portion of each periodogram value from the sum is cumulated. PLOT fm * s =1 yu_01 * fm =2 yl_01 * fm =2 yu_05 * fm =3 yl_05 * fm =3 / OVERLAY → HAXIS = AXIS1 VAXIS = AXIS2 . where the first observation belonging to the frequency 0 is dropped.2 Estimating Spectral Densities We suppose in the following that (Yt )t∈Z is a stationary real valued process with mean µ and absolutely summable autocovariance function γ. 22 23 24 25 26 27 28 29 30 31 32 33 34 Program 5.188 Statistical Analysis in the Frequency Domain yl_05 = fm -1.5. /* Plot empirical distribution function of cumulated periodogram →with its confidence bands */ PROC GPLOT DATA = data4 .1. This program uses the data set data2 created by Program 5. According to Corollary 4. the process (Yt ) has the continuous spectral density f (λ) = h∈Z γ(h)e−i2πλh . QUIT .1 (airline whitenoise.2: Testing for white noise with confidence bands. SYMBOL3 V = NONE I = JOIN C = RED L =1. /* Graphical options */ SYMBOL1 V = NONE I = STEPJ C = GREEN .36/ SQRT ( _FREQ_ -1) .1. the periodogram. in the particular . RUN .0 TO 1.1.

5.2 Estimating Spectral Densities

We suppose in the following that (Y_t)_{t∈Z} is a stationary real valued process with mean µ and absolutely summable autocovariance function γ. The process (Y_t) then has (see Chapter 4) the continuous spectral density

    f(\lambda) = \sum_{h\in\mathbb Z} \gamma(h) e^{-i2\pi\lambda h}.

In the preceding section we computed the distribution of the empirical counterpart of a spectral density, the periodogram, in the particular case when (Y_t) is a Gaussian white noise. In this section we will investigate the limit behavior of the periodogram for arbitrary independent random variables (Y_t).

Asymptotic Properties of the Periodogram

In order to establish asymptotic properties of the periodogram, the following modification is quite useful. For Fourier frequencies k/n, 0 ≤ k ≤ [n/2], we put

    I_n(k/n) = \frac1n \Big| \sum_{t=1}^n Y_t e^{-i2\pi(k/n)t} \Big|^2
             = \frac1n \Big\{ \Big( \sum_{t=1}^n Y_t \cos(2\pi(k/n)t) \Big)^2
                            + \Big( \sum_{t=1}^n Y_t \sin(2\pi(k/n)t) \Big)^2 \Big\}.          (5.1)

Up to the case k = 0 this coincides with the definition of the periodogram as given in Chapter 3. From Theorem 3.2.3 we obtain the representation

    I_n(k/n) = \begin{cases} n\bar Y_n^2, & k = 0,\\[2pt]
               \sum_{|h|<n} c(h) e^{-i2\pi(k/n)h}, & k = 1,\dots,[n/2], \end{cases}          (5.2)

with \bar Y_n := n^{-1}\sum_{t=1}^n Y_t and the sample autocovariance function

    c(h) = \frac1n \sum_{t=1}^{n-|h|} (Y_t - \bar Y_n)(Y_{t+|h|} - \bar Y_n).

By representation (5.1) and the orthogonality equations of Chapter 3, the value I_n(k/n) does not change for k ≠ 0 if we replace the sample mean \bar Y_n in c(h) by the theoretical mean µ. This leads to the equivalent representation of the periodogram for k ≠ 0

    I_n(k/n) = \sum_{|h|<n} \frac1n \sum_{t=1}^{n-|h|} (Y_t-\mu)(Y_{t+|h|}-\mu)\, e^{-i2\pi(k/n)h}
             = \frac1n \sum_{t=1}^n (Y_t-\mu)^2
               + 2\sum_{h=1}^{n-1} \frac1n \sum_{t=1}^{n-|h|} (Y_t-\mu)(Y_{t+|h|}-\mu)\cos\!\Big(2\pi\frac kn h\Big).   (5.3)

We define now the periodogram for λ ∈ [0, 0.5] as a piecewise constant function

    I_n(\lambda) = I_n(k/n) \quad\text{if}\quad \frac kn - \frac1{2n} < \lambda \le \frac kn + \frac1{2n}.          (5.4)

The following result shows that the periodogram I_n(λ) is for λ ≠ 0 an asymptotically unbiased estimator of the spectral density f(λ).

Theorem 5.2.1. Let (Y_t)_{t∈Z} be a stationary process with absolutely summable autocovariance function γ. Then we have with µ = E(Y_t)

    E(I_n(0)) - n\mu^2 \to f(0),\qquad n\to\infty,
    E(I_n(\lambda)) \to f(\lambda),\qquad \lambda \ne 0,\ n\to\infty.

If µ = 0, then the convergence E(I_n(λ)) → f(λ) holds uniformly on [0, 0.5].

Proof. By representation (5.2) and the Cesàro convergence result we have

    E(I_n(0)) - n\mu^2 = \frac1n \sum_{t=1}^n\sum_{s=1}^n E(Y_tY_s) - n\mu^2
                       = \frac1n \sum_{t=1}^n\sum_{s=1}^n \mathrm{Cov}(Y_t,Y_s)
                       = \sum_{|h|<n}\Big(1-\frac{|h|}n\Big)\gamma(h)
                       \;\xrightarrow{n\to\infty}\; \sum_{h\in\mathbb Z}\gamma(h) = f(0).

Define now for λ ∈ (0, 0.5] the auxiliary function

    g_n(\lambda) := \frac kn \quad\text{if}\quad \frac kn - \frac1{2n} < \lambda \le \frac kn + \frac1{2n},\quad k\in\mathbb Z.          (5.5)

Then we obviously have

    I_n(\lambda) = I_n(g_n(\lambda)).          (5.6)

Choose now λ ∈ (0, 0.5]. Since g_n(λ) → λ as n → ∞, it follows that g_n(λ) > 0 for n large enough. By (5.3) and (5.6) we obtain for such n

    E(I_n(\lambda)) = \sum_{|h|<n} \frac1n \sum_{t=1}^{n-|h|} E\big((Y_t-\mu)(Y_{t+|h|}-\mu)\big) e^{-i2\pi g_n(\lambda)h}
                    = \sum_{|h|<n}\Big(1-\frac{|h|}n\Big)\gamma(h)\, e^{-i2\pi g_n(\lambda)h}.

Since \sum_{h\in\mathbb Z}|\gamma(h)| < ∞, the series \sum_{|h|<n}\gamma(h)e^{-i2\pi\lambda h} converges to f(λ) uniformly for 0 ≤ λ ≤ 0.5. Kronecker's Lemma (Exercise 5.8) implies moreover

    \Big| \sum_{|h|<n}\frac{|h|}n\,\gamma(h)e^{-i2\pi\lambda h} \Big|
    \le \frac1n\sum_{|h|<n}|h|\,|\gamma(h)| \;\xrightarrow{n\to\infty}\; 0,

and, thus, the series f_n(\lambda) := \sum_{|h|<n}\big(1-\frac{|h|}n\big)\gamma(h)e^{-i2\pi\lambda h} converges to f(λ) uniformly in λ as well. From g_n(λ) → λ and the continuity of f we obtain for λ ∈ (0, 0.5]

    |E(I_n(\lambda)) - f(\lambda)| = |f_n(g_n(\lambda)) - f(\lambda)|
    \le |f_n(g_n(\lambda)) - f(g_n(\lambda))| + |f(g_n(\lambda)) - f(\lambda)| \;\xrightarrow{n\to\infty}\; 0.

Note that |g_n(λ) - λ| ≤ 1/(2n). The uniform convergence in the case µ = 0 then follows from the uniform convergence of g_n(λ) to λ and the uniform continuity of f on the compact interval [0, 0.5].
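As a quick plausibility check of Theorem 5.2.1 at λ = 0, consider the MA(1)-process Y_t = ε_t - 0.6 ε_{t-1} with standard normal white noise, which is used again in Example 5.2.7 below. The following display is an added worked computation and not part of the original text; the autocovariances follow from the definition of the process.

    \gamma(0) = 1 + 0.36 = 1.36,\qquad \gamma(\pm1) = -0.6,\qquad \gamma(h)=0 \ \text{for } |h|\ge 2,
    f(0) = \sum_{h\in\mathbb Z}\gamma(h) = 1.36 - 2\cdot 0.6 = 0.16,

which agrees with the spectral density f(λ) = 1 - 1.2 cos(2πλ) + 0.36 of this process evaluated at λ = 0.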

In the following result we compute the asymptotic distribution of the periodogram for independent and identically distributed random variables with zero mean, which are not necessarily Gaussian ones. The Gaussian case was already established in Corollary 5.1.2.

Theorem 5.2.2. Let Z_1, ..., Z_n be independent and identically distributed random variables with mean E(Z_t) = 0 and variance E(Z_t^2) = σ^2 < ∞. Denote by I_n(λ) the pertaining periodogram as defined in (5.6).

(i) The random vector (I_n(λ_1), ..., I_n(λ_r)) with 0 < λ_1 < ... < λ_r < 0.5 converges in distribution for n → ∞ to the distribution of r independent and identically exponential distributed random variables with mean σ^2.

(ii) If E(Z_t^4) = ησ^4 < ∞, then we have for k = 0, ..., [n/2]

    \mathrm{Var}(I_n(k/n)) = \begin{cases} 2\sigma^4 + n^{-1}(\eta-3)\sigma^4, & k = 0 \text{ or } k = n/2,\ n \text{ even},\\
                              \sigma^4 + n^{-1}(\eta-3)\sigma^4 & \text{elsewhere}, \end{cases}          (5.7)

and

    \mathrm{Cov}(I_n(j/n), I_n(k/n)) = n^{-1}(\eta-3)\sigma^4,\qquad j \ne k.          (5.8)

For N(0, σ^2)-distributed random variables Z_t we have η = 3 (Exercise 5.9) and, thus, I_n(k/n) and I_n(j/n) are for j ≠ k uncorrelated. Actually, we established in Corollary 5.1.2 that they are even independent in this case.

Proof. Put for λ ∈ (0, 0.5)

    A_n(\lambda) := A_n(g_n(\lambda)) := \sqrt{2/n}\,\sum_{t=1}^n Z_t \cos(2\pi g_n(\lambda)t),
    B_n(\lambda) := B_n(g_n(\lambda)) := \sqrt{2/n}\,\sum_{t=1}^n Z_t \sin(2\pi g_n(\lambda)t),

with g_n defined in (5.5). Since I_n(λ) = (1/2)(A_n^2(λ) + B_n^2(λ)), it suffices, by repeating the arguments in the proof of Corollary 5.1.2, to show that

    (A_n(\lambda_1), B_n(\lambda_1), \dots, A_n(\lambda_r), B_n(\lambda_r)) \to_{\mathcal D} N(0, \sigma^2 I_{2r}),

where I_{2r} denotes the 2r x 2r-unity matrix and →_D convergence in distribution. Since g_n(λ) → λ, we have g_n(λ) > 0 for λ ∈ (0, 0.5) if n is large enough. The independence of the Z_t together with the definition of g_n and the orthogonality equations in Lemma 3.1.2 imply

    \mathrm{Var}(A_n(\lambda)) = \mathrm{Var}(A_n(g_n(\lambda))) = \sigma^2 \frac2n \sum_{t=1}^n \cos^2(2\pi g_n(\lambda)t) = \sigma^2.

For ε > 0 we have

    \frac1n \sum_{t=1}^n E\big( Z_t^2 \cos^2(2\pi g_n(\lambda)t)\, 1_{\{|Z_t\cos(2\pi g_n(\lambda)t)| > \varepsilon\sqrt{n\sigma^2}\}} \big)
    \le \frac1n \sum_{t=1}^n E\big( Z_t^2\, 1_{\{|Z_t| > \varepsilon\sqrt{n\sigma^2}\}} \big)
    = E\big( Z_1^2\, 1_{\{|Z_1| > \varepsilon\sqrt{n\sigma^2}\}} \big) \;\xrightarrow{n\to\infty}\; 0,

i.e., the triangular array (2/n)^{1/2} Z_t cos(2π g_n(λ)t), 1 ≤ t ≤ n, n ∈ N, satisfies the Lindeberg condition implying A_n(λ) →_D N(0, σ^2); see, for example, Theorem 7.2 in Billingsley (1968). Similarly one shows that B_n(λ) →_D N(0, σ^2) as well. Since the random vector (A_n(λ_1), B_n(λ_1), ..., A_n(λ_r), B_n(λ_r)) has by (3.2) the covariance matrix σ^2 I_{2r}, its asymptotic joint normality follows easily by applying the Cramér-Wold device, cf. Theorem 7.7 in Billingsley (1968), and proceeding as before. This proves part (i) of Theorem 5.2.2.

From the definition of the periodogram in (5.1) we conclude as in the proof of Theorem 3.2.3

    I_n(k/n) = \frac1n \sum_{s=1}^n \sum_{t=1}^n Z_s Z_t\, e^{-i2\pi(k/n)(s-t)},

and, thus, we obtain

    E(I_n(j/n)\, I_n(k/n)) = \frac1{n^2} \sum_{s=1}^n\sum_{t=1}^n\sum_{u=1}^n\sum_{v=1}^n
        E(Z_sZ_tZ_uZ_v)\, e^{-i2\pi(j/n)(s-t)} e^{-i2\pi(k/n)(u-v)}.

We have

    E(Z_sZ_tZ_uZ_v) = \begin{cases} \eta\sigma^4, & s=t=u=v,\\
        \sigma^4, & s=t\ne u=v,\ s=u\ne t=v,\ s=v\ne t=u,\\ 0 & \text{elsewhere}, \end{cases}

and

    e^{-i2\pi(j/n)(s-t)} e^{-i2\pi(k/n)(u-v)} = \begin{cases} 1, & s=t,\ u=v,\\
        e^{-i2\pi((j+k)/n)s}\, e^{i2\pi((j+k)/n)t}, & s=v,\ t=u,\\
        e^{-i2\pi((j-k)/n)s}\, e^{i2\pi((j-k)/n)t}, & s=u,\ t=v. \end{cases}

This implies

    E(I_n(j/n)\,I_n(k/n)) = \frac{\eta\sigma^4}n + \frac{\sigma^4}{n^2}\Big\{ n(n-1)
        + \Big|\sum_{t=1}^n e^{i2\pi((j+k)/n)t}\Big|^2 + \Big|\sum_{t=1}^n e^{i2\pi((j-k)/n)t}\Big|^2 - 2n \Big\}
      = \frac{(\eta-3)\sigma^4}n + \sigma^4\Big\{ 1 + \frac1{n^2}\Big|\sum_{t=1}^n e^{i2\pi((j+k)/n)t}\Big|^2
        + \frac1{n^2}\Big|\sum_{t=1}^n e^{i2\pi((j-k)/n)t}\Big|^2 \Big\}.

From E(I_n(k/n)) = n^{-1}\sum_{t=1}^n E(Z_t^2) = σ^2 we finally obtain

    \mathrm{Cov}(I_n(j/n), I_n(k/n)) = \frac{(\eta-3)\sigma^4}n
        + \frac{\sigma^4}{n^2}\Big\{ \Big|\sum_{t=1}^n e^{i2\pi((j+k)/n)t}\Big|^2 + \Big|\sum_{t=1}^n e^{i2\pi((j-k)/n)t}\Big|^2 \Big\},

from which (5.7) and (5.8) follow by using the orthogonality relations of Chapter 3.

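The exponential limit in Theorem 5.2.2 can be checked by simulation. The following lines are an added sketch, not one of the original programs: they generate Gaussian white noise with σ = 2, compute the periodogram with PROC SPECTRA and take over the adjustment p = P_01/2 used in Program 5.2.1 below. If that adjustment is adopted, the sample mean of the adjusted ordinates should be close to σ^2 = 4 and their sample variance close to σ^4 = 16. The data set names and the seed are made up for illustration.

    /* Added sketch: periodogram ordinates of a white noise are roughly
       exponential with mean sigma^2 (Theorem 5.2.2) */
    DATA noise;
       DO t=1 TO 1000;
          z = 2*RANNOR(4711);          /* N(0,4) white noise */
          OUTPUT;
       END;
    RUN;

    PROC SPECTRA DATA=noise P OUT=pgram;
       VAR z;
    RUN;

    DATA pgram2;
       SET pgram(FIRSTOBS=2);          /* drop the frequency-0 ordinate */
       i_n = P_01/2;                   /* adjustment as in Program 5.2.1 */
    RUN;

    PROC MEANS DATA=pgram2 MEAN VAR;   /* mean near 4, variance near 16 */
       VAR i_n;
    RUN;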
Remark 5.2.3. Theorem 5.2.2 can be generalized to filtered processes Y_t = \sum_{u∈Z} a_u Z_{t-u}, with (Z_t)_{t∈Z} as in Theorem 5.2.2. In this case one has to replace σ^2, which equals by Example 4.1.3 the constant spectral density f_Z(λ), in (i) by the spectral density f_Y(λ_i), 1 ≤ i ≤ r. If in addition \sum_{u∈Z} |a_u| |u|^{1/2} < ∞, then we have in (ii) the expansions

    \mathrm{Var}(I_n(k/n)) = \begin{cases} 2 f_Y^2(k/n) + O(n^{-1/2}), & k = 0 \text{ or } k = n/2,\ n \text{ even},\\
                              f_Y^2(k/n) + O(n^{-1/2}) & \text{elsewhere}, \end{cases}

and

    \mathrm{Cov}(I_n(j/n), I_n(k/n)) = O(n^{-1}),\qquad j \ne k,

where I_n is the periodogram pertaining to Y_1, ..., Y_n. The above terms O(n^{-1/2}) and O(n^{-1}) are uniformly bounded in k and j by a constant C. We omit the highly technical proof and refer to Section 10.3 of Brockwell and Davis (1991). Recall that the class of processes Y_t = \sum_{u∈Z} a_u Z_{t-u} is a fairly rich one, which contains in particular ARMA-processes, see Section 2.2 and Remark 2.1.12.

Discrete Spectral Average Estimator

The preceding results show that the periodogram is not a consistent estimator of the spectral density. The law of large numbers together with the above remark motivates, however, that consistency can be achieved for a smoothed version of the periodogram such as a simple moving average

    \frac1{2m+1} \sum_{|j|\le m} I_n\Big(\frac{k+j}n\Big),

which puts equal weights 1/(2m+1) on adjacent values. Dropping the condition of equal weights, we define a general linear smoother by the linear filter

    \hat f_n\Big(\frac kn\Big) := \sum_{|j|\le m} a_{jn}\, I_n\Big(\frac{k+j}n\Big).          (5.9)

The sequence m = m(n), defining the adjacent points of k/n, has to satisfy

    m \to \infty \quad\text{and}\quad m/n \to 0 \quad\text{as } n\to\infty,          (5.10)

and the weights a_{jn} have the properties

    (a) a_{jn} \ge 0,
    (b) a_{jn} = a_{-jn},
    (c) \sum_{|j|\le m} a_{jn} = 1,
    (d) \sum_{|j|\le m} a_{jn}^2 \to 0 \quad\text{as } n\to\infty.          (5.11)

For the simple moving average we have, for example,

    a_{jn} = \begin{cases} 1/(2m+1), & |j| \le m,\\ 0 & \text{elsewhere}, \end{cases}

and \sum_{|j|\le m} a_{jn}^2 = 1/(2m+1) \to 0 as n → ∞. For λ ∈ [0, 0.5] we put I_n(0.5 + λ) := I_n(0.5 - λ), which defines the periodogram also on [0.5, 1]. If (k+j)/n is outside of the interval [0, 1], then I_n((k+j)/n) is understood as the periodic extension of I_n with period 1. This also applies to the spectral density f. The estimator

    \hat f_n(\lambda) := \hat f_n(g_n(\lambda)),

with g_n as defined in (5.5), is called the discrete spectral average estimator.

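In SAS the discrete spectral average estimator is obtained from PROC SPECTRA by supplying the weights a_{jn} in a WEIGHTS statement. The fragment below is an added sketch of the simple moving average case with 2m+1 = 7 equal weights; it assumes a hypothetical data set series with a variable y and relies on the fact, noted with Program 5.2.1 below, that SAS normalizes the supplied weights so that they sum to one.

    /* Added sketch: simple moving average smoother of the periodogram */
    PROC SPECTRA DATA=series P S OUT=smoothed;
       VAR y;
       WEIGHTS 1 1 1 1 1 1 1;    /* normalized by SAS to 1/7 each */
    RUN;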
The following result states its consistency for linear processes.

Theorem 5.2.4. Let Y_t = \sum_{u∈Z} b_u Z_{t-u}, t ∈ Z, where the Z_t are iid with E(Z_t) = 0, E(Z_t^4) < ∞ and \sum_{u∈Z} |b_u| |u|^{1/2} < ∞. Then we have for 0 ≤ µ, λ ≤ 0.5

    (i) \lim_{n\to\infty} E\big(\hat f_n(\lambda)\big) = f(\lambda),

    (ii) \lim_{n\to\infty} \frac{\mathrm{Cov}\big(\hat f_n(\lambda),\hat f_n(\mu)\big)}{\sum_{|j|\le m} a_{jn}^2}
         = \begin{cases} 2f^2(\lambda), & \lambda = \mu = 0 \text{ or } 0.5,\\
                         f^2(\lambda), & 0 < \lambda = \mu < 0.5,\\
                         0, & \lambda \ne \mu. \end{cases}

Condition (d) in (5.11) on the weights together with (ii) in the preceding result entails that Var(\hat f_n(λ)) → 0 for any λ ∈ [0, 0.5]. Together with (i) we, therefore, obtain that the mean squared error of \hat f_n(λ) vanishes asymptotically:

    \mathrm{MSE}\big(\hat f_n(\lambda)\big) = E\Big\{\big(\hat f_n(\lambda) - f(\lambda)\big)^2\Big\}
        = \mathrm{Var}\big(\hat f_n(\lambda)\big) + \mathrm{Bias}^2\big(\hat f_n(\lambda)\big) \;\xrightarrow{n\to\infty}\; 0.

Proof. By the definition of the spectral density estimator in (5.9) we have

    \big| E\big(\hat f_n(\lambda)\big) - f(\lambda) \big|
    = \Big| \sum_{|j|\le m} a_{jn} \big\{ E\big(I_n(g_n(\lambda)+j/n)\big) - f(\lambda) \big\} \Big|
    = \Big| \sum_{|j|\le m} a_{jn} \big\{ E\big(I_n(g_n(\lambda)+j/n)\big) - f(g_n(\lambda)+j/n)
            + f(g_n(\lambda)+j/n) - f(\lambda) \big\} \Big|,

where (5.10) together with the uniform convergence of g_n(λ) to λ implies

    \max_{|j|\le m} \big| g_n(\lambda) + j/n - \lambda \big| \;\xrightarrow{n\to\infty}\; 0.

Choose ε > 0. The spectral density f of the process (Y_t) is continuous (Exercise 5.16), and hence we have

    \max_{|j|\le m} \big| f(g_n(\lambda)+j/n) - f(\lambda) \big| < \varepsilon/2 \quad\text{if } n \text{ is sufficiently large}.

From Theorem 5.2.1 we know that in the case E(Y_t) = 0

    \max_{|j|\le m} \big| E\big(I_n(g_n(\lambda)+j/n)\big) - f(g_n(\lambda)+j/n) \big| < \varepsilon/2 \quad\text{if } n \text{ is large}.

Condition (c) in (5.11) together with the triangular inequality implies |E(\hat f_n(λ)) - f(λ)| < ε for large n. This implies part (i).

From the definition of \hat f_n we obtain

    \mathrm{Cov}\big(\hat f_n(\lambda), \hat f_n(\mu)\big)
    = \sum_{|j|\le m}\sum_{|k|\le m} a_{jn} a_{kn}\, \mathrm{Cov}\big( I_n(g_n(\lambda)+j/n),\, I_n(g_n(\mu)+k/n) \big).

If λ ≠ µ and n is sufficiently large we have g_n(λ) + j/n ≠ g_n(µ) + k/n for arbitrary |j|, |k| ≤ m. According to Remark 5.2.3 there exists a universal constant C_1 > 0 such that

    \big| \mathrm{Cov}\big(\hat f_n(\lambda), \hat f_n(\mu)\big) \big|
    = \Big| \sum_{|j|\le m}\sum_{|k|\le m} a_{jn} a_{kn}\, O(n^{-1}) \Big|
    \le C_1 n^{-1} \Big( \sum_{|j|\le m} a_{jn} \Big)^2
    \le C_1 \frac{2m+1}n \sum_{|j|\le m} a_{jn}^2,

where the final inequality is an application of the Cauchy-Schwarz inequality. Since m/n → 0 as n → ∞, we have established (ii) in the case λ ≠ µ.

Suppose now 0 < λ = µ < 0.5. Utilizing again Remark 5.2.3 we have

    \mathrm{Var}\big(\hat f_n(\lambda)\big)
    = \sum_{|j|\le m} a_{jn}^2 \big\{ f^2(g_n(\lambda)+j/n) + O(n^{-1/2}) \big\}
      + \sum_{|j|\le m}\sum_{|k|\le m} a_{jn} a_{kn}\, O(n^{-1}) + o(n^{-1})
    =: S_1(\lambda) + S_2(\lambda) + o(n^{-1}).

Repeating the arguments in the proof of part (i) one shows that

    S_1(\lambda) = \sum_{|j|\le m} a_{jn}^2\, f^2(\lambda) + o\Big( \sum_{|j|\le m} a_{jn}^2 \Big).

Furthermore, with a suitable constant C_2 > 0, we have

    |S_2(\lambda)| \le C_2 \frac1n \Big( \sum_{|j|\le m} a_{jn} \Big)^2 \le C_2 \frac{2m+1}n \sum_{|j|\le m} a_{jn}^2.

Thus we established the assertion of part (ii) also in the case 0 < λ = µ < 0.5. The remaining cases λ = µ = 0 and λ = µ = 0.5 are shown in a similar way (Exercise 5.17).

The preceding result requires zero mean variables Y_t. This might, however, be too restrictive in practice. The periodograms of (Y_t)_{1≤t≤n}, (Y_t - µ)_{1≤t≤n} and (Y_t - \bar Y)_{1≤t≤n} coincide at Fourier frequencies different from zero; at frequency λ = 0, however, they will differ in general. To estimate f(0) consistently also in the case µ = E(Y_t) ≠ 0, one puts

    \hat f_n(0) := a_{0n} I_n(1/n) + 2 \sum_{j=1}^m a_{jn}\, I_n\big((1+j)/n\big).          (5.12)

Each time the value I_n(0) occurs in the moving average (5.9), it is replaced by \hat f_n(0). Since the resulting estimator of the spectral density involves only Fourier frequencies different from zero, we can assume without loss of generality that the underlying variables Y_t have zero mean.

Example 5.2.5. (Sunspot Data) We want to estimate the spectral density underlying the Sunspot Data. These data, also known as the Wolf or Wölfer (a student of Wolf) Data, are the yearly sunspot numbers between 1749 and 1924, n = 176. For a discussion of these data and further literature we refer to Wei (1990). Plot 5.2.1a shows the pertaining periodogram and Plot 5.2.1b displays the discrete spectral average estimator with weights a_{0n} = a_{1n} = a_{2n} = 3/21, a_{3n} = 2/21 and a_{4n} = 1/21. These weights pertain to a simple moving average of length 3 of a simple moving average of length 7. The smoothed version joins the two peaks close to the frequency λ = 0.1 visible in the periodogram. The observation that a periodogram has the tendency to split a peak is known as the leakage phenomenon.
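The weights in Example 5.2.5 can be obtained by convolving the two uniform weight sequences; the following display is an added verification of this construction.

    \Big(\tfrac13,\tfrac13,\tfrac13\Big) * \Big(\underbrace{\tfrac17,\dots,\tfrac17}_{7}\Big)
    = \tfrac1{21}\,(1,\,2,\,3,\,3,\,3,\,3,\,3,\,2,\,1),

so that a_{0n} = a_{1n} = a_{2n} = 3/21, a_{3n} = 2/21 and a_{4n} = 1/21, in accordance with the weights used in Program 5.2.1 below.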

Plot 5.2.1a: Periodogram for Sunspot Data.

Plot 5.2.1b: Discrete spectral average estimate for Sunspot Data.

    /* sunspot_dsae.sas */
    TITLE1 'Periodogram and spectral density estimate';
    TITLE2 'Woelfer Sunspot Data';

    /* Read in the data */
    DATA data1;
       INFILE 'c:\data\sunspot.txt';
       INPUT num @@;

    /* Computation of periodogram and estimation of spectral density */
    PROC SPECTRA DATA=data1 P S OUT=data2;
       VAR num;
       WEIGHTS 1 2 3 3 3 3 3 2 1;
    RUN;

    /* Adjusting different periodogram definitions */
    DATA data3;
       SET data2(FIRSTOBS=2);
       lambda = FREQ/(2*CONSTANT('PI'));
       p = P_01/2;
       s = S_01/2*4*CONSTANT('PI');

    /* Graphical options */
    SYMBOL1 I=JOIN C=RED V=NONE L=1;
    AXIS1 LABEL=(F=CGREEK 'l') ORDER=(0 TO .5 BY .05);
    AXIS2 LABEL=NONE;

    /* Plot periodogram and estimated spectral density */
    PROC GPLOT DATA=data3;
       PLOT p*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
       PLOT s*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
    RUN; QUIT;

Program 5.2.1: Periodogram and discrete spectral average estimate for Sunspot Data.

In the DATA step the data of the sunspots are read into the variable num. Then PROC SPECTRA is applied to this variable, whereby the options P (see Program 5.1.1, airline_whitenoise.sas) and S generate a data set stored in data2 containing the periodogram data and the estimate of the spectral density, which SAS computes with the weights given in the WEIGHTS statement. Note that SAS automatically normalizes these weights. In the following DATA step the slightly different definition of the periodogram used by SAS is adjusted to the definition used here (see Program 3.2.1, star_periodogram.sas). Both plots are then printed with PROC GPLOT.

A mathematically convenient way to generate weights a_{jn}, which satisfy the conditions (5.11), is the use of a kernel function. Let K : [-1, 1] → [0, ∞) be a symmetric function, i.e., K(-x) = K(x), x ∈ [-1, 1], which satisfies \int_{-1}^1 K^2(x)\,dx < ∞. Let now m = m(n) → ∞ be an arbitrary sequence of integers with m/n → 0 and put

    a_{jn} := \frac{K(j/m)}{\sum_{i=-m}^m K(i/m)},\qquad -m \le j \le m.          (5.13)

These weights satisfy the conditions (5.11) (Exercise 5.18).

Take for example K(x) := 1 - |x|, -1 ≤ x ≤ 1. Then we obtain

    a_{jn} = \frac{m - |j|}{m^2},\qquad -m \le j \le m.

Example 5.2.6. (i) The truncated kernel is defined by

    K_T(x) = \begin{cases} 1, & |x| \le 1,\\ 0 & \text{elsewhere}. \end{cases}

(ii) The Bartlett or triangular kernel is given by

    K_B(x) := \begin{cases} 1 - |x|, & |x| \le 1,\\ 0 & \text{elsewhere}. \end{cases}

(iii) The Blackman-Tukey kernel (1959) is defined by

    K_{BT}(x) = \begin{cases} 1 - 2a + 2a\cos(x), & |x| \le 1,\\ 0 & \text{elsewhere}, \end{cases}

where 0 < a ≤ 1/4. The particular choice a = 1/4 yields the Tukey-Hanning kernel.

(iv) The Parzen kernel (1957) is given by

    K_P(x) := \begin{cases} 1 - 6|x|^2 + 6|x|^3, & |x| < 1/2,\\ 2(1-|x|)^3, & 1/2 \le |x| \le 1,\\ 0 & \text{elsewhere}. \end{cases}

We refer to Andrews (1991) for a discussion of these kernels.

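For a kernel that is given analytically, the weights (5.13) can be computed in a few lines. The following PROC IML step is an added sketch for the Bartlett kernel with m = 10; it is not part of the original programs, and the closed form (m - |j|)/m^2 derived above serves as a check.

    /* Added sketch: kernel weights a_jn from (5.13) for the Bartlett kernel */
    PROC IML;
       m = 10;
       j = (-m):m;                      /* j = -m, ..., m                 */
       K = 1 - abs(j/m);                /* Bartlett kernel K(x) = 1 - |x| */
       a = K/sum(K);                    /* normalization as in (5.13)     */
       check = (m - abs(j))/(m##2);     /* closed form (m - |j|)/m^2      */
       PRINT (max(abs(a - check)))[LABEL='max difference'];
    QUIT;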
Example 5.2.7. We consider realizations of the MA(1)-process Y_t = ε_t - 0.6 ε_{t-1} with ε_t independent and standard normal for t = 1, ..., n, n = 160. By the results of Chapter 4, the process (Y_t) has the spectral density f(λ) = 1 - 1.2 cos(2πλ) + 0.36. We estimate f(λ) by means of the Tukey-Hanning kernel.

Plot 5.2.2a: Discrete spectral average estimator (broken line) with Blackman-Tukey kernel with parameters r = 10, a = 1/4 and underlying spectral density f(λ) = 1 - 1.2 cos(2πλ) + 0.36 (solid line).

    /* ma1_blackman_tukey.sas */
    TITLE1 'Spectral density and Blackman-Tukey estimator';
    TITLE2 'of MA(1)-process';

    /* Generate MA(1)-process */
    DATA data1;
       DO t=0 TO 160;
          e = RANNOR(1);
          y = e - .6*LAG(e);
          OUTPUT;
       END;

    /* Estimation of spectral density */
    PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
       VAR y;
       WEIGHTS TUKEY 10 0;
    RUN;

    /* Adjusting different definitions */
    DATA data3;
       SET data2;
       lambda = FREQ/(2*CONSTANT('PI'));
       s = S_01/2*4*CONSTANT('PI');

    /* Compute underlying spectral density */
    DATA data4;
       DO l=0 TO .5 BY .01;
          f = 1 - 1.2*COS(2*CONSTANT('PI')*l) + .36;
          OUTPUT;
       END;

    /* Merge the data sets */
    DATA data5;
       MERGE data3(KEEP=s lambda) data4;

    /* Graphical options */
    AXIS1 LABEL=NONE;
    AXIS2 LABEL=(F=CGREEK 'l') ORDER=(0 TO .5 BY .1);
    SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
    SYMBOL2 I=JOIN C=RED V=NONE L=3;

    /* Plot underlying and estimated spectral density */
    PROC GPLOT DATA=data5;
       PLOT f*l=1 s*lambda=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
    RUN; QUIT;

Program 5.2.2: Computing discrete spectral average estimator with Blackman-Tukey kernel.

In the first DATA step the realizations of an MA(1)-process with the given parameters are created. Thereby the function RANNOR, which generates standard normally distributed data, and LAG, which accesses the value of e of the preceding loop, are used. As in Program 5.2.1 (sunspot_dsae.sas), PROC SPECTRA computes the estimator of the spectral density (after dropping the first observation) by the option S and stores it in data2. The weights used here come from the Tukey-Hanning kernel with a specified bandwidth of m = 10. The second number after the TUKEY option can be used to refine the choice of the bandwidth; since this is not needed here it is set to 0. The next DATA step adjusts the different definitions of the spectral density used here and by SAS (see Program 3.2.1, star_periodogram.sas). The following DATA step generates the values of the underlying spectral density. These are merged with the values of the estimated spectral density and then displayed by PROC GPLOT.

Confidence Intervals for Spectral Densities

The random variables I_n((k+j)/n)/f((k+j)/n), 0 < k+j < n/2, will by Remark 5.2.3 for large n approximately behave like independent and standard exponential distributed random variables X_j. This suggests that the distribution of the discrete spectral average estimator

    \hat f(k/n) = \sum_{|j|\le m} a_{jn}\, I_n\Big(\frac{k+j}n\Big)

can be approximated by that of the weighted sum \sum_{|j|\le m} a_{jn} X_j f((k+j)/n). Tukey (1949) showed that the distribution of this weighted sum can in turn be approximated by that of cY with a suitably chosen c > 0, where Y follows a gamma distribution with parameters p := ν/2 > 0 and b = 1/2, i.e.,

    P\{Y \le t\} = \frac{b^p}{\Gamma(p)} \int_0^t x^{p-1} \exp(-bx)\,dx,\qquad t \ge 0,

where \Gamma(p) := \int_0^\infty x^{p-1}\exp(-x)\,dx denotes the gamma function. The parameters ν and c are determined by the method of moments as follows: ν and c are chosen such that cY has mean f(k/n) and its variance equals the leading term of the variance expansion of \hat f(k/n) in Theorem 5.2.4 (Exercise 5.21):

    E(cY) = c\nu = f(k/n),
    \mathrm{Var}(cY) = 2c^2\nu = f^2(k/n) \sum_{|j|\le m} a_{jn}^2.

The solutions are obviously

    c = \frac{f(k/n)}2 \sum_{|j|\le m} a_{jn}^2 \qquad\text{and}\qquad \nu = \frac2{\sum_{|j|\le m} a_{jn}^2}.

Note that the gamma distribution with parameters p = ν/2 and b = 1/2 equals the χ^2-distribution with ν degrees of freedom if ν is an integer. Observe, moreover, that ν/f(k/n) = 1/c; the random variable ν\hat f(k/n)/f(k/n) = \hat f(k/n)/c now approximately follows a χ^2(ν)-distribution, with the convention that χ^2(ν) is the gamma distribution with parameters p = ν/2 and b = 1/2 if ν is not an integer. The number ν is, therefore, called the equivalent degree of freedom. By χ^2_q(ν) we denote the q-quantile of the χ^2(ν)-distribution, i.e., P{Y ≤ χ^2_q(ν)} = q, 0 < q < 1. The interval

    \Big( \frac{\nu \hat f(k/n)}{\chi^2_{1-\alpha/2}(\nu)},\ \frac{\nu \hat f(k/n)}{\chi^2_{\alpha/2}(\nu)} \Big)          (5.14)

is a confidence interval for f(k/n) of approximate level 1 - α, α ∈ (0, 1). Taking logarithms in (5.14), we obtain the confidence interval for log(f(k/n))

    C_{\nu,\alpha}(k/n) := \Big( \log(\hat f(k/n)) + \log(\nu) - \log(\chi^2_{1-\alpha/2}(\nu)),\
                                 \log(\hat f(k/n)) + \log(\nu) - \log(\chi^2_{\alpha/2}(\nu)) \Big).

This interval has constant length log(χ^2_{1-α/2}(ν)/χ^2_{α/2}(ν)). Note that C_{ν,α}(k/n) is a level (1-α)-confidence interval only for log(f(λ)) at a fixed Fourier frequency λ = k/n, with 0 < k < [n/2], but not simultaneously for λ ∈ (0, 0.5).

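For a concrete set of weights, the equivalent degree of freedom and the interval (5.14) are easily evaluated with the χ^2-quantile function CINV. The following DATA step is an added sketch for equal weights with m = 10 and an assumed estimate of 1.5 at the frequency of interest; both numbers are hypothetical and only serve as an illustration.

    /* Added sketch: equivalent degree of freedom and interval (5.14) */
    DATA ci;
       m = 10;
       sum_a2 = 1/(2*m+1);               /* sum of a_jn^2 for equal weights   */
       nu = 2/sum_a2;                    /* equivalent degree of freedom = 42 */
       alpha = 0.05;
       fhat = 1.5;                       /* hypothetical estimate at k/n      */
       lower = nu*fhat/CINV(1-alpha/2, nu);
       upper = nu*fhat/CINV(alpha/2, nu);
       PUT nu= lower= upper=;
    RUN;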
Example 5.2.8. In continuation of Example 5.2.7 we want to estimate the spectral density f(λ) = 1 - 1.2 cos(2πλ) + 0.36 of the MA(1)-process Y_t = ε_t - 0.6 ε_{t-1}, using the discrete spectral average estimator \hat f_n(λ) with the weights 1, 3, 6, 9, 12, 15, 18, 20, 21, 21, 21, 20, 18, 15, 12, 9, 6, 3, 1, each divided by 231. These weights are generated by iterating simple moving averages of lengths 3, 7 and 11. Plot 5.2.3a displays the logarithms of the estimates, of the true spectral density and the pertaining confidence intervals.

Plot 5.2.3a: Logarithms of discrete spectral average estimates (broken line), of the spectral density f(λ) = 1 - 1.2 cos(2πλ) + 0.36 (solid line) of the MA(1)-process Y_t = ε_t - 0.6 ε_{t-1}, t = 1, ..., n = 160, and confidence intervals of level 1 - α = 0.95 for log(f(k/n)).

    /* ma1_logdsae.sas */
    TITLE1 'Logarithms of spectral density,';
    TITLE2 'of their estimates and confidence intervals';
    TITLE3 'of MA(1)-process';

    /* Generate MA(1)-process */
    DATA data1;
       DO t=0 TO 160;
          e = RANNOR(1);
          y = e - .6*LAG(e);
          OUTPUT;
       END;

    /* Estimation of spectral density */
    PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
       VAR y;
       WEIGHTS 1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1;
    RUN;

    /* Adjusting different definitions and computation of confidence bands */
    DATA data3;
       SET data2;
       lambda = FREQ/(2*CONSTANT('PI'));
       log_s_01 = LOG(S_01/2*4*CONSTANT('PI'));
       nu = 2/(3763/53361);
       c1 = log_s_01 + LOG(nu) - LOG(CINV(.975, nu));
       c2 = log_s_01 + LOG(nu) - LOG(CINV(.025, nu));

    /* Compute underlying spectral density */
    DATA data4;
       DO l=0 TO .5 BY 0.01;
          log_f = LOG((1 - 1.2*COS(2*CONSTANT('PI')*l) + .36));
          OUTPUT;
       END;

    /* Merge the data sets */
    DATA data5;
       MERGE data3(KEEP=log_s_01 lambda c1 c2) data4;

    /* Graphical options */
    AXIS1 LABEL=NONE;
    AXIS2 LABEL=(F=CGREEK 'l') ORDER=(0 TO .5 BY .1);
    SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
    SYMBOL2 I=JOIN C=RED V=NONE L=2;
    SYMBOL3 I=JOIN C=GREEN V=NONE L=33;

    /* Plot underlying and estimated spectral density */
    PROC GPLOT DATA=data5;
       PLOT log_f*l=1 log_s_01*lambda=2 c1*lambda=3 c2*lambda=3
            / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
    RUN; QUIT;

Program 5.2.3: Computation of logarithms of discrete spectral average estimates, of spectral density and confidence intervals.

This program starts identically to Program 5.2.2 (ma1_blackman_tukey.sas) with the generation of an MA(1)-process and the computation of the spectral density estimator. Only this time the weights are directly given to SAS. In the next DATA step the usual adjustment of the frequencies is done. This is followed by the computation of ν according to its definition. The logarithm of the confidence intervals is calculated with the help of the function CINV, which returns quantiles of a χ^2-distribution with ν degrees of freedom. The rest of the program, which displays the logarithm of the estimated spectral density, of the underlying density and of the confidence intervals, is analogous to Program 5.2.2 (ma1_blackman_tukey.sas).

Exercises

5.1. For independent random variables X, Y having continuous distribution functions it follows that P{X = Y} = 0. Hint: Fubini's theorem.

5.2. Let X_1, ..., X_n be iid random variables with values in R and distribution function F. Denote by X_{1:n} ≤ ... ≤ X_{n:n} the pertaining order statistics. Then we have

    P\{X_{k:n} \le t\} = \sum_{j=k}^n \binom{n}{j} F(t)^j (1 - F(t))^{n-j},\qquad t \in \mathbb R.

The maximum X_{n:n} has in particular the distribution function F^n, and the minimum X_{1:n} has distribution function 1 - (1 - F)^n. Hint: {X_{k:n} ≤ t} = { \sum_{j=1}^n 1_{(-\infty,t]}(X_j) ≥ k }.

5.3. Suppose in addition to the conditions in Exercise 5.2 that F has a (Lebesgue) density f. The ordered vector (X_{1:n}, ..., X_{n:n}) then has a density

    f_n(x_1, \dots, x_n) = n! \prod_{j=1}^n f(x_j),\qquad x_1 < \dots < x_n,

and zero elsewhere. Hint: Denote by S_n the group of permutations of {1, ..., n}, i.e., (τ(1), ..., τ(n)) with τ ∈ S_n is a permutation of (1, ..., n). Put for τ ∈ S_n the set B_τ := {X_{τ(1)} < ... < X_{τ(n)}}. These sets are disjoint and we have P(∪_{τ∈S_n} B_τ) = 1, since P{X_i = X_j} = 0 for i ≠ j (cf. Exercise 5.1).

5.4. (i) Let X and Y be independent, standard normal distributed random variables. Show that the vector (X, Z)^T := (X, ρX + \sqrt{1-ρ^2}\,Y)^T, -1 < ρ < 1, is normal distributed with mean vector (0, 0) and covariance matrix \begin{pmatrix} 1 & ρ\\ ρ & 1 \end{pmatrix}, and that X and Z are independent if and only if they are uncorrelated (i.e., ρ = 0).

(ii) Suppose that X and Y are normal distributed and uncorrelated. Does this imply the independence of X and Y? Hint: Let X be N(0, 1)-distributed and define the random variable Y = VX with V independent of X and P{V = -1} = 1/2 = P{V = 1}.

5.5. Generate 100 independent and standard normal random variables ε_t and plot the periodogram. Is the hypothesis that the observations were generated by a white noise rejected at level α = 0.05 (0.01)? Visualize the Bartlett-Kolmogorov-Smirnov test by plotting the empirical distribution function of the cumulated periodograms S_j, 1 ≤ j ≤ 48, together with the pertaining bands for α = 0.01 and α = 0.05,

    y = x \pm \frac{c_\alpha}{\sqrt{m-1}},\qquad x \in (0, 1).

5.6. Generate the values

    Y_t = \cos\Big(2\pi \frac16 t\Big) + \varepsilon_t,\qquad t = 1, \dots, 300,

where ε_t are independent and standard normal. Plot the data and the periodogram. Is the hypothesis Y_t = ε_t rejected at level α = 0.01?

5.7. (Share Data) Test the hypothesis that the share data were generated by independent and identically normal distributed random variables and plot the periodogram. Plot also the original data.

5.8. (Kronecker's lemma) Let (a_j)_{j≥0} be an absolutely summable complex valued filter. Show that lim_{n→∞} \sum_{j=0}^n (j/n)|a_j| = 0.

5.9. The normal distribution N(0, σ^2) satisfies

    (i) \int x^{2k+1}\,dN(0,\sigma^2)(x) = 0,\qquad k \in \mathbb N \cup \{0\},
    (ii) \int x^{2k}\,dN(0,\sigma^2)(x) = 1\cdot3\cdots(2k-1)\,\sigma^{2k},\qquad k \in \mathbb N,
    (iii) \int |x|^{2k+1}\,dN(0,\sigma^2)(x) = \frac{2^{k+1}}{\sqrt{2\pi}}\,k!\,\sigma^{2k+1},\qquad k \in \mathbb N \cup \{0\}.

5.10. Show that a χ^2(ν)-distributed random variable Y satisfies E(Y) = ν and Var(Y) = 2ν. Hint: Exercise 5.9.

5.11. (Slutzky's lemma) Let X, X_n, n ∈ N, be random variables in R with distribution functions F_X and F_{X_n}, respectively. Suppose that X_n converges in distribution to X (denoted by X_n →_D X), i.e., F_{X_n}(t) → F_X(t) for every continuity point of F_X as n → ∞. Let Y_n, n ∈ N, be another sequence of random variables which converges stochastically to some constant c ∈ R, i.e., lim_{n→∞} P{|Y_n - c| > ε} = 0 for arbitrary ε > 0. This implies

    (i) X_n + Y_n →_D X + c,
    (ii) X_n Y_n →_D cX,
    (iii) X_n/Y_n →_D X/c, if c ≠ 0.

This entails in particular that stochastic convergence implies convergence in distribution. The reverse implication is not true in general. Give an example.

5.12. Show that the distribution function F_m of Fisher's test statistic κ_m satisfies under the condition of independent and identically normal observations ε_t

    F_m(x + \ln(m)) = P\{\kappa_m \le x + \ln(m)\} \;\xrightarrow{m\to\infty}\; \exp(-e^{-x}) =: G(x),\qquad x \in \mathbb R.

The limiting distribution G is known as the Gumbel distribution. Hence we have P{κ_m > x} = 1 - F_m(x) ≈ 1 - exp(-m e^{-x}). Hint: Exercises 5.2 and 5.11.

5.13. Which effect has an outlier on the periodogram? Check this for the simple model (Y_t)_{1≤t≤n}, t_0 ∈ {1, ..., n},

    Y_t = \begin{cases} \varepsilon_t, & t \ne t_0,\\ \varepsilon_t + c, & t = t_0, \end{cases}

where the ε_t are independent and identically normal N(0, σ^2)-distributed and c ≠ 0 is an arbitrary constant. Show to this end

    E(I_Y(k/n)) = E(I_\varepsilon(k/n)) + c^2/n,
    \mathrm{Var}(I_Y(k/n)) = \mathrm{Var}(I_\varepsilon(k/n)) + 2c^2\sigma^2/n.

5.14. Suppose that U_1, ..., U_n are uniformly distributed on (0, 1) and let \hat F_n denote the pertaining empirical distribution function. Show that

    \sup_{0\le x\le 1} |\hat F_n(x) - x| = \max_{1\le k\le n} \max\Big\{ U_{k:n} - \frac{k-1}n,\ \frac kn - U_{k:n} \Big\}.

5.15. (Monte Carlo Simulation) For m large we have under the hypothesis

    P\{\sqrt{m-1}\,\Delta_{m-1} > c_\alpha\} \approx \alpha.

For different values of m (> 30) generate 1000 times the test statistic \sqrt{m-1}\,\Delta_{m-1} based on independent random variables and check how often this statistic exceeds the critical values c_{0.05} = 1.36 and c_{0.01} = 1.63. Hint: Exercise 5.14.

5.16. In the situation of Theorem 5.2.4 show that the spectral density f of (Y_t)_t is continuous.

5.17. Complete the proof of Theorem 5.2.4 (ii) for the remaining cases λ = µ = 0 and λ = µ = 0.5.

5.18. Verify that the weights (5.13) defined via a kernel function satisfy the conditions (5.11).

5.19. Use the IML function ARMASIM to simulate the process

    Y_t = 0.3\,Y_{t-1} + \varepsilon_t - 0.5\,\varepsilon_{t-1},\qquad 1 \le t \le 150,

where ε_t are independent and standard normal. Plot the periodogram and estimates of the log spectral density together with confidence intervals. Compare the estimates with the log spectral density of (Y_t)_{t∈Z}.

5.20. Compute the distribution of the periodogram I_ε(1/2) for independent and identically normal N(0, σ^2)-distributed random variables ε_1, ..., ε_n in case of an even sample size n.

5.21. Suppose that Y follows a gamma distribution with parameters p and b. Calculate the mean and the variance of Y.

5.22. Compute the length of the confidence interval C_{ν,α}(k/n) for fixed α (preferably α = 0.05) but for various ν. For the calculation of ν use the weights generated by the kernel K(x) = 1 - |x|, -1 ≤ x ≤ 1 (see equation (5.13)).

5.23. Show that for y ∈ R

    \frac1n \Big| \sum_{t=0}^{n-1} e^{i2\pi yt} \Big|^2 = \sum_{|t|<n} \Big(1 - \frac{|t|}n\Big) e^{i2\pi yt} = K_n(y),

where

    K_n(y) = \begin{cases} n, & y \in \mathbb Z,\\[2pt] \dfrac1n \Big( \dfrac{\sin(\pi yn)}{\sin(\pi y)} \Big)^2, & y \notin \mathbb Z, \end{cases}

is the Fejer kernel of order n. Verify that it has the properties

    (i) K_n(y) ≥ 0,
    (ii) the Fejer kernel is a periodic function of period length one,
    (iii) K_n(y) = K_n(-y),
    (iv) \int_{-0.5}^{0.5} K_n(y)\,dy = 1,
    (v) \int_{-\delta}^{\delta} K_n(y)\,dy \to 1 as n → ∞, for δ > 0.

5.24. (Nile Data) Between 715 and 1284 the lowest annual minimum levels of the river Nile were recorded; these data are among the longest time series in hydrology. Can the trend removed Nile Data be considered as being generated by a white noise, or are there hidden periodicities? Estimate the spectral density in this case. Use discrete spectral estimators as well as lag window spectral density estimators. Compare with the spectral density of an AR(1)-process.

Bibliography

[1] Andrews, D.W.K. (1991). Heteroscedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
[2] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley, New York.
[3] Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. John Wiley, New York.
[4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. J. Econometrics 31, 307-327.
[5] Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations (with discussion). J.R. Stat. Soc. B 26, 211-252.
[6] Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. Rev. ed. Holden-Day, San Francisco.
[7] Box, G.E.P. and Pierce, D.A. (1970). Distribution of residual autocorrelation in autoregressive-integrated moving average time series models. J. Amer. Statist. Assoc. 65, 1509-1526.
[8] Box, G.E.P. and Tiao, G.C. (1977). A canonical analysis of multiple time series. Biometrika 64, 355-365.
[9] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods. 2nd ed. Springer Series in Statistics, Springer, New York.
[10] Brockwell, P.J. and Davis, R.A. (2002). Introduction to Time Series and Forecasting. 2nd ed. Springer Texts in Statistics, Springer, New York.
[11] Cleveland, W.P. and Tiao, G.C. (1976). Decomposition of seasonal time series: a model for the census X-11 program. J. Amer. Statist. Assoc. 71, 581-587.
[12] Conway, J.B. (1975). Functions of One Complex Variable. 2nd printing. Springer, New York.
[13] Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica 50, 987-1007.
[14] Engle, R.F. and Granger, C.W.J. (1987). Co-integration and error correction: representation, estimation and testing. Econometrica 55, 251-276.
[15] Falk, M., Marohn, F. and Tewes, B. (2002). Foundations of Statistical Analyses and Applications with SAS. Birkhäuser, Basel.
[16] Fan, J. and Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. Theory and Methodologies. Chapman and Hall, New York.
[17] Feller, W. (1971). An Introduction to Probability Theory and Its Applications. Vol. II. 2nd ed. John Wiley, New York.
[18] Fuller, W.A. (1976). Introduction to Statistical Time Series. John Wiley, New York.
[19] Granger, C.W.J. (1981). Some properties of time series data and their use in econometric model specification. J. Econometrics 16, 121-130.
[20] Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, Princeton.
[21] Hannan, E.J. and Quinn, B.G. (1979). The determination of the order of an autoregression. J.R. Statist. Soc. B 41, 190-195.
[22] Janacek, G. and Swift, L. (1993). Time Series. Forecasting, Simulation, Applications. Ellis Horwood, New York.
[23] Kalman, R.E. (1960). A new approach to linear filtering. Trans. ASME J. of Basic Engineering 82, 35-45.
[24] Kendall, M.G. and Ord, J.K. (1993). Time Series. 3rd ed. Arnold, Sevenoaks.
[25] Ljung, G.M. and Box, G.E.P. (1978). On a measure of lack of fit in time series models. Biometrika 65, 297-303.
[26] Murray, M.P. (1994). A drunk and her dog: an illustration of cointegration and error correction. The American Statistician 48, 37-39.
[27] Newton, H.J. (1988). Timeslab: A Time Series Analysis Laboratory. Brooks/Cole, Pacific Grove.
[28] Parzen, E. (1957). On consistent estimates of the spectrum of the stationary time series. Ann. Math. Statist. 28, 329-348.
[29] Phillips, P.C.B. and Ouliaris, S. (1990). Asymptotic properties of residual based tests for cointegration. Econometrica 58, 165-193.
[30] Phillips, P.C.B. and Perron, P. (1988). Testing for a unit root in time series regression. Biometrika 75, 335-346.
[31] Quenouille, M.H. (1957). The Analysis of Multiple Time Series. Griffin, London.
[32] Reiss, R.D. (1989). Approximate Distributions of Order Statistics. With Applications to Nonparametric Statistics. Springer Series in Statistics, Springer, New York.
[33] Rudin, W. (1974). Real and Complex Analysis. 2nd ed. McGraw-Hill, New York.
[34] SAS Institute Inc. (1992). SAS Procedures Guide, Version 6. Third Edition. SAS Institute Inc., Cary.
[35] Shiskin, J. and Eisenpress, H. (1957). Seasonal adjustment by electronic computer methods. J. Amer. Statist. Assoc. 52, 415-449.
[36] Shiskin, J., Young, A.H. and Musgrave, J.C. (1967). The X-11 variant of census method II seasonal adjustment program. Technical paper no. 15, Bureau of the Census, U.S. Dept. of Commerce, Washington.
[37] Simonoff, J.S. (1996). Smoothing Methods in Statistics. Springer Series in Statistics, Springer, New York.
[38] Tintner, G. (1958). Eine neue Methode für die Schätzung der logistischen Funktion. Metrika 1, 154-157.
[39] Tukey, J.W. (1949). The sampling theory of power spectrum estimates. Proc. Symp. on Applications of Autocorrelation Analysis to Physical Problems, NAVEXOS-P-735, 47-67. Office of Naval Research, Washington.
[40] Wallis, K.F. (1974). Seasonal adjustment and relations between variables. J. Amer. Statist. Assoc. 69, 18-31.
[41] Wei, W.W.S. (1990). Time Series Analysis. Univariate and Multivariate Methods. Addison-Wesley, New York.


2001. Suite 330. royalty-free license. regardless of subject matter or whether it is published as a printed book. Boston. while not being considered responsible for modifications made by others. that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. either commercially or noncommercially. 1. it can be used for any textual work.GNU Free Documentation License Version 1. November 2002 Copyright c 2000. or other functional and useful document ”free” in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it. We have designed this License in order to use it for manuals for free software.2. Preamble The purpose of this License is to make a manual. It complements the GNU General Public License. Such a notice grants a world-wide. The ”Document”. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work. because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. . in any medium. This License is a kind of ”copyleft”. textbook. Secondarily. but changing it is not allowed. 59 Temple Place. this License preserves for the author and publisher a way to get credit for their work. Inc. with or without modifying it. which is a copyleft license designed for free software. which means that derivative works of the document must themselves be free in the same sense. to use that work under the conditions stated herein. We recommend this License principally for works whose purpose is instruction or reference. MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document.2002 Free Software Foundation. unlimited in duration. But this License is not limited to software manuals.

XCF and JPG. A ”Modified Version” of the Document means any work containing the Document or a portion of it. as FrontCover Texts or Back-Cover Texts. A copy made in an otherwise Transparent file format whose markup. if the Document is in part a textbook of mathematics. plus such following pages as are needed to hold. represented in a format whose specification is available to the general public. or with modifications and/or translated into another language. in the notice that says that the Document is released under this License. Examples of transparent image formats include PNG. The ”Invariant Sections” are certain Secondary Sections whose titles are designated. or of legal. or absence of markup. ethical or political position regarding them. has been arranged to thwart or discourage subsequent modification by readers is not Transparent. The ”Title Page” means. for a printed book. the title page itself. If the Document does not identify any Invariant Sections then there are none. refers to any such manual or work. in the notice that says that the Document is released under this License. and the machine-generated HTML. Texinfo input format. You accept the license if you copy. A ”Transparent” copy of the Document means a machine-readable copy. legibly. The ”Cover Texts” are certain short passages of text that are listed. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. A Front-Cover Text may be at most 5 words. The Document may contain zero Invariant Sections. PostScript or PDF produced by some word processors for output purposes only. Any member of the public is a licensee. the material this License requires to . An image format is not Transparent if used for any substantial amount of text. LaTeX input format. modify or distribute the work in a way requiring permission under copyright law. that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor. A ”Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. A copy that is not ”Transparent” is called ”Opaque”. a Secondary Section may not explain any mathematics. either copied verbatim. commercial. Examples of suitable formats for Transparent copies include plain ASCII without markup. as being those of Invariant Sections. philosophical. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors. PostScript or PDF designed for human modification. and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. and standard-conforming simple HTML.) The relationship could be a matter of historical connection with the subject or with related matters. and is addressed as ”you”. SGML or XML for which the DTD and/or processing tools are not generally available. and a Back-Cover Text may be at most 25 words. SGML or XML using a publicly available DTD. (Thus.226 GNU Free Documentation Licence below.

you may accept compensation in exchange for copies. such as ”Acknowledgements”. and continue the rest onto adjacent pages. numbering more than 100. clearly and legibly. (Here XYZ stands for a specific section name mentioned below. under the same conditions stated above. ”Endorsements”. and the Document’s license notice requires Cover Texts. preceding the beginning of the body of the text. . Both covers must also clearly and legibly identify you as the publisher of these copies. ”Dedications”. If the required texts for either cover are too voluminous to fit legibly. A section ”Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language.) To ”Preserve the Title” of such a section when you modify the Document means that it remains a section ”Entitled XYZ” according to this definition. VERBATIM COPYING You may copy and distribute the Document in any medium. you must enclose the copies in covers that carry. Copying with changes limited to the covers. you should put the first ones listed (as many as fit reasonably) on the actual cover. as long as they preserve the title of the Document and satisfy these conditions. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document. You may also lend copies. and the license notice saying this License applies to the Document are reproduced in all copies. all these Cover Texts: Front-Cover Texts on the front cover. If you distribute a large enough number of copies you must also follow the conditions in section 3. the copyright notices. 3. and Back-Cover Texts on the back cover. can be treated as verbatim copying in other respects.GNU Free Documentation Licence appear in the title page. For works in formats which do not have any title page as such. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. and you may publicly display copies. You may add other material on the covers in addition. However. 227 2. and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. either commercially or noncommercially. or ”History”. ”Title Page” means the text near the most prominent appearance of the work’s title. The front cover must present the full title with all words of the title equally prominent and visible. These Warranty Disclaimers are considered to be included by reference in this License. but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. provided that this License.

In addition. a license notice giving the public permission to use the Modified Version under the terms of this License. that you contact the authors of the Document well before redistributing any large number of copies. . B. or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using publicstandard network protocols a complete Transparent copy of the Document. you must do these things in the Modified Version: A. immediately after the copyright notices. be listed in the History section of the Document). Use in the Title Page (and on the covers. It is requested. 4. but not required. you must take reasonably prudent steps. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above. Include. you must either include a machine-readable Transparent copy along with each Opaque copy. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. to give them a chance to provide you with an updated version of the Document. to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. if it has fewer than five). one or more persons or entities responsible for authorship of the modifications in the Modified Version. if there were any. in the form shown in the Addendum below. State on the Title page the name of the publisher of the Modified Version. with the Modified Version filling the role of the Document. provided that you release the Modified Version under precisely this License. G. unless they release you from this requirement. D. You may use the same title as a previous version if the original publisher of that version gives permission. together with at least five of the principal authors of the Document (all of its principal authors. free of added material. List on the Title Page. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.228 GNU Free Documentation Licence If you publish or distribute Opaque copies of the Document numbering more than 100. as the publisher. E. F. when you begin distribution of Opaque copies in quantity. C. If you use the latter option. and from those of previous versions (which should. as authors. Preserve all the copyright notices of the Document. if any) a title distinct from that of the Document.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.

K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.

M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties–for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.
