
STAT 5383 - Lab 1: Exploratory tools for Time Series analysis
Giovanni Petris
Fall 2006

1 Introduction
Our software of choice is R. R and accompanying manuals are available for free downloading
at http://www.r-project.org. To get started with R I suggest that you read the first few
sections of An Introduction to R. This is one of the manuals that come with the program.
You can access it through the Help menu of R. It is highly recommended that you try all
the examples in R. They will help you learn the concepts, give you a little programming
experience, and give you facility with a very flexible statistical software package. And don’t
just try the examples as written. Vary them a little; play around with them; experiment!
In this Lab we will gain familiarity with R by considering ways of performing descriptive
analysis of time series data. In particular, we will learn the following:

• how to plot the data;
• how to estimate trend and seasonal component;
• how to estimate the autocorrelation function of a stationary time series.

We will also consider simulating observations from a given model.


Before proceeding further, let me mention two commands in R that I think are the most
important of all: help.search and help, the latter of which can be abbreviated as ?. You
can use the first one to look for the name of a particular function, or to check whether there
are functions that perform specific tasks. For example,

> help.search("time series")

will give you the list of all functions and built-in data sets somehow related to time series
analysis. As a more specific example, if you don’t recall what the function that draws lag
plots is called, try

> help.search("lag plot")

Once you discover that it is called lag.plot, you can query R about its usage using the
help function in one of the two equivalent forms:

> help("lag.plot")
> ?lag.plot

The help function can also be used to obtain a description of any of the data sets that
are shipped with R. Use it to see the content of the data sets used in the examples in the
following sections.
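
As a small illustration of the two paragraphs above, here is a sketch that looks up function
names and inspects one of the built-in data sets before you read its help page. The functions
apropos, class, start, end and frequency are not used elsewhere in this handout; they are
standard R functions picked here only for illustration.

```r
# Find objects on the search path whose name contains "lag"
hits <- apropos("lag")
print(hits)                      # lag.plot should be among them

# Basic facts about a built-in time series
print(class(AirPassengers))      # the class used by R for regular time series
print(start(AirPassengers))      # first time point (year, month)
print(end(AirPassengers))        # last time point (year, month)
print(frequency(AirPassengers))  # number of observations per year
```

For the full description of the data set, type ?AirPassengers.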

2 Plotting the data


R has very good graphical capabilities. Plots can be customized in many different ways. In
its simplest form, plotting a time series is as easy as this:

> plot(AirPassengers)

Plot it and see what it looks like. A log transformation in this case works well to stabilize
the variance of the series. You can plot the transformed data as follows (the resulting plot
is shown in Fig 1):

> plot(log(AirPassengers))

Figure 1: Logged Air Passengers data (log(AirPassengers) plotted against Time)
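
If you want a rough numerical check of the claim that the log transformation stabilizes the
variance, the sketch below compares the within-year standard deviation of the series before
and after taking logs. The use of aggregate with FUN = sd and the ratio of last-year to
first-year spread are my own choices for illustration, not a standard diagnostic.

```r
# Standard deviation of the 12 monthly values within each year
sd_raw <- aggregate(AirPassengers, nfrequency = 1, FUN = sd)
sd_log <- aggregate(log(AirPassengers), nfrequency = 1, FUN = sd)

# Ratio of the last year's spread to the first year's spread
ratio_raw <- as.numeric(sd_raw[length(sd_raw)] / sd_raw[1])
ratio_log <- as.numeric(sd_log[length(sd_log)] / sd_log[1])
print(c(raw = ratio_raw, log = ratio_log))

# Visual comparison of how the spread evolves over the years
par(mfrow = c(2, 1))
plot(sd_raw, type = "o", main = "Yearly sd, original scale")
plot(sd_log, type = "o", main = "Yearly sd, log scale")
```

A ratio much closer to one on the log scale indicates that the spread grows far less with the
level of the series after the transformation.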

By default R plots time series with a continuous line. Sometimes you want to identify the
data points on this continuous line. Try plot(log(AirPassengers), type='o') (the 'o' is
for overplotted: points and line, that is). If you don’t like the default character used to draw
points, you can change it with the optional argument pch, as in plot(log(AirPassengers), type='o',
A plot that you can draw to look at patterns of dependency between consecutive
observations is a lag plot, in which you plot x_t versus x_{t-1}. This can be generalized to
observations more than one time lag apart. Look at the plot produced by the command
lag.plot(LakeHuron, do.lines=FALSE). You should see that the dots tend to cluster around
a straight line, which implies that x_{t-1} is a good (linear) predictor for x_t.
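
The two plotting ideas above can be sketched as follows. The value pch = 16 and the choice of
four lags are assumptions made only for illustration; the correlation at the end is simply a
numerical summary of what the lag-1 plot shows.

```r
# Points drawn as filled circles; pch = 16 is an arbitrary illustrative choice
plot(log(AirPassengers), type = "o", pch = 16)

# Lag plots for lags 1 to 4, drawing points only
lag.plot(LakeHuron, lags = 4, do.lines = FALSE)

# Correlation between x_t and x_{t-1}, computed from the pairs shown in the lag-1 plot
x  <- as.numeric(LakeHuron)
n  <- length(x)
r1 <- cor(x[-1], x[-n])
print(r1)
```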

3 Time series decomposition


In this section we will see how to decompose an observed time series into its trend and
seasonal component, according to the classical decomposition
x = trend + seasonal component + residual.
Two functions that perform the decomposition are stl and decompose. The first one
uses a nonparametric fit of the trend and seasonal component, while the second is based
on moving averages. We will not go into the details of these two functions; rather, we will
consider them as black-box tools for graphical exploratory analysis. decompose can also
handle multiplicative time series without the need to take logs. To see an example of how
the function works, try example("decompose") (see footnote 1). Here is an example of the usage of stl:

> plot(stl(log(AirPassengers), s.window = "periodic"))



Figure 2: Decomposition of a Time Series using stl (panels: data, seasonal, trend, remainder; horizontal axis: time)
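
For comparison with stl, here is a minimal sketch of decompose applied to the untransformed
series with a multiplicative model. The argument type = "multiplicative" and the figure
component of the result belong to decompose; attaching month names to the seasonal factors
is my own addition for readability.

```r
# Multiplicative decomposition of the untransformed series
dec <- decompose(AirPassengers, type = "multiplicative")
plot(dec)

# One seasonal factor per month; values above 1 mark months above the trend level
seas <- dec$figure
names(seas) <- month.abb
print(round(seas, 3))
```

In the multiplicative case the seasonal component is a factor that multiplies the trend, so it
fluctuates around one rather than around zero.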

Footnote 1: Another very useful feature of R’s help system!

Instead of estimating the trend and/or seasonal component, we can difference a series
to make it approximately stationary. The function diff can be used in R for this purpose.
Consider again the logged AirPassengers time series. By default, diff takes one difference
at lag one. This would remove the trend but leave a seasonal component (try it!). On the
other hand, a lag 12 difference removes both seasonality and trend (why?), as Figure 3 shows.

> plot(diff(log(AirPassengers), lag = 12))



Figure 3: Lag-12 difference of log Air Passengers data (diff(log(AirPassengers), lag = 12) plotted against Time)
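
One way to explore the "why?" above is to difference an artificial series whose structure you
know exactly. The sketch below builds a noise-free series with a linear trend and a pattern
that repeats every 12 steps; the intercept 2, the slope 0.5 and the sine-shaped seasonal
pattern are arbitrary values chosen for illustration.

```r
# Synthetic monthly series: linear trend plus a pattern repeating every 12 steps
t_idx    <- 1:60
seasonal <- rep(sin(2 * pi * (1:12) / 12), times = 5)
x        <- 2 + 0.5 * t_idx + seasonal   # no noise, to make the effect visible

d1  <- diff(x)            # lag 1: trend becomes the constant 0.5, seasonality remains
d12 <- diff(x, lag = 12)  # lag 12: seasonal terms cancel, trend becomes 12 * 0.5 = 6

print(range(d1))
print(range(d12))
```

With real data the differenced series also contains differenced noise, so it fluctuates around
a constant level instead of being exactly constant.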

Consider now the data set JohnsonJohnson. Make an appropriate transformation and
difference it so as to remove the seasonal component and the trend. You may use the function
frequency to get the sampling frequency of a time series.

Do it now!

4 Linear and nonlinear regression


A linear regression model can be fitted in R using the command lm(y~z), where y and z are
vectors containing the dependent and independent variable, respectively. If the independent
variable is just time, as is the case for time series, one can proceed as shown below.

> m <- lm(log(AirPassengers) ~ time(AirPassengers))
> coefficients(m)

        (Intercept) time(AirPassengers)
       -230.1878355           0.1205806

> plot(log(AirPassengers), type = "o")
> abline(m)

Take a look at time(AirPassengers) and make sure you understand the time unit that
goes with the slope 0.12.
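
The following sketch may help with the time unit. It refits the model above and translates
the slope into an implied growth rate per year and per month; the names slope, growth_year
and growth_month are introduced here only for illustration.

```r
m <- lm(log(AirPassengers) ~ time(AirPassengers))
slope <- unname(coef(m)[2])

print(head(time(AirPassengers), 3))  # time is measured in years
print(deltat(AirPassengers))         # spacing between observations, as a fraction of a year

# The slope is the change in log(passengers) per unit of time, that is, per year
growth_year  <- exp(slope) - 1       # implied relative growth over one year
growth_month <- exp(slope / 12) - 1  # implied relative growth over one month
print(c(year = growth_year, month = growth_month))
```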
Now take a look at the data set UKgas. Use lm to fit an appropriate curve to the data.
Plot the data and the curve superimposed on the same display. Do not transform the data
to draw the plot. You may want to use the command lines to superimpose the fitted curve.

Do it now!

In addition to linear models, in R you can use many sophisticated nonlinear smoothers
to estimate the trend of a time series.

4.1 Kernel smoothers


The smoothing parameter is the bandwidth of the kernel. Sometimes you have to play a little
with it in order to get pleasing results, i.e. not too smooth and not following the data too
closely. In the following example the default value produces a result that essentially does
not smooth.

> plot(LakeHuron, type = "o")
> lines(ksmooth(time(LakeHuron), LakeHuron, "normal"), col = "green")
> lines(ksmooth(time(LakeHuron), LakeHuron, "normal", bandwidth = 10), col = "blue")
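
To see how the bandwidth controls the amount of smoothing, you can compare several values
side by side. In the sketch below the bandwidths 5 and 20, the colors, and the roughness
measure (the standard deviation of the year-to-year changes of the fitted curve) are my own
illustrative choices. The default bandwidth of ksmooth is 0.5, which is smaller than the
one-year spacing of the observations; this is why the default fit barely smooths at all.

```r
tt <- as.numeric(time(LakeHuron))
y  <- as.numeric(LakeHuron)

# Evaluate each smoother at the observed years so the fits are comparable to the data
bws  <- c(0.5, 5, 10, 20)
fits <- lapply(bws, function(bw)
  ksmooth(tt, y, kernel = "normal", bandwidth = bw, x.points = tt))

# Roughness of each fit: standard deviation of its year-to-year changes
rough <- sapply(fits, function(f) sd(diff(f$y)))
print(data.frame(bandwidth = bws, roughness = round(rough, 3)))

# Overlay all fits on the data
plot(LakeHuron, type = "o")
cols <- c("green", "orange", "blue", "red")
for (i in seq_along(fits)) lines(fits[[i]], col = cols[i])
```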

4.2 Smoothing splines


Splines are piecewise polynomial curves. The smoothing parameter in this case is the number
of (equivalent) degrees of freedom. Splines can be used to smooth a time series as follows.

> plot(LakeHuron, type = "o")


> lines(smooth.spline(time(LakeHuron), LakeHuron), col = "green")
> lines(smooth.spline(time(LakeHuron), LakeHuron, df = 6), col = "blue")

As you can see, also in this case the default choice made by R was not very appropriate.
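
You can check how many equivalent degrees of freedom each fit actually uses by looking at the
df component of the object returned by smooth.spline. This is only a sketch; the object names
fit_default and fit_df6 are introduced here for illustration.

```r
tt <- as.numeric(time(LakeHuron))
y  <- as.numeric(LakeHuron)

fit_default <- smooth.spline(tt, y)          # smoothing parameter chosen automatically
fit_df6     <- smooth.spline(tt, y, df = 6)  # smoothing parameter set through df

# Equivalent degrees of freedom actually used by each fit
print(c(default = fit_default$df, df6 = fit_df6$df))
```

The larger the number of degrees of freedom, the more closely the curve follows the data.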

4.3 Other nonparametric smoothers


Without entering into the details, we mention other smoothers that you can find in R. They
are lowess and supsmu. Try them on the Lake Huron data set.
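
A possible starting point is sketched below. The span f = 0.3 passed to lowess and the colors
are arbitrary illustrative choices; supsmu is called with its default settings. Vary the
arguments and see how the curves change.

```r
tt <- as.numeric(time(LakeHuron))
y  <- as.numeric(LakeHuron)

lw <- lowess(tt, y, f = 0.3)  # f = 0.3 is an illustrative span, not a recommendation
ss <- supsmu(tt, y)           # default settings

plot(LakeHuron, type = "o")
lines(lw, col = "green")
lines(ss, col = "blue")
```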

5 Simulating time series data
Why would somebody want to simulate a time series? Aren’t there enough real data sets
around? Well, one answer is that simulating from a specific model gives you a feeling for the
typical behavior of observations coming from that model. There are other reasons that have
to do with Monte Carlo studies, the bootstrap, and more.
R has a fairly extensive set of functions that you can use to generate independent random
variables from the most common distributions, like Normal, Gamma, Beta, Poisson,
Binomial, etc. For example, to generate a hundred iid N(0, 3) random variables, you just have
to issue the command rnorm(100, sd=sqrt(3)). This gives you a hundred observations
from a Gaussian White Noise with variance 3. To generate time series from ARIMA models,
R provides the function arima.sim. The principal arguments it takes are the length of the
series, n, and the model, specified as a list with components ar, ma and, optionally, order.
The following chunk of code generates and plots an AR(1) and an MA(3) process. I omit the
plots, but try it out and see what you get. Try simulating the same process several times to
familiarize yourself with sampling variability.

> ar1 <- arima.sim(model = list(ar = c(0.7)), n = 100)


> plot(ar1, type = "o")
> ma3 <- arima.sim(model = list(ma = c(0.2, 0.4, 0.1)), n = 100)
> plot(ma3, type = "o")

Note that in the first case we are generating the process

x_t = 0.7 x_{t-1} + w_t,

while in the second we are generating the process

x_t = w_t + 0.2 w_{t-1} + 0.4 w_{t-2} + 0.1 w_{t-3}.

Other software packages, and other books, use different conventions about the signs of the
autoregressive and moving average coefficients: keep this in mind!
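
If you want to check numerically what R's conventions imply, the following sketch may help.
It verifies the variance of the simulated white noise and compares theoretical and sample
autocorrelations. The seed, the sample sizes and the use of ARMAacf and acf are my own
choices for illustration; they are not part of the code above.

```r
set.seed(5383)  # arbitrary seed, used only to make the run reproducible

# Gaussian white noise with variance 3: the sample variance should be close to 3
w <- rnorm(10000, sd = sqrt(3))
print(var(w))

# Theoretical autocorrelations implied by R's sign convention
acf_ar1 <- ARMAacf(ar = 0.7, lag.max = 3)
acf_ma3 <- ARMAacf(ma = c(0.2, 0.4, 0.1), lag.max = 5)
print(round(acf_ar1, 3))  # positive and decaying geometrically for ar = 0.7
print(round(acf_ma3, 3))  # zero beyond lag 3 for an MA(3)

# Sample autocorrelations of a long simulated AR(1), for comparison
ar1 <- arima.sim(model = list(ar = 0.7), n = 5000)
r   <- acf(ar1, lag.max = 3, plot = FALSE)$acf[, 1, 1]
print(round(r, 3))
```

A positive lag-one autocorrelation confirms that ar = 0.7 enters the model with a plus sign,
as in the equation above.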
Now generate 50 observations from the model x_t = 0.9 x_{t-1} + w_t and 50 observations from
the model x_t = -0.9 x_{t-1} + w_t. Before plotting the two series, try to guess the features of
the two data sets. Then plot them and check your guess. You can draw two plots on the same
graphical device by giving the command par(mfrow=c(2,1)) before the plot command.

Do it now!

A random walk without drift is defined by the equation

x_t = x_{t-1} + w_t.

By successive substitutions, one obtains the alternative representation

x_t = x_0 + \sum_{k=1}^{t} w_k.

The second representation suggests the following as a possible way of generating a random
walk, setting for simplicity x_0 = 0.

> rw <- cumsum(rnorm(100))

Try doing it and plotting the resulting series. Repeat the exercise several times and look
at the different shapes the plot can take. If you want a random walk with drift, follow the
same procedure, but include the drift parameter as the mean of the normal random variables.
The following gives a random walk with drift equal to 0.5.

> rw <- cumsum(rnorm(100, mean = 0.5))


> plot.ts(rw)
> abline(a = 0, b = 0.5, lty = 3)
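
To convince yourself that the two representations of the random walk agree, you can generate
the series both ways from the same increments. This is a minimal sketch; the seed and the
names x_loop and x_cumsum are introduced here only for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 100
w <- rnorm(n, mean = 0.5)  # increments with drift 0.5

# Recursive definition x_t = x_{t-1} + w_t, starting from x_0 = 0
x_loop <- numeric(n)
prev <- 0
for (i in seq_len(n)) {
  x_loop[i] <- prev + w[i]
  prev <- x_loop[i]
}

# Cumulative-sum representation x_t = x_0 + w_1 + ... + w_t
x_cumsum <- cumsum(w)

print(all.equal(x_loop, x_cumsum))
```

The loop follows the defining equation step by step, while cumsum does the same computation
in a single vectorized call.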
