
University of Utah
Department of Mathematics

Guided Reading: Time Series
Problems from Chapter 3 of Shumway and Stoffer's Book

ARIMA Models

Author: Curtis Miller
Supervisor: Prof. Lajos Horvath

November 10, 2015

1 Estimation

1.1 AR(2) Model for cmort

To estimate the AR(2) process, I first use ordinary least squares (OLS). I then use the Yule-Walker estimator. This is shown in the R code below:

library(astsa)  # provides the cmort series

# OLS estimate
# demean = T fits the model to cmort - mean(cmort)
cmort.ar2.ols <- ar.ols(cmort, order = 2, demean = T)
# Yule-Walker estimate
cmort.ar2.yw <- ar.yw(cmort, order = 2, demean = T)

1.1.1 Parameter Estimate Comparison

# OLS estimate
cmort.ar2.ols
## 
## Call:
## ar.ols(x = cmort, order.max = 2, demean = T)
## 
## Coefficients:
##      1       2  
## 0.4286  0.4418  
## 
## Intercept: -0.04672 (0.2527) 
## 
## Order selected 2  sigma^2 estimated as  32.32

# Yule-Walker estimate
cmort.ar2.yw
## 
## Call:
## ar.yw.default(x = cmort, order.max = 2, demean = T)
## 
## Coefficients:
##      1       2  
## 0.4339  0.4376  
## 
## Order selected 2  sigma^2 estimated as  32.84

Looking at the coefficients of the AR(2) model estimated using the two methods, I see very little difference: OLS and Yule-Walker estimation produce similar results.

1.1.2 Standard Error Comparison

# The standard errors of the OLS estimates
cmort.ar2.ols$asy.se.coef$ar
## [1] 0.03979433 0.03976163
# The asymptotic covariance matrix of the Yule-Walker estimates
cmort.ar2.yw$asy.var.coef
##              [,1]         [,2]
## [1,]  0.001601043 -0.001235314
## [2,] -0.001235314  0.001601043
# Corresponding standard error of both parameters
sqrt(cmort.ar2.yw$asy.var.coef[1,1])
## [1] 0.04001303
Looking at the above R output, both estimators yield essentially the same standard errors for the parameters (about 0.040 in each case).
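As a check on what ar.yw() computes internally, the Yule-Walker estimates can be reproduced by solving the sample Yule-Walker equations directly from the sample autocorrelations; a minimal sketch, not part of the original analysis:

# Solve the Yule-Walker equations R phi = r using the sample ACF of cmort
r <- drop(acf(cmort, lag.max = 2, plot = FALSE)$acf)[2:3]  # rho(1), rho(2)
R <- matrix(c(1, r[1], r[1], 1), nrow = 2)  # sample correlation matrix
solve(R, r)  # should reproduce the ar.yw coefficients above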

1.2 AR(1) Simulation and Estimation


First I generate the data:

# sd is an argument to arima.sim itself, not part of the model list
ar1.sim <- arima.sim(n = 50, model = list(ar = .99), sd = 1)


I estimate the parameter from the simulated data using the Yule-Walker estimator.

ar1.sim.yw <- ar.yw(ar1.sim, order = 1)

# Model estimates
ar1.sim.yw
## 
## Call:
## ar.yw.default(x = ar1.sim, order.max = 1)
## 
## Coefficients:
##      1  
## 0.8946  
## 
## Order selected 1  sigma^2 estimated as  1.144

# Model covariance matrix
ar1.sim.yw$asy.var.coef
##             [,1]
## [1,] 0.004158676
Here, I would perform inference on the model by assuming the estimator is asymptotically Normally distributed, using the covariance matrix listed above to estimate the standard error.
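For example, an approximate 95% confidence interval for φ based on this Normal approximation could be computed as follows (a sketch, not part of the original analysis):

# Normal-approximation 95% confidence interval for phi
phi.hat <- ar1.sim.yw$ar
se.phi <- sqrt(ar1.sim.yw$asy.var.coef[1, 1])
phi.hat + c(-1, 1) * qnorm(.975) * se.phi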
A bootstrap in R could be attempted as follows:

library(boot)  # provides tsboot()
tsboot(ar1.sim, function(d) {
  return(ar.yw(d, order = 1)$ar)
}, R = 2000)
## 
## MODEL BASED BOOTSTRAP FOR TIME SERIES
## 
## Call:
## tsboot(tseries = ar1.sim, statistic = function(d) {
##     return(ar.yw(d, order = 1)$ar)
## }, R = 2000)
## 
## 
## Bootstrap Statistics :
##      original  bias  std. error
## t1* 0.8946416     0           0
The bootstrap standard error is zero, while the theoretical standard error is non-zero. This is a red flag rather than a result: tsboot() defaults to sim = "model", and its default ran.gen simply returns the original series on every replicate, so the statistic never varies.
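A version that actually resamples, assuming a fixed-block bootstrap is acceptable here (the block length l = 10 is an arbitrary illustrative choice), might look like:

# Fixed-block bootstrap: resamples blocks of the series, so the statistic
# varies across replicates and the standard error is non-degenerate
tsboot(ar1.sim, function(d) ar.yw(d, order = 1)$ar,
       R = 2000, l = 10, sim = "fixed")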

2 Integrated Models for Nonstationary Data

2.1 EWMA Model for Glacial Varve Data

Here I am interested in the varve dataset. In fact, I am interested in analyzing log(varve), since I believe this may actually be a stationary process. I will be estimating an EWMA model for these data.

logvarve <- log(varve)

# EWMA fits for logvarve; HoltWinters' alpha corresponds to 1 - lambda
logvarve.ima.25 <- HoltWinters(logvarve[1:100], alpha = 1 - .25,
                               beta = FALSE, gamma = FALSE)
logvarve.ima.5 <- HoltWinters(logvarve[1:100], alpha = 1 - .5,
                              beta = FALSE, gamma = FALSE)
logvarve.ima.75 <- HoltWinters(logvarve[1:100], alpha = 1 - .75,
                               beta = FALSE, gamma = FALSE)

# Plotting results
par(mfrow = c(3,1))
plot(logvarve.ima.25, main = "EWMA Fit with Lambda = .25")
plot(logvarve.ima.5, main = "EWMA Fit with Lambda = .5")
plot(logvarve.ima.75, main = "EWMA Fit with Lambda = .75")
The results are shown in Figure 2.1. With a small smoothing parameter (λ), the predictions are very sensitive to the immediate past, while a large smoothing parameter leads to more stable predictions.
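To make the α = 1 − λ correspondence concrete, the EWMA recursion can be written out directly; a minimal sketch (ewma() is my own illustrative helper, and HoltWinters() initializes differently, so the values need not match the fits above exactly):

# EWMA recursion: xhat[t] = (1 - lambda) * x[t-1] + lambda * xhat[t-1]
ewma <- function(x, lambda) {
  xhat <- numeric(length(x))
  xhat[1] <- x[1]  # initialize at the first observation
  for (t in 2:length(x))
    xhat[t] <- (1 - lambda) * x[t - 1] + lambda * xhat[t - 1]
  xhat
}
head(ewma(logvarve[1:100], lambda = .25))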

Figure 2.1: EWMA fit for different smoothing parameters (panels: λ = .25, .5, .75; each plots observed and fitted values against time)

3 Building ARIMA Models

3.1 AR(1) Model for GNP Data

Here I investigate how well an AR(1) (more exactly, an ARIMA(1,1,0)) model fits the natural log of U.S. GNP data. I estimate this ARIMA model:

gnpgr = diff(log(gnp))  # growth rate of GNP

# AR(1) model fit
gnp.model <- sarima(gnpgr, 1, 0, 0, details = F)

I see disturbing trends in the diagnostic plots shown in Figure 3.1. The residual plot should look like white noise, but the variance appears to decrease as the year increases. The ACF of the residuals and the p-values for the Ljung-Box statistic look as they should, but the Q-Q plot suggests non-normality. Other models (probably ones that do not assume Gaussian white noise) may be better.
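One way to make the non-normality concern concrete is a Shapiro-Wilk test on the residuals; a sketch, assuming sarima() stores the fitted arima object in its $fit component, as in recent versions of astsa:

# Shapiro-Wilk test of normality for the AR(1) residuals
shapiro.test(resid(gnp.model$fit))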

3.2 Fitting Crude Oil Prices with an ARIMA(p, d, q) Model

My objective is to fit an ARIMA(p, d, q) model to the oil dataset. I start by examining the data:

# Prepare layout
old.par <- par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(4, 1),
               cex.axis = .75)
plot(oil, xaxt = 'n')
mtext(text = "Oil Price", side = 2, line = 2, cex = .75)
plot(log(oil), xaxt = 'n')
mtext(text = "Natural Logarithm of Oil Price", side = 2, line = 2, cex = .75)
plot(diff(oil), xaxt = 'n')
mtext(text = "First Difference in Oil Price", side = 2, line = 2, cex = .75)
plot(diff(log(oil)))
mtext(text = "Percent Change in Oil Price", side = 2, line = 2, cex = .75)
The first plot in Figure 3.2 shows that oil prices clearly are not a stationary process, and it appears that the variance of the process increases with time. Taking the natural log of oil prices helps control the increasing variability, but not the nonstationary behavior of the series. When looking at the change in oil price from one period to the next, I do see a process that looks more stationary, but the nonconstant variance is not removed. The final attempt is to look at the differences in the natural log of oil prices (which can be interpreted as the percentage change in oil prices). This appears to be stationary with mostly constant variance. However, there are large deviations around 2009, and even prior, that would lead one to conclude that the white noise is not Gaussian, which threatens estimation and inference.

I now look at the ACF and PACF of ∇log(oil_t):

Figure 3.1: Diagnostic plots for the AR(1) model (standardized residuals, ACF of residuals, normal Q-Q plot of standardized residuals, and p-values for the Ljung-Box statistic)

Figure 3.2: Basic plots of the oil series (oil price, natural logarithm of oil price, first difference in oil price, and percent change in oil price)

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(log(oil)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(log(oil)))
mtext(text = "Sample PACF", side = 2, line = 2)
Looking at the sample PACF in Figure 3.3, I see that the PACF is nonzero as far out as eight lags, which may suggest taking p = 8 + 1 = 9; that is, I should consider lagging the AR term out as far as nine lags.

# capture.output() and write() are used only to make the
# presentation easier
write(capture.output(sarima(log(oil),
                            p = 9, d = 1, q = 0))[32:38], "")
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1678  -0.1189  0.1844  -0.0713  0.0486  -0.0715
## s.e.  0.0429   0.0432  0.0436   0.0442  0.0444   0.0443
##          ar7     ar8     ar9  constant
##      -0.0158  0.1135  0.0525    0.0017

The ninth lag does not appear statistically significant, so I drop the number of lags down to
eight. I now use the following ARIMA model (with diagnostic plots shown):

oil.model <- sarima(log(oil), p = 8, d = 1, q = 0, details = F)
write(capture.output(oil.model)[8:14], "")
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1742  -0.1200  0.1814  -0.0689  0.0448  -0.0621
## s.e.  0.0426   0.0433  0.0436   0.0442  0.0443   0.0437
##          ar7     ar8  constant
##      -0.0218  0.1224    0.0017
## s.e.  0.0435  0.0428    0.0026

The residuals clearly do not appear to be Gaussian: there are large price movements that make this assumption doubtful, and the Q-Q plot does not support the Normality assumption. The ACF of the residuals gets large at some distant lags but otherwise stays within the band of reasonable values. The p-values of the Ljung-Box statistics suggest that we do not have dependence in our residuals at large lags. This may be the best fit an ARIMA model can provide.
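This visual impression can also be checked numerically; a sketch using stats::Box.test, with fitdf = 8 accounting for the eight estimated AR coefficients:

# Ljung-Box test at lag 20 for the ARIMA(8,1,0) residuals
Box.test(resid(oil.model$fit), lag = 20, type = "Ljung-Box", fitdf = 8)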

Figure 3.3: Sample ACF and PACF for percentage change in oil price

Figure 3.4: Diagnostic plots for the ARIMA(8,1,0) model for the log(oil) series

4 Regression with Autocorrelated Errors

4.1 Monthly Sales Data

4.1.1 ARIMA Model Fitting

The problem first asks for an ARIMA model for the sales data series. I first plot the series.

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(3, 1),
    cex.axis = .75)
plot(sales, xaxt = 'n')
mtext(text = "Sales", side = 2, line = 2, cex = .75)
plot(diff(sales), xaxt = 'n')
mtext(text = "First Order Difference in Sales", side = 2, line = 2, cex = .75)
plot(diff(diff(sales)))
mtext(text = "Second Order Difference in Sales", side = 2, line = 2, cex = .75)
Figure 4.1 shows the plots of the sales series. Clearly, sales_t is not stationary. Surprisingly, neither is ∇sales_t; this series shows periodicity. It takes a second-order difference, ∇²sales_t, to find a stationary series.

I next examine the ACF and PACF to try to identify the order of the AR and MA terms.

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(diff(sales)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(diff(sales)))
mtext(text = "Sample PACF", side = 2, line = 2)
As shown in Figure 4.2, the sample ACF cuts off after one lag and the sample PACF appears
to be trailing off, so I believe that an ARIMA(0,2,1) should provide a good fit for the data.

par(old.par)
sales.model <- sarima(sales, p = 0, d = 2, q = 1, details = F)
write(capture.output(sales.model)[8:14], "")
## Coefficients:
##           ma1
##       -0.7480
## s.e.   0.0662
## 
## sigma^2 estimated as 1.866:  log likelihood = -256.57,  aic = 517.14

Figure 4.1: Basic plots of the sales series (sales, first order difference in sales, and second order difference in sales)

Figure 4.2: Sample ACF and PACF for second order difference in sales

Figure 4.3: Diagnostic plots for the ARIMA(0,2,1) model for the sales series

Looking at the diagnostic plots in Figure 4.3, the ARIMA(0,2,1) model seems to fit well. The error terms appear Gaussian, there are no strong autocorrelations in the residuals, and the error terms do not appear to be dependent.
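Since the fit looks adequate, the model could also be used for forecasting; a sketch using astsa's sarima.for() (the five-step horizon is an arbitrary illustrative choice):

# Forecast five steps ahead from the ARIMA(0,2,1) model
sarima.for(sales, n.ahead = 5, p = 0, d = 2, q = 1)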
4.1.2 Relationship between sales and lead

I examine the CCF of sales and lead and a lag plot of sales_t and lead_{t-3} to determine if a regression involving these variables is reasonable.

ccf(diff(sales), diff(lead), main = "CCF of sales and lead")


As seen in Figure 4.4, while the differenced sales and lead series are nearly uncorrelated at most lags, they become highly correlated around lag 3. This fact is emphasized by a lag plot.

lag2.plot(lead, sales, max.lag = 3)


Figure 4.5 shows a linear relationship between the third lag of lead and contemporary sales. This would justify regressing sales_t on lead_{t-3}.
4.1.3 Regression with ARMA Errors

Given that the variable lead seems to provide useful information about sales, I try to regress sales on lead. More specifically, I regress Δsales_t on Δlead_{t-3} (the differenced series, per the code below), while viewing the error term as some unknown ARMA process.

# Align the differenced series: Delta sales_t with Delta lead_{t-3}
saleslead <- ts.intersect(diff(sales), lag(diff(lead), k = -3))
salesnew <- saleslead[,1]
leadnew <- saleslead[,2]
fit <- lm(salesnew ~ leadnew)
acf2(resid(fit))
##         ACF  PACF
##  [1,]  0.59  0.59
##  [2,]  0.40  0.09
##  [3,]  0.34  0.11
##  [4,]  0.31  0.10
##  [5,]  0.23 -0.02
##  [6,]  0.15 -0.04
##  [7,]  0.13  0.03
##  [8,]  0.13  0.03
##  [9,]  0.01 -0.15
## [10,]  0.02  0.07
## [11,]  0.09  0.10
## [12,]  0.01 -0.13
## [13,] -0.01  0.03
## [14,] -0.07 -0.09
## [15,] -0.07 -0.04
## [16,] -0.02  0.09
## [17,] -0.05 -0.03
## [18,] -0.03  0.02
## [19,]  0.04  0.11
## [20,]  0.05  0.03
## [21,]  0.02 -0.07
## [22,]  0.00 -0.01
## [23,] -0.01 -0.04

Figure 4.4: CCF of sales and lead

Figure 4.5: Lag plot of sales and lead

Figure 4.6: Sample ACF and PACF for residuals from linear fit

Figure 4.6 shows the ACF and the PACF of the residuals of the "naïve" fit. The PACF cuts off after lag 1 and the ACF trails off, so the residuals appear to follow an AR(1) process.

arima.fit <- sarima(salesnew, 1, 0, 0, xreg = cbind(leadnew), details = F)


As shown in Figure 4.7, the diagnostic plots for the model, when the error terms are treated as an ARMA(1,0) process, look very good. The normality of the white noise residuals, the ACF of the white noise residuals, and the tests of dependence all show desirable properties.

library(stargazer)  # produces the LaTeX table below
stargazer(arima.fit$fit,
          covariate.labels = c("$\\phi$", "Intercept",
                               "$\\Delta \\text{lead}_{t-3}$"),
          dep.var.labels = c("$\\Delta \\text{sales}_t$"),
          label = "tab:prob35h",
          title = "Coefficients of the model for $\\Delta \\text{sales}_t$",
          table.placement = "ht")
Table 4.1 shows the estimates of the coefficients of the model. The AR(1) term (φ) is statistically significant, as are the intercept and the coefficient on Δlead_{t-3}.
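As a cross-check (a sketch, not part of the original analysis), the same regression with AR(1) errors can be fit directly with stats::arima, which should give essentially the same estimates:

# Regression of salesnew on leadnew with AR(1) errors via stats::arima
arima(salesnew, order = c(1, 0, 0), xreg = leadnew)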

Figure 4.7: Diagnostic plots for the model for the sales series with ARMA(1,0) error terms

Table 4.1: Coefficients of the model for Δsales_t

                     Dependent variable: Δsales_t
  φ                       0.645  (0.063)
  Intercept               0.362  (0.177)
  Δlead_{t-3}             2.788  (0.143)
  Observations            146
  Log Likelihood          168.717
  σ²                      0.588
  Akaike Inf. Crit.       345.433
  Note: *p<0.1; **p<0.05; ***p<0.01

5 Multiplicative Seasonal ARIMA Models

5.1 ACF of an ARIMA(p, d, q) × (P, D, Q)_s Model

The problem asks for a plot of the theoretical ACF of a seasonal ARIMA(0, 0, 1) × (1, 0, 0)_12 model, with Φ = 0.8 and θ = 0.5. This model is:

x_t = 0.8 x_{t-12} + w_t + 0.5 w_{t-1}    (5.1)

The ACF is computed and plotted below:

ACF <- ARMAacf(ar = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .8),
               ma = c(.5))
plot(ACF, type = "h", xlab = "lag", xlim = c(1, 15), ylim = c(-.5, 1))
abline(h = 0)
Figure 5.1 shows the theoretical ACF of the process.
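As a sanity check (a sketch, not part of the original analysis), the empirical ACF of a long simulated path from model (5.1) should approximate this theoretical ACF:

# Simulate a long path from x_t = .8 x_{t-12} + w_t + .5 w_{t-1}
set.seed(1)  # arbitrary seed for reproducibility
x <- arima.sim(n = 1e4, model = list(ar = c(rep(0, 11), .8), ma = .5))
acf(x, lag.max = 15)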

Figure 5.1: ACF of a seasonal ARIMA process