
University of Utah
Department of Mathematics

Guided Reading: Time Series
Problems from Chapter 3 of Shumway and Stoffer's Book

ARIMA Models

Author: Curtis Miller
Supervisor: Prof. Lajos Horvath

November 10, 2015

1 Estimation

1.1 AR(2) Model for cmort

To estimate the AR(2) process, I first use ordinary least squares (OLS). I then use the Yule-Walker estimator. This is shown in the R code below:

library(astsa)  # provides the cmort series

# OLS estimate
# demean = T fits the model to cmort - mean(cmort)
cmort.ar2.ols <- ar.ols(cmort, order = 2, demean = T)
# Yule-Walker estimate
cmort.ar2.yw <- ar.yw(cmort, order = 2, demean = T)

1.1.1 Parameter Estimate Comparison

# OLS estimate
cmort.ar2.ols
## 
## Call:
## ar.ols(x = cmort, order.max = 2, demean = T)
## 
## Coefficients:
##      1       2  
## 0.4286  0.4418  
## 
## Intercept: -0.04672 (0.2527) 
## 
## Order selected 2  sigma^2 estimated as  32.32

# Yule-Walker estimate
cmort.ar2.yw
## 
## Call:
## ar.yw.default(x = cmort, order.max = 2, demean = T)
## 
## Coefficients:
##      1       2  
## 0.4339  0.4376  
## 
## Order selected 2  sigma^2 estimated as  32.84

Looking at the coefficients of the AR(2) model estimated using the two methods, I see very little difference: OLS and Yule-Walker estimation produce similar results.

1.1.2 Standard Error Comparison

# The standard errors of the OLS estimates
cmort.ar2.ols$asy.se.coef$ar
## [1] 0.03979433 0.03976163
# The asymptotic covariance matrix of the Yule-Walker estimates
cmort.ar2.yw$asy.var.coef
##              [,1]         [,2]
## [1,]  0.001601043 -0.001235314
## [2,] -0.001235314  0.001601043
# Corresponding standard error of both parameters
sqrt(cmort.ar2.yw$asy.var.coef[1,1])
## [1] 0.04001303
Looking at the above R output, both estimators yield essentially the same standard errors for the parameters (about 0.040 in each case).
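As a check on what ar.yw() computes internally, the Yule-Walker estimates can be reproduced by solving the sample Yule-Walker equations directly from the sample autocorrelations; a minimal sketch, not part of the original analysis:

# Solve the Yule-Walker equations R phi = r using the sample ACF of cmort
r <- drop(acf(cmort, lag.max = 2, plot = FALSE)$acf)[2:3]  # rho(1), rho(2)
R <- matrix(c(1, r[1], r[1], 1), nrow = 2)  # sample correlation matrix
solve(R, r)  # should reproduce the ar.yw coefficients above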

1.2 AR(1) Simulation and Estimation


First I generate the data:

# sd is an argument to arima.sim itself, not part of the model list
ar1.sim <- arima.sim(n = 50, model = list(ar = .99), sd = 1)


I estimate the parameter from the simulated data using the Yule-Walker estimator.

ar1.sim.yw <- ar.yw(ar1.sim, order = 1)

# Model estimates
ar1.sim.yw
## 
## Call:
## ar.yw.default(x = ar1.sim, order.max = 1)
## 
## Coefficients:
##      1  
## 0.8946  
## 
## Order selected 1  sigma^2 estimated as  1.144

# Model covariance matrix
ar1.sim.yw$asy.var.coef
##             [,1]
## [1,] 0.004158676
Here, I would perform inference on the model by assuming the estimator is asymptotically Normally distributed, using the covariance matrix listed above to estimate the standard error.
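For example, an approximate 95% confidence interval for φ based on this Normal approximation could be computed as follows (a sketch, not part of the original analysis):

# Normal-approximation 95% confidence interval for phi
phi.hat <- ar1.sim.yw$ar
se.phi <- sqrt(ar1.sim.yw$asy.var.coef[1, 1])
phi.hat + c(-1, 1) * qnorm(.975) * se.phi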
A bootstrap in R could be attempted as follows:

library(boot)  # provides tsboot()
tsboot(ar1.sim, function(d) {
  return(ar.yw(d, order = 1)$ar)
}, R = 2000)
## 
## MODEL BASED BOOTSTRAP FOR TIME SERIES
## 
## Call:
## tsboot(tseries = ar1.sim, statistic = function(d) {
##     return(ar.yw(d, order = 1)$ar)
## }, R = 2000)
## 
## 
## Bootstrap Statistics :
##      original  bias  std. error
## t1* 0.8946416     0           0
The bootstrap standard error is zero, while the theoretical standard error is non-zero. This is a red flag rather than a result: tsboot() defaults to sim = "model", and its default ran.gen simply returns the original series on every replicate, so the statistic never varies.
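A version that actually resamples, assuming a fixed-block bootstrap is acceptable here (the block length l = 10 is an arbitrary illustrative choice), might look like:

# Fixed-block bootstrap: resamples blocks of the series, so the statistic
# varies across replicates and the standard error is non-degenerate
tsboot(ar1.sim, function(d) ar.yw(d, order = 1)$ar,
       R = 2000, l = 10, sim = "fixed")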

2 Integrated Models for Nonstationary Data

2.1 EWMA Model for Glacial Varve Data

Here I am interested in the varve dataset. In fact, I am interested in analyzing log(varve), since I believe this may actually be a stationary process. I will be estimating an EWMA model for these data.

logvarve <- log(varve)

# EWMA fits for logvarve; HoltWinters' alpha corresponds to 1 - lambda
logvarve.ima.25 <- HoltWinters(logvarve[1:100], alpha = 1 - .25,
                               beta = FALSE, gamma = FALSE)
logvarve.ima.5 <- HoltWinters(logvarve[1:100], alpha = 1 - .5,
                              beta = FALSE, gamma = FALSE)
logvarve.ima.75 <- HoltWinters(logvarve[1:100], alpha = 1 - .75,
                               beta = FALSE, gamma = FALSE)

# Plotting results
par(mfrow = c(3,1))
plot(logvarve.ima.25, main = "EWMA Fit with Lambda = .25")
plot(logvarve.ima.5, main = "EWMA Fit with Lambda = .5")
plot(logvarve.ima.75, main = "EWMA Fit with Lambda = .75")
The results are shown in Figure 2.1. With a small smoothing parameter (λ), the predictions are very sensitive to the immediate past, while a large smoothing parameter leads to more stable predictions.
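To make the α = 1 − λ correspondence concrete, the EWMA recursion can be written out directly; a minimal sketch (ewma() is my own illustrative helper, and HoltWinters() initializes differently, so the values need not match the fits above exactly):

# EWMA recursion: xhat[t] = (1 - lambda) * x[t-1] + lambda * xhat[t-1]
ewma <- function(x, lambda) {
  xhat <- numeric(length(x))
  xhat[1] <- x[1]  # initialize at the first observation
  for (t in 2:length(x))
    xhat[t] <- (1 - lambda) * x[t - 1] + lambda * xhat[t - 1]
  xhat
}
head(ewma(logvarve[1:100], lambda = .25))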

Figure 2.1: EWMA fit for different smoothing parameters (panels: λ = .25, .5, .75; each plots observed and fitted values against time)

3 Building ARIMA Models

3.1 AR(1) Model for GNP Data

Here I investigate how well an AR(1) (more exactly, an ARIMA(1,1,0)) model fits the natural log of U.S. GNP data. I estimate this ARIMA model:

gnpgr = diff(log(gnp))  # growth rate of GNP

# AR(1) model fit
gnp.model <- sarima(gnpgr, 1, 0, 0, details = F)

I see disturbing trends in the diagnostic plots shown in Figure 3.1. The residual plot should look like white noise, but the variance appears to decrease as the year increases. The ACF of the residuals and the p-values for the Ljung-Box statistic look as they should, but the Q-Q plot suggests non-normality. Other models (probably ones that do not assume Gaussian white noise) may be better.
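One way to make the non-normality concern concrete is a Shapiro-Wilk test on the residuals; a sketch, assuming sarima() stores the fitted arima object in its $fit component, as in recent versions of astsa:

# Shapiro-Wilk test of normality for the AR(1) residuals
shapiro.test(resid(gnp.model$fit))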

3.2 Fitting Crude Oil Prices with an ARIMA(p, d, q) Model

My objective is to fit an ARIMA(p, d, q) model to the oil dataset. I start by examining the data:

# Prepare layout
old.par <- par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(4, 1),
               cex.axis = .75)
plot(oil, xaxt = 'n')
mtext(text = "Oil Price", side = 2, line = 2, cex = .75)
plot(log(oil), xaxt = 'n')
mtext(text = "Natural Logarithm of Oil Price", side = 2, line = 2, cex = .75)
plot(diff(oil), xaxt = 'n')
mtext(text = "First Difference in Oil Price", side = 2, line = 2, cex = .75)
plot(diff(log(oil)))
mtext(text = "Percent Change in Oil Price", side = 2, line = 2, cex = .75)
The first plot in Figure 3.2 shows that oil prices clearly are not a stationary process, and it appears that the variance of the process increases with time. Taking the natural log of oil prices helps control the increasing variability, but not the nonstationary behavior of the series. When looking at the change in oil price from one period to the next, I do see a process that looks more stationary, but the nonconstant variance is not removed. The final attempt is to look at the differences in the natural log of oil prices (which can be interpreted as the percentage change in oil prices). This appears to be stationary with mostly constant variance. However, there are large deviations around 2009, and even prior, that would lead one to conclude that the white noise is not Gaussian, which threatens estimation and inference.

I now look at the ACF and PACF of ∇log(oil_t):

Figure 3.1: Diagnostic plots for the AR(1) model (standardized residuals, ACF of residuals, normal Q-Q plot of standardized residuals, and p-values for the Ljung-Box statistic)

Figure 3.2: Basic plots of the oil series (oil price, natural logarithm of oil price, first difference in oil price, and percent change in oil price)

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(log(oil)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(log(oil)))
mtext(text = "Sample PACF", side = 2, line = 2)
Looking at the sample PACF in Figure 3.3, I see that the PACF is nonzero as far out as eight lags, which may suggest taking p = 8 + 1 = 9; that is, I should consider lagging the AR term out as far as nine lags.

# capture.output() and write() are used only to make the
# presentation easier
write(capture.output(sarima(log(oil),
                            p = 9, d = 1, q = 0))[32:38], "")
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1678  -0.1189  0.1844  -0.0713  0.0486  -0.0715
## s.e.  0.0429   0.0432  0.0436   0.0442  0.0444   0.0443
##          ar7     ar8     ar9  constant
##      -0.0158  0.1135  0.0525    0.0017

The ninth lag does not appear statistically significant, so I drop the number of lags down to
eight. I now use the following ARIMA model (with diagnostic plots shown):

oil.model <- sarima(log(oil), p = 8, d = 1, q = 0, details = F)
write(capture.output(oil.model)[8:14], "")
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1742  -0.1200  0.1814  -0.0689  0.0448  -0.0621
## s.e.  0.0426   0.0433  0.0436   0.0442  0.0443   0.0437
##          ar7     ar8  constant
##      -0.0218  0.1224    0.0017
## s.e.  0.0435  0.0428    0.0026

The residuals clearly do not appear to be Gaussian: there are large price movements that make this assumption doubtful, and the Q-Q plot does not support the Normality assumption. The ACF of the residuals gets large at some distant lags but otherwise stays within the band of reasonable values. The p-values of the Ljung-Box statistics suggest that we do not have dependence in our residuals at large lags. This may be the best fit an ARIMA model can provide.
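This visual impression can also be checked numerically; a sketch using stats::Box.test, with fitdf = 8 accounting for the eight estimated AR coefficients:

# Ljung-Box test at lag 20 for the ARIMA(8,1,0) residuals
Box.test(resid(oil.model$fit), lag = 20, type = "Ljung-Box", fitdf = 8)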

Figure 3.3: Sample ACF and PACF for percentage change in oil price

Figure 3.4: Diagnostic plots for the ARIMA(8,1,0) model for the log(oil) series

4 Regression with Autocorrelated Errors

4.1 Monthly Sales Data

4.1.1 ARIMA Model Fitting

The problem first asks for an ARIMA model for the sales data series. I first plot the series.

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(3, 1),
    cex.axis = .75)
plot(sales, xaxt = 'n')
mtext(text = "Sales", side = 2, line = 2, cex = .75)
plot(diff(sales), xaxt = 'n')
mtext(text = "First Order Difference in Sales", side = 2, line = 2, cex = .75)
plot(diff(diff(sales)))
mtext(text = "Second Order Difference in Sales", side = 2, line = 2, cex = .75)
Figure 4.1 shows the plots of the sales series. Clearly, sales_t is not stationary. Surprisingly, neither is ∇sales_t; this series shows periodicity. It takes a second-order difference, ∇²sales_t, to find a stationary series.

I next examine the ACF and PACF to try to identify the order of the AR and MA terms.

par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(diff(sales)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(diff(sales)))
mtext(text = "Sample PACF", side = 2, line = 2)
As shown in Figure 4.2, the sample ACF cuts off after one lag and the sample PACF appears
to be trailing off, so I believe that an ARIMA(0,2,1) should provide a good fit for the data.

par(old.par)
sales.model <- sarima(sales, p = 0, d = 2, q = 1, details = F)
write(capture.output(sales.model)[8:14], "")
## Coefficients:
##           ma1
##       -0.7480
## s.e.   0.0662
## 
## sigma^2 estimated as 1.866:  log likelihood = -256.57,  aic = 517.14

Figure 4.1: Basic plots of the sales series (sales, first order difference in sales, and second order difference in sales)

Figure 4.2: Sample ACF and PACF for second order difference in sales

Figure 4.3: Diagnostic plots for the ARIMA(0,2,1) model for the sales series

Looking at the diagnostic plots in Figure 4.3, the ARIMA(0,2,1) model seems to fit well. The error terms appear Gaussian, there are no strong autocorrelations in the residuals, and the error terms do not appear to be dependent.
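Since the fit looks adequate, the model could also be used for forecasting; a sketch using astsa's sarima.for() (the five-step horizon is an arbitrary illustrative choice):

# Forecast five steps ahead from the ARIMA(0,2,1) model
sarima.for(sales, n.ahead = 5, p = 0, d = 2, q = 1)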
4.1.2 Relationship between sales and lead

I examine the CCF of sales and lead and a lag plot of sales_t and lead_{t-3} to determine if a regression involving these variables is reasonable.

ccf(diff(sales), diff(lead), main = "CCF of sales and lead")


As seen in Figure 4.4, while the differenced sales and lead series are nearly uncorrelated at most lags, they become highly correlated around lag 3. This fact is emphasized by a lag plot.

lag2.plot(lead, sales, max.lag = 3)


Figure 4.5 shows a linear relationship between the third lag of lead and contemporary sales. This would justify regressing sales_t on lead_{t-3}.
4.1.3 Regression with ARMA Errors

Given that the variable lead seems to provide useful information about sales, I try to regress sales on lead. More specifically, I regress Δsales_t on Δlead_{t-3} (the differenced series, per the code below), while viewing the error term as some unknown ARMA process.

# Align the differenced series: Delta sales_t with Delta lead_{t-3}
saleslead <- ts.intersect(diff(sales), lag(diff(lead), k = -3))
salesnew <- saleslead[,1]
leadnew <- saleslead[,2]
fit <- lm(salesnew ~ leadnew)
acf2(resid(fit))
##         ACF  PACF
##  [1,]  0.59  0.59
##  [2,]  0.40  0.09
##  [3,]  0.34  0.11
##  [4,]  0.31  0.10
##  [5,]  0.23 -0.02
##  [6,]  0.15 -0.04
##  [7,]  0.13  0.03
##  [8,]  0.13  0.03
##  [9,]  0.01 -0.15
## [10,]  0.02  0.07
## [11,]  0.09  0.10
## [12,]  0.01 -0.13
## [13,] -0.01  0.03
## [14,] -0.07 -0.09
## [15,] -0.07 -0.04
## [16,] -0.02  0.09
## [17,] -0.05 -0.03
## [18,] -0.03  0.02
## [19,]  0.04  0.11
## [20,]  0.05  0.03
## [21,]  0.02 -0.07
## [22,]  0.00 -0.01
## [23,] -0.01 -0.04

Figure 4.4: CCF of sales and lead

Figure 4.5: Lag plot of sales and lead

Figure 4.6: Sample ACF and PACF for residuals from linear fit

Figure 4.6 shows the ACF and the PACF of the residuals of the "naïve" fit. The PACF cuts off after lag 1 and the ACF trails off, so the residuals appear to follow an AR(1) process.

arima.fit <- sarima(salesnew, 1, 0, 0, xreg = cbind(leadnew), details = F)


As shown in Figure 4.7, the diagnostic plots for the model, when the error terms are treated as an ARMA(1,0) process, look very good. The normality of the white noise residuals, the ACF of the white noise residuals, and the tests of dependence all show desirable properties.

library(stargazer)  # produces the LaTeX table below
stargazer(arima.fit$fit,
          covariate.labels = c("$\\phi$", "Intercept",
                               "$\\Delta \\text{lead}_{t-3}$"),
          dep.var.labels = c("$\\Delta \\text{sales}_t$"),
          label = "tab:prob35h",
          title = "Coefficients of the model for $\\Delta \\text{sales}_t$",
          table.placement = "ht")
Table 4.1 shows the estimates of the coefficients of the model. The AR(1) term (φ) is statistically significant, as are the intercept and the coefficient on Δlead_{t-3}.
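As a cross-check (a sketch, not part of the original analysis), the same regression with AR(1) errors can be fit directly with stats::arima, which should give essentially the same estimates:

# Regression of salesnew on leadnew with AR(1) errors via stats::arima
arima(salesnew, order = c(1, 0, 0), xreg = leadnew)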

Figure 4.7: Diagnostic plots for the model for the sales series with ARMA(1,0) error terms

Table 4.1: Coefficients of the model for Δsales_t

                     Dependent variable: Δsales_t
  φ                       0.645  (0.063)
  Intercept               0.362  (0.177)
  Δlead_{t-3}             2.788  (0.143)
  Observations            146
  Log Likelihood          168.717
  σ²                      0.588
  Akaike Inf. Crit.       345.433
  Note: *p<0.1; **p<0.05; ***p<0.01

5 Multiplicative Seasonal ARIMA Models

5.1 ACF of an ARIMA(p, d, q) × (P, D, Q)_s Model

The problem asks for a plot of the theoretical ACF of a seasonal ARIMA(0, 0, 1) × (1, 0, 0)_12 model, with Φ = 0.8 and θ = 0.5. This model is:

x_t = 0.8 x_{t-12} + w_t + 0.5 w_{t-1}    (5.1)

The ACF is computed and plotted below:

ACF <- ARMAacf(ar = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .8),
               ma = c(.5))
plot(ACF, type = "h", xlab = "lag", xlim = c(1, 15), ylim = c(-.5, 1))
abline(h = 0)
Figure 5.1 shows the theoretical ACF of the process.
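As a sanity check (a sketch, not part of the original analysis), the empirical ACF of a long simulated path from model (5.1) should approximate this theoretical ACF:

# Simulate a long path from x_t = .8 x_{t-12} + w_t + .5 w_{t-1}
set.seed(1)  # arbitrary seed for reproducibility
x <- arima.sim(n = 1e4, model = list(ar = c(rep(0, 11), .8), ma = .5))
acf(x, lag.max = 15)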

Figure 5.1: ACF of a seasonal ARIMA process