GUIDED READING
TIME SERIES
Supervisor: Prof. Lajos Horvath
Author: Curtis Miller
ARIMA Models
Curtis Miller
November 10, 2015
1 ESTIMATION
1.1 AR(2) MODEL FOR cmort
To estimate the AR(2) process, I first use ordinary least squares (OLS) and then the Yule-Walker estimator. Both fits are shown in the R code below:
library(astsa)  # provides the cmort series
# OLS estimate
# demean = T fits the model to cmort - mean(cmort)
# intercept defaults to demean in ar.ols, so an intercept is still estimated
cmort.ar2.ols <- ar.ols(cmort, order = 2, demean = T)
# Yule-Walker estimate
cmort.ar2.yw <- ar.yw(cmort, order = 2, demean = T)
# OLS estimate
cmort.ar2.ols
##
## Call:
## ar.ols(x = cmort, order.max = 2, demean = T)
##
## Coefficients:
##      1       2
## 0.4286  0.4418
##
## Intercept: -0.04672 (0.2527)
##
## Order selected 2  sigma^2 estimated as  32.32
# Yule-Walker estimate
cmort.ar2.yw
##
## Call:
## ar.yw.default(x = cmort, order.max = 2, demean = T)
##
## Coefficients:
##      1       2
## 0.4339  0.4376
##
## Order selected 2  sigma^2 estimated as  32.84
Looking at the coefficients of the AR(2) model estimated using the two methods, I see very
little difference. OLS and Yule-Walker estimation produce similar results.
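The same comparison can be sketched on simulated data. The series x.sim and its coefficients below are hypothetical stand-ins chosen for illustration, not the cmort data, and aic = FALSE is added so that both fits are forced to order 2.

```r
# Sketch on simulated data (not cmort): compare OLS and Yule-Walker AR(2) fits
set.seed(1)
x.sim <- arima.sim(model = list(ar = c(0.43, 0.44)), n = 500)

# aic = FALSE forces order 2 rather than letting AIC pick an order <= 2
fit.ols <- ar.ols(x.sim, order.max = 2, aic = FALSE, demean = TRUE)
fit.yw  <- ar.yw(x.sim, order.max = 2, aic = FALSE, demean = TRUE)

# ar.ols stores its coefficients in a 3-d array; drop() flattens it to a vector
coef.ols <- drop(fit.ols$ar)
coef.yw  <- fit.yw$ar

comparison <- rbind(OLS = coef.ols, YuleWalker = coef.yw)
print(round(comparison, 4))
print(max(abs(coef.ols - coef.yw)))  # largest gap between the two estimates
```

With a few hundred observations the two estimators differ mainly through how they treat the ends of the sample, so a small gap is what one would expect.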
1.1.2 STANDARD ERROR COMPARISON
##
## Call:
## ar.yw.default(x = ar1.sim, order.max = 1)
##
## Coefficients:
##      1
## 0.8946
##
## Order selected 1  sigma^2 estimated as  1.144
tsboot(ar1.sim, function(d) {
return(ar.yw(d, order = 1)$ar)
}, R = 2000)
##
## Bootstrap Statistics :
##      original  bias    std. error
## t1* 0.8946416       0           0
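The bootstrap standard error is reported as exactly zero, while the theoretical standard error is non-zero. A zero here most likely reflects the call rather than the data: tsboot defaults to sim = "model", and when no ran.gen function is supplied every replicate is the original series, so all 2000 estimates coincide.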
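One way to obtain a non-degenerate bootstrap standard error is to resample the fitted residuals and rebuild the series by hand. The sketch below does this in base R on a simulated stand-in, ar1.demo (a hypothetical name, with an assumed length of 100 and coefficient 0.9), and compares the result with the large-sample formula sqrt((1 - phi^2) / n). It is an illustration under those assumptions, not the procedure used above.

```r
# Sketch: model-based (residual) bootstrap for the Yule-Walker AR(1) coefficient.
# ar1.demo is a simulated stand-in, not the ar1.sim series used above.
set.seed(42)
n <- 100
ar1.demo <- arima.sim(model = list(ar = 0.9), n = n)

fit <- ar.yw(ar1.demo, order.max = 1, aic = FALSE)
phi.hat <- fit$ar
res <- fit$resid[!is.na(fit$resid)]  # the first residual is NA
res <- res - mean(res)               # centre the residuals before resampling

boot.phi <- replicate(500, {
  # rebuild a series from resampled residuals and the fitted coefficient
  e.star <- sample(res, n + 50, replace = TRUE)
  x.star <- stats::filter(e.star, filter = phi.hat, method = "recursive")
  x.star <- as.numeric(x.star)[-(1:50)]  # drop the burn-in
  ar.yw(x.star, order.max = 1, aic = FALSE)$ar
})

se.boot <- sd(boot.phi)
se.asym <- sqrt((1 - phi.hat^2) / n)  # large-sample standard error for AR(1)
print(c(bootstrap = se.boot, asymptotic = se.asym))
```

With tsboot itself, the analogous fix would be to pass a ran.gen function that simulates from the fitted model, or to switch to a block scheme such as sim = "fixed" with a block length l.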
[Figure: three panels plotting Observed / Fitted against Time]
I see disturbing trends in the diagnostic plots shown in Figure 3.1. The residuals should look like white noise, but their variance appears to decrease over time, and the Q-Q plot suggests non-normality. The ACF of the residuals and the p-values for the Ljung-Box statistic, however, look as they should. Still, other models (probably ones that do not assume Gaussian white noise) may fit better.
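The checks read off the diagnostic plots can also be run as formal tests. The sketch below uses a simulated AR(2) series (y.sim, a hypothetical stand-in) and base R functions only; the choice of 20 lags is an assumption made for illustration.

```r
# Sketch on simulated data: test AR(2) residuals for leftover autocorrelation
set.seed(7)
y.sim <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 300)
fit <- arima(y.sim, order = c(2, 0, 0))

# fitdf = 2 removes the two estimated AR coefficients from the degrees of freedom
lb <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)
print(lb)

# Shapiro-Wilk is one formal companion to the Q-Q plot for normality
sw <- shapiro.test(as.numeric(residuals(fit)))
print(sw$p.value)
```

A large Ljung-Box p-value is consistent with uncorrelated residuals, while a small Shapiro-Wilk p-value would point to the kind of non-normality the Q-Q plot suggests.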
# Prepare layout: four stacked panels that share one time axis
old.par <- par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(4, 1),
               cex.axis = .75)
plot(oil, xaxt = 'n')
mtext(text = "Oil Price", side = 2, line = 2, cex = .75)
plot(log(oil), xaxt = 'n')
mtext(text = "Natural Logarithm of Oil Price", side = 2, line = 2, cex = .75)
plot(diff(oil), xaxt = 'n')
mtext(text = "First Difference in Oil Price", side = 2, line = 2, cex = .75)
plot(diff(log(oil)))
mtext(text = "Percent Change in Oil Price", side = 2, line = 2, cex = .75)
The first plot in Figure 3.2 shows that oil prices clearly are not a stationary process, and
it appears that the variance of the process increases with time. Taking the natural log of oil
prices helps control the increasing variability, but not the nonstationary behavior of the series. When looking at the change in oil price from one period to the next, I do see a process
that looks more stationary, but the nonconstant variance is not removed. The final attempt
is to look at the differences in the natural log of oil prices (which can be interpreted as the
percentage change in oil prices). This appears to be stationary and with a mostly constant
variance. However, there are large deviations around 2009, and even prior, that would lead
one to conclude that the white noise is not Gaussian, which threatens estimation and inference.
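The percentage-change reading of the log difference rests on a first-order approximation. Writing $r_t$ for the one-period proportional change in price (a symbol introduced here for illustration only), a short derivation is:

```latex
\nabla \log(\text{oil}_t)
  = \log(\text{oil}_t) - \log(\text{oil}_{t-1})
  = \log\!\left(\frac{\text{oil}_t}{\text{oil}_{t-1}}\right)
  = \log(1 + r_t) \approx r_t,
\qquad
r_t = \frac{\text{oil}_t - \text{oil}_{t-1}}{\text{oil}_{t-1}} .
```

The approximation $\log(1 + r_t) \approx r_t$ is accurate when $|r_t|$ is small and degrades for large moves such as those around 2009.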
I now look at the ACF and PACF of ∇log(oil_t):
[Figure 3.1: diagnostic plots showing standardized residuals against time, ACF of residuals, normal Q-Q plot of the standardized residuals, and p-values for the Ljung-Box statistic]
[Figure 3.2: oil, log(oil), diff(oil), and diff(log(oil)) plotted against time]
##          ar3      ar4     ar5      ar6     ar9  constant
##       0.1844  -0.0713  0.0486  -0.0715  0.0525    0.0017
## s.e.  0.0436   0.0442  0.0444   0.0443
The ninth lag does not appear statistically significant, so I drop the number of lags down to
eight. I now use the following ARIMA model (with diagnostic plots shown):
##          ar3      ar4     ar5      ar6  constant
##       0.1814  -0.0689  0.0448  -0.0621    0.0017
## s.e.  0.0436   0.0442  0.0443   0.0437    0.0026
The residuals clearly do not appear to be Gaussian: large price movements make this assumption doubtful, and the Q-Q plot does not support normality. The ACF of the residuals is large at a few distant lags but is otherwise within the band of reasonable values. The p-values of the Ljung-Box statistics suggest no remaining dependence in the residuals at large lags. This may be the best fit an ARIMA model can provide.
Figure 3.3: Sample ACF and PACF for percentage change in oil price
Figure 3.4: Diagnostic plots for the ARIMA(8,1,0) model for the log(oil) series
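par(old.par)
# Fit an ARIMA(0,2,1) model to sales (sarima is from the astsa package)
sales.model <- sarima(sales, p = 0, d = 2, q = 1, details = F)
# Print only the coefficient block of the captured output
write(capture.output(sales.model)[8:14], "")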
## Coefficients:
##           ma1
##       -0.7480
## s.e.   0.0662
##
## sigma^2 estimated as 1.866:  aic = 517.14
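Written out, the fitted ARIMA(0,2,1) model takes the following form. This assumes R's sign convention for the moving-average term, in which the MA coefficient enters with a plus sign; the symbols $\theta$ and $\sigma_w^2$ are introduced here for illustration.

```latex
\nabla^2 \text{sales}_t = w_t + \hat{\theta}\, w_{t-1}
                        = w_t - 0.7480\, w_{t-1},
\qquad \hat{\sigma}_w^2 = 1.866 .
```

A rough 95% interval for $\theta$ from the reported standard error is $-0.7480 \pm 1.96 \times 0.0662$, or about $(-0.878, -0.618)$, which excludes zero.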
[Figure: sales, diff(sales), and diff(diff(sales)) plotted against time]
Figure 4.2: Sample ACF and PACF for second order difference in sales
Figure 4.3: Diagnostic plots for the ARIMA(0,2,1) model for the sales series
Looking at the diagnostic plots in Figure 4.3, the ARIMA(0,2,1) model seems to fit well. The error terms appear Gaussian, the residuals show no strong autocorrelations, and the error terms do not appear to be dependent.
4.1.2 RELATIONSHIP BETWEEN sales AND lead
I examine the CCF of sales and lead and a lag plot of sales_t against lead_{t-3} to determine whether a regression involving these variables is reasonable.
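To show how a CCF reveals a lead-lag relationship, the sketch below builds two simulated series in which one follows the other by three periods. The names lead.sim and sales.sim are hypothetical stand-ins and the data are white noise based, so this illustrates the reading of the plot rather than the behavior of the real series.

```r
# Sketch with simulated series (hypothetical stand-ins for sales and lead)
set.seed(3)
n <- 200
z <- rnorm(n + 3)
lead.sim  <- z[4:(n + 3)]                     # lead.sim[t] = z[t + 3]
sales.sim <- 2 * z[1:n] + rnorm(n, sd = 0.5)  # sales.sim[t] = 2 * lead.sim[t - 3] + noise

# ccf(x, y) at lag k estimates cor(x[t + k], y[t])
cc <- ccf(sales.sim, lead.sim, lag.max = 10, plot = FALSE)
peak.lag <- cc$lag[which.max(abs(cc$acf))]
print(peak.lag)  # 3: sales.sim follows lead.sim by three periods
```

Under R's convention, a peak at a positive lag of ccf(sales.sim, lead.sim) means the second series leads the first, which is the pattern that would motivate regressing on a lagged value of the leading series.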
##         ACF  PACF
##  [1,]  0.59  0.59
##  [2,]  0.40  0.09
##  [3,]  0.34  0.11
##  [4,]  0.31  0.10
##  [5,]  0.23 -0.02
##  [6,]  0.15 -0.04
##  [7,]  0.13  0.03
##  [8,]  0.13  0.03
##  [9,]  0.01 -0.15
## [10,]  0.02  0.07
## [11,]  0.09  0.10
[Figure: sample CCF of sales and lead plotted against lag]
[Figure: lag plots of sales(t) against lead(t-0), lead(t-1), lead(t-2), and lead(t-3), with panel correlations of 0.95, 0.95, 0.94, and 0.94]
Figure 4.6: Sample ACF and PACF for residuals from linear fit
##         ACF  PACF
## [12,]  0.01 -0.13
## [13,] -0.01  0.03
## [14,] -0.07 -0.09
## [15,] -0.07 -0.04
## [16,] -0.02  0.09
## [17,] -0.05 -0.03
## [18,] -0.03  0.02
## [19,]  0.04  0.11
## [20,]  0.05  0.03
## [21,]  0.02 -0.07
## [22,]  0.00 -0.01
## [23,] -0.01 -0.04
Figure 4.6 shows the ACF and the PACF of the residuals of the "naïve" fit. The PACF cuts off after lag 1 and the ACF tails off, so the residuals appear to follow an AR(1) process.
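The AR(1) signature described above can be reproduced on simulated data. In this sketch e.sim is a hypothetical AR(1) series with coefficient 0.6, chosen to be close to the lag-1 value seen in the residuals; the band 2/sqrt(n) is the usual approximate 95% limit under white noise.

```r
# Sketch on simulated data: the ACF/PACF signature of an AR(1) process
set.seed(11)
e.sim <- arima.sim(model = list(ar = 0.6), n = 500)

acf.vals  <- drop(acf(e.sim, lag.max = 10, plot = FALSE)$acf)[-1]  # drop lag 0
pacf.vals <- drop(pacf(e.sim, lag.max = 10, plot = FALSE)$acf)

# Approximate 95% band for a sample ACF or PACF under white noise
band <- 2 / sqrt(length(e.sim))
print(round(cbind(lag = 1:10, ACF = acf.vals, PACF = pacf.vals), 2))
print(which(abs(pacf.vals) > band))  # lags where the PACF leaves the band
```

For an AR(1) process the ACF decays geometrically while the PACF is large only at lag 1, which is the pattern used to pick the error model here.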
stargazer(arima.fit$fit,
covariate.labels = c("$\\phi$", "Intercept",
"$\\Delta \\text{lead}_{t-3}$"),
dep.var.labels = c("$\\Delta \\text{sales}_t$"),
label = "tab:prob35h",
title = "Coefficients of the model for $\\Delta \\text{sales}_t$",
table.placement = "ht")
Table 4.1 shows the estimates of the coefficients of the model. The AR(1) term (φ) is statistically significant, as are the intercept and the coefficient on the Δlead_{t-3} term.
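The significance claims can be checked directly from the estimates and standard errors reported in Table 4.1. The sketch below uses a normal approximation for the test statistics, which is an assumption; the vector names are chosen here for illustration.

```r
# Estimates and standard errors as reported in Table 4.1
est <- c(phi = 0.645, intercept = 0.362, dlead.lag3 = 2.788)
se  <- c(phi = 0.063, intercept = 0.177, dlead.lag3 = 0.143)

z <- est / se            # approximate z-statistics
p <- 2 * pnorm(-abs(z))  # two-sided p-values under a normal approximation
print(round(cbind(estimate = est, se = se, z = z, p.value = p), 4))
```

All three ratios exceed 1.96 in absolute value. The intercept is the borderline case, with a ratio of about 2.05, so it is significant at the 5% level but not at the 1% level.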
$$x_t = .8x_{t-12} + w_t + .5w_{t-1} \qquad (5.1)$$
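The theoretical autocorrelations of model (5.1) can be computed with ARMAacf from base R by writing the seasonal AR term as a lag-12 coefficient padded with zeros. This is a sketch of one way to obtain them; the choice of 36 lags is an assumption.

```r
# Theoretical ACF and PACF of x_t = .8 x_{t-12} + w_t + .5 w_{t-1}
ar.coefs <- c(rep(0, 11), 0.8)  # only the lag-12 AR coefficient is nonzero
ma.coefs <- 0.5

rho    <- ARMAacf(ar = ar.coefs, ma = ma.coefs, lag.max = 36)
phi.kk <- ARMAacf(ar = ar.coefs, ma = ma.coefs, lag.max = 36, pacf = TRUE)

# ARMAacf names its ACF output by lag, starting at "0"
print(round(rho[c("0", "1", "11", "12", "13", "24")], 3))
```

For this model the values can be verified by hand: ρ(1) = .5/1.25 = .4, ρ(12) = .8, ρ(11) = ρ(13) = .8 × .4 = .32, and ρ(24) = .64.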
Figure 4.7: Diagnostic plots for the model for the sales series with ARMA(1,0) error terms
Table 4.1: Coefficients of the model for Δsales_t

                      Δsales_t
φ                     0.645 (0.063)
Intercept             0.362 (0.177)
Δlead_{t-3}           2.788 (0.143)
Observations          146
Log Likelihood        -168.717
σ²                    0.588
Akaike Inf. Crit.     345.433
Note:
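Putting the estimates from Table 4.1 together, the fitted model can be written as a regression with AR(1) errors. This assumes the usual parameterization in R's arima with an external regressor, where the intercept belongs to the regression and the AR term describes the error process $x_t$ (a symbol introduced here for illustration).

```latex
\Delta \text{sales}_t = 0.362 + 2.788\, \Delta \text{lead}_{t-3} + x_t,
\qquad
x_t = 0.645\, x_{t-1} + w_t,
\qquad
\hat{\sigma}_w^2 = 0.588 .
```

The reported log likelihood and AIC are consistent with four estimated parameters, since $-2(-168.717) + 2 \times 4 \approx 345.43$.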
[Figure: ACF plotted against lag]