3 ARIMA Models - 3.1 Autoregressive Moving Average Models
Aaron Smith
2022-11-30

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway and David S. Stoffer: https://github.com/nickpoison/tsa4

The most recent version of the astsa package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the NEWS and ChangeLog files are at https://github.com/nickpoison/astsa/blob/master/NEWS.md

The webpages for the texts and some help on using R for time series analysis can be found at https://nickpoison.github.io/

UCF students can download it for free through the library.

Classical regression only allows the dependent variable to be influenced by current values of the independent variables.

In time series analysis, we allow the dependent variable to be influenced by the past values of the independent variables and by its own past values.

3.1.1 Introduction to Autoregressive Models


Autoregressive models are based on the idea that the current value of the series, $x_t$, can be explained as a function of $p$ past values, $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$, where $p$ determines the number of steps into the past needed to forecast the current value.

Consider the case

$$x_t = x_{t-1} - 0.90 x_{t-2} + w_t$$

where $w_t$ is Gaussian white noise with variance $\sigma_w^2 = 1$.

N <- 5
n <- 500
M <- as.data.frame(
x = matrix(
data = NA,
nrow = n + 2,
ncol = N
)
)
for(j in 1:N){
v <- rep(NA,n+2)
v[1:2] <- runif(
n = 2,
min = -10,
max = 10
)
for(k in 1:n){
v[k+2] <- v[k+1] - 0.9*v[k] + rnorm(1)
}
M[,j] <- v
}
M$time <- (-1):n
gather_M <- tidyr::gather(
data = M[1:20,],
key = "time_series",
value = "value",
-time
)
library(ggplot2)
ggplot(gather_M) +
aes(x = time,y = value,color = time_series) +
geom_line() +
theme_bw() +
theme(legend.position = "none")


We have now assumed the current value is a particular linear function of past values.

If regularity persists, then forecasting for such a model might be a possibility.

$$x_{n+1}^n = x_n - 0.9 x_{n-1}$$

where $x_{n+1}^n$ is the forecasted value of $x_{n+1}$ using $n$ observed data points.

The feasibility of such a model can be assessed using autocorrelation and lagged scatter plot matrices.

astsa::acf1(
series = M$V1,
max.lag = 10
)

## [1] 0.51 -0.39 -0.87 -0.52 0.28 0.77 0.52 -0.18 -0.66 -0.51

astsa::lag1.plot(
series = M$V1,
max.lag = 4
)


astsa::scatter.hist(
x = M$V1,
y = c(
M$V1[-1],M$V1[nrow(M)]
)
)

The lagged scatterplot matrix for the Southern Oscillation Index (SOI), indicates that lags 1 and 2 are linearly associated with the current value.

data(
list = c(
"soi","rec"
),
package = "astsa"
)
astsa::lag1.plot(
series = soi,
max.lag = 12
)


The ACF shows relatively large positive values at lags 1, 2, 12, 24, and 36 and large negative values at 18, 30, and 42.

astsa::acf1(
series = soi,
max.lag = 36
)

## [1] 0.60 0.37 0.21 0.05 -0.11 -0.19 -0.18 -0.10 0.05 0.22 0.36 0.41
## [13] 0.31 0.10 -0.06 -0.17 -0.29 -0.37 -0.32 -0.19 -0.04 0.15 0.31 0.35
## [25] 0.25 0.10 -0.03 -0.16 -0.28 -0.37 -0.32 -0.16 -0.02 0.17 0.33 0.39

We note also the possible relation between the SOI and Recruitment series indicated in the scatterplot matrix.

astsa::lag2.plot(
series1 = soi,
series2 = rec,
max.lag = 8
)


Definition 3.1 autoregressive model of order p


An autoregressive model of order p, abbreviated AR(p), is of the form

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \phi_p x_{t-p} + w_t$$

where $x_t$ is stationary, $w_t$ is white noise with expected value zero and constant variance, the $\phi_j$ are constant coefficients, $\phi_p \neq 0$, and $E(x_t) = 0$.

If $E(x_t) = \mu \neq 0$, then we can modify the model to get back to an expected value of zero:

$$(x_t - \mu) = \phi_1(x_{t-1} - \mu) + \phi_2(x_{t-2} - \mu) + \ldots + \phi_p(x_{t-p} - \mu) + w_t$$
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \phi_p x_{t-p} + w_t + \alpha, \qquad \alpha = \mu(1 - \phi_1 - \ldots - \phi_p)$$

Some equations are easier to work with using the backshift operator.

(B 0 − ϕ 1B 1 − ϕ 2B 2 − . . . − ϕ pB p)x t = w t

Definition 3.2 The autoregressive operator


The autoregressive operator is defined to be

ϕ(B) = (B 0 − ϕ 1B 1 − ϕ 2B 2 − . . . − ϕ pB p)

Combining this notation with an autoregressive model, we get

ϕ(B)x t = w t

Example 3.1 The AR(1) Model


Let’s look at a first-order model, AR(1), given by

x t = ϕx t − 1 + w t

Iterating backwards k times, we get

$$x_t = \phi x_{t-1} + w_t = \phi^2 x_{t-2} + \phi w_{t-1} + w_t = \ldots = \phi^k x_{t-k} + \sum_{j=0}^{k-1}\phi^j w_{t-j}$$

Going backwards all the way gives

$$x_t = \phi^t x_0 + \sum_{j=0}^{t-1}\phi^j w_{t-j}$$

This naturally leads to using infinite series. To make the math work out, we accept the assumptions that $|\phi| < 1$ and $\sup_t \operatorname{var}(x_t) < \infty$. Under these assumptions,

$$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}$$

Theorem A.1 Mean Square Convergence of a Sequence


Let $\{x_n\}$ be a sequence in $L^2$. Then there exists $x \in L^2$ such that

$$x_n \overset{ms}{\to} x \iff \lim_{m \to \infty}\sup_{n \geq m} E\left(x_n - x_m\right)^2 = 0$$

Proposition: Convergence of an AR(1) model


When x t is an AR(1) process,

$$x_t = \phi x_{t-1} + w_t \quad \text{with } |\phi| < 1, \ \sup_t \operatorname{var}(x_t) < \infty,$$

then $x_t \overset{ms}{\to} \sum_{j=0}^{\infty}\phi^j w_{t-j}$ and $\sum_{j=0}^{\infty}\phi^j w_{t-j} \in L^2$.

Proof:
By Theorem A.1 we need to show that

$$\lim_{m \to \infty}\sup_{n \geq m} E\left(x_n - x_m\right)^2 = 0$$
Without loss of generality, say that $n = m + d$, $d \geq 0$.

$$(x_{m+d} - x_m)^2 = \left(\left[\phi^{m+d}x_0 + \sum_{j=0}^{m+d-1}\phi^j w_{m+d-j}\right] - \left[\phi^m x_0 + \sum_{j=0}^{m-1}\phi^j w_{m-j}\right]\right)^2$$

$$= \left((\phi^d - 1)\phi^m x_0 + \sum_{j=m}^{m+d-1}\phi^j w_{m+d-j}\right)^2$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + 2(\phi^d - 1)\phi^m x_0\sum_{j=m}^{m+d-1}\phi^j w_{m+d-j} + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}w_{m+d-j}w_{m+d-k}$$

Take the expected value.

$$E(x_{m+d} - x_m)^2 = (\phi^d - 1)^2\phi^{2m}x_0^2 + 2(\phi^d - 1)\phi^m x_0\sum_{j=m}^{m+d-1}\phi^j E(w_{m+d-j}) + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}E(w_{m+d-j}w_{m+d-k})$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sum_{j=m}^{m+d-1}\phi^{2j}E(w_{m+d-j}^2)$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sigma_w^2\sum_{j=m}^{m+d-1}\phi^{2j}$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sigma_w^2\,\frac{\phi^{2m} - \phi^{2m+2d+2}}{1 - \phi^2}$$

$$= \phi^{2m}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

Let's take the supremum with respect to $d \geq 0$.

$$\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = \sup_{d \geq 0}\,\phi^{2m}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

Consider the sequence defined by the bracketed term with $d$ as the index. Since $|\phi| < 1$, the term in the bracket converges to $x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}$ as $d \to \infty$. This means that for any given $\epsilon > 0$, there is only a finite number of terms greater than $x_0^2 + \frac{\sigma_w^2}{1 - \phi^2} + \epsilon$. Since there is a finite number of such terms, the supremum is achieved as a maximum and it is finite.

$$\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = \phi^{2m}\max_{d \geq 0}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

$$\lim_{m \to \infty}\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = 0$$
This shows that the AR(1) process converges. The formula for x t that uses the summation with all past w t gives the limit of the series.

Proposition: Convergence of an AR(1) model

$$\lim_{k \to \infty} E\left(x_t - \sum_{j=0}^{k-1}\phi^j w_{t-j}\right)^2 = \lim_{k \to \infty}\phi^{2k}E\left(x_{t-k}^2\right) = 0$$

Combining the equations before the proof we get

$$x_t = \phi x_{t-1} + w_t, \qquad x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}, \qquad \sum_{j=0}^{\infty}\phi^j w_{t-j} = \phi\sum_{j=0}^{\infty}\phi^j w_{t-j-1} + w_t$$

This gives an expected value of zero.

$$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}$$

$$E(x_t) = E\left(\sum_{j=0}^{\infty}\phi^j w_{t-j}\right) = \sum_{j=0}^{\infty}\phi^j E(w_{t-j}) = 0$$

Autocovariance function (the computation here is different from the book):

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t)$$
$$= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t+h-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t-(j-h)}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\left(\sum_{j=-h}^{\infty}\phi^{j+h} w_{t-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\sum_{j=-h}^{\infty}\sum_{k=0}^{\infty}\phi^{j+h}\phi^k w_{t-j}w_{t-k}\right]$$
$$= \sum_{j=-h}^{\infty}\sum_{k=0}^{\infty}\phi^{j+k+h}E[w_{t-j}w_{t-k}] \quad (\text{the expected value is zero when } t-j \neq t-k)$$
$$= \sum_{k=0}^{\infty}\phi^{2k+h}E[w_{t-k}^2]$$
$$= \sigma_w^2\phi^h\sum_{k=0}^{\infty}\phi^{2k} \quad (|\phi| < 1)$$
$$= \sigma_w^2\phi^h\frac{1}{1 - \phi^2}$$

Recall that

γ(h) = γ( − h)

The autocorrelation of our AR(1) model is

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\sigma_w^2\phi^h\frac{1}{1-\phi^2}}{\sigma_w^2\phi^0\frac{1}{1-\phi^2}} = \phi^h$$

Recursive formula for autocorrelation:

$$\rho(h) = \phi\,\rho(h-1)$$
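As a quick numerical check (a sketch, not in the text, using base R's ARMAacf()), we can compare the theoretical AR(1) ACF with $\phi^h$:

phi <- 0.9
ARMAacf(ar = phi, lag.max = 5)  # theoretical ACF at lags 0 through 5
phi^(0:5)                       # should match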

Example 3.2 The Sample Path of an AR(1) Process


The figure below shows a time plot of two AR(1) processes with $\sigma_w^2 = 1$: one with $\phi = 0.9$ and $\rho(h) = 0.9^h$, and one with $\phi = -0.9$ and $\rho(h) = (-0.9)^h$, for $h \geq 0$.

In the first case observations close together in time are positively correlated with each other. This result means that observations at contiguous
time points will tend to be close in value to each other.

This fact shows up in the first figure as a very smooth sample path for x t.

Contrast this with the case in which ϕ = − 0.9, so that ρ(h) = ( − 0.9) h, for h ≥ 0. This result means that observations at contiguous time points are
negatively correlated but observations two time points apart are positively correlated.

This fact shows up in the second figure, where, for example, if an observation, x t, is positive, the next observation, x t + 1, is typically negative, and
the next observation, x t + 2, is typically positive. Thus, in this case, the sample path is very choppy.


#par(mfrow=c(2,1))
# in the expressions below, ~ is a space and == is equal
astsa::tsplot(
x = astsa::sarima.sim(
ar = 0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
AR(1)~~~phi==+0.9
)
)
abline(
h = 0,
col = "red"
)

astsa::tsplot(
x = astsa::sarima.sim(
ar = -0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
AR(1)~~~phi==-0.9
)
)
abline(
h = 0,
col = "red"
)


Example 3.3 Explosive AR Models and Causality


Random walks are not stationary;

$$x_t = x_{t-1} + w_t$$

is not stationary.

Consider an AR(1) model with | ϕ | > 1. Such processes are called explosive because the values of the time series quickly become large in
magnitude.

Clearly, because $|\phi|^j$ increases without bound as $j \to \infty$,

$$\sum_{j=0}^{k-1}\phi^j w_{t-j}$$

will not converge as $k \to \infty$.

Let’s reverse the recursive equation:

$$x_{t+1} = \phi x_t + w_{t+1}$$
$$x_t = \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1}$$
$$x_t = \phi^{-1}(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}) - \phi^{-1}w_{t+1}$$
$$\vdots$$
$$x_t = \phi^{-k}x_{t+k} - \sum_{j=1}^{k}\phi^{-j}w_{t+j}$$

Since $|\phi^{-1}| < 1$, this gives a stationary AR(1) model, but one that is future dependent. Unfortunately, such a model is useless for forecasting.

Example 3.4 Every Explosion Has a Cause


Excluding explosive models from consideration is not a problem because the models have causal counterparts.

For example, if

$$x_t = \phi x_{t-1} + w_t, \quad |\phi| > 1, \qquad w_t \sim \text{iid Normal}(0, \sigma_w^2)$$

then $\{x_t\}$ is a non-causal stationary Gaussian process with

$$E(x_t) = 0, \qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(-\sum_{j=1}^{\infty}\phi^{-j}w_{t+h+j}, \ -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j}\right) = \sigma_w^2\phi^{-h}\frac{\phi^{-2}}{1 - \phi^{-2}}$$

Let

$$y_t = \phi^{-1}y_{t-1} + v_t, \qquad v_t \sim \text{iid Normal}(0, \sigma_w^2\phi^{-2})$$

Then $x_t$ and $y_t$ are stochastically the same; all finite-dimensional distributions of the two processes are the same.

Example:

If

$$x_t = 2x_{t-1} + w_t, \qquad \sigma_w^2 = 1$$

then

$$y_t = \tfrac{1}{2}y_{t-1} + v_t, \qquad \sigma_v^2 = \tfrac{1}{4}$$

is an equivalent causal process.
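As a quick numerical check (a sketch, not in the text), the covariance formula above gives identical autocovariances for the explosive model and its causal counterpart:

h <- 0:5
gamma_x <- 1 * 2^(-h) * 2^(-2)/(1 - 2^(-2))  # explosive model, formula from Example 3.4
gamma_y <- (1/4) * (1/2)^h/(1 - (1/2)^2)     # causal AR(1) counterpart
cbind(gamma_x, gamma_y)                      # the two columns agree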


Iterating backwards
To iterate backwards, let’s invoke the backwards operator.

As a first step consider the AR(1) model.


$$x_t = \phi x_{t-1} + w_t, \quad |\phi| < 1, \quad w_t \sim \text{iid Normal}(0, \sigma_w^2), \qquad x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}, \qquad \phi(B)x_t = w_t, \qquad \phi(B) = B^0 - \phi B$$

Let’s define an operator for the stochastic term.

$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t, \qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j$$

Putting the two operators together we see that

ϕ(B)ψ(B)w t = w t

The coefficients on the left must match the coefficients on the right

$$(B^0 - \phi B)(B^0 + \psi_1 B^1 + \psi_2 B^2 + \ldots + \psi_j B^j + \ldots) = B^0$$
$$B^0 + (\psi_1 - \phi)B + (\psi_2 - \psi_1\phi)B^2 + \ldots + (\psi_j - \psi_{j-1}\phi)B^j + \ldots = B^0$$

Matching the coefficients we see that

$$\psi_0 = 1, \qquad \psi_1 = \phi, \qquad \psi_j = \psi_{j-1}\phi$$

Leading to the solution

ψj = ϕj

Another approach is to invoke an inverse, $\phi^{-1}(B)$.

$$\phi(B)x_t = w_t, \qquad \phi^{-1}(B)\phi(B)x_t = \phi^{-1}(B)w_t, \qquad x_t = \phi^{-1}(B)w_t$$

Thus we see that

ϕ − 1(B) = ψ(B)

Consider the polynomial function and its rational function inverse

$$\phi(z) = 1 - \phi z, \ |z| < 1, \qquad \phi(z)^{-1} = \frac{1}{1 - \phi z} = 1 + \phi z + \phi^2 z^2 + \ldots$$

We will use similar polynomial/backshift-operator techniques when we discuss ARMA models.
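As a quick check of the $\psi$-weights (a sketch, not in the text; ARMAtoMA() is in base R and omits $\psi_0 = 1$):

ARMAtoMA(ar = 0.9, ma = 0, lag.max = 10)  # psi-weights of a causal AR(1)
0.9^(1:10)                                # should match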

3.1.2 Introduction to Moving Average Models


Another way to model a time series is to assume that the time series is a linear combination of white noise.

Definition 3.3 The moving average model


The moving average model of order q, or MA(q) model, is defined to be

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \theta_3 w_{t-3} + \ldots + \theta_q w_{t-q}$$

where $w_t \sim \text{wn}(0, \sigma_w^2)$, the $\theta_j$ are constant parameters, and $\theta_q \neq 0$.

Note: Some software and texts write the moving average model with negative signs on the coefficients. Check the help documentation before using.

The moving average model with backshift notation


x t = θ(B)w t

Definition 3.4 The moving average operator


The moving average operator is

$$\theta(B) = B^0 + \theta_1 B^1 + \theta_2 B^2 + \ldots + \theta_q B^q$$

Note that a moving average model is stationary.

Example 3.5 The MA(1) Process


Consider the MA(1) model

x t = w t + θw t − 1

then


$$E(x_t) = 0, \qquad \gamma(h) = \begin{cases}(1+\theta^2)\sigma_w^2 & h = 0 \\ \theta\sigma_w^2 & h = 1 \\ 0 & h > 1\end{cases} \qquad \rho(h) = \begin{cases}1 & h = 0 \\ \dfrac{\theta}{1+\theta^2} & h = 1 \\ 0 & h > 1\end{cases}$$

Some quick calculus shows that

$$|\rho(1)| \leq \tfrac{1}{2}, \text{ and the bounds are achieved for } \theta = \pm 1$$

Also notice that $\rho(h)$ is the same whether the coefficient is $\theta$ or $\frac{1}{\theta}$:

$$\frac{\frac{1}{\theta}}{1 + \left(\frac{1}{\theta}\right)^2} = \frac{\theta}{\theta^2 + 1}$$

Replacing $\theta$ with $\frac{1}{\theta}$ and $\sigma_w^2$ with $\theta^2\sigma_w^2$:

$$\gamma(h) = \begin{cases}\left(1 + \frac{1}{\theta^2}\right)(\theta^2\sigma_w^2) = (\theta^2 + 1)\sigma_w^2 & h = 0 \\ \frac{1}{\theta}(\theta^2\sigma_w^2) = \theta\sigma_w^2 & h = 1 \\ 0 & h > 1\end{cases}$$

An MA(1) process has zero autocorrelation at lags two and greater, while an AR(1) process never has zero autocorrelation.
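As a quick check (a sketch, not in the text), the MA(1) ACF from ARMAacf() matches $\theta/(1+\theta^2)$ at lag 1 and is zero afterwards:

theta <- 0.9
ARMAacf(ma = theta, lag.max = 4)
theta/(1 + theta^2)  # compare with the lag-1 value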

Notice how much smoother the MA(1) model with θ = 0.9 is than θ = − 0.9.

#par(mfrow=c(2,1))
astsa::tsplot(
x = astsa::sarima.sim(
ma = 0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
MA(1)~~~theta==+0.9
)
)


astsa::tsplot(
x = astsa::sarima.sim(
ma = -0.9,
n = 100
),
col = 4,
ylab = "",
main=expression(
MA(1)~~~theta==-0.9
)
)

Example 3.6 Non-uniqueness of MA Models and Invertibility


$$x_t = w_t + \theta w_{t-1}$$

These two MA(1) models have the same autocorrelation, the same autocovariance, and are stochastically the same:

$$\sigma_w^2 = 25, \ \theta = \tfrac{1}{5}: \quad x_t = w_t + \tfrac{1}{5}w_{t-1}, \quad w_t \sim \text{iid Normal}(0, 25)$$
$$\sigma_v^2 = 1, \ \theta = 5: \quad y_t = v_t + 5v_{t-1}, \quad v_t \sim \text{iid Normal}(0, 1)$$

$$\gamma(h) = \begin{cases}26 & h = 0 \\ 5 & h = 1 \\ 0 & h > 1\end{cases}$$

If we observed one of these processes, we would not be able to mathematically tell which one we were looking at. When we have to select a
model, we prefer to use a model that is invertible. Choose the model with | θ | < 1
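As a quick numerical check (a sketch, not in the text), both parameterizations give $\gamma(0) = 26$ and $\gamma(1) = 5$:

c((1 + (1/5)^2)*25, (1/5)*25)  # theta = 1/5, sigma_w^2 = 25
c((1 + 5^2)*1, 5*1)            # theta = 5,   sigma_v^2 = 1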

$$x_t = w_t + \theta w_{t-1}$$
$$w_t = x_t - \theta w_{t-1}$$
$$w_t = x_t - \theta(x_{t-1} - \theta w_{t-2}) = x_t - \theta x_{t-1} + \theta^2 w_{t-2}$$
$$w_t = x_t - \theta x_{t-1} + \theta^2(x_{t-2} - \theta w_{t-3}) = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \theta^3 w_{t-3}$$
$$\vdots$$
$$w_t = (-\theta)^{k+1}w_{t-k-1} + \sum_{j=0}^{k}(-\theta)^j x_{t-j}$$

If $|\theta| < 1$, using $w_t = x_t - \theta w_{t-1}$ and iterating backwards we get


$$w_t = \sum_{j=0}^{\infty}(-\theta)^j x_{t-j}$$

For our example we would choose the invertible representation

$$\sigma_w^2 = 25, \ \theta = \tfrac{1}{5}: \quad x_t = w_t + \tfrac{1}{5}w_{t-1}, \quad w_t \sim \text{iid Normal}(0, 25), \qquad w_t = \sum_{j=0}^{\infty}\left(-\tfrac{1}{5}\right)^j x_{t-j}$$

rather than $y_t = v_t + 5v_{t-1}$ with $v_t \sim \text{iid Normal}(0, 1)$, whose AR($\infty$) inversion does not converge.

Series/polynomial tools for analyzing MA(1) models


$$x_t = \theta(B)w_t, \qquad \theta(B) = B^0 + \theta B$$

If $|\theta| < 1$, then

$$\pi(B)x_t = w_t, \qquad \pi(B) = \theta^{-1}(B)$$

Let

$$\theta(z) = 1 + \theta z. \quad \text{If } |\theta| < 1, \quad \pi(z) = \theta^{-1}(z) = \frac{1}{1 + \theta z} = \sum_{j=0}^{\infty}(-\theta)^j z^j, \qquad \pi(B) = \sum_{j=0}^{\infty}(-\theta)^j B^j$$

3.1.3 Autoregressive Moving Average Models


Definition 3.5 ARMA(p,q)

A time series $\{x_t \mid t \in \mathbb{Z}\}$ is ARMA($p$, $q$) if it is stationary and

$$x_t = \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k}, \qquad \phi_p \neq 0, \ \theta_q \neq 0, \ w_t \sim \text{wn}(0, \sigma_w^2), \ \sigma_w^2 > 0$$

If the time series has a non-zero expected value, we adjust the model to get zero expected value.

$$\alpha = \mu\left(1 - \sum_{j=1}^{p}\phi_j\right), \qquad x_t = \alpha + \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$

If p = 0, then the model is a moving average model.


If q = 0, then the model is an autoregressive model

Let’s move all the autoregressive terms to the left hand side of the equation.

$$x_t - \sum_{j=1}^{p}\phi_j x_{t-j} = w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$
$$x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \ldots - \phi_p x_{t-p} = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \ldots + \theta_q w_{t-q}$$

Invoking the backshift operator, we can write this equation as

ϕ(B)x t = θ(B)w t

This presentation illuminates a potential pitfall while modeling. If $\phi(B)x_t = \theta(B)w_t$ is the correct model, but we mistakenly multiply both sides of the equation by another operator on $B$, $\eta(B)$, then we get a mathematically correct equation that will lead to over-parameterization.

$$\eta(B)\phi(B)x_t = \eta(B)\theta(B)w_t$$

Example 3.7 Parameter Redundancy


Consider the white noise process ARMA(0,0)

$$x_t = w_t$$

Say that while fitting our model, we make the error of multiplying both sides of the equation by $\eta(B) = (B^0 - \tfrac{1}{2}B)$.

$$(B^0 - \tfrac{1}{2}B)x_t = (B^0 - \tfrac{1}{2}B)w_t, \qquad x_t - \tfrac{1}{2}x_{t-1} = w_t - \tfrac{1}{2}w_{t-1}, \qquad x_t = \tfrac{1}{2}x_{t-1} + w_t - \tfrac{1}{2}w_{t-1}$$

The correct model is ARMA(0,0), but we go with an ARMA(1,1) model. x t is white noise, but we missed that fact.

Notice that the R code gives a statistically significant incorrect model.

The intercept is estimating the mean.

set.seed(
seed = 823
)
rnorm_5 = rnorm(
n = 100,
mean = 5
) # generate iid N(5,1)s
arima(
x = rnorm_5,
order = c(
1,0,1
) # since the observations are random noise, 0,0,0 is the correct order
)

##
## Call:
## arima(x = rnorm_5, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.7567 0.8308 4.9066
## s.e. 0.2621 0.2264 0.1028
##
## sigma^2 estimated as 0.9723: log likelihood = -140.51, aic = 289.02

Our over-parameterized model is

$$(B^0 + 0.76B)x_t = (B^0 + 0.83B)w_t$$
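For comparison (a sketch, not in the original text), we could also fit the correct ARMA(0,0) order, which estimates only the mean; its AIC can then be compared with the over-parameterized fit above.

arima(
  x = rnorm_5,
  order = c(0,0,0)  # the correct order for white noise
)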


Three major problems with ARMA(p,q) models


parameter redundant models (over-parameterized)
stationary autoregressive models that depend on the future
moving average models that are not unique

Definition 3.6 AR and MA polynomials


The AR and MA polynomials are defined as

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p, \ \phi_p \neq 0, \qquad \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \ldots + \theta_q z^q, \ \theta_q \neq 0, \qquad z \in \mathbb{C}$$

To protect us from parameter redundant models, we will require that ϕ(z) and θ(z) do not have a common factor. This will help protect from
incorrectly multiplying the correct model by an extraneous operator.

Definition 3.7 Causal ARMA(p,q)


An ARMA(p, q) model is said to be causal, if the time series {x t | t ∈ Z} can be written as a one-sided linear process:

$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t, \qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j, \qquad \sum_{j=0}^{\infty}|\psi_j| < \infty, \qquad \psi_0 = 1$$

Example
The AR(1) process

x t = ϕx t − 1 + w t

is causal when | ϕ | < 1 or equivalently the root of ϕ(z) = 1 − ϕz is greater than one in magnitude.

Property 3.1 Causality of an ARMA(p, q) Process


An ARMA(p,q) model is causal if and only if $\phi(z) \neq 0$ for all $|z| \leq 1$. The coefficients of the linear process can be determined by solving

$$\psi(z) = \sum_{j=0}^{\infty}\psi_j z^j = \frac{\theta(z)}{\phi(z)}, \qquad |z| < 1$$

Another way to phrase this property is that an ARMA process is causal only when the roots of ϕ(z) lie outside the unit circle; that is, ϕ(z) = 0 only
when | z | > 1.
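As a quick check (a sketch, not in the text), we can verify causality numerically by computing the roots of $\phi(z)$ with polyroot() and checking that their moduli exceed one:

# AR(2) with phi_1 = 1.5, phi_2 = -0.75: phi(z) = 1 - 1.5 z + 0.75 z^2
Mod(polyroot(c(1, -1.5, 0.75)))  # both moduli are greater than 1, so the model is causal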

Finally, to address the problem of uniqueness, we choose the model that allows an infinite autoregressive representation.

Definition 3.8 Invertible ARMA(p,q)


An ARMA(p, q) model is said to be invertible, if the time series {x t | t ∈ Z} can be written as

$$\pi(B)x_t = \sum_{j=0}^{\infty}\pi_j x_{t-j} = w_t, \qquad \pi(B) = \sum_{j=0}^{\infty}\pi_j B^j, \qquad \sum_{j=0}^{\infty}|\pi_j| < \infty, \qquad \pi_0 = 1$$

Property 3.2 Invertibility of an ARMA(p, q) Process


An ARMA(p,q) model is invertible if and only if $\theta(z) \neq 0$ for all $|z| \leq 1$. The coefficients $\pi_j$ of $\pi(B)$ can be determined by solving

$$\pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)}, \qquad |z| < 1$$

An ARMA process is invertible only when the roots of $\theta(z)$ lie outside the unit circle; that is, $\theta(z) = 0$ only when $|z| > 1$.

Example 3.8 Parameter Redundancy, Causality, Invertibility


$$x_t = 0.4x_{t-1} + 0.45x_{t-2} + w_t + w_{t-1} + 0.25w_{t-2}$$
$$x_t - 0.4x_{t-1} - 0.45x_{t-2} = w_t + w_{t-1} + 0.25w_{t-2}$$
$$(B^0 - 0.4B - 0.45B^2)x_t = (B^0 + B + 0.25B^2)w_t$$
$$\phi(z) = 1 - 0.4z - 0.45z^2, \qquad \theta(z) = 1 + z + 0.25z^2$$

This looks like an ARMA(2,2) model, but

$$\phi(z) = 1 - 0.4z - 0.45z^2 = (1 + 0.5z)(1 - 0.9z), \qquad \theta(z) = 1 + z + 0.25z^2 = (1 + 0.5z)^2$$

Reducing the model, we get an ARMA(1,1) model.

$$x_t = 0.9x_{t-1} + 0.5w_{t-1} + w_t, \qquad (1 - 0.9B)x_t = (1 + 0.5B)w_t, \qquad \phi(z) = 1 - 0.9z, \quad \theta(z) = 1 + 0.5z$$

Let’s find ψ(z) s.t.

$$\psi(z) = \frac{\theta(z)}{\phi(z)}, \quad |z| < 1, \qquad \phi(z)\psi(z) = \theta(z)$$
$$(1 - 0.9z)\sum_{j=0}^{\infty}\psi_j z^j = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j - 0.9z\sum_{j=0}^{\infty}\psi_j z^j = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j + \sum_{j=0}^{\infty}(-0.9)\psi_j z^{j+1} = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j + \sum_{j=1}^{\infty}(-0.9)\psi_{j-1} z^{j} = 1 + 0.5z$$
$$\psi_0 + \sum_{j=1}^{\infty}(\psi_j - 0.9\psi_{j-1})z^j = 1 + 0.5z$$

This gives us these equations:

$$\psi_0 = 1, \qquad \psi_1 - 0.9\psi_0 = 0.5, \qquad \psi_j - 0.9\psi_{j-1} = 0 \ \forall j > 1$$

$$\psi_0 = 1, \qquad \psi_j = 1.4\cdot 0.9^{j-1} \ \forall j > 0, \qquad \psi(z) = 1 + \sum_{j=1}^{\infty}1.4\cdot 0.9^{j-1}z^j, \qquad x_t = w_t + \sum_{j=1}^{\infty}\tfrac{14}{9}\cdot 0.9^j w_{t-j}$$

Let’s compute the ψ(z) coefficients manually.

c(
1,
(14/9)*(0.9^{1:10})
)

## [1] 1.0000000 1.4000000 1.2600000 1.1340000 1.0206000 0.9185400 0.8266860


## [8] 0.7440174 0.6696157 0.6026541 0.5423887

Let’s use R to find the ψ(z) coefficients. Notice that it omits ψ 0

ARMAtoMA(
ar = 0.9,
ma = 0.5,
lag.max = 10
) # first 10 psi-weights

## [1] 1.4000000 1.2600000 1.1340000 1.0206000 0.9185400 0.8266860 0.7440174


## [8] 0.6696157 0.6026541 0.5423887

Now, let’s find π(z) s.t.

$$\phi(z) = 1 - 0.9z, \qquad \theta(z) = 1 + 0.5z, \qquad \pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)}, \quad |z| < 1, \qquad \theta(z)\pi(z) = \phi(z)$$
$$(1 + 0.5z)\sum_{j=0}^{\infty}\pi_j z^j = 1 - 0.9z$$
$$\sum_{j=0}^{\infty}\pi_j z^j + \sum_{j=0}^{\infty}0.5\pi_j z^{j+1} = 1 - 0.9z$$
$$\sum_{j=0}^{\infty}\pi_j z^j + \sum_{j=1}^{\infty}0.5\pi_{j-1} z^{j} = 1 - 0.9z$$
$$\pi_0 + \sum_{j=1}^{\infty}(\pi_j + 0.5\pi_{j-1})z^j = 1 - 0.9z$$

This leads to the equations:

$$\pi_0 = 1, \qquad \pi_1 + 0.5\pi_0 = -0.9, \qquad \pi_j + 0.5\pi_{j-1} = 0 \ \forall j > 1$$

Solving gives these results. Notice how the exponents directly match the subscript (different from textbook).

$$\pi_0 = 1, \qquad \pi_j = (-1.4)\cdot(-0.5)^{j-1} = 2.8\cdot(-0.5)^j, \qquad w_t = x_t + \sum_{j=1}^{\infty}2.8\cdot(-0.5)^j x_{t-j}$$

Let’s manually compute the coefficients, notice that the coefficients get cut in half as the index increases

c(
1,
2.8*((-1/2)^(1:10))
)

## [1] 1.000000000 -1.400000000 0.700000000 -0.350000000 0.175000000


## [6] -0.087500000 0.043750000 -0.021875000 0.010937500 -0.005468750
## [11] 0.002734375

Let’s use the astsa package to compute the weights of π(z)

astsa::ARMAtoAR(
ar = 0.9,
ma = 0.5,
lag.max = 10
) # first 10 pi-weights

## [1] -1.400000000 0.700000000 -0.350000000 0.175000000 -0.087500000


## [6] 0.043750000 -0.021875000 0.010937500 -0.005468750 0.002734375

Example 3.9 Causal Conditions for an AR(2) Process


For an AR(1) model,

(1 − ϕB)x t = w t

to be causal, the root of

ϕ(z) = 1 − ϕz

must lie outside of the unit circle. In this case,

$$\phi(z) = 0 \text{ when } z = 1/\phi, \text{ which lies outside the unit circle exactly when } |\phi| < 1$$

It is not so easy to establish this relationship for higher order models.

For example, the AR(2) model,

(1 − ϕ 1B − ϕ 2B 2)x t = w t

is causal when both of the two roots are outside the unit circle.

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2, \qquad z_{\pm} = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2}, \qquad |z_{\pm}| > 1$$

$$\phi_1 = \frac{1}{z_+} + \frac{1}{z_-}, \qquad \phi_2 = \frac{-1}{z_+ z_-}, \qquad \phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1, \quad |\phi_2| < 1$$

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 = (1 - z_+^{-1}z)(1 - z_-^{-1}z)$$
M <- expand.grid(
phi_1 = seq(
from = -3,
to = 3,
length = 100
),
phi_2 = seq(
from = -2,
to = 1,
length = 100
),
root_p = complex(
real = NA,
imaginary = NA
),
root_m = complex(
real = NA,
imaginary = NA
)
)
M <- M[M$phi_2 != 0,]
M$discriminant = M$phi_1^2 + 4*M$phi_2
M$root_p[M$discriminant >= 0] <- complex(
real = (-M$phi_1 + sqrt(M$phi_1^2 + 4*M$phi_2))/(2*M$phi_2),
imaginary = 0
)[M$discriminant >= 0]

## Warning in sqrt(M$phi_1^2 + 4 * M$phi_2): NaNs produced

M$root_m[M$discriminant >= 0] <- complex(


real = (-M$phi_1 - sqrt(M$phi_1^2 + 4*M$phi_2))/(2*M$phi_2),
imaginary = 0
)[M$discriminant >= 0]

## Warning in sqrt(M$phi_1^2 + 4 * M$phi_2): NaNs produced

M$root_p[M$discriminant < 0] <- complex(


real = -M$phi_1/(2*M$phi_2),
imaginary = sqrt(-M$phi_1^2 - 4*M$phi_2)/(2*M$phi_2)
)[M$discriminant < 0]

## Warning in sqrt(-M$phi_1^2 - 4 * M$phi_2): NaNs produced

M$root_m[M$discriminant < 0] <- complex(


real = -M$phi_1/(2*M$phi_2),
imaginary = -sqrt(-M$phi_1^2 - 4*M$phi_2)/(2*M$phi_2)
)[M$discriminant < 0]

## Warning in sqrt(-M$phi_1^2 - 4 * M$phi_2): NaNs produced


M$Mod_root_p <- Mod(


z = M$root_p
)
M$Mod_root_m <- Mod(
z = M$root_m
)
M$min_Mod_root <- apply(
X = M[,c("Mod_root_m","Mod_root_p")],
MARGIN = 1,
FUN = min
)
M$roots[M$discriminant >= 0 & M$min_Mod_root > 1] <- "real roots, outside unit circle"
M$roots[M$discriminant >= 0 & M$min_Mod_root <= 1] <- "real roots, inside unit circle"
M$roots[M$discriminant < 0 & M$min_Mod_root > 1] <- "complex roots, outside unit circle"
M$roots[M$discriminant < 0 & M$min_Mod_root <= 1] <- "complex roots, inside unit circle"
library(ggplot2)
ggplot(M) + aes(x = phi_1,y = phi_2,color = roots) +
geom_point() +
coord_fixed() +
geom_hline(yintercept = 0)


# this is how Figure 3.3 was generated


seg1 = seq(
from = 0,
to = 2,
by = 0.1
)
seg2 = seq(
from = -2,
to = 2,
by = 0.1
)
name1 = expression(
phi[1]
)
name2 = expression(
phi[2]
)
astsa::tsplot(
x = seg1,
y = 1-seg1,
ylim = c(-1,1),
xlim = c(-2,2),
ylab = name2,
xlab = name1,
main = 'Causal Region of an AR(2)'
)
lines(
x = -seg1,
y = 1-seg1,
ylim = c(-1,1),
xlim = c(-2,2)
)
abline(
h = 0,
v = 0,
lty = 2,
col = 8
)
lines(
x = seg2,
y = -(seg2^2/4),
ylim = c(-1,1)
)
lines(
x = c(-2,2),
y = c(-1,-1),
ylim = c(-1,1)
)
text(
x = 0,
y = .35,
labels = 'real roots'
)
text(
x = 0,
y = -0.5,
labels = 'complex roots'
)


3.2 Difference Equations


The study of ARMA processes and their autocorrelation functions is greatly enhanced by a basic knowledge of difference equations, because the ACFs of ARMA processes satisfy difference equations.

Suppose $\{u_n\}$ is a time series such that

$$u_n - \alpha u_{n-1} = 0, \qquad \alpha \neq 0, \quad n \in \mathbb{N}$$

To solve the equation, iterate:

$$u_1 = \alpha u_0, \quad u_2 = \alpha u_1 = \alpha^2 u_0, \quad u_3 = \alpha u_2 = \alpha^3 u_0, \quad \ldots, \quad u_n = \alpha^n u_0$$

Operator notation:

(B 0 − αB)u n = 0

Associated polynomial:

$$\alpha(z) = 1 - \alpha z, \qquad z_0 = \frac{1}{\alpha}, \qquad \alpha = z_0^{-1}$$

If we know the initial condition $u_0 = c$, then

$$u_n = (z_0^{-1})^n c$$
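A quick numerical check (a sketch, not in the text): iterating the difference equation reproduces $\alpha^n u_0$.

alpha <- 0.8
u <- numeric(11)
u[1] <- 3  # u_0 = 3 (R vectors are 1-indexed)
for(n in 2:11) u[n] <- alpha*u[n - 1]
rbind(u, 3*alpha^(0:10))  # the two rows agree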

The solution to the difference equation depends on

the initial condition


the root of the associated polynomial

Autocorrelation of AR(1)
Here is an example of such a sequence:

The autocorrelation function of an AR(1) model $x_n = \phi x_{n-1} + w_n$ satisfies

$$\rho(h) - \phi\rho(h-1) = 0$$

degree 2 difference equation


$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0, \qquad \alpha_2 \neq 0, \quad n = 2, 3, 4, \ldots$$

$$u_n z^n - \alpha_1 u_{n-1}z^n - \alpha_2 u_{n-2}z^n = 0$$
$$u_n z^n - \alpha_1 z\,u_{n-1}z^{n-1} - \alpha_2 z^2\,u_{n-2}z^{n-2} = 0$$
$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=2}^{\infty}u_{n-1}z^{n-1} - \alpha_2 z^2\sum_{n=2}^{\infty}u_{n-2}z^{n-2} = 0$$
$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=1}^{\infty}u_n z^n - \alpha_2 z^2\sum_{n=0}^{\infty}u_n z^n = 0$$

Set


$$U(z) = \sum_{n=0}^{\infty}u_n z^n$$

$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=1}^{\infty}u_n z^n - \alpha_2 z^2\sum_{n=0}^{\infty}u_n z^n = 0$$
$$(U(z) - u_0 - u_1 z) - \alpha_1 z(U(z) - u_0) - \alpha_2 z^2 U(z) = 0$$
$$U(z) - u_0 - u_1 z - \alpha_1 z U(z) + \alpha_1 z u_0 - \alpha_2 z^2 U(z) = 0$$
$$(1 - \alpha_1 z - \alpha_2 z^2)U(z) = u_0 + (u_1 - \alpha_1 u_0)z$$

Now let’s take the partial fraction decomposition of the right hand side.

If the associated quadratic has two distinct roots, z 1 ≠ z 2, then for some constants c 1, c 2:

$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_1^{-1}z)(1 - z_2^{-1}z), \quad z_1 \neq z_2, \qquad U(z) = \frac{c_1}{1 - z_1^{-1}z} + \frac{c_2}{1 - z_2^{-1}z}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_1^{-1}z)^n + c_2\sum_{n=0}^{\infty}(z_2^{-1}z)^n, \qquad |z_1^{-1}z| < 1, \ |z_2^{-1}z| < 1$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}z_1^{-n}z^n + c_2\sum_{n=0}^{\infty}z_2^{-n}z^n$$

If the associated quadratic is square with root z 0, then we get the following equations. Notice that undetermined constants which are multiplied
together are merged into c 2.

$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2, \qquad U(z) = \frac{c_1}{1 - z_0^{-1}z} + \frac{c_2}{(1 - z_0^{-1}z)^2}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_0^{-1}z)^n + c_2\frac{d}{dz}\left[\frac{1}{1 - z_0^{-1}z}\right] \quad (c_2 \text{ is merged with the constant from the antiderivative})$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}z_0^{-n}z^n + c_2\sum_{n=0}^{\infty}(n+1)z_0^{-(n+1)}z^n$$

Check that the solutions solve the original equation


$$u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$$
$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0$$
$$c_1 z_1^{-n} + c_2 z_2^{-n} - \alpha_1(c_1 z_1^{-n+1} + c_2 z_2^{-n+1}) - \alpha_2(c_1 z_1^{-n+2} + c_2 z_2^{-n+2}) = 0$$
$$c_1 z_1^{-n} - \alpha_1 c_1 z_1^{-n+1} - \alpha_2 c_1 z_1^{-n+2} + c_2 z_2^{-n} - \alpha_1 c_2 z_2^{-n+1} - \alpha_2 c_2 z_2^{-n+2} = 0$$
$$c_1 z_1^{-n}(1 - \alpha_1 z_1 - \alpha_2 z_1^2) + c_2 z_2^{-n}(1 - \alpha_1 z_2 - \alpha_2 z_2^2) = 0$$
$$c_1 z_1^{-n}(0) + c_2 z_2^{-n}(0) = 0$$

When the associated polynomial is square, our solution is:

$$u_n = c_1 z_0^{-n} + c_2(n+1)z_0^{-(n+1)}$$

Factoring the associated polynomial gives:

$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0, \qquad 1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2 = 1 - 2z_0^{-1}z + z_0^{-2}z^2$$
$$-\alpha_1 = -2z_0^{-1}, \qquad -\alpha_2 = z_0^{-2}, \qquad u_n - 2z_0^{-1}u_{n-1} + z_0^{-2}u_{n-2} = 0$$

Plugging our solution into the difference equations with factored coefficients shows our solution is correct.

$$c_1 z_0^{-n} + c_2(n+1)z_0^{-(n+1)} - 2z_0^{-1}\left(c_1 z_0^{-n+1} + c_2 n z_0^{-n}\right) + z_0^{-2}\left(c_1 z_0^{-n+2} + c_2(n-1)z_0^{-n+1}\right) = 0$$
$$\left(c_1 z_0^{-n} - 2c_1 z_0^{-n} + c_1 z_0^{-n}\right) + \left(c_2 n z_0^{-n-1} - 2c_2 n z_0^{-n-1} + c_2 n z_0^{-n-1}\right) + \left(c_2 z_0^{-n-1} - c_2 z_0^{-n-1}\right) = 0$$

Example 3.10 The ACF of an AR(2) Process


Suppose we have a causal AR(2) process, multiply both sides of the equation by x t − h, take expected values, then divide by γ(0).

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$$
$$x_t x_{t-h} = \phi_1 x_{t-1}x_{t-h} + \phi_2 x_{t-2}x_{t-h} + w_t x_{t-h}$$
$$E(x_t x_{t-h}) = \phi_1 E(x_{t-1}x_{t-h}) + \phi_2 E(x_{t-2}x_{t-h}) + E(w_t x_{t-h})$$
$$\gamma(h) = \phi_1\gamma(h-1) + \phi_2\gamma(h-2) + E\left(w_t\sum_{j=0}^{\infty}\psi_j w_{t-h-j}\right) = \phi_1\gamma(h-1) + \phi_2\gamma(h-2), \quad h > 0$$

This gives us a difference equation with an associated polynomial.

$$\rho(h) - \phi_1\rho(h-1) - \phi_2\rho(h-2) = 0, \qquad \phi(z) = 1 - \phi_1 z - \phi_2 z^2$$

Let’s take the initial conditions.

$$\rho(0) = 1, \qquad \rho(1) = \rho(-1) = \frac{\phi_1}{1 - \phi_2}$$
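A quick check of the initial conditions (a sketch, not in the text) for $\phi_1 = 1.5$, $\phi_2 = -0.75$:

ARMAacf(ar = c(1.5, -0.75), lag.max = 1)  # rho(0) and rho(1)
1.5/(1 - (-0.75))                         # phi_1/(1 - phi_2), matches rho(1)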

When the roots of the associated polynomial are distinct and real:

$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h}, \qquad \rho(h) = c_1 z_1^{-h} + (1 - c_1)z_2^{-h}$$

When the associated polynomial is square:

$$\rho(h) = (1 - c_2 z_0^{-1})z_0^{-h} + c_2(h+1)z_0^{-(h+1)}$$

When the roots are complex, the associated polynomial has real coefficients, so the roots are conjugate. To get real ρ(h), the constants will need to
be conjugate.

$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h}, \qquad \rho(h) = c_1 z_1^{-h} + \bar{c}_1\bar{z}_1^{-h}$$

Write the roots in polar coordinates:

$$z_1 = |z_1|e^{\theta i}, \qquad \bar{z}_1 = |z_1|e^{-\theta i}$$

$$\rho(h) = c_1|z_1|^{-h}e^{-h\theta i} + \bar{c}_1|z_1|^{-h}e^{h\theta i}, \qquad \rho(h) = a|z_1|^{-h}\cos(h\theta + b)$$

where $a$ and $b$ are constants determined by the initial conditions.


Example 3.11 An AR(2) with Complex Roots


Let’s consider the model

$$x_t = 1.5x_{t-1} - 0.75x_{t-2} + w_t, \qquad \sigma_w^2 = 1, \qquad \text{roots} = 1 \pm i/\sqrt{3}, \qquad \theta = \arctan(1/\sqrt{3}) = 2\pi/12, \qquad 1/12 \text{ cycles per unit time}$$

Set coefficients of autoregressive model, establish polynomial.

coef_phi <- c(
1,-1.5,0.75
) # coefficients of the polynomial
polyroot_phi <- polyroot(
z = coef_phi
)
a <- polyroot_phi[1] # = 1+0.57735i, print one root which is 1 + i 1/sqrt(3)
Arg_a = Arg(
z = a
)/(2*pi) # arg in cycles/pt
1/Arg_a # = 12, the period

## [1] 12

Simulate the autoregressive time-series

set.seed(
seed = 823
)
sarima.sim_phi = astsa::sarima.sim(
ar = c(
1.5,-.75
),
n = 144,
S = 12
)

## Note that S > 0 but no seasonal parameter is specified

Plot the simulated autoregressive time-series

astsa::tsplot(
x = sarima.sim_phi,
xlab = "Year"
)

Compute and plot the autocorrelation function of the autoregressive model using model (not simulation)


ARMAacf_phi = ARMAacf(
ar = c(
1.5,-0.75
),
ma = 0,
lag.max = 50
)
astsa::tsplot(
x = ARMAacf_phi,
type = "h",
xlab = "lag"
)
abline(
h = 0,
col = 8
)

Convert the autoregressive model to a moving average

# psi-weights - not in text


psi = ts(
data = ARMAtoMA(
ar = c(
1.5,-0.75
),
ma = 0,
lag.max = 50
),
start = 0,
freq = 12
)
astsa::tsplot(
x = psi,
type = 'o',
cex = 1.1,
ylab = expression(
psi-weights
),
xaxt = 'n',
xlab = 'Index'
)
axis(
side = 1,
at = 0:4,
labels = c(
'0','12','24','36','48'
)
)


Example 3.12 The ψ-weights for an ARMA Model


Consider a causal ARMA(p,q) model with roots outside the unit circle.


$$\phi(B)x_t = \theta(B)w_t, \qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}$$

For a pure MA(q) model (p = 0),

$$\psi_0 = 1, \qquad \psi_j = \theta_j, \ j = 1, 2, \ldots, q, \qquad \psi_j = 0, \ j > q$$

Otherwise we need to do some multiplication.

$$\phi(z)\psi(z) = \theta(z), \qquad (1 - \phi_1 z - \ldots - \phi_p z^p)(\psi_0 + \psi_1 z + \psi_2 z^2 + \ldots) = (1 + \theta_1 z + \theta_2 z^2 + \ldots + \theta_q z^q)$$

This gives us these equations:

$$\psi_0 = 1$$
$$\psi_1 - \phi_1\psi_0 = \theta_1$$
$$\psi_2 - \phi_1\psi_1 - \phi_2\psi_0 = \theta_2$$
$$\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 - \phi_3\psi_0 = \theta_3$$

Notice that $\theta_j = 0$ for $j > q$, so for larger $j$ the equations become homogeneous in the $\psi$s.

The actual solution will depend on the roots of the polynomials and the initial conditions.

Create an ARMA model with geometric autoregressive portion, geometric moving average portion

psi = ARMAtoMA(
ar = 0.9,
ma = 0.5,
lag.max = 50
) # for a list
astsa::tsplot(
x = psi,
type = 'h',
ylab = expression(
psi-weights
),
xlab = 'Index'
) # for a graph


3.3 Autocorrelation and Partial Autocorrelation

Behavior of the ACF and PACF of ARMA processes


Metric  AR(p)                 MA(q)                 ARMA(p,q)

ACF     tails off             cuts off after lag q  tails off

PACF    cuts off after lag p  tails off             tails off

3.3.1 The Autocorrelation Function (ACF)


Consider an MA(q) process; it is a finite linear combination of white noise, so its expected value is zero.

$$x_t = \theta(B)w_t, \qquad \theta(B) = B^0 + \theta_1 B + \ldots + \theta_q B^q, \qquad x_t = \sum_{j=0}^{q}\theta_j w_{t-j}, \ \theta_0 = 1, \qquad E(x_t) = \sum_{j=0}^{q}\theta_j E(w_{t-j}) = 0$$

The autocovariance function for a MA(q) process is

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j}, \ \sum_{k=0}^{q}\theta_k w_{t-k}\right) = \begin{cases}\sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0 \leq h \leq q \\ 0 & h > q\end{cases}$$

Recall:

$$\gamma(h) = \gamma(-h)$$

Note:

$$\theta_q \neq 0 \ \Rightarrow \ \gamma(q) = \sigma_w^2\theta_0\theta_q = \sigma_w^2\theta_q \neq 0$$

The autocorrelation of a MA(q) process:

$$\rho(h) = \begin{cases}\dfrac{\sum_{j=0}^{q-h}\theta_j\theta_{j+h}}{\sum_{j=0}^{q}\theta_j^2} & 0 \leq h \leq q \\ 0 & h > q\end{cases}$$
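A quick check of the MA(q) ACF formula (a sketch, not in the text) for an MA(2) with $\theta_1 = 0.5$, $\theta_2 = 0.3$ ($\theta_0 = 1$):

theta <- c(1, 0.5, 0.3)
rho1 <- sum(theta[1:2]*theta[2:3])/sum(theta^2)  # h = 1
rho2 <- theta[1]*theta[3]/sum(theta^2)           # h = 2
c(rho1, rho2)
ARMAacf(ma = c(0.5, 0.3), lag.max = 3)           # matches; lag 3 is zero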

For a causal ARMA(p,q) process

$$\phi(B)x_t = \theta(B)w_t, \qquad |z| > 1 \ \forall z \text{ s.t. } \phi(z) = 0, \qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}, \qquad E(x_t) = E\left(\sum_{j=0}^{\infty}\psi_j w_{t-j}\right) = \sum_{j=0}^{\infty}\psi_j E(w_{t-j}) = 0$$

Autocovariance of a causal ARMA(p,q) process


$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \sigma_w^2\sum_{j=0}^{\infty}\psi_j\psi_{j+h}, \quad h \geq 0, \qquad \rho(h) = \frac{\gamma(h)}{\gamma(0)}$$

Let’s write the autocovariance equation in another way.

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t)$$
$$= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j} + \sum_{j=0}^{q}\theta_j w_{t+h-j}, \ x_t\right)$$
$$= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j}, \ x_t\right) + \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j}, \ x_t\right)$$
$$= \sum_{j=1}^{p}\phi_j\operatorname{cov}(x_{t+h-j}, x_t) + \sum_{j=0}^{q}\theta_j\operatorname{cov}(w_{t+h-j}, x_t)$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\theta_j\operatorname{cov}\left(w_{t+h-j}, \ \sum_{k=0}^{\infty}\psi_k w_{t-k}\right)$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\sum_{k=0}^{\infty}\theta_j\psi_k\operatorname{cov}(w_{t+h-j}, w_{t-k})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\theta_j\psi_{j-h}\operatorname{cov}(w_{t+h-j}, w_{t+h-j})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=h}^{q}\theta_j\psi_{j-h}\operatorname{cov}(w_{t+h-j}, w_{t+h-j})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sigma_w^2\sum_{j=h}^{q}\theta_j\psi_{j-h}$$

The general homogeneous equation for the ACF of a causal ARMA process is

$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = 0, \qquad h \geq \max(p, q+1)$$

with initial conditions

$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = \sigma_w^2\sum_{j=h}^{q}\theta_j\psi_{j-h}, \qquad 0 \leq h \leq \max(p, q+1)$$

Example 3.13 The ACF of an AR(p) (q = 0)


$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = 0, \qquad h \geq p$$

Say that $z_1, \ldots, z_r$ are the roots of $\phi(z)$ with multiplicities $m_1, \ldots, m_r$, where $m_1 + \ldots + m_r = p$. Using difference equations, we see that there are polynomials $P_j(h)$ in $h$ of degree $m_j - 1$ such that

$$\rho(h) = \sum_{j=1}^{r}z_j^{-h}P_j(h), \qquad h \geq p$$

If the process is causal, then | z j | > 1 ∀j, and ρ(h) → 0 exponentially fast as h → ∞. If there are conjugate roots, conjugates will cancel the
imaginary parts and the dampening will be sinusoidal (the time series will appear cyclic).

Example 3.14 The ACF of an ARMA(1,1)


Consider an ARMA(1,1) process

$$x_t = \phi x_{t-1} + \theta w_{t-1} + w_t, \qquad |\phi| < 1$$

The autocovariance function satisfies

$$\gamma(h) - \phi\gamma(h-1) = 0, \ h \geq 2, \qquad \gamma(h) = c\phi^h, \ h \geq 1$$

with initial conditions

$$\gamma(0) = \phi\gamma(1) + \sigma_w^2[1 + \theta\phi + \theta^2], \qquad \gamma(1) = \sigma_w^2\frac{(1+\theta\phi)(\phi+\theta)}{1-\phi^2}$$

$$\gamma(1) = c\phi, \qquad c = \frac{\gamma(1)}{\phi}, \qquad \gamma(h) = \frac{\gamma(1)}{\phi}\phi^h = \sigma_w^2\frac{(1+\theta\phi)(\phi+\theta)}{1-\phi^2}\phi^{h-1}, \qquad \rho(h) = \frac{(1+\theta\phi)(\phi+\theta)}{1+2\theta\phi+\theta^2}\phi^{h-1}, \quad h \geq 1$$

Note: ρ(h) for AR(1) and ARMA(1,1) are similar. We will be unable to tell the difference between AR(1) and ARMA(1,1) using ACF only.
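A quick check (a sketch, not in the text): for $\phi = 0.9$, $\theta = 0.5$, the ARMA(1,1) ACF from ARMAacf() matches the formula above.

phi <- 0.9; theta <- 0.5; h <- 1:5
ARMAacf(ar = phi, ma = theta, lag.max = 5)[-1]  # drop lag 0
(1 + theta*phi)*(phi + theta)/(1 + 2*theta*phi + theta^2)*phi^(h - 1)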


3.3.2 The Partial Autocorrelation Function (PACF)


MA(q) models will have zero ACF for lags greater than q, and ρ(q) ≠ 0.

We can use these facts to identify the order of a moving average process.

However if the process is AR or ARMA, the ACF tells us little.

MA process: use ACF to select your model


AR or ARMA process: use PACF to select model

$$\rho_{XY|Z} = \operatorname{corr}(X - \hat{X}, Y - \hat{Y}), \qquad \hat{X} \text{ is } X \text{ regressed on } Z, \qquad \hat{Y} \text{ is } Y \text{ regressed on } Z$$

ρ XY | Z measures the correlation between X and Y with the linear effect of Z removed (partialled out).

If X, Y, Z are multivariate normal, then

ρ XY | Z = corr(X, Y | Z)

Example
Say that we have an AR(1) model

$$x_t = \phi x_{t-1} + w_t$$
$$\gamma_x(2) = \operatorname{cov}(x_t, x_{t-2}) = \operatorname{cov}(\phi x_{t-1} + w_t, x_{t-2}) = \operatorname{cov}(\phi^2 x_{t-2} + \phi w_{t-1} + w_t, x_{t-2}) = \phi^2\gamma_x(0)$$

$$x_{t+h} = \phi^h x_t + \sum_{j=1}^{h}\phi^{h-j}w_{t+j}, \qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \phi^h\gamma_x(0)$$

x t + h is dependent on x t + h − 1, x t + h − 1 is dependent on x t + h − 2, x t + h − 2 is dependent on x t + h − 3,…,x t + 1 is dependent on x t.

Because of this chain, x t + h is dependent on x t.

For x t and x t − 2, let’s remove the middle.

$$\operatorname{cov}\left(x_t - \phi x_{t-1}, \ x_{t-2} - \frac{1}{\phi}x_{t-1}\right) = \operatorname{cov}\left(w_t, \ -\frac{1}{\phi}w_{t-1}\right) = 0$$
PACF for mean-zero stationary time series
For h ≥ 2, let x̂ t + h be the regression of x t + h onto {x t + h − 1, x t + h − 2, …, x t + 1} (minimizing the mean squared error).

x̂ t + h = β 1x t + h − 1 + β 2x t + h − 2 + … + β h − 1x t + 1

No intercept is needed because E(x t) = 0 ∀t, if this is not the case, replace x t with x t − μ x.

Because of stationarity, the coefficients are the same if we shift the index.

x̂ t = β 1x t − 1 + β 2x t − 2 + … + β h − 1x t − h + 1

Definition 3.9 partial autocorrelation function (PACF)


The partial autocorrelation function (PACF) of a stationary process, x t, denoted ϕ hh, h = 1, 2, 3, … is

$$\phi_{11} = \operatorname{corr}(x_{t+1}, x_t), \qquad \phi_{hh} = \operatorname{corr}(x_{t+h} - \hat{x}_{t+h}, \ x_t - \hat{x}_t), \quad h \geq 2$$

The partial autocorrelation function, ϕ hh, is the correlation between x t + h and x t with the linear dependence on {x t + 1, x t + 2, …, x t + h − 1} removed

If x t is Gaussian, then ϕ hh is the correlation coefficient between x t + h and x t in the bivariate distribution (x t + h, x t) conditioned on
{x t + 1, x t + 2, …, x t + h − 1}.

ϕ hh = corr(x t + h, x t | x t + 1, x t + 2, …, x t + h − 1)

Example 3.15 The PACF of an AR(1)


Consider the AR(1) process

$$x_t = \phi x_{t-1} + w_t, \quad |\phi| < 1, \qquad \phi_{11} = \rho(1) = \phi$$

To calculate ϕ 22, let’s regress x t + 2 onto x t + 1.

$$\hat{x}_{t+2} = \beta x_{t+1}, \qquad E(x_{t+2} - \hat{x}_{t+2})^2 = E(x_{t+2} - \beta x_{t+1})^2 = E(x_{t+2}^2 - 2\beta x_{t+2}x_{t+1} + \beta^2 x_{t+1}^2) = \gamma(0) - 2\beta\gamma(1) + \beta^2\gamma(0)$$

Minimizing the quadratic on β gives

$$\beta = \frac{\gamma(1)}{\gamma(0)} = \frac{\phi\gamma(0)}{\gamma(0)} = \phi$$

ϕ 22 = corr(x t + 2 − x̂ t + 2, x t − x̂ t) = corr(x t + 2 − ϕx t + 1, x t − ϕx t − 1) = corr(w t + 2, w t) = 0

In fact ϕ hh = 0 ∀h > 1.
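A quick check (a sketch, not in the text): the PACF of an AR(1) with $\phi = 0.9$ is $\phi$ at lag 1 and zero afterwards.

ARMAacf(ar = 0.9, lag.max = 5, pacf = TRUE)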


Example 3.16 The PACF of an AR(p)


Say that $x_t$ is an AR(p) process with roots outside the unit circle.

$$x_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j} + w_{t+h}$$

When $h > p$, the regression of $x_{t+h}$ onto $\{x_{t+1}, x_{t+2}, \ldots, x_{t+h-1}\}$ is

$$\hat{x}_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j}$$

When $h > p$, there is zero partial autocorrelation between observations.

$$\phi_{hh} = \operatorname{corr}(x_{t+h} - \hat{x}_{t+h}, \ x_t - \hat{x}_t) = \operatorname{corr}(w_{t+h}, \ x_t - \hat{x}_t) = 0$$

Note that $x_t - \hat{x}_t$ depends on white noise of lower indices.

When $h = p$, $\phi_{pp} = \phi_p \neq 0$, and $\phi_{11}, \phi_{22}, \ldots, \phi_{p-1,p-1}$ may or may not be zero.

Let’s demonstrate this with an AR(2) process.

coef_ar_2 <- c(
1.5,-0.75
)
ARMAacf_2 = ARMAacf(
ar = coef_ar_2,
ma = 0,
lag.max = 24
)[-1]
ARMApacf_2 = ARMAacf(
ar = coef_ar_2,
ma = 0,
lag.max = 24,
pacf = TRUE
)
#par(mfrow=1:2)
astsa::tsplot(
x = ARMAacf_2,
type = "h",
xlab = "lag",
lwd = 3,
nxm = 5,
col=c(
rep(4,11),6
)
)


astsa::tsplot(
x = ARMApacf_2,
type = "h",
xlab = "lag",
lwd = 3,
nxm = 5,
col=c(
rep(4,11),6
)
)

Example 3.17 The PACF of an Invertible MA(q)


Say that we have an invertible MA(q) process.


$$x_t = -\sum_{j=1}^{\infty}\pi_j x_{t-j} + w_t$$

No finite representation exists. The PACF will never cut off (in contrast to an AR(p) process).

If we have a MA(1) process

$$x_t = w_t + \theta w_{t-1}, \qquad |\theta| < 1$$

then

$$\phi_{hh} = -\frac{(-\theta)^h(1-\theta^2)}{1 - \theta^{2(h+1)}}, \qquad h \geq 1$$
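A quick check (a sketch, not in the text): for $\theta = 0.5$, the PACF from ARMAacf() matches this formula.

theta <- 0.5; h <- 1:5
ARMAacf(ma = theta, lag.max = 5, pacf = TRUE)
-(-theta)^h*(1 - theta^2)/(1 - theta^(2*(h + 1)))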

Example 3.18 Preliminary Analysis of the Recruitment Series


Let’s use ACF and PACF to select a model for the rec time series.

The PACF cuts off after lag 2, and the ACF tails off. Let's use an AR(2) model.

data(
list = "rec",
package = "astsa"
)
astsa::acf2(
series = rec,
max.lag = 48
) # will produce values and a graphic


## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.92 0.78 0.63 0.48 0.36 0.26 0.18 0.13 0.09 0.07 0.06 0.02 -0.04
## PACF 0.92 -0.44 -0.05 -0.02 0.07 -0.03 -0.03 0.04 0.05 -0.02 -0.05 -0.14 -0.15
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.12 -0.19 -0.24 -0.27 -0.27 -0.24 -0.19 -0.11 -0.03 0.03 0.06 0.06
## PACF -0.05 0.05 0.01 0.01 0.02 0.09 0.11 0.03 -0.03 -0.01 -0.07 -0.12
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.02 -0.02 -0.06 -0.09 -0.12 -0.13 -0.11 -0.05 0.02 0.08 0.12 0.10
## PACF -0.03 0.05 -0.08 -0.04 -0.03 0.06 0.05 0.15 0.09 -0.04 -0.10 -0.09
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.06 0.01 -0.02 -0.03 -0.03 -0.02 0.01 0.06 0.12 0.17 0.20
## PACF -0.02 0.05 0.08 -0.02 -0.01 -0.02 0.05 0.01 0.05 0.08 -0.04

ar.ols_rec = ar.ols(
x = rec,
order = 2,
demean = FALSE,
intercept = TRUE
) # regression
ar.ols_rec

##
## Call:
## ar.ols(x = rec, order.max = 2, demean = FALSE, intercept = TRUE)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: 6.737 (1.111)
##
## Order selected 2 sigma^2 estimated as 89.72

ar.ols_rec$asy.se.coef # standard errors

## $x.mean
## [1] 1.110599
##
## $ar
## [1] 0.04178901 0.04187942

list_ar <- list()


for(j in c("yule-walker","burg","ols","mle","yw")) list_ar[[j]] <- ar(
x = rec,
method = j,
order.max = 12
)
list_ar


## $`yule-walker`
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8
##
## $burg
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $ols
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: -0.05644 (0.446)
##
## Order selected 2 sigma^2 estimated as 89.72
##
## $mle
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $yw
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8

ar.burg(
x = rec,
order.max = 12
)

##
## Call:
## ar.burg.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34

ar.yw(
x = rec,
order.max = 12
)


##
## Call:
## ar.yw.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8

ar.mle(
x = rec
)

##
## Call:
## ar.mle(x = rec)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34

Plot the time series and the modeled values

predict_rec <- predict(


object = ar.ols_rec,
newdata = rec,
n.ahead = 24
)
ts.plot(
ts.union(
rec,predict_rec$pred
),
col = 1:2
)

3.4 Forecasting
3.4.1 Forecasting AR Processes
When we forecast, we are predicting the future values of our time series, x n + m, m = 1, 2, 3, …, using observed values, x 1 : n = {x 1, x 2, …, x n}.

In this section, we assume our time series is stationary and the model parameters are known.

The minimum mean square error predictor


$$x_{n+m}^n = E(x_{n+m} \mid x_{1:n}), \qquad E[x_{n+m} - E(x_{n+m} \mid x_{1:n})]^2 \leq E[x_{n+m} - g(x_{1:n})]^2 \quad \forall \text{ functions } g \text{ of } x_{1:n}$$


Linear predictors
For an initial look at possible predictors, let’s restrict our attention to linear functions on the data.

Linear predictors only depend on the second moment of the process.

$$x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k, \qquad \alpha_0, \alpha_1, \ldots, \alpha_n \in \mathbb{R}$$

Note: The coefficients depend on both $n$ and $m$; for now we will drop this fact from the notation.

example
If $n = 1$, $m = 1$, then $x_2^1$ is a one-step-ahead forecast of $x_2$ given $x_1$. A linear predictor is $x_2^1 = \alpha_0 + \alpha_1 x_1$.

If $n = 2$, $m = 1$, then $x_3^2$ is a one-step-ahead forecast of $x_3$ given $x_1, x_2$. A linear predictor is $x_3^2 = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2$.

In general, the coefficients of $x_2^1$ and $x_3^2$ will be different.

Best linear predictors


Linear predictors that minimize the mean square prediction are called best linear predictors.

Theorem B.1 Projection Theorem


Let H be a Hilbert space, M a closed subspace of H, then ∀y ∈ H, ∃ ! ŷ ∈ M and ∃ ! z ∈ H − M such that

$y = \hat{y} + z$ is a unique representation of $y$, and

$\langle z, v\rangle = 0 \ \forall v \in M$, and
$\|y - \hat{y}\| \leq \|y - v\| \ \forall v \in M$

Proof
If y ∈ M, then ŷ = y, z = 0.

If $y \notin M$, let $\delta = \inf_{v \in M}\|y - v\|$. As an infimum, $\exists (v_n)$ s.t. $\|y - v_n\| \to \delta$. Select $(v_n)$ s.t. $\|y - v_n\|^2 \leq \delta^2 + 1/n$.

We need to show that $(v_n)$ is a Cauchy sequence. Since $v_n, v_m$ are in $M$, $\left\|y - \frac{v_n + v_m}{2}\right\|^2 \geq \delta^2$.

$$\|v_n - v_m\|^2 = \|(y - v_n) - (y - v_m)\|^2 = \|y - v_n\|^2 + \|y - v_m\|^2 - 2\langle y - v_n, y - v_m\rangle$$
$$= 2\|y - v_n\|^2 + 2\|y - v_m\|^2 - 4\left\|y - \frac{v_n + v_m}{2}\right\|^2$$
$$\leq 2(\delta^2 + 1/n) + 2(\delta^2 + 1/m) - 4\delta^2 = 2(1/n + 1/m)$$

Thus (v n) is a Cauchy sequence in a Hilbert space, therefore the sequence converges. Say to ŷ. This shows existence and the minimum norm
property.

To show uniqueness, say that there are two such vectors ŷ 1, ŷ 2.

$$\|\hat{y}_1 - \hat{y}_2\|^2 = 2\|y - \hat{y}_1\|^2 + 2\|y - \hat{y}_2\|^2 - 4\left\|y - \frac{\hat{y}_1 + \hat{y}_2}{2}\right\|^2 \leq 2\delta^2 + 2\delta^2 - 4\delta^2 = 0$$

Now we need to show that $y - \hat{y}$ is orthogonal to $M$. Let $y_0$ be any length-one vector in $M$, $\|y_0\| = 1$.

$$\langle y - \hat{y} - \alpha y_0, \ y - \hat{y} - \alpha y_0\rangle = \langle y - \hat{y}, y - \hat{y}\rangle + \alpha^2\langle y_0, y_0\rangle - 2\alpha\langle y - \hat{y}, y_0\rangle$$

This is a quadratic in $\alpha$; it is minimized at

$$\alpha = \frac{2\langle y - \hat{y}, y_0\rangle}{2\langle y_0, y_0\rangle} = \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}$$

Substituting this $\alpha$,

$$\left\langle y - \hat{y} - \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}y_0, \ y - \hat{y} - \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}y_0\right\rangle = \langle y - \hat{y}, y - \hat{y}\rangle + \frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle} - 2\frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle}$$
$$= \langle y - \hat{y}, y - \hat{y}\rangle - \frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle} \leq \langle y - \hat{y}, y - \hat{y}\rangle$$

For $\|y - \hat{y}\|$ to be minimal, $\langle y - \hat{y}, y_0\rangle = 0$.

We will skip a lot of details to focus on the meat and potatoes.

For our purpose, let’s set the dot product to the expected value of the product.

< X, Y >= cov(X, Y) = E((X − μ x)(Y − μ y))

Without loss of generality, we will proceed with univariate expected values of zero for ease of notation.

Theorem B.3 For a Gaussian process, the minimum mean square error predictor is
the best linear predictor (the projection is the expected value)
If $(y, x_1, \ldots, x_n)$ is multivariate normal, then

$$E(y \mid x_{1:n}) = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$$

Proof:
Let $E_{M(x)}y$ be the unique element of $M(x)$ that satisfies the orthogonality principle:

$$E[(y - E_{M(x)}y)w] = 0 \quad \forall w \in M(x)$$

Our goal is to show that

$$E_{M(x)}y = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$$

Let $\hat{y} = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$. From the orthogonality principle,

$$\langle y - \hat{y}, x_i\rangle = \operatorname{cov}(y - \hat{y}, x_i) = 0, \qquad i = 0, 1, \ldots, n.$$

Since $\hat{y}$ is fixed, $(y - \hat{y}, x_0, x_1, \ldots, x_n)$ is multivariate normal. Thus zero covariance gives us independence between $y - \hat{y}$ and $x_i$.

From the independence, we can factor the covariance into expected values.

$$0 = \langle y - \hat{y}, w\rangle = E[(y - \hat{y})w] = E(y - \hat{y})E(w), \qquad x_0 = 1, \qquad 0 = \langle y - \hat{y}, x_0\rangle = E[(y - \hat{y})x_0] = E(y - \hat{y}), \qquad E(\hat{y}) = E(y)$$

Property 3.3 Best Linear Prediction for Stationary Processes (the prediction
equations)
Given observations $x_1, x_2, \ldots, x_n$, the coefficients of the best linear predictor $x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k$ of $x_{n+m}$, for $m \geq 1$, solve

$$E[(x_{n+m} - x_{n+m}^n)x_k] = 0, \qquad k = 0, 1, \ldots, n$$

where $x_0 = 1$.

We can solve for $\alpha_0, \alpha_1, \ldots, \alpha_n$ by minimizing $Q = E\left(x_{n+m} - \sum_{k=0}^{n}\alpha_k x_k\right)^2$ with respect to the $\alpha$s; $\frac{\partial Q}{\partial\alpha_j} = 0$, $j = 0, 1, \ldots, n$.

Set k = 0.

$$E[(x_{n+m} - x_{n+m}^n)\cdot 1] = 0, \qquad E(x_{n+m}^n) = E(x_{n+m}) = \mu$$
$$E\left(\alpha_0 + \sum_{k=1}^{n}\alpha_k x_k\right) = \mu, \qquad \alpha_0 + \sum_{k=1}^{n}\alpha_k\mu = \mu, \qquad \alpha_0 = \mu\left(1 - \sum_{k=1}^{n}\alpha_k\right)$$
$$x_{n+m}^n = \mu + \sum_{k=1}^{n}\alpha_k(x_k - \mu)$$

The best linear predictor is

$$x_{n+m}^n = \mu + \sum_{k=1}^{n}\alpha_k(x_k - \mu)$$

one-step-ahead prediction
Given that x 1, . . . , x n were observed, we want to predict the next observation x n + 1.

Without loss of generality, say that μ = 0 (it follows that α = 0).

The best linear predictor is of the form

$$x_{n+1}^n = \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}$$

Invoke orthogonality:

$$E\left[\left(x_{n+1} - \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}\right)x_{n+1-k}\right] = 0, \quad k = 1, \ldots, n$$
$$E(x_{n+1}x_{n+1-k}) = \sum_{j=1}^{n}\phi_{nj}E(x_{n+1-j}x_{n+1-k}), \qquad \gamma(k) = \sum_{j=1}^{n}\phi_{nj}\gamma(k-j)$$

We can write the equations in matrix form.

$$\Gamma_n\phi_n = \gamma_n, \qquad \Gamma_n(j,k) = \gamma(k-j) \ (\text{an } n \times n \text{ matrix}), \qquad \phi_n = \begin{pmatrix}\phi_{n1} \\ \vdots \\ \phi_{nn}\end{pmatrix}, \qquad \gamma_n = \begin{pmatrix}\gamma(1) \\ \vdots \\ \gamma(n)\end{pmatrix}$$

$\Gamma_n$ is a covariance matrix, and thus symmetric and positive semi-definite. If $\Gamma_n$ is singular, then there are infinitely many solutions for the coefficients, but by the projection theorem $x_{n+1}^n$ is unique.

If $\Gamma_n$ is non-singular, then $\phi_n$ is unique and

$$\phi_n = \Gamma_n^{-1}\gamma_n$$

For ARMA models, $\sigma_w^2 > 0$ and $\gamma(h) \to 0$ as $h \to \infty$ make $\Gamma_n$ non-singular.

$$x_{n+1}^n = \phi_n'\begin{pmatrix}x_n \\ x_{n-1} \\ \vdots \\ x_1\end{pmatrix}$$

mean square one-step-ahead prediction error


$$P_{n+1}^n = E(x_{n+1} - x_{n+1}^n)^2 = E(x_{n+1} - \phi_n'x)^2 = E(x_{n+1} - \gamma_n'\Gamma_n^{-1}x)^2$$
$$= E(x_{n+1}^2 - 2\gamma_n'\Gamma_n^{-1}x\,x_{n+1} + \gamma_n'\Gamma_n^{-1}xx'\Gamma_n^{-1}\gamma_n)$$
$$= E(x_{n+1}^2) - 2\gamma_n'\Gamma_n^{-1}E(x\,x_{n+1}) + \gamma_n'\Gamma_n^{-1}E(xx')\Gamma_n^{-1}\gamma_n$$
$$= \gamma(0) - 2\gamma_n'\Gamma_n^{-1}\gamma_n + \gamma_n'\Gamma_n^{-1}\Gamma_n\Gamma_n^{-1}\gamma_n$$
$$= \gamma(0) - \gamma_n'\Gamma_n^{-1}\gamma_n$$

where $x = (x_n, x_{n-1}, \ldots, x_1)'$.
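A quick numerical illustration (a sketch, not in the text): dividing the prediction equations by $\gamma(0)$, we can solve them with ARMAacf() and solve() for a causal AR(2); the coefficients recover $(\phi_1, \phi_2, 0, \ldots, 0)$, as Example 3.19 below suggests.

phi <- c(1.5, -0.75)
n <- 5
rho <- ARMAacf(ar = phi, lag.max = n)  # rho(0), ..., rho(n)
Gamma_n <- toeplitz(rho[1:n])          # correlation version of Gamma_n
gamma_n <- rho[2:(n + 1)]              # correlation version of gamma_n
phi_n <- solve(Gamma_n, gamma_n)       # prediction coefficients
phi_n
1 - sum(phi_n*gamma_n)                 # P_{n+1}^n / gamma(0), the scaled one-step MSE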

Example 3.19 Prediction for an AR(2) (Verify that the matrix equation gives the
correct coefficients)
Suppose we have a causal AR(2) process

x t = ϕ 1x t − 1 + ϕ 2x t − 2 + w t

The one-step ahead prediction with one observation x 1 is

−1
1 1
γ(1)
ϕ n = Γ n γ nϕ 11 = γ(1)x 2 = ϕ 11x 1 = x 1 = ρ(1)x 1
γ(0) γ(0)

The one-step ahead prediction with two observations x 1, x 2 is

$$\phi_n = \Gamma_n^{-1}\gamma_n, \qquad \begin{pmatrix}\phi_{21} \\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0)\end{pmatrix}^{-1}\begin{pmatrix}\gamma(1) \\ \gamma(2)\end{pmatrix} = \frac{1}{\gamma(0)^2 - \gamma(1)^2}\begin{pmatrix}\gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0)\end{pmatrix}\begin{pmatrix}\gamma(1) \\ \gamma(2)\end{pmatrix}$$

Now let’s invoke the structure of the AR(2) process.

$$x_3 = \phi_1 x_2 + \phi_2 x_1 + w_3, \qquad x_3^2 = \phi_1 x_2 + \phi_2 x_1, \qquad x_3 - x_3^2 = w_3$$
$$E((x_3 - x_3^2)x_1) = E(w_3 x_1) = 0, \qquad E((x_3 - x_3^2)x_2) = E(w_3 x_2) = 0$$

It follows by the uniqueness of coefficients that

$$\begin{pmatrix}\phi_{21} \\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$

If we repeat this for larger values of n, we see that

$$x_{n+1}^n = \phi_1 x_n + \phi_2 x_{n-1}, \qquad \begin{pmatrix}\phi_{n1} \\ \phi_{n2}\end{pmatrix} = \begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$

If we extend these computations to AR(p) processes, we see that

$$x_{n+1}^n = \sum_{j=1}^{p}\phi_j x_{n+1-j}$$

3.4.2 Iterative Algorithms for Forecasting Time Series


The one-step-ahead prediction equation for an AR(p) model is much simpler than for general ARMA models, where the prediction-equation approach would lead to large matrices to invert. Instead we will use recursive and iterative methods.

Property 3.4 The Durbin–Levinson Algorithm


The matrix form of the prediction equations

Γ nϕ n = γ n

and the mean square one-step ahead prediction error

$$P_{n+1}^n = E(x_{n+1} - x_{n+1}^n)^2$$

can be solved iteratively as follows:

$$\phi_{00} = 0, \qquad P_1^0 = \gamma(0)$$

For n ≥ 1,

$$\phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(n-k)}{1 - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(k)}, \qquad P_{n+1}^n = P_n^{n-1}\left(1 - \phi_{nn}^2\right)$$

The general formula for the mean square one-step-ahead prediction error is

$$P_{n+1}^n = \gamma(0)\prod_{j=1}^{n}\left[1 - \phi_{jj}^2\right]$$

For n ≥ 2:

$$\phi_{nk} = \phi_{n-1,k} - \phi_{nn}\phi_{n-1,n-k}, \qquad k = 1, 2, \ldots, n-1$$

Example 3.20 Using the Durbin–Levinson Algorithm


To use the Durbin–Levinson algorithm, start with

n = 1:

$$\phi_{11} = \rho(1), \qquad P_2^1 = \gamma(0)[1 - \phi_{11}^2]$$

n = 2:

$$\phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)}, \qquad \phi_{21} = \phi_{11} - \phi_{22}\phi_{11}, \qquad P_3^2 = P_2^1[1 - \phi_{22}^2] = \gamma(0)[1 - \phi_{11}^2][1 - \phi_{22}^2]$$

n = 3:

$$\phi_{33} = \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)}, \qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21}, \qquad \phi_{31} = \phi_{21} - \phi_{33}\phi_{22}$$
$$P_4^3 = P_3^2[1 - \phi_{33}^2] = \gamma(0)[1 - \phi_{11}^2][1 - \phi_{22}^2][1 - \phi_{33}^2]$$

Let’s use the Durbin-Levinson algorithm on some data. First using a function, then computing manually. Previously, we used PACF to establish that
the rec time-series should be modeled with an AR(2).

Load the time-series

data(
list = "rec",
package = "astsa"
)
acf_rec <- as.vector(acf(
x = rec
)$acf)

Use a function to run the Durbin-Levinson algorithm


gsignal::levinson(
acf = acf_rec,
p = 2
)

## $a
## [1] 1.0000000 -1.3315874 0.4445447
##
## $e
## [1] 0.1205793
##
## $k
## [1] -0.9218042 0.4445447

Compute Durbin-Levinson algorithm manually

phi00 <- 0
P10 <- var(
x = rec
)
phi11 <- acf_rec[2]
P21 <- P10*(1-phi11^2)
phi22 <- (acf_rec[3] - phi11*acf_rec[2])/(1 - phi11*acf_rec[2])
phi21 <- phi11 - phi22*phi11
P32 <- P10*(1 - phi11^2)*(1 - phi22^2)
print(
x = c(phi21,phi22)
)

## [1] 1.3315874 -0.4445447

Property 3.5 Iterative Solution for the PACF


The PACF of a stationary process x t, can be obtained iteratively as ϕ nn, n = 1, 2, 3, . . . .

Using the iterative solution for the PACF and setting $n = p$, it follows that for an AR(p) model,

$$x_{p+1}^p = \phi_{p1}x_p + \phi_{p2}x_{p-1} + \ldots + \phi_{pp}x_1 = \phi_1 x_p + \phi_2 x_{p-1} + \ldots + \phi_p x_1$$

This shows that for an AR(p) model, the partial autocorrelation at lag $p$, $\phi_{pp}$, is also the last coefficient in the model, $\phi_p$.

Example 3.21 The PACF of an AR(2)


Let’s use the previous example’s pencil and paper calculations (not the R code) to find the coefficients for an AR(2) model.

We are working with an AR(2); in the difference equations section, we showed that these equations hold.

$$\rho(h) - \phi_1\rho(h-1) - \phi_2\rho(h-2) = 0, \ h \geq 1, \qquad \rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \rho(2) = \phi_1\rho(1) + \phi_2, \qquad \rho(3) - \phi_1\rho(2) - \phi_2\rho(1) = 0$$

Combining the difference equations results with the iterative solution equations, we get:

$$\phi_{11} = \rho(1) = \frac{\phi_1}{1 - \phi_2}$$
$$\phi_{22} = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2} = \frac{\phi_1\left(\frac{\phi_1}{1-\phi_2}\right) + \phi_2 - \left(\frac{\phi_1}{1-\phi_2}\right)^2}{1 - \left(\frac{\phi_1}{1-\phi_2}\right)^2} = \phi_2$$
$$\phi_{21} = \rho(1)(1 - \phi_2) = \phi_1$$
$$\phi_{33} = \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)} = 0$$

Notice that ϕ 22 = ϕ 2.

Best linear predictor for more than one-step ahead predictions


If we want to predict x n + m using observations x 1, x 2, . . . , x n and a linear predictor, then our model is

$$x_{n+m}^n = \phi_{n1}^{(m)}x_n + \phi_{n2}^{(m)}x_{n-1} + \ldots + \phi_{nn}^{(m)}x_1$$

where the coefficients $\phi_{n1}^{(m)}, \phi_{n2}^{(m)}, \ldots, \phi_{nn}^{(m)}$ satisfy the prediction equations.

$$\sum_{j=1}^{n}\phi_{nj}^{(m)}E(x_{n+1-j}x_{n+1-k}) = E(x_{n+m}x_{n+1-k}), \quad k = 1, 2, \ldots, n, \qquad \sum_{j=1}^{n}\phi_{nj}^{(m)}\gamma(k-j) = \gamma(m+k-1)$$

These prediction equations can be written in matrix form.


$$\Gamma_n\phi_n^{(m)} = \gamma_n^{(m)}, \qquad \gamma_n^{(m)} = \begin{pmatrix}\gamma(m) \\ \vdots \\ \gamma(m+n-1)\end{pmatrix}, \qquad \phi_n^{(m)} = \begin{pmatrix}\phi_{n1}^{(m)} \\ \vdots \\ \phi_{nn}^{(m)}\end{pmatrix}$$

Mean square m-step ahead prediction error


(m)′
P nn + m = E(x n + m − x nn + m) 2 = γ(0) − γ n Γ n− 1γ n( m )

Property 3.6 The Innovations Algorithm


When x t is a mean-zero stationary time series,

$$\operatorname{corr}(x_s - x_s^{s-1}, \ x_t - x_t^{t-1}) = 0, \qquad s \neq t$$

t−1
Using this uncorrelated property and the projection theorem, we can derive the innovations algorithm. x t − x t are called innovations.

One-step-ahead innovations algorithm


When we have observations x 1, x 2, . . . , x n, the one-step-ahead calculations are computed for t = 1, 2, . . . , n.

$$x_1^0 = 0, \qquad P_1^0 = \gamma(0)$$
$$x_{t+1}^t = \sum_{j=1}^{t}\theta_{tj}\left(x_{t+1-j} - x_{t+1-j}^{t-j}\right), \quad t = 1, 2, \ldots$$
$$P_{t+1}^t = \gamma(0) - \sum_{j=0}^{t-1}\theta_{t,t-j}^2\,P_{j+1}^j, \quad t = 1, 2, \ldots$$
$$\theta_{t,t-j} = \frac{\gamma(t-j) - \sum_{k=0}^{j-1}\theta_{j,j-k}\theta_{t,t-k}P_{k+1}^k}{P_{j+1}^j}$$

m-step-ahead innovations algorithm


After the one-step-ahead calculations are complete for t = 1, 2, . . . , n, the coefficients are obtained by continued iterations, and the following
formulas are used to get the m-step-ahead forecasts and mean-squared errors.

$$x_{n+m}^n = \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}\left(x_{n+m-j} - x_{n+m-j}^{n+m-j-1}\right), \qquad P_{n+m}^n = \gamma(0) - \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}^2\,P_{n+m-j}^{n+m-j-1}$$

Example 3.22 Prediction for an MA(1)


The innovations algorithm lends itself well to prediction for moving average processes. Let’s apply the innovations algorithm to an MA(1) process.

$$x_t = w_t + \theta w_{t-1}, \qquad \gamma(0) = (1+\theta^2)\sigma_w^2, \qquad \gamma(1) = \theta\sigma_w^2, \qquad \gamma(h) = 0 \ \forall h > 1$$

$$x_1^0 = 0, \qquad P_1^0 = (1+\theta^2)\sigma_w^2, \qquad \theta_{n1} = \frac{\theta\sigma_w^2}{P_n^{n-1}}, \qquad \theta_{nj} = 0, \ j = 2, \ldots, n$$
$$P_{n+1}^n = (1 + \theta^2 - \theta\theta_{n1})\sigma_w^2, \qquad x_{n+1}^n = \frac{\theta(x_n - x_n^{n-1})\sigma_w^2}{P_n^{n-1}}$$
