3 ARIMA Models - 3.1 Autoregressive Moving Average Models
Aaron Smith
2022-11-30

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway and David S. Stoffer: https://github.com/nickpoison/tsa4

The most recent version of the astsa package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the NEWS and ChangeLog files are at https://github.com/nickpoison/astsa/blob/master/NEWS.md

The webpages for the texts and some help on using R for time series analysis can be found at https://nickpoison.github.io/

UCF students can download it for free through the library.

Classical regression only allows the dependent variable to be influenced by current values of the independent variables.

In time series analysis, we allow the dependent variable to be influenced by the past values of the independent variables and by its own past values.

3.1.1 Introduction to Autoregressive Models


Autoregressive models are based on the idea that the current value of the series, $x_t$, can be explained as a function of $p$ past values, $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$, where $p$ determines the number of steps into the past needed to forecast the current value.

Consider the case

$$x_t = x_{t-1} - 0.90 x_{t-2} + w_t$$

where $w_t$ is Gaussian white noise with variance $\sigma_w^2 = 1$.

N <- 5
n <- 500
M <- as.data.frame(
x = matrix(
data = NA,
nrow = n + 2,
ncol = N
)
)
for(j in 1:N){
v <- rep(NA,n+2)
v[1:2] <- runif(
n = 2,
min = -10,
max = 10
)
for(k in 1:n){
v[k+2] <- v[k+1] - 0.9*v[k] + rnorm(1)
}
M[,j] <- v
}
M$time <- (-1):n
gather_M <- tidyr::gather(
data = M[1:20,],
key = "time_series",
value = "value",
-time
)
library(ggplot2)
ggplot(gather_M) +
aes(x = time,y = value,color = time_series) +
geom_line() +
theme_bw() +
theme(legend.position = "none")


We have now assumed the current value is a particular linear function of past values.

If regularity persists, then forecasting for such a model might be a possibility.

$$x_{n+1}^n = x_n - 0.9 x_{n-1}$$

where $x_{n+1}^n$ is the forecasted value of $x_{n+1}$ using $n$ observed data points.

The feasibility of such a model can be assessed using autocorrelation and lagged scatter plot matrices.

astsa::acf1(
series = M$V1,
max.lag = 10
)

## [1] 0.51 -0.39 -0.87 -0.52 0.28 0.77 0.52 -0.18 -0.66 -0.51

astsa::lag1.plot(
series = M$V1,
max.lag = 4
)


astsa::scatter.hist(
x = M$V1,
y = c(
M$V1[-1],M$V1[nrow(M)]
)
)

The lagged scatterplot matrix for the Southern Oscillation Index (SOI), indicates that lags 1 and 2 are linearly associated with the current value.

data(
list = c(
"soi","rec"
),
package = "astsa"
)
astsa::lag1.plot(
series = soi,
max.lag = 12
)


The ACF shows relatively large positive values at lags 1, 2, 12, 24, and 36 and large negative values at 18, 30, and 42.

astsa::acf1(
series = soi,
max.lag = 36
)

## [1] 0.60 0.37 0.21 0.05 -0.11 -0.19 -0.18 -0.10 0.05 0.22 0.36 0.41
## [13] 0.31 0.10 -0.06 -0.17 -0.29 -0.37 -0.32 -0.19 -0.04 0.15 0.31 0.35
## [25] 0.25 0.10 -0.03 -0.16 -0.28 -0.37 -0.32 -0.16 -0.02 0.17 0.33 0.39

We note also the possible relation between the SOI and Recruitment series indicated in the scatterplot matrix.

astsa::lag2.plot(
series1 = soi,
series2 = rec,
max.lag = 8
)


Definition 3.1 autoregressive model of order p


An autoregressive model of order p, abbreviated AR(p), is of the form

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \phi_p x_{t-p} + w_t$$

where $x_t$ is stationary, $w_t$ is white noise with expected value zero and constant variance, the $\phi_j$ are constant coefficients, $\phi_p \neq 0$, and $E(x_t) = 0$.

If $E(x_t) = \mu \neq 0$, then we can modify the model to get back to an expected value of zero:

$$(x_t - \mu) = \phi_1(x_{t-1} - \mu) + \phi_2(x_{t-2} - \mu) + \ldots + \phi_p(x_{t-p} - \mu) + w_t$$
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \phi_p x_{t-p} + w_t + \alpha, \qquad \alpha = \mu(1 - \phi_1 - \ldots - \phi_p)$$

Some equations are easier to work with using the backshift operator.

(B 0 − ϕ 1B 1 − ϕ 2B 2 − . . . − ϕ pB p)x t = w t

Definition 3.2 The autoregressive operator


The autoregressive operator is defined to be

ϕ(B) = (B 0 − ϕ 1B 1 − ϕ 2B 2 − . . . − ϕ pB p)

Combining this notation with an autoregressive model, we get

ϕ(B)x t = w t

Example 3.1 The AR(1) Model


Let’s look at a first-order model, AR(1), given by

x t = ϕx t − 1 + w t

Iterating backwards k times, we get

$$x_t = \phi x_{t-1} + w_t = \phi^2 x_{t-2} + \phi w_{t-1} + w_t = \ldots = \phi^k x_{t-k} + \sum_{j=0}^{k-1}\phi^j w_{t-j}$$

Going backwards all the way gives

$$x_t = \phi^t x_0 + \sum_{j=0}^{t-1}\phi^j w_{t-j}$$

This naturally leads to using infinite series. To make the math work out, we accept the assumptions that $|\phi| < 1$ and $\sup_t \operatorname{var}(x_t) < \infty$. Under these assumptions,

$$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}$$

Theorem A.1 Mean Square Convergence of a Sequence


Let $\{x_n\}$ be a sequence in $L^2$. Then there exists $x \in L^2$ such that

$$x_n \overset{ms}{\to} x \iff \lim_{m \to \infty}\sup_{n \geq m} E\left(x_n - x_m\right)^2 = 0$$

Proposition: Convergence of an AR(1) model


When x t is an AR(1) process,

$$x_t = \phi x_{t-1} + w_t \quad \text{with } |\phi| < 1, \ \sup_t \operatorname{var}(x_t) < \infty,$$

then $x_t \overset{ms}{\to} \sum_{j=0}^{\infty}\phi^j w_{t-j}$ and $\sum_{j=0}^{\infty}\phi^j w_{t-j} \in L^2$.

Proof:
By Theorem A.1 we need to show that

$$\lim_{m \to \infty}\sup_{n \geq m} E\left(x_n - x_m\right)^2 = 0$$
Without loss of generality, say that $n = m + d$, $d \geq 0$.

$$(x_{m+d} - x_m)^2 = \left(\left[\phi^{m+d}x_0 + \sum_{j=0}^{m+d-1}\phi^j w_{m+d-j}\right] - \left[\phi^m x_0 + \sum_{j=0}^{m-1}\phi^j w_{m-j}\right]\right)^2$$

$$= \left((\phi^d - 1)\phi^m x_0 + \sum_{j=m}^{m+d-1}\phi^j w_{m+d-j}\right)^2$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + 2(\phi^d - 1)\phi^m x_0\sum_{j=m}^{m+d-1}\phi^j w_{m+d-j} + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}w_{m+d-j}w_{m+d-k}$$

Take the expected value.

$$E(x_{m+d} - x_m)^2 = (\phi^d - 1)^2\phi^{2m}x_0^2 + 2(\phi^d - 1)\phi^m x_0\sum_{j=m}^{m+d-1}\phi^j E(w_{m+d-j}) + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}E(w_{m+d-j}w_{m+d-k})$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sum_{j=m}^{m+d-1}\phi^{2j}E(w_{m+d-j}^2)$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sigma_w^2\sum_{j=m}^{m+d-1}\phi^{2j}$$

$$= (\phi^d - 1)^2\phi^{2m}x_0^2 + \sigma_w^2\,\frac{\phi^{2m} - \phi^{2m+2d+2}}{1 - \phi^2}$$

$$= \phi^{2m}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

Let's take the supremum with respect to $d \geq 0$.

$$\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = \sup_{d \geq 0}\,\phi^{2m}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

Consider the sequence defined by the bracketed term with $d$ as the index. Since $|\phi| < 1$, the term in the bracket converges to $x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}$ as $d \to \infty$. This means that for any given $\epsilon > 0$, there is only a finite number of terms greater than $x_0^2 + \frac{\sigma_w^2}{1 - \phi^2} + \epsilon$. Since there is a finite number of such terms, the supremum is achieved as a maximum and it is finite.

$$\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = \phi^{2m}\max_{d \geq 0}\left[(1 - \phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1 - \phi^2}(1 - \phi^{2d+2})\right]$$

$$\lim_{m \to \infty}\sup_{d \geq 0} E(x_{m+d} - x_m)^2 = 0$$
This shows that the AR(1) process converges. The formula for x t that uses the summation with all past w t gives the limit of the series.

Proposition: Convergence of an AR(1) model

$$\lim_{k \to \infty} E\left(x_t - \sum_{j=0}^{k-1}\phi^j w_{t-j}\right)^2 = \lim_{k \to \infty}\phi^{2k}E\left(x_{t-k}^2\right) = 0$$

Combining the equations before the proof we get

$$x_t = \phi x_{t-1} + w_t, \qquad x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}, \qquad \sum_{j=0}^{\infty}\phi^j w_{t-j} = \phi\sum_{j=0}^{\infty}\phi^j w_{t-j-1} + w_t$$

This gives an expected value of zero.

$$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}$$

$$E(x_t) = E\left(\sum_{j=0}^{\infty}\phi^j w_{t-j}\right) = \sum_{j=0}^{\infty}\phi^j E(w_{t-j}) = 0$$

Autocovariance function (the computation here is different from the book):

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t)$$
$$= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t+h-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t-(j-h)}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\left(\sum_{j=-h}^{\infty}\phi^{j+h} w_{t-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right]$$
$$= E\left[\sum_{j=-h}^{\infty}\sum_{k=0}^{\infty}\phi^{j+h}\phi^k w_{t-j}w_{t-k}\right]$$
$$= \sum_{j=-h}^{\infty}\sum_{k=0}^{\infty}\phi^{j+k+h}E[w_{t-j}w_{t-k}] \quad (\text{the expected value is zero when } t-j \neq t-k)$$
$$= \sum_{k=0}^{\infty}\phi^{2k+h}E[w_{t-k}^2]$$
$$= \sigma_w^2\phi^h\sum_{k=0}^{\infty}\phi^{2k} \quad (|\phi| < 1)$$
$$= \sigma_w^2\phi^h\frac{1}{1 - \phi^2}$$

Recall that

γ(h) = γ( − h)

The autocorrelation of our AR(1) model is

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\sigma_w^2\phi^h\frac{1}{1-\phi^2}}{\sigma_w^2\phi^0\frac{1}{1-\phi^2}} = \phi^h$$

Recursive formula for autocorrelation:

$$\rho(h) = \phi\,\rho(h-1)$$
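As a quick numerical check (a sketch, not in the text, using base R's ARMAacf()), we can compare the theoretical AR(1) ACF with $\phi^h$:

phi <- 0.9
ARMAacf(ar = phi, lag.max = 5)  # theoretical ACF at lags 0 through 5
phi^(0:5)                       # should match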

Example 3.2 The Sample Path of an AR(1) Process


The figure below shows a time plot of two AR(1) processes with $\sigma_w^2 = 1$: one with $\phi = 0.9$ and $\rho(h) = 0.9^h$, and one with $\phi = -0.9$ and $\rho(h) = (-0.9)^h$, for $h \geq 0$.

In the first case observations close together in time are positively correlated with each other. This result means that observations at contiguous
time points will tend to be close in value to each other.

This fact shows up in the first figure as a very smooth sample path for x t.

Contrast this with the case in which ϕ = − 0.9, so that ρ(h) = ( − 0.9) h, for h ≥ 0. This result means that observations at contiguous time points are
negatively correlated but observations two time points apart are positively correlated.

This fact shows up in the second figure, where, for example, if an observation, x t, is positive, the next observation, x t + 1, is typically negative, and
the next observation, x t + 2, is typically positive. Thus, in this case, the sample path is very choppy.


#par(mfrow=c(2,1))
# in the expressions below, ~ is a space and == is equal
astsa::tsplot(
x = astsa::sarima.sim(
ar = 0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
AR(1)~~~phi==+0.9
)
)
abline(
h = 0,
col = "red"
)

astsa::tsplot(
x = astsa::sarima.sim(
ar = -0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
AR(1)~~~phi==-0.9
)
)
abline(
h = 0,
col = "red"
)


Example 3.3 Explosive AR Models and Causality


Random walks are not stationary;

$$x_t = x_{t-1} + w_t$$

is not stationary.

Consider an AR(1) model with | ϕ | > 1. Such processes are called explosive because the values of the time series quickly become large in
magnitude.

Clearly, because $|\phi|^j$ increases without bound as $j \to \infty$,

$$\sum_{j=0}^{k-1}\phi^j w_{t-j}$$

will not converge as $k \to \infty$.

Let’s reverse the recursive equation:

$$x_{t+1} = \phi x_t + w_{t+1}$$
$$x_t = \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1}$$
$$x_t = \phi^{-1}(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}) - \phi^{-1}w_{t+1}$$
$$\vdots$$
$$x_t = \phi^{-k}x_{t+k} - \sum_{j=1}^{k}\phi^{-j}w_{t+j}$$

Since $|\phi^{-1}| < 1$, this gives a stationary AR(1) model, but one that is future dependent. Unfortunately, such a model is useless for forecasting.

Example 3.4 Every Explosion Has a Cause


Excluding explosive models from consideration is not a problem because the models have causal counterparts.

For example, if

$$x_t = \phi x_{t-1} + w_t, \quad |\phi| > 1, \qquad w_t \sim \text{iid Normal}(0, \sigma_w^2)$$

then $\{x_t\}$ is a non-causal stationary Gaussian process with

$$E(x_t) = 0, \qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(-\sum_{j=1}^{\infty}\phi^{-j}w_{t+h+j}, \ -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j}\right) = \sigma_w^2\phi^{-h}\frac{\phi^{-2}}{1 - \phi^{-2}}$$

Let

$$y_t = \phi^{-1}y_{t-1} + v_t, \qquad v_t \sim \text{iid Normal}(0, \sigma_w^2\phi^{-2})$$

Then $x_t$ and $y_t$ are stochastically the same; all finite-dimensional distributions of the two processes are the same.

Example:

If

$$x_t = 2x_{t-1} + w_t, \qquad \sigma_w^2 = 1$$

then

$$y_t = \tfrac{1}{2}y_{t-1} + v_t, \qquad \sigma_v^2 = \tfrac{1}{4}$$

is an equivalent causal process.
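As a quick numerical check (a sketch, not in the text), the covariance formula above gives identical autocovariances for the explosive model and its causal counterpart:

h <- 0:5
gamma_x <- 1 * 2^(-h) * 2^(-2)/(1 - 2^(-2))  # explosive model, formula from Example 3.4
gamma_y <- (1/4) * (1/2)^h/(1 - (1/2)^2)     # causal AR(1) counterpart
cbind(gamma_x, gamma_y)                      # the two columns agree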


Iterating backwards
To iterate backwards, let’s invoke the backwards operator.

As a first step consider the AR(1) model.


$$x_t = \phi x_{t-1} + w_t, \quad |\phi| < 1, \quad w_t \sim \text{iid Normal}(0, \sigma_w^2), \qquad x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}, \qquad \phi(B)x_t = w_t, \qquad \phi(B) = B^0 - \phi B$$

Let’s define an operator for the stochastic term.

$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t, \qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j$$

Putting the two operators together we see that

ϕ(B)ψ(B)w t = w t

The coefficients on the left must match the coefficients on the right

$$(B^0 - \phi B)(B^0 + \psi_1 B^1 + \psi_2 B^2 + \ldots + \psi_j B^j + \ldots) = B^0$$
$$B^0 + (\psi_1 - \phi)B + (\psi_2 - \psi_1\phi)B^2 + \ldots + (\psi_j - \psi_{j-1}\phi)B^j + \ldots = B^0$$

Matching the coefficients we see that

$$\psi_0 = 1, \qquad \psi_1 = \phi, \qquad \psi_j = \psi_{j-1}\phi$$

Leading to the solution

ψj = ϕj

Another approach is to invoke an inverse, $\phi^{-1}(B)$.

$$\phi(B)x_t = w_t, \qquad \phi^{-1}(B)\phi(B)x_t = \phi^{-1}(B)w_t, \qquad x_t = \phi^{-1}(B)w_t$$

Thus we see that

ϕ − 1(B) = ψ(B)

Consider the polynomial function and its rational function inverse

$$\phi(z) = 1 - \phi z, \ |z| < 1, \qquad \phi(z)^{-1} = \frac{1}{1 - \phi z} = 1 + \phi z + \phi^2 z^2 + \ldots$$

We will use similar polynomial/backshift-operator techniques when we discuss ARMA models.
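As a quick check of the $\psi$-weights (a sketch, not in the text; ARMAtoMA() is in base R and omits $\psi_0 = 1$):

ARMAtoMA(ar = 0.9, ma = 0, lag.max = 10)  # psi-weights of a causal AR(1)
0.9^(1:10)                                # should match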

3.1.2 Introduction to Moving Average Models


Another way to model a time series is to assume that the time series is a linear combination of white noise.

Definition 3.3 The moving average model


The moving average model of order q, or MA(q) model, is defined to be

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \theta_3 w_{t-3} + \ldots + \theta_q w_{t-q}$$

where $w_t \sim \text{wn}(0, \sigma_w^2)$, the $\theta_j$ are constant parameters, and $\theta_q \neq 0$.

Note: Some software and texts write the moving average model with negative signs on the coefficients. Check the help documentation before using.

The moving average model with backshift notation


x t = θ(B)w t

Definition 3.4 The moving average operator


The moving average operator is

$$\theta(B) = B^0 + \theta_1 B^1 + \theta_2 B^2 + \ldots + \theta_q B^q$$

Note that a moving average model is stationary.

Example 3.5 The MA(1) Process


Consider the MA(1) model

x t = w t + θw t − 1

then


$$E(x_t) = 0, \qquad \gamma(h) = \begin{cases}(1+\theta^2)\sigma_w^2 & h = 0 \\ \theta\sigma_w^2 & h = 1 \\ 0 & h > 1\end{cases} \qquad \rho(h) = \begin{cases}1 & h = 0 \\ \dfrac{\theta}{1+\theta^2} & h = 1 \\ 0 & h > 1\end{cases}$$

Some quick calculus shows that

$$|\rho(1)| \leq \tfrac{1}{2}, \text{ and the bounds are achieved for } \theta = \pm 1$$

Also notice that $\rho(h)$ is the same whether the coefficient is $\theta$ or $\frac{1}{\theta}$:

$$\frac{\frac{1}{\theta}}{1 + \left(\frac{1}{\theta}\right)^2} = \frac{\theta}{\theta^2 + 1}$$

Replacing $\theta$ with $\frac{1}{\theta}$ and $\sigma_w^2$ with $\theta^2\sigma_w^2$:

$$\gamma(h) = \begin{cases}\left(1 + \frac{1}{\theta^2}\right)(\theta^2\sigma_w^2) = (\theta^2 + 1)\sigma_w^2 & h = 0 \\ \frac{1}{\theta}(\theta^2\sigma_w^2) = \theta\sigma_w^2 & h = 1 \\ 0 & h > 1\end{cases}$$

An MA(1) process has zero autocorrelation at lags two and greater, while an AR(1) process never has zero autocorrelation.
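As a quick check (a sketch, not in the text), the MA(1) ACF from ARMAacf() matches $\theta/(1+\theta^2)$ at lag 1 and is zero afterwards:

theta <- 0.9
ARMAacf(ma = theta, lag.max = 4)
theta/(1 + theta^2)  # compare with the lag-1 value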

Notice how much smoother the MA(1) model with θ = 0.9 is than θ = − 0.9.

#par(mfrow=c(2,1))
astsa::tsplot(
x = astsa::sarima.sim(
ma = 0.9,
n = 100
),
col = 4,
ylab = "",
main = expression(
MA(1)~~~theta==+0.9
)
)


astsa::tsplot(
x = astsa::sarima.sim(
ma = -0.9,
n = 100
),
col = 4,
ylab = "",
main=expression(
MA(1)~~~theta==-0.9
)
)

Example 3.6 Non-uniqueness of MA Models and Invertibility


$$x_t = w_t + \theta w_{t-1}$$

These two MA(1) models have the same autocorrelation, the same autocovariance, and are stochastically the same:

$$\sigma_w^2 = 25, \ \theta = \tfrac{1}{5}: \quad x_t = w_t + \tfrac{1}{5}w_{t-1}, \quad w_t \sim \text{iid Normal}(0, 25)$$
$$\sigma_v^2 = 1, \ \theta = 5: \quad y_t = v_t + 5v_{t-1}, \quad v_t \sim \text{iid Normal}(0, 1)$$

$$\gamma(h) = \begin{cases}26 & h = 0 \\ 5 & h = 1 \\ 0 & h > 1\end{cases}$$

If we observed one of these processes, we would not be able to mathematically tell which one we were looking at. When we have to select a
model, we prefer to use a model that is invertible. Choose the model with | θ | < 1
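As a quick numerical check (a sketch, not in the text), both parameterizations give $\gamma(0) = 26$ and $\gamma(1) = 5$:

c((1 + (1/5)^2)*25, (1/5)*25)  # theta = 1/5, sigma_w^2 = 25
c((1 + 5^2)*1, 5*1)            # theta = 5,   sigma_v^2 = 1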

$$x_t = w_t + \theta w_{t-1}$$
$$w_t = x_t - \theta w_{t-1}$$
$$w_t = x_t - \theta(x_{t-1} - \theta w_{t-2}) = x_t - \theta x_{t-1} + \theta^2 w_{t-2}$$
$$w_t = x_t - \theta x_{t-1} + \theta^2(x_{t-2} - \theta w_{t-3}) = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \theta^3 w_{t-3}$$
$$\vdots$$
$$w_t = (-\theta)^{k+1}w_{t-k-1} + \sum_{j=0}^{k}(-\theta)^j x_{t-j}$$

If $|\theta| < 1$, using $w_t = x_t - \theta w_{t-1}$ and iterating backwards we get


$$w_t = \sum_{j=0}^{\infty}(-\theta)^j x_{t-j}$$

For our example we would choose the invertible representation

$$\sigma_w^2 = 25, \ \theta = \tfrac{1}{5}: \quad x_t = w_t + \tfrac{1}{5}w_{t-1}, \quad w_t \sim \text{iid Normal}(0, 25), \qquad w_t = \sum_{j=0}^{\infty}\left(-\tfrac{1}{5}\right)^j x_{t-j}$$

rather than $y_t = v_t + 5v_{t-1}$ with $v_t \sim \text{iid Normal}(0, 1)$, whose AR($\infty$) inversion does not converge.

Series/polynomial tools for analyzing MA(1) models


$$x_t = \theta(B)w_t, \qquad \theta(B) = B^0 + \theta B$$

If $|\theta| < 1$, then

$$\pi(B)x_t = w_t, \qquad \pi(B) = \theta^{-1}(B)$$

Let

$$\theta(z) = 1 + \theta z. \quad \text{If } |\theta| < 1, \quad \pi(z) = \theta^{-1}(z) = \frac{1}{1 + \theta z} = \sum_{j=0}^{\infty}(-\theta)^j z^j, \qquad \pi(B) = \sum_{j=0}^{\infty}(-\theta)^j B^j$$

3.1.3 Autoregressive Moving Average Models


Definition 3.5 ARMA(p,q)

A time series $\{x_t \mid t \in \mathbb{Z}\}$ is ARMA($p$, $q$) if it is stationary and

$$x_t = \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k}, \qquad \phi_p \neq 0, \ \theta_q \neq 0, \ w_t \sim \text{wn}(0, \sigma_w^2), \ \sigma_w^2 > 0$$

If the time series has a non-zero expected value, we adjust the model to get zero expected value.

$$\alpha = \mu\left(1 - \sum_{j=1}^{p}\phi_j\right), \qquad x_t = \alpha + \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$

If p = 0, then the model is a moving average model.


If q = 0, then the model is an autoregressive model

Let’s move all the autoregressive terms to the left hand side of the equation.

$$x_t - \sum_{j=1}^{p}\phi_j x_{t-j} = w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$
$$x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \ldots - \phi_p x_{t-p} = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \ldots + \theta_q w_{t-q}$$

Invoking the backshift operator, we can write this equation as

ϕ(B)x t = θ(B)w t

This presentation illuminates a potential pitfall while modeling. If $\phi(B)x_t = \theta(B)w_t$ is the correct model, but we mistakenly multiply both sides of the equation by another operator on $B$, $\eta(B)$, then we get a mathematically correct equation that will lead to over-parameterization.

$$\eta(B)\phi(B)x_t = \eta(B)\theta(B)w_t$$

Example 3.7 Parameter Redundancy


Consider the white noise process ARMA(0,0)

$$x_t = w_t$$

Say that while fitting our model, we make the error of multiplying both sides of the equation by $\eta(B) = (B^0 - \tfrac{1}{2}B)$.

$$(B^0 - \tfrac{1}{2}B)x_t = (B^0 - \tfrac{1}{2}B)w_t, \qquad x_t - \tfrac{1}{2}x_{t-1} = w_t - \tfrac{1}{2}w_{t-1}, \qquad x_t = \tfrac{1}{2}x_{t-1} + w_t - \tfrac{1}{2}w_{t-1}$$

The correct model is ARMA(0,0), but we go with an ARMA(1,1) model. x t is white noise, but we missed that fact.

Notice that the R code gives a statistically significant incorrect model.

The intercept is estimating the mean.

set.seed(
seed = 823
)
rnorm_5 = rnorm(
n = 100,
mean = 5
) # generate iid N(5,1)s
arima(
x = rnorm_5,
order = c(
1,0,1
) # since the observations are random noise, 0,0,0 is the correct order
)

##
## Call:
## arima(x = rnorm_5, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.7567 0.8308 4.9066
## s.e. 0.2621 0.2264 0.1028
##
## sigma^2 estimated as 0.9723: log likelihood = -140.51, aic = 289.02

Our over-parameterized model is

$$(B^0 + 0.76B)x_t = (B^0 + 0.83B)w_t$$
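For comparison (a sketch, not in the original text), we could also fit the correct ARMA(0,0) order, which estimates only the mean; its AIC can then be compared with the over-parameterized fit above.

arima(
  x = rnorm_5,
  order = c(0,0,0)  # the correct order for white noise
)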


Three major problems with ARMA(p,q) models


parameter redundant models (over-parameterized)
stationary autoregressive models that depend on the future
moving average models that are not unique

Definition 3.6 AR and MA polynomials


The AR and MA polynomials are defined as

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p, \ \phi_p \neq 0, \qquad \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \ldots + \theta_q z^q, \ \theta_q \neq 0, \qquad z \in \mathbb{C}$$

To protect us from parameter redundant models, we will require that ϕ(z) and θ(z) do not have a common factor. This will help protect from
incorrectly multiplying the correct model by an extraneous operator.

Definition 3.7 Causal ARMA(p,q)


An ARMA(p, q) model is said to be causal, if the time series {x t | t ∈ Z} can be written as a one-sided linear process:

$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t, \qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j, \qquad \sum_{j=0}^{\infty}|\psi_j| < \infty, \qquad \psi_0 = 1$$

Example
The AR(1) process

x t = ϕx t − 1 + w t

is causal when | ϕ | < 1 or equivalently the root of ϕ(z) = 1 − ϕz is greater than one in magnitude.

Property 3.1 Causality of an ARMA(p, q) Process


An ARMA(p,q) model is causal if and only if $\phi(z) \neq 0$ for all $|z| \leq 1$. The coefficients of the linear process can be determined by solving

$$\psi(z) = \sum_{j=0}^{\infty}\psi_j z^j = \frac{\theta(z)}{\phi(z)}, \qquad |z| < 1$$

Another way to phrase this property is that an ARMA process is causal only when the roots of ϕ(z) lie outside the unit circle; that is, ϕ(z) = 0 only
when | z | > 1.
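As a quick check (a sketch, not in the text), we can verify causality numerically by computing the roots of $\phi(z)$ with polyroot() and checking that their moduli exceed one:

# AR(2) with phi_1 = 1.5, phi_2 = -0.75: phi(z) = 1 - 1.5 z + 0.75 z^2
Mod(polyroot(c(1, -1.5, 0.75)))  # both moduli are greater than 1, so the model is causal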

Finally, to address the problem of uniqueness, we choose the model that allows an infinite autoregressive representation.

Definition 3.8 Invertible ARMA(p,q)


An ARMA(p, q) model is said to be invertible, if the time series {x t | t ∈ Z} can be written as

$$\pi(B)x_t = \sum_{j=0}^{\infty}\pi_j x_{t-j} = w_t, \qquad \pi(B) = \sum_{j=0}^{\infty}\pi_j B^j, \qquad \sum_{j=0}^{\infty}|\pi_j| < \infty, \qquad \pi_0 = 1$$

Property 3.2 Invertibility of an ARMA(p, q) Process


An ARMA(p,q) model is invertible if and only if $\theta(z) \neq 0$ for all $|z| \leq 1$. The coefficients $\pi_j$ of $\pi(B)$ can be determined by solving

$$\pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)}, \qquad |z| < 1$$

An ARMA process is invertible only when the roots of $\theta(z)$ lie outside the unit circle; that is, $\theta(z) = 0$ only when $|z| > 1$.

Example 3.8 Parameter Redundancy, Causality, Invertibility


$$x_t = 0.4x_{t-1} + 0.45x_{t-2} + w_t + w_{t-1} + 0.25w_{t-2}$$
$$x_t - 0.4x_{t-1} - 0.45x_{t-2} = w_t + w_{t-1} + 0.25w_{t-2}$$
$$(B^0 - 0.4B - 0.45B^2)x_t = (B^0 + B + 0.25B^2)w_t$$
$$\phi(z) = 1 - 0.4z - 0.45z^2, \qquad \theta(z) = 1 + z + 0.25z^2$$

This looks like an ARMA(2,2) model, but

$$\phi(z) = 1 - 0.4z - 0.45z^2 = (1 + 0.5z)(1 - 0.9z), \qquad \theta(z) = 1 + z + 0.25z^2 = (1 + 0.5z)^2$$

Reducing the model, we get an ARMA(1,1) model.

$$x_t = 0.9x_{t-1} + 0.5w_{t-1} + w_t, \qquad (1 - 0.9B)x_t = (1 + 0.5B)w_t, \qquad \phi(z) = 1 - 0.9z, \quad \theta(z) = 1 + 0.5z$$

Let’s find ψ(z) s.t.

$$\psi(z) = \frac{\theta(z)}{\phi(z)}, \quad |z| < 1, \qquad \phi(z)\psi(z) = \theta(z)$$
$$(1 - 0.9z)\sum_{j=0}^{\infty}\psi_j z^j = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j - 0.9z\sum_{j=0}^{\infty}\psi_j z^j = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j + \sum_{j=0}^{\infty}(-0.9)\psi_j z^{j+1} = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j + \sum_{j=1}^{\infty}(-0.9)\psi_{j-1} z^{j} = 1 + 0.5z$$
$$\psi_0 + \sum_{j=1}^{\infty}(\psi_j - 0.9\psi_{j-1})z^j = 1 + 0.5z$$

This gives us these equations:

$$\psi_0 = 1, \qquad \psi_1 - 0.9\psi_0 = 0.5, \qquad \psi_j - 0.9\psi_{j-1} = 0 \ \forall j > 1$$

$$\psi_0 = 1, \qquad \psi_j = 1.4\cdot 0.9^{j-1} \ \forall j > 0, \qquad \psi(z) = 1 + \sum_{j=1}^{\infty}1.4\cdot 0.9^{j-1}z^j, \qquad x_t = w_t + \sum_{j=1}^{\infty}\tfrac{14}{9}\cdot 0.9^j w_{t-j}$$

Let’s compute the ψ(z) coefficients manually.

c(
1,
(14/9)*(0.9^{1:10})
)

## [1] 1.0000000 1.4000000 1.2600000 1.1340000 1.0206000 0.9185400 0.8266860


## [8] 0.7440174 0.6696157 0.6026541 0.5423887

Let’s use R to find the ψ(z) coefficients. Notice that it omits ψ 0

ARMAtoMA(
ar = 0.9,
ma = 0.5,
lag.max = 10
) # first 10 psi-weights

## [1] 1.4000000 1.2600000 1.1340000 1.0206000 0.9185400 0.8266860 0.7440174


## [8] 0.6696157 0.6026541 0.5423887

Now, let’s find π(z) s.t.

$$\phi(z) = 1 - 0.9z, \qquad \theta(z) = 1 + 0.5z, \qquad \pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)}, \quad |z| < 1, \qquad \theta(z)\pi(z) = \phi(z)$$
$$(1 + 0.5z)\sum_{j=0}^{\infty}\pi_j z^j = 1 - 0.9z$$
$$\sum_{j=0}^{\infty}\pi_j z^j + \sum_{j=0}^{\infty}0.5\pi_j z^{j+1} = 1 - 0.9z$$
$$\sum_{j=0}^{\infty}\pi_j z^j + \sum_{j=1}^{\infty}0.5\pi_{j-1} z^{j} = 1 - 0.9z$$
$$\pi_0 + \sum_{j=1}^{\infty}(\pi_j + 0.5\pi_{j-1})z^j = 1 - 0.9z$$

This leads to the equations:

$$\pi_0 = 1, \qquad \pi_1 + 0.5\pi_0 = -0.9, \qquad \pi_j + 0.5\pi_{j-1} = 0 \ \forall j > 1$$

Solving gives these results. Notice how the exponents directly match the subscript (different from textbook).

$$\pi_0 = 1, \qquad \pi_j = (-1.4)\cdot(-0.5)^{j-1} = 2.8\cdot(-0.5)^j, \qquad w_t = x_t + \sum_{j=1}^{\infty}2.8\cdot(-0.5)^j x_{t-j}$$

Let’s manually compute the coefficients, notice that the coefficients get cut in half as the index increases

c(
1,
2.8*((-1/2)^(1:10))
)

## [1] 1.000000000 -1.400000000 0.700000000 -0.350000000 0.175000000


## [6] -0.087500000 0.043750000 -0.021875000 0.010937500 -0.005468750
## [11] 0.002734375

Let’s use the astsa package to compute the weights of π(z)

astsa::ARMAtoAR(
ar = 0.9,
ma = 0.5,
lag.max = 10
) # first 10 pi-weights

## [1] -1.400000000 0.700000000 -0.350000000 0.175000000 -0.087500000


## [6] 0.043750000 -0.021875000 0.010937500 -0.005468750 0.002734375

Example 3.9 Causal Conditions for an AR(2) Process


For an AR(1) model,

(1 − ϕB)x t = w t

to be causal, the root of

ϕ(z) = 1 − ϕz

must lie outside of the unit circle. In this case,

$$\phi(z) = 0 \text{ when } z = 1/\phi, \text{ which lies outside the unit circle exactly when } |\phi| < 1$$

It is not so easy to establish this relationship for higher order models.

For example, the AR(2) model,

(1 − ϕ 1B − ϕ 2B 2)x t = w t

is causal when both of the two roots are outside the unit circle.

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2, \qquad z_{\pm} = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2}, \qquad |z_{\pm}| > 1$$

$$\phi_1 = \frac{1}{z_+} + \frac{1}{z_-}, \qquad \phi_2 = \frac{-1}{z_+ z_-}, \qquad \phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1, \quad |\phi_2| < 1$$

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 = (1 - z_+^{-1}z)(1 - z_-^{-1}z)$$
M <- expand.grid(
phi_1 = seq(
from = -3,
to = 3,
length = 100
),
phi_2 = seq(
from = -2,
to = 1,
length = 100
),
root_p = complex(
real = NA,
imaginary = NA
),
root_m = complex(
real = NA,
imaginary = NA
)
)
M <- M[M$phi_2 != 0,]
M$discriminant = M$phi_1^2 + 4*M$phi_2
M$root_p[M$discriminant >= 0] <- complex(
real = (-M$phi_1 + sqrt(M$phi_1^2 + 4*M$phi_2))/(2*M$phi_2),
imaginary = 0
)[M$discriminant >= 0]

## Warning in sqrt(M$phi_1^2 + 4 * M$phi_2): NaNs produced

M$root_m[M$discriminant >= 0] <- complex(


real = (-M$phi_1 - sqrt(M$phi_1^2 + 4*M$phi_2))/(2*M$phi_2),
imaginary = 0
)[M$discriminant >= 0]

## Warning in sqrt(M$phi_1^2 + 4 * M$phi_2): NaNs produced

M$root_p[M$discriminant < 0] <- complex(


real = -M$phi_1/(2*M$phi_2),
imaginary = sqrt(-M$phi_1^2 - 4*M$phi_2)/(2*M$phi_2)
)[M$discriminant < 0]

## Warning in sqrt(-M$phi_1^2 - 4 * M$phi_2): NaNs produced

M$root_m[M$discriminant < 0] <- complex(


real = -M$phi_1/(2*M$phi_2),
imaginary = -sqrt(-M$phi_1^2 - 4*M$phi_2)/(2*M$phi_2)
)[M$discriminant < 0]

## Warning in sqrt(-M$phi_1^2 - 4 * M$phi_2): NaNs produced


M$Mod_root_p <- Mod(


z = M$root_p
)
M$Mod_root_m <- Mod(
z = M$root_m
)
M$min_Mod_root <- apply(
X = M[,c("Mod_root_m","Mod_root_p")],
MARGIN = 1,
FUN = min
)
M$roots[M$discriminant >= 0 & M$min_Mod_root > 1] <- "real roots, outside unit circle"
M$roots[M$discriminant >= 0 & M$min_Mod_root <= 1] <- "real roots, inside unit circle"
M$roots[M$discriminant < 0 & M$min_Mod_root > 1] <- "complex roots, outside unit circle"
M$roots[M$discriminant < 0 & M$min_Mod_root <= 1] <- "complex roots, inside unit circle"
library(ggplot2)
ggplot(M) + aes(x = phi_1,y = phi_2,color = roots) +
geom_point() +
coord_fixed() +
geom_hline(yintercept = 0)


# this is how Figure 3.3 was generated


seg1 = seq(
from = 0,
to = 2,
by = 0.1
)
seg2 = seq(
from = -2,
to = 2,
by = 0.1
)
name1 = expression(
phi[1]
)
name2 = expression(
phi[2]
)
astsa::tsplot(
x = seg1,
y = 1-seg1,
ylim = c(-1,1),
xlim = c(-2,2),
ylab = name2,
xlab = name1,
main = 'Causal Region of an AR(2)'
)
lines(
x = -seg1,
y = 1-seg1,
ylim = c(-1,1),
xlim = c(-2,2)
)
abline(
h = 0,
v = 0,
lty = 2,
col = 8
)
lines(
x = seg2,
y = -(seg2^2/4),
ylim = c(-1,1)
)
lines(
x = c(-2,2),
y = c(-1,-1),
ylim = c(-1,1)
)
text(
x = 0,
y = .35,
labels = 'real roots'
)
text(
x = 0,
y = -0.5,
labels = 'complex roots'
)


3.2 Difference Equations


The study of ARMA processes and their autocorrelation functions is greatly enhanced by a basic knowledge of difference equations, because the ACFs of ARMA processes satisfy difference equations.

Suppose $\{u_n\}$ is a time series such that

$$u_n - \alpha u_{n-1} = 0, \qquad \alpha \neq 0, \quad n \in \mathbb{N}$$

To solve the equation, iterate:

$$u_1 = \alpha u_0, \quad u_2 = \alpha u_1 = \alpha^2 u_0, \quad u_3 = \alpha u_2 = \alpha^3 u_0, \quad \ldots, \quad u_n = \alpha^n u_0$$

Operator notation:

(B 0 − αB)u n = 0

Associated polynomial:

$$\alpha(z) = 1 - \alpha z, \qquad z_0 = \frac{1}{\alpha}, \qquad \alpha = z_0^{-1}$$

If we know the initial condition $u_0 = c$, then

$$u_n = (z_0^{-1})^n c$$
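A quick numerical check (a sketch, not in the text): iterating the difference equation reproduces $\alpha^n u_0$.

alpha <- 0.8
u <- numeric(11)
u[1] <- 3  # u_0 = 3 (R vectors are 1-indexed)
for(n in 2:11) u[n] <- alpha*u[n - 1]
rbind(u, 3*alpha^(0:10))  # the two rows agree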

The solution to the difference equation depends on

the initial condition


the root of the associated polynomial

Autocorrelation of AR(1)
Here is an example of such a sequence:

The autocorrelation function of an AR(1) model $x_n = \phi x_{n-1} + w_n$ satisfies

$$\rho(h) - \phi\rho(h-1) = 0$$

degree 2 difference equation


$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0, \qquad \alpha_2 \neq 0, \quad n = 2, 3, 4, \ldots$$

$$u_n z^n - \alpha_1 u_{n-1}z^n - \alpha_2 u_{n-2}z^n = 0$$
$$u_n z^n - \alpha_1 z\,u_{n-1}z^{n-1} - \alpha_2 z^2\,u_{n-2}z^{n-2} = 0$$
$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=2}^{\infty}u_{n-1}z^{n-1} - \alpha_2 z^2\sum_{n=2}^{\infty}u_{n-2}z^{n-2} = 0$$
$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=1}^{\infty}u_n z^n - \alpha_2 z^2\sum_{n=0}^{\infty}u_n z^n = 0$$

Set


$$U(z) = \sum_{n=0}^{\infty}u_n z^n$$

$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=1}^{\infty}u_n z^n - \alpha_2 z^2\sum_{n=0}^{\infty}u_n z^n = 0$$
$$(U(z) - u_0 - u_1 z) - \alpha_1 z(U(z) - u_0) - \alpha_2 z^2 U(z) = 0$$
$$U(z) - u_0 - u_1 z - \alpha_1 z U(z) + \alpha_1 z u_0 - \alpha_2 z^2 U(z) = 0$$
$$(1 - \alpha_1 z - \alpha_2 z^2)U(z) = u_0 + (u_1 - \alpha_1 u_0)z$$

Now let’s take the partial fraction decomposition of the right hand side.

If the associated quadratic has two distinct roots, z 1 ≠ z 2, then for some constants c 1, c 2:

$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_1^{-1}z)(1 - z_2^{-1}z), \quad z_1 \neq z_2, \qquad U(z) = \frac{c_1}{1 - z_1^{-1}z} + \frac{c_2}{1 - z_2^{-1}z}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_1^{-1}z)^n + c_2\sum_{n=0}^{\infty}(z_2^{-1}z)^n, \qquad |z_1^{-1}z| < 1, \ |z_2^{-1}z| < 1$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}z_1^{-n}z^n + c_2\sum_{n=0}^{\infty}z_2^{-n}z^n$$

If the associated quadratic is square with root z 0, then we get the following equations. Notice that undetermined constants which are multiplied
together are merged into c 2.

$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2, \qquad U(z) = \frac{c_1}{1 - z_0^{-1}z} + \frac{c_2}{(1 - z_0^{-1}z)^2}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_0^{-1}z)^n + c_2\frac{d}{dz}\left[\frac{1}{1 - z_0^{-1}z}\right] \quad (c_2 \text{ is merged with the constant from the antiderivative})$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}z_0^{-n}z^n + c_2\sum_{n=0}^{\infty}(n+1)z_0^{-(n+1)}z^n$$

Check that the solutions solve the original equation


$$u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$$
$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0$$
$$c_1 z_1^{-n} + c_2 z_2^{-n} - \alpha_1(c_1 z_1^{-n+1} + c_2 z_2^{-n+1}) - \alpha_2(c_1 z_1^{-n+2} + c_2 z_2^{-n+2}) = 0$$
$$c_1 z_1^{-n} - \alpha_1 c_1 z_1^{-n+1} - \alpha_2 c_1 z_1^{-n+2} + c_2 z_2^{-n} - \alpha_1 c_2 z_2^{-n+1} - \alpha_2 c_2 z_2^{-n+2} = 0$$
$$c_1 z_1^{-n}(1 - \alpha_1 z_1 - \alpha_2 z_1^2) + c_2 z_2^{-n}(1 - \alpha_1 z_2 - \alpha_2 z_2^2) = 0$$
$$c_1 z_1^{-n}(0) + c_2 z_2^{-n}(0) = 0$$

When the associated polynomial is square, our solution is:

$$u_n = c_1 z_0^{-n} + c_2(n+1)z_0^{-(n+1)}$$

Factoring the associated polynomial gives:

$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0, \qquad 1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2 = 1 - 2z_0^{-1}z + z_0^{-2}z^2$$
$$-\alpha_1 = -2z_0^{-1}, \qquad -\alpha_2 = z_0^{-2}, \qquad u_n - 2z_0^{-1}u_{n-1} + z_0^{-2}u_{n-2} = 0$$

Plugging our solution into the difference equations with factored coefficients shows our solution is correct.

$$c_1 z_0^{-n} + c_2(n+1)z_0^{-(n+1)} - 2z_0^{-1}\left(c_1 z_0^{-n+1} + c_2 n z_0^{-n}\right) + z_0^{-2}\left(c_1 z_0^{-n+2} + c_2(n-1)z_0^{-n+1}\right) = 0$$
$$\left(c_1 z_0^{-n} - 2c_1 z_0^{-n} + c_1 z_0^{-n}\right) + \left(c_2 n z_0^{-n-1} - 2c_2 n z_0^{-n-1} + c_2 n z_0^{-n-1}\right) + \left(c_2 z_0^{-n-1} - c_2 z_0^{-n-1}\right) = 0$$

Example 3.10 The ACF of an AR(2) Process


Suppose we have a causal AR(2) process, multiply both sides of the equation by x t − h, take expected values, then divide by γ(0).

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$$
$$x_t x_{t-h} = \phi_1 x_{t-1}x_{t-h} + \phi_2 x_{t-2}x_{t-h} + w_t x_{t-h}$$
$$E(x_t x_{t-h}) = \phi_1 E(x_{t-1}x_{t-h}) + \phi_2 E(x_{t-2}x_{t-h}) + E(w_t x_{t-h})$$
$$\gamma(h) = \phi_1\gamma(h-1) + \phi_2\gamma(h-2) + E\left(w_t\sum_{j=0}^{\infty}\psi_j w_{t-h-j}\right) = \phi_1\gamma(h-1) + \phi_2\gamma(h-2), \quad h > 0$$

This gives us a difference equation with an associated polynomial.

$$\rho(h) - \phi_1\rho(h-1) - \phi_2\rho(h-2) = 0, \qquad \phi(z) = 1 - \phi_1 z - \phi_2 z^2$$

Let’s take the initial conditions.

$$\rho(0) = 1, \qquad \rho(1) = \rho(-1) = \frac{\phi_1}{1 - \phi_2}$$
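A quick check of the initial conditions (a sketch, not in the text) for $\phi_1 = 1.5$, $\phi_2 = -0.75$:

ARMAacf(ar = c(1.5, -0.75), lag.max = 1)  # rho(0) and rho(1)
1.5/(1 - (-0.75))                         # phi_1/(1 - phi_2), matches rho(1)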

When the roots of the associated polynomial are distinct and real:

$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h}, \qquad \rho(h) = c_1 z_1^{-h} + (1 - c_1)z_2^{-h}$$

When the associated polynomial is square:

$$\rho(h) = (1 - c_2 z_0^{-1})z_0^{-h} + c_2(h+1)z_0^{-(h+1)}$$

When the roots are complex, the associated polynomial has real coefficients, so the roots are conjugate. To get real ρ(h), the constants will need to
be conjugate.

$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h}, \qquad \rho(h) = c_1 z_1^{-h} + \bar{c}_1\bar{z}_1^{-h}$$

Write the roots in polar coordinates:

$$z_1 = |z_1|e^{\theta i}, \qquad \bar{z}_1 = |z_1|e^{-\theta i}$$

$$\rho(h) = c_1|z_1|^{-h}e^{-h\theta i} + \bar{c}_1|z_1|^{-h}e^{h\theta i}, \qquad \rho(h) = a|z_1|^{-h}\cos(h\theta + b)$$

where $a$ and $b$ are constants determined by the initial conditions.


Example 3.11 An AR(2) with Complex Roots


Let’s consider the model

$$x_t = 1.5x_{t-1} - 0.75x_{t-2} + w_t, \qquad \sigma_w^2 = 1, \qquad \text{roots} = 1 \pm i/\sqrt{3}, \qquad \theta = \arctan(1/\sqrt{3}) = 2\pi/12, \qquad 1/12 \text{ cycles per unit time}$$

Set coefficients of autoregressive model, establish polynomial.

coef_phi <- c(
1,-1.5,0.75
) # coefficients of the polynomial
polyroot_phi <- polyroot(
z = coef_phi
)
a <- polyroot_phi[1] # = 1+0.57735i, print one root which is 1 + i 1/sqrt(3)
Arg_a = Arg(
z = a
)/(2*pi) # arg in cycles/pt
1/Arg_a # = 12, the period

## [1] 12

Simulate the autoregressive time-series

set.seed(
seed = 823
)
sarima.sim_phi = astsa::sarima.sim(
ar = c(
1.5,-.75
),
n = 144,
S = 12
)

## Note that S > 0 but no seasonal parameter is specified

Plot the simulated autoregressive time-series

astsa::tsplot(
x = sarima.sim_phi,
xlab = "Year"
)

Compute and plot the autocorrelation function of the autoregressive model using model (not simulation)


ARMAacf_phi = ARMAacf(
ar = c(
1.5,-0.75
),
ma = 0,
lag.max = 50
)
astsa::tsplot(
x = ARMAacf_phi,
type = "h",
xlab = "lag"
)
abline(
h = 0,
col = 8
)

Convert the autoregressive model to a moving average

# psi-weights - not in text


psi = ts(
data = ARMAtoMA(
ar = c(
1.5,-0.75
),
ma = 0,
lag.max = 50
),
start = 0,
freq = 12
)
astsa::tsplot(
x = psi,
type = 'o',
cex = 1.1,
ylab = expression(
psi-weights
),
xaxt = 'n',
xlab = 'Index'
)
axis(
side = 1,
at = 0:4,
labels = c(
'0','12','24','36','48'
)
)


Example 3.12 The ψ-weights for an ARMA Model


Consider a causal ARMA(p,q) model with roots outside the unit circle.


$$\phi(B)x_t = \theta(B)w_t, \qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}$$

For a pure MA(q) model (p = 0),

$$\psi_0 = 1, \qquad \psi_j = \theta_j, \ j = 1, 2, \ldots, q, \qquad \psi_j = 0, \ j > q$$

Otherwise we need to do some multiplication.

$$\phi(z)\psi(z) = \theta(z), \qquad (1 - \phi_1 z - \ldots - \phi_p z^p)(\psi_0 + \psi_1 z + \psi_2 z^2 + \ldots) = (1 + \theta_1 z + \theta_2 z^2 + \ldots + \theta_q z^q)$$

This gives us these equations:

$$\psi_0 = 1$$
$$\psi_1 - \phi_1\psi_0 = \theta_1$$
$$\psi_2 - \phi_1\psi_1 - \phi_2\psi_0 = \theta_2$$
$$\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 - \phi_3\psi_0 = \theta_3$$

Notice that $\theta_j = 0$ for $j > q$, so for larger $j$ the equations become homogeneous in the $\psi$s.

The actual solution will depend on the roots of the polynomials and the initial conditions.

Create an ARMA model with geometric autoregressive portion, geometric moving average portion

psi = ARMAtoMA(
ar = 0.9,
ma = 0.5,
lag.max = 50
) # for a list
astsa::tsplot(
x = psi,
type = 'h',
ylab = expression(
psi-weights
),
xlab = 'Index'
) # for a graph


3.3 Autocorrelation and Partial Autocorrelation

Behavior of the ACF and PACF of ARMA processes


Metric  AR(p)                 MA(q)                 ARMA(p,q)

ACF     tails off             cuts off after lag q  tails off

PACF    cuts off after lag p  tails off             tails off

3.3.1 The Autocorrelation Function (ACF)


Consider an MA(q) process; it is a finite linear combination of white noise, so its expected value is zero.

$$x_t = \theta(B)w_t, \qquad \theta(B) = B^0 + \theta_1 B + \ldots + \theta_q B^q, \qquad x_t = \sum_{j=0}^{q}\theta_j w_{t-j}, \ \theta_0 = 1, \qquad E(x_t) = \sum_{j=0}^{q}\theta_j E(w_{t-j}) = 0$$

The autocovariance function for a MA(q) process is

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j}, \ \sum_{k=0}^{q}\theta_k w_{t-k}\right) = \begin{cases}\sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0 \leq h \leq q \\ 0 & h > q\end{cases}$$

Recall:

$$\gamma(h) = \gamma(-h)$$

Note:

$$\theta_q \neq 0 \ \Rightarrow \ \gamma(q) = \sigma_w^2\theta_0\theta_q = \sigma_w^2\theta_q \neq 0$$

The autocorrelation of a MA(q) process:

$$\rho(h) = \begin{cases}\dfrac{\sum_{j=0}^{q-h}\theta_j\theta_{j+h}}{\sum_{j=0}^{q}\theta_j^2} & 0 \leq h \leq q \\ 0 & h > q\end{cases}$$
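A quick check of the MA(q) ACF formula (a sketch, not in the text) for an MA(2) with $\theta_1 = 0.5$, $\theta_2 = 0.3$ ($\theta_0 = 1$):

theta <- c(1, 0.5, 0.3)
rho1 <- sum(theta[1:2]*theta[2:3])/sum(theta^2)  # h = 1
rho2 <- theta[1]*theta[3]/sum(theta^2)           # h = 2
c(rho1, rho2)
ARMAacf(ma = c(0.5, 0.3), lag.max = 3)           # matches; lag 3 is zero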

For a causal ARMA(p,q) process

$$\phi(B)x_t = \theta(B)w_t, \qquad |z| > 1 \ \forall z \text{ s.t. } \phi(z) = 0, \qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}, \qquad E(x_t) = E\left(\sum_{j=0}^{\infty}\psi_j w_{t-j}\right) = \sum_{j=0}^{\infty}\psi_j E(w_{t-j}) = 0$$

Autocovariance of a causal ARMA(p,q) process


$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \sigma_w^2\sum_{j=0}^{\infty}\psi_j\psi_{j+h}, \quad h \geq 0, \qquad \rho(h) = \frac{\gamma(h)}{\gamma(0)}$$

Let’s write the autocovariance equation in another way.

$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t)$$
$$= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j} + \sum_{j=0}^{q}\theta_j w_{t+h-j}, \ x_t\right)$$
$$= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j}, \ x_t\right) + \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j}, \ x_t\right)$$
$$= \sum_{j=1}^{p}\phi_j\operatorname{cov}(x_{t+h-j}, x_t) + \sum_{j=0}^{q}\theta_j\operatorname{cov}(w_{t+h-j}, x_t)$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\theta_j\operatorname{cov}\left(w_{t+h-j}, \ \sum_{k=0}^{\infty}\psi_k w_{t-k}\right)$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\sum_{k=0}^{\infty}\theta_j\psi_k\operatorname{cov}(w_{t+h-j}, w_{t-k})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\theta_j\psi_{j-h}\operatorname{cov}(w_{t+h-j}, w_{t+h-j})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=h}^{q}\theta_j\psi_{j-h}\operatorname{cov}(w_{t+h-j}, w_{t+h-j})$$
$$= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sigma_w^2\sum_{j=h}^{q}\theta_j\psi_{j-h}$$

The general homogeneous equation for the ACF of a causal ARMA process is

$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = 0, \qquad h \geq \max(p, q+1)$$

with initial conditions

$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = \sigma_w^2\sum_{j=h}^{q}\theta_j\psi_{j-h}, \qquad 0 \leq h \leq \max(p, q+1)$$

Example 3.13 The ACF of an AR(p) (q = 0)


$$\gamma(h) - \sum_{j=1}^{p}\phi_j\gamma(h-j) = 0, \qquad h \geq p$$

Say that $z_1, \ldots, z_r$ are the roots of $\phi(z)$ with multiplicities $m_1, \ldots, m_r$, where $m_1 + \ldots + m_r = p$. Using difference equations, we see that there are polynomials $P_j(h)$ in $h$ of degree $m_j - 1$ such that

$$\rho(h) = \sum_{j=1}^{r}z_j^{-h}P_j(h), \qquad h \geq p$$

If the process is causal, then | z j | > 1 ∀j, and ρ(h) → 0 exponentially fast as h → ∞. If there are conjugate roots, conjugates will cancel the
imaginary parts and the dampening will be sinusoidal (the time series will appear cyclic).

Example 3.14 The ACF of an ARMA(1,1)


Consider an ARMA(1,1) process

$$x_t = \phi x_{t-1} + \theta w_{t-1} + w_t, \qquad |\phi| < 1$$

The autocovariance function satisfies

$$\gamma(h) - \phi\gamma(h-1) = 0, \ h \geq 2, \qquad \gamma(h) = c\phi^h, \ h \geq 1$$

with initial conditions

$$\gamma(0) = \phi\gamma(1) + \sigma_w^2[1 + \theta\phi + \theta^2], \qquad \gamma(1) = \sigma_w^2\frac{(1+\theta\phi)(\phi+\theta)}{1-\phi^2}$$

$$\gamma(1) = c\phi, \qquad c = \frac{\gamma(1)}{\phi}, \qquad \gamma(h) = \frac{\gamma(1)}{\phi}\phi^h = \sigma_w^2\frac{(1+\theta\phi)(\phi+\theta)}{1-\phi^2}\phi^{h-1}, \qquad \rho(h) = \frac{(1+\theta\phi)(\phi+\theta)}{1+2\theta\phi+\theta^2}\phi^{h-1}, \quad h \geq 1$$

Note: ρ(h) for AR(1) and ARMA(1,1) are similar. We will be unable to tell the difference between AR(1) and ARMA(1,1) using ACF only.
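A quick check (a sketch, not in the text): for $\phi = 0.9$, $\theta = 0.5$, the ARMA(1,1) ACF from ARMAacf() matches the formula above.

phi <- 0.9; theta <- 0.5; h <- 1:5
ARMAacf(ar = phi, ma = theta, lag.max = 5)[-1]  # drop lag 0
(1 + theta*phi)*(phi + theta)/(1 + 2*theta*phi + theta^2)*phi^(h - 1)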


3.3.2 The Partial Autocorrelation Function (PACF)


MA(q) models will have zero ACF for lags greater than q, and ρ(q) ≠ 0.

We can use these facts to identify the order of a moving average process.

However if the process is AR or ARMA, the ACF tells us little.

MA process: use ACF to select your model


AR or ARMA process: use PACF to select model

$$\rho_{XY|Z} = \operatorname{corr}(X - \hat{X}, Y - \hat{Y}), \qquad \hat{X} \text{ is } X \text{ regressed on } Z, \qquad \hat{Y} \text{ is } Y \text{ regressed on } Z$$

ρ XY | Z measures the correlation between X and Y with the linear effect of Z removed (partialled out).

If X, Y, Z are multivariate normal, then

ρ XY | Z = corr(X, Y | Z)

Example
Say that we have an AR(1) model

$$x_t = \phi x_{t-1} + w_t$$
$$\gamma_x(2) = \operatorname{cov}(x_t, x_{t-2}) = \operatorname{cov}(\phi x_{t-1} + w_t, x_{t-2}) = \operatorname{cov}(\phi^2 x_{t-2} + \phi w_{t-1} + w_t, x_{t-2}) = \phi^2\gamma_x(0)$$

$$x_{t+h} = \phi^h x_t + \sum_{j=1}^{h}\phi^{h-j}w_{t+j}, \qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \phi^h\gamma_x(0)$$

x t + h is dependent on x t + h − 1, x t + h − 1 is dependent on x t + h − 2, x t + h − 2 is dependent on x t + h − 3,…,x t + 1 is dependent on x t.

Because of this chain, x t + h is dependent on x t.

For x t and x t − 2, let’s remove the middle.

$$\operatorname{cov}\left(x_t - \phi x_{t-1}, \ x_{t-2} - \frac{1}{\phi}x_{t-1}\right) = \operatorname{cov}\left(w_t, \ -\frac{1}{\phi}w_{t-1}\right) = 0$$
PACF for mean-zero stationary time series
For h ≥ 2, let x̂ t + h be the regression of x t + h onto {x t + h − 1, x t + h − 2, …, x t + 1} (minimizing the mean squared error).

x̂ t + h = β 1x t + h − 1 + β 2x t + h − 2 + … + β h − 1x t + 1

No intercept is needed because E(x t) = 0 ∀t, if this is not the case, replace x t with x t − μ x.

Because of stationarity, the coefficients are the same if we shift the index.

x̂ t = β 1x t − 1 + β 2x t − 2 + … + β h − 1x t − h + 1

Definition 3.9 partial autocorrelation function (PACF)


The partial autocorrelation function (PACF) of a stationary process, x t, denoted ϕ hh, h = 1, 2, 3, … is

$$\phi_{11} = \operatorname{corr}(x_{t+1}, x_t), \qquad \phi_{hh} = \operatorname{corr}(x_{t+h} - \hat{x}_{t+h}, \ x_t - \hat{x}_t), \quad h \geq 2$$

The partial autocorrelation function, ϕ hh, is the correlation between x t + h and x t with the linear dependence on {x t + 1, x t + 2, …, x t + h − 1} removed

If x t is Gaussian, then ϕ hh is the correlation coefficient between x t + h and x t in the bivariate distribution (x t + h, x t) conditioned on
{x t + 1, x t + 2, …, x t + h − 1}.

ϕ hh = corr(x t + h, x t | x t + 1, x t + 2, …, x t + h − 1)

Example 3.15 The PACF of an AR(1)


Consider the AR(1) process

$$x_t = \phi x_{t-1} + w_t, \quad |\phi| < 1, \qquad \phi_{11} = \rho(1) = \phi$$

To calculate ϕ 22, let’s regress x t + 2 onto x t + 1.

$$\hat{x}_{t+2} = \beta x_{t+1}, \qquad E(x_{t+2} - \hat{x}_{t+2})^2 = E(x_{t+2} - \beta x_{t+1})^2 = E(x_{t+2}^2 - 2\beta x_{t+2}x_{t+1} + \beta^2 x_{t+1}^2) = \gamma(0) - 2\beta\gamma(1) + \beta^2\gamma(0)$$

Minimizing the quadratic on β gives

$$\beta = \frac{\gamma(1)}{\gamma(0)} = \frac{\phi\gamma(0)}{\gamma(0)} = \phi$$

ϕ 22 = corr(x t + 2 − x̂ t + 2, x t − x̂ t) = corr(x t + 2 − ϕx t + 1, x t − ϕx t − 1) = corr(w t + 2, w t) = 0

In fact ϕ hh = 0 ∀h > 1.
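A quick check (a sketch, not in the text): the PACF of an AR(1) with $\phi = 0.9$ is $\phi$ at lag 1 and zero afterwards.

ARMAacf(ar = 0.9, lag.max = 5, pacf = TRUE)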


Example 3.16 The PACF of an AR(p)


Say that $x_t$ is an AR(p) process with roots outside the unit circle.

$$x_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j} + w_{t+h}$$

When $h > p$, the regression of $x_{t+h}$ onto $\{x_{t+1}, x_{t+2}, \ldots, x_{t+h-1}\}$ is

$$\hat{x}_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j}$$

When $h > p$, there is zero partial autocorrelation between observations.

$$\phi_{hh} = \operatorname{corr}(x_{t+h} - \hat{x}_{t+h}, \ x_t - \hat{x}_t) = \operatorname{corr}(w_{t+h}, \ x_t - \hat{x}_t) = 0$$

Note that $x_t - \hat{x}_t$ depends on white noise of lower indices.

When $h = p$, $\phi_{pp} = \phi_p \neq 0$, and $\phi_{11}, \phi_{22}, \ldots, \phi_{p-1,p-1}$ may or may not be zero.

Let’s demonstrate this with an AR(2) process.

coef_ar_2 <- c(
1.5,-0.75
)
ARMAacf_2 = ARMAacf(
ar = coef_ar_2,
ma = 0,
lag.max = 24
)[-1]
ARMApacf_2 = ARMAacf(
ar = coef_ar_2,
ma = 0,
lag.max = 24,
pacf = TRUE
)
#par(mfrow=1:2)
astsa::tsplot(
x = ARMAacf_2,
type = "h",
xlab = "lag",
lwd = 3,
nxm = 5,
col=c(
rep(4,11),6
)
)


astsa::tsplot(
x = ARMApacf_2,
type = "h",
xlab = "lag",
lwd = 3,
nxm = 5,
col=c(
rep(4,11),6
)
)

Example 3.17 The PACF of an Invertible MA(q)


Say that we have an invertible MA(q) process.


$$x_t = -\sum_{j=1}^{\infty}\pi_j x_{t-j} + w_t$$

No finite representation exists. The PACF will never cut off (in contrast to an AR(p) process).

If we have a MA(1) process

$$x_t = w_t + \theta w_{t-1}, \qquad |\theta| < 1$$

then

$$\phi_{hh} = -\frac{(-\theta)^h(1-\theta^2)}{1 - \theta^{2(h+1)}}, \qquad h \geq 1$$
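A quick check (a sketch, not in the text): for $\theta = 0.5$, the PACF from ARMAacf() matches this formula.

theta <- 0.5; h <- 1:5
ARMAacf(ma = theta, lag.max = 5, pacf = TRUE)
-(-theta)^h*(1 - theta^2)/(1 - theta^(2*(h + 1)))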

Example 3.18 Preliminary Analysis of the Recruitment Series


Let’s use ACF and PACF to select a model for the rec time series.

The PACF cuts off after lag 2, and the ACF tails off. Let's use an AR(2) model.

data(
list = "rec",
package = "astsa"
)
astsa::acf2(
series = rec,
max.lag = 48
) # will produce values and a graphic


## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.92 0.78 0.63 0.48 0.36 0.26 0.18 0.13 0.09 0.07 0.06 0.02 -0.04
## PACF 0.92 -0.44 -0.05 -0.02 0.07 -0.03 -0.03 0.04 0.05 -0.02 -0.05 -0.14 -0.15
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.12 -0.19 -0.24 -0.27 -0.27 -0.24 -0.19 -0.11 -0.03 0.03 0.06 0.06
## PACF -0.05 0.05 0.01 0.01 0.02 0.09 0.11 0.03 -0.03 -0.01 -0.07 -0.12
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.02 -0.02 -0.06 -0.09 -0.12 -0.13 -0.11 -0.05 0.02 0.08 0.12 0.10
## PACF -0.03 0.05 -0.08 -0.04 -0.03 0.06 0.05 0.15 0.09 -0.04 -0.10 -0.09
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.06 0.01 -0.02 -0.03 -0.03 -0.02 0.01 0.06 0.12 0.17 0.20
## PACF -0.02 0.05 0.08 -0.02 -0.01 -0.02 0.05 0.01 0.05 0.08 -0.04

ar.ols_rec = ar.ols(
x = rec,
order = 2,
demean = FALSE,
intercept = TRUE
) # regression
ar.ols_rec

##
## Call:
## ar.ols(x = rec, order.max = 2, demean = FALSE, intercept = TRUE)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: 6.737 (1.111)
##
## Order selected 2 sigma^2 estimated as 89.72

ar.ols_rec$asy.se.coef # standard errors

## $x.mean
## [1] 1.110599
##
## $ar
## [1] 0.04178901 0.04187942

list_ar <- list()


for(j in c("yule-walker","burg","ols","mle","yw")) list_ar[[j]] <- ar(
x = rec,
method = j,
order.max = 12
)
list_ar


## $`yule-walker`
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8
##
## $burg
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $ols
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: -0.05644 (0.446)
##
## Order selected 2 sigma^2 estimated as 89.72
##
## $mle
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $yw
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8

ar.burg(
x = rec,
order.max = 12
)

##
## Call:
## ar.burg.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34

ar.yw(
x = rec,
order.max = 12
)


##
## Call:
## ar.yw.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8

ar.mle(
x = rec
)

##
## Call:
## ar.mle(x = rec)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34

Plot the time series and the modeled values

predict_rec <- predict(


object = ar.ols_rec,
newdata = rec,
n.ahead = 24
)
ts.plot(
ts.union(
rec,predict_rec$pred
),
col = 1:2
)

3.4 Forecasting
3.4.1 Forecasting AR Processes
When we forecast, we are predicting the future values of our time series, x n + m, m = 1, 2, 3, …, using observed values, x 1 : n = {x 1, x 2, …, x n}.

In this section, we assume our time series is stationary and the model parameters are known.

The minimum mean square error predictor


$$x_{n+m}^n = E(x_{n+m} \mid x_{1:n}), \qquad E[x_{n+m} - E(x_{n+m} \mid x_{1:n})]^2 \leq E[x_{n+m} - g(x_{1:n})]^2 \quad \forall \text{ functions } g \text{ of } x_{1:n}$$


Linear predictors
For an initial look at possible predictors, let’s restrict our attention to linear functions on the data.

Linear predictors only depend on the second moment of the process.

$$x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k, \qquad \alpha_0, \alpha_1, \ldots, \alpha_n \in \mathbb{R}$$

Note: The coefficients depend on both $n$ and $m$; for now we will drop this fact from the notation.

example
If $n = 1$, $m = 1$, then $x_2^1$ is a one-step-ahead forecast of $x_2$ given $x_1$. A linear predictor is $x_2^1 = \alpha_0 + \alpha_1 x_1$.

If $n = 2$, $m = 1$, then $x_3^2$ is a one-step-ahead forecast of $x_3$ given $x_1, x_2$. A linear predictor is $x_3^2 = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2$.

In general, the coefficients of $x_2^1$ and $x_3^2$ will be different.

Best linear predictors


Linear predictors that minimize the mean square prediction are called best linear predictors.

Theorem B.1 Projection Theorem


Let H be a Hilbert space, M a closed subspace of H, then ∀y ∈ H, ∃ ! ŷ ∈ M and ∃ ! z ∈ H − M such that

$y = \hat{y} + z$ is a unique representation of $y$, and

$\langle z, v\rangle = 0 \ \forall v \in M$, and
$\|y - \hat{y}\| \leq \|y - v\| \ \forall v \in M$

Proof
If y ∈ M, then ŷ = y, z = 0.

If $y \notin M$, let $\delta = \inf_{v \in M}\|y - v\|$. As an infimum, $\exists (v_n)$ s.t. $\|y - v_n\| \to \delta$. Select $(v_n)$ s.t. $\|y - v_n\|^2 \leq \delta^2 + 1/n$.

We need to show that $(v_n)$ is a Cauchy sequence. Since $v_n, v_m$ are in $M$, $\left\|y - \frac{v_n + v_m}{2}\right\|^2 \geq \delta^2$.

$$\|v_n - v_m\|^2 = \|(y - v_n) - (y - v_m)\|^2 = \|y - v_n\|^2 + \|y - v_m\|^2 - 2\langle y - v_n, y - v_m\rangle$$
$$= 2\|y - v_n\|^2 + 2\|y - v_m\|^2 - 4\left\|y - \frac{v_n + v_m}{2}\right\|^2$$
$$\leq 2(\delta^2 + 1/n) + 2(\delta^2 + 1/m) - 4\delta^2 = 2(1/n + 1/m)$$

Thus (v n) is a Cauchy sequence in a Hilbert space, therefore the sequence converges. Say to ŷ. This shows existence and the minimum norm
property.

To show uniqueness, say that there are two such vectors ŷ 1, ŷ 2.

$$\|\hat{y}_1 - \hat{y}_2\|^2 = 2\|y - \hat{y}_1\|^2 + 2\|y - \hat{y}_2\|^2 - 4\left\|y - \frac{\hat{y}_1 + \hat{y}_2}{2}\right\|^2 \leq 2\delta^2 + 2\delta^2 - 4\delta^2 = 0$$

Now we need to show that $y - \hat{y}$ is orthogonal to $M$. Let $y_0$ be any length-one vector in $M$, $\|y_0\| = 1$.

$$\langle y - \hat{y} - \alpha y_0, \ y - \hat{y} - \alpha y_0\rangle = \langle y - \hat{y}, y - \hat{y}\rangle + \alpha^2\langle y_0, y_0\rangle - 2\alpha\langle y - \hat{y}, y_0\rangle$$

This is a quadratic in $\alpha$; it is minimized at

$$\alpha = \frac{2\langle y - \hat{y}, y_0\rangle}{2\langle y_0, y_0\rangle} = \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}$$

Substituting this $\alpha$,

$$\left\langle y - \hat{y} - \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}y_0, \ y - \hat{y} - \frac{\langle y - \hat{y}, y_0\rangle}{\langle y_0, y_0\rangle}y_0\right\rangle = \langle y - \hat{y}, y - \hat{y}\rangle + \frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle} - 2\frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle}$$
$$= \langle y - \hat{y}, y - \hat{y}\rangle - \frac{\langle y - \hat{y}, y_0\rangle^2}{\langle y_0, y_0\rangle} \leq \langle y - \hat{y}, y - \hat{y}\rangle$$

For $\|y - \hat{y}\|$ to be minimal, $\langle y - \hat{y}, y_0\rangle = 0$.

We will skip a lot of details to focus on the meat and potatoes.

For our purpose, let’s set the dot product to the expected value of the product.

< X, Y >= cov(X, Y) = E((X − μ x)(Y − μ y))

Without loss of generality, we will proceed with univariate expected values of zero for ease of notation.

Theorem B.3 For a Gaussian process, the minimum mean square error predictor is
the best linear predictor (the projection is the expected value)
If $(y, x_1, \ldots, x_n)$ is multivariate normal, then

$$E(y \mid x_{1:n}) = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$$

Proof:
Let $E_{M(x)}y$ be the unique element of $M(x)$ that satisfies the orthogonality principle:

$$E[(y - E_{M(x)}y)w] = 0 \quad \forall w \in M(x)$$

Our goal is to show that

$$E_{M(x)}y = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$$

Let $\hat{y} = \operatorname{projection}_{\operatorname{span}(1,\, x_1,\, x_2,\, \ldots,\, x_n)}(y)$. From the orthogonality principle,

$$\langle y - \hat{y}, x_i\rangle = \operatorname{cov}(y - \hat{y}, x_i) = 0, \qquad i = 0, 1, \ldots, n.$$

Since $\hat{y}$ is fixed, $(y - \hat{y}, x_0, x_1, \ldots, x_n)$ is multivariate normal. Thus zero covariance gives us independence between $y - \hat{y}$ and $x_i$.

From the independence, we can factor the covariance into expected values.

$$0 = \langle y - \hat{y}, w\rangle = E[(y - \hat{y})w] = E(y - \hat{y})E(w), \qquad x_0 = 1, \qquad 0 = \langle y - \hat{y}, x_0\rangle = E[(y - \hat{y})x_0] = E(y - \hat{y}), \qquad E(\hat{y}) = E(y)$$

Property 3.3 Best Linear Prediction for Stationary Processes (the prediction
equations)
Given observations $x_1, x_2, \ldots, x_n$, the coefficients of the best linear predictor $x_{n+m}^n = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k$ of $x_{n+m}$, for $m \geq 1$, solve

$$E[(x_{n+m} - x_{n+m}^n)x_k] = 0, \qquad k = 0, 1, \ldots, n$$

where $x_0 = 1$.

We can solve for $\alpha_0, \alpha_1, \ldots, \alpha_n$ by minimizing $Q = E\left(x_{n+m} - \sum_{k=0}^{n}\alpha_k x_k\right)^2$ with respect to the $\alpha$s; $\frac{\partial Q}{\partial\alpha_j} = 0$, $j = 0, 1, \ldots, n$.

Set k = 0.

$$E[(x_{n+m} - x_{n+m}^n)\cdot 1] = 0, \qquad E(x_{n+m}^n) = E(x_{n+m}) = \mu$$
$$E\left(\alpha_0 + \sum_{k=1}^{n}\alpha_k x_k\right) = \mu, \qquad \alpha_0 + \sum_{k=1}^{n}\alpha_k\mu = \mu, \qquad \alpha_0 = \mu\left(1 - \sum_{k=1}^{n}\alpha_k\right)$$
$$x_{n+m}^n = \mu + \sum_{k=1}^{n}\alpha_k(x_k - \mu)$$

The best linear predictor is

$$x_{n+m}^n = \mu + \sum_{k=1}^{n}\alpha_k(x_k - \mu)$$

one-step-ahead prediction
Given that x 1, . . . , x n were observed, we want to predict the next observation x n + 1.

Without loss of generality, say that μ = 0 (it follows that α = 0).

The best linear predictor is of the form

$$x_{n+1}^n = \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}$$

Invoke orthogonality:

$$E\left[\left(x_{n+1} - \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}\right)x_{n+1-k}\right] = 0, \quad k = 1, \ldots, n$$
$$E(x_{n+1}x_{n+1-k}) = \sum_{j=1}^{n}\phi_{nj}E(x_{n+1-j}x_{n+1-k}), \qquad \gamma(k) = \sum_{j=1}^{n}\phi_{nj}\gamma(k-j)$$

We can write the equations in matrix form.

$$\Gamma_n\phi_n = \gamma_n, \qquad \Gamma_n(j,k) = \gamma(k-j) \ (\text{an } n \times n \text{ matrix}), \qquad \phi_n = \begin{pmatrix}\phi_{n1} \\ \vdots \\ \phi_{nn}\end{pmatrix}, \qquad \gamma_n = \begin{pmatrix}\gamma(1) \\ \vdots \\ \gamma(n)\end{pmatrix}$$

$\Gamma_n$ is a covariance matrix, and thus symmetric and positive semi-definite. If $\Gamma_n$ is singular, then there are infinitely many solutions for the coefficients, but by the projection theorem $x_{n+1}^n$ is unique.

If $\Gamma_n$ is non-singular, then $\phi_n$ is unique and

$$\phi_n = \Gamma_n^{-1}\gamma_n$$

For ARMA models, $\sigma_w^2 > 0$ and $\gamma(h) \to 0$ as $h \to \infty$ make $\Gamma_n$ non-singular.

$$x_{n+1}^n = \phi_n'\begin{pmatrix}x_n \\ x_{n-1} \\ \vdots \\ x_1\end{pmatrix}$$

mean square one-step-ahead prediction error


$$P_{n+1}^n = E(x_{n+1} - x_{n+1}^n)^2 = E(x_{n+1} - \phi_n'x)^2 = E(x_{n+1} - \gamma_n'\Gamma_n^{-1}x)^2$$
$$= E(x_{n+1}^2 - 2\gamma_n'\Gamma_n^{-1}x\,x_{n+1} + \gamma_n'\Gamma_n^{-1}xx'\Gamma_n^{-1}\gamma_n)$$
$$= E(x_{n+1}^2) - 2\gamma_n'\Gamma_n^{-1}E(x\,x_{n+1}) + \gamma_n'\Gamma_n^{-1}E(xx')\Gamma_n^{-1}\gamma_n$$
$$= \gamma(0) - 2\gamma_n'\Gamma_n^{-1}\gamma_n + \gamma_n'\Gamma_n^{-1}\Gamma_n\Gamma_n^{-1}\gamma_n$$
$$= \gamma(0) - \gamma_n'\Gamma_n^{-1}\gamma_n$$

where $x = (x_n, x_{n-1}, \ldots, x_1)'$.
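A quick numerical illustration (a sketch, not in the text): dividing the prediction equations by $\gamma(0)$, we can solve them with ARMAacf() and solve() for a causal AR(2); the coefficients recover $(\phi_1, \phi_2, 0, \ldots, 0)$, as Example 3.19 below suggests.

phi <- c(1.5, -0.75)
n <- 5
rho <- ARMAacf(ar = phi, lag.max = n)  # rho(0), ..., rho(n)
Gamma_n <- toeplitz(rho[1:n])          # correlation version of Gamma_n
gamma_n <- rho[2:(n + 1)]              # correlation version of gamma_n
phi_n <- solve(Gamma_n, gamma_n)       # prediction coefficients
phi_n
1 - sum(phi_n*gamma_n)                 # P_{n+1}^n / gamma(0), the scaled one-step MSE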

Example 3.19 Prediction for an AR(2) (Verify that the matrix equation gives the
correct coefficients)
Suppose we have a causal AR(2) process

x t = ϕ 1x t − 1 + ϕ 2x t − 2 + w t

The one-step ahead prediction with one observation x 1 is

−1
1 1
γ(1)
ϕ n = Γ n γ nϕ 11 = γ(1)x 2 = ϕ 11x 1 = x 1 = ρ(1)x 1
γ(0) γ(0)

The one-step ahead prediction with two observations x 1, x 2 is

$$\phi_n = \Gamma_n^{-1}\gamma_n, \qquad \begin{pmatrix}\phi_{21} \\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0)\end{pmatrix}^{-1}\begin{pmatrix}\gamma(1) \\ \gamma(2)\end{pmatrix} = \frac{1}{\gamma(0)^2 - \gamma(1)^2}\begin{pmatrix}\gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0)\end{pmatrix}\begin{pmatrix}\gamma(1) \\ \gamma(2)\end{pmatrix}$$

Now let’s invoke the structure of the AR(2) process.

$$x_3 = \phi_1 x_2 + \phi_2 x_1 + w_3, \qquad x_3^2 = \phi_1 x_2 + \phi_2 x_1, \qquad x_3 - x_3^2 = w_3$$
$$E((x_3 - x_3^2)x_1) = E(w_3 x_1) = 0, \qquad E((x_3 - x_3^2)x_2) = E(w_3 x_2) = 0$$

It follows by the uniqueness of coefficients that

$$\begin{pmatrix}\phi_{21} \\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$

If we repeat this for larger values of n, we see that

$$x_{n+1}^n = \phi_1 x_n + \phi_2 x_{n-1}, \qquad \begin{pmatrix}\phi_{n1} \\ \phi_{n2}\end{pmatrix} = \begin{pmatrix}\phi_1 \\ \phi_2\end{pmatrix}$$

If we extend these computations to AR(p) processes, we see that

$$x_{n+1}^n = \sum_{j=1}^{p}\phi_j x_{n+1-j}$$

3.4.2 Iterative Algorithms for Forecasting Time Series


The one-step-ahead prediction equation for an AR(p) model is much simpler than for general ARMA models, where the prediction-equation approach would lead to large matrices to invert. Instead we will use recursive and iterative methods.

Property 3.4 The Durbin–Levinson Algorithm


The matrix form of the prediction equations

Γ nϕ n = γ n

and the mean square one-step ahead prediction error

$$P_{n+1}^n = E(x_{n+1} - x_{n+1}^n)^2$$

can be solved iteratively as follows:

$$\phi_{00} = 0, \qquad P_1^0 = \gamma(0)$$

For n ≥ 1,

$$\phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(n-k)}{1 - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(k)}, \qquad P_{n+1}^n = P_n^{n-1}\left(1 - \phi_{nn}^2\right)$$

The general formula for the mean square one-step-ahead prediction error is

$$P_{n+1}^n = \gamma(0)\prod_{j=1}^{n}\left[1 - \phi_{jj}^2\right]$$

For n ≥ 2:

$$\phi_{nk} = \phi_{n-1,k} - \phi_{nn}\phi_{n-1,n-k}, \qquad k = 1, 2, \ldots, n-1$$

Example 3.20 Using the Durbin–Levinson Algorithm


To use the Durbin–Levinson algorithm, start with

n = 1:

$$\phi_{11} = \rho(1), \qquad P_2^1 = \gamma(0)[1 - \phi_{11}^2]$$

n = 2:

$$\phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)}, \qquad \phi_{21} = \phi_{11} - \phi_{22}\phi_{11}, \qquad P_3^2 = P_2^1[1 - \phi_{22}^2] = \gamma(0)[1 - \phi_{11}^2][1 - \phi_{22}^2]$$

n = 3:

$$\phi_{33} = \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)}, \qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21}, \qquad \phi_{31} = \phi_{21} - \phi_{33}\phi_{22}$$
$$P_4^3 = P_3^2[1 - \phi_{33}^2] = \gamma(0)[1 - \phi_{11}^2][1 - \phi_{22}^2][1 - \phi_{33}^2]$$

Let’s use the Durbin-Levinson algorithm on some data. First using a function, then computing manually. Previously, we used PACF to establish that
the rec time-series should be modeled with an AR(2).

Load the time-series

data(
list = "rec",
package = "astsa"
)
acf_rec <- as.vector(acf(
x = rec
)$acf)

Use a function to run the Durbin-Levinson algorithm


gsignal::levinson(
acf = acf_rec,
p = 2
)

## $a
## [1] 1.0000000 -1.3315874 0.4445447
##
## $e
## [1] 0.1205793
##
## $k
## [1] -0.9218042 0.4445447

Compute Durbin-Levinson algorithm manually

phi00 <- 0
P10 <- var(
x = rec
)
phi11 <- acf_rec[2]
P21 <- P10*(1-phi11^2)
phi22 <- (acf_rec[3] - phi11*acf_rec[2])/(1 - phi11*acf_rec[2])
phi21 <- phi11 - phi22*phi11
P32 <- P10*(1 - phi11^2)*(1 - phi22^2)
print(
x = c(phi21,phi22)
)

## [1] 1.3315874 -0.4445447

Property 3.5 Iterative Solution for the PACF


The PACF of a stationary process x t, can be obtained iteratively as ϕ nn, n = 1, 2, 3, . . . .

Using the iterative solution for the PACF and setting $n = p$, it follows that for an AR(p) model,

$$x_{p+1}^p = \phi_{p1}x_p + \phi_{p2}x_{p-1} + \ldots + \phi_{pp}x_1 = \phi_1 x_p + \phi_2 x_{p-1} + \ldots + \phi_p x_1$$

This shows that for an AR(p) model, the partial autocorrelation at lag $p$, $\phi_{pp}$, is also the last coefficient in the model, $\phi_p$.

Example 3.21 The PACF of an AR(2)


Let’s use the previous example’s pencil and paper calculations (not the R code) to find the coefficients for an AR(2) model.

We are working with an AR(2); in the difference equations section, we showed that these equations hold.

$$\rho(h) - \phi_1\rho(h-1) - \phi_2\rho(h-2) = 0, \ h \geq 1, \qquad \rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \rho(2) = \phi_1\rho(1) + \phi_2, \qquad \rho(3) - \phi_1\rho(2) - \phi_2\rho(1) = 0$$

Combining the difference equations results with the iterative solution equations, we get:

$$\phi_{11} = \rho(1) = \frac{\phi_1}{1 - \phi_2}$$
$$\phi_{22} = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2} = \frac{\phi_1\left(\frac{\phi_1}{1-\phi_2}\right) + \phi_2 - \left(\frac{\phi_1}{1-\phi_2}\right)^2}{1 - \left(\frac{\phi_1}{1-\phi_2}\right)^2} = \phi_2$$
$$\phi_{21} = \rho(1)(1 - \phi_2) = \phi_1$$
$$\phi_{33} = \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)} = 0$$

Notice that ϕ 22 = ϕ 2.

Best linear predictor for more than one-step ahead predictions


If we want to predict x n + m using observations x 1, x 2, . . . , x n and a linear predictor, then our model is

$$x_{n+m}^n = \phi_{n1}^{(m)}x_n + \phi_{n2}^{(m)}x_{n-1} + \ldots + \phi_{nn}^{(m)}x_1$$

where the coefficients $\phi_{n1}^{(m)}, \phi_{n2}^{(m)}, \ldots, \phi_{nn}^{(m)}$ satisfy the prediction equations.

$$\sum_{j=1}^{n}\phi_{nj}^{(m)}E(x_{n+1-j}x_{n+1-k}) = E(x_{n+m}x_{n+1-k}), \quad k = 1, 2, \ldots, n, \qquad \sum_{j=1}^{n}\phi_{nj}^{(m)}\gamma(k-j) = \gamma(m+k-1)$$

These prediction equations can be written in matrix form.


$$\Gamma_n\phi_n^{(m)} = \gamma_n^{(m)}, \qquad \gamma_n^{(m)} = \begin{pmatrix}\gamma(m) \\ \vdots \\ \gamma(m+n-1)\end{pmatrix}, \qquad \phi_n^{(m)} = \begin{pmatrix}\phi_{n1}^{(m)} \\ \vdots \\ \phi_{nn}^{(m)}\end{pmatrix}$$

Mean square m-step ahead prediction error


(m)′
P nn + m = E(x n + m − x nn + m) 2 = γ(0) − γ n Γ n− 1γ n( m )

Property 3.6 The Innovations Algorithm


When x t is a mean-zero stationary time series,

$$\operatorname{corr}(x_s - x_s^{s-1}, \ x_t - x_t^{t-1}) = 0, \qquad s \neq t$$

t−1
Using this uncorrelated property and the projection theorem, we can derive the innovations algorithm. x t − x t are called innovations.

One-step-ahead innovations algorithm


When we have observations x 1, x 2, . . . , x n, the one-step-ahead calculations are computed for t = 1, 2, . . . , n.

$$x_1^0 = 0, \qquad P_1^0 = \gamma(0)$$
$$x_{t+1}^t = \sum_{j=1}^{t}\theta_{tj}\left(x_{t+1-j} - x_{t+1-j}^{t-j}\right), \quad t = 1, 2, \ldots$$
$$P_{t+1}^t = \gamma(0) - \sum_{j=0}^{t-1}\theta_{t,t-j}^2\,P_{j+1}^j, \quad t = 1, 2, \ldots$$
$$\theta_{t,t-j} = \frac{\gamma(t-j) - \sum_{k=0}^{j-1}\theta_{j,j-k}\theta_{t,t-k}P_{k+1}^k}{P_{j+1}^j}$$

m-step-ahead innovations algorithm


After the one-step-ahead calculations are complete for t = 1, 2, . . . , n, the coefficients are obtained by continued iterations, and the following
formulas are used to get the m-step-ahead forecasts and mean-squared errors.

$$x_{n+m}^n = \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}\left(x_{n+m-j} - x_{n+m-j}^{n+m-j-1}\right), \qquad P_{n+m}^n = \gamma(0) - \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}^2\,P_{n+m-j}^{n+m-j-1}$$

Example 3.22 Prediction for an MA(1)


The innovations algorithm lends itself well to prediction for moving average processes. Let’s apply the innovations algorithm to an MA(1) process.

$$x_t = w_t + \theta w_{t-1}, \qquad \gamma(0) = (1+\theta^2)\sigma_w^2, \qquad \gamma(1) = \theta\sigma_w^2, \qquad \gamma(h) = 0 \ \forall h > 1$$

$$x_1^0 = 0, \qquad P_1^0 = (1+\theta^2)\sigma_w^2, \qquad \theta_{n1} = \frac{\theta\sigma_w^2}{P_n^{n-1}}, \qquad \theta_{nj} = 0, \ j = 2, \ldots, n$$
$$P_{n+1}^n = (1 + \theta^2 - \theta\theta_{n1})\sigma_w^2, \qquad x_{n+1}^n = \frac{\theta(x_n - x_n^{n-1})\sigma_w^2}{P_n^{n-1}}$$
