This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway and David S. Stoffer (https://github.com/nickpoison/tsa4).
The most recent version of the astsa package can be found at https://github.com/nickpoison/astsa/.
The webpages for the texts, and some help on using R for time series analysis, can be found at https://nickpoison.github.io/.
Classic regression only allows the dependent variable to be influenced by current values of the independent variables.
N <- 5    # number of simulated series
n <- 500  # length of each series
M <- as.data.frame(
  x = matrix(
    data = NA,
    nrow = n + 2,
    ncol = N
  )
)
for (j in 1:N) {
  v <- rep(NA, n + 2)
  v[1:2] <- runif(  # two random starting values
    n = 2,
    min = -10,
    max = 10
  )
  for (k in 1:n) {
    # AR(2) recursion: x_t = x_{t-1} - 0.9 x_{t-2} + w_t
    v[k + 2] <- v[k + 1] - 0.9*v[k] + rnorm(1)
  }
  M[, j] <- v
}
M$time <- (-1):n
gather_M <- tidyr::gather(
  data = M[1:20, ],
  key = "time_series",
  value = "value",
  -time
)
library(ggplot2)
ggplot(gather_M) +
  aes(x = time, y = value, color = time_series) +
  geom_line() +
  theme_bw() +
  theme(legend.position = "none")
We have now assumed the current value is a particular linear function of past values.
The feasibility of such a model can be assessed using autocorrelation and lagged scatterplot matrices.
astsa::acf1(
  series = M$V1,
  max.lag = 10
)
## [1] 0.51 -0.39 -0.87 -0.52 0.28 0.77 0.52 -0.18 -0.66 -0.51
astsa::lag1.plot(
  series = M$V1,
  max.lag = 4
)
astsa::scatter.hist(
  x = M$V1,
  y = c(
    M$V1[-1], M$V1[nrow(M)]
  )
)
The lagged scatterplot matrix for the Southern Oscillation Index (SOI) indicates that lags 1 and 2 are linearly associated with the current value.
data(
  list = c(
    "soi", "rec"
  ),
  package = "astsa"
)
astsa::lag1.plot(
  series = soi,
  max.lag = 12
)
The ACF shows relatively large positive values at lags 1, 2, 12, 24, and 36 and large negative values at 18, 30, and 42.
astsa::acf1(
  series = soi,
  max.lag = 36
)
## [1] 0.60 0.37 0.21 0.05 -0.11 -0.19 -0.18 -0.10 0.05 0.22 0.36 0.41
## [13] 0.31 0.10 -0.06 -0.17 -0.29 -0.37 -0.32 -0.19 -0.04 0.15 0.31 0.35
## [25] 0.25 0.10 -0.03 -0.16 -0.28 -0.37 -0.32 -0.16 -0.02 0.17 0.33 0.39
We note also the possible relation between the SOI and Recruitment series indicated in the scatterplot matrix.
astsa::lag2.plot(
  series1 = soi,
  series2 = rec,
  max.lag = 8
)
The AR(p) model:
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$$
where $x_t$ is stationary, $w_t$ is white noise with expected value zero and constant variance, the $\phi_j$ are constant coefficients with $\phi_p \neq 0$, and $E(x_t) = 0$.
If $E(x_t) = \mu \neq 0$, then we can modify the model (replace $x_t$ with $x_t - \mu$) to get back to an expected value of zero.
Some equations are easier to work with using the backshift operator.
$$(B^0 - \phi_1 B^1 - \phi_2 B^2 - \dots - \phi_p B^p)\,x_t = w_t$$
$$\phi(B) = B^0 - \phi_1 B^1 - \phi_2 B^2 - \dots - \phi_p B^p$$
$$\phi(B)\,x_t = w_t$$
Iterating the AR(1) model backwards:
$$\begin{aligned}
x_t &= \phi x_{t-1} + w_t \\
&= \phi^2 x_{t-2} + \phi w_{t-1} + w_t \\
&\;\;\vdots \\
&= \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j}
\end{aligned}$$
In particular,
$$x_t = \phi^t x_0 + \sum_{j=0}^{t-1} \phi^j w_{t-j}$$
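A quick numeric sketch of this identity (the values of $\phi$, $x_0$, and $t$ below are arbitrary choices, not from the text):

set.seed(1)
phi <- 0.8
t_n <- 50
w <- rnorm(t_n)                 # w_1, ..., w_{t_n}
x <- numeric(t_n + 1)
x[1] <- 2                       # x_0
for (t in 1:t_n) {
  x[t + 1] <- phi*x[t] + w[t]   # iterate x_t = phi x_{t-1} + w_t
}
closed_form <- phi^t_n * x[1] + sum(phi^(0:(t_n - 1)) * w[t_n:1])
c(iterated = x[t_n + 1], closed_form = closed_form)  # the two values agree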
This naturally leads to an infinite series. To make the math work out, we accept the assumptions $|\phi| < 1$ and $\sup_t \operatorname{var}(x_t) < \infty$. Under these assumptions,
$$x_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}$$
in the mean square sense: $x_n \xrightarrow{ms} \sum_{j=0}^{\infty} \phi^j w_{t-j}$ and $\sum_{j=0}^{\infty} \phi^j w_{t-j} \in L^2$.
Proof:
By Theorem A.1, we need to show that
$$\lim_{m\to\infty}\ \sup_{n \ge m} E(x_n - x_m)^2 = 0.$$
Without loss of generality, say that $n = m + d$, $d \ge 0$.
$$\begin{aligned}
(x_{m+d} - x_m)^2 &= \left(\left[\phi^{m+d}x_0 + \sum_{j=0}^{m+d-1}\phi^j w_{m+d-j}\right] - \left[\phi^m x_0 + \sum_{j=0}^{m-1}\phi^j w_{m-j}\right]\right)^2 \\
&= \left((\phi^d-1)\phi^m x_0 + \sum_{j=m}^{m+d-1}\phi^j w_{m+d-j}\right)^2 \\
&= (\phi^d-1)^2\phi^{2m}x_0^2 + 2(\phi^d-1)\phi^m x_0 \sum_{j=m}^{m+d-1}\phi^j w_{m+d-j} + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}\, w_{m+d-j}w_{m+d-k}
\end{aligned}$$
Taking expectations, and using $E(w_t) = 0$ and $E(w_j w_k) = \sigma_w^2$ when $j = k$ and $0$ otherwise:
$$\begin{aligned}
E(x_{m+d} - x_m)^2 &= (\phi^d-1)^2\phi^{2m}x_0^2 + 2(\phi^d-1)\phi^m x_0\sum_{j=m}^{m+d-1}\phi^j E(w_{m+d-j}) + \sum_{j=m}^{m+d-1}\sum_{k=m}^{m+d-1}\phi^{j+k}E(w_{m+d-j}w_{m+d-k}) \\
&= (\phi^d-1)^2\phi^{2m}x_0^2 + \sigma_w^2\sum_{j=m}^{m+d-1}\phi^{2j} \\
&= (\phi^d-1)^2\phi^{2m}x_0^2 + \sigma_w^2\,\frac{\phi^{2m}-\phi^{2m+2d}}{1-\phi^2} \\
&= \phi^{2m}\left[(1-\phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1-\phi^2}\left(1-\phi^{2d}\right)\right]
\end{aligned}$$
Therefore
$$\sup_{d\ge 0} E(x_{m+d}-x_m)^2 = \sup_{d\ge 0}\ \phi^{2m}\left[(1-\phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1-\phi^2}\left(1-\phi^{2d}\right)\right].$$
Consider the sequence defined by the bracketed term with $d$ as the index. Since $|\phi| < 1$, the term in the brackets converges to $x_0^2 + \frac{\sigma_w^2}{1-\phi^2}$ as $d \to \infty$. This means that for any given $\epsilon > 0$, only a finite number of terms are greater than $x_0^2 + \frac{\sigma_w^2}{1-\phi^2} + \epsilon$. Since there is only a finite number of such terms, the supremum is achieved as a maximum and it is finite:
$$\sup_{d\ge 0} E(x_{m+d}-x_m)^2 = \phi^{2m}\,\max_{d\ge 0}\left[(1-\phi^d)^2 x_0^2 + \frac{\sigma_w^2}{1-\phi^2}\left(1-\phi^{2d}\right)\right].$$
Because $\phi^{2m} \to 0$ as $m \to \infty$ while the maximum does not depend on $m$,
$$\lim_{m\to\infty}\ \sup_{d\ge 0} E(x_{m+d}-x_m)^2 = 0.$$
This shows that the AR(1) process converges in mean square. The summation over all past $w_t$ gives the limit:
$$\lim_{k\to\infty} E\left(x_t - \sum_{j=0}^{k-1}\phi^j w_{t-j}\right)^2 = \lim_{k\to\infty}\phi^{2k}\,E\left(x_{t-k}^2\right) = 0$$
so
$$x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j}.$$
$$E(x_t) = E\left(\sum_{j=0}^{\infty}\phi^j w_{t-j}\right) = \sum_{j=0}^{\infty}\phi^j E(w_{t-j}) = 0$$
$$\begin{aligned}
\gamma(h) &= \operatorname{cov}(x_{t+h}, x_t) \\
&= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t+h-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right] \\
&= E\left[\left(\sum_{j=0}^{\infty}\phi^j w_{t-(j-h)}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right] \\
&= E\left[\left(\sum_{j=-h}^{\infty}\phi^{j+h} w_{t-j}\right)\left(\sum_{k=0}^{\infty}\phi^k w_{t-k}\right)\right] \\
&= E\left[\sum_{j=-h}^{\infty}\sum_{k=0}^{\infty}\phi^{j+h}\phi^k\, w_{t-j}\, w_{t-k}\right] \\
&= \sigma_w^2\,\phi^h\sum_{k=0}^{\infty}\phi^{2k} \\
&= \sigma_w^2\,\phi^h\,\frac{1}{1-\phi^2},\qquad h \ge 0
\end{aligned}$$
Recall that $\gamma(h) = \gamma(-h)$, so
$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\sigma_w^2\,\phi^h\,\frac{1}{1-\phi^2}}{\sigma_w^2\,\phi^0\,\frac{1}{1-\phi^2}} = \phi^h$$
and the ACF satisfies the recursion
$$\rho(h) = \phi\,\rho(h-1).$$
Consider two AR(1) processes: one with $\phi = 0.9$, so $\rho(h) = 0.9^h$, and one with $\phi = -0.9$, so $\rho(h) = (-0.9)^h$, for $h \ge 0$.
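A quick sketch checking $\rho(h) = \phi^h$ for these two values against R's ARMAacf:

rbind(
  ARMAacf = ARMAacf(ar = 0.9, lag.max = 5),
  closed_form = 0.9^(0:5)
)
rbind(
  ARMAacf = ARMAacf(ar = -0.9, lag.max = 5),
  closed_form = (-0.9)^(0:5)
)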
In the first case observations close together in time are positively correlated with each other. This result means that observations at contiguous
time points will tend to be close in value to each other.
This fact shows up in the first figure as a very smooth sample path for x t.
Contrast this with the case in which ϕ = − 0.9, so that ρ(h) = ( − 0.9) h, for h ≥ 0. This result means that observations at contiguous time points are
negatively correlated but observations two time points apart are positively correlated.
This fact shows up in the second figure, where, for example, if an observation, x t, is positive, the next observation, x t + 1, is typically negative, and
the next observation, x t + 2, is typically positive. Thus, in this case, the sample path is very choppy.
#par(mfrow=c(2,1))
# in the expressions below, ~ is a space and == is equal
astsa::tsplot(
  x = astsa::sarima.sim(
    ar = 0.9,
    n = 100
  ),
  col = 4,
  ylab = "",
  main = expression(
    AR(1)~~~phi==+0.9
  )
)
abline(
  h = 0,
  col = "red"
)
astsa::tsplot(
  x = astsa::sarima.sim(
    ar = -0.9,
    n = 100
  ),
  col = 4,
  ylab = "",
  main = expression(
    AR(1)~~~phi==-0.9
  )
)
abline(
  h = 0,
  col = "red"
)
The random walk
$$x_t = x_{t-1} + w_t$$
(the AR(1) model with $\phi = 1$) is not stationary.
Consider an AR(1) model with | ϕ | > 1. Such processes are called explosive because the values of the time series quickly become large in
magnitude.
When $|\phi| > 1$, the partial sums $\sum_{j=0}^{k-1}\phi^j w_{t-j}$ do not converge, but we can iterate forward instead:
$$\begin{aligned}
x_{t+1} &= \phi x_t + w_{t+1} \\
x_t &= \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1} \\
x_t &= \phi^{-1}(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}) - \phi^{-1}w_{t+1} \\
&\;\;\vdots \\
x_t &= \phi^{-k}x_{t+k} - \sum_{j=1}^{k}\phi^{-j}w_{t+j}
\end{aligned}$$
Since $|\phi^{-1}| < 1$, this version of the AR(1) model is stationary, but it is future dependent: $x_t$ is written in terms of future noise. Unfortunately, that makes the model useless for forecasting.
For example, if
$$x_t = \phi x_{t-1} + w_t,\qquad |\phi| > 1,\qquad w_t \sim \text{iid Normal}(0, \sigma_w^2),$$
then
$$E(x_t) = 0,\qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(-\sum_{j=1}^{\infty}\phi^{-j}w_{t+h+j},\ -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j}\right) = \sigma_w^2\,\phi^{-h}\,\frac{\phi^{-2}}{1-\phi^{-2}}.$$
Let
$$y_t = \phi^{-1}y_{t-1} + v_t,\qquad v_t \sim \text{iid Normal}(0,\ \sigma_w^2\phi^{-2}).$$
Then $x_t$ and $y_t$ are stochastically the same: all finite-dimensional distributions of the two processes agree.
Example:
If
$$x_t = 2x_{t-1} + w_t,\qquad \sigma_w^2 = 1,$$
then
$$y_t = \frac{1}{2}y_{t-1} + v_t,\qquad \sigma_v^2 = \frac{1}{4}.$$
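A numeric sketch confirming the two parameterizations share the same autocovariance function (using the formulas above):

phi <- 2
sigma2_w <- 1
h <- 0:5
gamma_x <- sigma2_w * phi^(-h) * phi^(-2) / (1 - phi^(-2))          # explosive model
gamma_y <- (sigma2_w * phi^(-2)) * (1/phi)^h / (1 - (1/phi)^2)      # stationary counterpart
rbind(gamma_x, gamma_y)  # identical rows: 1/3, 1/6, 1/12, ...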
Iterating backwards
To iterate backwards, let's invoke the backshift operator.
$$x_t = \phi x_{t-1} + w_t,\qquad |\phi| < 1,\qquad w_t \sim \text{iid Normal}(0,\sigma_w^2),\qquad x_t = \sum_{j=0}^{\infty}\phi^j w_{t-j},\qquad \phi(B)x_t = w_t,\qquad \phi(B) = B^0 - \phi B$$
$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t,\qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j$$
$$\phi(B)\,\psi(B)\,w_t = w_t$$
The coefficients on the left must match the coefficients on the right:
$$\psi_0 = 1,\qquad \psi_1 = \phi,\qquad \psi_j = \psi_{j-1}\phi \quad\Longrightarrow\quad \psi_j = \phi^j$$
$$\phi^{-1}(B) = \psi(B)$$
$$\phi(z) = 1 - \phi z,\qquad \phi(z)^{-1} = \frac{1}{1-\phi z} = 1 + \phi z + \phi^2 z^2 + \dots,\qquad |z| < 1$$
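A one-line sketch checking $\psi_j = \phi^j$ with R's ARMAtoMA:

rbind(
  ARMAtoMA = ARMAtoMA(ar = 0.9, ma = 0, lag.max = 5),  # psi_1, ..., psi_5
  closed_form = 0.9^(1:5)
)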
We will use similar polynomial and backshift-operator techniques when we discuss ARMA models.
The MA(q) model:
$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q},\qquad w_t \sim wn(0, \sigma_w^2),$$
where the $\theta_j$ are constant parameters and $\theta_q \neq 0$.
Note: Some software and texts write the moving average model coefficients with negative coefficients. Check the help documentation before using.
$$\theta(B) = B^0 + \theta_1 B^1 + \theta_2 B^2 + \dots + \theta_q B^q$$
For the MA(1) model $x_t = w_t + \theta w_{t-1}$,
$$E(x_t) = 0,\qquad \gamma(h) = \begin{cases}(1+\theta^2)\sigma_w^2 & h = 0\\ \theta\sigma_w^2 & h = 1\\ 0 & h > 1\end{cases},\qquad \rho(h) = \begin{cases}1 & h = 0\\[1ex] \dfrac{\theta}{1+\theta^2} & h = 1\\[1ex] 0 & h > 1\end{cases}$$
Note that $|\rho(1)| \le \frac{1}{2}$, and the bounds are achieved for $\theta = \pm 1$.
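A quick sketch of these facts (the $\theta$ values are arbitrary):

theta <- c(-1, -0.5, 0.5, 1)
theta / (1 + theta^2)            # -0.5 -0.4 0.4 0.5, so |rho(1)| <= 1/2
ARMAacf(ma = 0.5, lag.max = 2)   # lag-1 value matches 0.5/(1 + 0.25) = 0.4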
Also notice that $\rho(h)$ is the same whether the coefficient is $\theta$ or $\frac{1}{\theta}$:
$$\frac{\theta}{\theta^2+1} = \frac{\frac{1}{\theta}}{1 + \left(\frac{1}{\theta}\right)^2}$$
Replacing $\theta$ with $\frac{1}{\theta}$ and $\sigma_w^2$ with $\theta^2\sigma_w^2$:
$$\gamma(h) = \begin{cases}\left(1 + \frac{1}{\theta^2}\right)(\theta^2\sigma_w^2) = (\theta^2+1)\sigma_w^2 & h = 0\\[1ex] \frac{1}{\theta}\,(\theta^2\sigma_w^2) = \theta\sigma_w^2 & h = 1\\[1ex] 0 & h > 1\end{cases}$$
An MA(1) has zero autocorrelation at lags two and greater, while an AR(1) never has autocorrelation exactly zero.
Notice how much smoother the MA(1) model with $\theta = 0.9$ is than the one with $\theta = -0.9$.
#par(mfrow=c(2,1))
astsa::tsplot(
  x = astsa::sarima.sim(
    ma = 0.9,
    n = 100
  ),
  col = 4,
  ylab = "",
  main = expression(
    MA(1)~~~theta==+0.9
  )
)
astsa::tsplot(
  x = astsa::sarima.sim(
    ma = -0.9,
    n = 100
  ),
  col = 4,
  ylab = "",
  main = expression(
    MA(1)~~~theta==-0.9
  )
)
These two MA(1) models have the same autocorrelation, the same autocovariance, and are stochastically the same.
$$\sigma_w^2 = 25,\ \theta = \frac{1}{5}:\qquad x_t = w_t + \frac{1}{5}w_{t-1},\qquad w_t \sim \text{iid normal}(0, 25)$$
$$\sigma_v^2 = 1,\ \theta = 5:\qquad y_t = v_t + 5v_{t-1},\qquad v_t \sim \text{iid normal}(0, 1)$$
Both processes have
$$\gamma(h) = \begin{cases}26 & h = 0\\ 5 & h = 1\\ 0 & h > 1\end{cases}$$
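A two-line sketch verifying that both parameterizations give the same autocovariances:

gamma_ma1 <- function(theta, sigma2) {
  c(gamma0 = (1 + theta^2) * sigma2, gamma1 = theta * sigma2)
}
gamma_ma1(theta = 1/5, sigma2 = 25)  # 26 5
gamma_ma1(theta = 5, sigma2 = 1)     # 26 5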
If we observed one of these processes, we would not be able to mathematically tell which one we were looking at. When we have to select a model, we prefer the invertible one: choose the model with $|\theta| < 1$.
Iterating,
$$\begin{aligned}
x_t &= w_t + \theta w_{t-1}\\
w_t &= x_t - \theta w_{t-1}\\
w_t &= x_t - \theta(x_{t-1} - \theta w_{t-2}) = x_t - \theta x_{t-1} + \theta^2 w_{t-2}\\
w_t &= x_t - \theta x_{t-1} + \theta^2(x_{t-2} - \theta w_{t-3}) = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \theta^3 w_{t-3}\\
&\;\;\vdots\\
w_t &= (-\theta)^{k+1}w_{t-k-1} + \sum_{j=0}^{k}(-\theta)^j x_{t-j}
\end{aligned}$$
If $|\theta| < 1$, then
$$w_t = \sum_{j=0}^{\infty}(-\theta)^j x_{t-j}.$$
For the invertible choice above, $\sigma_w^2 = 25$ and $\theta = \frac{1}{5}$, this gives
$$w_t = \sum_{j=0}^{\infty}\left(-\frac{1}{5}\right)^j x_{t-j},$$
while for the non-invertible model $y_t = v_t + 5v_{t-1}$, $v_t \sim \text{iid normal}(0,1)$, the analogous sum $\sum_{j=0}^{\infty}(-5)^j y_{t-j}$ does not converge.
Let
$$x_t = \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k},\qquad \phi_p \neq 0,\quad \theta_q \neq 0,\quad w_t \sim wn(0, \sigma_w^2),\quad \sigma_w^2 > 0$$
If the time series has a non-zero expected value, we adjust the model to get zero expected value.
$$\alpha = \mu\left(1 - \sum_{j=1}^{p}\phi_j\right),\qquad x_t = \alpha + \sum_{j=1}^{p}\phi_j x_{t-j} + w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$
Let’s move all the autoregressive terms to the left hand side of the equation.
$$x_t - \sum_{j=1}^{p}\phi_j x_{t-j} = w_t + \sum_{k=1}^{q}\theta_k w_{t-k}$$
$$x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \dots - \phi_p x_{t-p} = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}$$
$$\phi(B)x_t = \theta(B)w_t$$
This presentation illuminates a potential pitfall while modeling. If $\phi(B)x_t = \theta(B)w_t$ is the correct model, but we mistakenly multiply both sides of the equation by another operator on $B$, say $\eta(B)$, then we get a mathematically correct equation that leads to over-parameterization:
$$\eta(B)\phi(B)x_t = \eta(B)\theta(B)w_t$$
Consider white noise, $x_t = w_t$. Say that while fitting our model, we make the error of multiplying both sides of the equation by $\eta(B) = B^0 - \frac{1}{2}B$:
$$\left(B^0 - \tfrac{1}{2}B\right)x_t = \left(B^0 - \tfrac{1}{2}B\right)w_t \quad\Longrightarrow\quad x_t - \tfrac{1}{2}x_{t-1} = w_t - \tfrac{1}{2}w_{t-1} \quad\Longrightarrow\quad x_t = \tfrac{1}{2}x_{t-1} + w_t - \tfrac{1}{2}w_{t-1}$$
The correct model is ARMA(0,0), but we go with an ARMA(1,1) model: $x_t$ is white noise, but we missed that fact.
set.seed(
  seed = 823
)
rnorm_5 <- rnorm(
  n = 100,
  mean = 5
) # generate iid N(5,1)s
arima(
  x = rnorm_5,
  order = c(
    1, 0, 1
  ) # since the observations are random noise, c(0,0,0) is the correct order
)
##
## Call:
## arima(x = rnorm_5, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.7567 0.8308 4.9066
## s.e. 0.2621 0.2264 0.1028
##
## sigma^2 estimated as 0.9723: log likelihood = -140.51, aic = 289.02
The fitted model is
$$(B^0 + 0.76B)\,x_t = (B^0 + 0.83B)\,w_t.$$
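The near-cancellation is visible in the roots of the two fitted polynomials (a diagnostic sketch, not from the text):

polyroot(z = c(1, 0.7567))  # root of the AR polynomial, about -1.32
polyroot(z = c(1, 0.8308))  # root of the MA polynomial, about -1.20
# the roots nearly coincide, so the factors nearly cancel: consistent with ARMA(0,0)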
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p,\ \phi_p \neq 0,\qquad \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \dots + \theta_q z^q,\ \theta_q \neq 0,\qquad z \in \mathbb{C}$$
To protect us from parameter redundant models, we will require that ϕ(z) and θ(z) do not have a common factor. This will help protect from
incorrectly multiplying the correct model by an extraneous operator.
$$x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j} = \psi(B)w_t,\qquad \psi(B) = \sum_{j=0}^{\infty}\psi_j B^j,\qquad \sum_{j=0}^{\infty}|\psi_j| < \infty,\qquad \psi_0 = 1$$
Example
The AR(1) process
x t = ϕx t − 1 + w t
is causal when | ϕ | < 1 or equivalently the root of ϕ(z) = 1 − ϕz is greater than one in magnitude.
$$\psi(z) = \sum_{j=0}^{\infty}\psi_j z^j = \frac{\theta(z)}{\phi(z)},\qquad |z| < 1$$
Another way to phrase this property is that an ARMA process is causal only when the roots of ϕ(z) lie outside the unit circle; that is, ϕ(z) = 0 only
when | z | > 1.
Finally, to address the problem of uniqueness, we choose the model that allows an infinite autoregressive representation.
$$\pi(B)x_t = \sum_{j=0}^{\infty}\pi_j x_{t-j} = w_t,\qquad \pi(B) = \sum_{j=0}^{\infty}\pi_j B^j,\qquad \sum_{j=0}^{\infty}|\pi_j| < \infty,\qquad \pi_0 = 1$$
$$\pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)},\qquad |z| < 1$$
an ARMA process is invertible only when the roots of θ(z) lie outside the unit circle; that is, θ(z) = 0 only when | z | > 1.
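Both conditions can be checked numerically with polyroot (a sketch; the polynomials below are the ARMA(1,1) example used next):

phi_poly <- c(1, -0.9)   # phi(z) = 1 - 0.9 z
theta_poly <- c(1, 0.5)  # theta(z) = 1 + 0.5 z
all(Mod(polyroot(z = phi_poly)) > 1)    # TRUE: causal
all(Mod(polyroot(z = theta_poly)) > 1)  # TRUE: invertible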
For the ARMA(1,1) model with $\phi(z) = 1 - 0.9z$ and $\theta(z) = 1 + 0.5z$:
$$\psi(z) = \frac{\theta(z)}{\phi(z)},\ |z| < 1 \quad\Longrightarrow\quad \phi(z)\psi(z) = \theta(z)$$
$$(1 - 0.9z)\sum_{j=0}^{\infty}\psi_j z^j = 1 + 0.5z$$
$$\sum_{j=0}^{\infty}\psi_j z^j - \sum_{j=1}^{\infty}0.9\,\psi_{j-1} z^j = 1 + 0.5z$$
Matching coefficients: $\psi_0 = 1$; $\psi_1 - 0.9\psi_0 = 0.5$, so $\psi_1 = 1.4$; and $\psi_j = 0.9\,\psi_{j-1}$ for $j \ge 2$, giving $\psi_j = 1.4\,(0.9)^{j-1} = \frac{14}{9}(0.9)^j$ for $j \ge 1$.
c(
  1,
  (14/9)*(0.9^(1:10))
)
ARMAtoMA(
  ar = 0.9,
  ma = 0.5,
  lag.max = 10
) # first 10 psi-weights
Similarly, for the $\pi$-weights:
$$\phi(z) = 1 - 0.9z,\qquad \theta(z) = 1 + 0.5z,\qquad \pi(z) = \sum_{j=0}^{\infty}\pi_j z^j = \frac{\phi(z)}{\theta(z)},\ |z| < 1 \quad\Longrightarrow\quad \theta(z)\pi(z) = \phi(z)$$
$$(1 + 0.5z)\sum_{j=0}^{\infty}\pi_j z^j = 1 - 0.9z$$
$$\sum_{j=0}^{\infty}\pi_j z^j + \sum_{j=1}^{\infty}0.5\,\pi_{j-1} z^j = 1 - 0.9z$$
Matching coefficients: $\pi_0 = 1$; $\pi_1 + 0.5\pi_0 = -0.9$, so $\pi_1 = -1.4$; and $\pi_j = -0.5\,\pi_{j-1}$ for $j \ge 2$, giving $\pi_j = 2.8\,(-0.5)^j$ for $j \ge 1$.
Notice how the exponents directly match the subscripts (different from the textbook's indexing). Let's manually compute the coefficients; notice that each coefficient is $-\frac{1}{2}$ times the previous one.
c(
  1,
  2.8*((-1/2)^(1:10))
)
astsa::ARMAtoAR(
  ar = 0.9,
  ma = 0.5,
  lag.max = 10
) # first 10 pi-weights
The AR(1) model $(1 - \phi B)x_t = w_t$, with $\phi(z) = 1 - \phi z$, is causal when its single root $z_0 = 1/\phi$ is outside the unit circle. An AR(2) model, with
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2,$$
is causal when both of the two roots
$$z_{\pm} = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2}$$
are outside the unit circle, $|z_{\pm}| > 1$. Since $\phi_1 = \frac{1}{z_+} + \frac{1}{z_-}$ and $\phi_2 = \frac{-1}{z_+ z_-}$, the causality condition is equivalent to
$$\phi_1 + \phi_2 < 1,\qquad \phi_2 - \phi_1 < 1,\qquad |\phi_2| < 1.$$
Factoring,
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 = (1 - z_+^{-1}z)(1 - z_-^{-1}z) = \left(1 - \frac{\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}}{2}\,z\right)\left(1 - \frac{\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}}{2}\,z\right).$$
M <- expand.grid(
  phi_1 = seq(
    from = -3,
    to = 3,
    length = 100
  ),
  phi_2 = seq(
    from = -2,
    to = 1,
    length = 100
  )
)
M <- M[M$phi_2 != 0, ]  # phi_2 = 0 would put a zero in the denominator below
M$discriminant <- M$phi_1^2 + 4*M$phi_2
# roots of phi(z) = 1 - phi_1 z - phi_2 z^2; complex when the discriminant is negative
M$root_p <- (-M$phi_1 + sqrt(as.complex(M$discriminant)))/(2*M$phi_2)
M$root_m <- (-M$phi_1 - sqrt(as.complex(M$discriminant)))/(2*M$phi_2)
M$causal <- abs(M$root_p) > 1 & abs(M$root_m) > 1  # causal when both roots lie outside the unit circle
Consider the first-order difference equation
$$u_n - \alpha u_{n-1} = 0,\qquad \alpha \neq 0,\quad n \in \mathbb{N}.$$
Iterating,
$$u_1 = \alpha u_0,\qquad u_2 = \alpha u_1 = \alpha^2 u_0,\qquad u_3 = \alpha u_2 = \alpha^3 u_0,\qquad \dots,\qquad u_n = \alpha^n u_0.$$
Operator notation:
$$(B^0 - \alpha B)u_n = 0$$
Associated polynomial:
$$\alpha(z) = 1 - \alpha z,\qquad z_0 = \frac{1}{\alpha},\qquad \alpha = z_0^{-1}$$
The solution in terms of the root is
$$u_n = (z_0^{-1})^n\,c,$$
where $c = u_0$.
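A tiny numeric sketch of this solution (arbitrary $\alpha$ and $u_0$):

alpha <- 0.7
u0 <- 3
u <- Reduce(
  f = function(u_prev, n) alpha * u_prev,  # one step of u_n = alpha u_{n-1}
  x = 1:5,
  init = u0,
  accumulate = TRUE
)
rbind(iterated = u, closed_form = alpha^(0:5) * u0)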
Autocorrelation of AR(1)
Here is an example of such a sequence:
$$\rho(h) - \phi\,\rho(h-1) = 0$$
Now consider the second-order difference equation
$$u_n - \alpha_1 u_{n-1} - \alpha_2 u_{n-2} = 0.$$
Multiply by $z^n$ and sum over $n \ge 2$:
$$\sum_{n=2}^{\infty}u_n z^n - \alpha_1 z\sum_{n=2}^{\infty}u_{n-1}z^{n-1} - \alpha_2 z^2\sum_{n=2}^{\infty}u_{n-2}z^{n-2} = 0$$
Set
$$U(z) = \sum_{n=0}^{\infty}u_n z^n.$$
Collecting terms gives
$$U(z)\,(1 - \alpha_1 z - \alpha_2 z^2) = u_0 + (u_1 - \alpha_1 u_0)z,\qquad U(z) = \frac{u_0 + (u_1 - \alpha_1 u_0)z}{1 - \alpha_1 z - \alpha_2 z^2}.$$
Now let’s take the partial fraction decomposition of the right hand side.
If the associated quadratic has two distinct roots, z 1 ≠ z 2, then for some constants c 1, c 2:
$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_1^{-1}z)(1 - z_2^{-1}z),\quad z_1 \neq z_2,\qquad U(z) = \frac{c_1}{1 - z_1^{-1}z} + \frac{c_2}{1 - z_2^{-1}z}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_1^{-1}z)^n + c_2\sum_{n=0}^{\infty}(z_2^{-1}z)^n,\qquad |z_1^{-1}z| < 1,\ |z_2^{-1}z| < 1$$
$$u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$$
If the associated quadratic is a perfect square with root $z_0$, then we get the following equations. Notice that undetermined constants that are multiplied together are merged into $c_2$.
$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2,\qquad U(z) = \frac{c_1}{1 - z_0^{-1}z} + \frac{c_2}{(1 - z_0^{-1}z)^2}$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}(z_0^{-1}z)^n + c_2\,\frac{d}{dz}\left[\frac{1}{1 - z_0^{-1}z}\right]\qquad (c_2 \text{ is merged with the constant from the derivative})$$
$$\sum_{n=0}^{\infty}u_n z^n = c_1\sum_{n=0}^{\infty}z_0^{-n}z^n + c_2\sum_{n=0}^{\infty}(n+1)\,z_0^{-(n+1)}z^n$$
Plugging $u_n = c_1 z_1^{-n} + c_2 z_2^{-n}$ into the difference equation verifies the distinct-root solution:
$$\begin{aligned}
&c_1 z_1^{-n} + c_2 z_2^{-n} - \alpha_1(c_1 z_1^{-n+1} + c_2 z_2^{-n+1}) - \alpha_2(c_1 z_1^{-n+2} + c_2 z_2^{-n+2})\\
&= c_1 z_1^{-n}(1 - \alpha_1 z_1 - \alpha_2 z_1^2) + c_2 z_2^{-n}(1 - \alpha_1 z_2 - \alpha_2 z_2^2)\\
&= c_1 z_1^{-n}(0) + c_2 z_2^{-n}(0) = 0
\end{aligned}$$
For the repeated-root case the solution is
$$u_n = c_1 z_0^{-n} + c_2(n+1)\,z_0^{-(n+1)}.$$
Here
$$1 - \alpha_1 z - \alpha_2 z^2 = (1 - z_0^{-1}z)^2 = 1 - 2z_0^{-1}z + z_0^{-2}z^2,\qquad \alpha_1 = 2z_0^{-1},\quad \alpha_2 = -z_0^{-2},$$
so the difference equation is $u_n - 2z_0^{-1}u_{n-1} + z_0^{-2}u_{n-2} = 0$.
Plugging our solution into the difference equations with factored coefficients shows our solution is correct.
For the AR(2) autocorrelation, the initial conditions are
$$\rho(0) = 1,\qquad \rho(1) = \rho(-1) = \frac{\phi_1}{1-\phi_2}.$$
When the roots of the associated polynomial are distinct and real:
$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h},\qquad \text{and since } \rho(0) = 1:\quad \rho(h) = c_1 z_1^{-h} + (1 - c_1)z_2^{-h}$$
When the root is repeated:
$$\rho(h) = (1 - c_2 z_0^{-1})\,z_0^{-h} + c_2(h+1)\,z_0^{-(h+1)}$$
When the roots are complex, the associated polynomial has real coefficients, so the roots are conjugate. To get real ρ(h), the constants will need to
be conjugate.
$$\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h} = c_1 z_1^{-h} + \overline{c_1}\,\overline{z_1}^{\,-h}$$
Writing $z_1 = |z_1|e^{i\theta}$ and $\overline{z_1} = |z_1|e^{-i\theta}$,
$$\rho(h) = c_1|z_1|^{-h}e^{-ih\theta} + \overline{c_1}\,|z_1|^{-h}e^{ih\theta} = a\,|z_1|^{-h}\cos(h\theta + b)$$
$$x_t = 1.5x_{t-1} - 0.75x_{t-2} + w_t,\qquad \sigma_w^2 = 1,\qquad \text{roots} = 1 \pm i/\sqrt{3},\qquad \theta = \arctan(1/\sqrt{3}) = 2\pi/12,$$
giving $1/12$ cycles per unit time.
coef_phi <- c(
  1, -1.5, 0.75
) # coefficients of the polynomial
polyroot_phi <- polyroot(
  z = coef_phi
)
a <- polyroot_phi[1] # = 1+0.57735i, one root, which is 1 + i/sqrt(3)
Arg_a <- Arg(
  z = a
)/(2*pi) # arg in cycles/pt
1/Arg_a # = 12, the period
## [1] 12
set.seed(
  seed = 823
)
sarima.sim_phi <- astsa::sarima.sim(
  ar = c(
    1.5, -.75
  ),
  n = 144,
  S = 12
)
astsa::tsplot(
  x = sarima.sim_phi,
  xlab = "Year"
)
Compute and plot the autocorrelation function of the autoregressive model using the model itself (not a simulation).
ARMAacf_phi <- ARMAacf(
  ar = c(
    1.5, -0.75
  ),
  ma = 0,
  lag.max = 50
)
astsa::tsplot(
  x = ARMAacf_phi,
  type = "h",
  xlab = "lag"
)
abline(
  h = 0,
  col = 8
)
For a causal ARMA model,
$$\phi(B)x_t = \theta(B)w_t,\qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j}.$$
For a pure MA(q), $\psi_0 = 1$, $\psi_j = \theta_j$ for $j = 1, 2, \dots, q$, and $\psi_j = 0$ for $j > q$. In general, matching coefficients gives
$$\begin{aligned}
\psi_0 &= 1\\
\psi_1 - \phi_1\psi_0 &= \theta_1\\
\psi_2 - \phi_1\psi_1 - \phi_2\psi_0 &= \theta_2\\
\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 - \phi_3\psi_0 &= \theta_3\\
&\;\;\vdots
\end{aligned}$$
The actual solution will depend on the roots of the polynomials and the initial conditions.
Compute the $\psi$-weights of an ARMA(1,1) model; both the autoregressive and moving average portions contribute, and after the first term the weights decay geometrically.
psi <- ARMAtoMA(
  ar = 0.9,
  ma = 0.5,
  lag.max = 50
) # for a list
astsa::tsplot(
  x = psi,
  type = 'h',
  ylab = expression(
    psi-weights
  ),
  xlab = 'Index'
) # for a graph
For the MA(q) model:
$$x_t = \theta(B)w_t,\qquad \theta(B) = B^0 + \theta_1 B + \dots + \theta_q B^q,\qquad x_t = \sum_{j=0}^{q}\theta_j w_{t-j},\ \theta_0 = 1,\qquad E(x_t) = \sum_{j=0}^{q}\theta_j E(w_{t-j}) = 0$$
$$\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j},\ \sum_{k=0}^{q}\theta_k w_{t-k}\right) = \begin{cases}\sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0 \le h \le q\\ 0 & h > q\end{cases}$$
Recall:
γ(h) = γ( − h)
Note:
$$\rho(h) = \begin{cases}\dfrac{\sum_{j=0}^{q-h}\theta_j\theta_{j+h}}{\sum_{j=0}^{q}\theta_j^2} & 0 \le h \le q\\[1ex] 0 & h > q\end{cases}$$
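A short sketch checking this formula against ARMAacf for an arbitrary MA(2):

theta <- c(1, 0.5, 0.3)  # theta_0 = 1, theta_1 = 0.5, theta_2 = 0.3
q <- length(theta) - 1
rho <- sapply(
  X = 0:q,
  FUN = function(h) sum(theta[1:(q - h + 1)] * theta[(1 + h):(q + 1)]) / sum(theta^2)
)
rbind(formula = rho, ARMAacf = ARMAacf(ma = theta[-1], lag.max = q))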
For a causal ARMA model ($|z| > 1$ for every $z$ such that $\phi(z) = 0$),
$$\phi(B)x_t = \theta(B)w_t,\qquad x_t = \sum_{j=0}^{\infty}\psi_j w_{t-j},\qquad E(x_t) = E\left(\sum_{j=0}^{\infty}\psi_j w_{t-j}\right) = \sum_{j=0}^{\infty}\psi_j E(w_{t-j}) = 0$$
$$\begin{aligned}
\gamma(h) &= \operatorname{cov}(x_{t+h}, x_t)\\
&= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j} + \sum_{j=0}^{q}\theta_j w_{t+h-j},\ x_t\right)\\
&= \operatorname{cov}\left(\sum_{j=1}^{p}\phi_j x_{t+h-j},\ x_t\right) + \operatorname{cov}\left(\sum_{j=0}^{q}\theta_j w_{t+h-j},\ x_t\right)\\
&= \sum_{j=1}^{p}\phi_j\operatorname{cov}(x_{t+h-j}, x_t) + \sum_{j=0}^{q}\theta_j\operatorname{cov}(w_{t+h-j}, x_t)\\
&= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\theta_j\operatorname{cov}\left(w_{t+h-j},\ \sum_{k=0}^{\infty}\psi_k w_{t-k}\right)\\
&= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=0}^{q}\sum_{k=0}^{\infty}\theta_j\psi_k\operatorname{cov}(w_{t+h-j}, w_{t-k})\\
&= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sum_{j=h}^{q}\theta_j\psi_{j-h}\operatorname{cov}(w_{t+h-j}, w_{t+h-j})\\
&= \sum_{j=1}^{p}\phi_j\gamma(h-j) + \sigma_w^2\sum_{j=h}^{q}\theta_j\psi_{j-h}
\end{aligned}$$
(The cross-covariance survives only when $t - k = t + h - j$, i.e. $k = j - h \ge 0$.)
Say that $z_1, \dots, z_r$ are the roots of $\phi(z)$ with multiplicities $m_1, \dots, m_r$, where $m_1 + \dots + m_r = p$. Using difference equations, we see that there are polynomials in $h$ of degree $m_j - 1$, $P_j(h)$, such that
$$\rho(h) = \sum_{j=1}^{r} z_j^{-h}P_j(h),\qquad h \ge p.$$
If the process is causal, then | z j | > 1 ∀j, and ρ(h) → 0 exponentially fast as h → ∞. If there are conjugate roots, conjugates will cancel the
imaginary parts and the dampening will be sinusoidal (the time series will appear cyclic).
For the ARMA(1,1) model $x_t = \phi x_{t-1} + \theta w_{t-1} + w_t$, $|\phi| < 1$:
$$\gamma(0) = \phi\gamma(1) + \sigma_w^2\left[1 + \theta\phi + \theta^2\right],\qquad \gamma(1) = \sigma_w^2\,\frac{(1+\theta\phi)(\phi+\theta)}{1-\phi^2}$$
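A quick sketch checking $\rho(1) = \gamma(1)/\gamma(0)$ from these formulas against ARMAacf:

phi <- 0.9
theta <- 0.5
sigma2 <- 1
gamma1 <- sigma2 * (1 + theta*phi) * (phi + theta) / (1 - phi^2)
gamma0 <- phi*gamma1 + sigma2 * (1 + theta*phi + theta^2)
c(formula = gamma1/gamma0, ARMAacf = ARMAacf(ar = phi, ma = theta, lag.max = 1)[2])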
Note: ρ(h) for AR(1) and ARMA(1,1) are similar. We will be unable to tell the difference between AR(1) and ARMA(1,1) using ACF only.
We can use these facts to identify the order of a moving average process.
ρ XY | Z measures the correlation between X and Y with the linear effect of Z removed (partialled out).
ρ XY | Z = corr(X, Y | Z)
Example
Say that we have an AR(1) model
x t = ϕx t − 1 + w t
$$\gamma_x(2) = \operatorname{cov}(x_t, x_{t-2}) = \operatorname{cov}(\phi x_{t-1} + w_t,\ x_{t-2}) = \operatorname{cov}(\phi^2 x_{t-2} + \phi w_{t-1} + w_t,\ x_{t-2}) = \phi^2\gamma_x(0)$$
More generally,
$$x_{t+h} = \phi^h x_t + \sum_{j=1}^{h}\phi^{h-j}w_{t+j},\qquad \gamma_x(h) = \operatorname{cov}(x_{t+h}, x_t) = \phi^h\gamma_x(0)$$
But if we adjust both variables for the intermediate value $x_{t-1}$:
$$\operatorname{cov}\left(x_t - \phi x_{t-1},\ x_{t-2} - \frac{1}{\phi}x_{t-1}\right) = \operatorname{cov}\left(w_t,\ x_{t-2} - \frac{1}{\phi}x_{t-1}\right) = 0$$
PACF for mean-zero stationary time series
For $h \ge 2$, let $\hat{x}_{t+h}$ be the regression of $x_{t+h}$ onto $\{x_{t+h-1}, x_{t+h-2}, \dots, x_{t+1}\}$ (minimizing the mean squared error):
$$\hat{x}_{t+h} = \beta_1 x_{t+h-1} + \beta_2 x_{t+h-2} + \dots + \beta_{h-1}x_{t+1}$$
No intercept is needed because $E(x_t) = 0\ \forall t$; if this is not the case, replace $x_t$ with $x_t - \mu_x$. Because of stationarity, the coefficients are the same if we shift the index:
$$\hat{x}_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_{h-1}x_{t-h+1}$$
The partial autocorrelation function (PACF), $\phi_{hh}$, is the correlation between $x_{t+h}$ and $x_t$ with the linear dependence on $\{x_{t+1}, x_{t+2}, \dots, x_{t+h-1}\}$ removed. If $x_t$ is Gaussian, then $\phi_{hh}$ is the correlation coefficient between $x_{t+h}$ and $x_t$ in the bivariate distribution of $(x_{t+h}, x_t)$ conditioned on $\{x_{t+1}, x_{t+2}, \dots, x_{t+h-1}\}$:
$$\phi_{hh} = \operatorname{corr}(x_{t+h}, x_t \mid x_{t+1}, x_{t+2}, \dots, x_{t+h-1})$$
For the AR(1) model $x_t = \phi x_{t-1} + w_t$, $|\phi| < 1$: $\phi_{11} = \rho(1) = \phi$. For $h = 2$, with $\hat{x}_{t+2} = \beta x_{t+1}$:
$$E(x_{t+2} - \hat{x}_{t+2})^2 = E(x_{t+2} - \beta x_{t+1})^2 = E(x_{t+2}^2 - 2\beta x_{t+2}x_{t+1} + \beta^2 x_{t+1}^2) = \gamma(0) - 2\beta\gamma(1) + \beta^2\gamma(0)$$
Minimizing over $\beta$:
$$\beta = \frac{\gamma(1)}{\gamma(0)} = \frac{\phi\gamma(0)}{\gamma(0)} = \phi$$
By stationarity $\hat{x}_t = \phi x_{t+1}$ as well, so $\phi_{22} = \operatorname{corr}(x_{t+2} - \hat{x}_{t+2},\ x_t - \hat{x}_t) = \operatorname{corr}(w_{t+2},\ x_t - \phi x_{t+1}) = 0$.
In fact ϕ hh = 0 ∀h > 1.
For an AR(p) and $h > p$,
$$x_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j} + w_{t+h},\qquad \hat{x}_{t+h} = \sum_{j=1}^{p}\phi_j x_{t+h-j},$$
so
$$\phi_{hh} = \operatorname{corr}(x_{t+h} - \hat{x}_{t+h},\ x_t - \hat{x}_t) = \operatorname{corr}(w_{t+h},\ x_t - \hat{x}_t) = 0.$$
coef_ar_2 <- c(
  1.5, -0.75
)
ARMAacf_2 <- ARMAacf(
  ar = coef_ar_2,
  ma = 0,
  lag.max = 24
)[-1]
ARMApacf_2 <- ARMAacf(
  ar = coef_ar_2,
  ma = 0,
  lag.max = 24,
  pacf = TRUE
)
#par(mfrow=1:2)
astsa::tsplot(
  x = ARMAacf_2,
  type = "h",
  xlab = "lag",
  lwd = 3,
  nxm = 5,
  col = c(
    rep(4, 11), 6
  )
)
astsa::tsplot(
  x = ARMApacf_2,
  type = "h",
  xlab = "lag",
  lwd = 3,
  nxm = 5,
  col = c(
    rep(4, 11), 6
  )
)
An invertible MA model has the infinite AR representation
$$x_t = -\sum_{j=1}^{\infty}\pi_j x_{t-j} + w_t.$$
No finite AR representation exists, so the PACF of an MA model will never cut off (in contrast to an AR(p) process).
For the MA(1) model $x_t = w_t + \theta w_{t-1}$, $|\theta| < 1$,
$$\phi_{hh} = -\frac{(-\theta)^h(1-\theta^2)}{1-\theta^{2(h+1)}},\qquad h \ge 1.$$
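A sketch comparing this closed form with ARMAacf (arbitrary $\theta$):

theta <- 0.9
h <- 1:5
rbind(
  closed_form = -(-theta)^h * (1 - theta^2) / (1 - theta^(2*(h + 1))),
  ARMAacf = ARMAacf(ma = theta, lag.max = 5, pacf = TRUE)
)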
For the Recruitment series, the PACF cuts off after lag 2 and the ACF tails off, so let's use an AR(2) model.
data(
  list = "rec",
  package = "astsa"
)
astsa::acf2(
  series = rec,
  max.lag = 48
) # will produce values and a graphic
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.92 0.78 0.63 0.48 0.36 0.26 0.18 0.13 0.09 0.07 0.06 0.02 -0.04
## PACF 0.92 -0.44 -0.05 -0.02 0.07 -0.03 -0.03 0.04 0.05 -0.02 -0.05 -0.14 -0.15
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.12 -0.19 -0.24 -0.27 -0.27 -0.24 -0.19 -0.11 -0.03 0.03 0.06 0.06
## PACF -0.05 0.05 0.01 0.01 0.02 0.09 0.11 0.03 -0.03 -0.01 -0.07 -0.12
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.02 -0.02 -0.06 -0.09 -0.12 -0.13 -0.11 -0.05 0.02 0.08 0.12 0.10
## PACF -0.03 0.05 -0.08 -0.04 -0.03 0.06 0.05 0.15 0.09 -0.04 -0.10 -0.09
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.06 0.01 -0.02 -0.03 -0.03 -0.02 0.01 0.06 0.12 0.17 0.20
## PACF -0.02 0.05 0.08 -0.02 -0.01 -0.02 0.05 0.01 0.05 0.08 -0.04
ar.ols_rec <- ar.ols(
  x = rec,
  order.max = 2,
  demean = FALSE,
  intercept = TRUE
) # regression
ar.ols_rec
##
## Call:
## ar.ols(x = rec, order.max = 2, demean = FALSE, intercept = TRUE)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: 6.737 (1.111)
##
## Order selected 2 sigma^2 estimated as 89.72
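The list printed next matches the asymptotic standard errors stored in the fit; presumably it was produced by something like the following (an assumption based on the printed structure):

ar.ols_rec$asy.se.coef  # standard errors for the intercept and the AR coefficients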
## $x.mean
## [1] 1.110599
##
## $ar
## [1] 0.04178901 0.04187942
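The block of fits that follows shows Call: ar(x = rec, order.max = 12, method = j), which suggests a loop over the estimation methods, perhaps (an assumption, since the generating code is not shown):

sapply(
  X = c("yule-walker", "burg", "ols", "mle", "yw"),
  FUN = function(j) ar(x = rec, order.max = 12, method = j),
  simplify = FALSE
)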
## $`yule-walker`
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8
##
## $burg
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $ols
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3541 -0.4632
##
## Intercept: -0.05644 (0.446)
##
## Order selected 2 sigma^2 estimated as 89.72
##
## $mle
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34
##
## $yw
##
## Call:
## ar(x = rec, order.max = 12, method = j)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8
ar.burg(
  x = rec,
  order.max = 12
)
##
## Call:
## ar.burg.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3515 -0.4620
##
## Order selected 2 sigma^2 estimated as 89.34
ar.yw(
  x = rec,
  order.max = 12
)
##
## Call:
## ar.yw.default(x = rec, order.max = 12)
##
## Coefficients:
## 1 2
## 1.3316 -0.4445
##
## Order selected 2 sigma^2 estimated as 94.8
ar.mle(
  x = rec
)
##
## Call:
## ar.mle(x = rec)
##
## Coefficients:
## 1 2
## 1.3513 -0.4613
##
## Order selected 2 sigma^2 estimated as 89.34
3.4 Forecasting
3.4.1 Forecasting AR Processes
When we forecast, we are predicting the future values of our time series, x n + m, m = 1, 2, 3, …, using observed values, x 1 : n = {x 1, x 2, …, x n}.
In this section, we assume our time series is stationary and the model parameters are known.
Linear predictors
For an initial look at possible predictors, let’s restrict our attention to linear functions on the data.
$$x_{n+m}^{n} = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k,\qquad \alpha_0, \alpha_1, \dots, \alpha_n \in \mathbb{R}$$
Note: the coefficients depend on both $n$ and $m$; for now we will drop this fact from the notation.
example
If $n = 1$, $m = 1$: $x_2^1$ is a one-step-ahead forecast of $x_2$ given $x_1$. A linear predictor is $x_2^1 = \alpha_0 + \alpha_1 x_1$.
If $n = 2$, $m = 1$: $x_3^2$ is a one-step-ahead forecast of $x_3$ given $x_1, x_2$. A linear predictor is $x_3^2 = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2$.
In general, the coefficients of $x_2^1$ and $x_3^2$ will be different.
Proof
If $y \in \mathcal{M}$, then $\hat{y} = y$ and $z = 0$. Otherwise, let $\delta = \inf_{v \in \mathcal{M}}||y - v||$ and choose $v_n \in \mathcal{M}$ with $||y - v_n||^2 \le \delta^2 + 1/n$. We need to show that $(v_n)$ is a Cauchy sequence. Since $v_n, v_m \in \mathcal{M}$, so is $\frac{v_n + v_m}{2}$, and therefore $\left|\left|y - \frac{v_n+v_m}{2}\right|\right|^2 \ge \delta^2$. Then
$$\begin{aligned}
||v_n - v_m||^2 &= ||(y - v_m) - (y - v_n)||^2\\
&= 2\,||y - v_n||^2 + 2\,||y - v_m||^2 - 4\left|\left|y - \frac{v_n+v_m}{2}\right|\right|^2\\
&\le 2(\delta^2 + 1/n) + 2(\delta^2 + 1/m) - 4\delta^2 = 2(1/n + 1/m) \to 0.
\end{aligned}$$
Thus $(v_n)$ is a Cauchy sequence in a Hilbert space, so the sequence converges, say to $\hat{y}$. This shows existence and the minimum-norm property. For uniqueness, if $y_1$ and $y_2$ both achieve the minimum, the same identity gives
$$||y_1 - y_2||^2 = 2\,||y - y_1||^2 + 2\,||y - y_2||^2 - 4\left|\left|y - \frac{y_1+y_2}{2}\right|\right|^2 \le 2\delta^2 + 2\delta^2 - 4\delta^2 = 0.$$
Now we need to show that $y - \hat{y}$ is orthogonal to $\mathcal{M}$. Let $y_0$ be any length-one vector in $\mathcal{M}$, $||y_0|| = 1$. For any scalar $\alpha$, $\hat{y} + \alpha y_0 \in \mathcal{M}$ and
$$\langle y - \hat{y} - \alpha y_0,\ y - \hat{y} - \alpha y_0\rangle = \langle y - \hat{y},\ y - \hat{y}\rangle + \alpha^2\langle y_0, y_0\rangle - 2\alpha\langle y - \hat{y},\ y_0\rangle.$$
This is a quadratic in $\alpha$; it is minimized at
$$\alpha = \frac{2\langle y - \hat{y},\ y_0\rangle}{2\langle y_0, y_0\rangle} = \frac{\langle y - \hat{y},\ y_0\rangle}{\langle y_0, y_0\rangle}.$$
Substituting this $\alpha$,
$$\left\langle y - \hat{y} - \frac{\langle y-\hat{y}, y_0\rangle}{\langle y_0,y_0\rangle}y_0,\ y - \hat{y} - \frac{\langle y-\hat{y}, y_0\rangle}{\langle y_0,y_0\rangle}y_0\right\rangle = \langle y-\hat{y},\ y-\hat{y}\rangle + \frac{\langle y-\hat{y}, y_0\rangle^2}{\langle y_0,y_0\rangle} - 2\,\frac{\langle y-\hat{y}, y_0\rangle^2}{\langle y_0,y_0\rangle} = \langle y-\hat{y},\ y-\hat{y}\rangle - \frac{\langle y-\hat{y}, y_0\rangle^2}{\langle y_0,y_0\rangle} \le \langle y-\hat{y},\ y-\hat{y}\rangle.$$
Since $\hat{y}$ already minimizes the distance to $y$, the inequality must be an equality, which forces $\langle y - \hat{y},\ y_0\rangle = 0$.
For our purposes, we take the inner product to be the expected value of the product. Without loss of generality, we will work with univariate random variables with expected value zero, for ease of notation.
Theorem B.3 For a Gaussian process, the minimum mean square error predictor is the best linear predictor (the projection is the conditional expected value). That is, if $(y, x_1, \dots, x_n)$ is multivariate normal, then $E(y \mid x_1, \dots, x_n)$ is the projection of $y$ onto $\mathcal{M}(x)$, a linear function of $x_1, \dots, x_n$.
Proof:
Let $\hat{y} = E_{\mathcal{M}(x)}\,y$ be the unique element of $\mathcal{M}(x)$ that satisfies the orthogonality principle. Since $\hat{y}$ is a linear combination of the data, $(y - \hat{y}, x_0, x_1, \dots, x_n)$ is multivariate normal, so zero covariance gives us independence between $y - \hat{y}$ and each $x_i$. For any $w \in \mathcal{M}(x)$,
$$0 = \langle y - \hat{y},\ w\rangle = E[(y - \hat{y})w] = E(y - \hat{y})\,E(w),$$
and taking $x_0 = 1$,
$$0 = \langle y - \hat{y},\ x_0\rangle = E[(y - \hat{y})x_0] = E(y - \hat{y}),\qquad \text{so}\quad E(\hat{y}) = E(y).$$
Property 3.3 Best Linear Prediction for Stationary Processes (the prediction equations)
Given observations $x_1, x_2, \dots, x_n$, the coefficients of the best linear predictor $x_{n+m}^{n} = \alpha_0 + \sum_{k=1}^{n}\alpha_k x_k$ of $x_{n+m}$, for $m \ge 1$, solve
$$E[(x_{n+m} - x_{n+m}^{n})\,x_k] = 0,\qquad k = 0, 1, \dots, n,$$
where $x_0 = 1$.
We can solve for $\alpha_0, \alpha_1, \dots, \alpha_n$ by minimizing $Q = E\left(x_{n+m} - \sum_{k=0}^{n}\alpha_k x_k\right)^2$ with respect to the $\alpha$s: $\frac{\partial Q}{\partial \alpha_j} = 0$, $j = 0, 1, \dots, n$.
Set $k = 0$:
$$E[(x_{n+m} - x_{n+m}^{n})\cdot 1] = 0 \quad\Longrightarrow\quad E(x_{n+m}^{n}) = E(x_{n+m}) = \mu$$
$$E\left(\alpha_0 + \sum_{k=1}^{n}\alpha_k x_k\right) = \mu \quad\Longrightarrow\quad \alpha_0 + \sum_{k=1}^{n}\alpha_k\mu = \mu \quad\Longrightarrow\quad \alpha_0 = \mu\left(1 - \sum_{k=1}^{n}\alpha_k\right)$$
$$x_{n+m}^{n} = \mu + \sum_{k=1}^{n}\alpha_k(x_k - \mu)$$
one-step-ahead prediction
Given that x 1, . . . , x n were observed, we want to predict the next observation x n + 1.
$$x_{n+1}^{n} = \sum_{j=1}^{n}\phi_{nj}\,x_{n+1-j}$$
Invoke orthogonality:
$$E\left[\left(x_{n+1} - \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}\right)x_{n+1-k}\right] = 0,\quad k = 1, \dots, n \quad\Longrightarrow\quad E(x_{n+1}x_{n+1-k}) = \sum_{j=1}^{n}\phi_{nj}E(x_{n+1-j}x_{n+1-k}) \quad\Longrightarrow\quad \gamma(k) = \sum_{j=1}^{n}\phi_{nj}\gamma(k-j)$$
In matrix form,
$$\Gamma_n\phi_n = \gamma_n,\qquad \Gamma_n(j,k) = \gamma(k-j)\ \text{(an } n \times n \text{ matrix)},\qquad \phi_n = \begin{pmatrix}\phi_{n1}\\ \vdots\\ \phi_{nn}\end{pmatrix},\qquad \gamma_n = \begin{pmatrix}\gamma(1)\\ \vdots\\ \gamma(n)\end{pmatrix}$$
$$\phi_n = \Gamma_n^{-1}\gamma_n$$
For ARMA models, $\sigma_w^2 > 0$ and $\gamma(h) \to 0$ as $h \to \infty$ guarantee that $\Gamma_n$ is non-singular.
Writing $\mathbf{x} = (x_n, x_{n-1}, \dots, x_1)'$, the predictor is $x_{n+1}^{n} = \phi_n'\mathbf{x}$, with mean square prediction error
$$\begin{aligned}
P_{n+1}^{n} &= E(x_{n+1} - \phi_n'\mathbf{x})^2\\
&= E(x_{n+1} - \gamma_n'\Gamma_n^{-1}\mathbf{x})^2\\
&= E(x_{n+1}^2 - 2\gamma_n'\Gamma_n^{-1}\mathbf{x}\,x_{n+1} + \gamma_n'\Gamma_n^{-1}\mathbf{x}\mathbf{x}'\Gamma_n^{-1}\gamma_n)\\
&= E(x_{n+1}^2) - 2\gamma_n'\Gamma_n^{-1}E(\mathbf{x}\,x_{n+1}) + \gamma_n'\Gamma_n^{-1}E(\mathbf{x}\mathbf{x}')\Gamma_n^{-1}\gamma_n\\
&= \gamma(0) - 2\gamma_n'\Gamma_n^{-1}\gamma_n + \gamma_n'\Gamma_n^{-1}\Gamma_n\Gamma_n^{-1}\gamma_n\\
&= \gamma(0) - \gamma_n'\Gamma_n^{-1}\gamma_n
\end{aligned}$$
Example 3.19 Prediction for an AR(2) (Verify that the matrix equation gives the
correct coefficients)
Suppose we have a causal AR(2) process
x t = ϕ 1x t − 1 + ϕ 2x t − 2 + w t
For $n = 1$:
$$\phi_n = \Gamma_n^{-1}\gamma_n \quad\Longrightarrow\quad \phi_{11} = \frac{\gamma(1)}{\gamma(0)},\qquad x_2^1 = \phi_{11}x_1 = \frac{\gamma(1)}{\gamma(0)}x_1 = \rho(1)\,x_1$$
For $n = 2$:
$$\begin{pmatrix}\phi_{21}\\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\gamma(0) & \gamma(1)\\ \gamma(1) & \gamma(0)\end{pmatrix}^{-1}\begin{pmatrix}\gamma(1)\\ \gamma(2)\end{pmatrix} = \frac{1}{\gamma(0)^2 - \gamma(1)^2}\begin{pmatrix}\gamma(0) & -\gamma(1)\\ -\gamma(1) & \gamma(0)\end{pmatrix}\begin{pmatrix}\gamma(1)\\ \gamma(2)\end{pmatrix}$$
For the causal AR(2),
$$x_3 = \phi_1 x_2 + \phi_2 x_1 + w_3,\qquad x_3^2 = \phi_1 x_2 + \phi_2 x_1,\qquad x_3 - x_3^2 = w_3,$$
$$E[(x_3 - x_3^2)x_1] = E(w_3 x_1) = 0,\qquad E[(x_3 - x_3^2)x_2] = E(w_3 x_2) = 0,$$
so the prediction equations are satisfied by
$$\begin{pmatrix}\phi_{21}\\ \phi_{22}\end{pmatrix} = \begin{pmatrix}\phi_1\\ \phi_2\end{pmatrix},\qquad x_3^2 = \phi_1 x_2 + \phi_2 x_1.$$
More generally, for an AR(p) with $n \ge p$, $\phi_{nj} = \phi_j$ for $j = 1, \dots, p$ (and $\phi_{nj} = 0$ for $j > p$), so
$$x_{n+1}^{n} = \sum_{j=1}^{p}\phi_j x_{n+1-j}.$$
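A sketch verifying that solving $\Gamma_n\phi_n = \gamma_n$ recovers the AR(2) coefficients, using the model ACF of the earlier example (dividing through by $\gamma(0)$ turns the system into its correlation form):

rho <- ARMAacf(ar = c(1.5, -0.75), lag.max = 2)  # rho(0), rho(1), rho(2)
Gamma <- toeplitz(rho[1:2])                      # correlation version of Gamma_2
solve(Gamma, rho[2:3])                           # returns 1.5 and -0.75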
Define the mean square one-step-ahead prediction error
$$P_{n+1}^{n} = E(x_{n+1} - x_{n+1}^{n})^2,$$
with the Durbin-Levinson initial conditions
$$\phi_{00} = 0,\qquad P_1^0 = \gamma(0).$$
For $n \ge 1$,
$$\phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(n-k)}{1 - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\rho(k)},\qquad P_{n+1}^{n} = P_{n}^{n-1}\left(1 - \phi_{nn}^2\right).$$
The general formula for the mean square one-step-ahead prediction error is
$$P_{n+1}^{n} = \gamma(0)\prod_{j=1}^{n}\left[1 - \phi_{jj}^2\right]$$
For n ≥ 2:
$$\phi_{nk} = \phi_{n-1,k} - \phi_{nn}\,\phi_{n-1,n-k},\qquad k = 1, 2, \dots, n-1$$
$n = 1$:
$$\phi_{11} = \rho(1),\qquad P_2^1 = \gamma(0)\left[1 - \phi_{11}^2\right]$$
$n = 2$:
$$\phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)},\qquad \phi_{21} = \phi_{11} - \phi_{22}\phi_{11},\qquad P_3^2 = P_2^1\left[1 - \phi_{22}^2\right] = \gamma(0)\left[1 - \phi_{11}^2\right]\left[1 - \phi_{22}^2\right]$$
$n = 3$:
$$\phi_{33} = \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)},\qquad \phi_{31} = \phi_{21} - \phi_{33}\phi_{22},\qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21},\qquad P_4^3 = P_3^2\left[1 - \phi_{33}^2\right]$$
Let’s use the Durbin-Levinson algorithm on some data. First using a function, then computing manually. Previously, we used PACF to establish that
the rec time-series should be modeled with an AR(2).
data(
  list = "rec",
  package = "astsa"
)
acf_rec <- as.vector(
  acf(
    x = rec
  )$acf
)
gsignal::levinson(
  acf = acf_rec,
  p = 2
)
## $a
## [1] 1.0000000 -1.3315874 0.4445447
##
## $e
## [1] 0.1205793
##
## $k
## [1] -0.9218042 0.4445447
phi00 <- 0
P10 <- var(
  x = rec
) # gamma(0) estimate
phi11 <- acf_rec[2] # rho(1); acf_rec[1] is rho(0) = 1
P21 <- P10*(1 - phi11^2)
phi22 <- (acf_rec[3] - phi11*acf_rec[2])/(1 - phi11*acf_rec[2])
phi21 <- phi11 - phi22*phi11
P32 <- P10*(1 - phi11^2)*(1 - phi22^2)
print(
  x = c(phi21, phi22)
)
Using the iterative solution for the PACF and setting $n = p$, it follows that for an AR(p) model, $\phi_{pp} = \phi_p$: the partial autocorrelation at lag $p$ is also the last coefficient in the model.
We are working with an AR(2); in the difference equations section, we showed that these equations hold:
$$\rho(h) - \phi_1\rho(h-1) - \phi_2\rho(h-2) = 0,\quad h \ge 2,\qquad \rho(1) = \frac{\phi_1}{1-\phi_2},\qquad \rho(2) = \phi_1\rho(1) + \phi_2,\qquad \rho(3) - \phi_1\rho(2) - \phi_2\rho(1) = 0$$
Combining the difference equation results with the iterative solution equations, we get:
$$\phi_{11} = \rho(1) = \frac{\phi_1}{1-\phi_2}$$
$$\phi_{22} = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2} = \frac{\left(\phi_1\frac{\phi_1}{1-\phi_2} + \phi_2\right) - \left(\frac{\phi_1}{1-\phi_2}\right)^2}{1 - \left(\frac{\phi_1}{1-\phi_2}\right)^2} = \phi_2$$
$$\phi_{21} = \rho(1)(1-\phi_2) = \phi_1,\qquad \phi_{33} = \frac{\rho(3) - \phi_1\rho(2) - \phi_2\rho(1)}{1 - \phi_1\rho(1) - \phi_2\rho(2)} = 0$$
Notice that $\phi_{22} = \phi_2$.
For $m$-step-ahead prediction, the prediction equations become
$$\Gamma_n\phi_n^{(m)} = \gamma_n^{(m)},\qquad \gamma_n^{(m)} = \begin{pmatrix}\gamma(m)\\ \vdots\\ \gamma(m+n-1)\end{pmatrix},\qquad \phi_n^{(m)} = \begin{pmatrix}\phi_{n1}^{(m)}\\ \vdots\\ \phi_{nn}^{(m)}\end{pmatrix}$$
$$\operatorname{cov}\left(x_s - x_s^{s-1},\ x_t - x_t^{t-1}\right) = 0,\qquad s \neq t$$
Using this uncorrelated property and the projection theorem, we can derive the innovations algorithm. The quantities $x_t - x_t^{t-1}$ are called innovations.
$$x_1^0 = 0,\qquad P_1^0 = \gamma(0)$$
$$x_{t+1}^{t} = \sum_{j=1}^{t}\theta_{tj}\left(x_{t+1-j} - x_{t+1-j}^{t-j}\right),\qquad P_{t+1}^{t} = \gamma(0) - \sum_{j=0}^{t-1}\theta_{t,t-j}^2\,P_{j+1}^{j},\qquad t = 1, 2, \dots$$
$$\theta_{t,t-j} = \frac{\gamma(t-j) - \sum_{k=0}^{j-1}\theta_{j,j-k}\,\theta_{t,t-k}\,P_{k+1}^{k}}{P_{j+1}^{j}}$$
For $m$-step-ahead prediction,
$$x_{n+m}^{n} = \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}\left(x_{n+m-j} - x_{n+m-j}^{n+m-j-1}\right),\qquad P_{n+m}^{n} = \gamma(0) - \sum_{j=m}^{n+m-1}\theta_{n+m-1,j}^2\,P_{n+m-j}^{n+m-j-1}$$
For the MA(1) model $x_t = w_t + \theta w_{t-1}$: $\gamma(0) = (1+\theta^2)\sigma_w^2$, $\gamma(1) = \theta\sigma_w^2$, and $\gamma(h) = 0$ for $h > 1$. The innovations algorithm gives
$$x_1^0 = 0,\qquad P_1^0 = (1+\theta^2)\sigma_w^2,\qquad \theta_{n1} = \frac{\theta\sigma_w^2}{P_n^{n-1}},\qquad \theta_{nj} = 0,\ j = 2, \dots, n,$$
$$P_{n+1}^{n} = (1 + \theta^2 - \theta\,\theta_{n1})\sigma_w^2,\qquad x_{n+1}^{n} = \frac{\theta\,(x_n - x_n^{n-1})\,\sigma_w^2}{P_n^{n-1}}.$$
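A minimal sketch implementing this MA(1) recursion for the prediction error variances (arbitrary $\theta$ and $\sigma_w^2$):

theta <- 0.5
sigma2 <- 1
n <- 10
P <- numeric(n + 1)
P[1] <- (1 + theta^2) * sigma2           # P_1^0
theta_n1 <- numeric(n)
for (t in 1:n) {
  theta_n1[t] <- theta * sigma2 / P[t]   # theta_{t1}
  P[t + 1] <- (1 + theta^2 - theta * theta_n1[t]) * sigma2
}
P  # P_{t+1}^t decreases toward sigma_w^2 = 1 as t grows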