Overview
1 Time Series
2 ARIMA Models
  AR Process
  MA Process
  ARMA Models
  ARIMA Models
3 ARIMA Modeling: A Toy Problem
Time Series

Categories and Terminologies
Categories and Terminologies (cont.)
• univariate vs. multivariate
A time series containing records of a single variable is
termed univariate; if records of more than one variable
are considered, it is termed multivariate.
• linear vs. non-linear
A time series model is said to be linear or non-linear
depending on whether the current value of the series is
a linear or non-linear function of past observations.
• discrete vs. continuous
In a continuous time series observations are measured at
every instance of time, whereas a discrete time series
contains observations measured at discrete points in time.
• This lecture will focus on univariate, linear, discrete
time series.
Components of a Time Series
• In general, a time series is affected by four components,
i.e. trend, seasonal, cyclical and irregular components.
– Trend
The general tendency of a time series to increase, decrease
or stagnate over a long period of time.
[Figure: The price of chicken: monthly whole bird spot price, Georgia docks, US
cents per pound, August 2001 to July 2016, with fitted linear trend line.]
Components of a Time Series (cont.)
– Seasonal variation
This component explains fluctuations within a year that recur
season after season, usually caused by climate and weather
conditions, customs, traditional habits, etc.
[Figure: Johnson & Johnson quarterly earnings per share, 84 quarters, 1960-I to 1980-IV.]
Components of a Time Series (cont.)
– Cyclical variation
This component describes medium-term changes caused by
circumstances that repeat in cycles. The duration of a cycle
extends over a longer period of time, usually more than a year.
Combination of Four Components
– Additive Model
  Y(t) = T(t) + S(t) + C(t) + I(t)
– Multiplicative Model
  Y(t) = T(t) × S(t) × C(t) × I(t)
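The two combinations can be illustrated with synthetic components (the trend, seasonal, and irregular terms below are made-up numbers, not data from the slides). One practical consequence: taking logs turns a multiplicative model into an additive one, which is why log transforms are common before modeling.

```python
import numpy as np

t = np.arange(1, 121)                      # 10 years of monthly data
T = 50 + 0.3 * t                           # trend component
S = 5 * np.sin(2 * np.pi * t / 12)         # annual seasonal component
rng = np.random.default_rng(0)
I = rng.normal(0, 1, t.size)               # irregular component (cyclical omitted)

Y_add = T + S + I                          # additive model
Y_mul = T * (1 + S / 50) * np.exp(I / 50)  # multiplicative model (positive factors)

# log of a multiplicative model is an additive model in the log components:
assert np.allclose(np.log(Y_mul), np.log(T) + np.log(1 + S / 50) + I / 50)
```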
Time Series Example: Random Walk
• A random walk is the process by which randomly-moving
objects wander away from where they started.
• Consider a simple 1-D process:
– The value of the time series at time t is the value of the series
at time t − 1 plus a completely random movement determined
by wt. More generally, a constant drift factor δ is introduced:

  Xt = δ + Xt−1 + wt = δt + Σ_{i=1}^t wi
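A minimal simulation sketch (illustrative parameters), checking that the recursive form and the closed-form sum above agree:

```python
import numpy as np

rng = np.random.default_rng(42)
n, delta = 200, 0.2
w = rng.normal(0, 1, n)                          # white noise w_t

# Closed form: X_t = δt + Σ_{i=1}^t w_i
X = delta * np.arange(1, n + 1) + np.cumsum(w)

# Recursive form: X_t = δ + X_{t−1} + w_t
X_rec = np.zeros(n)
X_rec[0] = delta + w[0]
for t in range(1, n):
    X_rec[t] = delta + X_rec[t - 1] + w[t]

assert np.allclose(X, X_rec)
```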
Time Series Analysis
Measures of Dependence
• A complete description of a time series, observed as a
collection of n random variables at arbitrary time points
t1, t2, . . . , tn, for any positive integer n, is provided by the
joint distribution function, evaluated as the probability that
the values of the series are jointly less than the n constants
c1, c2, . . . , cn, i.e.,

  F(c1, c2, . . . , cn) = P(Xt1 ≤ c1, Xt2 ≤ c2, . . . , Xtn ≤ cn)

• Mean function
– The mean function is defined as

  µt = µXt = E[Xt] = ∫_{−∞}^{∞} x ft(x) dx

– For the random walk with drift, Xt = δt + Σ_{i=1}^t wi, so

  µXt = E[Xt] = δt + Σ_{i=1}^t E[wi] = δt
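The result µXt = δt can be checked with a quick Monte Carlo average over many simulated paths (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, t_max, delta = 20000, 100, 0.5
w = rng.normal(0, 1, (n_paths, t_max))
# each row is one random walk with drift: X_t = δt + Σ w_i
X = delta * np.arange(1, t_max + 1) + np.cumsum(w, axis=1)

# the average of X_100 across paths should be close to δt = 0.5 · 100
emp_mean = X[:, -1].mean()
print(emp_mean)   # ≈ 50
```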
Autocovariance for Time Series
• Lack of independence between adjacent values in time series
Xs and Xt can be numerically assessed.
• Autocovariance Function
– Assuming the variance of Xt is finite, the autocovariance
function is defined as the second moment product

  γX(s, t) = cov(Xs, Xt) = E[(Xs − µs)(Xt − µt)]
Stationarity of Stochastic Process

Which of these are stationary?

Strict Stationarity
Weak Stationarity
• Weak Stationarity
– The time series {Xt, t ∈ Z} is said to be weakly stationary if
  1. E[Xt²] < ∞, ∀t ∈ Z;
  2. E[Xt] = µ, ∀t ∈ Z;
  3. γX(s, t) = γX(s + h, t + h), ∀s, t, h ∈ Z.
– In other words, a weakly stationary time series {Xt} must have
three features: finite variance, a constant first moment, and a
second moment γX(s, t) that depends only on |t − s| and not
on s or t.
• Usually the term stationary means weakly stationary; when
people want to emphasize that a process is stationary in the
strict sense, they say strictly stationary.
Remarks on Stationarity
• For a stationary process, the autocorrelation function (ACF)
depends only on the lag h:

  ρ(h) = γ(t + h, t) / √(γ(t + h, t + h) γ(t, t)) = γ(h) / γ(0)
Partial Autocorrelation
• Another important measure is called partial
autocorrelation, which is the correlation between Xs and
Xt with the linear effect of "everything in the middle"
removed.
• Partial Autocorrelation Function (PACF)
– For a stationary process Xt, the PACF (denoted as φhh), for
h = 1, 2, . . . , is defined as

  φ11 = corr(Xt+1, Xt) = ρ1
  φhh = corr(Xt+h − X̂t+h, Xt − X̂t), h ≥ 2,

where X̂t+h and X̂t are the linear regressions on the
intermediate values {Xt+1, . . . , Xt+h−1}.
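In practice the PACF is usually computed from the ACF with the Durbin-Levinson recursion (a standard algorithm, not shown on the slides). A minimal sketch:

```python
import numpy as np

def pacf_from_acf(rho):
    """Compute φ_hh for h = 1..len(rho)-1 from autocorrelations
    rho[0..H] (rho[0] must be 1) via the Durbin-Levinson recursion."""
    H = len(rho) - 1
    pacf = np.zeros(H)
    phi = np.array([rho[1]])                   # φ_11 = ρ_1
    pacf[0] = rho[1]
    for h in range(2, H + 1):
        num = rho[h] - phi @ rho[h - 1:0:-1]   # ρ_h − Σ φ_{h−1,j} ρ_{h−j}
        den = 1.0 - phi @ rho[1:h]             # 1 − Σ φ_{h−1,j} ρ_j
        phi_hh = num / den
        phi = np.concatenate([phi - phi_hh * phi[::-1], [phi_hh]])
        pacf[h - 1] = phi_hh
    return pacf

# For an AR(1) with φ1 = 0.6, ρ_h = 0.6^h, so φ_11 = 0.6 and φ_hh = 0 for h ≥ 2.
print(pacf_from_acf(0.6 ** np.arange(5)))   # → [0.6, 0, 0, 0] up to rounding
```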
Notations
• Intuition
– Autoregressive models are based on the idea that the current
value of the series, Xt, can be explained as a linear combination
of p past values, Xt−1, Xt−2, . . . , Xt−p, together with a random
error in the same series.
• Definition
– An autoregressive model of order p, abbreviated AR(p), is of
the form

  Xt = φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + wt = Σ_{i=1}^p φi Xt−i + wt

where Xt is stationary, wt ∼ wn(0, σw²), and φ1, φ2, . . . , φp
(φp ≠ 0) are model parameters. The hyperparameter p
represents the length of the "direct look back" in the series.
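A short simulation of an AR(1) (illustrative φ1 = 0.7) can be used to check the stationary moments quoted on the following slides, namely Var(Xt) = σw²/(1 − φ1²) and ρ1 = φ1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi1 = 50000, 0.7
w = rng.normal(0, 1, n + 200)
x = np.zeros(n + 200)
for t in range(1, n + 200):
    x[t] = phi1 * x[t - 1] + w[t]   # AR(1) recursion
x = x[200:]                          # drop burn-in so the series is ~stationary

r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(r1)        # ≈ φ1 = 0.7
print(x.var())   # ≈ σw² / (1 − φ1²) = 1 / 0.51 ≈ 1.96
```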
Backshift Operator
• The backshift operator is defined as

  BXt = Xt−1,

and extends to powers: B²Xt = B(BXt) = Xt−2, so B^k Xt = Xt−k.
• The inverse (forward-shift) operator satisfies B⁻¹B = 1, so that

  Xt = B⁻¹BXt = B⁻¹Xt−1.
Autoregressive Operator of AR Process
• Recall the definition of the AR(p) process; we can rewrite it as:

  Xt − φ1 Xt−1 − φ2 Xt−2 − · · · − φp Xt−p = wt
  (1 − φ1 B − φ2 B² − · · · − φp B^p) Xt = wt

• The autoregressive operator is defined as:

  φ(B) = 1 − φ1 B − φ2 B² − · · · − φp B^p = 1 − Σ_{j=1}^p φj B^j

then the AR(p) can be rewritten more concisely as:

  φ(B)Xt = wt
AR Example: AR(0) and AR(1)
• Mean E[Xt] = 0
• Variance Var(Xt) = σw² / (1 − φ1²)
AR Examples: AR(1) Process
• Autocorrelation Function (ACF)

  ρh = φ1^h
AR Examples: AR(1) Process
• Partial Autocorrelation Function (PACF)

  φ11 = ρ1 = φ1,   φhh = 0, ∀h ≥ 2
AR Examples: AR(1) Process
• Simulated AR(1) Process Xt = −0.9Xt−1 + wt:
• Mean E[Xt] = 0
• Variance Var(Xt) = σw² / (1 − φ1²)
AR Examples: AR(1) Process
• Autocorrelation Function (ACF)

  ρh = φ1^h
AR Examples: AR(1) Process
• Partial Autocorrelation Function (PACF)

  φ11 = ρ1 = φ1,   φhh = 0, ∀h ≥ 2
Stationarity of AR(1)
• We can iteratively expand the AR(1) representation as:

  Xt = φ1 Xt−1 + wt
     = φ1(φ1 Xt−2 + wt−1) + wt = φ1² Xt−2 + φ1 wt−1 + wt
     ...
     = φ1^k Xt−k + Σ_{j=0}^{k−1} φ1^j wt−j
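The expansion above shows why |φ1| < 1 matters: only then does the φ1^k Xt−k term die out and the weighted sum of past shocks converge. A small numerical illustration (made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.normal(0, 1, 200)

def ar1(phi, w):
    x = np.zeros(len(w))
    for t in range(1, len(w)):
        x[t] = phi * x[t - 1] + w[t]
    return x

x_stat = ar1(0.5, w)    # |φ1| < 1: past shocks are damped, series stays bounded
x_expl = ar1(1.1, w)    # |φ1| > 1: past shocks are amplified, series diverges

print(np.abs(x_stat).max())   # stays small
print(np.abs(x_expl).max())   # grows roughly like 1.1^t
```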
General AR(p) Process
AR Models: Parameters Estimation
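The slide does not spell out an estimation method; one standard choice for AR models is solving the Yule-Walker equations built from sample autocovariances (a sketch, not necessarily the estimator used in the lecture):

```python
import numpy as np

def yule_walker(x, p):
    """Estimate AR(p) coefficients by solving R φ = r, where R is the
    Toeplitz matrix of sample autocovariances γ(0..p−1) and r = γ(1..p)."""
    x = x - x.mean()
    n = len(x)
    gamma = np.array([x[:n - h] @ x[h:] / n for h in range(p + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, gamma[1:])

# Sanity check on a simulated AR(2): X_t = 0.5 X_{t−1} + 0.3 X_{t−2} + w_t
rng = np.random.default_rng(3)
w = rng.normal(0, 1, 20000)
x = np.zeros(20000)
for t in range(2, 20000):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + w[t]
print(yule_walker(x, 2))   # ≈ [0.5, 0.3]
```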
Moving Average Models (MA)
• Definition
– A moving average model of order q, or MA(q), is defined to be

  Xt = wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q = wt + Σ_{j=1}^q θj wt−j

where wt ∼ wn(0, σw²), and θ1, θ2, . . . , θq (θq ≠ 0) are
parameters.
– Although it looks like a regression model, the difference is that
the wt are not observable.
• Contrary to AR models, a finite MA model is always
stationary, because the observation is just a weighted
moving average over past forecast errors.
Moving Average Operator
• The moving average operator is defined as

  θ(B) = 1 + θ1 B + θ2 B² + · · · + θq B^q
MA Examples: MA(1) Process
• Simulated MA(1) Process Xt = wt + 0.8wt−1:
• Mean E[Xt] = 0
• Variance Var(Xt) = σw²(1 + θ1²)
MA Examples: MA(1) Process
• Autocorrelation Function (ACF)

  ρ1 = θ1 / (1 + θ1²),   ρh = 0, ∀h ≥ 2
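These MA(1) moments are easy to verify by simulation; for θ1 = 0.8 the formula gives ρ1 = 0.8/1.64 ≈ 0.488 and ρh = 0 beyond lag 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta1 = 100000, 0.8
w = rng.normal(0, 1, n + 1)
x = w[1:] + theta1 * w[:-1]          # MA(1): X_t = w_t + 0.8 w_{t−1}

r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
r2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(r1)   # ≈ θ1 / (1 + θ1²) ≈ 0.488
print(r2)   # ≈ 0: the ACF cuts off after lag 1
```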
MA Examples: MA(1) Process
• Partial Autocorrelation Function (PACF)

  φhh = − (−θ1)^h (1 − θ1²) / (1 − θ1^{2(h+1)}),   h ≥ 1
MA Examples: MA(2) Process
• Simulated MA(2) Process Xt = wt + 0.5wt−1 + 0.3wt−2:
• Mean E[Xt] = 0
• Variance Var(Xt) = σw²(1 + θ1² + θ2²)
MA Examples: MA(2) Process
• Autocorrelation Function (ACF)

  ρ1 = (θ1 + θ1θ2) / (1 + θ1² + θ2²),   ρ2 = θ2 / (1 + θ1² + θ2²),   ρh = 0, ∀h ≥ 3
General MA(q) Process

MA Problem: Non-unique MA Process

Comparisons between AR and MA

MA Models: Parameters Estimation
ARMA Models
• Combining the two operators, an ARMA(p, q) model can be
written concisely as

  φ(B)Xt = θ(B)wt
ARMA Problems: Redundant Parameters
• Multiplying both sides of an ARMA model by an arbitrary
operator η(B) yields a model with identical behavior:

  η(B)φ(B)Xt = η(B)θ(B)wt

• For example, take white noise Xt = wt and multiply both
sides by η(B) = 1 − 0.5B:

  (1 − 0.5B)Xt = (1 − 0.5B)wt
  Xt = 0.5Xt−1 − 0.5wt−1 + wt

• Now it looks exactly like an ARMA(1, 1) process!
• If we were unaware of parameter redundancy, we might
claim the data are correlated when in fact they are not.
Choosing Model Specification

        AR(p)                  MA(q)                  ARMA(p, q)
ACF     Tails off              Cuts off after lag q   Tails off
PACF    Cuts off after lag p   Tails off              Tails off

• Other criteria can be used for choosing p and q too, such as
AIC (Akaike Information Criterion), AICc (corrected AIC) and
BIC (Bayesian Information Criterion).
• Note that the selection for p and q is not unique.
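The table can be exercised on simulated data: for an MA(2) (illustrative coefficients), the sample ACF should be clearly nonzero at lags 1-2 and near zero beyond, suggesting q = 2:

```python
import numpy as np

def sample_acf(x, h):
    x = x - x.mean()
    return (x[:-h] @ x[h:]) / (x @ x)

rng = np.random.default_rng(11)
n = 100000
w = rng.normal(0, 1, n + 2)
x = w[2:] + 0.5 * w[1:-1] + 0.3 * w[:-2]   # MA(2) process

acf = [sample_acf(x, h) for h in range(1, 5)]
print(np.round(acf, 3))   # lags 1-2 clearly nonzero, lags 3-4 ≈ 0
```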
"Stationarize" Nonstationary Time Series
• One limitation of ARMA models is the stationarity condition.
• In many situations, a time series can be thought of as being
composed of two components, a non-stationary trend series
and a zero-mean stationary series, i.e. Xt = µt + Yt.
• Strategies
– Detrending: Subtract an estimate of the trend and work
with the residuals:

  Ŷt = Xt − µ̂t

– Differencing: Recall that a random walk with drift is capable
of representing trend, thus we can model trend as a
stochastic component as well:

  µt = δ + µt−1 + wt
  ∇Xt = Xt − Xt−1 = δ + wt + (Yt − Yt−1) = δ + wt + ∇Yt
• The difference operator and its second-order form:

  ∇Xt = Xt − Xt−1 = Xt − BXt = (1 − B)Xt
  ∇²Xt = ∇(∇Xt) = ∇(Xt − Xt−1) = (Xt − Xt−1) − (Xt−1 − Xt−2)
       = Xt − 2Xt−1 + Xt−2 = Xt − 2BXt + B²Xt
       = (1 − 2B + B²)Xt = (1 − B)²Xt
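Differencing is easy to check numerically. For a pure random walk with drift (taking Yt = 0 for simplicity), the first difference is exactly δ + wt, a stationary series around δ:

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 500, 0.3
w = rng.normal(0, 1, n)
# stochastic trend: µ_t = δ + µ_{t−1} + w_t, i.e. a random walk with drift
mu = delta * np.arange(1, n + 1) + np.cumsum(w)

dx = np.diff(mu)                          # ∇X_t = X_t − X_{t−1}
assert np.allclose(dx, delta + w[1:])     # = δ + w_t: stationary

d2x = np.diff(mu, n=2)                    # ∇²X_t = (1 − B)² X_t
assert np.allclose(d2x, np.diff(delta + w[1:]))
```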
Detrending vs. Differencing

[Figure: three panels of the chicken price series over time - the original
series, the detrended residuals resid(fit), and the first difference
diff(chicken).]
From ARMA to ARIMA
• Order of Differencing
– Differences of order d are defined as

  ∇^d = (1 − B)^d

– A process Xt is ARIMA(p, d, q) if its d-th difference ∇^d Xt
is ARMA(p, q). In general, the ARIMA(p, d, q) model can be
written as:

  φ(B)(1 − B)^d Xt = θ(B)wt.
Box-Jenkins Methods
• As we have seen, ARIMA models have numerous parameters
and hyperparameters; Box and Jenkins suggest an iterative
three-stage approach to estimate an ARIMA model.
• Procedures
1. Model identification: Checking stationarity and seasonality,
performing differencing if necessary, choosing model
specification ARIMA(p, d, q).
2. Parameter estimation: Computing coefficients that best fit the
selected ARIMA model using maximum likelihood estimation
or non-linear least-squares estimation.
3. Model checking: Testing whether the obtained model conforms
to the specifications of a stationary univariate process (i.e. the
residuals should be independent of each other and have
constant mean and variance). If it fails, go back to step 1.
• Let's go through a concrete example together.
Air Passenger Data

Model Identification

Stationarity Test: Rolling Statistics

Stationarity Test: ACF Plot

Stationarity Test: ADF Test
Stationarize Time Series

Stationarized Time Series: ACF Plot

Stationarized Time Series: ADF Test
• Results of Augmented Dickey-Fuller Test

  Item                          Value
  Test Statistic                -2.717131
  p-value                       0.071121
  #Lags Used                    14
  Number of Observations Used   128
  Critical Value (1%)           -3.482501
  Critical Value (5%)           -2.884398
  Critical Value (10%)          -2.578960

• From the rolling statistics, we can see that the mean and
standard deviation now vary much less with time.
• Also, the ADF test statistic is less than the 10% critical
value, indicating the time series is stationary with 90%
confidence.
Choosing Model Specification
ARMA(2, 2): Predicted on Residuals
• Here we can see that the AR(2) and MA(2) models have
almost the same RSS, but the combined ARMA(2, 2) model
is significantly better.
Forecasting
• The last step is to reverse the transformations we've done
to get the prediction on the original scale.
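Reversing first differencing is a cumulative sum starting from the last observed value. A sketch with toy numbers (the values and predicted differences below are hypothetical, not the actual air passenger results); if a log transform was also applied, apply np.exp afterwards:

```python
import numpy as np

x = np.array([112., 118., 132., 129., 121.])   # observed series (toy numbers)
dx = np.diff(x)                                # what the model was fit on
pred_diff = np.array([3.0, -2.0, 5.0])         # hypothetical predicted differences

# invert differencing: X_{T+k} = X_T + Σ predicted differences up to k
forecast = x[-1] + np.cumsum(pred_diff)
print(forecast)   # → [124. 122. 127.]

# round-trip check: un-differencing the observed diffs recovers the series
assert np.allclose(x[0] + np.cumsum(dx), x[1:])
```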
SARIMA: Seasonal ARIMA Models
• One problem in the previous model is the lack of
seasonality, which can be addressed in a generalized version
of the ARIMA model called seasonal ARIMA.
• Definition
– A seasonal ARIMA model is formed by including additional
seasonal terms in the ARIMA models, denoted as
ARIMA(p, d, q) × (P, D, Q)m, where m is the number of
observations per year.
– The seasonal part of the model consists of terms that are
similar to the non-seasonal components, but involve backshifts
of the seasonal period.
Off-the-shelf Packages