
Time Series Analysis

Lasse Engbo Christiansen

Department of Applied Mathematics and Computer Science


Technical University of Denmark

September 26, 2017

Outline of the lecture

- Regression-based methods, 3rd part:
  - Regression and exponential smoothing (Sec. 3.4)
  - Global and local trend models - an example (Sec. 3.6)
- Operators; the backward shift operator (Sec. 4.5)

Predictions In Time Series

- Predictions so far have been model-based: one model for all data.
- What if no model fits our needs (and data)?
- Then methods that are not model-based in the usual sense apply.

Exponential smoothing

Given a forgetting factor λ ∈ ]0; 1[


$$\hat{\mu}_N = c \sum_{j=0}^{N-1} \lambda^j Y_{N-j} = c\left[Y_N + \lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right]$$

The constant $c$ is chosen so that the weights sum to one, which implies that $c = (1-\lambda)/(1-\lambda^N)$. When $N$ is large, $c \approx 1-\lambda$:

$$\begin{aligned} \hat{\mu}_N &= (1-\lambda)\left[Y_N + \lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right] \\ &= (1-\lambda)Y_N + (1-\lambda)\left[\lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right] \\ &= (1-\lambda)Y_N + \lambda(1-\lambda)\left[Y_{N-1} + \cdots + \lambda^{N-2} Y_1\right] \\ &= (1-\lambda)Y_N + \lambda\hat{\mu}_{N-1} \end{aligned}$$
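As a minimal sketch of this recursion in code (numpy; the initialization $S_1 = Y_1$ is a common choice, not specified on the slide; the data are the six observations from the trend example later in the lecture):

```python
import numpy as np

def simple_exp_smoothing(y, lam):
    """First-order exponential smoothing: S_t = (1 - lam) * y_t + lam * S_{t-1}."""
    s = np.empty(len(y))
    s[0] = y[0]  # initialization choice; the slides do not specify one
    for t in range(1, len(y)):
        s[t] = (1.0 - lam) * y[t] + lam * s[t - 1]
    return s

# The six observations from the trend example later in the lecture
y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
print(simple_exp_smoothing(y, lam=0.9))  # strong smoothing: reacts slowly
print(simple_exp_smoothing(y, lam=0.2))  # weak smoothing: tracks the data closely
```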

Exponential Smoothing and prediction

Used as a prediction model:

$$\hat{Y}_{N+\ell|N} = \hat{\mu}_N$$

Updating predictions with new observations:

$$\hat{Y}_{N+\ell+1|N+1} = (1-\lambda)Y_{N+1} + \lambda \hat{Y}_{N+\ell|N}$$

Simple Exponential Smoothing

For large $N$:
$$\hat{\mu}_{N+1} = (1-\lambda)Y_{N+1} + \lambda\hat{\mu}_N$$

Definition (Simple Exponential Smoothing):

The sequence $S_N$ defined by
$$S_N = (1-\lambda)Y_N + \lambda S_{N-1}$$
is called the simple (or first-order) exponential smoothing of the time series $Y$.

- The smoothing constant $\alpha = 1 - \lambda$ (or the forgetting factor $\lambda$) determines how much the latest observation influences the prediction.

Choice of smoothing constant α = 1 − λ

- Given a data set $t = 1, \ldots, N$ we construct

$$S(\alpha) = \sum_{t=1}^{N} \left(Y_t - \hat{Y}_{t|t-\ell}(\alpha)\right)^2 = \sum_{t=1}^{N} \left(Y_t - \hat{\mu}_{t-\ell}(\alpha)\right)^2$$

The value minimizing $S(\alpha)$ is chosen.

- If the data set is large, we eliminate the influence of the initial estimate by dropping the first part of the errors when evaluating $S(\alpha)$.
- Keep in mind, however, what the smoothing is used for, and modify the criterion accordingly. The next slides show an example of this.
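A sketch of this selection in code (the data here are synthetic, not the Risø measurements below; `horizon` is the prediction horizon in steps and `burn_in` drops the first errors, as described above):

```python
import numpy as np

def sse(y, alpha, horizon=1, burn_in=0):
    """S(alpha): sum of squared k-step prediction errors Y_t - mu_hat_{t-k}."""
    lam = 1.0 - alpha
    mu = np.empty(len(y))
    mu[0] = y[0]  # initialization choice
    for t in range(1, len(y)):
        mu[t] = alpha * y[t] + lam * mu[t - 1]
    err = y[horizon:] - mu[:-horizon]  # k-step prediction errors
    return np.sum(err[burn_in:] ** 2)

rng = np.random.default_rng(1)
y = 10 + np.cumsum(rng.normal(0, 0.3, 2000))  # synthetic persistent series
alphas = np.linspace(0.05, 0.95, 19)
best = min(alphas, key=lambda a: sse(y, a, horizon=7, burn_in=100))
print(f"alpha minimizing S(alpha) for the 7-step horizon: {best:.2f}")
```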

Example – wind speed 76 m a.g.l. at DTU Risø
- Measurements of wind speed every 10 minutes
- Task: forecast up to approximately 3 hours ahead using exponential smoothing

[Figure: 10 min. averages of wind speed (m/s), Feb 2002 - Sep 2003]

S (α) for horizons 10 and 70 minutes

[Figure: $S(\alpha)$ (SSE) vs. weight on the most recent observation, for horizons of 10 minutes (left) and 70 minutes (right)]

- 10 minutes (1-step): use $\alpha = 0.95$ or higher
- 70 minutes (7-step): use $\alpha \approx 0.7$

S (α) for horizons 130 and 190 minutes

[Figure: $S(\alpha)$ (SSE) vs. weight on the most recent observation, for horizons of 130 minutes (left) and 190 minutes (right)]

- 130 minutes (13-step): use $\alpha \approx 0.6$
- 190 minutes (19-step): use $\alpha \approx 0.5$

Example of forecasts with optimal α
[Figure: measured wind speed with 10-minute and 190-minute forecasts, Jan 11-12, 2003]
From global to local trend models

Last week we worked with the global trend model

$$Y_{N+j} = f^T(j)\,\theta + \varepsilon_{N+j}$$

which was solved iteratively by

$$\begin{aligned} F_{N+1} &= F_N + f(-N) f^T(-N) \\ h_{N+1} &= L^{-1} h_N + f(0) Y_{N+1} \\ \hat{\theta}_{N+1} &= F_{N+1}^{-1} h_{N+1} \end{aligned}$$

Could we do that locally?

Local trend models

We forget old observations in an exponential manner:

$$\hat{\theta}_N = \arg\min_{\theta} S(\theta; N)$$

where for $0 < \lambda < 1$

$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$$

[Figure: weights $\lambda^j$ vs. $j$ (age of observation)]

WLS formulation

The criterion
$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$$
can be written as
$$\begin{bmatrix} Y_1 - f^T(N-1)\theta \\ Y_2 - f^T(N-2)\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{bmatrix}^T \begin{bmatrix} \lambda^{N-1} & 0 & \cdots & 0 \\ 0 & \lambda^{N-2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} Y_1 - f^T(N-1)\theta \\ Y_2 - f^T(N-2)\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{bmatrix}$$
which is a WLS criterion with $\Sigma = \operatorname{diag}[1/\lambda^{N-1}, \ldots, 1/\lambda, 1]$.

WLS solution

$$\hat{\theta}_N = \left(x_N^T \Sigma^{-1} x_N\right)^{-1} x_N^T \Sigma^{-1} Y$$
or
$$\hat{\theta}_N = F_N^{-1} h_N, \qquad F_N = \sum_{j=0}^{N-1} \lambda^j f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} \lambda^j f(-j) Y_{N-j}$$

Updating the estimates when YN +1 is available

$$\begin{aligned} F_{N+1} &= F_N + \lambda^N f(-N) f^T(-N) \\ h_{N+1} &= \lambda L^{-1} h_N + f(0) Y_{N+1} \\ \hat{\theta}_{N+1} &= F_{N+1}^{-1} h_{N+1} \end{aligned}$$

As initial values we can use $h_0 = 0$ and $F_0 = 0$.

For many functions $\lambda^N f(-N) f^T(-N) \to 0$ as $N \to \infty$, and we get the stationary result $F_{N+1} = F_N = F$. Hence:

$$\hat{\theta}_{N+1} = L^T \hat{\theta}_N + F^{-1} f(0)\left[Y_{N+1} - \hat{Y}_{N+1|N}\right]$$
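A minimal numpy sketch of the update recursion above for the local linear trend $f(j) = (1, j)^T$, where $L = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ maps $f(j)$ to $f(j+1)$ (the data are the six observations from the example later in this lecture):

```python
import numpy as np

Linv = np.linalg.inv(np.array([[1.0, 0.0], [1.0, 1.0]]))  # L maps f(j) to f(j+1)
f0 = np.array([1.0, 0.0])                                 # f(0)

def update(F, h, y_new, N, lam):
    """Update F_N, h_N to F_{N+1}, h_{N+1} when Y_{N+1} arrives."""
    f_minus_N = np.array([1.0, -float(N)])                # f(-N)
    F = F + lam**N * np.outer(f_minus_N, f_minus_N)
    h = lam * (Linv @ h) + f0 * y_new
    return F, h

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])              # the six example observations
F, h = np.zeros((2, 2)), np.zeros(2)                      # F_0 = 0, h_0 = 0
for N, y_new in enumerate(y):
    F, h = update(F, h, y_new, N, lam=0.9)
print(np.linalg.solve(F, h))           # theta_hat_6, approx [3.85, 0.308]
```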

Variance estimation in local trend models
Define the total memory
$$T = \sum_{j=0}^{N-1} \lambda^j = \frac{1-\lambda^N}{1-\lambda}$$

$T$ is a measure of how many observations the estimation is essentially based upon; for example, with $\lambda = 0.9$, $T \to 1/(1-\lambda) = 10$ as $N \to \infty$.

A variance estimator is therefore
$$\hat{\sigma}^2 = (Y - x_N\hat{\theta}_N)^T \Sigma^{-1} (Y - x_N\hat{\theta}_N)/(T - p), \qquad T > p$$

Notice that the restriction on $T$ is a restriction on $\lambda$. How do you interpret this?
(Note: this estimator is not in the book.)

Global and Local Trend Models - an Example

6 observations (N = 6):
[Figure: the six observations $y_t$ plotted against $t = 1, \ldots, 6$]

Global Linear Trend:

$$Y_{N+j} = \theta_0 + \theta_1 j + \varepsilon_{N+j} \quad\Rightarrow\quad f(j) = \begin{bmatrix} 1 \\ j \end{bmatrix}$$

Linear model form:

$$\begin{bmatrix} 2.0 \\ 2.5 \\ 3.5 \\ 3.0 \\ 4.0 \\ 3.5 \end{bmatrix} = \begin{bmatrix} 1 & -5 \\ 1 & -4 \\ 1 & -3 \\ 1 & -2 \\ 1 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{bmatrix} \quad\Leftrightarrow\quad y = x_6\,\theta + \varepsilon$$

Global linear trend: Estimation

 
$$F_6 = x_6^T x_6 = \begin{bmatrix} 6 & -15 \\ -15 & 55 \end{bmatrix}$$
$$h_6 = x_6^T y = \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix}$$
$$\hat{\theta}_6 = F_6^{-1} h_6 = \begin{bmatrix} 0.5238 & 0.1429 \\ 0.1429 & 0.0571 \end{bmatrix} \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix} = \begin{bmatrix} 3.905 \\ 0.329 \end{bmatrix}$$
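These numbers are easily verified numerically, e.g. with numpy:

```python
import numpy as np

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
x6 = np.column_stack([np.ones(6), np.arange(-5, 1)])  # rows f^T(j), j = -5, ..., 0

F6 = x6.T @ x6                  # [[6, -15], [-15, 55]]
h6 = x6.T @ y                   # [18.5, -40.5]
print(np.linalg.solve(F6, h6))  # approx [3.905, 0.329]
```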

Global linear trend: Prediction
Linear predictor:
$$\hat{Y}_{6+\ell|6} = f^T(\ell)\,\hat{\theta}_6 = 3.905 + 0.329\,\ell$$

LS estimate of $\sigma^2$:
$$\hat{\sigma}^2 = (y - x_6\hat{\theta}_6)^T (y - x_6\hat{\theta}_6)/(6-2) = 0.453^2$$

Prediction error:
$$\varepsilon_6(\ell) = Y_{6+\ell} - \hat{Y}_{6+\ell|6}, \qquad \widehat{\operatorname{Var}}(\varepsilon_6(\ell)) = \hat{\sigma}^2\left[1 + f^T(\ell)\, F_6^{-1} f(\ell)\right]$$

For example, $\hat{Y}_{7|6} = 4.234$ with $\widehat{\operatorname{Var}}(\varepsilon_6(1)) = 0.619^2$.

90% prediction interval:
$$\hat{Y}_{7|6} \pm t_{0.05}(6-2)\sqrt{\widehat{\operatorname{Var}}(\varepsilon_6(1))} = 4.234 \pm 1.320$$
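A sketch of the same computation (scipy's `t.ppf(0.95, df=4)` is the upper 5% quantile $t_{0.05}(4)$; the printed values differ from the slide's in the third decimal because the slide rounds intermediate results):

```python
import numpy as np
from scipy import stats

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
x6 = np.column_stack([np.ones(6), np.arange(-5, 1)])
F6 = x6.T @ x6
theta6 = np.linalg.solve(F6, x6.T @ y)

resid = y - x6 @ theta6
sigma2 = resid @ resid / (6 - 2)                       # approx 0.453**2

f1 = np.array([1.0, 1.0])                              # f(1)
y_pred = f1 @ theta6                                   # approx 4.23
var_eps = sigma2 * (1 + f1 @ np.linalg.solve(F6, f1))  # approx 0.619**2
half_width = stats.t.ppf(0.95, df=4) * np.sqrt(var_eps)
print(f"{y_pred:.3f} +/- {half_width:.3f}")            # approx 4.23 +/- 1.32
```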

Global linear trend: Estimation

[Figure: the six observations with the fitted global linear trend]

Global linear trend: Updating the parameters

- New observation: $y_7 = 3.5$.

$$\begin{aligned} F_7 &= F_6 + f(-6) f^T(-6) = \begin{bmatrix} 6 & -15 \\ -15 & 55 \end{bmatrix} + \begin{bmatrix} 1 \\ -6 \end{bmatrix} \begin{bmatrix} 1 & -6 \end{bmatrix} = \begin{bmatrix} 7 & -21 \\ -21 & 91 \end{bmatrix}, \\ h_7 &= L^{-1} h_6 + f(0)\, y_7 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} 3.5 = \begin{bmatrix} 22 \\ -59 \end{bmatrix}, \\ \hat{\theta}_7 &= F_7^{-1} h_7 = \begin{bmatrix} 0.4643 & 0.1071 \\ 0.1071 & 0.0357 \end{bmatrix} \begin{bmatrix} 22 \\ -59 \end{bmatrix} = \begin{bmatrix} 3.893 \\ 0.250 \end{bmatrix}. \end{aligned}$$
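The update step, checked numerically (exact arithmetic gives $\hat{\theta}_7 = (763/196,\ 49/196) \approx (3.893,\ 0.250)$):

```python
import numpy as np

F6 = np.array([[6.0, -15.0], [-15.0, 55.0]])
h6 = np.array([18.5, -40.5])
Linv = np.array([[1.0, 0.0], [-1.0, 1.0]])    # inverse of L = [[1, 0], [1, 1]]

f_m6 = np.array([1.0, -6.0])                  # f(-6)
F7 = F6 + np.outer(f_m6, f_m6)                # [[7, -21], [-21, 91]]
h7 = Linv @ h6 + np.array([1.0, 0.0]) * 3.5   # [22, -59]
print(np.linalg.solve(F7, h7))                # approx [3.893, 0.250]
```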

Global linear trend: Updating

[Figure: global trend fits given t = 6 and given t = 7, each shown ± one standard deviation]

Local linear trend: Estimation

- Forgetting factor $\lambda = 0.9$. Linear model unchanged.

$$F_6 = \sum_{j=0}^{5} \lambda^j f(-j) f^T(-j) = \begin{bmatrix} 4.6856 & -10.284 \\ -10.284 & 35.961 \end{bmatrix}$$
$$h_6 = \sum_{j=0}^{5} \lambda^j f(-j) Y_{6-j} = \begin{bmatrix} 14.902 \\ -28.580 \end{bmatrix}$$
$$\hat{\theta}_6 = F_6^{-1} h_6 = \begin{bmatrix} 0.573 & 0.164 \\ 0.164 & 0.075 \end{bmatrix} \begin{bmatrix} 14.902 \\ -28.580 \end{bmatrix} = \begin{bmatrix} 3.85 \\ 0.308 \end{bmatrix}$$
$$\hat{\sigma}^2 = (Y - x_6\hat{\theta}_6)^T \Sigma^{-1} (Y - x_6\hat{\theta}_6)/(T-2) = 0.496^2$$
$$\widehat{\operatorname{Var}}(\varepsilon_6(1)) = \hat{\sigma}^2\left[1 + f^T(1)\, F_6^{-1} f(1)\right] = 0.697^2$$
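A numpy sketch reproducing these numbers; $\Sigma^{-1}$ enters as the weights $\lambda^{N-1}, \ldots, \lambda, 1$ on $Y_1, \ldots, Y_6$:

```python
import numpy as np

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
lam, N, p = 0.9, 6, 2
x6 = np.column_stack([np.ones(N), np.arange(-(N - 1), 1)])  # rows f^T(j)
w = lam ** np.arange(N - 1, -1, -1)     # lam^{N-1}, ..., lam, 1 for Y_1, ..., Y_N

F6 = x6.T @ (w[:, None] * x6)           # [[4.6856, -10.284], [-10.284, 35.961]]
h6 = x6.T @ (w * y)                     # [14.902, -28.580]
theta6 = np.linalg.solve(F6, h6)        # approx [3.85, 0.308]

T = w.sum()                             # total memory, approx 4.69
resid = y - x6 @ theta6
sigma2 = (w * resid**2).sum() / (T - p) # approx 0.496**2
f1 = np.array([1.0, 1.0])
var_eps = sigma2 * (1 + f1 @ np.linalg.solve(F6, f1))  # approx 0.697**2
print(theta6, np.sqrt(sigma2), np.sqrt(var_eps))
```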

Local linear trend: Predicting for t = 7
[Figure: the six observations with global and local trend predictions for t = 7]

Local linear trend: Estimating σ̂ 2
- We can use the WLS estimator for $\hat{\theta}_N$,
- but not for $\hat{\sigma}^2$!?!
- Reason: local trend models assume that the $\varepsilon_t$ are i.i.d.
- The proposed estimator
$$\hat{\sigma}_N^2 = (Y - x_N\hat{\theta}_N)^T \Sigma^{-1} (Y - x_N\hat{\theta}_N)/(T - p), \qquad T > p$$
provides a local estimate.
- A global estimator is given by
$$\hat{\sigma}_N^2 = \frac{1}{N - n} \sum_{j=n+1}^{N} \frac{(Y_j - \hat{Y}_{j|j-1})^2}{1 + f^T(1)\, F_{j-1}^{-1} f(1)}$$
- Note that the first $n$ predictions are ignored to stabilize the estimator.
- Note that the prediction errors are normalized; see the sketch below.
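A sketch of the global estimator for the local linear trend, reusing the recursive update from earlier (the helper name `global_sigma2` and the choice `n_burn=2` are mine; the normalization uses $F_{j-1}$, i.e. the information matrix before $Y_j$ arrives):

```python
import numpy as np

def global_sigma2(y, lam, n_burn=2):
    """Average of normalized squared one-step prediction errors, j > n_burn."""
    Linv = np.array([[1.0, 0.0], [-1.0, 1.0]])
    f0, f1 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
    F, h = np.zeros((2, 2)), np.zeros(2)
    sse, count = 0.0, 0
    for N, y_new in enumerate(y):
        if N > n_burn:  # F is invertible once at least two observations are in
            theta = np.linalg.solve(F, h)
            err = y_new - f1 @ theta                   # Y_j - Yhat_{j|j-1}
            sse += err**2 / (1 + f1 @ np.linalg.solve(F, f1))
            count += 1
        fN = np.array([1.0, -float(N)])                # f(-N)
        F = F + lam**N * np.outer(fN, fN)
        h = lam * (Linv @ h) + f0 * y_new
    return sse / count

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
print(np.sqrt(global_sigma2(y, lam=0.9)))
```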

Local linear trend: Estimation
[Figure: the data with global linear fits given t = 6 and t = 7, and local linear fits given t = 6]

Which of the (green) local trend models has the highest $\lambda$?


Operators; The backwards shift operator B

- An operator $A$ is (here) a function of a time series $\{x_t\}$ (or a stochastic process $\{X_t\}$).
- Applying an operator to a time series $\{x_t\}$ yields a new time series $\{Ax_t\}$; likewise for a stochastic process, $\{AX_t\}$.
- The most important operator for us is the backward shift operator $B$: $Bx_t = x_{t-1}$. Obviously, $B^j x_t = x_{t-j}$.
- All other operators we consider in this lecture may be expressed in terms of $B$.

The forward shift F and difference ∇

The forward shift operator

- $F x_t = x_{t+1}$; $F^j x_t = x_{t+j}$.
- Combining a forward and a backward shift yields the identity operator $1$, i.e. $F$ and $B$ are each other's inverse: $B^{-1} = F$ and $F^{-1} = B$.

The difference operator

- $\nabla x_t = x_t - x_{t-1} = 1x_t - Bx_t = (1 - B)x_t$.
- Thus: $\nabla = 1 - B$.

The summation S

$$S x_t = x_t + x_{t-1} + x_{t-2} + \cdots = x_t + Bx_t + B^2 x_t + \cdots = (1 + B + B^2 + \cdots)\,x_t$$

- Summation, then difference (remember $S x_t = x_t + S x_{t-1}$):
$$\nabla S x_t = S x_t - S x_{t-1} = x_t + S x_{t-1} - S x_{t-1} = x_t$$
- Difference, then summation:
$$S \nabla x_t = (1 + B + B^2 + \cdots)x_t - (1 + B + B^2 + \cdots)x_{t-1} = (1 + B + B^2 + \cdots)x_t - (B + B^2 + \cdots)x_t = x_t$$
- So $\nabla$ and $S$ are each other's inverse:
$$\nabla^{-1} = \frac{1}{1 - B} = 1 + B + B^2 + \cdots = S$$
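On a finite sample the infinite sum behind $S$ is truncated, so a starting level must be supplied, but the inverse relationship is easy to check numerically (`np.diff` plays $\nabla$, `np.cumsum` plays $S$):

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

dx = np.diff(x)              # nabla x_t = x_t - x_{t-1} (drops the first point)
recovered = x[0] + np.concatenate([[0.0], np.cumsum(dx)])  # S nabla x + level
print(np.allclose(recovered, x))  # True: summation undoes differencing
```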
Properties of B , F , ∇ and S

- The operators are all linear, i.e.
$$H[\lambda x_t + (1 - \lambda) y_t] = \lambda H[x_t] + (1 - \lambda) H[y_t]$$
- The operators may be combined into new operators: the power series
$$a(z) = \sum_{i=0}^{\infty} a_i z^i$$
defines a new operator from an operator $H$ by linear combinations:
$$a(H) = \sum_{i=0}^{\infty} a_i H^i$$

Examples of combined operators

- $\nabla^{-1}$:
$$\frac{1}{1-z} = \sum_{i=0}^{\infty} z^i \quad\text{so}\quad \nabla^{-1} = \frac{1}{1-B} = \sum_{i=0}^{\infty} B^i = S$$
- Operator polynomial of order $q$:
$$\theta(z) = \sum_{i=0}^{q} \theta_i z^i,$$
i.e. $\theta_i = 0$ for $i > q$:
$$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$$
where $\theta_0$ is chosen to be 1.

The Cauchy product (discrete convolution)

The equation
$$\{\lambda_i\} * \{\psi_i\} = \{\pi_i\}$$
means that
$$\begin{aligned} \pi_0 &= \lambda_0 \psi_0 \\ \pi_1 &= \lambda_1 \psi_0 + \lambda_0 \psi_1 \\ &\;\;\vdots \\ \pi_i &= \lambda_i \psi_0 + \lambda_{i-1} \psi_1 + \cdots + \lambda_0 \psi_i \\ &\;\;\vdots \end{aligned}$$

Multiplying combined operators
Theorem 4.13

- For the operator $H$ the following operators are given:
$$\lambda(H) = \sum_{i=0}^{\infty} \lambda_i H^i, \qquad \psi(H) = \sum_{i=0}^{\infty} \psi_i H^i, \qquad \pi(H) = \sum_{i=0}^{\infty} \pi_i H^i$$
such that $\lambda(H)\psi(H) = \pi(H)$.
- Then $\lambda$, $\psi$, $\pi$ satisfy the equation $\{\lambda_i\} * \{\psi_i\} = \{\pi_i\}$, as illustrated below.
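Since multiplying operator polynomials convolves their coefficient sequences, `np.convolve` computes the Cauchy product directly (the coefficients below are illustrative):

```python
import numpy as np

lam_c = np.array([1.0, -0.5])           # lambda(z) = 1 - 0.5 z
psi_c = np.array([1.0, 0.5, 0.25])      # psi(z)    = 1 + 0.5 z + 0.25 z^2
pi_c = np.convolve(lam_c, psi_c)        # {lambda_i} * {psi_i} = {pi_i}
print(pi_c)                             # [1, 0, 0, -0.125]: pi(z) = 1 - 0.125 z^3
```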

Highlights

- Local trend model: $S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$
- Iterative updates:
$$F_N = \sum_{j=0}^{N-1} \lambda^j f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} \lambda^j f(-j) Y_{N-j}, \qquad \hat{\theta}_N = F_N^{-1} h_N$$
- Backward shift operator: $B X_t = X_{t-1}$
- Difference operator: $\nabla X_t = X_t - X_{t-1}$

